new FamilySearch API

rmrichesjr
Community Moderators
Posts: 3856
Joined: Thu Jan 25, 2007 11:32 am
Location: Dundee, Oregon, USA

Person ID vs. numeric manid and womanid

#11

Post by rmrichesjr »

In the new FamilySearch beta, every person is tagged with one or more (for combined records) "Person ID"s that contain letters and digits and a hyphen every four or so characters. Also, when navigating in Family Group Record mode, I have noticed that in the links to go from one family to another, the URL contains a numeric manid and/or womanid parameter or argument.

Is there a translation algorithm to go between the two forms of ID for a person? If so, can it be posted, either in the forum or later when the API is made available?

As one example of how this would be useful: if I'm looking at paper printouts of ancestors and can see the alphanumeric PID, a published algorithm would let me go directly to the FGR page for the family, without having to do a search by number, bring up the pedigree, and then switch to FGR mode.

If an algorithm cannot be published, it would at least be helpful to have a way in the API to translate between the two.
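Just to illustrate the kind of round trip I'm asking for, here is a rough sketch from the client's point of view. The URL, endpoint, and parameter names below are invented for illustration; nothing here reflects the actual API:

[code]
# Hypothetical sketch only: the lookup URL and parameter names are invented.
import json
import urllib.parse
import urllib.request

def numeric_ids_for_pid(pid):
    """Ask a (hypothetical) translation service for the numeric manid/womanid
    that corresponds to an alphanumeric Person ID."""
    url = "https://example.familysearch.org/lookup?pid=" + urllib.parse.quote(pid)
    with urllib.request.urlopen(url) as response:
        data = json.load(response)
    return data.get("manid"), data.get("womanid")

def fgr_url(manid=None, womanid=None):
    """Build the Family Group Record URL directly, instead of searching by
    number, bringing up the pedigree, and switching to FGR mode."""
    params = {}
    if manid is not None:
        params["manid"] = manid
    if womanid is not None:
        params["womanid"] = womanid
    return "https://example.familysearch.org/fgr?" + urllib.parse.urlencode(params)
[/code]

Whether the translation is a published algorithm or a server call matters less to me than having some supported way to do it.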

I put the above suggestions in via "Send Us Feedback".
rmrichesjr
Community Moderators
Posts: 3856
Joined: Thu Jan 25, 2007 11:32 am
Location: Dundee, Oregon, USA

#12

Post by rmrichesjr »

gordon wrote:The FamilySearch API should be released in the release after FamilySearch goes final. The API will include member authentication, plus read, search, and update of all person information. Yes, we expect there to be many desktop applications that "sync" with the new FamilySearch using the FamilySearch API. Please contact Gordon Clarke for additional information. clarkegj@ldschurch.org
I had an idea hit me yesterday about the API. I hope its 'sync' capability includes a quick, efficient timestamp for the info about a person or group of person records. One of the main advantages of a local desktop application would be speed gains by caching the info locally rather than having to ask the servers every time. I can envision a user caching info on a few hundred or more persons. That would take a long time if everything had to be downloaded to find out what had changed.
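To make the idea concrete, here's a rough sketch of the caching pattern I have in mind. The two fetch functions are placeholders for whatever the API ends up providing; only the check-the-timestamp-before-downloading pattern is the point:

[code]
# Sketch of a timestamp-checking local cache.  The fetch callables are
# placeholders for whatever the real API provides.
class PersonCache:
    def __init__(self, fetch_last_modified, fetch_details):
        self._fetch_last_modified = fetch_last_modified  # pid -> server timestamp (cheap)
        self._fetch_details = fetch_details              # pid -> full record (expensive)
        self._cache = {}                                 # pid -> (timestamp, record)

    def get(self, pid):
        server_stamp = self._fetch_last_modified(pid)    # quick, tiny response
        cached = self._cache.get(pid)
        if cached is not None and cached[0] == server_stamp:
            return cached[1]                             # still current: no big download
        record = self._fetch_details(pid)                # re-fetch only what changed
        self._cache[pid] = (server_stamp, record)
        return record
[/code]

Checking a few hundred timestamps is a few hundred tiny requests (or ideally one batched request) instead of a few hundred full downloads.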

I'm looking forward to seeing more info about the API. I hope to be able to work on a project using it.
tianon
New Member
Posts: 1
Joined: Sun Mar 27, 2011 2:08 am
Location: ::1

#13

Post by tianon »

rmrichesjr wrote:I had an idea hit me yesterday about the API. I hope its 'sync' capability includes a quick, efficient timestamp for the info about a person or group of person records. One of the main advantages of a local desktop application would be speed gains by caching the info locally rather than having to ask the servers every time. I can envision a user caching info on a few hundred or more persons. That would take a long time if everything had to be downloaded to find out what had changed.

I'm looking forward to seeing more info about the API. I hope to be able to work on a project using it.

I realize this is a total necro-post (is it really 2011 already?), and I apologize, but I really just had to laugh here. A few hundred or more? We're talking about powers of two. Every time I go back one more generation in my personal tree, the number of people added doubles relative to the previous generation (two parents each), not to mention children, and I'd be extremely interested in caching at least 10 generations' worth. Ideally, the entire tree could be cached, but that's not necessarily feasible.

Since we're talking about basically plain-text data here, it shouldn't be any kind of issue to save 20, 30, or even 50 generations' worth and still not have any sizable amount of data (well under a gigabyte).
rmrichesjr
Community Moderators
Posts: 3856
Joined: Thu Jan 25, 2007 11:32 am
Location: Dundee, Oregon, USA

#14

Post by rmrichesjr »

tianon wrote:I realize this is a total necro-post (is it really 2011 already?), and I apologize, but I really just had to laugh here. A few hundred or more? We're talking about powers of two. Every time I go back one more generation in my personal tree, the number of people added doubles relative to the previous generation (two parents each), not to mention children, and I'd be extremely interested in caching at least 10 generations' worth. Ideally, the entire tree could be cached, but that's not necessarily feasible.

Since we're talking about basically plain-text data here, it shouldn't be any kind of issue to save 20, 30, or even 50 generations' worth and still not have any sizable amount of data (well under a gigabyte).

(Some of the moderators frown on resurrecting old threads, but this one's on topic, so I don't see a problem in this case.)

On the question of how much data it would make sense to cache, the main issues are how long you want to keep the data cached, how much risk of stale data you are willing to accept, and how many resources it takes to fetch the data.

I'm pretty confident it would not be practical for the server to track which clients have which data cached, so there's no practical way to do server-push coherency. That leaves it up to the client to re-fetch the data to check for staleness.

The likelihood of any person's cached record being stale increases with time since the data was fetched. The likelihood of there being stale data in a cache increases with the number of records in the cache. On modest time and size scales, I would think both effects would be roughly linear, so the likelihood of a cache having stale data would grow with the product of the time since fetch and the number of person records held.
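To put a rough number on that, here's a tiny check, assuming (purely for illustration) that each cached record independently has some small chance r of changing per day. The product N*r*t is a good approximation while it stays small and saturates toward 1 as the cache or the time grows:

[code]
# Numeric check of the "product of time and cache size" approximation.
# r is an assumed per-record, per-day chance of change, not a measured value.
r = 0.0005
for n_records, days in [(100, 1), (100, 7), (500, 7), (2000, 30)]:
    exact = 1 - (1 - r * days) ** n_records   # chance that at least one record is stale
    linear = n_records * r * days             # the "product" approximation
    print(n_records, days, round(exact, 3), round(linear, 3))
[/code]

Either way, the practical conclusion is the same: the bigger and older the cache, the more re-checking you have to do.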

It takes a lot of resources to fetch data from a large store. One observation: a program called GetMyAncestors takes several minutes to download summary data on four generations, 400-450 people. Once you go back far enough, the tree becomes a DAG because you become your own distant cousin (the same ancestor appears on different branches), but that effect is still rather small at 10 generations back, so extrapolating the 400-450 figure by doubling per generation puts 10 generations at around 25k-29k people. If fetch time scales linearly, that download would take several hours.
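For what it's worth, the 25k-29k and several-hours figures are just the four-generation observation doubled out six more times; the minutes range below is my assumption for "several minutes":

[code]
# Back-of-envelope behind the 10-generation figures above.
gen4_people = (400, 450)         # observed GetMyAncestors download
gen4_minutes = (3, 8)            # assumed range for "several minutes"

factor = 2 ** (10 - 4)           # six more generations of rough doubling = 64x

gen10_people = tuple(n * factor for n in gen4_people)        # 25,600 to 28,800 people
gen10_hours = tuple(m * factor / 60 for m in gen4_minutes)   # roughly 3 to 8.5 hours
print(gen10_people, gen10_hours)
[/code]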

Let's say, hypothetically, that the nFS store were optimized for fetching summary data (which would be very inadequate for serious work) rather than detailed data. I don't know what database technology nFS is using (other than it has been stated they use Linux as the OS and Java as the programming language), so I'll extrapolate from what I have seen with a hash-based key-value store on a (relatively small) six-drive RAID6. With rotating magnetic disks, the main factor governing fetch speed is disk seeks. I have seen a disk array that can handle 400MB/s of sequential access drop to around 2-4MB/s when reading widely scattered small blocks. With the K-V store I worked with, once the index outgrew RAM, lookups dropped to around a thousand per second.

Even if you're caching only summary data, and even if we assume there's a couple thousand dollars' worth of data center resources per simultaneously active user, it would take a long time to fetch 1GB of data, and that's assuming the client has enough downstream ISP bandwidth. SSDs would eliminate essentially all the seek time, but SSDs are still rather expensive per TB, and the client's ISP bandwidth would still be a bottleneck.
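As a back-of-envelope (the per-record size and lookup rate are assumptions, not measurements of nFS):

[code]
# Rough arithmetic behind "it would take a long time to fetch 1GB".
record_bytes = 1024              # assume roughly 1 KB per summary record
total_bytes = 1024 ** 3          # a 1 GB cache to fill
lookups_per_second = 1000        # seek-bound K-V store once the index outgrows RAM

records = total_bytes // record_bytes      # about a million records
seconds = records / lookups_per_second     # about 1,000 s, i.e. roughly 17 minutes
print(records, seconds / 60)
[/code]

And that 17 minutes assumes the whole disk array is serving one user; shared across many simultaneous users, the effective rate drops accordingly.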

I still think caching a few hundred, or at most a thousand or two, people is much more practical than caching 10 generations, let alone 20-50 generations. Also, I think that if advances in speed made it practical to cache 10 generations or more, the same advances would make real-time fetches fast enough to remove the need for caching in the first place.
aebrown
Community Administrator
Posts: 15153
Joined: Tue Nov 27, 2007 8:48 pm
Location: Draper, Utah

#15

Post by aebrown »

rmrichesjr wrote:I still think caching a few hundred, or at most a thousand or two, people is much more practical than caching 10 generations, let alone 20-50 generations. Also, I think that if advances in speed made it practical to cache 10 generations or more, the same advances would make real-time fetches fast enough to remove the need for caching in the first place.
And remember that many people would feel very fortunate to have even a few hundred ancestors. Not everyone has lots of lines that have been researched extensively.
Questions that can benefit the larger community should be asked in a public forum, not a private message.
jonesrk
Church Employee
Church Employee
Posts: 2372
Joined: Tue Jun 30, 2009 8:12 am
Location: South Jordan, UT, USA

#16

Post by jonesrk »

rmrichesjr wrote:That leaves it up to the client to re-fetch the data to check for staleness.
One nice thing in the API is that each person has a version, so a client can just check the version to see whether it needs to re-fetch the full person details or whether it still has the current data. I don't know whether any of the programs out there are using that info or not.
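Something like this is all it takes on the client side (the get_versions/get_person calls are placeholders rather than the actual API method names, and I'm assuming versions can be fetched in a batch):

[code]
# Sketch of version-based cache refresh: re-fetch only people whose version changed.
def refresh(cache, person_ids, get_versions, get_person):
    """cache maps person_id -> (version, details)."""
    server_versions = get_versions(person_ids)   # assumed batched version lookup (cheap)
    for pid in person_ids:
        cached = cache.get(pid)
        if cached is None or cached[0] != server_versions[pid]:
            cache[pid] = (server_versions[pid], get_person(pid))   # full fetch only when stale
    return cache
[/code]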
rmrichesjr wrote: It takes a lot of resources to fetch data from a large store. One observation: a program called GetMyAncestors takes several minutes to download summary data on four generations, 400-450 people. Once you go back far enough, the tree becomes a DAG because you become your own distant cousin (the same ancestor appears on different branches), but that effect is still rather small at 10 generations back, so extrapolating the 400-450 figure by doubling per generation puts 10 generations at around 25k-29k people. If fetch time scales linearly, that download would take several hours.
Most of the time in that kind of download is due to the throttling put in place by the API. After retrieving a small batch (if I remember correctly, you can get larger batches of summary data and smaller batches of detailed data before you hit the throttling), the app calling the API will be denied for a short time. Within a user interface that wouldn't be noticeable, but for larger downloads like the ones GetMyAncestors performs, the pauses become significant.
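A bulk downloader just has to wait out those denials and retry. Roughly like this (the ThrottledError signal and the batch/back-off numbers are made up; the real API may signal throttling differently):

[code]
import time

class ThrottledError(Exception):
    """Stand-in for however the API signals that a client is being throttled."""

def download_all(person_ids, fetch_batch, batch_size=50, backoff_seconds=15):
    """Fetch everyone in person_ids, pausing whenever the server throttles us."""
    results = {}
    i = 0
    while i < len(person_ids):
        batch = person_ids[i:i + batch_size]
        try:
            results.update(fetch_batch(batch))
            i += batch_size                  # move on only after a successful batch
        except ThrottledError:
            time.sleep(backoff_seconds)      # wait out the denial, then retry the same batch
    return results
[/code]

Spread over thousands of batches, those pauses are what make a large download take so long.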