Mining the Granite Mountain

Posted: Sat Apr 21, 2007 2:55 am
by vwalker
Hi folks. I'm Vic Walker, a new user on this web site, but someone who has been combining computers and genealogy with the Church for some time now.

I attended a meeting at my work the other day, where I saw a presentation on an artificial intelligence application that was designed to combine data from various databases to identify people. The system used various rules to assign scores as to the likelihood of person A being the same as person B. For example, if I have records for Robert Jones, Robert A. Jones, Robert Anson Jones, Robert A. Jones Jr., Bob Jones, Bobby Jones, Rob Jones and Robbie Jones, they may or may not be the same person. But if I knew that some of them had the same social security number, or address, or phone number, it is more likely that they are indeed for the same person. By combining data in various databases, it's possible to get very plausible matches between records.
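As a rough illustration of the rule-based scoring described above, here is a minimal sketch in Python. The `Record` fields, the nickname table, and the weights are all assumptions made up for this example; they are not taken from the application in the presentation.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical nickname table; a real system would need a far larger one.
NICKNAMES = {"bob": "robert", "bobby": "robert", "rob": "robert", "robbie": "robert"}


@dataclass
class Record:
    given: str
    surname: str
    ssn: Optional[str] = None
    address: Optional[str] = None
    phone: Optional[str] = None


def canonical_given(name: str) -> str:
    """Lower-case the first given name and map known nicknames to one form."""
    parts = name.lower().split()
    first = parts[0] if parts else ""
    return NICKNAMES.get(first, first)


def match_score(a: Record, b: Record) -> float:
    """Add up illustrative weights for each rule that fires (weights are assumed)."""
    score = 0.0
    if a.surname.lower() == b.surname.lower():
        score += 1.0
    if canonical_given(a.given) == canonical_given(b.given):
        score += 1.0
    # Shared identifiers count for more than similar names.
    if a.ssn and a.ssn == b.ssn:
        score += 5.0
    if a.address and a.address == b.address:
        score += 2.0
    if a.phone and a.phone == b.phone:
        score += 2.0
    return score


robert = Record("Robert A.", "Jones", ssn="123-45-6789")
bobby = Record("Bobby", "Jones", ssn="123-45-6789")
rob = Record("Rob", "Jones")

print(match_score(robert, bobby))  # names agree and the SSN is shared
print(match_score(robert, rob))    # names agree, nothing else to go on
```

A higher score only means "more likely the same person"; where to draw the cutoff is a separate decision that this sketch does not attempt.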

Now, the application I saw was designed to identify potential health insurance fraud by providers and patients, but as I was watching this presentation, I was thinking to myself, "If we applied this kind of technology to the massive database of records that is being scanned, digitized and indexed from the Church's Granite Mountain archive, we could build a family tree that stretches back to Adam!"

My question is, is anyone thinking about applying this kind of technology to the Church's existing and future databases? And is this something we could discuss at the upcoming Tech Talk in Mountain View next week?

What do you think?

Thanks.

Vic

Posted: Sat Apr 21, 2007 10:01 am
by thedqs
That is a great idea. I don't know whether the Church is actively trying this, but once the API is released we could start a project that searches your family tree, tries to find duplicates on the system, and then advises you on them.

newFamilySearch

Posted: Sat Apr 21, 2007 12:23 pm
by garysturn
newFamilySearch does something similar to what you have mentioned. It searches the entire FamilySearch database and finds possible links. The purpose of newFamilySearch is to combine all duplicate information to prevent duplication of effort and duplication of Temple work. As new data is added to FamilySearch from the scanning and from people submitting their research, this program will search it and link it up as possible matches. Users will then compare the data found to determine whether it is a match, and if it is, it can be combined as a source document. There are a lot of discussions and links for more information in this Forum. Check this thread. And this thread.

Posted: Sat Apr 21, 2007 1:32 pm
by JamesAnderson
Here's yet another idea.

Have an icon (maybe a tree) next to a name in another database that has been 'combined' into a tree. That way, if something is found by someone and then added to a tree, everyone can see that it was added to a tree as a match.

There are two things that can be done here.

1. A green 'tree' icon can be used to indicate that a user has found that data to match someone in his tree, and has placed it in his/her tree.

2. A brown 'tree' icon can be used to indicate that the system has come up with possible matches between a particular name and its data and a name and data in the trees area. This signals that the computer thinks these are possible matches to things already in trees.

Once someone finds something and adds it to their tree, the tree icon is added to the same entry in the general pool database of all records in the search area to indicate that someone has a match. That way when people search for data, they will know if the item is clearly in a tree, is a possible match to something in someone's tree, or not yet in a tree at all (no icon).
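The three states described above (green, brown, no icon) could be sketched roughly like this in Python. The names `TreeIcon` and `icon_for`, and the idea of tracking confirmed and suggested record IDs in two sets, are assumptions for illustration only, not anything FamilySearch has described.

```python
from enum import Enum
from typing import Set


class TreeIcon(Enum):
    GREEN = "green"  # a user has confirmed the match and placed it in a tree
    BROWN = "brown"  # the system thinks it is a possible match
    NONE = "none"    # not yet in a tree at all


def icon_for(record_id: str, confirmed: Set[str], suggested: Set[str]) -> TreeIcon:
    """Pick the icon for one record; a user's confirmation outranks a system guess."""
    if record_id in confirmed:
        return TreeIcon.GREEN
    if record_id in suggested:
        return TreeIcon.BROWN
    return TreeIcon.NONE


# Hypothetical record IDs for the example.
confirmed_ids = {"rec-1"}
suggested_ids = {"rec-1", "rec-2"}

for rid in ("rec-1", "rec-2", "rec-3"):
    print(rid, icon_for(rid, confirmed_ids, suggested_ids).value)
```

Checking the confirmed set first means a record that is both suggested by the system and confirmed by a user shows the green icon.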

I heard the presentation in January at the UGA conference, and they said they might have a star system to give relevance, but this goes beyond that and might even prove more useful than a relevance rating alone. We should possibly still go with that relevance rating, but add the two ideas above to indicate when people have found that an item matches, so people can find out more right then about that person.