The Next Big (Church) Thing.........in genealogy

Discussions around Genealogy technology.
huffkw
Member
Posts: 54
Joined: Sun Jan 21, 2007 6:34 pm
Location: Spanish Fork, Utah

The Next Big (Church) Thing.........in genealogy

#1

Post by huffkw »

I hope someone is already looking ahead to The Next Big (Church) Thing in genealogy.
My vote would be for something the Church should be uniquely qualified to do – bring the Henry Ford principles of cooperation and mass production to the genealogy world so that new research can be completed at least 20 times faster, and with much higher quality. For example, by enabling the coordination of all the genealogy work in the nation, the basic genealogy for the entire United States could easily be done in a year, and that would include nearly all the nation’s source records being entered and linked together to form a well-documented national pedigree. This desirable goal is extremely unlikely to happen without a central organization supplying the necessary enabling software and administration, which the Church could easily supply. This project should also bring the Church very favorable press coverage.

Calculations: If 10 million US genealogists each put in 10 hours on this project in one year, that would reach a total of 100 million hours. That allows time to enter and interrelate 10 public records for each individual. Those individual records might include 6 census records, plus a birth, marriage, death, and burial record for each person. If 2 minutes are spent on each document, and there are 300 million deceased Americans, that would require 2 minutes * 10 documents * 300 million people = 6,000 million minutes = 100 million hours.

It is important to notice that the 100 million hours needed to do all the US genealogy is about the same amount of effort I estimate it will take to finish removing all duplicates in the (750 million name) new FamilySearch (nFS) system. The difference is that the Church has only about 1% of the US genealogists. In other words, it could take the Church 100 years to do the same work the entire US genealogy corps could do in one year.
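
Here is a quick check of that arithmetic as a minimal Python sketch (every input is my own estimate above, not measured data):

```python
# All figures are estimates from the paragraphs above, not measured data.
genealogists = 10_000_000            # estimated US genealogists
hours_each = 10                      # hours each contributes in one year
supply_hours = genealogists * hours_each              # 100,000,000 hours

people = 300_000_000                 # estimated deceased Americans
docs_per_person = 10                 # ~6 census + birth, marriage, death, burial
minutes_per_doc = 2
demand_hours = people * docs_per_person * minutes_per_doc / 60

print(supply_hours == demand_hours)  # True: 100 million hours either way

# Church-only comparison: ~1% of US genealogists working at the same rate
church_hours_per_year = supply_hours * 0.01          # 1,000,000 hours/year
print(demand_hours / church_hours_per_year)          # 100.0 -> ~100 years
```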

As a matter of priorities, it would make sense to put the Church anti-duplication project on the back burner for a time while the Church assisted the completion of a very high quality, heavily sourced genealogy for the entire US, including all the US ancestors of Church members.

It would then be possible to go back with this very high quality database and use it as the framework on which to consolidate the temple ordinance data, and as a reference file to resolve any issues of missing or conflicting data. The large number of original sources in the new data should help quickly resolve any questions about current Church data, which normally has no such source citations. This lack of citations and links to source records in Church data leaves open the possibility that there can be many unresolvable data disputes among members who share a common ancestor. The current Church ordinance data will probably overlap only a part of the entire nation’s genealogy, but for that portion, the Church data and the broader database of the US could reinforce each other.

(I have already tested computer programs to handle what I consider the important aspects of such a nationwide project).
WelchTC
Senior Member
Posts: 2085
Joined: Wed Sep 06, 2006 8:51 am
Location: Kaysville, UT, USA

#2

Post by WelchTC »

Isn't the Church already doing this with the "FamilySearch Indexing" project?

Check out http://www.familysearchindexing.org/en/index.jsp. I have volunteered to do several batches of census and death records. How does your idea differ from that?

Tom
huffkw
Member
Posts: 54
Joined: Sun Jan 21, 2007 6:34 pm
Location: Spanish Fork, Utah

#3

Post by huffkw »

Don’t get me wrong, the "FamilySearch Indexing" project is a good project, as far as it goes. It is a great place to start. But from my viewpoint, creating an index to source documents is a far cry from the much more aggressive goal of actually integrating those records, through a national program, into the finished, lineage-linked database of nationwide scope that I would like to see. That more aggressive goal would require central software different from anything I see available (or planned) anywhere. Someone would need to offer the appropriate central system for everyone in the US to use, not just LDS members. As it is, people might use Church online image data and the new indexes to put something together on their home computers, but without a much more thoroughgoing system of cooperation they would remain isolated and inefficient, and would probably repeat the same research over and over, as in the past.

Maybe the Church has some worry about taking the next steps to sponsor such a useful project, but it is hard for me to see a downside to it. I expect this important add-on could be done for a small fraction of what has already been invested in Church genealogy systems, especially since much of the work already done could be adapted for reuse in this new setting.
rmrichesjr
Community Moderators
Posts: 3827
Joined: Thu Jan 25, 2007 11:32 am
Location: Dundee, Oregon, USA

#4

Post by rmrichesjr »

Your idea is a very interesting one. In case the Church doesn't choose to do something like this, I wonder whether there might be some other organization that might be interested in sponsoring it. Have you bounced the idea off some genealogical societies?

I wonder whether the two minutes per source record might be too low. Still, even if it took five or more minutes per record, a project like you suggest could potentially still do a lot of good.

While lurking in Usenet newsgroup soc.genealogy.computing, I have seen discussion of ambiguity. I recall reading a posting that described a situation where the researcher could not determine which of two or three people of similar ages with similar names in the same town or area was the true parent of the ancestor in question. This poster said such ambiguous situations are fairly common, even with full access to public documents. I think I see a possible advantage in such cases for the current approach of each person researching his or her own lines. The family researcher could have access to journals or other non-public documents that could likely resolve the ambiguity, while a project like you propose would not have access to non-public documents. What are your thoughts on that?

At times, I have toyed with a similar idea to what you're proposing. If the existing original source records could be digitized, indexed, and made searchable online, I wonder whether software could automate the searching of source records and building of family trees. At least the easy part of the search process seems to be based on some fairly simple rules. For example, if you have a marriage record, check for matching data in birth records, census records, etc. If a record is found that is a good enough match, and if the match is unique, then follow similar rules from that record to the next level of records. The software would be rule-based for making the leap from one record to the next. Expert system technology or fuzzy logic might be applicable. At least for easy cases, I sometimes toy with the idea that software like this could build a substantial pedigree in a few minutes. How's that for a crazy idea?
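
For the easy cases, the rule might look something like this minimal Python sketch (the record fields, exact-name comparison, and year window are all invented for illustration; a real matcher would need fuzzy name matching and much richer rules):

```python
# Minimal sketch of the rule-based record-to-record matching idea above.
# Fields and thresholds are hypothetical, for illustration only.
from dataclasses import dataclass

@dataclass
class Record:
    kind: str    # "birth", "marriage", "census", ...
    name: str
    year: int
    place: str

def candidates(record, pool, kind, max_year_gap=40):
    """All records of `kind` that plausibly refer to the same person."""
    return [r for r in pool
            if r.kind == kind
            and r.name == record.name
            and abs(r.year - record.year) <= max_year_gap
            and r.place == record.place]

def follow(record, pool, kind):
    """Apply the rule: follow the link only when the match is unique."""
    matches = candidates(record, pool, kind)
    return matches[0] if len(matches) == 1 else None  # None = ambiguous or absent

# Two plausible birth records for the same name and place -> ambiguity,
# exactly the situation described in soc.genealogy.computing.
pool = [Record("birth", "John Smith", 1820, "Springfield"),
        Record("birth", "John Smith", 1822, "Springfield")]
marriage = Record("marriage", "John Smith", 1845, "Springfield")
print(follow(marriage, pool, "birth"))  # None: ambiguous, needs a human
```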
daddy-o-p40
Member
Posts: 237
Joined: Wed Feb 21, 2007 1:22 pm
Location: USA

#5

Post by daddy-o-p40 »

Tom, thanks for the post. Was not aware of the indexing project.
"What have I done for someone today?" Thomas Monson
russellhltn
Community Administrator
Posts: 34418
Joined: Sat Jan 20, 2007 2:53 pm
Location: U.S.

#6

Post by russellhltn »

I'd start off by saying that Record Search has not yet been added to nFS. Once that's been done, then maybe the infrastructure will be there to do something on the scale you're talking about.

Right now nFS only has Family Tree - a compilation of submissions. But Record Search will be adding the sources from the indexing project. There is thought about linking the two, but it's unclear just how that would be done.
huffkw
Member
Posts: 54
Joined: Sun Jan 21, 2007 6:34 pm
Location: Spanish Fork, Utah

Not such a crazy idea

#7

Post by huffkw »

rmrichesjr:

Not such a crazy idea, I think. I will say more below. I also think that the appropriate new central system ought to be able to handle at least 5 different data assembly methods all at once, in parallel. (I will only address two here). The results produced by each method could later be cross-checked for differences and clues to solve problems. All methods could share the one general pool of raw data -- the source records and their indexes -- and each method could add whatever other data was available and seemed useful (existing PAF files, for example).

I want to emphasize again what a huge difference in the level of excitement and participation of the genealogy community I think it would make to announce this aggressive goal and project, while providing the software to actually do it. There is very little reason to attempt anything unusual if you know there is essentially no chance that anything good will come of it. But if people see that a process is working well and exceptionally efficiently, and the goal is in sight, then thousands will want to join in and claim part of the credit for the victory, and get some of the spoils -- the high quality finished lineage-linked data produced. I think it really could all be done in a year or two, even if it takes 5 or 10 times as much labor as my estimate.

Descendant-sequence research
My favorite method is not one usually heard about, but it could make a great deal of sense in the context of this large project. I believe that the ambiguity everybody struggles with is mostly NOT inherent in the data we have access to, but rather is mostly a consequence of how we choose to do things. If we insist on always doing pedigree-sequence research, then we will be endlessly mired in these ambiguity issues. That makes typical pedigree-sequence research by far the least efficient method, in an overall sense, when compared to descendant-sequence research.

(If we are working solely on our own, we care very little about overall efficiency, but if we are doing a big cooperative project, then overall efficiency becomes very important, and we might be happy to change our methods to optimize it, since we all win by doing so. Gains of at least 20 times (2,000%) in efficiency are possible through cooperation, so we ought to be willing to take a look, and maybe change our practices a bit.)

With pedigree-sequence research, the pattern is to try to find a whole series of needles in haystacks. Often, at each generation we jump backwards in our search, we are trying to find one or two people in a new set of public records, in a new location, perhaps with the records in a new and different language. Thus we may be endlessly faced with enormous learning curves for nearly every new person we seek. There is no serious chance for specialization (and its accompanying efficiency), and we tend to see only daunting new technical issues to solve at almost every jump.

Compare this with the mass-production and mass-cooperation methods of a descendant-sequence approach. Usually, I believe, if you start with a head of family (using his single surname to help limit most record searches), it is relatively easy to find his children, and then their children, and so on. Typically, you will be able to mine one set of public records for hundreds of related people. The family may stay in one place for generations, perhaps move once, and then stay in the new place for generations again. So huge blocks of names can be quickly and accurately interrelated, with very minimal ambiguity. Almost every new name comes along with its own full family context, so there is little doubt about who is connected to whom. A pedigree-sequence researcher may jump into an area, find some similar and confusing names there, and so encounter ambiguity, but the descendant-sequence researcher will probably see no ambiguity problems at all, since he is already looking at only one particular local family.

But, you say, I want to know my pedigree, my ancestors, not my distant cousins! That is what everyone is trying to achieve DIRECTLY in their pedigree-sequence research. But here we run into a general economic principle that, as far as I know, has never been applied to genealogy research. It is Adam Smith’s idea that INDIRECT economic methods (which typically involve cooperation) are usually the most efficient. His famous example was pin-making. If one person made one complete pin at a time, doing all the steps himself, it took all or a large part of a day for one pin. But if the pin-making was broken into sub-processes, with workers specializing, productivity exploded. In Adam Smith’s study, a lone worker’s productivity was between 1 and 20 pins a day, but in an organized, cooperative group the per-worker productivity became “four thousand eight hundred pins in a day” (48,000 pins a day for 10 men). Those are the kinds of productivity gains awaiting genealogists, if they will simply agree to cooperate in a large project. I am sticking with my 20-times number for genealogy, but much more is possible.

I hope you can see that historical genealogy research methods have hardly been affected at all by production-line methods. I think it is about time we left behind our cottage-industry mindset and caught up with the industrial society. I think we would be absolutely astonished by the results. We would finally have adopted Henry Ford’s insights into organization.

Once masses of accurate, unambiguously connected, descendant-sequence-researched, single-surname names, with source record references, are finished for an area (taking full advantage of any special skills developed to succeed with those local records), these large blocks of hundreds or thousands of highly accurate names can be connected together through the women who move from a birth family to a marriage family, as in the sketch below. (Some new software features would help in this process.) At that point, any needed pedigree could be traced instantly, going backwards through the sets of interconnected descendant-sequence data.
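
Here is a hedged Python sketch of that linking step (the block structure, identifiers, and citation are all hypothetical; the point is only that the same woman appears in two surname blocks, as a daughter and as a wife, and a sourced marriage record supplies the cross-block edge):

```python
# Hypothetical data: finished descendant-sequence blocks, keyed by surname.
blocks = {
    "Huff":  {"huff-17": {"name": "Mary Huff", "born": 1801}},
    "Jones": {"jones-4": {"name": "Mary (Huff) Jones", "married": 1822}},
}

cross_links = []  # edges joining a woman's birth family to her marriage family

def link_marriage(birth_block, daughter_id, marriage_block, wife_id, source):
    """Record that the same woman appears in two blocks, citing the source."""
    cross_links.append({
        "same_person": ((birth_block, daughter_id), (marriage_block, wife_id)),
        "source": source,
    })

link_marriage("Huff", "huff-17", "Jones", "jones-4",
              source="1822 marriage register (hypothetical citation)")

# A pedigree query can now run backwards: from any Jones descendant,
# follow the cross-link to Mary's birth family in the Huff block.
print(cross_links[0]["same_person"])
```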

Your suggested auto-connect method
I think it would be interesting to try your method on a suitable set of data. And it could certainly be a useful utility program to run with any database to help explore the data and resolve certain problems. But I fear that, as a general rule, it would suffer some of the same problems as the current pedigree-sequence research. First, there would typically be no way to predict in advance where the next jump backward might land you within the nation, so there would be a strong pressure to finish ALL data input for the whole nation before serious use of that algorithm could be expected to be very successful. Second, at each jump backward you would likely run into the same unnecessary ambiguities encountered by manual pedigree-sequence research.

One advantage of descendant-sequence research and data input is that whole geographical areas could be finished incrementally and checked off. Full access to all possible pedigrees (which you suggest) would still be delayed until all the data had been interconnected and all possible manual checking and corroboration completed, but the result should be a much more solid and satisfying work product than relying only on the programmed guesses of a specialized search routine flying through a huge collection of raw data.

We should no longer view genealogy research with despair as an infinite and uncompletable task. Cooperation concepts (and appropriate implementing software) change the mathematics of genealogy research drastically. With the huge efficiencies inherent in the cooperation I suggest, we have plenty of time to do all data interconnections manually and carefully. There is no particular need to find a machine-match-merge solution to completing genealogy work on a national scale.
huffkw
Member
Posts: 54
Joined: Sun Jan 21, 2007 6:34 pm
Location: Spanish Fork, Utah

Record Search

#8

Post by huffkw »

RussellHltn:

I have spent several hours in demos and at genealogy conferences concerning nFS, and I even tried to do a few things on my own family line using it. I had only limited success because, for example, I could not even record my daughter’s marriage, presumably because of constraints on entering data about her husband, who is obviously not part of my pedigree. I think I can say with confidence that it is already a very complicated collection of rules and code – clever and perhaps even mind-boggling are the words that come to mind. I have worked on some big, complex systems, and this certainly appears to at least match any of those.

I really hope the decision is not made to try to make that one collection of code become “all things to all people,” as I keep hearing from different sources. What I hope is that this big project will be treated something like the Microsoft Office suite of programs. I think there is a very good reason why we have Word, Excel, PowerPoint, and Access as separate collections of code, but with the ability to easily move data from one to the other for different operations. Imagine trying to code, maintain, and extend all those functions in one integrated piece of code, plus maybe stir in Photoshop and Acrobat just for good measure. And then plan to integrate eBay and Amazon in version 2. At some point early in the game, logically inconsistent rules and functions start fighting with each other, and you can’t even get the results you want.

If the nFS system does three or four important things well, and is optimized to do so, that ought to be good enough. I believe it is intended to be a temple work scheduler (and, I assume, a transmitter of related data to the temples), a way of checking for duplication of names going to the temples, and a means to verify that all past work has been done correctly and thoroughly by merging all past ordinance data for the same person. That sounds like a good place to stop adding functions, bells, and whistles for one file and menu system.

The collection of rules and code to do the nationwide project as I have described it would need to be based on a very different set of assumptions and features to be optimized for that very different task. Rules that make sense in today’s nFS would probably cripple a system intended to serve a broader audience and a whole new set of data, such as the complete set of US source records in both image and text form. For example, if we have people outside the Church using this system to help create the nationwide database, as I certainly hope, would we keep them from adding in their daughter’s marriage, based on actual marriage source documents, because the husband was not in their list of accessible people? That would make no sense in that setting. There would be no reason to restrain them from entering and connecting data from ANY deceased woman’s marriage, using appropriate source documents.

I hope the choice will be made to put what is being called Record Search into a new framework which will be optimized for nationwide use, not just focused on uniquely LDS data needs, or shoe-horned into the already complex nFS database system. That would allow some exciting things to be done quickly. There would need to be no dependence at all on the current nFS data, since, for non-Church members, most linking of individuals could start with the public source records themselves, not with Church internal data.

As an important extra feature, the fully processed, mature data from the current nFS database might be transferable to the new database in some form that allows and encourages the linking of source records to that data, while leaving behind the long and intricate processing history of much of the nFS data.
jbh001
Senior Member
Posts: 856
Joined: Thu Mar 13, 2008 6:17 pm
Location: Las Vegas, NV

#9

Post by jbh001 »

huffkw wrote:I have spent several hours in demos and at genealogy conferences concerning nFS, and I even tried to do a few things on my own family line using it. I had only limited success because, for example, I could not even record my daughter’s marriage, presumably because of constraints on entering data about her husband, who is obviously not part of my pedigree.
The rules that constrained you are likely part of compliance with privacy legislation and regulations. To overcome that in any genealogy project would literally take an act of Congress. Therefore that is a poor example of the limitations of nFS.

A better example would be your attempts at working with records that privacy legislation and regulation no longer apply to. To fall outside the reach of privacy laws, the individuals you are working on need to have a birth date more than 95 years in the past if there is an available death date, or a birth date more than 110 years in the past if no death date is available.
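
That cutoff reduces to a simple check, something like this Python sketch (the 95/110-year figures are the ones above; actual privacy rules vary by jurisdiction):

```python
from datetime import date

def is_outside_privacy_window(birth_year, death_year=None, today=None):
    """95-year rule when a death date is known, 110-year rule otherwise
    (figures from the post above; real rules vary by jurisdiction)."""
    today = today or date.today()
    cutoff = 95 if death_year is not None else 110
    return (today.year - birth_year) > cutoff

print(is_outside_privacy_window(1900, death_year=1975))  # True: past the 95-year cutoff
print(is_outside_privacy_window(1930))                   # False: inside the 110-year window
```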
jbh001
Senior Member
Posts: 856
Joined: Thu Mar 13, 2008 6:17 pm
Location: Las Vegas, NV

#10

Post by jbh001 »

huffkw wrote:For example, if we have people outside the Church using this system to help create the nationwide database, as I certainly hope, would we keep them from adding in their daughter’s marriage, based on actual marriage source documents, because the husband was not in their list of accessible people?
Creating any system capable of such error trapping would currently face almost insurmountable opposition from privacy advocates, because to them it would be too Orwellian and too . . . um . . . (see Revelation 13:16-17). The potential for abuse of such a database would be immense and would make the current magnitude of identity-theft problems look trivial.

I regret that our current society is too paranoid and too scheming to allow such a database as you envision, regardless of how useful it would be.