Handwriting recognition software for indexing?

spencerschumann
New Member
Posts: 2
Joined: Mon Jul 31, 2017 3:40 am

Handwriting recognition software for indexing?

Postby spencerschumann » Mon Jul 31, 2017 5:58 am

I saw a snippet of video yesterday depicting pioneers pushing handcarts over the plains and through rivers. I marveled at how far we've come technologically since then. This journey was a tremendous, difficult, even life-threatening task for the early saints, but today we can simply board a plane or hop in a car and comfortably make the same trip in a matter of hours. I liken that experience of the pioneers to the massive genealogical indexing labor undertaken by saints of our day. While it doesn't threaten our personal safety, it is nonetheless a long and difficult process, a seemingly endless journey of small steps as each record is carefully transcribed.

My personal interest in family history has been growing recently. I served a mission in Brazil nearly 20 years ago. While exploring my family tree, I was surprised to learn that my grandfather had an aunt who moved to Brazil and raised a family there. I have very little information about this family other than some names and some rough dates. I saw firsthand the need for indexing as I tried to find more information about them. Searches turned up no direct hits, but I did find a large collection of handwritten birth records that hadn't been indexed. I tried to manually find information about this family in those records, but was unsuccessful. There's just too much data for one person to filter through.

Much like the leap from handcarts to planes, handwriting recognition (HWR, https://en.wikipedia.org/wiki/Handwriting_recognition, an advanced form of OCR) is the obvious technological step to hasten the work of indexing. I'm sure I'm not the only one to have thought of this, and I'm sure it has been used already in this work (for example, there's a mention of computer-indexed records at https://tech.lds.org/forum/viewtopic.php?f=58&t=28978). But today indexing is still primarily a manual task.

Despite being able to travel in speed and comfort now, we admire our pioneer ancestors for their strength and perseverance in completing the task given to them with the tools that were available at the time. In the same way, I expect that success in automatic indexing would not reduce our appreciation for the long hours of manual indexing labor that are being donated by people today.

I submit that the time is ripe for us to diligently work toward making automatic indexing a reality.

There have been many recent advances in AI, and we're seeing it used more and more in areas such as voice recognition and in the progress toward self-driving cars. We have a wealth of tools at our disposal that didn't exist just a few years ago. I am a software engineer. Around 15 years ago, while in college, I studied AI and machine learning, but I haven't done much with those subjects since then. I've recently come across several articles about AI that have sparked new ideas in my mind. I woke during the night last night, and as I lay awake I pondered those ideas. I was then struck with the thought that they could be applied directly to the work of indexing, and that perhaps it is part of my life's mission to apply the knowledge and skills I've been given to advance the work of family history in this way.

My next thought after this realization was that this effort needs more hands than mine alone. I wondered what would be the best way to organize a community around this effort, and I soon found this forum, which is full of other like-minded individuals with a variety of skills that can be applied to this task. Who else out there would like to work on this ambitious goal?

drepouille
Senior Member
Posts: 1473
Joined: Sun Jul 01, 2007 5:06 pm
Location: Plattsmouth, NE

Re: Handwriting recognition software for indexing?

Postby drepouille » Mon Jul 31, 2017 8:13 am

Many printed obituary source documents available at FamilySearch contain the warning, "This record was indexed by a computer; there may be errors."
Dana Repouille, Plattsmouth, Nebraska

davesudweeks
Senior Member
Posts: 619
Joined: Sun May 09, 2010 8:16 pm
Location: Owasso, OK, USA

Re: Handwriting recognition software for indexing?

Postby davesudweeks » Mon Jul 31, 2017 10:09 am

drepouille wrote: Many printed obituary source documents available at FamilySearch contain the warning, "This record was indexed by a computer; there may be errors."


And there are. The software is pretty good with the letters, but in a complex document like an obituary it gets the relationships wrong more often than it gets them right. I'm OK with all that if the original is available to verify, but too many obituaries that have been indexed by computer are not available to view (or require a paid subscription). :x

sbradshaw
Senior Member
Posts: 3364
Joined: Mon Sep 26, 2011 8:42 pm
Location: Provo, UT

Re: Handwriting recognition software for indexing?

Postby sbradshaw » Mon Jul 31, 2017 12:28 pm

I know that FamilySearch is looking into and experimenting with OCR and handwriting recognition, but I have no idea where they are in the process. If we want to organize a community effort, we'd want to be sure we're working with FamilySearch – otherwise a lot of effort will be wasted retreading the work that's already been done. Unfortunately, there aren't many developers who regularly visit the LDSTech forums (I don't know if I've ever seen any FamilySearch employees here), so the best bet would be sending feedback through the FamilySearch site.
Samuel Bradshaw • If you desire to serve God, you are called to the work.

spencerschumann
New Member
Posts: 2
Joined: Mon Jul 31, 2017 3:40 am

Re: Handwriting recognition software for indexing?

Postby spencerschumann » Mon Jul 31, 2017 5:13 pm

This is my first post here, so I wasn't sure what to expect. Thanks for the quick replies! I've posted a question on the FamilySearch site at http://gsfn.us/t/50zoz at sbradshaw's suggestion. I agree that avoiding duplicate work is important.

From what I can gather, OCR for printed text is getting good, but handwriting recognition, especially for cursive, lags behind. And unfortunately most of the handwritten records I've seen are written in organic, flowing script that I myself have a hard time reading.

It's too bad that the originals are often not freely available! But some are, and the ones that have already been indexed provide a wealth of training and testing data, which can be one of the big hurdles in making machine learning work.

Even before we can get to fully automated indexing, there's a helpful baby step: let the computer do its best first, then let a human review the output and correct any errors. Also, a computer could examine completed records to look for transcription errors. If the false positive rate is low enough, even this stage could be a helpful improvement over the fully manual approach.
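To make the second idea a bit more concrete, here is a minimal Python sketch of checking completed records against a machine reading. Everything in it is an assumption for illustration: the record tuples, the `flag_for_review` name, and the 0.85 threshold are made up, and the "machine" transcriptions are just hard-coded strings standing in for whatever a real recognizer would output. It only shows the comparison step, not the recognition itself.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def flag_for_review(records, threshold=0.85):
    """Return the ids of records whose human and machine readings differ too much.

    `records` is an iterable of (record_id, human_text, machine_text) tuples.
    The threshold is a hypothetical tuning knob: raising it flags more records
    (more false positives), lowering it lets more disagreements slip through.
    """
    flagged = []
    for record_id, human_text, machine_text in records:
        if similarity(human_text, machine_text) < threshold:
            flagged.append(record_id)
    return flagged


# Hard-coded stand-ins for human-indexed values and recognizer output.
records = [
    ("r1", "Maria da Silva", "Maria da Silva"),
    ("r2", "Joao Pereira", "Jose Ferreira"),
    ("r3", "Ana Souza", "ana souza "),
]

print(flag_for_review(records))  # → ['r2']
```

Only the flagged records would go back to a person, so the false positive rate translates directly into how much extra human work this stage creates.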

sbradshaw
Senior Member
Posts: 3364
Joined: Mon Sep 26, 2011 8:42 pm
Location: Provo, UT

Re: Handwriting recognition software for indexing?

Postby sbradshaw » Mon Jul 31, 2017 8:09 pm

Commercial handwriting recognition gets better every year – though most consumer applications have the advantage of the person writing directly on a tablet or writing board, where hand movement can be tracked, and you don't have to deal with ink blots, fading text, or translucent pages with bleed-through from behind. Having a large corpus of human-verified text from similar documents would be key – FamilySearch's corpus is pretty good, but sometimes the indexed text differs from what is literally on the page (like when the indexing project instructions have the indexer expand an abbreviation or skip punctuation). I'm certain that the technology will keep getting better every year, and eventually surpass the average human at interpreting text.

Another forum where you might get some traction is the FamilySearch Yammer community – I have seen employees interact with the community there.
Samuel Bradshaw • If you desire to serve God, you are called to the work.

russellhltn
Community Administrator
Posts: 22557
Joined: Sat Jan 20, 2007 2:53 pm
Location: U.S.

Re: Handwriting recognition software for indexing?

Postby russellhltn » Mon Jul 31, 2017 8:47 pm

spencerschumann wrote: let the computer do its best first, then let a human review the output and correct any errors.

I think that's actually a step backwards. From what I've seen, it's easier to "enter what you see" than to sit and compare two fields looking for mistakes.

I'm pretty sure FS uses multiple data-entry persons, accepting matches and sending discrepancies to arbitration. The computer might be useful as an "entry" person, with its entry accepted if validated by the human entries. But it depends on the accuracy. Even if the quality is sub-human, it still might have much to add.
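A rough Python sketch of that arrangement, treating the computer as just one more "entry" person. This is only my guess at the general shape, not how FS actually does it: the `reconcile` function, the `min_agreement` parameter, and the sample values are all hypothetical.

```python
from collections import Counter


def reconcile(entries, min_agreement=2):
    """Accept a field value when enough independent entries agree on it.

    `entries` is a list of transcriptions of the same field, one per "entry
    person" (the machine's reading can simply be one of them). Comparison
    ignores case and outer whitespace. Returns ("accepted", value) when at
    least `min_agreement` entries match, otherwise ("arbitration", None).
    """
    normalized = [e.strip().lower() for e in entries if e.strip()]
    if not normalized:
        return ("arbitration", None)
    value, count = Counter(normalized).most_common(1)[0]
    if count >= min_agreement:
        return ("accepted", value)
    return ("arbitration", None)


# One human entry plus one machine entry per field (made-up values).
print(reconcile(["Maria Silva", "maria silva "]))    # the two readings agree
print(reconcile(["Joao Pereira", "Jose Ferreira"]))  # disagreement goes to a human
```

In this setup a machine reading is never accepted on its own; it only saves work when it happens to match a human entry.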
Have you searched the Wiki?
Try using a Google search by adding "site:tech.lds.org/wiki" to the search criteria.

