Indexing idea: handwriting samples per writer

#1 Post by rmrichesjr » Fri Nov 30, 2007 10:49 am

After indexing a few census pages (5 pages from 1900), I had an idea. When dealing with difficult handwriting, it would be helpful to have structured samples of how a given enumerator writes each letter (lowercase and uppercase) and each digit.

On most of the few pages I have indexed, I have had to spend considerable time searching around the page to find examples of how the enumerator writes several different letters to try to decide which letter the enumerator intended by a particular strange-looking squiggle. In many cases, finding a particular example has been extremely important in correctly interpreting the handwriting.

I would propose a side/branch effort to capture a small number of samples of each letter and digit for each enumerator and make them available to indexers and perhaps arbitrators. I envision that a volunteer would go through a few, or perhaps several, pages by the same enumerator. Except perhaps for the few letters that are rarely used, this volunteer would capture a few examples of each letter and digit. These samples would be stored such that indexing volunteers could use the enumerator's name, the state, the year, and perhaps the county to pull up the samples. For cases where multiple enumerators in the same state, county, and year had the same name, the images of the enumerator's name could be used to select the right enumerator. Perhaps the creation of samples for a given enumerator could be triggered by an indexing volunteer's request. Or, perhaps the indexing volunteer could capture the samples himself/herself after finding particularly difficult handwriting.

I believe this would yield significant improvements in accuracy and in the efficient use of indexing volunteers' time, which would increase the total amount of indexing that can be completed. I believe the amount of time saved would be far greater, perhaps several times greater, than the amount of time spent capturing the samples.

Comments?

Good Idea

#2 Post by The_Earl » Fri Nov 30, 2007 12:15 pm

I had the same thoughts for Ellis Island records. You can be pretty sure that 'blacksmilh' is not an occupation, and that likely the L you put in earlier is in fact a T. I often would go back to earlier records and revise them, or fill them in when I finally figured out what a letter was supposed to be.

After a while of doing the Ellis Island records, you get a feel for the recorders in your set, and our more experienced indexers could tell at a glance what took newer people a while to figure out.

I never came up with a good solution for passing that information along. Poor access to the document scans and a lack of good tools on the indexing machines made it difficult. I never got around to deciding how to publish the info. I didn't think that a printed doc would have the resolution to show the nuance in some of the characters, and an intranet site or the like would require a network and server, a much more involved project than the lab we were working in could take on. An internet site was out of the question, as internet access was strictly and explicitly forbidden.

A user tool that would let you store and tag samples of handwriting was what I ended up thinking would be most useful, but I could not find a simple way of capturing sections of the docs without resorting to Paint and screenshots. The machines we used were simply not powerful enough to make such a process fluid.

This was a lab at the UofU institute, so some of the solutions would not work for individuals indexing at home or at a FHC. On the other hand, home users probably have more powerful machines, better tools and internet access.

So, problem number one:
How do you get handwriting examples out of the indexing app?

Problem two:
Do you build a tool for a single user, or a published source for many to view?

A single-user tool exacerbates the problems of getting samples out of the app, and requires manual steps to elevate that info to general use.

A published source could be created once by a knowledgeable group, but then you have to publish and distribute the info. As records and recorders change, this process needs to be repeated.

Thanks
Barrie

#3 Post by rmrichesjr » Fri Nov 30, 2007 12:47 pm

The Earl wrote:...

So, problem number one:
How do you get handwriting examples out of the indexing app?

Problem two:
Do you build a tool for a single user, or a published source for many to view?

...


For problem one, I was thinking of an enhancement to the indexing app. While capturing the examples, using an enhanced indexing app or a different program, the user would select a small region from the scan, large enough to hold the letter in question and a few pixels around it, then click on which letter and case it should be stored as. It should certainly be possible from inside the app to copy and paste small pieces of the image.

For problem two, I was thinking of a general tool usable by all indexers. The handwriting samples would be captured and stored once for each enumerator or other hand-writer, because different people write things quite differently. The samples could be drawn from any combination of the pages the enumerator wrote.

To use the samples, the indexer would enter the year, state, perhaps county, and enumerator name and select the enumerator. This would download the samples for that enumerator.

As one example, on one of the pages I did, the enumerator made 'M' and 'W' nearly identically. In the marital status column, it was tough to tell whether it should be married or widowed. Samples could be taken from the gender column, where it was either 'F' or 'M', and from other columns where "Wisconsin" was massively more likely than "Misconsin". With the two 'M' and 'W' samples available, the indexer could better and more quickly judge what was in the marital status column. Capturing the samples once per enumerator would save time on each and every page that enumerator wrote.
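
To make the storage and lookup idea a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the names EnumeratorKey, Sample, and SampleStore are hypothetical, the county, enumerator name, page id, and pixel boxes are made up, and nothing here reflects how the real indexing app works. Each sample is only a reference to a small rectangle on a scanned page, filed under the letter (case-sensitive) or digit it shows.

```python
import string
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical types; none of this is part of the real indexing app.
Box = Tuple[int, int, int, int]  # (left, top, width, height) in scan pixels
ALLOWED = string.ascii_letters + string.digits


@dataclass(frozen=True)
class EnumeratorKey:
    """Fields an indexer would enter to find one enumerator."""
    year: int
    state: str
    county: str
    name: str


@dataclass(frozen=True)
class Sample:
    page_id: str  # identifier of the scanned page the crop came from
    box: Box      # region holding the character plus a few pixels of margin


@dataclass
class SampleStore:
    _samples: Dict[EnumeratorKey, Dict[str, List[Sample]]] = field(default_factory=dict)

    def add(self, key: EnumeratorKey, char: str, sample: Sample) -> None:
        """File a crop under a single letter (case matters) or digit."""
        if len(char) != 1 or char not in ALLOWED:
            raise ValueError("char must be one ASCII letter or digit")
        self._samples.setdefault(key, {}).setdefault(char, []).append(sample)

    def lookup(self, key: EnumeratorKey, char: str) -> List[Sample]:
        """Return all stored crops of `char` for this enumerator."""
        return list(self._samples.get(key, {}).get(char, []))

    def missing(self, key: EnumeratorKey) -> List[str]:
        """Characters that still have no sample for this enumerator."""
        have = self._samples.get(key, {})
        return [c for c in ALLOWED if c not in have]


# Made-up example data for the 'M' versus 'W' case.
store = SampleStore()
key = EnumeratorKey(1900, "Wisconsin", "Example County", "J. Doe")
store.add(key, "M", Sample("page-0001", (120, 340, 18, 22)))
store.add(key, "W", Sample("page-0001", (410, 95, 20, 22)))
m_samples = store.lookup(key, "M")
still_needed = store.missing(key)
```

The missing() helper is there so a volunteer could see at a glance which letters and digits have not been captured yet for a given enumerator.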

#4 Post by CJohnson-p40 » Fri Nov 30, 2007 3:29 pm

Has any sort of advanced OCR system been set up to make things a bit faster?

#5 Post by rmrichesjr » Fri Nov 30, 2007 3:37 pm

CJohnson wrote: Has any sort of advanced OCR system been set up to make things a bit faster?


If I am informed correctly, OCR of clean scans of clear typed material is still problematic, with error rates still pretty high. For cursive handwriting, especially some of the very difficult handwriting on the records, the success rate would have to be nearly zero.

#6 Post by thedqs » Sat Dec 01, 2007 11:28 pm

OCR deals with printed characters, but there are some handwriting recognition programs in progress which at some future date might be helpful in this type of work.
- David

#7 Post by atticusewig » Mon Dec 10, 2007 3:05 pm

One alternative would be to have users of any
password-protected LDS site do the OCR for
you using CAPTCHA. You know how some
websites say "type the letters in this box
to gain access to this website"? That is CAPTCHA.
Now if we were to use handwriting samples
to fill the box, and then follow a majority-rules
principle, we could have members doing the OCR
for us.

For example,

The cursive word Huggins is presented in
a box to all the people trying to log in to
their ward websites at, say, 8:34 pm.
If there are 1,000 people and 970 of
them type in Huggins, most likely that
is the right answer.
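
As a rough sketch of the majority-rules step in Python (the function
name consensus and the 90% cutoff are assumptions for illustration,
not anything that exists on an LDS site):

```python
from collections import Counter
from typing import Iterable, Optional


def consensus(answers: Iterable[str], min_share: float = 0.9) -> Optional[str]:
    """Return the most common answer if enough people agree, else None.

    Answers are compared case-insensitively, so the result comes back
    in lowercase. `min_share` is an assumed cutoff, not a tested value.
    """
    normalized = [a.strip().casefold() for a in answers if a.strip()]
    if not normalized:
        return None
    word, count = Counter(normalized).most_common(1)[0]
    return word if count / len(normalized) >= min_share else None


# The example above: 1,000 logins, 970 of them typing "Huggins".
typed = ["Huggins"] * 970 + ["Higgins"] * 30
agreed = consensus(typed)
```

With these numbers the share is 970/1000 = 0.97, which clears the
assumed 0.9 cutoff, so the call returns "huggins". A closer split
returns None and the word would have to go to a human indexer.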

For a little inconvenience, we add an
extra layer of security, and are able to
do Human-assisted OCR at the same time.

Setting up the system probably won't be
trivial though.

- Atticus

#8 Post by mkmurray » Mon Dec 10, 2007 4:45 pm

Cool idea. Yet how does the CAPTCHA know you entered the correct answer? Usually there is a known correct answer to the CAPTCHA in order to prevent automated bots from being able to log in and/or register.

#9 Post by russellhltn » Mon Dec 10, 2007 5:10 pm

Not only would you not know if the answer was correct, but the person doing the entry also isn't as experienced as the indexers who work on a whole page at a time.

#10 Post by thedqs » Mon Dec 10, 2007 6:22 pm

I have to agree with Russell, although it would give you an idea of what people assume the name to be, and then you could give the list of names to the indexer to see what they think.
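
One possible way to build that list, sketched in Python with a hypothetical shortlist function and made-up guesses: rather than accepting the top answer outright, keep the few most common readings along with the share of people who typed each, and show those to the indexer.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def shortlist(answers: Iterable[str], top_n: int = 3) -> List[Tuple[str, float]]:
    """Return up to `top_n` distinct readings with the share of votes for each."""
    counts = Counter(a.strip() for a in answers if a.strip())
    total = sum(counts.values())
    # most_common() returns an empty list when there are no votes,
    # so the division below is never reached with total == 0.
    return [(name, count / total) for name, count in counts.most_common(top_n)]


# Made-up guesses for one cursive surname.
guesses = ["Huggins"] * 7 + ["Higgins"] * 2 + ["Huggens"]
candidates = shortlist(guesses)
```

Here candidates holds Huggins at 0.7, Higgins at 0.2, and Huggens at 0.1, so the indexer sees the crowd's guesses as hints and still makes the final call.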
- David
