The Challenges of Internationalization

Posted: Tue Jan 27, 2009 12:09 pm
by McDanielCA
The Challenges of Internationalization was originally posted on the main page of LDS Tech. It was written by Nancy Carter.

---------------------------------------------------------------------------------


Not only must the Church teach in all the languages of the earth, it must also build software for local leaders and members to use in all of those languages. Building global software that adapts to various cultures and languages adds many challenges to the work. One of the most important and challenging parts is handling members’ names in a culturally appropriate way. For example, when you receive e-mail with your name displayed incorrectly, the tendency is to discount it as spam. If your product hard-codes American name customs, which are not appropriate in other cultures, it may be discounted, laughed at, or even found offensive. In one extreme case, a mistake could have been harmful to a member. In a country with civil unrest, the member’s name had been written with characters from a different ethnic group. The membership record was sent back with a note explaining that this would identify him with the wrong ethnic group and put his life in danger.

Names and dates are two of the major issues that must be handled correctly when building global software. Externalizing strings, preventing data corruption, and making sure characters display properly are additional issues that must be overcome. While working on MLS for the last five years, I have come across these issues and others.

As mentioned, how names are handled is very important. “One size fits all” programming is not the answer since names vary greatly between cultures. For example, in Spanish and Portuguese cultures, individuals inherit two family names, one from their father and one from their mother. In places like India, Indonesia, and Ethiopia, some ethnic groups do not even have the concept of a family name and individuals may receive only a one-word name. In the Japanese culture, a pronunciation of the name must be presented along with the name itself, and all lists must be sorted by this pronunciation. In Hong Kong, a Romanized version of the name is also displayed with the Chinese name on passports and other national identification documents. The length of names varies drastically as well, with most Chinese and Korean names being only three or four characters long, while Polynesian names may be over 70 characters long.

The order in which names are displayed also varies by locale. For example, Japanese certificates in MLS display the family name first and the given name last: 岸本 伸欣. Chinese certificates use the same order, but with no space between the names: 陳浩然.
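
As a rough illustration of locale-dependent name order, here is a minimal Java sketch. The class and method names are hypothetical and are not taken from MLS. The Japanese and Chinese rules follow the certificate examples above; the given-name-first fallback for every other locale, and the handling of people who have only one name, are simplifying assumptions.

```java
import java.util.Locale;

// Hypothetical helper, not MLS code: picks a display order from the locale's language.
final class NameFormatter {
    static String certificateName(Locale locale, String familyName, String givenName) {
        // Some cultures use a single name with no family name at all.
        if (familyName == null || familyName.isEmpty()) {
            return givenName;
        }
        String language = locale.getLanguage();
        if ("zh".equals(language)) {
            // Chinese: family name first, no space between the names.
            return familyName + givenName;
        }
        if ("ja".equals(language)) {
            // Japanese: family name first, separated by a space.
            return familyName + " " + givenName;
        }
        // Assumed fallback for all other locales: given name, then family name.
        return givenName + " " + familyName;
    }
}
```

A real product would more likely drive this from per-locale configuration than from a chain of if statements, since the number of naming conventions grows quickly.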

Writing systems must also be considered when handling names. Many of the characters in Cyrillic and Latin look exactly the same. If Latin and Cyrillic characters are mixed in a name, it may not sort properly or may be difficult to find because users searching for the name will usually enter the search criteria in just one of the writing systems, not realizing that the name is actually a mix of the two. Using ICU4J (an open source library) we can detect which writing system is being entered. [MLS does not allow Cyrillic and Latin characters to be mixed.]
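
The mixed-script check can be sketched in a few lines. This example does not use ICU4J; it swaps in the JDK's own Character.UnicodeScript so that it runs without extra dependencies (ICU4J offers a comparable script lookup). The class and method names are hypothetical, and the rule shown here is only an illustration of the idea, not the validation MLS performs.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical helper, not MLS code: reports which writing systems a name uses.
final class ScriptCheck {
    static Set<Character.UnicodeScript> scriptsOf(String name) {
        Set<Character.UnicodeScript> scripts = EnumSet.noneOf(Character.UnicodeScript.class);
        name.codePoints().forEach(cp -> {
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            // Spaces, hyphens, and combining marks belong to no particular script.
            if (script != Character.UnicodeScript.COMMON
                    && script != Character.UnicodeScript.INHERITED) {
                scripts.add(script);
            }
        });
        return scripts;
    }

    static boolean mixesLatinAndCyrillic(String name) {
        Set<Character.UnicodeScript> scripts = scriptsOf(name);
        return scripts.contains(Character.UnicodeScript.LATIN)
                && scripts.contains(Character.UnicodeScript.CYRILLIC);
    }
}
```

For example, "Anna" typed entirely in Latin passes, while the same name typed with a Cyrillic capital А (U+0410) as its first letter is flagged even though the two look identical on screen.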

Different cultures order dates differently. For example, 06/07/2008 may seem to read clearly as June 7, 2008, but in some countries it would be read as July 6, 2008. Java has standard functionality to handle dates for different cultures in different formats. However, we have the requirement to store partial dates to cover situations where people may know only their birth year and not the month or day. Date types in Java and other technologies usually do not allow for partial dates, which is why we built our own. We store dates as YYYYMMDD and format them for display according to the locale and the type of date needed. For US English, an abbreviated date would display as 25 Nov 2008 and a certificate date as 25 November 2008. For Chinese, an abbreviated date would display as 2008-11-25 and a certificate date as 2008年 11月 25日.
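
A partial date type along these lines might look like the following Java sketch. It is not the MLS implementation. The class name is hypothetical, and two things are assumed: that an unknown month or day is stored as 00, and that a single day-month-year pattern is good enough for the demonstration. A real product would need a pattern per locale and per date type (the Chinese formats above, for instance, put the year first).

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical partial date: the month and day may be unknown (assumed stored as 00).
final class PartialDate {
    private final int year;
    private final int month; // 0 means unknown
    private final int day;   // 0 means unknown

    private PartialDate(int year, int month, int day) {
        this.year = year;
        this.month = month;
        this.day = day;
    }

    static PartialDate parse(String yyyymmdd) {
        if (yyyymmdd == null || !yyyymmdd.matches("\\d{8}")) {
            throw new IllegalArgumentException("Expected YYYYMMDD: " + yyyymmdd);
        }
        int year = Integer.parseInt(yyyymmdd.substring(0, 4));
        int month = Integer.parseInt(yyyymmdd.substring(4, 6));
        int day = Integer.parseInt(yyyymmdd.substring(6, 8));
        if (month == 0 && day != 0) {
            throw new IllegalArgumentException("A day without a month is not supported");
        }
        return new PartialDate(year, month, day);
    }

    // longForm = true gives the full month name, false the abbreviated one.
    String format(Locale locale, boolean longForm) {
        String monthPattern = longForm ? "MMMM" : "MMM";
        if (month == 0) {
            return String.format(Locale.ROOT, "%04d", year);
        }
        if (day == 0) {
            return YearMonth.of(year, month)
                    .format(DateTimeFormatter.ofPattern(monthPattern + " uuuu", locale));
        }
        return LocalDate.of(year, month, day)
                .format(DateTimeFormatter.ofPattern("d " + monthPattern + " uuuu", locale));
    }
}
```

With Locale.US, PartialDate.parse("20081125") formats as 25 Nov 2008 in the abbreviated form and 25 November 2008 in the long form, while PartialDate.parse("20080000") formats as just 2008.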

A basic issue is that developers need to externalize messages and other strings shown to the user, instead of hard-coding them, so that they can be translated. There are products that help with this, such as the Externalize Strings Wizard in Eclipse or Shikari, an open source tool hosted on SourceForge.net that can be used in Eclipse or as a standalone tool.
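
To show what externalized strings look like in code, here is a small Java sketch. In a real project the text would live in per-language .properties files; to keep the example self-contained, two in-memory strings stand in for those files. The class name, the key "greeting", and the messages themselves are all made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.text.MessageFormat;
import java.util.Locale;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

// Hypothetical example: UI text is looked up by key instead of being hard-coded.
final class Messages {
    // Stand-ins for files such as Messages_en.properties and Messages_es.properties.
    private static final String ENGLISH = "greeting=Hello, {0}\n";
    private static final String SPANISH = "greeting=Hola, {0}\n";

    static ResourceBundle bundleFor(Locale locale) {
        String text = "es".equals(locale.getLanguage()) ? SPANISH : ENGLISH;
        try {
            return new PropertyResourceBundle(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException("Could not read bundle", e);
        }
    }

    static String greeting(Locale locale, String name) {
        // The translator controls the whole sentence, including where the name goes.
        String pattern = bundleFor(locale).getString("greeting");
        return new MessageFormat(pattern, locale).format(new Object[] { name });
    }
}
```

Keeping the placeholder inside the translated pattern, rather than concatenating "Hello, " and the name in code, lets each language place the name wherever its grammar requires.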

When I first started working on the translation issues for MLS, I found that some of the translated strings had data corruption that looked like “여러분ì�€ 마갔”. This can be reproduced by opening an XML file with non-ANSI characters in WordPad 5.1. The corruption occurs because WordPad reads the file as ANSI. A good text editor or IDE (integrated development environment) that uses UTF-8 will not cause these problems. I use Eclipse with the workspace text file encoding set to UTF-8.
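
The mechanism behind this kind of corruption can be reproduced in a few lines of Java. The sketch below assumes that the file is encoded as UTF-8 and that "ANSI" means the Western European Windows code page (windows-1252); the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demonstration of how text gets garbled by the wrong decoder.
final class Mojibake {
    // Assumption: "ANSI" here is the Western European Windows code page.
    private static final Charset ANSI = Charset.forName("windows-1252");

    /** Simulates the bug: bytes written as UTF-8 are decoded as if they were ANSI. */
    static String misread(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, ANSI);
    }

    /** Decodes the same bytes with the encoding they were written in. */
    static String readCorrectly(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }
}
```

A single é, which takes two bytes in UTF-8, comes back from misread as the two characters Ã©. Each Korean syllable takes three bytes, so it turns into three unrelated characters, which is why the damage is so visible in translated strings.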

International Testing Basics provides more information on how to test for data corruption and some of the terminology.

There is also the issue of characters showing up as boxes. This is solved by installing the fonts needed to display the Unicode characters. On Windows XP, open Regional and Language Options and, on the Languages tab, check both boxes under Supplemental language support; then reboot the machine.

These are just some of the issues that must be considered when creating an internationalized product. Other issues like addresses and sorting must also be handled to have a product work as it should in other languages. If you would like to learn more, there are many resources available on the Internet; for instance, this excellent tutorial given by Addison Phillips: http://www.inter-locale.com/whitepaper/IUC-Intro-to-I18N-Tutorial.pdf


Nancy Carter is a software engineer for the Church.

Posted: Wed Jan 28, 2009 8:03 am
by WelchTC
This article just gives you a glimpse of how hard it can be to internationalize a program. Translation has similar challenges. As I'm involved with the Online Scriptures project, I've seen firsthand the challenges in translating the scriptures into other languages. What makes it even more challenging is that languages are like living beings. They change. So meanings of words and phrases change over time. Also, certain worlds or concepts don't translate well.

Tom

Posted: Sun Feb 15, 2009 11:03 am
by sterlingb
tomw wrote:certain worlds or concepts don't translate well.

That would certainly make interstellar travel difficult.

I wonder if the church's means for handling i18n (you didn't mention different calendar types, but I'm sure you handle that too) is mature enough to contribute it? We have wiki's and such here, perhaps the church would also set up a read-only repository of source code it's willing to publicly release?

Posted: Mon Feb 16, 2009 8:57 am
by WelchTC
sterlingb wrote:That would certainly make interstellar travel difficult.
ha...whoops. Live long and prosper and may the force be with you. ;)
sterlingb wrote:I wonder if the church's means for handling i18n (you didn't mention different calendar types, but I'm sure you handle that too) is mature enough to contribute it? We have wiki's and such here, perhaps the church would also set up a read-only repository of source code it's willing to publicly release?
We are moving more in this direction each day. Keep your eyes on the wiki as we announce more source code that is available and more projects we want help with.

Tom