Nihonngo wo hanasemasu ka?

I saw this link in a post on Scoble’s blog today, and it made me think of all the work the MacBU does for its international software releases. Shall I tell you about it? (Too bad if you said “No!”)

In an earlier post, I told you how a large part of my day-to-day job over the past year has been driving our transition to Xcode. That’s not all I do, however. I’ve been very privileged to have been the US-based point-of-contact in MacBU engineering for our localization work ever since we started work on Office X. I work to coordinate the efforts of our localization teams in Dublin, Ireland and Chofu, Japan.

Internally, ‘localization’ refers to the whole process by which we take a piece of English software and create the Japanese or whatever version. This includes everything from the actual translation of the text elements to tweaking the layout of dialog box controls to renaming certain files on the disk images to actually assembling the disk images for the new languages. Additionally, localization involves ensuring that we don’t use any politically or socially incorrect terminology in any of the product languages, including English. We currently localize Office into Japanese, French, German, Spanish, Swedish, Italian, and Dutch.

My role is basically to provide developer expertise in this area to the teams in Ireland and Japan, as well as to assist our Redmond and Silicon Valley-based developers in making their feature designs localizable. Features are localizable when they don’t hardcode strings into the source (they use resources such as .strings files instead), don’t hardcode string substitutions (since the order of subjects, objects, and verbs changes from language to language), allow dialog and control layouts to shift (German text tends to be up to 1/3 longer than English, so dialogs have to be able to reflow to a larger size), etc.
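To make the string-substitution point concrete, here is a minimal sketch in Python (not the code Office uses). The `STRINGS` table, the `copy_status` key, and the `localized` helper are all hypothetical stand-ins for per-language resource files, and the Japanese text is only an illustrative translation. The idea is that named placeholders let each translation put the substituted values wherever its grammar needs them.

```python
# Hypothetical per-language string tables, standing in for resource files
# such as .strings files. The keys and translations are illustrative only.
STRINGS = {
    "en": {"copy_status": "Copied {count} files to {folder}"},
    "ja": {"copy_status": "{folder} に {count} 個のファイルをコピーしました"},
}


def localized(lang, key, **values):
    """Look up a template by key and fill in its named placeholders."""
    template = STRINGS[lang][key]
    return template.format(**values)


# A hardcoded substitution bakes English word order into the source:
#     "Copied " + str(count) + " files to " + folder
# With named placeholders the translator controls the order instead.
english = localized("en", "copy_status", count=3, folder="Docs")
japanese = localized("ja", "copy_status", count=3, folder="Docs")
```

In the Japanese template the folder comes before the count, which is something the concatenated version could never express without a code change.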

I don’t do any of the actual translations, however. I speak enough Japanese to wander around Tokyo and look for Studio Ghibli goods (I found a cel from the opening credits of Tonari no Totoro for my son’s room the last time I went to Tokyo), but not to do any official business work. I’ve forgotten almost all of my high-school French, and while my last name is German, I can’t even count to 10 in that language…

Anyway, back to localization. ‘Loc’ is a very important thing to get right. Look at this laundry list from Nicole Simon’s post that I linked to above:

There are not many tablet users in Germany, but many may be interested in using one. I could be a blogging about how happy the tablet makes me, but I have to admit, I cannot. Because it does not let me do the stuff I want to do – and therefor I do not expose my peers to it.

I will not blog much about Flickr, because it uses English and all the help and exposure I could give would be irrelevant because I would need to do translation as well.

I will not blog much about Google Spreadsheet, because the first thing people will notice is that you cannot even start to use it for German calculation.

I will not blog much about Office 2007, because it is unusable for me and I also assume that I will not be able to even install it without bigger problems on my machine.

If you look at my examples they may sound non relevant, but they are little step stones toward a bigger picture: We do get used to ignoring you over here because you don’t care about us. And while we are busy conducting our lives, you bring one firework after another – just on the other side of the earth ball.

Politics aside, we can’t afford (literally) to mess up the localized versions. We’ve got to get all these little things right — and I don’t think her list is so little. Consider Excel. We’ve got scads of code to do things like Japanese Imperial date formats, commas instead of periods for European decimals, translated function names for cell formulas, etc. And, all that data that gets saved into the file has to look right when opened in, say, English Excel. So we have to be able to map from SUM to SUMME and back for German Excel, for example.
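As a rough illustration of the SUM/SUMME mapping, the Python sketch below translates function names in a formula in both directions. It is only a toy: the `EN_TO_DE` table holds three entries, the `translate_functions` helper is hypothetical, and a real implementation would also have to deal with string literals, argument separators, and decimal separators, all of which this ignores.

```python
import re

# A tiny, illustrative subset of English-to-German function names.
EN_TO_DE = {"SUM": "SUMME", "AVERAGE": "MITTELWERT", "IF": "WENN"}
DE_TO_EN = {de: en for en, de in EN_TO_DE.items()}

# An identifier counts as a function name only when "(" follows it,
# so cell references such as A1 are left alone.
_FUNCTION_NAME = re.compile(r"\b([A-Z][A-Z0-9.]*)(?=\()")


def translate_functions(formula, table):
    """Replace each known function name using the given table."""
    return _FUNCTION_NAME.sub(
        lambda match: table.get(match.group(1), match.group(1)), formula
    )


german = translate_functions("=IF(SUM(A1:A3)>10,AVERAGE(B1:B3),0)", EN_TO_DE)
round_trip = translate_functions(german, DE_TO_EN)
```

Unknown names pass through unchanged, so a formula that round-trips through both tables comes back exactly as it started.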

The really tricky part of localization is that much of the work has to happen after we’ve locked down the visual aspect of the current release. It is expensive to keep translating and retranslating the text of a feature that is in flux. Yet, over the last 10 years we’ve gone from a 6-month delay between English and localized releases to a delay on the order of 2 weeks. This means that we’ve compressed the same amount of work into less time, so that the international Mac community gets their hands on the next version of Mac Office at almost the same time as English speakers do. That compression means that we don’t have much time to fix bugs in the main codebase that are only revealed by the localization process!

One of the ways we deal with that is a process called ‘pseudo-localization.’ This has nothing to do with ‘pseudo-code’; instead, it is a way of automatically forcing text into a fake translation while keeping that text mostly readable. It works by taking the normal Roman alphabet and changing each of the characters into some similar character, perhaps one with an accent, or a copyright symbol instead of a C. We also pad each string with extra text to make it wider, so that we can check for dialog mis-layout and hardcoded string insertions.

So “pseudo-localization” might become “[=== pséü?øløçålîzå†îøñ ===]”: still mostly human-readable, wider to force dialog layout, and bracketed so we can tell if a dev hardcoded string insertions. We can do this in an entirely automated fashion, and this technique lets us test perhaps 50% of Office as if it were localized, so that we can catch obvious dev mistakes right away. We thus reduce the risk of finding a bad localization bug so late in the release process that we can’t fix it safely. Pseudo-loc does sometimes generate bogus bugs (for example, if it auto-translates something into a value the code never expects, like “10” into “[=== 10 ===]” when we are only expecting a series of digits), so we can’t use it for everything.
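A pseudo-localizer along these lines can be sketched in a few lines of Python. This is not our actual tool: the `LOOKALIKES` table only covers the substitutions visible in the example above, and the `pseudo_localize` and `is_fully_wrapped` names are made up for illustration.

```python
# Lookalike substitutions taken from the example above; letters that are
# not listed here pass through unchanged.
LOOKALIKES = str.maketrans({
    "a": "å", "c": "ç", "e": "é", "i": "î",
    "n": "ñ", "o": "ø", "t": "†", "u": "ü",
})

PREFIX = "[=== "
SUFFIX = " ===]"


def pseudo_localize(text):
    """Swap in lookalike characters, then pad and bracket the result."""
    return PREFIX + text.translate(LOOKALIKES) + SUFFIX


def is_fully_wrapped(displayed):
    """True when the whole displayed string sits inside one bracket pair.

    Text outside the brackets suggests that code glued strings together
    instead of loading one complete string from resources.
    """
    return displayed.startswith(PREFIX) and displayed.endswith(SUFFIX)


good = pseudo_localize("Copied files")
bad = "Copied " + pseudo_localize("files")  # hardcoded English prefix
digits = pseudo_localize("10")  # would break code that expects only digits
```

Here `is_fully_wrapped(good)` holds while `is_fully_wrapped(bad)` does not, which is the bracket check described above. The `digits` value shows the kind of bogus bug the same wrapping can cause.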

It takes a lot of work, and I’m very proud to be a small part of our international efforts. I know we don’t get it all right, but we try our darndest to make our software relevant, useful, and usable by the international community.

17 thoughts on “Nihonngo wo hanasemasu ka?”

  1. Thanks for the explanation. I know first hand how hard it is to work in 2 or 3 different languages on a project (I live and work in Tokyo…).
    However, Mac OS X makes it quite easy (or at least for “from scratch” projects) to have one binary and several localisations. None of Microsoft’s products support this, and I find it quite frustrating, because when I share my machine with my girlfriend, she logs into a Japanese environment, and I log into an English environment. Most (if not all) programs launch correctly in the appropriate language. But if you buy the Japanese version of Office 2004, no matter on which environment you launch it, it will be all Japanese.

    Since you are moving to Xcode, and thus I assume Mach-O binaries, will you adopt the “one binary, many languages” approach, or will I be forced to buy 2 different licences of Office 200x just to have the interface in English AND Japanese ?

    oh, and by the way, もちろん、日本語話せます。 [Of course, I can speak Japanese.]

    宜しくお願いします。 [Best regards.]

  2. Thanks for this nice explanation. Localization is something really close to my heart (and I’ve localized a few apps myself so I *do* appreciate the amount of work involved).
    Like many others, I was also wondering whether the next version of Office will provide multilingual packaged apps with .lproj (Kurisu: VPC and Messenger are already packaged. When I launch the US version on my Mac, the GUI is in French).
    The fact that you have to rely on pseudo-localization to automate the process between versions of Office sounds weird to me. I would have thought that you’d have easier tools to re-use what has already been done in the past (glossaries, tables with specific resources in all the languages…).

  3. Thanks for your blog and for the postings about the backyard of MacBU. This post is very interesting for me, as I’m a user from the Czech Republic (an obscure country in central Europe using even more obscure characters :), and I’m an employee of one major Czech Mac distributor. We (users and dealers) would really like to have a localized version of Office, even without a “blessing” from MS, but every negotiation has ended with the result that too few licenses are sold here. So I hope that with the next version you will move to packages and it will be possible to make a localization even unofficially :)

    Anyway – thanks for your blog :).

  4. MaLer: that’s why I wish for a multi-lingual version of Office (with non-locked .nib :-\). That would be the open door for whoever wants to add their own localization to Office.
    Last time I looked at Messenger though, most of the resources were locked, preventing further localization (or I messed up something, which is equally possible ;-) ) :-\

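  5. Ahoj MaLer, [Hi MaLer,]

    Nevim jestli pristi Office bude v Cestine, ale doufam, ze se bude dat objednat slovnik a gramatika… [I don’t know whether the next Office will be in Czech, but I hope it will be possible to order a dictionary and grammar…]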

    As for Cortig, localization is an issue very close to my heart. And I think that the MacBU could think more internationally in two ways:
    - first, realize, like MaLer pointed out, that Office Mac will never have enough critical mass to support all the languages Office:Win (or OOo) has. Opening the door to the dictionary and grammar tools would be a first step (Aspell is a nice example of what open tools could add to Mac:Office)

    - second, and maybe most important, get more feedback on international usage. For example, the way people in (continental) European business have to toggle between languages all the time would be a great hint to the MacBU on how to make a UI better suited to them than software designed by an English-speaking dev team dealing with English customers

    - and finally, yes, there are crashers, some quite old, in Office Mac in localized resources. A famous example is Word’s French proofing tools: just enter “Tu viens avec moi ou pas ?” followed by a … this will crash Word X as well as Word 2004

    But one thing is for sure: Apple is worse than MS. There are so many MacOS tools and services simply not supported in anything but US that we should get a discount ;-)

  6. Cortig, I should have called out more clearly that we do indeed recycle translations from previous versions. That would be a heck of a lot of duplicate work if we started from scratch each time. Pseudo-loc mostly helps us verify that code changes in existing features and new code in new features are properly ‘localizable’.

    As for the questions about multi-lingual bundles in the next version of Mac Office, I decline to comment. That product is in active development. Sorry to leave you hanging…

  7. Thanks anyways, it’s not like I was expecting a yes/no answer…

    I guess it’s just wishful thinking on our part, we the international bunch.

    But in any case, the way you describe your localisation process kind of makes me think the next version of Mac:Office will be pretty much the same on that front, since you clearly state in your original post that there always is a delay between the release of the English version and the other international versions. You do not need to breach any NDA to state the obvious…

    And let’s face it, none of the Windows versions are multilingual in terms of the UI anyway, which means most of the strings are probably still hard-coded, thus it would be a pain in the neck for you guys to replace all those with .lproj resource files every time the Win:Office team hands you the code for the new versions. That also is proof enough that the next version of Mac:Office will not use the multilingual facilities of OS X : you guys are on a tight schedule, and this alone would push the next release back one more year, at the very least !

    I just hope some day you guys will consider doing it, because it’s getting pretty expensive to purchase 2 licenses of the same program *each time* just to have the interface in a different language. And sorry, but Mac:Office ain’t cheap.

    Sorry if I sound a bit harsh, it’s just that I expect a lot from a company like Microsoft. No personal attacks intended. I appreciate the insight you give us ! it is a step in the right direction.

    cheers

  8. BAM, it’s of course possible to buy a spellchecker for Office X (but not a grammar checker); our company makes it (check http://www.dtpobchod.cz). I’m afraid that it’s not yet possible to use system spellchecking services in Carbon apps (at least I’m not aware of any using them).

    BTW, there is one nice bug in Office which makes a Word document “jump” to the cursor position; it’s necessary to switch off something in preferences. It seems to be language- or script-specific.

  9. I had no idea that citing this post on my site would make it show up in the comments. Actually I’m not quite sure how it happened…

  10. It’s a feature in WordPress that you have enabled on your blog. See Options/Discussion, and then the first checkbox. “Attempt to notify any Weblogs linked to from the article (slows down posting.)”

  11. If you rely on “pseudo-localization” for testing Word with a variety of languages, it’s no surprise that an obvious bug such as the one with the non-breaking space and Postscript (Type 1) fonts does not get caught in time:

    http://www.betalogue.com/2006/06/20/erik-schwiebert-on-localized-versions-of-microsoft-office-for-mac-os-x/

    What I fail to comprehend, however, is why this bug is still not fixed TWO YEARS LATER (not in the English version of Word 2004 anyway).

    Care to comment?
