Author

I am Joannes Vermorel, founder at Lokad. I am also an engineer from the Corps des Mines who initially graduated from the ENS.

I have been passionate about computer science, software matters and data mining for almost two decades.

Entries in localization (8)

Wednesday, May 17, 2006

Motivations behind the "PeopleWords free invitations"

I have recently upgraded PeopleWords (an online platform for the translation business). Among various small fixes and improvements, PeopleWords now provides free invitations for the translators. If you're not familiar with the "invitation feature" of PeopleWords (it happens that some people are not), then just have a look at our white paper. In this post, I will explain the (commercial) motivations underlying this feature.

I have already explained (see my previous post) that there is a strong imbalance of risks in freelance translation jobs. The risk is much higher on the customer side than on the translator side (well, at least the "perceived" risk, because, in my experience, the risk is low anyway). Indeed, translators intuitively "feel" that there is little risk for them in multiplying their job sources (it does not really matter where the job comes from). Customers, on the contrary, seek stable and reliable translators and are quite reluctant to send their offers "into the wild".

As a direct consequence of this perception, there is a huge imbalance between the number of registered translators and the number of registered customers on PeopleWords. Basically, the number of registered translators is more than one order of magnitude greater than the number of registered customers. In my opinion, it's a really bad situation because it means that, on average, rather than relying on a dedicated platform (such as PeopleWords), customers rely on e-mail based processes to get their documents translated. Speaking as a customer, my experience is that managing freelancers by e-mail is just hell.

Did I say that to the Russian translator? Or maybe it was the e-mail for the Spanish translator? Did I not already pay the Polish guy? Or maybe that was just the previous Polish job? How many Japanese documents do I have left untranslated? What was the price initially agreed for the Chinese translation? Was it consistent with the previous translation job that was completed last week?

Therefore, I have the feeling that there are many benefits for the translators in using a platform (as opposed to e-mails), even for their own personal customers. Yet, in such a case, PeopleWords was taking a 10% fee that would have been considered unacceptable. The (free) invitations have been designed so that a translator can invite his own customers and leverage the PeopleWords platform without having to pay the regular 10% fee.

From the viewpoint of PeopleWords, why should I provide such a free service? If translators start using PeopleWords for free, how am I going to buy the coffee that I need every morning? The immediate (but wrong) answer would be: once the customer is registered on PeopleWords, he will start posting offers to see if he can get lower prices (thus leaving the translator who brought him to PeopleWords). This situation is very unlikely because of the risk imbalance mentioned above. Once the customer knows a good translator, he is not going to switch to save a few bucks, especially if it's the company's money anyway.

But a more probable situation is: once the customer is registered on PeopleWords, one day or another, he will need translations in languages that the original translator is not able to provide. In such cases, posting an offer on PeopleWords is the most straightforward option for the customer; and thanks to the 10% fee, I am able to buy some coffee the next day.

Thursday, March 9, 2006

A translator-friendly RESX file editor

A newer version of ResxEditor is now available; see my latest blog post on this matter.

In a previous post, I gave some details on the RESX format from a translator-friendly viewpoint. Actually, after testing the XML concept with a few translators, I came to the following conclusion:

The most brilliant Uzbek-Azeri translators do not speak XML. Do not seek any explanation, it's just a fact.

XML has a logic which is totally alien to the average translator. The answer to the question "Why can't I freely insert < and > characters?" is simply beyond the technical skills of the average translator. Therefore, I have decided to come up with a simpler and more elegant solution.

I have published a simple Resx Editor utility that comes as a stand-alone exe file. This application is free (yet not open-source, although I am considering the option) and will remain free.

Available features

The ResxEditor is a simple quasi-WYSIWYG editor; at least the raw XML is kept hidden from the translator's view. The features are limited to the bare minimum: Open, Save and Save As, plus a text size adjustment option.

Features not available (although they should be)

I have not included a Print feature (yet). Whether it is a must-have feature will depend on the feedback that I get. Actually, one of my objectives is to keep this editor as simple and small as possible.

If there are features that you would really want to see in Resx Editor (or bugs that you really would not), feel free to post a comment.

Thursday, March 2, 2006

A translator guide to website translation

Since the publication of this post, I have released Resx Editor, a free visual resource editor dedicated to translation work.

In this post, I give a short introduction to website translation. The target audience is non-technical translators. I will focus on the particular case of website translation when relying on Microsoft XML Resource files.

The big picture

Dynamic websites include many things besides pure textual content (programming source code, images, stylesheets, ...). In order to simplify the job of the translators, all the textual content can be isolated into resource files. The main idea behind resource files is to replace every textual item of the website with a resource identifier. Intuitively, instead of having a webpage containing the text Hello World!, you have a reference HelloWorld and multiple resource files. The English resource file contains HelloWorld="Hello World!", the French resource file contains HelloWorld="Bonjour tout le monde!", etc. By choosing the right resource file, the website appears in the corresponding language.

Basic concepts

  • identifier: a unique key associated with a textual item.

  • (localized) resource: the content of a textual item expressed in a particular language.

  • (localized) resource file: a file containing a list of identifier+resource pairs.
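
To make these three concepts concrete, here is a minimal Python sketch of the lookup a website performs. It is only an illustration: the RESOURCES table and the lookup function are hypothetical names, and the fallback to English is an assumption of this sketch, not something a real webserver necessarily does.

```python
# Hypothetical in-memory "resource files": one dictionary per language.
# Each dictionary maps an identifier to its localized resource.
RESOURCES = {
    "en": {"HelloWorld": "Hello World!"},
    "fr": {"HelloWorld": "Bonjour tout le monde!"},
}


def lookup(identifier: str, language: str, fallback: str = "en") -> str:
    """Return the resource for an identifier in the requested language.

    Falling back to English when a translation is missing is an
    assumption made for this sketch.
    """
    localized = RESOURCES.get(language, {})
    if identifier in localized:
        return localized[identifier]
    return RESOURCES[fallback][identifier]


print(lookup("HelloWorld", "en"))
print(lookup("HelloWorld", "fr"))
```

The webpage itself only ever mentions HelloWorld; choosing "en" or "fr" decides which text the visitor sees.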

Microsoft XML Resource Files

There are many resource file formats, but I am going to discuss the Microsoft XML Resource file format (RESX for short). This resource file format is an XML format. Without digging into the XML standard, it simply means that the content of the file looks like this:

<?xml version="1.0" encoding="utf-8"?>
<root>
<data name="HelloWorld" >
<value>Hello World!</value>
</data>
</root>

As you can see, the identifier is specified through an XML attribute (that's the terminology for the syntax somekey="MyKeyHere"). The resource is specified with <value>My resource here</value>. Resource files are much more structured than classical, human-readable documents. Indeed, the webserver needs to be able to perform an exact matching between identifiers and the associated resources. Therefore, as a translator, you will have to be very careful when editing a resource file. You should not touch the XML markup, otherwise the resource file won't be readable any more by the webserver. The only section that you can modify is what lies between the <value> and </value> tags.

A more complete sample of RESX file:

<?xml version="1.0" encoding="utf-8"?>
<root>
<data name="HelloWorld" >
<value>Hello World!</value>
</data>
<data name="GoodBye" >
<value>Goodbye!</value>
</data>
<data name="Thanks" >
<value>Thank you very much for reading this post!</value>
</data>
</root>
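
For the curious, here is a rough Python sketch of the "exact matching" step: it reads a RESX sample and builds the list of identifier+resource pairs. It is not how the webserver actually does it; read_resources and SAMPLE_RESX are hypothetical names, and the sample assumes a well-formed file with a single root element.

```python
import xml.etree.ElementTree as ET

# A small, well-formed RESX sample (a single <root> holding the <data> entries).
SAMPLE_RESX = """<?xml version="1.0" encoding="utf-8"?>
<root>
<data name="HelloWorld" >
<value>Hello World!</value>
</data>
<data name="GoodBye" >
<value>Goodbye!</value>
</data>
<data name="Thanks" >
<value>Thank you very much for reading this post!</value>
</data>
</root>
"""


def read_resources(resx_text: str) -> dict[str, str]:
    """Map each identifier (the name attribute) to its resource (the value text)."""
    root = ET.fromstring(resx_text.encode("utf-8"))
    return {
        data.get("name"): data.findtext("value", default="")
        for data in root.iter("data")
    }


for identifier, resource in read_resources(SAMPLE_RESX).items():
    print(f"{identifier} = {resource}")
```

If a single > were deleted from the sample, ET.fromstring would raise a ParseError instead of returning the pairs, which is exactly why the markup must stay untouched.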

A bit of help from the web designers

Translating a website usually involves translating many small keywords like to, at, by, new, view. Those short English words are quite ambiguous. In order to simplify the translator's life, a good website designer will include some additional indications within the resource file to facilitate the translation work. For this purpose, the RESX format includes an optional <comment> tag. The previous XML sample can be modified in order to include a comment.

<?xml version="1.0" encoding="utf-8"?>
<root>
<data name="HelloWorld" >
<value>Hello World!</value>
<comment>Don't forget to include the punctuation.</comment>
</data>
</root>

Do not translate those comments; you would be wasting your time. They have been included only to make your life easier. The webserver ignores them entirely, and their content will never appear on the website.
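
As an illustration, the Python sketch below lists each identifier with its resource and the designer's comment, which is roughly what a translator-friendly tool could display. The names translation_hints and COMMENTED_RESX are hypothetical, and the sketch assumes that each comment sits inside the data element it describes.

```python
import xml.etree.ElementTree as ET

# Sample RESX content; the <comment> is assumed to be nested inside <data>.
COMMENTED_RESX = b"""<?xml version="1.0" encoding="utf-8"?>
<root>
<data name="HelloWorld" >
<value>Hello World!</value>
<comment>Don't forget to include the punctuation.</comment>
</data>
</root>
"""


def translation_hints(resx_bytes: bytes) -> list[tuple[str, str, str]]:
    """Return (identifier, resource, comment) triples; the comment may be empty."""
    root = ET.fromstring(resx_bytes)
    hints = []
    for data in root.iter("data"):
        hints.append((
            data.get("name"),
            data.findtext("value", default=""),
            data.findtext("comment", default=""),
        ))
    return hints


for identifier, resource, comment in translation_hints(COMMENTED_RESX):
    print(f"{identifier}: {resource!r} (hint: {comment or 'none'})")
```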

A bit of help from Notepad++

XML files are just plain text files (as opposed to rich text files such as Microsoft Word documents), yet due to the very sensitive nature of the XML markup (deleting a single > breaks the XML structure), you had better rely on dedicated tools to edit RESX files. My personal suggestion is to use Notepad++, a very robust text editor that can handle XML files. Notepad++ is open source (you can download it and use it for free, even for commercial purposes).

Tip: Notepad++ does not immediately recognize RESX files as XML files. When you open a RESX file with Notepad++, go to Language→XML to select XML as the file language. You will benefit from a much cleaner view of the RESX file.

Top translation mistakes

Website translation is a job of precision. I am listing below a few likely errors that the unwary website translator might make.

  • Spacing: "bonjour" is not the same as " bonjour" (notice the initial space).

  • Capitalization: "Delete" is not the same as "delete".

  • Punctuation: "Terminated." is not the same as "Terminated" (dummy parenthesis to keep the dot away).

  • HTML markup (caution, tricky): a RESX file can contain HTML markup, but the symbols < and > are going to be encoded. The sign '<' (resp. '>') will appear encoded as '&lt;' (resp. '&gt;'). Do not touch the encoded HTML markup.

  • Weird symbols (tricky again): typically, if you encounter something like Dear M. {0}, the {0} is a substitute (in the present case, it's certainly a substitute for a user name). Do not touch any substitute.
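
The mistakes listed above can be partly caught by a small script. The Python sketch below is only a rough illustration under simple assumptions: check_translation is a hypothetical helper, the punctuation check only knows a handful of Western end marks, and a flagged item is a hint to re-read the translation rather than a proof of error.

```python
import re

PLACEHOLDER = re.compile(r"\{\d+\}")        # substitutes such as {0}
ENCODED_MARKUP = re.compile(r"&lt;.*?&gt;")  # encoded HTML tags
END_MARKS = {".", "!", "?", ":"}             # assumption: Western punctuation only


def _edges(text: str) -> tuple[str, str]:
    """Return the leading and trailing whitespace of a text."""
    return text[: len(text) - len(text.lstrip())], text[len(text.rstrip()):]


def _starts_upper(text: str):
    """True/False for cased first letters, None when the script has no case."""
    first = text.strip()[:1]
    if first.upper() == first.lower():
        return None
    return first.isupper()


def _end_mark(text: str) -> str:
    last = text.rstrip()[-1:]
    return last if last in END_MARKS else ""


def check_translation(source: str, translated: str) -> list[str]:
    """Compare a source resource with its translation and list suspicious differences."""
    problems = []
    if _edges(source) != _edges(translated):
        problems.append("spacing")
    src_case, dst_case = _starts_upper(source), _starts_upper(translated)
    if src_case is not None and dst_case is not None and src_case != dst_case:
        problems.append("capitalization")
    if _end_mark(source) != _end_mark(translated):
        problems.append("punctuation")
    if sorted(PLACEHOLDER.findall(source)) != sorted(PLACEHOLDER.findall(translated)):
        problems.append("placeholders")
    if sorted(ENCODED_MARKUP.findall(source)) != sorted(ENCODED_MARKUP.findall(translated)):
        problems.append("markup")
    return problems


print(check_translation("Dear M. {0}", "Cher M. {0}"))
print(check_translation("Terminated.", "Terminé"))
```

The first call reports nothing suspicious, while the second flags the missing final dot.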