Author

I am Joannes Vermorel, founder at Lokad. I am also an engineer from the Corps des Mines who initially graduated from the ENS.

I have been passionate about computer science, software matters and data mining for almost two decades. (RSS - ATOM)

Meta

Thursday
Mar022006

## A translator guide to website translation

Since the publication of this post, I have released Resx Editor a free visual resource editor dedicated to translation works.

In this post, I give a short introduction about website translation. The targeted audience is non-technical translators. I will focus on the particular case of website translation when relying on Microsoft XML Resource files.

The big picture

Dynamic websites include many things beside pure textual content (programming source code, images, stylesheets, ...). In order to simplify the job of the translators, all the textual content can be isolated into resource files. The main idea behind resource files is to replace every textual item of the website by a resource idenfier. Intuitively, instead of having a webpage containing the text Hello World!, you have reference HelloWorld and multiple resource files. The English resource file contains HelloWorld="Hello World!", the French resource file contains HelloWorld="Bonjour tout le monde!", etc. By choosing the right resource file, the website appears in the corresponding language.

Basic concepts

• identifier: a unique key associated to a textual item.

• (localized) resource: the expression (the content) of a textual item expressed in a particular language.

• (localized) resource file: a file containing a list of pairs identifier+resource.

Microsoft XML Resource Files

It exists many resource file formats, but I going to discuss the Microsoft XML Resource file format (RESX in short). This resource file is a XML format. Without digging into XML standard, it simply means that the content of the file look like

<?xml version="1.0" encoding="utf-8"?><root><data name="HelloWord" >    <value>Hello World!</value>  </data></root>

As you can see, the identifier is specified through a XML attribute (that's the terminology for the syntax somekey="MyKeyHere"). The resource is specified with a <value>My resource here</value>. Resource files are much more structured than classical, human readeable documents. Indeed, the webserver needs to be able to perform an exact matching between identifiers and the associated resources. Therefore, as a translator, you will have to be very careful when editing a resource file. You should not touch the XML markup, otherwise the resource file won't be readeable any more by the webserver. The only section that you can modify is what lies between the <value /> tags.

A more complete sample of RESX file:

<?xml version="1.0" encoding="utf-8"?><root><data name="HelloWord" >    <value>Hello World!</value>  </data></root><data name="GoodBye" >    <value>Goodbye!</value>  </data></root><data name="Thanks" >    <value>Thank you very much for reading this post!</value>  </data></root>

A bit of help from the web designers

Translating a website usually involves translating many small keywords like to, at, by, new, view. Those short English words are quite ambiguous. In order to simplify the translator life, a good website designer will include some additional indications within the resource file to facilitate the translation work. For this purpose, the RESX format includes an optional <comment /> tag. The previous XML sample can be modified in order to include a comment.

<?xml version="1.0" encoding="utf-8"?><root><data name="HelloWord" >    <value>Hello World!</value>  </data>  <comment>Don't forget to include the punctuation.</comment></root>

Do not translate those comments, you will be wasting your time. Those comments have just been included to make your life easier. Those comments are totally ignored by the webserver, their content will never appear on the website.

A bit of help from Notepad++

XML files are just plain text files (as opposed to rich text files such as Microsoft Word), yet due to the very sensitive nature of the XML markup (deleting a single > breaks the XML structure), you should better rely on dedicated tools to edit/modify RESX files. My personal suggestion is to use Notepad++, a very robust text editor that can handle XML files. Notepad++ is open source (you can download it and use it for free, even for commercial purposes).

Tip: Notepad++ does not immediately recognize RESX files as XML files. When you open a RESX file with Notepad++ go to Language→XML to select XML as the file language. You will benefit of a much cleaner view of the RESX file.

Top translation mistakes

Website translation is a job of precision. I am listing below a few probable errors that the unaware website translator might commit.

• Spacing: "bonjour" is not the same as " bonjour" (notice the initial space).

• Capitalization: "Delete" is not the same as "delete".

• Punctuation: "Terminated." is not the same as "Terminated" (dummy parenthesis to keep the dot away).

• HTML markup (caution, tricky): RESX file can contain HTML markup, but the symbols < and > are going to be encoded. The sign '<' (resp. '>') with appear encoded as '<' (resp. '>'). Do not touch the encoded HTML markup.

• Weird symbols (tricky again): typically if you encounter something like Dear M. {0} the {0} is a substitute, (in present case, it's certainly a substitute for a user name). Do not touch any substitute.
Page 1 2