Author

I am Joannes Vermorel, founder at Lokad. I am also an engineer from the Corps des Mines who initially graduated from the ENS.

I have been passionate about computer science, software matters and data mining for almost two decades.

Tuesday, Mar 21, 2006

## A few marketing tips for online freelance translators from a customer view point

Let me make this clear: I am not a translator, I have never set foot in a translation agency, and I know nothing about the translation business. But as a simple customer, I have had a large number of interactions with many freelance translators (most of this experience comes from setting up the PeopleWords website).

Good online marketing is about sending positive signals to the customers. As a freelance translator, what signals are you sending to your customers?

I am writing this small guide because I have noticed that translators, in my experience, have, on average, really poor online marketing strategies. By "online marketing strategy", I mean: what are you doing to convince a customer that you are an honest and brilliant translator? I have seen dozens of translators, often claiming years of experience with large and well-established companies, making such ridiculous mistakes in their interactions with potential customers (i.e. myself) that I think a few "marketing" tips might not be totally unnecessary.

Frankly, for a $100 online translation, I am never going to read your resume. Consider that for a $100 job, I receive a dozen resumes. Do you really think that a typical customer is going to read 20 pages (or more) of resumes for a $100 job? Additionally, there are so many resumes freely available on the web, so what kind of proof is that? What tells me that you did not just grab a random resume on the web and put your name on it? For online translation jobs, the usefulness of resumes is close to zero.

What is your real name, your address, etc.? Most online translators seem very reluctant to disclose anything. Do not expect the customer to ask you for such information; you have to disclose everything first. Just consider the customer's position: if you had to choose between 1) a "real" person with a "real" name and a "real" address and 2) a fly-by-night anonymous login, whom would you choose? By the way, what are the risks of disclosing such information anyway? If you are afraid of being visible on the web, give up now all hope of becoming a successful online translator. Also avoid e-mail addresses such as john.smith@hotmail.com, john.smith@yahoo.com and john.smith@gmail.com. Those e-mail providers are widely known to be totally anonymous. You need a trustworthy e-mail address (see the next point).

Not having your own website (or blog)

A website or a blog (with some content in it) is a really strong signal for the customer. It means that you have a persistent online existence. Persistence means that you did not appear last week and, consequently, that you will most probably still exist next week. The number one quality of a freelance translator's homepage is not a shiny design (who cares if it's just plain text) but bilingual content. Your page must be available in at least two languages. What better proof that you're not a soon-to-vanish crook? Setting up a homepage requires only a few hours of work. Yet, my guess would be that more than 95% of freelance translators do not have a personal homepage.

Poorly written communications

As a French customer, I can't judge whether you're writing good Chinese or not. I have no way to check your Chinese writing skills. Therefore, I will judge your skills based on what you write to me. If your communications are constantly full of spelling mistakes, how can I trust you not to make just as many spelling mistakes in the translated documents? My experience is that more than half of translators do not pay any attention to the spelling mistakes in their communications. Spelling mistakes are a strong negative signal for the customer.

Unfocused job application

This point is connected to the resume discussion above. A customer posting a translation job online is likely to get at least a dozen competing translation offers. Therefore, your answer must be sharp and focused. Do not cut-and-paste a 10-line presentation of yourself; it's almost as pointless as sending your resume. In your answer, you must prove to the customer that you have some understanding of his context and that your experience matches his documents. In the customer's mind, such an answer sends a highly positive signal that you've already started to work on his case (which is not totally untrue).

As a final note, remember that the customer's choices are more a matter of trust than a matter of price.

Thursday, Mar 9, 2006

## A translator-friendly RESX file editor

A newer version of ResxEditor is now available, see my latest blog post on this matter.

In a previous post, I gave some details on the RESX format from a translator-friendly viewpoint. Actually, after proof-testing the XML concept with a few translators, I came to the following conclusion:

The most brilliant Uzbek-Azeri translators do not speak XML. Do not seek any explanation, it's just a fact.

XML has a logic which is totally alien to the average translator. The answer to the question "Why can't I freely insert < and > characters?" simply does not match the average translator's skills. Therefore, I have decided to come up with a simpler and more elegant solution.

I have published a simple Resx Editor utility that comes as a stand-alone exe file. This application is free (yet not open-source, although I am considering the option) and will remain free.

Available features

The ResxEditor is a simple quasi-WYSIWYG editor; at least the raw XML is kept hidden from the translator's view. The features are limited to the bare minimum: Open, Save and Save As, plus a text size adjustment option.

Features not available (although they should be)

I have not included a Print feature (yet). Whether it is a must-have feature will depend on the feedback that I get. Actually, one of my objectives is to keep this editor as simple and small as possible.

If there are features that you would really want to see in Resx Editor (or bugs that you really would not), feel free to post a comment.

Monday, Mar 6, 2006

## Best practice for website design, sandboxing with ASP.Net

Why should I care?

The web makes application deployment easier, but there is no magic web effect that would prevent web designers from committing the very same mistakes that regular developers commit while designing classical applications. In order to minimize the risks, I have found website sandboxing to be a must-have for web designers.

What is sandboxing?

A sandbox is a place full of sand where children cannot cause any harm even if they intend to.

Replace children with visitors or testers and you have a pretty accurate description of what website sandboxing is. More technically, a website sandbox is simply a copy of your web application. A sandbox differs from a mirror from a data management viewpoint. The sandbox databases are just dummy replicas of the running databases. The sandbox holds absolutely no sensitive data. Moreover, the sandbox databases might simply be cleared at the end of each release cycle.

What are the benefits of sandboxing?

The first obvious advantage is that you can have shorter release cycles (ten times a day if you really need that) to actually test your website in realistic conditions. If you're not convinced, just look at how full the ASP.Net forums are of messages with titles like "Everything was fine until I published my website!"

The second, maybe less obvious, advantage is that you (and your testers) have no restrictions in your testing operations. Just go and try to corrupt the sandbox data by exploiting some potential bug; what do you risk? Not much. Check what happens if you add some wrongly encoded data to your website's new publishing system, and so on. The sandbox lets you perform all the required testing operations without interfering with your visitors on the main website.

The third, certainly obscure, advantage is that if you do not have a sandbox, other people will use your primary website as a sandbox. This is especially true if you are exposing any kind of web interface (GET, POST, SOAP, XML-RPC, whatever) because people will use your website to debug their own code.

Connecting all sandboxes together

Some webmasters might hesitate to make their sandbox accessible worldwide. Personally, unless you have a very good reason not to, I would strongly advise doing so (see the third advantage above). What do you have to lose? Exposing your bugs? That's precisely the purpose of the sandbox anyway. Moreover, many professional websites already have their own public sandboxes.

For example, PeopleWords.com (online translation services) has links toward PayPal.com whereas sandbox.peoplewords.com relies on sandbox.paypal.com.

You can design your website in such a manner that your sandbox hyperlinks to other sandboxes. Also, the notion of sandboxing is not restricted to web pages, but includes web services too.

ASP tips

• The only difference between your real website and your sandbox should be the content of the web.config file. If your website and sandbox differ by more than their configuration files, you should consider refactoring your website, because it means that your deployment relies on error-prone operations.

• Duplicate your website logo into mylogo.png and mylogo-sandbox.png and include a LogoPath key in your web.config file to reference the image. The mylogo-sandbox.png image must include a very visible sandbox statement. By using distinct logos, you avoid possible confusion between the sandbox and the website later on.

• By convention, the sandbox is usually located at sandbox.mydomain.com or www.sandbox.mydomain.com.

• Do not forget to replicate the databases (but without including the actual content). You should not rely on the primary website database.
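
The first two tips can be checked mechanically. The sketch below is in Python rather than ASP.Net, simply to keep it short; it assumes the settings live under the standard appSettings section of web.config. Only LogoPath comes from the tips above: the PaymentHost key and the file contents are hypothetical, chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical web.config contents; PaymentHost is an illustrative key.
PRODUCTION = b"""<configuration>
  <appSettings>
    <add key="LogoPath" value="~/images/mylogo.png" />
    <add key="PaymentHost" value="paypal.com" />
  </appSettings>
</configuration>"""

SANDBOX = b"""<configuration>
  <appSettings>
    <add key="LogoPath" value="~/images/mylogo-sandbox.png" />
    <add key="PaymentHost" value="sandbox.paypal.com" />
  </appSettings>
</configuration>"""


def app_settings(web_config: bytes) -> dict:
    """Return the key/value pairs found under <appSettings>."""
    root = ET.fromstring(web_config)
    return {add.get("key"): add.get("value")
            for add in root.findall("./appSettings/add")}


def missing_keys(first: dict, second: dict) -> set:
    """Keys defined in only one of the two configurations."""
    return set(first) ^ set(second)


def differing_keys(first: dict, second: dict) -> list:
    """Keys defined in both configurations but with different values."""
    return sorted(k for k in set(first) & set(second) if first[k] != second[k])


production = app_settings(PRODUCTION)
sandbox = app_settings(SANDBOX)
print(missing_keys(production, sandbox))
print(differing_keys(production, sandbox))
```

An empty set from missing_keys suggests that the two deployments differ only by configuration values, which is the situation the first tip asks for.
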

Thursday, Mar 2, 2006

## A translator guide to website translation

Since the publication of this post, I have released Resx Editor, a free visual resource editor dedicated to translation work.

In this post, I give a short introduction about website translation. The targeted audience is non-technical translators. I will focus on the particular case of website translation when relying on Microsoft XML Resource files.

The big picture

Dynamic websites include many things besides pure textual content (programming source code, images, stylesheets, ...). In order to simplify the job of the translators, all the textual content can be isolated into resource files. The main idea behind resource files is to replace every textual item of the website by a resource identifier. Intuitively, instead of having a webpage containing the text Hello World!, you have a reference HelloWorld and multiple resource files. The English resource file contains HelloWorld="Hello World!", the French resource file contains HelloWorld="Bonjour tout le monde!", etc. By choosing the right resource file, the website appears in the corresponding language.

Basic concepts

• identifier: a unique key associated to a textual item.

• (localized) resource: the expression (the content) of a textual item expressed in a particular language.

• (localized) resource file: a file containing a list of pairs identifier+resource.
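
For readers who are comfortable with a bit of code, the lookup can be sketched as follows. This is an illustration in Python, not what the webserver actually runs; the localize function, the RESOURCE_FILES table and the fallback to English are all assumptions made for the example.

```python
# Hypothetical in-memory resource files: language code -> {identifier: resource}.
RESOURCE_FILES = {
    "en": {"HelloWorld": "Hello World!"},
    "fr": {"HelloWorld": "Bonjour tout le monde!"},
}


def localize(identifier: str, language: str, fallback: str = "en") -> str:
    """Return the resource for an identifier in the requested language.

    Falling back to another language, and then to the identifier itself,
    is an assumption of this sketch.
    """
    resources = RESOURCE_FILES.get(language, {})
    if identifier in resources:
        return resources[identifier]
    return RESOURCE_FILES[fallback].get(identifier, identifier)


print(localize("HelloWorld", "fr"))
```

The webpage only ever mentions HelloWorld; which text the visitor sees depends solely on the resource file that gets selected.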

Microsoft XML Resource Files

There are many resource file formats, but I am going to discuss the Microsoft XML Resource file format (RESX for short). This resource file is an XML format. Without digging into the XML standard, it simply means that the content of the file looks like

    <?xml version="1.0" encoding="utf-8"?>
    <root>
      <data name="HelloWorld">
        <value>Hello World!</value>
      </data>
    </root>

As you can see, the identifier is specified through an XML attribute (that's the terminology for the syntax somekey="MyKeyHere"). The resource is specified with <value>My resource here</value>. Resource files are much more structured than classical, human-readable documents. Indeed, the webserver needs to be able to perform an exact matching between identifiers and the associated resources. Therefore, as a translator, you will have to be very careful when editing a resource file. You should not touch the XML markup, otherwise the resource file won't be readable any more by the webserver. The only section that you can modify is what lies between the <value> and </value> tags.

A more complete sample of RESX file:

    <?xml version="1.0" encoding="utf-8"?>
    <root>
      <data name="HelloWorld">
        <value>Hello World!</value>
      </data>
      <data name="GoodBye">
        <value>Goodbye!</value>
      </data>
      <data name="Thanks">
        <value>Thank you very much for reading this post!</value>
      </data>
    </root>
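
To give an idea of what "exact matching" means on the machine side, here is a minimal sketch that reads such a file with Python's standard XML parser. It is not the code used by the webserver, and the read_resources helper is a name made up for this example.

```python
import xml.etree.ElementTree as ET

RESX = b"""<?xml version="1.0" encoding="utf-8"?>
<root>
  <data name="HelloWorld">
    <value>Hello World!</value>
  </data>
  <data name="GoodBye">
    <value>Goodbye!</value>
  </data>
  <data name="Thanks">
    <value>Thank you very much for reading this post!</value>
  </data>
</root>"""


def read_resources(resx: bytes) -> dict:
    """Map each identifier (the name attribute) to its resource (the value)."""
    root = ET.fromstring(resx)
    return {data.get("name"): data.findtext("value", default="")
            for data in root.findall("data")}


for identifier, resource in read_resources(RESX).items():
    print(identifier, "=", resource)
```

If a single < or > of the markup is deleted, ET.fromstring raises a ParseError instead of returning anything, which is why the markup must be left untouched.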

A bit of help from the web designers

Translating a website usually involves translating many small keywords like to, at, by, new, view. Those short English words are quite ambiguous. In order to simplify the translator's life, a good website designer will include some additional indications within the resource file to facilitate the translation work. For this purpose, the RESX format includes an optional <comment> tag. The previous XML sample can be modified in order to include a comment.

    <?xml version="1.0" encoding="utf-8"?>
    <root>
      <data name="HelloWorld">
        <value>Hello World!</value>
        <comment>Don't forget to include the punctuation.</comment>
      </data>
    </root>

Do not translate those comments, you will be wasting your time. Those comments have just been included to make your life easier. Those comments are totally ignored by the webserver, their content will never appear on the website.

A bit of help from Notepad++

XML files are just plain text files (as opposed to rich text files such as Microsoft Word documents), yet due to the very sensitive nature of the XML markup (deleting a single > breaks the XML structure), you had better rely on dedicated tools to edit RESX files. My personal suggestion is to use Notepad++, a very robust text editor that can handle XML files. Notepad++ is open source (you can download it and use it for free, even for commercial purposes).

Tip: Notepad++ does not immediately recognize RESX files as XML files. When you open a RESX file with Notepad++, go to Language→XML to select XML as the file language. You will benefit from a much cleaner view of the RESX file.

Top translation mistakes

Website translation is a job of precision. I am listing below a few probable errors that the unaware website translator might commit.

• Spacing: "bonjour" is not the same as " bonjour" (notice the initial space).

• Capitalization: "Delete" is not the same as "delete".

• Punctuation: "Terminated." is not the same as "Terminated" (dummy parenthesis to keep the dot away).

• HTML markup (caution, tricky): a RESX file can contain HTML markup, but the symbols < and > are going to be encoded. The sign '<' (resp. '>') will appear encoded as '&lt;' (resp. '&gt;'). Do not touch the encoded HTML markup.

• Weird symbols (tricky again): typically, if you encounter something like Dear M. {0}, the {0} is a substitute (in the present case, it's certainly a substitute for a user name). Do not touch any substitute.
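
Most of these mistakes can be caught automatically before the file goes back to the web designer. The sketch below is a naive illustration in Python; check_translation is a hypothetical helper, and its rules (same surrounding spaces, same final dot, same {0} substitutes, same HTML tags) are simplifying assumptions that will not suit every language pair. Capitalization is left out on purpose because many scripts have no letter case.

```python
import html
import re

PLACEHOLDER = re.compile(r"\{\d+\}")    # substitutes such as {0}
TAG = re.compile(r"</?[A-Za-z][^>]*>")  # HTML tags such as <b> or </b>


def edges(text: str) -> tuple:
    """Return the leading and trailing whitespace of a text."""
    return text[:len(text) - len(text.lstrip())], text[len(text.rstrip()):]


def check_translation(source: str, translated: str) -> list:
    """List the suspicious differences between a resource and its translation."""
    problems = []
    if edges(source) != edges(translated):
        problems.append("spacing")
    if source.rstrip().endswith(".") != translated.rstrip().endswith("."):
        problems.append("punctuation")
    if sorted(PLACEHOLDER.findall(source)) != sorted(PLACEHOLDER.findall(translated)):
        problems.append("placeholders")
    # Decode &lt; and &gt; first so that encoded and raw markup compare alike.
    if TAG.findall(html.unescape(source)) != TAG.findall(html.unescape(translated)):
        problems.append("markup")
    return problems


print(check_translation("Dear M. {0}", "Cher M. {0}"))
print(check_translation(" bonjour", "bonjour"))
print(check_translation("Terminated.", "Terminé"))
```

An empty list means that none of the checks fired; it does not mean that the translation is good.
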

Friday, Feb 10, 2006

## When numerical precision can hurt you

The objective was to cure a very deadly disease and the drug was tested on mice. The results were impressive since 33% of the mice survived while only 33% died (the last mouse escaped and its outcome was unknown).

Numerical precision depends on the underlying number type. In .Net, there are 3 choices: float (32 bits), double (64 bits) and decimal (128 bits). Performance left aside, more precision cannot hurt, right?

My answer is: it depends. If the only purpose of your number is to be processed by a machine, then fine, more precision never hurts. But what if a user is supposed to read that number? I actually encountered this issue while working on a project of mine, Re-Dox, reduced design of experiments (an online analytical software). In terms of usability, providing the maximal numerical precision to the user is definitely a very poor idea. Does adding twelve digits to the result of 10/3 = 3.333333333333 make it more readable? Definitely not.

A very interesting issue when designing analytical software (i.e. software performing some kind of data analysis) is choosing the right number of digits. Smart rounding can be defined as an approach that seeks to provide all significant, but only significant, digits to the user. However, the notion of "significant" digits is very dependent on the context and carries a lot of uncertainty. Therefore, for the software designer, smart rounding is more likely to be a tradeoff between usability and user requirements.

Providing general rules for smart rounding is hard. But here are the two heuristics that I am using. Both of them rely on user inputs to define the level of precision required. Key insight: since it's usually not possible to know the accuracy requirements beforehand, the only reliable source of information is the actual user inputs.

Heuristic 1 = the number of digits in your outputs must not exceed the number of digits of the user inputs by more than 1 or 2. Ex: if the user inputs 0.123, then provide a 4 or 5 digit rounding. Caution, do not take the user inputs "as such", because they can include a lot of dummy digits (ex: the user can cut and paste values that look like 10.0000, where the trailing digits are zeros and implicitly not significant). The underlying idea is "no algorithm ever creates any information, an algorithm only transforms the information".
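
As an illustration, heuristic 1 can be sketched in a few lines. The example is in Python for brevity and is only one possible reading of the rule: the function names are made up, inputs are assumed to be typed in plain decimal notation, and zeros after the decimal point are treated as dummy digits.

```python
def significant_digits(text: str) -> int:
    """Count the digits the user plausibly meant (plain decimal notation)."""
    digits = text.strip().lstrip("+-")
    if "." in digits:
        digits = digits.rstrip("0")  # drop padding zeros such as in 10.0000
    digits = digits.replace(".", "").lstrip("0")
    return max(len(digits), 1)


def smart_round(value: float, user_inputs: list, margin: int = 1) -> str:
    """Format value with at most `margin` digits more than the inputs carry."""
    budget = max(significant_digits(text) for text in user_inputs) + margin
    return format(value, f".{budget}g")


print(smart_round(10 / 3, ["0.123"]))    # 3 input digits + 1 -> "3.333"
print(smart_round(10 / 3, ["10.0000"]))  # 2 input digits + 1 -> "3.33"
```

The margin parameter corresponds to the "1 or 2" extra digits; note that the "g" format switches to exponent notation for very large or very small values, which may or may not be what your users expect.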

Heuristic 2 = increase the number of digits given by heuristic 1 by Ceiling(log10(N) / 2), where N is the number of data inputs. Actually, this formula is simply an interpretation of the Central Limit Theorem (see Wikipedia) for the purpose of smart rounding. Why the need for such a bizarre heuristic? The underlying idea is slightly more complicated here. Basically, no matter how you combine the data inputs, the rate of accuracy improvement is bounded. The bound provided here corresponds (somehow) to an "optimistic" approach where the accuracy increases at the maximal possible speed.
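
One way to read this bound: when N independent measurements are averaged, the standard error shrinks like 1/sqrt(N), so the number of decimal digits gained is at most log10(sqrt(N)) = log10(N) / 2. A minimal Python sketch follows; the function names and the default margin are assumptions made for the example.

```python
import math


def extra_digits(input_count: int) -> int:
    """Digits added by heuristic 2 for N data inputs: Ceiling(log10(N) / 2)."""
    if input_count < 1:
        raise ValueError("at least one data input is required")
    return math.ceil(math.log10(input_count) / 2)


def digit_budget(input_digits: int, input_count: int, margin: int = 1) -> int:
    """Combine both heuristics: input digits, plus a margin, plus the N bonus."""
    return input_digits + margin + extra_digits(input_count)


for n in (1, 50, 500, 50000):
    print(n, extra_digits(n))
```

With 3-digit inputs and 500 of them, digit_budget(3, 500) allows 6 digits: 3 from the inputs, 1 of margin and 2 from heuristic 2.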
