Author

I am Joannes Vermorel, founder of Lokad. I am also an engineer from the Corps des Mines who initially graduated from the ENS.

I have been passionate about computer science, software matters and data mining for almost two decades.


Entries in practices (34)

Monday
Sep 18, 2006

Over the Internet, your name is your personal trademark

I have been dealing with freelancers for various tasks (translations, graphic design, development), and it's still unbelievable that most freelancers do not pay any attention to maintaining a consistent name in their communications. Let me clarify this point: I do not care to know the exact legal name of any freelancer I am dealing with. But how can I even recognize the person if messages never get signed twice with the same name?

Over the Internet, your name is your personal trademark. If you're not careful, people will simply not remember who you are, and this rule isn't restricted to freelancers. The most common consequence of poor name branding is that people will filter out your communication attempts (email, instant messenger and the like) as spam. A clear naming policy means that your name must be explicit and obvious in all your communications, ranging from Skype to regular postal mail.

In my experience, the most common inconsistent naming cases are the following:

  • The e-mail must completely match the person's name. If your name is John Smith, then your email must be john.smith@yourcompany.com, not joe@yourcompany.com or jsmi@yourcompany.com. If you have a lengthy name, so will your e-mail address.

  • The Skype/MSN/whatever username must completely match the person's name too. I have found the instant messaging practices to be even worse. Fantasy names like batman4ever are not uncommon. A good practice is to use your e-mail as an instant messenger id.

  • Lack of personal home page: Google is the de facto yellow pages. When somebody types your name, your home page must come up. If your name is really John Smith then it's going to be tough. Well, in such a case, just add some random nickname between the first and the last name to become distinguishable.

It may appear obvious to some of us, but it seems that many people do not realize that the lack of consistency in their communications has a strong (negative) impact on the people they are communicating with, especially if the name is the only concrete reference to the person.

Monday
Mar 06, 2006

Best practice for website design, sandboxing with ASP.Net

Why should I care?

The web makes application deployment easier, but there is no magic web-effect that would prevent web designers from committing the very same mistakes that regular developers commit while designing classical applications. In order to minimize the risks, I have found the notion of website sandboxing to be a must-have for web designers.

What is sandboxing?

A sandbox is a place full of sand where children cannot cause any harm even if they intend to.

Replace children with visitors or testers and you have a pretty accurate description of what website sandboxing is. More technically, a website sandbox is simply a copy of your web application. A sandbox differs from a mirror from a data management viewpoint: the sandbox databases are just dummy replications of the running databases. The sandbox holds absolutely no sensitive data. Moreover, the sandbox databases might simply be cleared at the end of each release cycle.

What are the benefits of sandboxing?

The first obvious advantage is that you can have shorter release cycles (ten times a day if you really need that) and actually test your website in realistic conditions. If you're not convinced, just look at how full the ASP.Net forums are of messages with titles like "Everything was fine until I published my website!"

The second, maybe less obvious, advantage is that you (and your testers) have no restrictions in your testing operations. Just go and try to corrupt the sandbox data by exploiting some potential bug: what do you risk? Not much. Check what happens if you add some wrongly encoded data to your website's publishing system. Etc. The sandbox lets you perform all the required testing operations without interfering with the visitors of your main website.

The third, certainly more obscure, advantage is that if you do not have a sandbox, other people will use your primary website as a sandbox. This is especially true if you are exposing any kind of web interface (GET, POST, SOAP, XML-RPC, whatever), because people will use your website to debug their own code.

Connecting all sandboxes together

Some webmasters might hesitate to make their sandbox worldwide accessible. Personally, unless you have a very good reason not to, I would strongly advise doing so (see the third advantage, here above). What do you have to lose? Exposing your bugs? That's precisely the purpose of the sandbox anyway. Moreover, many professional websites already have their own public sandboxes.

For example, PeopleWords.com (online translation services) links to PayPal.com, whereas sandbox.peoplewords.com relies on sandbox.paypal.com.

You can design your website in such a manner that your sandbox hyperlinks to other sandboxes. Also, the notion of sandboxing is not restricted to web pages; it includes web services too.

ASP.Net tips

  • The only difference between your real website and your sandbox should be the content of the web.config file (see the sketch after this list). If your website and sandbox differ by more than their configuration files, you should maybe consider refactoring your website, because it means that your deployment relies on error-prone operations.

  • Duplicate your website logo into mylogo.png and mylogo-sandbox.png, and include a LogoPath key in your web.config file to reference the image. The mylogo-sandbox.png image must include a very visible sandbox statement. By using distinct logos, you avoid possible confusions between the sandbox and the website.

  • By convention, the sandbox is usually located at sandbox.mydomain.com or www.sandbox.mydomain.com.

  • Do not forget to replicate the databases (but without the actual content); the sandbox should not rely on the primary website database.
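To illustrate, here is a minimal sketch of what the sandbox side of such a web.config could look like (the LogoPath key and the database names are mere illustrations, not a prescribed layout):

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <appSettings>
    <!-- The sandbox points to the sandbox logo; production points to mylogo.png -->
    <add key="LogoPath" value="~/images/mylogo-sandbox.png" />
  </appSettings>
  <connectionStrings>
    <!-- Dummy replication of the production database, cleared at each release -->
    <add name="MainDb" connectionString="Server=localhost;Database=MySiteSandbox;Integrated Security=True" />
  </connectionStrings>
</configuration>

The page code retrieves those values through System.Configuration.ConfigurationManager, so that nothing but web.config differs between the sandbox and the real website.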

Thursday
Mar 02, 2006

A translator's guide to website translation

Since the publication of this post, I have released Resx Editor, a free visual resource editor dedicated to translation work.

In this post, I give a short introduction to website translation. The targeted audience is non-technical translators. I will focus on the particular case of website translation relying on Microsoft XML resource files.

The big picture

Dynamic websites include many things besides pure textual content (programming source code, images, stylesheets, ...). In order to simplify the job of the translators, all the textual content can be isolated into resource files. The main idea behind resource files is to replace every textual item of the website by a resource identifier. Intuitively, instead of having a webpage containing the text Hello World!, you have a reference HelloWorld and multiple resource files. The English resource file contains HelloWorld="Hello World!", the French resource file contains HelloWorld="Bonjour tout le monde!", etc. By choosing the right resource file, the website appears in the corresponding language.

Basic concepts

  • identifier: a unique key associated with a textual item.

  • (localized) resource: the expression (the content) of a textual item expressed in a particular language.

  • (localized) resource file: a file containing a list of pairs identifier+resource.

Microsoft XML Resource Files

Many resource file formats exist, but I am going to discuss the Microsoft XML Resource file format (RESX for short). It is an XML format. Without digging into the XML standard, this simply means that the content of the file looks like

<?xml version="1.0" encoding="utf-8"?>
<root>
  <data name="HelloWorld">
    <value>Hello World!</value>
  </data>
</root>

As you can see, the identifier is specified through an XML attribute (that's the terminology for the syntax somekey="MyKeyHere"). The resource is specified with a <value>My resource here</value> element. Resource files are much more structured than classical, human-readable documents. Indeed, the webserver needs to be able to perform an exact matching between identifiers and the associated resources. Therefore, as a translator, you will have to be very careful when editing a resource file. You should not touch the XML markup, otherwise the resource file won't be readable any more by the webserver. The only section that you can modify is what lies between the <value /> tags.
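To illustrate this exact matching, here is a minimal sketch of how an ASP.Net 2.0 page could fetch a resource by its identifier (the MyResources file name is an assumption on my side):

// Looks up the resource "HelloWorld" in the resource file matching
// the visitor's language (MyResources.resx, MyResources.fr.resx, ...).
// The lookup succeeds only if the identifier matches character for character.
string text = (string) HttpContext.GetGlobalResourceObject("MyResources", "HelloWorld");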

A more complete sample of a RESX file:

<?xml version="1.0" encoding="utf-8"?>
<root>
  <data name="HelloWorld">
    <value>Hello World!</value>
  </data>
  <data name="GoodBye">
    <value>Goodbye!</value>
  </data>
  <data name="Thanks">
    <value>Thank you very much for reading this post!</value>
  </data>
</root>

A bit of help from the web designers

Translating a website usually involves translating many small keywords like to, at, by, new, view. Those short English words are quite ambiguous. In order to simplify the translator's life, a good website designer will include some additional indications within the resource file to facilitate the translation work. For this purpose, the RESX format includes an optional <comment /> tag. The previous XML sample can be modified in order to include a comment.

<?xml version="1.0" encoding="utf-8"?>
<root>
  <data name="HelloWorld">
    <value>Hello World!</value>
    <comment>Don't forget to include the punctuation.</comment>
  </data>
</root>

Do not translate those comments; you would be wasting your time. They have just been included to make your life easier. They are totally ignored by the webserver, and their content will never appear on the website.

A bit of help from Notepad++

XML files are just plain text files (as opposed to rich text files such as Microsoft Word documents). Yet, due to the very sensitive nature of the XML markup (deleting a single > breaks the XML structure), you had better rely on dedicated tools to edit RESX files. My personal suggestion is Notepad++, a very robust text editor that handles XML files. Notepad++ is open source (you can download it and use it for free, even for commercial purposes).

Tip: Notepad++ does not immediately recognize RESX files as XML files. When you open a RESX file with Notepad++, go to Language→XML to select XML as the file language. You will benefit from a much cleaner view of the RESX file.

Top translation mistakes

Website translation is a job of precision. Below, I list a few likely errors that the unwary website translator might commit.

  • Spacing: "bonjour" is not the same as " bonjour" (notice the initial space).

  • Capitalization: "Delete" is not the same as "delete".

  • Punctuation: "Terminated." is not the same as "Terminated" (notice the trailing dot in the first case).

  • HTML markup (caution, tricky): RESX files can contain HTML markup, but the symbols < and > are going to be encoded. The sign '<' (resp. '>') will appear encoded as '&lt;' (resp. '&gt;'). Do not touch the encoded HTML markup.

  • Weird symbols (tricky again): typically, if you encounter something like Dear M. {0}, the {0} is a substitute (in the present case, certainly a substitute for a user name). Do not touch any substitute (see the sketch below).
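To see why, here is roughly what the webserver does with such a resource (a sketch; the user name value is, of course, illustrative):

// The webserver substitutes {0} with the actual value at runtime.
string userName = "Dupont";
string greeting = String.Format("Dear M. {0}", userName);
// greeting is now "Dear M. Dupont". If the translation drops or
// alters {0}, the substitution breaks.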

Sunday
Feb 05, 2006

Refactoring and logistics ("L'intendance suivra!")

With Eclipse and VS2005, refactoring is now a standard feature of modern IDEs. No more than a few minutes are now needed to drastically change the internal structure of a software library. Yet, if software logistics cannot keep the pace, then the productivity bottlenecks of software evolution remain unchanged. De Gaulle said L'intendance suivra! (which could be loosely translated as "Logistics will keep up!"). Yet many European wars have been lost due to poor logistics, and, back to the discussion, I believe that logistics is no less important in software matters than it is in wars.

By software logistics I mean all the processes required to keep the whole thing running when changing the structure of a component; and, in particular, the issues related to upstream and downstream dependencies (deployment issues are left to a later post).

Upstream dependencies include all the components that you are relying on. Well, changing your code does not impact the components you are relying on, right? Wrong. Just consider serialization as a simple counter-example (your code evolves and leaves all existing serialized data unreadable). Upstream issues are not serialization-specific; the very same problem exists with structured database storage (yes, versioning SQL queries is a pain too). More generally, I am not aware of any refactoring tool providing any support to tackle data persistence issues. Version-tolerance features do exist in .Net 2.0, but the framework still lacks very simple features such as field renaming. I am really looking forward to the tools that will (one day) provide persistence-aware refactoring capabilities.
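For the record, a minimal sketch of what this version tolerance looks like in .Net 2.0 (the Customer class is illustrative):

using System;
using System.Runtime.Serialization;

[Serializable]
class Customer
{
    public string Name;

    // Field added in version 2 of the class. OptionalField tells the
    // formatter to tolerate version-1 data where the field is absent,
    // instead of failing at deserialization.
    [OptionalField(VersionAdded = 2)]
    public string Email;
}

But rename Name into FullName and the formatter is lost again; hence the frustration expressed above.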

Downstream dependencies include all the components that rely on yours. It's clear that any change you make (at least to any publicly exposed method/object) can break the downstream dependencies. The don't break anything approach kills software evolution (the most interesting case being don't fix bugs, our customers rely on them). But on the other hand, assuming that L'intendance suivra! ("we don't care, RTFM!") is not realistic either. IMO, the present best practice is to provide an awful lot of documentation to facilitate the upgrade (see Upgrading WordPress for a good example). Yet providing such documentation is expensive and the ROI is low (because, contrary to the code that has been improved, such documentation will not be leveraged in future developments). The solution, IMO again, lies in better refactoring tools that include some downstream-awareness. It should be possible to generate some "API signature patch" (the generation being as automated a process as possible) that could be applied by clients to their own code. Well, I am also really looking forward to such tools.
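Meanwhile, the closest thing .Net offers to downstream-awareness is probably the Obsolete attribute. A crude sketch of the idea (the Translator class is illustrative), not the tool I am wishing for:

using System;
using System.Globalization;

public class Translator
{
    // The old signature is kept so that downstream code still compiles;
    // the compiler warns clients and steers them toward the new overload.
    [Obsolete("Use Translate(text, culture) instead.")]
    public string Translate(string text)
    {
        return Translate(text, CultureInfo.InvariantCulture);
    }

    public string Translate(string text, CultureInfo culture)
    {
        // Illustrative placeholder; a real implementation would localize here.
        return text;
    }
}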
