Author

I am Joannes Vermorel, founder at Lokad. I am also an engineer from the Corps des Mines who initially graduated from the ENS.

I have been passionate about computer science, software matters and data mining for almost two decades. (RSS - ATOM)

Meta

Thursday
Oct122006

## Finally, I am going to be a theology teacher

According to Jeff Atwood (Coding Horror), software development is a religion. Wow, frankly I hope not, because I am quite ignorant when it comes to theological matters. A major issue with human/social studies and practices is that it is so hard to get close to anything that would be considered as scientific knowledge and not pure erratic opinions. Yet, it's not because it's hard to produce science that it isn't worth trying.

The recipe of scientific knowledge production includes a very fundamental ingredient: reproducibility. If somebody else can't, providing enough time and efforts, reproduce what you have observed then there is no hope to produce any science. For software development, the current situation is quite troublesome. Indeed, the pace of change in software development is totally dwarfing the speed of scientific production. By the time, a scientist would be able to produce a rigorous scientific analysis of software development (5 to 10 years), the software world itself would have changed so much that the study, at the time it gets published, would be already totally obsolete.

The second factor that complicates the production of knowledge is that software engineering actors do have very strong interests to defend biased positions. Science involves publishing both positive but also negative results (see the Journal of Negative Results - Ecology and Evolutionary Biology for example). Do you think that Microsoft, Google or [insert here your favorite IT company] are really going to publish anytime soon reports explaining how and why they failed miserably while developing some particular products? From a marketing viewpoint, you need to appear brilliant no matter how bad (or good) things might be in the office.

Tuesday
Oct032006

## The unteachable parts of software engineering

Having the responsibility to handle the software engineering course at the ENS in spring 2007, I have started to think about the desirable qualities that make the difference between an average developer and a brilliant one. Indeed, I can't think of a better goal for this course but to actually try to develop such qualities.

### What makes a "good" software developer?

It's an obvious fact that many qualities and skills are required to make a good developer. Smart and Get things done are often cited as the top criterions to decide whether a candidate should be recruited or not.

... a passionate curiosity for software related matter ...

I would complete those two criterions with a third one that I consider to be no less important: a passionate curiosity for software-related matters. Most well-known antipatterns such as programming by permutation, golden hammering, re-inventing the wheel, ... are caused by a lack of curiosity. There is far too much to know of the subject of software engineering to trust any particular school diploma (or certification) to be sufficient to produce even a "passable" developer.

Additionally, the software world is fast-paced. Hardware and software get obsolete alike. Development methods evolve. Better tools are released continuously. The sheer (intellectual) complexity of those evolutions is beyond what a single individual can possibly handle. Curiosity when leveraged through teamwork is a strong driver to actually maintain the development practices and tools as close as possible to a state-of-the-art level.

Finally, on the long run, I do not see how it is possible to stay motivated (and therefore productive) if there is no eager interest in mastering this constant flow of evolutions.

### How to train such a developer?

I have listed smart, get things done and passionate curiosity as being the top qualities for a software engineer. But this list is more than troublesome for a teacher: it's not even clear whether any of those qualities can be actually taught

Concerning the first criterion, i.e. smart, I have simply surrendered all "academic" ambitions. The course schedule is far too short; beside I have the chance of having ENS students who are already very smart (though entrance exams have already filtered out students who were not so smart). Yet, the course can them the opportunity to apply their intelligence to a large variety of fuzzy problems that are inherent to software development.

The second criterion get things done is probably the most actionable element of the three. The French education system (highly selective and highly individualistic) usually produces people that are relatively weak against this criterion, the ENS students being no exception (quite the opposite in fact). I have planned to incorporate a student software project within the course mostly to push student develop a get things done mentality .

As for the first criterion, I haven't much ambition to actually transmit a passionate curiosity the students. In this respect, my first ambition will be to avoid the issue that usually plagues software engineering courses: an overwhelming boredom. I do not think it is really possible to create curiosity ex-nihilo if people have no interest beforehand. But, assuming that students are at least somehow interested (well, if not, it's going to be tough for the students and me alike), an "expand your horizons" strategy for the course might give them materials to apply and develop their curiosities.

There are two kinds of students: those who are too weak to be taught anything and those who are so strong that the teacher is totally unnecessary.

Bottom-line: out of three main qualities to make a good developer, the ambitions associated with the software engineering course I am thinking of are pretty weak. Well, considering the importance of the subject, I still believe it's a worthy attempt (stay tuned, more on later posts ... )

Saturday
Sep232006

## Antipatterns of Software engineering courses

I am very honored to be in charge of the Sofware engineering and distributed applications course at the Ecole normale superieure (ENS). This will be my first official teaching assignment, and I will be affected to brilliant Licence 3 students (it's pretty tough to get through the entrance exam of the ENS).

Software engineering is a difficult topic to teach. I have been browsing the web just to have an outlook at what people are usually doing in a software engineering course, and I must confess that I haven't found anything very satisfying so far (although the MIT experience is definitively worth reading). In a nutshell, I would say that software engineering is the art of producing great software with limited resources. In respect to this definition, it seems that there are many ways to spend hours teaching things that totally miss the point. I have listed in this post what I think to be the 3 most common ways of disgusting students from all software engineering matters.

### Coding antipattern

Here is the 300-pages API. This API will be the subject of your exam. Documents will not be authorized.

The second fastest way on earth to get your 30h of teaching ready is simply to teach some particular programming language and its associated technologies (think Apache/PHP/MySql, stay tuned the fastest way will be detailed next). Just take any reference documentation, and you get enough material to talk for 30h. But your course is both deadly boring and super-short-lived. Some people assume that their students have no prior knowledge about any language. I won't (it's stated in the course-prerequisite btw). My objective is not to teach the syntax of any programming language, because I believe students are smart enough to learn that by themselves. If not then, I would say that it was a bad move to take the software engineering course in the first place.

### ISO-certified development antipattern

If you don't assess your customer requirements through a 17-phase process, then you're not ISO-171717.

Just by looking at Learning Tree - Software Engineering course (one of the top result of Google for "software engineering"), I think Wow, those people are attempting to kill their attendants with boredom. Look at the table of content: Life cycle phase contains (sic) 1) Understanding the problem 2) Developing the solution 3) Verifying the product 4) Maintaining the system. I can't remember of any software project that ever happened that way but who cares because the content is going to be nothing more than obvious statements anyway.

I call this kind of teaching ISO-certified development, the worst part being certainly the enumeration of software life-cycle models (which are, according to Learning Tree: Waterfall, V, Phased, Evolutionary and Spiral). This approach is basically the extreme opposite of the coding course. You're not teaching anything technical, instead you end-up with long, super-detailed, over-boring descriptions of business practices that do not even exists as such in the real world anyway.

### Green-field projects antipattern

Our project was to develop a video game. Alice wrote the scenario, Bob took care of the rules and I did the graphics. The rest has been left unfinished.

The fastest way on earth to get your 30h software teaching course ready is the green-field project: just say Decide among yourselves what you're gonna do, I will be the judge of the software you produced and then, spend the next 30h doing groupwork (groupwork is like teamwork but instead of having a team, you have a random bunch of people, i.e. a group). Don't take me wrong, I think that projects do have a huge pedagogic value, yet the probability of re-discovering good software engineering ideas just by doing random groupwork is low.

There are also other criticisms to the green-field projects as usually practiced. The first poor practice is to let the students come up with their own project ideas. For various reasons, students have a tendency to favor projects that are quite ill-adapted (such as video games); then it's really hard to scale the project in such a way that it matches the time to be invested in the course. The second poor practice is to let the students come up with their own internal organization. I have never encountered any company where all employees are equal, I do not see why it should be the case in a student software project (more on the subject in a subsequent post).

Monday
Mar062006

## Best practice for website design, sandboxing with ASP.Net

Why should I care?

The web makes application deployment easier, but there is no magic web-effect that would prevent web designers of commiting the very same mistakes that regular developers commit while designing classical applications. In order to minimize the risks, I have found the notion of website sandboxing as a must-have for web designers.

What is sandboxing?

A sandbox is a place full of sand where children cannot cause any harm even if they intend to.

Replace children by visitors or testers and you have a pretty accurate description of what is website sandboxing. More technically, a website sandbox is simply a copy of your web application. A sandbox differs from a mirror from a data management viewpoint. The sandbox databases are just dummy replications of the running databases. The sandbox holds absolutely no sensitive data. Moreover, the sandbox databases might be just cleared at the end of each release cycle.

What are the benefits of sandboxing?

The first obvious advantage is that you can have shorter release cycles (ten times a day if you really need that), to actually test your website in realistic conditions. If you're not convinced just look how the full the ASP.Net forums are from messages with title like "Everything was fine until I published my website!"

The second, maybe less obvious advantage, is that you (and your testers) have no restriction in your testing operations. Just go and try to corrupt the sandbox data by exploiting some potential bug, what do you risk? Not much. Check what's happen if add some wrongly encoded data in you website new publishing system. Etc ... The sandbox let you perform all the required testing operations without interfering with your visitors on the main website.

The third, certainly obscure, advantage is that if you do not have a sandbox, other people will use your primary website as a sandbox. This is especially true if you are exposing any kind of web interface (GET, POST, SOAP, XML-RPC, whatever) because people will use your website to debug their own code.

Connecting all sandboxes together

Some webmasters might hesitate in letting their sandbox worldwide accessible. Personnally, unless having a very good reaon I would strongly advise to do so (see third advantage, here above). What do you have to lose? Expose your bugs? That's precisely the purpose of the sandbox anyway. Moreover many professional websites already have their own public sandboxes.

For example, PeopleWords.com (online translation services) has links toward PayPal.com whereas sandbox.peoplewords.com relies on sandbox.paypal.com.

You can design your website in such a manner than your sandbox hyperlinks other sandboxes. Also the notion of sandboxing is not restricted to web pages, but includes web services too.

ASP tips

• The only difference between you real website and your sandbox is the content of the web.config file. If your website and sandbox differs by more than their configuration files, you should maybe consider refactoring your website because it means that your deployement relies on error-prone operations.

• Dupplicate you website logo into mylogo.png and mylogo-sandbox.png and include a LogoPath key in your web.config file to reference the image. The mylogo-sandbox.png image must include a very visible sandbox statement. By using distinct logos, you can later avoid possible confusions between the sandbox and the website.

• By convention, the sandbox is usually located into sandbox.mydomain.com or www.sandbox.mydomain.com.

• Do not forget to replicate the databases (but without including the actual content). You should not rely on the primary website database.
Friday
Feb102006

## When numerical precision can hurt you

The objective was to cure a very deadly disease and the drug was tested on mice. The results were impressive since 33% of the mice survived while only 33% died (the last mouse escaped and its outcome was unknown).

Numerical precision depends on the underlying number type. In .Net, there are 3 choices float (32bits), double (64bits) and decimal (128bits). Performance left aside, more precision cannot hurt, right?

My answer is It depends. If the only purpose of your number is to be processed by a machine, then fine, more precision never hurts. But what if a user is supposed to read that number? I did actually encounter this issue while working on a project of mine Re-Dox, reduced design of experiments (an online analytical software). In terms of usability, provide the maximal numerical precision to the user is definitively a very poor idea. Does adding twelve digits to the result of 10/3 = 3.333333333333 makes it more readeable? definitively not.

A very insteresting issue while design analytical software (i.e. software performing some kind of data analysis) is to choose the right number of digits. Smart rounding can be defined as an approach that seeks to provide all significant, but only significant, digits to the user. Although, the notion of "significant" digits is very dependant of the context and carries a lot of uncertainties. Therefore, for the software designer, smart rounding is more likely to be a tradeoff between usability and user requirements.

Providing general rules for smart rounding is hard. But here are the two heuristics that I am using. Both of them rely on user inputs to define the level of precision required. Key insight: since it's usually not possible to know the accuracy requirements beforehand, the only reliable source of information is the actual user inputs.

Heuristic 1 = the number of digits in your outputs must not exceed the number of digits of user input by more than 1 or 2. Ex: If the user input 0.123 then provides a 4 or 5 digits rounding. Caution, do not take the user inputs "as such", because they can include a lot of dummy digits (ex: the user can cut and past values that look like 10.0000, where the digits is zero and implicitely not significant). The underlying idea is "no algorithm ever creates any information, an algorithm only transform the information".

Heuristic 2 = increase the number of digits of the heuristic 1 by a number equal to CeillingOf(log10(N)/2) where N is the number of data inputs. Actually, this formula is simply an interpretation of the Central Limit Theorem (Wikipedia) for the purpose of smart-rounding. Why the need for such bizarre heuristic? The underlying idea is slightly more complicated here. Basically, no matter how you combine the data inputs, the rate of accuracy improvement is bounded. The bound provided here corresponds (somehow) to an "optimistic" approach where the accuracy increase at the maximal possible speed.

Page 1 2 3