Author

I am Joannes Vermorel, founder at Lokad. I am also an engineer from the Corps des Mines who initially graduated from the ENS.

I have been passionate about computer science, software matters and data mining for almost two decades. (RSS - ATOM)

Meta

Entries in software (13)

Saturday
Sep232006

Antipatterns of Software engineering courses

I am very honored to be in charge of the Sofware engineering and distributed applications course at the Ecole normale superieure (ENS). This will be my first official teaching assignment, and I will be affected to brilliant Licence 3 students (it's pretty tough to get through the entrance exam of the ENS).

Software engineering is a difficult topic to teach. I have been browsing the web just to have an outlook at what people are usually doing in a software engineering course, and I must confess that I haven't found anything very satisfying so far (although the MIT experience is definitively worth reading). In a nutshell, I would say that software engineering is the art of producing great software with limited resources. In respect to this definition, it seems that there are many ways to spend hours teaching things that totally miss the point. I have listed in this post what I think to be the 3 most common ways of disgusting students from all software engineering matters.

Coding antipattern

Here is the 300-pages API. This API will be the subject of your exam. Documents will not be authorized.

The second fastest way on earth to get your 30h of teaching ready is simply to teach some particular programming language and its associated technologies (think Apache/PHP/MySql, stay tuned the fastest way will be detailed next). Just take any reference documentation, and you get enough material to talk for 30h. But your course is both deadly boring and super-short-lived. Some people assume that their students have no prior knowledge about any language. I won't (it's stated in the course-prerequisite btw). My objective is not to teach the syntax of any programming language, because I believe students are smart enough to learn that by themselves. If not then, I would say that it was a bad move to take the software engineering course in the first place.

ISO-certified development antipattern

If you don't assess your customer requirements through a 17-phase process, then you're not ISO-171717.

Just by looking at Learning Tree - Software Engineering course (one of the top result of Google for "software engineering"), I think Wow, those people are attempting to kill their attendants with boredom. Look at the table of content: Life cycle phase contains (sic) 1) Understanding the problem 2) Developing the solution 3) Verifying the product 4) Maintaining the system. I can't remember of any software project that ever happened that way but who cares because the content is going to be nothing more than obvious statements anyway.

I call this kind of teaching ISO-certified development, the worst part being certainly the enumeration of software life-cycle models (which are, according to Learning Tree: Waterfall, V, Phased, Evolutionary and Spiral). This approach is basically the extreme opposite of the coding course. You're not teaching anything technical, instead you end-up with long, super-detailed, over-boring descriptions of business practices that do not even exists as such in the real world anyway.

Green-field projects antipattern

Our project was to develop a video game. Alice wrote the scenario, Bob took care of the rules and I did the graphics. The rest has been left unfinished.

The fastest way on earth to get your 30h software teaching course ready is the green-field project: just say Decide among yourselves what you're gonna do, I will be the judge of the software you produced and then, spend the next 30h doing groupwork (groupwork is like teamwork but instead of having a team, you have a random bunch of people, i.e. a group). Don't take me wrong, I think that projects do have a huge pedagogic value, yet the probability of re-discovering good software engineering ideas just by doing random groupwork is low.

There are also other criticisms to the green-field projects as usually practiced. The first poor practice is to let the students come up with their own project ideas. For various reasons, students have a tendency to favor projects that are quite ill-adapted (such as video games); then it's really hard to scale the project in such a way that it matches the time to be invested in the course. The second poor practice is to let the students come up with their own internal organization. I have never encountered any company where all employees are equal, I do not see why it should be the case in a student software project (more on the subject in a subsequent post).

Monday
Mar062006

Best practice for website design, sandboxing with ASP.Net

Why should I care?

The web makes application deployment easier, but there is no magic web-effect that would prevent web designers of commiting the very same mistakes that regular developers commit while designing classical applications. In order to minimize the risks, I have found the notion of website sandboxing as a must-have for web designers.

What is sandboxing?

A sandbox is a place full of sand where children cannot cause any harm even if they intend to.

Replace children by visitors or testers and you have a pretty accurate description of what is website sandboxing. More technically, a website sandbox is simply a copy of your web application. A sandbox differs from a mirror from a data management viewpoint. The sandbox databases are just dummy replications of the running databases. The sandbox holds absolutely no sensitive data. Moreover, the sandbox databases might be just cleared at the end of each release cycle.

What are the benefits of sandboxing?

The first obvious advantage is that you can have shorter release cycles (ten times a day if you really need that), to actually test your website in realistic conditions. If you're not convinced just look how the full the ASP.Net forums are from messages with title like "Everything was fine until I published my website!"

The second, maybe less obvious advantage, is that you (and your testers) have no restriction in your testing operations. Just go and try to corrupt the sandbox data by exploiting some potential bug, what do you risk? Not much. Check what's happen if add some wrongly encoded data in you website new publishing system. Etc ... The sandbox let you perform all the required testing operations without interfering with your visitors on the main website.

The third, certainly obscure, advantage is that if you do not have a sandbox, other people will use your primary website as a sandbox. This is especially true if you are exposing any kind of web interface (GET, POST, SOAP, XML-RPC, whatever) because people will use your website to debug their own code.

Connecting all sandboxes together

Some webmasters might hesitate in letting their sandbox worldwide accessible. Personnally, unless having a very good reaon I would strongly advise to do so (see third advantage, here above). What do you have to lose? Expose your bugs? That's precisely the purpose of the sandbox anyway. Moreover many professional websites already have their own public sandboxes.

For example, PeopleWords.com (online translation services) has links toward PayPal.com whereas sandbox.peoplewords.com relies on sandbox.paypal.com.

You can design your website in such a manner than your sandbox hyperlinks other sandboxes. Also the notion of sandboxing is not restricted to web pages, but includes web services too.

ASP tips

  • The only difference between you real website and your sandbox is the content of the web.config file. If your website and sandbox differs by more than their configuration files, you should maybe consider refactoring your website because it means that your deployement relies on error-prone operations.

  • Dupplicate you website logo into mylogo.png and mylogo-sandbox.png and include a LogoPath key in your web.config file to reference the image. The mylogo-sandbox.png image must include a very visible sandbox statement. By using distinct logos, you can later avoid possible confusions between the sandbox and the website.

  • By convention, the sandbox is usually located into sandbox.mydomain.com or www.sandbox.mydomain.com.

  • Do not forget to replicate the databases (but without including the actual content). You should not rely on the primary website database.
Friday
Feb102006

When numerical precision can hurt you

The objective was to cure a very deadly disease and the drug was tested on mice. The results were impressive since 33% of the mice survived while only 33% died (the last mouse escaped and its outcome was unknown).

Numerical precision depends on the underlying number type. In .Net, there are 3 choices float (32bits), double (64bits) and decimal (128bits). Performance left aside, more precision cannot hurt, right?

My answer is It depends. If the only purpose of your number is to be processed by a machine, then fine, more precision never hurts. But what if a user is supposed to read that number? I did actually encounter this issue while working on a project of mine Re-Dox, reduced design of experiments (an online analytical software). In terms of usability, provide the maximal numerical precision to the user is definitively a very poor idea. Does adding twelve digits to the result of 10/3 = 3.333333333333 makes it more readeable? definitively not.

A very insteresting issue while design analytical software (i.e. software performing some kind of data analysis) is to choose the right number of digits. Smart rounding can be defined as an approach that seeks to provide all significant, but only significant, digits to the user. Although, the notion of "significant" digits is very dependant of the context and carries a lot of uncertainties. Therefore, for the software designer, smart rounding is more likely to be a tradeoff between usability and user requirements.

Providing general rules for smart rounding is hard. But here are the two heuristics that I am using. Both of them rely on user inputs to define the level of precision required. Key insight: since it's usually not possible to know the accuracy requirements beforehand, the only reliable source of information is the actual user inputs.

Heuristic 1 = the number of digits in your outputs must not exceed the number of digits of user input by more than 1 or 2. Ex: If the user input 0.123 then provides a 4 or 5 digits rounding. Caution, do not take the user inputs "as such", because they can include a lot of dummy digits (ex: the user can cut and past values that look like 10.0000, where the digits is zero and implicitely not significant). The underlying idea is "no algorithm ever creates any information, an algorithm only transform the information".

Heuristic 2 = increase the number of digits of the heuristic 1 by a number equal to CeillingOf(log10(N)/2) where N is the number of data inputs. Actually, this formula is simply an interpretation of the Central Limit Theorem (Wikipedia) for the purpose of smart-rounding. Why the need for such bizarre heuristic? The underlying idea is slightly more complicated here. Basically, no matter how you combine the data inputs, the rate of accuracy improvement is bounded. The bound provided here corresponds (somehow) to an "optimistic" approach where the accuracy increase at the maximal possible speed.

Page 1 2 3