Author

I am Joannes Vermorel, founder at Lokad. I am also an engineer from the Corps des Mines who initially graduated from the ENS.

I have been passionate about computer science, software matters and data mining for almost two decades.

Entries in dotnet (15)

Friday
Sep 22, 2006

From RAD to test driven ASP.NET website

Both unit testing and R.A.D. (Rapid Application Development) have deeply shaped my views on software development. Yet I have found that combining those two approaches within a single ASP.NET project is not that easy, especially if you want to keep the best of both worlds. There are at least א (aleph zero) ways to solve the problem; since my blog host does not yet provide that much disk storage, I will only describe here the 2.5 design patterns that I have found useful while developing ASP.NET websites.

  • R.A.D.: all your 3-tier layers (presentation, business logic and data access) get compacted into a single ASPX page. ASP.NET 2.0 makes this approach very practical thanks to the combination of GridView, DetailsView and SqlDataSource.

  • CRUD layer: the business logic and the data access are still kept together, but removed from the ASPX page itself. This design pattern (detailed below) replaces the SqlDataSource with a combination of an ObjectDataSource and library classes.

  • Full-blown 3-tier: like above, but the business logic and the data access get separated. Compared to the CRUD layer approach, you get extra classes mirroring the database objects.

R.A.D.


Using combinations of GridView, DetailsView and SqlDataSource, you can fit your three layers into a single ASPX page, the business logic being implemented in the code-behind whenever required. This approach does not enable unit testing, but if your project is fairly simple, then R.A.D. delivers dramatic productivity. For example, PeopleWords.com has been developed entirely through R.A.D. (with a single method added to the Global.asax file that logs all thrown exceptions). I would not defend R.A.D. for large or complex projects, but I do think it's very well suited for small projects and/or for drafting more complicated ones.
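In practice, a R.A.D. page can be as short as a GridView bound to a SqlDataSource, with no code-behind at all. The sketch below is only illustrative; the connection string name, table and column names are hypothetical.

<%-- Hypothetical page fragment: a sortable, pageable, editable grid with no code-behind. --%>
<asp:SqlDataSource ID="BarSource" runat="server"
    ConnectionString="<%$ ConnectionStrings:MyDb %>"
    SelectCommand="SELECT BarId, Name, Price FROM Bar"
    UpdateCommand="UPDATE Bar SET Name = @Name, Price = @Price WHERE BarId = @BarId"
    DeleteCommand="DELETE FROM Bar WHERE BarId = @BarId" />

<asp:GridView ID="BarGrid" runat="server"
    DataSourceID="BarSource" DataKeyNames="BarId"
    AutoGenerateEditButton="true" AutoGenerateDeleteButton="true"
    AllowSorting="true" AllowPaging="true" />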

The forces behind R.A.D.:

  • (+) Super-fast feature delivery: a whole website can be designed very quickly.

  • (+) The code is very readable (not kidding!): having your three layers in the same place makes the interactions quite simple to follow.

  • (+) Facilitates the trial-and-error web page design process: it's hard to get a web page right at the first attempt, and R.A.D. lets you reorganize a page very easily.

  • (-) If the same SQL tables get reused across ASPX pages, then the data access code tends to be replicated each time (and the same goes for the business logic).

  • (-) No simple way to perform unit testing.

  • (-) No structured way to document your code.

"CRUD layer" design pattern


The main purpose of the "CRUD layer" is to pull the business logic and the data access out of the ASPX page and push them into an isolated .NET library. Yet there are several major constraints associated with this process. The first constraint is to ensure an easy migration from R.A.D. to the CRUD layer. Indeed, R.A.D. is usually the best prototyping solution; therefore the main issue is not implementing from scratch, but improving the quality of the R.A.D. draft implementation. The second constraint is to keep the business logic and data access design as plain as possible (satisfying, among others, the YAGNI principle). The CRUD layer is an attempt to address those two issues.

The CRUD layer consists of implementing a class along the following lines:

public class FooCrudLayer
{
    // An empty constructor must exist (the ObjectDataSource needs it).
    public FooCrudLayer() { ... }

    // SELECT method
    public DataTable GetAllBar( ... )
    {
        DataTable table = new DataTable();

        using (SqlConnection conn = new SqlConnection( ... ))
        {
            conn.Open();
            SqlCommand command = new SqlCommand( ... , conn);

            using (SqlDataReader reader = command.ExecuteReader())
            {
                table.Load(reader);
            }
        }

        return table;
    }

    public void Insert( ... ) { ... } // INSERT method
    public void Delete( ... ) { ... } // DELETE method
    public void Update( ... ) { ... } // UPDATE method
}

Notice that the select method returns a DataTable, whereas most ObjectDataSource tutorials would return a strongly-typed collection such as List<Bar>. The presented approach has two benefits. First, the migration from the SqlDataSource is totally transparent (including the sorting and paging capabilities of the GridView).

<asp:ObjectDataSource ID="FooSource" runat="server"
    TypeName="FooCrudLayer"
    SelectMethod="GetAllBar"
    InsertMethod="Insert"
    DeleteMethod="Delete"
    UpdateMethod="Update" >

    ... (parameter definitions)

</asp:ObjectDataSource>

Second, the database objects are not replicated in the .NET code. Indeed, replicating the database objects at the .NET level involves some design overhead, because both sides (DB and .NET) must be kept synchronized.

The forces behind the CRUD layer:

  • (+) Very small design overhead compared to R.A.D.

  • (+) Easy migration from R.A.D.

  • (+) Unit testable and documentable.

  • (-) No dedicated abstraction for business logic.

  • (-) Limited design scalability (because it's not always possible to avoid code replication).

Full-blown 3-tier


I won't say much about this last option (this post is already far too long), but basically the 3-tier design involves strongly typing the database objects in order to isolate the business logic from the data access. This approach comes with a large overhead compared to the original R.A.D. approach. Yet, if your business logic gets complex (for example, workflow-like processes), then there is no escape: the layers have to be separated.
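As a rough sketch of what this looks like (all names below are hypothetical), the database object gets its own strongly-typed class, the data access is confined to one class, and the business logic lives in another class that never touches SQL:

using System;
using System.Collections.Generic;

// Hypothetical entity mirroring the Bar table (the strongly-typed database object).
public class Bar
{
    private int barId;
    private string name;

    public int BarId { get { return barId; } set { barId = value; } }
    public string Name { get { return name; } set { name = value; } }
}

// Hypothetical data access class: talks to the database, returns entities.
public class BarDataAccess
{
    public List<Bar> GetAll() { /* SELECT ... mapped into a List<Bar> */ return new List<Bar>(); }
    public void Save(Bar bar) { /* INSERT or UPDATE ... */ }
}

// Hypothetical business logic class: enforces the rules, never touches SQL.
public class BarService
{
    private readonly BarDataAccess dataAccess = new BarDataAccess();

    public void Rename(Bar bar, string newName)
    {
        if (newName == null || newName.Length == 0)
            throw new ArgumentException("A bar must have a non-empty name.", "newName");

        bar.Name = newName;
        dataAccess.Save(bar);
    }
}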

Sunday
Sep 17, 2006

Don't booby trap your ASP.Net Session state

The ASP.Net Session state is a pretty nice way to store a (limited) amount of data on the server side. I am not going to introduce the ASP.Net session itself in this post; you can refer to the previous link if you have no idea what I am talking about.

Although many articles can be found over the web arguing that the major issue with the session state is scalability, don't believe them! Well, as long as you're not doing crazy things like storing your whole DB in the session state, which is not too hard to avoid. Additionally, third-party providers (see ScaleOut for example) seem to offer pretty much plug-and-play solutions if you're in desperate need of more session space (disclaimer: I haven't tested any of those products myself).

In my (limited) experience, I have found that the main issue with the Session State is the lack of strong typing. Indeed, the lack of strong typing has a lot of painful consequences, such as:

  • Hard collisions: the same session key gets used to store two distinct objects of different .NET types. This situation is not too hard to debug, because the web application will crash in a sufficiently brutal way to be quickly detected.

  • Smooth session collisions: the same session key gets used to store two distinct objects of the same type. This situation is really worse than the previous case because, if you're unlucky, your web app will not crash. Instead, your users will just experience very weird app behaviors from time to time. Multi-tab browsing users will be among the first victims.

  • No documentation: static fields get documented, right? There is no good reason to leave the Session State undocumented.

  • No refactoring: errare humanum est; session keys do get poorly or inconsistently named, like any other class/method/whatever. No automated refactoring means that manual refactoring attempts will introduce bugs with high probability.

A simple approach that solves most of those issues consists of strongly typing your session keys. This can easily be achieved with the following pattern:

partial class MyPage : System.Web.UI.Page
{
    // Ns = the namespace, SKey = SessionKey
    const string Foo1SKey = "Ns.MyPage.Foo1";
    const string Foo2SKey = "Ns.MyPage.Foo2";
    // ...
}

Instead of typing the session key literally, the const string fields get used. If you need to share session keys between several pages, then a dedicated class (gathering the session keys) can be used along the same lines. This pattern basically solves all the issues evoked here above.
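For illustration (the Foo1 type and the LoadFoo1 method are hypothetical), the keys are then used from the code-behind like this; the string literal never appears outside its const definition:

partial class MyPage : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        // Hypothetical usage: store and retrieve through the named constant.
        Session[Foo1SKey] = LoadFoo1();

        Foo1 foo1 = (Foo1)Session[Foo1SKey]; // the cast is still needed
    }
}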

  • Collisions get avoided because of the combination of namespace and class prefixes in the session key definitions (*).

  • Documentation is straightforward, just add a <summary/> documentation tag to the const string definition.

  • Refactoring is easy, just refactor the field or change its value (depending on the refactoring intent).

(*) It would have been possible to prefix all session keys directly in the code with the very same combination of namespace and class name, but it becomes a real pain if you start using the session frequently in your code.

Monday
Mar 6, 2006

Best practice for website design, sandboxing with ASP.Net

Why should I care?

The web makes application deployment easier, but there is no magic web effect preventing web designers from committing the very same mistakes that regular developers commit while designing classical applications. In order to minimize the risks, I have found the notion of website sandboxing to be a must-have for web designers.

What is sandboxing?

A sandbox is a place full of sand where children cannot cause any harm even if they intend to.

Replace children with visitors or testers and you have a pretty accurate description of what website sandboxing is. More technically, a website sandbox is simply a copy of your web application. A sandbox differs from a mirror from a data management viewpoint: the sandbox databases are just dummy replications of the running databases, and the sandbox holds absolutely no sensitive data. Moreover, the sandbox databases might simply be cleared at the end of each release cycle.

What are the benefits of sandboxing?

The first obvious advantage is that you can have shorter release cycles (ten times a day if you really need that) to actually test your website in realistic conditions. If you're not convinced, just look at how full the ASP.Net forums are of messages with titles like "Everything was fine until I published my website!"

The second, maybe less obvious, advantage is that you (and your testers) have no restrictions on your testing operations. Just go and try to corrupt the sandbox data by exploiting some potential bug; what do you risk? Not much. Check what happens if you add some wrongly encoded data to your website's new publishing system. Etc. The sandbox lets you perform all the required testing operations without interfering with the visitors of the main website.

The third, certainly more obscure, advantage is that if you do not have a sandbox, other people will use your primary website as a sandbox. This is especially true if you expose any kind of web interface (GET, POST, SOAP, XML-RPC, whatever), because people will use your website to debug their own code.

Connecting all sandboxes together

Some webmasters might hesitate to make their sandbox accessible worldwide. Personally, unless you have a very good reason not to, I would strongly advise doing so (see the third advantage, here above). What do you have to lose? Exposing your bugs? That's precisely the purpose of the sandbox anyway. Moreover, many professional websites already have their own public sandboxes.

For example, PeopleWords.com (online translation services) has links toward PayPal.com whereas sandbox.peoplewords.com relies on sandbox.paypal.com.

You can design your website in such a manner that your sandbox hyperlinks to other sandboxes. Also, the notion of sandboxing is not restricted to web pages; it includes web services too.

ASP tips

  • The only difference between your real website and your sandbox should be the content of the web.config file (a sketch follows this list). If your website and sandbox differ by more than their configuration files, you should maybe consider refactoring your website, because it means that your deployment relies on error-prone operations.

  • Duplicate your website logo into mylogo.png and mylogo-sandbox.png, and include a LogoPath key in your web.config file to reference the image. The mylogo-sandbox.png image must include a very visible sandbox statement. By using distinct logos, you avoid later confusion between the sandbox and the website.

  • By convention, the sandbox is usually located at sandbox.mydomain.com or www.sandbox.mydomain.com.

  • Do not forget to replicate the databases (but without the actual content). You should not rely on the primary website's database.
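For illustration, here is a minimal web.config fragment of the kind that would differ between the production site and the sandbox (the LogoPath key comes from the tip above; the connection string name and values are hypothetical, and only the values change between the two deployments):

<configuration>
  <appSettings>
    <!-- Points to mylogo.png on the production site, mylogo-sandbox.png on the sandbox. -->
    <add key="LogoPath" value="~/images/mylogo-sandbox.png" />
  </appSettings>
  <connectionStrings>
    <!-- The sandbox connection string targets the dummy, clearable database. -->
    <add name="MyDb" connectionString="Server=.;Database=MyDbSandbox;Integrated Security=true" />
  </connectionStrings>
</configuration>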

Sunday
Feb 5, 2006

Refactoring and logistics ("L'intendance suivra!")

With Eclipse and VS2005, refactoring is now a standard feature of modern IDEs. No more than a few minutes are needed to drastically change the internal structure of a software library. Yet, if software logistics cannot keep the pace, then the productivity bottlenecks of software evolution remain unchanged. De Gaulle said L'intendance suivra! (which could be loosely translated as "Logistics will keep up!"). Yet many European wars have been lost due to poor logistics and, back to the discussion, I believe that logistics is no less important in software matters than it is in war.

By software logistics I mean all the processes required to keep the whole thing running when changing the structure of a component, and in particular the issues related to upstream and downstream dependencies (deployment issues are left to a later post).

Upstream dependencies include all the components that you rely on. Well, changing your code does not impact the components you rely on, right? Wrong. Just consider serialization as a simple counter-example (your code evolves and leaves all existing data unreadable). Upstream issues are not serialization-specific; the very same problem exists with structured database storage (yes, versioning SQL queries is a pain too). More generally, I am not aware of any refactoring tool providing any support to tackle data persistence issues. Version-tolerance features do exist in .NET 2.0 (but the framework is still lacking very simple features such as field renaming). I am really looking forward to the tools that will (one day) provide persistence-aware refactoring capabilities.
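As a minimal sketch of the version tolerance that .NET 2.0 does provide (the Customer class and its fields are hypothetical): a field added in version 2 can be marked optional so that payloads serialized by version 1 still deserialize, but renaming an existing serialized field remains unsupported.

using System;
using System.Runtime.Serialization;

[Serializable]
public class Customer
{
    private string name;              // present since version 1

    [OptionalField(VersionAdded = 2)] // added in version 2: old payloads simply omit it
    private string email;

    [OnDeserializing]
    private void SetDefaults(StreamingContext context)
    {
        // Runs before deserialization, giving the possibly-missing field a default.
        email = string.Empty;
    }
}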

Downstream dependencies include all the components that rely on yours. It's clear that any change you make (at least to any publicly exposed method/object) can break the downstream dependencies. The don't break anything approach kills software evolution (the most interesting case being don't fix bugs, our customers rely on them). But, on the other hand, assuming that L'intendance suivra! ("we don't care, RTFM!") is not realistic either. IMO, the present best practice is to provide an awful lot of documentation to facilitate the upgrade (see Upgrading WordPress for a good example). Yet providing such documentation is expensive and the ROI is low (because, contrary to the code that has been improved, such documentation will not be leveraged in future developments). The solution, IMO again, lies in better refactoring tools that include some downstream awareness. It should be possible to generate some "API signature patch" (the generation being as automated as possible) that clients could apply to their own code. Well, I am really looking forward to such tools as well.

Saturday
Jan 7, 2006

Hungarian notation and thread safety

Joel Spolsky wrote a very good article, Making Wrong Code Look Wrong, where he rehabilitates Hungarian notation for certain dedicated purposes (tip: no, Hungarian is not about junking your code with variable prefixes such as string, int and array). Joel Spolsky presents the idea of prefixing unsafe (i.e. user-provided) strings, in the context of web-based applications, with us, standing for unsafe string. Such a practice makes a lot of sense in situations where things have to be right by design (security is a typical example, because no security hole is going to pop up against typical non-hostile users).

Thread safety (TS) issues

Besides security, thread safety (for multithreaded applications) is another area that has to be right by design; otherwise you are going to hit Heisenbugs. In .NET/C#, the simplest way to deal with thread safety is to rely on lock statements (some people would object that locking is unsafe by design and that transactions/agents/whatever should be used instead; that might be true, but it is certainly beyond the scope of this post). Actually, I have found that designing multithreaded applications is not that hard, but it requires a strong methodology to avoid things going wrong by design.

TS level 1: dedicated locks

A very common beginner mistake in multithreading is to re-use some random object for the locking scheme or, worst of all, to use the this reference.

public class Bar
{
    public void FooThreadSafe()
    {
        lock (this) { } // bravo, you just shot yourself in the foot
    }
}

Although it may look like a mere 12-byte overhead, a lock requires a dedicated object field in the class.

public class Bar
{
    private readonly object fooLock = new object();

    public void FooThreadSafe()
    {
        lock (fooLock) { } // much better
    }
}

The major advantage of the dedicated lock instance approach is documentation. The purpose of the dedicated lock documentation is to explain which objects should be protected and when (Intellisense-like IDE features make this approach even more practical). A secondary advantage is encapsulation. The library end-user (i.e. the intern next door) is not aware of the "dedicated lock instance" stuff and might re-use whatever lockXYZ variable is available for his own dark purposes. Don't give him the chance to do that: mark the field as private.
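For example (the protected field names below are hypothetical), an XML documentation comment on the private lock field is enough for Intellisense to remind everyone what the lock protects:

public class Bar
{
    /// <summary>
    /// Protects all reads and writes of <c>pendingOrders</c> and <c>lastUpdate</c>
    /// (hypothetical fields). Never exposed outside this class.
    /// </summary>
    private readonly object fooLock = new object();
}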

TS level 2 : hungarian locks

Besides unprotected (concurrent) accesses, the second most important issue in multithreaded applications is deadlocks. Against deadlocks, there is only one known solution (to my knowledge): a complete and permanent ordering of the locks. In other words, locks should always be taken in the same order. You might think, "Ok, let's just add a line in the dedicated lock documentation to specify that." Well, I have tried; it just does not work. Over time, your application evolves, the intern next door doesn't care about your documentation, new locks are added to fit some other requirements, and the initial complete ordering is no more (although it should be, developers are just not-so-failproof humans). The problem with those deadlocks is that you have no way to detect them just by looking at a piece of code; you have to look at the whole application code to ensure consistency. Here comes the Hungarian rescue: the Hungarian dedicated lock naming convention.

The locks should be taken in the order specified by the Hungarian prefixes.

Let's look at a small example: if you have two dedicated lock instances called lock1Bar and lock2Foo (lockX being the Hungarian prefix), then you know that lock1Bar should be taken before lock2Foo. Any exception to this rule is a guaranteed-or-your-money-back deadlock. Additionally, refactoring tools make it so easy to rename all your variables, as many times as you need, that there is really no practical obstacle to implementing such a policy.
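A minimal sketch of the convention (class and member names are hypothetical): any code path that needs both locks acquires lock1Bar first and lock2Foo second, everywhere in the application.

public class Account
{
    // The Hungarian prefixes encode the global ordering: 1 before 2.
    private readonly object lock1Bar = new object();
    private readonly object lock2Foo = new object();

    public void UpdateBoth()
    {
        lock (lock1Bar)     // always taken first
        {
            lock (lock2Foo) // always taken second
            {
                // ... mutate the state protected by both locks ...
            }
        }
        // Taking lock2Foo before lock1Bar anywhere else in the code base would
        // break the ordering and open the door to a deadlock.
    }
}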
