Author

I am Joannes Vermorel, founder at Lokad. I am also an engineer from the Corps des Mines who initially graduated from the ENS.

I have been passionate about computer science, software matters and data mining for almost two decades.

Entries in patterns (11)

Monday, Sep 14, 2009

O/C mapper - object to cloud

When we started to port our forecasting technology to the cloud, we decided to create a new open source project called Lokad.Cloud that would isolate all the pieces of our cloud infrastructure that weren't specific to Lokad.

The project was initially subtitled Lokad.Cloud - .NET execution framework for Windows Azure, as its primary goal was to provide some cloud equivalent of the plain old Windows Services. We quickly ended up with QueueServices, which happen to be quite handy for designing horizontally scalable apps.

But more recently, the project has taken a new orientation, becoming more and more an O/C mapper (object to cloud) inspired by the terminology used by O/R mappers. When it comes to horizontal scaling, a key idea is that data and data processing cannot be considered in isolation anymore.

With classic client-server apps, persistence logic is not supposed to invade your core business logic. Yet, when your business logic happens to become so intensive that it must be distributed, you end up in a very cloudy situation where data and data processing become closely coupled in order to achieve horizontal scalability.

That being said, close coupling between data and data processing isn't doomed to be an ugly mess. We have found that obsessively object-oriented patterns applied to Blob Storage can make the code both elegant and readable.
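To give a flavor of the idea, here is a small hypothetical sketch of strongly-typed blob access; the IBlobStorage and ForecastBlobName names below are made up for the illustration and do not necessarily match the actual Lokad.Cloud API.

using System;

// Illustrative only: a strongly-typed view over blob storage.
public interface IBlobStorage
{
    void Put<T>(string containerName, string blobName, T item);
    bool TryGet<T>(string containerName, string blobName, out T item);
}

// A blob name that carries its own addressing logic, so that business code
// never manipulates raw strings.
public class ForecastBlobName
{
    public readonly Guid CustomerId;
    public readonly DateTime Month;

    public ForecastBlobName(Guid customerId, DateTime month)
    {
        CustomerId = customerId;
        Month = month;
    }

    public string ContainerName
    {
        get { return "forecasts"; }
    }

    public override string ToString()
    {
        return CustomerId + "/" + Month.ToString("yyyy-MM");
    }
}

// Usage (hypothetical): storage.Put(name.ContainerName, name.ToString(), forecast);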

Lokad.Cloud is entering its beta stage with the release of the 0.2.x series; check it out.

Tuesday, May 19, 2009

FIPFO - First In Probably First Out

The FIFO (First In First Out) is a very well known concept in computer science. In one of my previous posts, I used the word FIPFO, for First In Probably First Out, to refer to the cloud equivalent of the FIFO.

Indeed, the basic idea behind that term is that you can't scale pure FIFOs much due to synchronization constraints. Yet, if you just loosen the semantics a little bit, that is to say, FIPFO, then you have an infinitely scalable data structure.
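To make that contract a bit more concrete, here is a minimal C# sketch of what a FIPFO abstraction could look like; the IFipfoQueue interface and the FipfoWorker class below are made up for the example and are not taken from any particular API.

using System;

public interface IFipfoQueue<T>
{
    // Messages come out roughly in insertion order ("probably first out"),
    // and a given message may be delivered more than once.
    void Put(T message);
    bool TryGet(out T message); // false when the queue looks empty
    void Delete(T message);     // acknowledge successful processing
}

public class FipfoWorker<T>
{
    private readonly IFipfoQueue<T> _queue;
    private readonly Action<T> _process;

    public FipfoWorker(IFipfoQueue<T> queue, Action<T> process)
    {
        _queue = queue;
        _process = process;
    }

    public void Run()
    {
        T message;
        while (_queue.TryGet(out message))
        {
            // Since delivery is at-least-once and only approximately ordered,
            // the processing logic must be idempotent and order-insensitive.
            _process(message);
            _queue.Delete(message);
        }
    }
}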

Considering its simplicity and usefulness, I believe that FIPFO will be ubiquitous in future cloud computing applications.

Matthieu, a colleague at Lokad, was asking me if FIPFO was a well-known concept or yet another wacky term that I had just made up on my blog. Well, sorry, I can't really remember. FIPFO just seems a very appropriate way to describe data structures such as the Windows Azure Queue, but I am not sure, now, whether I read about it somewhere else.

According to Google, there is, at the time of writing, just a single other result for FIPFO, from another person who came up with the term a few days before my initial post while describing the Drupal API. Yet, I had never read the Drupal forums before, so I guess we separately ended up with the same idea and terminology.

Monday, Nov 19, 2007

Delete-proof data paging

In order to retrieve a large amount of data from a SQL table, you need to resort to a data paging scheme. Conceptually, a typical paged SQL query looks like the following (the syntax targets MS SQL Server 2005):

WITH NumberedFoo AS (
    SELECT Foo.Bar, ROW_NUMBER() OVER (ORDER BY Foo.Id) AS RowNum
    FROM Foo
)
SELECT Bar
FROM NumberedFoo
WHERE RowNum BETWEEN @Index AND @Index + @PageSize - 1
ORDER BY RowNum;

The queries are made iteratively until no rows get returned anymore. Yet, this approach fails if rows are added to or deleted from the table Foo during the iteration.

If rows are inserted, then the row numbers will be impacted: some rows will see their number incremented.

  • The newly inserted rows may be missed.

  • Certain rows are going to be retrieved twice.

Overall, the situation isn't too bad because, after all, every row that was present when the retrieval iteration started will be retrieved.

On the other hand, if rows are deleted, then some rows will get their row numbers decremented. As a consequence, certain rows will never get retrieved. This issue is quite troublesome, because you would not expect deleted rows to prevent the retrieval of other (valid) rows.

One workaround to this situation would be to add an IsDeleted flag to the table (instead of actually deleting rows, the flag would be changed from false to true) and to purge the database only once in a while. Yet, this solution has many drawbacks: the table schema must be changed, and all existing queries that target this table must add the extra condition IsDeleted = false.

A more subtle approach to the problem consists in changing the definition of the iterating index. Instead of using an index that corresponds to a generated row number, let us directly use @IndexId, the greatest identifier retrieved so far. Thus, the paged query becomes:

SELECT TOP (@PageSize) Foo.Bar
FROM Foo
WHERE Foo.Id > @IndexId
ORDER BY Foo.Id;

With this new approach, we are now certain that no rows will be skipped during the retrieval process. Plus, we haven't made any change to the table schema.
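For illustration, here is a rough C# sketch of the full retrieval loop with ADO.NET; the FooPager class is made up for the example, and the query also returns Foo.Id so that the cursor can advance.

using System;
using System.Data;
using System.Data.SqlClient;

public static class FooPager
{
    // Retrieves every row of Foo, page by page, without ever skipping a row
    // even if other rows get deleted while the iteration is running.
    public static void RetrieveAll(string connectionString, Action<DataRow> process)
    {
        const int pageSize = 1000;
        long lastId = 0; // greatest Foo.Id retrieved so far

        while (true)
        {
            DataTable page = new DataTable();

            using (SqlConnection conn = new SqlConnection(connectionString))
            using (SqlCommand command = new SqlCommand(
                "SELECT TOP (@PageSize) Foo.Id, Foo.Bar FROM Foo " +
                "WHERE Foo.Id > @IndexId ORDER BY Foo.Id", conn))
            {
                command.Parameters.AddWithValue("@PageSize", pageSize);
                command.Parameters.AddWithValue("@IndexId", lastId);
                conn.Open();

                using (SqlDataReader reader = command.ExecuteReader())
                {
                    page.Load(reader);
                }
            }

            if (page.Rows.Count == 0) break; // no rows left, we are done

            foreach (DataRow row in page.Rows)
            {
                process(row);
            }

            // Move the cursor to the last identifier of the current page.
            lastId = Convert.ToInt64(page.Rows[page.Rows.Count - 1]["Id"]);
        }
    }
}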

Tuesday, Jun 12, 2007

Few traps for former C++ developers coding in C#

I have been proofreading, on a couple of occasions, C# source code written by former C++ developers. Although C# and C++ have a lot in common at the syntax level, the conceptual frameworks underlying the two languages are really different. This post is a modest attempt to point out the most common issues.

C# is garbage collected. You should not need any destructor. If you are using destructors, then it's probably wrong 99% of the time. If you have doubts, ignore destructors; that's the way to go with garbage-collected languages.
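If what you actually need is deterministic resource cleanup (files, connections, etc.), the idiomatic C# tool is IDisposable combined with a using block, not a destructor. Here is a minimal sketch; the ReportWriter class is made up for the example.

using System;
using System.IO;

public class ReportWriter : IDisposable
{
    private readonly StreamWriter _writer;

    public ReportWriter(string path)
    {
        _writer = new StreamWriter(path);
    }

    public void WriteLine(string line)
    {
        _writer.WriteLine(line);
    }

    public void Dispose()
    {
        // Releases the underlying file handle deterministically;
        // the garbage collector takes care of the memory.
        _writer.Dispose();
    }
}

// Usage:
// using (ReportWriter report = new ReportWriter("report.txt"))
// {
//     report.WriteLine("hello");
// }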

Pre-processor directives are deprecated. Although C# does have directives like #if, #else and #endif, the pre-processor is de facto a deprecated construct in C#. Like destructors, there are some very specific cases where those directives can be used. Yet, 99% of the time, using them would just be a design mistake.
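For the common case of debug-only diagnostics, the [Conditional] attribute is usually a cleaner alternative to sprinkling #if blocks around call sites. Here is a minimal sketch; the Tracing class is made up for the example.

using System;
using System.Diagnostics;

public static class Tracing
{
    // Calls to this method are removed by the compiler when the DEBUG symbol
    // is not defined, without any #if DEBUG ... #endif at the call sites.
    [Conditional("DEBUG")]
    public static void Trace(string message)
    {
        Console.WriteLine(message);
    }
}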

C# comes with official guidelines. Microsoft has published very extensive guidelines about variable, method and class naming. Every single C# line you write is expected to comply with those guidelines.
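As a quick illustration of those guidelines (the classes below are made-up examples): PascalCase for types, methods and properties, camelCase for parameters and locals, and no Hungarian notation.

using System.Collections.Generic;

public class Customer
{
    private string _name;

    public string Name // property: PascalCase
    {
        get { return _name; }
        set { _name = value; }
    }
}

public class CustomerRepository // type: PascalCase
{
    private readonly List<Customer> _customers = new List<Customer>();

    public void Add(Customer customer) // parameter: camelCase
    {
        _customers.Add(customer);
    }

    public Customer FindByName(string customerName)
    {
        foreach (Customer candidate in _customers) // local: camelCase, no "strName"
        {
            if (candidate.Name == customerName)
                return candidate;
        }

        return null;
    }
}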

Friday, Sep 22, 2006

From RAD to test driven ASP.NET website

Both unit testing and R.A.D. (Rapid Application Development) have quite deeply impacted my views on software development. Yet, I have found that combining those two approaches within a single ASP.NET project is not that easy, especially if you want to keep the best of both worlds. There are at least א (aleph zero) methods to get the problem solved. Since my blog host does not yet provide that much disk storage, I will only describe here 2.5 design patterns that I have found useful while developing ASP.NET websites.

  • R.A.D.: all your 3 tiers (presentation, business logic and data access) get compacted into a single ASPX page. ASP.NET 2.0 makes this approach very practical thanks to a combination of GridView, DetailsView and SqlDataSource.

  • CRUD layer: the business logic and the data access are still kept together but removed from the ASPX page itself. This design pattern (detailed below) replaces the SqlDataSource with a combination of ObjectDataSource and library classes.

  • Full-blown 3-tier: like above, but the business logic and the data access get separated. Compared to the CRUD layer approach, you get extra classes reflecting the database objects.

R.A.D.


Using combinations of GridView, DetailsView and SqlDataSource, you can fit your 3 tiers into a single ASPX page, the business logic being implemented in code-behind whenever required. This approach does not enable unit testing, but if your project is fairly simple, then R.A.D. delivers dramatic productivity. For example, PeopleWords.com has been completely developed through R.A.D. (with a single method added to the Global.asax file that logs all thrown exceptions). I would maybe not defend R.A.D. for large/complex projects, but I do think it's very well suited for small projects and/or for drafting more complicated ones.

The forces behind R.A.D.:

  • (+) Super-fast feature delivery: a whole website can be designed very quickly.

  • (+) The code is very readable (not kidding!): having your 3 tiers in the same place makes the interactions quite simple to follow.

  • (+) Facilitates the trial & error web page design process: it's hard to get a web page very usable at once, and R.A.D. lets you re-organize a web page very easily.

  • (-) If the same SQL tables get reused between ASPX pages, then data access code tends to be replicated each time (idem for business logic).

  • (-) No simple way to perform unit testing.

  • (-) No structured way to document your code.

"CRUD layer" design pattern


The main purpose of the "CRUD layer" is to remove the business logic and the data access from the ASPX page and push them into an isolated .NET library. Yet, there are two major constraints associated with this process. The first constraint is to ensure an easy migration from R.A.D. to the "CRUD layer". Indeed, R.A.D. is usually the best prototyping solution; therefore, the main issue is not to implement from scratch, but to improve the quality of the R.A.D. draft implementation. The second constraint is to keep the business logic and data access design as plain as possible (following the YAGNI principle, among others). The CRUD layer is an attempt to address those two issues.

The CRUD layer consists in implementing a class like the following:

public class FooCrudLayer
{
    public FooCrudLayer() { ... } // an empty constructor must exist

    public DataTable GetAllBar( ... ) // SELECT method
    {
        DataTable table = new DataTable();

        using (SqlConnection conn = new SqlConnection( ... ))
        {
            conn.Open();
            SqlCommand command = new SqlCommand( ... , conn);

            using (SqlDataReader reader = command.ExecuteReader())
            {
                table.Load(reader);
            }
        }

        return table;
    }

    public void Insert( ... ) { ... } // INSERT method
    public void Delete( ... ) { ... } // DELETE method
    public void Update( ... ) { ... } // UPDATE method
}

Notice that the select method returns a DataTable, whereas most ObjectDataSource tutorials would return a strongly-typed collection such as List<Bar>. The presented approach has two benefits. First, the migration from the SqlDataSource is totally transparent (including the sorting and paging capabilities of the GridView):

<asp:ObjectDataSource ID="FooSource" runat="server"
    TypeName="FooCrudLayer"
    SelectMethod="GetAllBar"
    InsertMethod="Insert"
    DeleteMethod="Delete"
    UpdateMethod="Update" >

    ... (parameters definitions)

</asp:ObjectDataSource>

Second, database objects are not replicated in the .NET code. Indeed, replicating the database objects at the .NET level involves some design overhead, because both sides (DB & .NET) must be kept synchronized.

The forces behind the CRUD layer:

  • (+) Very small design overhead compared to R.A.D.

  • (+) Easy migration from R.A.D.

  • (+) Unit testable and documentable.

  • (-) No dedicated abstraction for business logic.

  • (-) Limited design scalability (because it's not always possible to avoid code replication).

Full-blown 3-tier


I won't say much about this last option (this post is already far too long), but basically, the 3-tier design involves strongly typing the database objects in order to isolate the business logic from the data access. This approach carries a large overhead compared to the original R.A.D. approach. Yet, if your business logic gets complex (for example, workflow-like processes), then there is no escape: the layers have to be separated.
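For the sake of completeness, here is a rough C# sketch of what that separation might look like; the Bar entity and the class names below are made up for the illustration.

using System;
using System.Collections.Generic;

// Strongly-typed object mirroring the database table.
public class Bar
{
    private int _id;
    private string _name;

    public int Id
    {
        get { return _id; }
        set { _id = value; }
    }

    public string Name
    {
        get { return _name; }
        set { _name = value; }
    }
}

// Data access tier: the only place that talks to the database.
public class BarDataAccess
{
    public List<Bar> GetAll() { /* SELECT ... */ return new List<Bar>(); }
    public void Save(Bar bar) { /* INSERT or UPDATE ... */ }
}

// Business logic tier: works on Bar objects, never on SQL.
public class BarService
{
    private readonly BarDataAccess _dataAccess = new BarDataAccess();

    public void Rename(Bar bar, string newName)
    {
        if (string.IsNullOrEmpty(newName))
            throw new ArgumentException("A bar must have a non-empty name.");

        bar.Name = newName;
        _dataAccess.Save(bar);
    }
}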