Tuesday, February 14, 2006 


For a long time, some technology experts have regarded SQL as a disaster, or as a barrier that keeps software applications from getting the full benefit of relational algebra.
Problems such as redundant queries, nullable fields, and the lack of support for objects were widely discussed, with no solution proposed for them. Three manifestos were written to describe what a new query language would need in order to solve the problems of SQL, but they stopped at giving heuristics for solving the problem, without giving the solution itself.

Microsoft is working on a new API called LINQ, which stands for Language Integrated Query. It is a new extension to the .NET languages that is mainly responsible for interacting with data of all sorts, such as databases, XML data, arrays and collections, and registry data, and it allows for interaction and transformation between all of this data and objects in code.
LINQ itself can be regarded as a sign of the deficiency of SQL. Some technology experts see new query languages and data representations (such as XQuery, object databases, and XML) as a sign of the problems of SQL. In their view, these newcomers try to solve some of those problems, but the solutions they offer are useless. I think that LINQ is the best solution proposed so far.

LINQ offers a very good set of features, such as compiler support, compile-time type checking, interaction between structured data and objects, interaction between nullable and non-nullable types, and much more.
From what has been announced so far, it seems to me that LINQ solves some (not all) of the problems that were already known in SQL and XML and that nobody knew how to solve.

For example, this is an ordinary SQL SELECT statement:
SELECT CompanyName FROM Customers WHERE (City = 'London')

There's a problem with the flow of data in SQL. The FROM clause has to be evaluated before the SELECT list can be processed, even though SELECT is written first. In other words, the scope of data flows upwards.
In LINQ, the scope of data flows downwards, which is the natural direction. The above statement would be written like this in LINQ:
var q = from c in Customers where c.City == "London" select c.CompanyName;
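
To try the query without a database, here is a minimal sketch that runs it over an in-memory list. It assumes the C# 3.0 syntax as it was eventually released (the preview bits may differ), and the Customer class, the CompaniesIn helper, and the sample rows are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a row of the Customers table.
class Customer
{
    public string CompanyName { get; set; }
    public string City { get; set; }
}

static class LondonQueryDemo
{
    // Returns the company names of the customers located in the given city.
    public static List<string> CompaniesIn(IEnumerable<Customer> customers, string city)
    {
        // Same shape as the query in the post: the source comes first, the projection last.
        var q = from c in customers
                where c.City == city
                select c.CompanyName;
        return q.ToList();
    }

    static void Main()
    {
        // Sample data invented for this sketch.
        var customers = new List<Customer>
        {
            new Customer { CompanyName = "Around the Horn", City = "London" },
            new Customer { CompanyName = "Alfreds Futterkiste", City = "Berlin" },
            new Customer { CompanyName = "Seven Seas Imports", City = "London" },
        };

        foreach (var name in CompaniesIn(customers, "London"))
            Console.WriteLine(name);
    }
}
```

Because c is introduced by the from clause before it is used in where and select, the compiler and the editor already know its type when you reach those clauses.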

In the above code, we notice a new keyword, var. It is not a data type of its own: a variable declared with var is still statically typed, and the compiler infers its type from the value you assign to it.
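
A small sketch of how var behaves, assuming the released C# 3.0 rules for implicitly typed local variables. VarDemo and InferredTypeName are names invented for this example.

```csharp
using System;

static class VarDemo
{
    // Returns the runtime type name of a variable declared with var.
    public static string InferredTypeName()
    {
        var count = 5;          // the compiler infers int from the literal
        // count = "five";      // would not compile: count is an int for good
        return count.GetType().Name;
    }

    static void Main()
    {
        Console.WriteLine(InferredTypeName()); // prints "Int32"
    }
}
```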

LINQ also addresses some of the known problems of the W3C DOM, such as namespace support, memory consumption, and document centricity, through XLINQ, the part of LINQ responsible for interacting with XML.
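
As a rough illustration of the element-centric style, the sketch below builds a small XML tree and queries it without creating a document object first. It assumes the System.Xml.Linq API as it later shipped (XElement, XAttribute), which may differ from the preview. The element names, the sample data, and the XmlQueryDemo helpers are made up.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class XmlQueryDemo
{
    // Builds a small tree directly from elements; no enclosing document is required.
    public static XElement BuildCustomers()
    {
        return new XElement("customers",
            new XElement("customer",
                new XAttribute("city", "London"),
                new XElement("name", "Around the Horn")),
            new XElement("customer",
                new XAttribute("city", "Berlin"),
                new XElement("name", "Alfreds Futterkiste")));
    }

    // Returns the names of the customers whose city attribute matches.
    public static string[] NamesIn(XElement root, string city)
    {
        var q = from c in root.Elements("customer")
                where (string)c.Attribute("city") == city
                select (string)c.Element("name");
        return q.ToArray();
    }

    static void Main()
    {
        foreach (var name in NamesIn(BuildCustomers(), "London"))
            Console.WriteLine(name);
    }
}
```

The query has the same from/where/select shape as the one over Customers, which is the point: one syntax for objects and XML alike.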

LINQ is a major design change, or addition, to the .NET programming languages, a change oriented toward describing what we want done with the data instead of how to do it.

Now we come to the downside. When interacting with databases, these LINQ commands are translated into SQL queries to be executed on the database. I think this is a point of weakness, or at least a point where future development is possible. SQL has problems of its own on the database side, and these remain unsolved. One example is redundancy. SQL allows us to get a given result with more than one query, each having its own execution path. It is the responsibility of the database engine to look at the query and optimize it into a form that takes the shortest execution path. This introduces a new problem: the database engine has to use some sort of execution path caching to avoid working out an optimized execution path for every query it receives. Unfortunately, not all database engines succeed in doing this effectively.
In an amazing experiment, a technology expert wrote two queries that both return the same result and executed both of them on a database (he didn't mention the name of the DBMS). One of the queries ran in 2 seconds, and the other ran in 2500 seconds!

So the way you write SQL queries may affect the performance of your application, unless you are working with a powerful database. This is an example of a problem that LINQ doesn't address.

LINQ is due to be released with C# 3.0. Dan Fernandez has a post on his blog about LINQ that contains sample code. You can read it here.
Anders Hejlsberg has a video about LINQ here.