Many to many relationships in LINQ-to-SQL

LINQ is a new component in .NET 3.5 that has a lot of .NET developers very excited. It provides a simple and flexible method of querying object-encapsulated data. The LINQ to SQL provider in particular has been getting a lot of attention since it can be used to query SQL Server databases using the same simple syntax that is used to query collections in memory - and furthermore can send updates made to the data back to the database. This feature, combined with very slick integration into Visual Studio 2008, provides a very strong motivation for developers to make the leap to .NET 3.5.

Unfortunatley, LINQ to SQL has a few limitations, the most obvious of which is the inability to directly handle the common representation of a many-to-many relationship. In this series, I look at the issue and some of the ways of circumventing it.

Let's look at this simple example from a simplified version of the Northwind database:

A simple schema representing a many-to-many relationship between Orders and Products

Since LINQ defines only one-to-one and one-to-many relationships, you end up with two classes in your object model representing the two participating domain objects (Order and Product), and a third representing the relationship between them. Often we'll want to write something like:

foreach (Product p in order.Products) { ...

- as we would with a one-to-many relationship - but, since there isn't a Products property of the Order object, we need to write:

foreach (Product p in order.Order_Details.Select(od => od.Product)) { ...

- which isn't as clear as we'd like it to be.

In this simple case, we are able to alter a partial implementation of the Order class (right click on it in the design view and select "View Code"). To this class declaration we can add the following code:

public IEnumerable Products
{
    get { return Order_Details.Select(od => od.Product); }
}

This is ideal for most purposes. We can now write foreach (Product p in order.Products) whenever we need to iterate through all the products in an order. We'll need to add code to the Products class as well if we need to query all the orders that include a certain product, but that's not too much hassle.

This solution is so simple that a lot of users are asking "Why doesn't LINQ do this already?". That's a good question. The use of a junction table containing nothing but two foreign keys (or sometimes two foreign keys plus a unique ID for the relationship) to represent a many-to-many relationship is practically ubiquitous. Why would a development tool as paradigm-shifting as LINQ omit this simple feature?

There's a simple answer: Not all many-to-many relationships have a junction table as simple as the one above. What if we need to track some property of the relationship other than the two participating entities?

A simple schema representing a many-to-many relationship between Orders and Products, including a property of the relationship - Quantity

As you can see, this example is a little less forgiving. We still want to be able to write foreach (Product p in order.Products), but what about the Quantity column of the Order_Detail table?

The problem is that the quantity of a product associated with an order is neither a property of the product, nor a property of the order. It's a property of the relationship between the product and the order. We could define a new type (possibly an anonymous type in C#) to contain all the fields of Product, plus all the relevant properties of the relationship (and another new type if we need to handle the relationship the other way around), but that doesn't really sound like a stellar plan. A Product is a Product, and having different types for dataContext.Products and order.Products isn't very productive.

In addition, having order.Products return an IEnumerator might let us perform our foreach, but we lose a number of other LINQ features - most notably the ability to handle inserts and deletes in the object model. What if we want to add a product to an order?

The relational model is very expressive, and it is important that our tools for interacting with it are no less expressive. By automating the simple case, we can see that we lose a great deal when the schema is not so simple.

That isn't to say that the simple solution I outlined above isn't useful. Being able to DataBind order.Products with a few lines of code is really handy - but you'll need to delve deeper to handle more complicated cases.

I'll be taking a more detailed look at ways to handle other situations later.

Very nice article

Iain - thanks for sharing your ideas - very thoughtful insight!
U've pointed out some very good moments here

Submitted by Roger (not verified) on 10 July, 2008 - 11:16.
Thanks for your overview

What I have found that works really well is to remove all the relationships from all the entities(tables). And then manage the relationships myself. This works very well. As I am working in a thoroughly disconnected environment, I almost prefer this to having LINQ keep track of the state of all my objects with the connections left open the entire time. I am really accustomed to opening the connection, taking care of business, and moving on with my bad self! If you would like I can show you an example of this...let me know.

Submitted by aoc (not verified) on 16 October, 2008 - 03:49.
Disconnected >.<

The issue occurs when some muckety-muck connects to your database from some source other than your app. Next thing you know, your "relational integrity" is now compromised because an update wasn't made through your app which would check all these things.

Submitted by Anonymous (not verified) on 12 November, 2008 - 14:29.
I have to agree

I'm not generally supportive of the old-school DBA with his massive library of sprocs, but you can't really take the disconnected approach if your database is going to be shared between apps. (One might add that you don't know if your database is going to be shared between apps, and thus you can't use the disconnected approach regardless.)

Some might argue that's the reason more heavyweight technologies like the Entities Framework exist.

My focus, however, is generally on rapid prototyping - delivering a working unit of software as quickly as possible. LINQ-to-SQL in particular has really been excellent for those purposes - I can set up my data model with a database designer, set up relationships, then once I've created my data context I can create mock data in my code wherever a control needs it. I can interact with my mock data and "real" data (from the db) in the same way. That really speeds up development.

If my data model changes during development (and believe me, it does), I just wipe my dbml file, pull the tables in again, and everything that ought to still work still works.

If I was taking the disconnected approach, every time the data model changed I'd have to rewrite my queries, my integrity rules, etc. All for the sake of not handling data integrity in the database?

Submitted by Iain on 27 November, 2008 - 13:32.

Post new comment

The content of this field is kept private and will not be shown publicly.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Allowed HTML tags: <a> <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd> <pre>
  • Lines and paragraphs break automatically.

More information about formatting options