Best Practices for Linq Enumerables and Queryables

LINQ expressions and their associated extension methods have greatly improved developer productivity ever since their introduction in .NET 3.5. Unfortunately, like any code abstraction, it can hide execution details that can come back to haunt you in performance problems and odd behavior.

I haven’t seen this information anywhere else, so I’m sharing my best practices for using LINQ and the C# language extensions.

(UPDATE: BTW, feel free to disagree with this, as long as you are thinking about your code. Maybe this should be called “Things to Think About with Linq Enumerables, etc.”, but it’s more fun to have an opinion.)

If you aren’t familiar with LINQ in C#, check out the classic 101 LINQ Samples in C#.

Also, click here for a great article on LINQ and Counting.

1. Use “Fluent C#” rather than the language extensions.

This is more of a personal preference. You can use the C# language extensions to write something that looks like a query. It’s usually pretty readable and reminds you of SQL.

var waCustomerIDs =
    from c in customers
    where c.Region == "WA"
    select c.ID;

The language extensions actually compile down to extension methods, and the actual code is more like this:

var waCustomerIDs = customers
    .Where (c => c.Region == "WA")
    .Select (c => c.ID);

Notice how this is a chain of method calls with lots of embedded anonymous lambda expressions. If you use the language extensions, the syntactic sugar masks the underlying flavor of the code. In other words, you are hiding the implementation from the next person to read the code.

I believe that obvious code is better than pretty code (or clever code), because it allows the reader to better understand what is going on in the code, which helps in debugging and performance evaluation.

2. “Seal” LINQ chains as soon as they have been built (if they are going to be reused).

One of the beauties of LINQ is that the IEnumerable and IQueryable interfaces are composable. You can build chains of filters and maps and evaluate them after the chain is built. In fact, that is what the extension methods actually do:

var allCustomers = customers;
var waCustomers = allCustomers.Where (c => c.Region == "WA");
var waCustomerIDs = waCustomers.Select (c => c.ID);

The danger here is that if you use your chain in more than one place, the filter logic will get evaluated more than once. That means your source IEnumerable may be scanned multiple times, or in the case of an IQueryable, you may hit your database more than once. This can cause unexpected performance problems.

var waCustomers = customers.Where (c => c.Region == "WA");

// this will scan the entire list of customers twice
var waCustomerIDs = waCustomers.Select (c = c.ID);
var waCustomerNames = waCustomers.Select (c = c.Name);

The solution to this is to “seal” your chain by converting it ToList as soon as you finish building the chain. (UPDATE: When I say “built”, I mean when you have added all of the filtering, sorting and other stuff to the chain. If you plan on adding more to the chain, you might want to defer execution until then. It depends on your situation.)

Then the data is resident in memory as a new filtered, sorted, etc. List. Since List is also an Enumerable, you can continue to build the chain with additional filters. You can think of ToList as a checkpoint to tell the system to evaluate the chain into a new chain in memory.

var waCustomers = customers.Where (c => c.Region == "WA")
   .ToList();

// NOW waCustomers is a List of customers that has already been filtered

// this will only scan the list of WA customers twice
var waCustomerIDs = waCustomers.Select (c = c.ID);
var waCustomerNames = waCustomers.Select (c = c.Name);

UPDATE: note that if you aren’t going to reuse your LINQ chains, then converting it to a list is probably a waste. Also, if your list is going to have a lot of data, you might not want to pull everything into memory at once. In that case, you should really use IEnumerable, but you should be extra careful to make sure you aren’t exposing any unintended behavior.

var waCustomers = customers.Where (c => c.Region == "WA");

// no need to close up this chain, because you're only using it once and you aren't returning it
foreach (var customer in waCustomers)
    Console.WriteLine(customer.Name);

3. Only return “open” chains if you want to allow for composition.

A corollary of #2 is that your methods (particularly public API methods) should never return an “open” (“unsealed”) LINQ chain. If you return an open chain, you have no idea how the caller is going to use the chain, and your code may get called in unexpected ways at unexpected times.

public IEnumerable<Customer> GetCustomersForState(string state)
{
     return customerIDs = customers
        .Where (c => c.Region == state);
}

// this will scan the entire list of customers twice
// AND the caller won't know it
var waCustomers = GetCustomersForState("WA");
var waCustomerIDs = waCustomers.Select (c = c.ID);
var waCustomerNames = waCustomers.Select (c = c.Name);

The exception to this rule is when you want to allow for composition. Returning an open IQueryable may allow the caller to add additional filtering or sorting to your query, and then evaluate the query against a data source like SQL Server. The query is then built in the client and the logic is sent to the server for evaluation.

public IQueryable<Customer> GetCustomersForState(string state)
{
     return customerIDs = customers
        .Where (c => c.Region == state);
}

var waCustomers = GetCustomersForState("WA");
// waCustomers has not been evaluated and is still an IQueryable

var waPrimeMembers = waCustomers.Where(c => c.Status == "Prime");
// waPrimeMembers has not been evaluated and is still an IQueryable

var results = waPrimeMembers.ToList();
// NOW we hit the data source and filter on Region AND Status at the same time.

4. Make sure the semantics for an “open” chain return are clear

I would say you should always seal your IEnumerables before returning them. Sometimes you can leave IQueryable unsealed when returning them, but make sure that your API semantics and documentation make it clear about the behavior of the returned object.

5. If you are returning an open chain, make sure the inner code is free of side-effects

One of the dangers of LINQ is that the filter lambdas are actually anonymous functions that are called back during evaluation. If you don’t seal your chain, then you aren’t in control of when the lambdas get executed. This means your lambda will be executed in an unknown context.

public IEnumerable<Customer> GetCustomersForState(string state)
{
     return customerIDs = customers
        .Where (c => /* THIS WILL GET CALLED AT AN UNEXPECTED TIME */);
}

So make sure that your lambda expressions don’t have any side-effects. At one point I had to debug a crazy bug where:

  • An unsealed IEnumerable was returned from a WCF method.
  • WCF unwound the stack and reverted the security principal back to the service.
  • WCF attempted to serialize the IEnumerable to the return message.
  • The IEnumerable chain was evaluated and one of the filter lambda expressions was called.
  • The lambda expression evaluated a property on one of the objects.
  • The object’s property get performed an security access check.
  • Since the security context was now the service and not the caller, the check failed.

This was solved simply by sealing the result with a ToList before returning.

So be careful when returning open chains. Better yet, seal everything before you let anyone else see it.

BTW – these best practices also apply to the Linq.js javascript library.

7 thoughts on “Best Practices for Linq Enumerables and Queryables”

  1. Maybe be explicit and take advantage of strong typing where you can – e.g. methods with sealed, in-memory semantics always have return type of array so they are necessarily sealed. Returning IEnumerable for deferred execution could be made more explicit by convention – maybe suffixing the method name with “Deferred” (similarly to the convention of suffixing Task-returning methods with “Async”). Returning IQueryable already pretty explicitly claims “deferred”, but it couldn’t hurt to suffix these methods as well. What do you think?

  2. There is some interesting discussion on this over at http://news.ycombinator.com/item?id=4427605

    For clarity – many of the code snippets above are contrived to illustrate some points and probably don’t look like your production code (at least I hope not).

    Also – the point of the best practices is for you to understand the code that you are writing and the impact small syntax changes have on the actual execution of the code. I wouldn’t take any of the points above as an absolute law.

    For example (as in the HN comments):
    - no, you wouldn’t convert your enumerable to a list just to do a modification on the list (i’ll update the article to make that clear)
    - yes, sometimes you want to defer execution of your query as long as possible (particularly with LINQ-to-SQL), but at other times it buys you nothing
    - yes, in a lot of cases the language extensions are significantly more readable, but they do hide the implementation details (again more of a personal preference for code reviews)

  3. A coworker sent me this article today. Pretty good summation of the potential pitfalls of using IEnumerables build via ‘yield return’.

    I think the single biggest thing to take away from this article is that if you are doing things in a chained LINQ statement that depend on having a certain context when executing, i.e., elevation to a service account, having a connection to a database, etc., you have to make sure that your chain is sealed before the point at which that context is no longer available.

    A classic example of this is elevation:

    IEnumerable files;
    using( Elevation e = new Elevation(ServiceAccount))
    {
    files = AccessPrivateResourcesAndReturnAnUnsealedChain();
    }

    foreach( Thing file in files )
    {
    // Reading the files actually happens here instead of
    // in the Elevation block.
    //
    // FAIL.
    }

    Coalescing that IEnumerable or “sealing the chain” inside that Elevation block would prevent this failure.

  4. Regarding returning “open” IEnumerables: I’ve found some LINQ implementations(NHibernate for example) will implicitly cast IQueryable to IEnumerable and force the IQueryable to enumerate before returning. Just another thing to keep an eye on!

Leave a Reply