Udostępnij za pośrednictwem


Continuing to an outer loop

When you have a nested loop, sometimes you want to “continue” the outer loop, not the inner loop. For example, here we have a sequence of criteria and a sequence of items, and we wish to determine if there is any item which matches every criterion:

match = null;
foreach(var item in items)
{
foreach(var criterion in criteria)
{
if (!criterion.IsMetBy(item))
{
// No point in checking anything further; this is not
// a match. We want to “continue” the outer loop. How?
}
}
match = item;
break;
}

There are many ways to achieve this. Here are a few, in order of increasing awesomeness:

Technique #1 (ugly): When you have a loop, the “continue” statement is essentially a “goto” which branches to the bottom of the loop, just before it does the loop condition test and pops back up to the top of the loop. (Apparently when you spell “goto” as “continue”, the “all gotos are all evil all the time” crowd doesn’t bother you as much.) You can of course simply make this explicit:

match = null;
foreach(var item in items)
{
foreach(var criterion in criteria)
{
if (!criterion.IsMetBy(item))
{
goto OUTERCONTINUE;
}
}
match = item;
break;
OUTERCONTINUE:
;
}

Technique #2 (better): When I see a nested loop, I almost always consider refactoring the interior loop into a method of its own.

match = null;
foreach(var item in items)
{
if (MeetsAllCriteria(item, critiera))
{
match = item;
break;
}
}

where the body of MeetsAllCriteria is obviously

foreach(var criterion in criteria)
if (!criterion.IsMetBy(item))
return false;
return true;

Technique #3 (awesome): The “mechanism” code here is emphasizing how the code works and hiding the meaning of the code. The purpose of the code is to answer the question “what is the first item, if any, that meets all the criteria?” If that’s the meaning of the code, then write code that reads more like that meaning:

var matches = from item in items
where criteria.All(
criterion=>criterion.IsMetBy(item))
select item;
match = matches.FirstOrDefault();

That is, search the items for items that meet every criterion, and take the first of them, if there is one. The code now reads like what it is supposed to be implementing, and of course, if there are no loops in the first place then there’s no need to “continue” at all!

The biggest thing I love about LINQ is that it enables a whole new way to think about problem solving in C#. It’s gotten to the point now that whenever I write a loop, I stop and think about two things before I actually type “for”:

* Have I described anywhere what this code is doing semantically, or does the code emphasize more what mechanisms I am using at the expense of obscuring the semantics?

* Is there any way to characterize the solution to my problem as the result of a query against a data source, rather than as the result of a set of steps to be followed procedurally?

Comments

  • Anonymous
    January 10, 2010
    [OK, I understand that solving the initial question is not the point of the article ;) ] Do you have problems with the following code, that has been written, tried and run by generations of programmers : match = null; foreach(var item in items) {  foreach(var criterion in criteria)  {    HasMatch=true;    if (!criterion.IsMetBy(item))    {      // No point in checking anything further; this is not      // a match. We want to “continue” the outer loop. How?     HasMatch=false;     break;    }  }  if(HasMatch) {     match = item;     break;  } } BTW, does the compiler optimizes out the building of the list of matches that you don't care about ?

  • Anonymous
    January 10, 2010
    I love the concepts of LINQ and find it very useful for doing many the small tasks involved with datasets in a very quick and easy to understand manner.  I really like that you can read a function and know what it is trying to accomplish rather than worrying about how... most of the time. My personal favorites are test grading queries: var numberCorrect = test.Questions.Count(question => question.SelectedAnswer == question.CorrectAnswer); I see you use the from/where/select syntax a lot, do you prefer it over the functional syntax?

  • Anonymous
    January 10, 2010
    This post brings to mind some performance considerations I came across while comparing LINQ to nested foreach loops, usually involving joining two lists. To get to the root of it, nested foreach statements seemed to perform better on smaller lists than LINQ; it seemed there would always be a few milliseconds consumed by LINQ, I suppose in some overhead. It was as the lists got progressively larger that LINQ shined; the loops would require significantly more time (geometrically or exponentially, I'm unsure of the appropriate term), whereas the LINQ results would only consume a minimal additional amount of time. On the one hand, I'm not sure if performance differences on smaller lists would make me consider giving up the syntax of LINQ, because it really isn't all that much time and I'm not really making performance-critical applications. On the other hand, if I'm having to join large lists together (where LINQ really shines), perhaps I need to reconsider exactly how I'm dealing with data in my code. I should point out here that I have not actually been able to replicate my earlier findings on my machine here at work. At home, the performance differences were very consistent. Here, it's a bit hit-or-miss. I can only say there are differences between the two machines and the software used, but my home machine is actually the more powerful of the two. It's... odd.  We shipped some performance improvements (I think in a service pack, though I don't recall the details) to the way that where-followed-by-select works; that might account for some of the difference. -- Eric

  • Anonymous
    January 11, 2010
    I never understood the hate for goto. I would argue that the first method is as good as the second and definitely better than the one suggested by Mauvaisours. Of course the third method is the best but if goto is so bad then why is it in C# (and how is it worse than Java's break label)? Note that the C# goto was carefully designed to be less hazardous than other languages' gotos. Labels have similar scoping rules as locals, which means that you can "goto" within a block, or go from an inner block to an outer block, but you cannot branch in to the middle of a try block, a loop, and so on. That makes it easier to analyze the preconditions of a given hunk of code. -- Eric

  • Anonymous
    January 11, 2010
    Welcome, algorithms guy, to the specification pattern.

  • Anonymous
    January 11, 2010
    I've started to consider 'foreach' to be a code smell, unless it's in a library class. Once you've got the extensions to IEnumerable<T>, and added a relatively small set of extra methods, 'foreach' becomes unnecessary in 'proper' code. That sounds odd, but for the most part pulling out a looping method makes most code easier to read.

  • Anonymous
    January 11, 2010
    Steve Cooper: I hope you're not suggesting that we should be using .Foreach() extension methods to perform side effects. I mean, you can suggest that foreach is unnecessary, but it's "bad" to use functional notation to produce non-functional output.

  • Anonymous
    January 11, 2010
    "BTW, does the compiler optimizes out the building of the list of matches that you don't care about ?" No, but since LINQ uses lazy evaluation as much as possible, the list of matches you don't care about isn't actually built at all anyway. (So the compiler doesn't optimize it, LINQ does.)

  • Anonymous
    January 11, 2010
    For Technique #3 one can also use the IEnumerable.Any method which does a boolean check to see if the IEnumerable is empty or not.  That alleviates checking for null, and also makes it easier to work with "value types" (where the default value is non-null). Etan Bukiet

  • Anonymous
    January 11, 2010
    Other languages support labelled continue's. Why doesn't C#? For example, OuterLoop: for ...  {  // for, foreach, while, etc    for ... {  // Inner loop        if (...)            continue OuterLoop;    } }

  • Anonymous
    January 11, 2010

  • Make entire look and condition maching a new function
  • Avoid extra complexity for such a simple task non recursive task (avoid Linq) Keep it as simple as possible so that the code does not fail the 'Monday morning without sleep the night before' understandability test.  This makes the next guy inheriting the code not waste time and money trying to understand using 2 lines of Linq to replace 4 lines of regular code.
  • Anonymous
    January 11, 2010
    The LINQ implementation is definitely concise, and to those who've taken the time to learn LINQ it would certainly be a faster read. How does the performance compare for traditional looping approaches and the concise LINQ implementation?

  • Anonymous
    January 11, 2010
    Anything wrong with: item myItem = items.Where(criterion=>criterion.IsMetBy(item)).Select(item).FirstOrDefault(); ?? This isn't right. The lambda parameter "criterion" in your example is not of type Criterion, it's of type Item. Remember, the query is "find the first item that meets all the criteria." -- Eric   I use this method often when I am looking for a single record... Dave

  • Anonymous
    January 11, 2010
    The comment has been removed

  • Anonymous
    January 11, 2010
    While the LINQ solution is great for readability and with foreach doesn't applies, if you switch the inner loop to a 'for', you don't have to use break/continue...  criteriaValid=true;
     for (int index=0; index < criteria.Count && criteriaValid; index++)
     {
       if (!criteria[index].IsMetBy(item))
       {
         criteriaValid = false;
       }
     } You can even then act accordingly on the outher loop using 'criteriaValid'. Just my 2 cents

  • Anonymous
    January 11, 2010
    It appears the LINQ code is requesting a list of matches, and then choosing one.  That result may be different than the first found match.   Is the “matches” list ever realized?  A big, or slow, list of items will perform badly If IsMetBy() has side effects, can wee see the whole list of items being tested? If IsMetBy() has no side effects, can the compiler “see” the optimization?

  • Anonymous
    January 11, 2010
    For me, the interesting part of this post is the use of the All method as basically a short-cut AND for a list of predicates.  I never thought of using the All function for that purpose and had created my own predicate concatenation methods.  Now I realize that All can be used for AND and Any can be used for OR. Thanks again, Eric, for a thought provoking post!

  • Anonymous
    January 11, 2010
    Inspired by this article, I just replaced a thorny nested foreach loop with a query in my project. You rock! Quick question: I don't think there's any way to remove this loop, but I could be wrong: IEnumerable <Node> EnumerateAncestors (Node node) {    for ( Node n = node; n != null; n = n.Parent () )        yield return n; }

  • Anonymous
    January 11, 2010
    I would far and away prefer technique 2.   Technique 3 may be easier understand in terms of why it gets the right answer, but it's terrible in terms of how it gets to that answer.  Get an object representing all matches and then take just the first one?  Terribly inefficient! Yes, I know that LINQ is lazy, and the LINQ code in technique 3 isn't significantly slower than looping, but it looks as if it would perform poorly, and that's a problem.  This is my fundamental problem with LINQ.  We developers are bad enough about writing efficient code -- LINQ just makes it that much harder to recognize efficient code when we see it. All coding is working with abstractions. How does your argument not also apply to, say, local variables in C? Used to be before C that you got to decide which locals were enregistered and which were not; now the C compiler does it for you and you cannot recognize what is efficient use of registers and what is not. Would you rather write everything in assembler? -- Eric

  • Anonymous
    January 11, 2010
    I have to disagree with Alan: Clarity and robustness are usually far more important than micromanaging efficiently. And if you must micromanage efficiency, we have C and C++ which force you to do so. No one ever chose a language such as C# because garbage collection is more efficient at managing memory than direct memory management. Just by choosing C#, you've decided to trust (or be indifferent to) the language/environment developers to do the grunt work acceptably well. Once you trust the environment to do memory management, you might as well trust it to do queries. In more than 30 years of application development, I don't think I've ever seen a substantial performance problem caused by sub-optimal micro-management. Every one of the substantial performance problems has been caused by architectural or design failures. And I've been persuaded that while LINQ by no means makes it impossible to create a bad architecture or design, it does make it easier overall to implement a good architecture and design, and makes those elements much clearer.

  • Anonymous
    January 11, 2010
    Larry H.: Your EnumerateAncestors is an unfold operation; using Rx: return EnumerableEx.Generate<Node, Node>(node, n => n != null, n => n, n => n.Parent()); (not sure if type inference would work on that)

  • Anonymous
    January 11, 2010
    The comment has been removed

  • Anonymous
    January 12, 2010
    @Alan/@DRBlaise - I think that LINQ will (over the long run) absolutely result in more efficient and scalable code than traditional looping/iteration constructs. Here's why. As Moore's law morphs from "more cpu cycles / unit-time" to "more cores per chip" - the importance and value of a query abstraction layer like LINQ will increase. The ability to take a query and parallelize it when written in procedural, a-follows-b form is incredibly hard. It generally requires human understanding of the unstated conditions and constraints - a machine transformation is extremely difficult, if not impossible. By contract, a semantic query is much easier to parallelize with something like PLINQ: var matches = from item in items.AsParallel()              where criteria.All(                criterion=>criterion.IsMetBy(item))              select item; match = matches.FirstOrDefault(); Just as .NET already makes it possible to optimize your code at runtime (via JIT) for a particular computing instruction set (x86, x65, etc) - I imagine that with things like PLINQ and TPL it will soon be able to optimize it for a particular multi-processing architecture (# of cpu cores). In fact I would be surprised if there aren't significant efforts going on now to figure out ways to make it even easier to solve such problems more transparently (i.e. without manually injecting the AsParallel() call). I think the improvements in rapid development, maintainability, and comprehension that LINQ adds to C#/.NET development will be paying dividends to those developers that understand it and apply it in the right contexts.

  • Anonymous
    January 12, 2010
    For the concise crowd: var match = items.FirstOrDefault(item => item.All(criterion => criterion.IsMetBy(item)));

  • Anonymous
    January 12, 2010
    Let me try that again: var match = items.FirstOrDefault(item => criteria.All(criterion => criterion.IsMetBy(item)));

  • Anonymous
    January 12, 2010
    The comment has been removed

  • Anonymous
    January 12, 2010
    Some interesting observations there Anthony P. Could you clarify your comment below? "Introduce a nested loop where I was going through that same list of 80000 integers and comparing it with another list of 66,667 integers (or whatever the number). All integers that were equal between the two lists were added to a third. Incidentally, there were 53,333 matches. It tooks the nested loops 33,319 milliseconds to arrive at that result. LINQ did it in 53." Is 53 meant to be 53 milliseconds (3 orders of magnitude faster?) or 53,000 milliseconds (approaching half the speed?).

  • Anonymous
    January 12, 2010
    This is the one and only time I've been jealous of PHP developers.  As I remember, in PHP the syntax is simply 'continue 2' (or 3,45 etc...).  Now THAT's a nifty language feature.

  • Anonymous
    January 12, 2010
    Sebastian: You're not precisely replacing a foreach with a query per se, but quite interesting nonetheless. Thanks!

  • Anonymous
    January 12, 2010
    @Trees, 53 milliseconds. Blazingly fast when compared to nested for loops in the admittedly contrived scenario of linking thousands of ints to other thousands of ints.

  • Anonymous
    January 13, 2010
    The comment has been removed

  • Anonymous
    January 13, 2010
    The comment has been removed

  • Anonymous
    January 13, 2010
    The comment has been removed

  • Anonymous
    January 13, 2010
    Well, yes, it was stated that it was a nested loop producing a third list, and a LINQ join version producing the third list. I do not believe I left any of that out of my original description? But you're right, if LINQ actually is going about it in a different way than the nested loops between lists, then that explains the difference but also contradicts what Eric said "Behind the scenes, it's doing all the same loops as the original program, it's just providing a more convenient syntax." I guess the truth is it's doing the same loops as an entirely more efficient version of the original might have done had I implemented such a feature!

  • Anonymous
    January 13, 2010
    I personally like option 2 and the option as proposed by Mauvaisours (at the start of the comments) more than option 3. Yes, option 3 makes it clear what you want and not how you want to get it, but precisely because of that does it hide that it's going to do a lot of work when executed. What you are doing is an O(|items|*|criteria|) operation. Having clear loops makes it easy to recognise you'll be busy executing that loop for a while if |items| and/or |criteria| is very large. The LINQ hides that to some degree. It's just like you expect the 'get' part of a property to return fast, while a GetSpecificStuff() tells you that this might take a while to return.

  • Anonymous
    January 13, 2010
    The comment has been removed

  • Anonymous
    January 13, 2010
    The comment has been removed

  • Anonymous
    February 26, 2010
    Wouldn't the easiest solution be to loop over criteria outside and items inside?

  • Anonymous
    November 04, 2011
    The comment has been removed