Back to Basics - Life After If, For and Switch - Like, a Data Structures Reminder

June 13, 2008 Comment on this post [42] Posted in Back to Basics | Learning .NET | Programming | Source Code

Massives Ifs are Sometimes Maps

He had some code that parsed a ridiculous XML document that came back from a Web Server. That the format of the XML was insane wasn't his fault, to be sure. We all have to parse crap sometimes. He had to check for the existence of a certain value and turn it into an Enum.

if (xmlNode.Attributes["someAttr"].Value.ToLower().IndexOf("fog") >= 0)
{
    wt = MyEnum.Fog;
}
if (xmlNode.Attributes["someAttr"].Value.ToLower().IndexOf("haze") >= 0)
{
    wt = MyEnum.Haze;
}
if (xmlNode.Attributes["someAttr"].Value.ToLower().IndexOf("dust") >= 0)
{
    wt = MyEnum.Haze;
}
if (xmlNode.Attributes["someAttr"].Value.ToLower().IndexOf("rain") >= 0)
{
    wt = MyEnum.Rainy;
}

...and this went on for 40+ values. There's a few problems with this.

First, he's using IndexOf() and ToLower() to when he's trying to say "ignoring case, does this string contain this other string?" Using ToLower() for a string comparison is always a bad idea, and not just because of the Turkish i problem (details here, here, here and here). Be sure to check out the Recommendations for Strings in .NET.

We could make this simpler and more generic with a helper method that we'll use later. It expresses what we want to do pretty well. If we were using .NET 3.5 we could make this an extension method, but he's on 2.0.

private static bool ContainsIgnoreCase(string s, string searchingFor)
{
    return s.IndexOf(searchingFor, StringComparison.OrdinalIgnoreCase) >= 0;
}

Second, he's indexing into the Attributes collection over and over again, and he's hasn't "else cased" the other ifs, so every indexing into Attributes and every other operation runs every time. He can do the indexing once, pull it out, then check his values.

string ws = xmlNode.Attributes["someAttr"].Value;
if (ContainsIgnoreCase(ws, "cloud"))
    wt = MyEnum.Cloudy;
else if (ContainsIgnoreCase(ws, "fog"))
    wt = MyEnum.Fog;
else if (ContainsIgnoreCase(ws, "haze"))
    wt = MyEnum.Haze;
else if (ContainsIgnoreCase(ws, "dust"))
    wt = MyEnum.Dust;
else if (ContainsIgnoreCase(ws, "rain"))
    wt = MyEnum.Rainy;
else if (ContainsIgnoreCase(ws, "shower"))

...and again, as a reminder, this goes on for dozens and dozens of lines.

We were talking this through step by step to explain my "from point a to point d" wait of thinking. I tend to skip b and c, so it's useful to show these incremental steps, kind of like showing all your work in Math class when doing long division.

At this point, I pointed out that he was clearly mapping the strings to the enums. Now, if the mapping was direct (and it's not for a variety of horrible many-to-one reasons that are spec-specific as well as that this a "contains" operations rather than a direct equality comparison) he could have parsed the string an enum like this where the last parameter ignores case:

wt = (MyEnum)Enum.Parse(typeof(MyEnum), ws, true);

However, his mapping has numerous exceptions and the XML is messy. Getting one step simpler, I suggested making a map. There's a lot of folks who use Hashtable all the time, as they have for years in .NET 1.1, but haven't realized how lovely Dictionary<TKey, TValue> is.

var stringToMyEnum = new Dictionary<string, MyEnum>()
              {
                   { "fog", MyEnum.Fog},
                   { "haze", MyEnum.Fog},
                   { "fred", MyEnum.Funky},
                    //and on and on
              };

foreach (string key in stringToMyEnum.Keys)
{
    if (ContainsIgnoreCase(ws, key))
    {
        wt = stringToMyEnum[key];
        break;
    }
}

Because his input data is gross, he spins through the Keys collection and calls ContainsIgnoreCase on each key until settling on the right Enum. I suggested he clean up his input data to avoid the for loop, turning the whole operation into a simple mapping. At this point, of course, the Dictionary can be shoved off into some global readonly static resource.

string ws = xmlNode.Attributes["someAttr"].Value;
//preproccessing over ws to tidy up
var stringToMyEnum = new Dictionary<string, MyEnum>()
              {
                   { "fog", MyEnum.Fog},
                   { "haze", MyEnum.Fog},
                   { "fred", MyEnum.Funky},
                    //and on and on
              };
wt = stringToMyEnum[key];

When Switches Attack

Often large switches "just happen." What I mean is that one introduces a switch to deal with some uncomfortable and (apparently) unnatural mapping between two things and then it just gets out of hand. They tell themselves they'll come back and fix it soon, but by then it's grown into a hairball.

My buddy had a method that was supposed to remove an icon from his WinForms app. The intent is simple, but the implementation became another mapping between a fairly reasonable Enum that he couldn't control, and a number of named icons that he could.control.

The key here is that he could control the icons and he couldn't easily control the enum (someone else's code, etc). That the mapping was unnatural was an artifact of his design.

The next thing he knew he was embroiled in a switch statement without giving it much thought.

private void RemoveCurrentIcon()
{ 
    switch (CurrentMyEnumIcon)
    {
        case MyEnum.Cloudy:
            CustomIcon1 twc = (CustomIcon1)FindControl("iconCloudy");
            if (twc != null)
            {
                twc.Visible = false;
                this.Controls.Remove(twc);
            }
            break;
        case MyEnum.Dust:
            CustomIcon2 twd = (CustomIcon2)FindControl("iconHaze");
            if (twd != null)
            {
                twd.Visible = false;
                this.Controls.Remove(twd);
            }
            break;
            //etc...for 30+ other case:

There's a few things wrong here, besides the switch is yucky. (Yes, I know there are uses for switches, just not here.) First, it's useful to remember when dealing with a lot of custom classes, in this case CustomIcon1 and CustomIcon2 that they likely have a common ancestor. In this case a common ancestor is Control.

Next, he can control the name of his controls. For all intents in his WinForms app, the Control's name is arbitrary. Because his controls map directly to his Enum, why not literally map them directly?

private void RemoveCurrentIcon(MyEnumIcon CurrentMyEnumIcon)
{
    Control c = FindControl("icon" + CurrentMyEnumIcon);
    if(c != null)
    {
        c.Visible = false;
        Controls.Remove(c);
    }
}

The recognition of the shared base class along with the natural Enum to Control name mapping turns 160 lines of switch into a single helper method.

It's very easy, seductive even, to try to explain to the computer to "do it like this" using if, for and switch. However, using even the basic declarative data structures to describe the shape of the data can help you avoid unnecessary uses of the classic procedural keywords if, for and switch.

FINAL NOTE: We worked for a while and turned and 8100 line long program into 3950 lines. I think we can squeeze another 1500-2000 lines out of it. We did the refactoring with the new Resharper 4.0 and a private daily build of CodeRush/Refactor!Pro. I'll do some writeups on both wonderful tools in the coming weeks.

Technorati Tags: Refactoring,CodeRush,Resharper,BackToBasics

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.

About Newsletter

Hosting By

Hosted on Linux using .NET in an Azure App Service

Comment on this post [42]

Share on BlueSky or use the Permalink and post anywhere!

June 13, 2008 14:58

Great post. You should do a series of these.

Shawn

June 13, 2008 15:03

Thanks! I think I will. I've put this into the "back to basics" category.

Scott Hanselman

June 13, 2008 15:38

This is a great topic for a series... anything to increase the general level of programmer competency is a good thing. Another way to look at your first example is moving from a "code" implementation to a "data" implementation. It's generally easier to update data to add new types or change mappings than it is to change the code that does the same thing.

http://technorati.com/people/technorati/JaredReisinger

June 13, 2008 15:40

nice.

i ended up with a big honking switch, just today that had a lot of:

case MyEnum.Cloudy: return "I think it will rain";
case MyEnum.Drizzle: return "there's some light drizzle already";
}

but maybe declaring a dictionary would've been more succinct.

secretGeek.

June 13, 2008 15:52

good read scott... it is awesome to read about advanced developers taking a little time to teach the new guys some tricks/basics

NotMyself

June 13, 2008 17:26

Using a mapping perhaps ought to suggest using a functional construct as well, e.g. this does the work of the for loop:


            wt = (from key in stringToMyEnum.Keys where ContainsIgnoreCase(ws, key) select stringToMyEnum[key]).SingleOrDefault();

막스 Max C

June 13, 2008 18:27

Great article. Thanks Scott.

Leo

June 13, 2008 19:20

Great article... these concepts are frustrating for me because I know of them, know how to implement them, know when to use them. The crux is that I only see this solution after I've implemented the naive way.

I personally struggle getting away from the {if, for, switch } coding thought process. Even when I refactor code to get it the way you describe, my mind is still thinking "switch". How do you suggest getting past this block? Or do you think it's reasonable to code the switch and then refactor?

I liken this dilema to learning a new language (verbal or code I guess). I'm a fluent English speaker, but I do know some French. When I come up with a French sentence I'm constructing it from English... I'm not thinking in French. Then again, I'm not a fluent French speaker. Je ne parle pas francais. Do you think this analogy works?

Scott

Scott Muc

June 13, 2008 19:31

It's all about maintainability, right? If the first version of your code only has three conditions, then mapping it to a custom data structure...or even just a dictionary pointing to delegates is going to be overkill IMHO. It's sort of like premature refactoring.

I think it's only when you know there will be growth in a particular area like that is it worth it to implement this type of thing up front....that or you know that it will be much more difficult to change it to a more extensible method down the road (which is not often for this particular problem IMO).

James

jbrechtel

June 13, 2008 20:30

Does VB9 provide equivalent dictionary initializer syntax to:

var stringToMyEnum = new Dictionary<string, MyEnum>() { { "fog", MyEnum.Fog}, { "haze", MyEnum.Fog}, { "fred", MyEnum.Funky} };

Nick

June 13, 2008 21:20

In response to Scott Muc's question about whether it is reasonable to code the switch and then refactor it, I would have to say yes. Sometimes you need to get the idea down before you can see what it is. After that though you need to be able to bring an editor's eye to what you have written to see if it can be better expressed. This takes practise and a willingness to be critical of your own work but it is also how we learn.

I think the language analogy is appropiate. I have being studying Japanese on and off for a number of years and I still go through the laborious English to Japanese translation process but every once and a while I find myself thinking in japanese (though these moments are still few and far between). Coding is much the same. It is a language with which we do things. Sometimes our writing is long winded and other times we can code per haiku.

Bill

http://wtullochgoesblog.blogspot.com/

June 13, 2008 21:23

I totally agree. Sometimes you just need to get the rough draft out there before it is forgotten. The key is to refactor early and often. Waiting until later accumulates technical debt.

Scott Hanselman

June 14, 2008 1:07

I often find a liberal sprinkling of TODO comments helps me to remember that I maybe should come back and refactor something.

http://icr.ac.webfusion.co.uk/

June 14, 2008 3:57

Great post. I view this mind set as a habit or "design heuristic" that is not discussed enough when it comes to developing clean code. In some respects these habits are easier to acquire than GoF patterns as this thought process you described can be carried out on a daily basis. Sort of like warm up exercises. As a developer who is mentoring a team of budding programmers, I think this approach should be a prep before moving on to the GoF patterns.

David Robbins

June 14, 2008 4:34

Nick

...Paul Vick was talking about this sort of thing for a future version of VB and

Rory

June 14, 2008 4:35

Ok I thought that might not work ... the preview indicated that I somehow misunderstood the url syntax....

The urls were http://www.panopticoncentral.net/archive/2008/05/05/23212.aspx and http://www.panopticoncentral.net/archive/2008/05/14/23348.aspx

Rory

June 14, 2008 6:19

There's a lot that could be said about attributes here, too... Last night I was faced with the problem of having to convert strings to Enums... the problem was, the strings that I was receiving were RosettaNet enumerations, and they they conveniently chose to define their values as nothing that resembles computer language tokens (things such as "6 Pack" and "Waffle Pack" - and even these are some of the tamer entries). So I ended up writing my enums like this:

public enum GlobalProductPackageCode
{
    [Description("6 Pack")] SixPack,
    [Description("Waffle Pack")] WafflePack,
    ...and so on for about 50 more entries
}

and then used reflection to locate the appropriate enum value. This was far superior to writing a switch for the 50-odd entries for the PackageCode or even worse the several hundred (I think) of ways to measure things (Hertz, Ounce, Troy Ounce, Pound, etc).

In this case, I think the description makes much more sense than creating an artificial statically-coded lookup table, as it let's me now do two-way translation from description to enum and enum back to description in a clean way, and the two values are associated in the same physical location in code. I could have done some caching, ending up with the same type of Dictionary lookup to handle any performance concerns over reflection, but my needs weren't that demanding.

This was nothing new - there are several examples around the net of people doing this... The neat thing (for me, anyway) was that I originally starting writing separate functions for each of the enums I had, leading to the same issues you are describing... But before I got to #3, I realized this was ridiculous and I ended up writing a generic version so I could call:

string gptcDescription = GetGPTCDescription(...);
GlobalProductPackageCode gptc = EnumHelper.ParseDescription<GlobalProductPackageCode>(gptcDescription);

(I'll leave the implementation up to the reader.)

-mdb

Michael Bray

June 14, 2008 9:17

Great post. I would like to see more like these. You are using code rush and resharper on the same machine at the same time? Does this work well for you?

Mike Lasseter

June 14, 2008 9:34

Great post! I'm working my way up from junior dev myself, so it's great to see something more geared to my level when much of the .NET blogging community often writes on topics that are a bit beyond my pay grade. (Not to say they aren't great or interesting, however)

Do you have any recommendations on other resources for this audience? I'm reading blogs and working my way through the written classics on programming, but I still often find myself getting hung up on implementations (like a recent architecture problem of mine)
-Scott

Scott Parker

June 14, 2008 11:46

Great stuff. I'd never noticed the StringComparison flags added in .NET 2.0. I got rid of a bunch of ToLower's and got a 25% performance boost!

Mike Stockdale

June 14, 2008 13:51

Great idea on doing the "Back to Basics" series. We've been talking about how to keep our .NET user group useful for both newbies and veterans. We might start doing something like this briefly at the beginning of each meeting.

Michael Paladino

June 14, 2008 17:18

@Scott Parker:

Check out the book .NET Domain-Driven Design with C#: Problem - Design - Solution by Tim McCarthy (ISBN 0470147563). I found it very useful for questions about architecture. He walks the reader through thinking about an entire application and coding it, so the parts all fit together and you can see it all, top to bottom. The code is available for download, of course. Good book.

Vikram

June 15, 2008 14:17

Wouldn't a trie or even Aho-Corrasick work best for the first example? It would quickly search all the keywords in the same time and just assign to wt the enum attached to the last node in the chain.

Siderite

June 15, 2008 16:45

nice article Scott.Hope to see more of it :)

http://panchi2131.myopenid.com/

June 15, 2008 18:09

@Vikram:

Oooh, that looks really enticing. Hope you get commission, as there's one coming my way.

Seriously though, thanks for the recommendation. That seems to be exactly what I was looking for.
-Scott

Scott Parker

June 15, 2008 18:27

Brilliant post. I particularly approve of the comparison to using "like" in every sentence. Perfect.

You can codify any idea with the "bread and butter" constructs, but there's certainly a tipping point at which it all becomes excessive and flat-out confusing.

I'm looking at 200~ lines of nested conditionals right now. It's repellent, but I understand how it came about. It's the path of least resistance for anyone to tack on "just one or two more conditionals." Like, it's not that hard, you know?

Looking forward to continued posts in this category!

Chuck Rector

June 15, 2008 23:48

I had a similiar discusion with at colleague of mine only a few days ago which let to a discusion on parsing enums. We realized that some of it might not be so intuitive

For one default(T) where T is a enum might not be a defined value, that is Enum.IsDefined(typeof(T),default(T)) might be false.

RuneFS

June 16, 2008 0:12

.Net Enums are a code smell - you probably shouldn't be using them (at least until a future version of C# implements the OO Enums from Java).

So your code should be something like:


class Weather {
  public static readonly Weather Fog = new Weather();
  public static readonly Weather Haze = new Weather();
  public static readonly Weather Dust = new Weather();
  public static readonly Weather Rain = new Weather();
  public static readonly Weather Shower = new Weather();

  private static readonly Dictionary<string, Weather> _allWeather = new Dictionary.....

  private Weather() {}

  public static Weather Parse(string serialized) {
    return _allWeather[serialized];
  }
}

[Aside: Using a Map is just a new fangled way of saying table-driven logic from yesteryear. And surprisingly, if you're switching on ints, the C# compiler creates an MSIL switch() which is a fast table-lookup system (it rearranges the cases and inserts NOP cases so the int is an index into the table).]

RichB

June 16, 2008 5:50

I'd just like to add my voice to those requesting a series of these - for some reason I found this much more interesting and readable than the The Weekly Source Code. Looking forward to the next one...

Jon

June 16, 2008 5:58

I really apprecaite what this post is trying to say. I cannot count how many times I have looked at code that someone else wrote and found ways to take it from 300 lines in a function to 1/3rd that just by adding a helper function, or rethinking something into a more generic format.

At the end of the day though many times code quality is more a reflection of "time allocated" then developer skill. Its easy to fix code after the fact but many times shipping is quality :).

Dmitry

June 16, 2008 6:02

This is the perfect article for me to pass to my team of fellow developers. Our code base has had soooo many developers over the years and some of the most awful switch statement abuse seen by mankind. Thanks for posting this article as I'm sure reading it will make us better.

Michael Reynolds

June 16, 2008 6:17

I assume that it's just an artifact of example code, but it should be mentioned that building the dictionary on every call to that method is a serious performance hit. It should be built once at startup, and just referenced after that.

James Curran

June 16, 2008 7:40

Having been out of college for about 2 and a half years now, I've taken to reading programming blogs to "fill in the blanks" of my education- It's articles like this that help me add perspective to the way I'm programming, especially since CS courses tend to grade by pumping your application through a test script and diffing the output. If it met test cases, 100%! Nobody ever gave a lecture on how to avoid writing giant if or switch blocks. Which is to say, thanks for a solid post. They do wonders for my self-education process:D

-Alex

callingshotgun

June 16, 2008 9:42

Awesome article, Scott. I've fallen into this kind of trap before and was just starting to find ways around it. You've probably hastened my escape from a poor practice greatly.

Owen Pellegrin

June 16, 2008 10:05

Hi Scott. A total side note here, but I just wanted to highlight a line that I think is a gem in this article:

"We were talking this through step by step to explain my "from point a to point d" wait of thinking. I tend to skip b and c, so it's useful to show these incremental steps, kind of like showing all your work in Math class when doing long division."

I think showing your work is so important when explaining solutions, and I seem to rarely see it online or in real life.

I find that the good to great developers go straight to step d when explaining a solution. I think that's because they see the problem so clearly, that the proceeding steps are trivial, if not invisible to them. That's not a bad thing, it's intelegent and efficient. But, to show all your work, whether in a blog or in a code review or pair programming session can be invaluable, not only to your audience/team mates, but to yourself as well (catch errors and assumptions, etc).

I'm very much a middle of the road journeyman in the bell curve distribution of developers, so sometimes I consume solutions from those ahead of me, and sometimes I explain things to those behind me. I find in both cases that the more "work shown" the better for all involved. I think it is kind of like teaching a man to fish vs just giving a man step D.

Just wanted to say I really appreciate the extra effort, I think it goes a long way. I hope more folks adopt a "show your work" attitude to posting and coding.

Thanks,
-Andy

Andy

June 16, 2008 10:24

This is INCREDIBLE! I am blown away at my ignorance. I have love the usage of the dictionary object here. I am going to use this!! What an efficient refactor.

Great post! Thanks for sharing.

Mike

June 17, 2008 7:09

Nice post that started a good discussion. It's a great reminder of several things:
1. If you see the same code over and over again, it's time to refactor.
2. Constantly studying the capabilities of your language (increasing your vocabulary) can only make you a better code writer (speaker). You learn to say things more succinctly and efficiently.
3. A comment about what the code is doing and why, written before actually coding, can sometimes help you write the code better. Separating the comment into several lines, might have helped your friend start off better (maybe at step c or d). For Example:


//Get attribute value from xml node.
//See if my string is in the value, ignoring case.
//If so, assign the return value to the corresponding enum value.

As soon as you write something like this, you might think to a) get the attribute value once instead of repeatedly, b) see if there's a comparison function which ignores the case, c) see if there's a data structure you can use to equate the two pieces of information you have so you can do a generic lookup. Although you might not end up with the most elegant of all possible solutions, you'll probably start from a better place.

Once you start by thinking at a conceptual level, you start thinking about writing code instead of just writing it. Not only that, but maybe when you're done, you'll have some documentation for the next guy who has to figure out what you were thinking.

Bill

June 17, 2008 11:02

Great post Scott-

I think one of the best ways to learn is to see how other people code. I would like to see more of the "Back to Basics" series.

Justin

Justin Kohnen

June 18, 2008 23:52

Scott, neat trick to shrink the lines of code. I can tell you this though. I feel sorry for the people who end up maintaining the code left in your wake, must take them hours on end to figure out all your cute tricks! Something has to be said for just being straight forward for the sake of the mere mortals that follow behind you ;-)

Dave

June 19, 2008 10:40

I like this article, in my mind this feels more like transition statements from an equilibrium rest point (A) to a destination equilibrium point (B) where the transistion from the two points is defined as a function that can facilitate the jump.

The reason I don't use words like maps, is because some day we might not travel in code or in life, through space but may require some other concept of transition to get there. You could replace transition with "Transform" I suppose.

Great stuff!!

John Edwards

June 23, 2008 2:10

Good article; some thoughts:

"but he's on 2.0."

Well, if you're going to use C# 3 syntax, note that you can still use C# 3 extension methods on .NET 2.0; just declare:


namespace System.Runtime.CompilerServices {
    public class ExtensionAttribute : Attribute { }
}

Also - re the use of Dictionary<,> here - you're not actually using it as a Dictionary - just a list; you could just as easily use a List<T>/T[] for some T. For the data in question this isn't an issue of course. An anon-type (for T) would work well, but then it couldn't be stored in a static field - but maybe (euch) KeyValuePair<,> [not pretty...].

Getting a bit more esoteric - and stress I don't for a moment suggest that it is a good idea in this case - but if you *do* have a scenario where you want the power of anon-types, but available on a static field, then one option is captured variables - for example, the following works fine on .NET 2.0 (with System.Func<,> declared):


    static Func<string, MyEnum?> stringToMyEnum;
    static Program() { // static ctor
        // substrings to match to enums
        var lookup = new[] {
            new {Key="fog", Value = MyEnum.Fog},
            new {Key="haze", Value = MyEnum.Fog},
            new {Key="fred", Value = MyEnum.Funky},
        };
        // create a delegate to lookup match from array
        stringToMyEnum = ws => {
            foreach (var pair in lookup) {
                // declared (not shown) as extension method
                if (ws.ContainsIgnoreCase(pair.Key)) {
                    return pair.Value;
                }
            }
            return null; // default
        };
    }

Marc Gravell

June 23, 2008 9:57

Good points and good discussion. Your if then else refactoring changes the logic though. Consider the case of someAttr="fog rain". The original logic would return MyEnum.Rainy but after your if then else refactorings would return MyEnum.Fog. Maybe the developer wanted these falling through? This is why good unit tests are so important.

Evan

Comments are closed.