Back to Basics - Life After If, For and Switch - Like, a Data Structures Reminder
I just had a great one on one coding learning session with a good friend of mine over lunch. He's trying to take his coding skills to the "next level." Just as we plateau when we work out physically, I think we can plateau when coding and solving problems. That one of the reasons I try to read a lot of code to be a better developer and why I started The Weekly Source Code. He said it was OK to write about this as someone else might benefit from our discussion. The code and problem domain have been changed to protect the not-so-innocent.
One of the things that we talked about was that some programmers/coders/developers have just a few tools in their toolbox, namely, if, for, and switch.
I'm not making any judgements about junior devs vs. senior devs. I'm talking about idioms and "vocab." I think that using only if, for and switch is the Computer Programmer equivalent of using "like" in every sentence. Like, you know, like, he was all, like, whatever, and I was like, dude, and he was, like, ewh, and I was like meh, you know?
When speaking English, by no means do I have a William F Buckley, Jr.-sized vocabulary, nor do I believe in being sesquipedal for sesquipedalianism's sake, but there is a certain sparkly joyfulness in selecting the right word for the right situation. I'm consistently impressed when someone can take a wordy paragraph and condense it into a crisp sentence without losing any information.
Refactoring code often brings me the same shiny feeling. Here's a few basic things that my friend and I changed in his application over lunch, turning wordy paragraphs into crisp sentences. Certainly this isn't an exhaustive list of anything, but perhaps it can act as a reminder to myself and others to be mindful and think about solving problems beyond, like, if and for and, like, switch, y'know?
Massives Ifs are Sometimes Maps
He had some code that parsed a ridiculous XML document that came back from a Web Server. That the format of the XML was insane wasn't his fault, to be sure. We all have to parse crap sometimes. He had to check for the existence of a certain value and turn it into an Enum.
if (xmlNode.Attributes["someAttr"].Value.ToLower().IndexOf("fog") >= 0)
{
wt = MyEnum.Fog;
}
if (xmlNode.Attributes["someAttr"].Value.ToLower().IndexOf("haze") >= 0)
{
wt = MyEnum.Haze;
}
if (xmlNode.Attributes["someAttr"].Value.ToLower().IndexOf("dust") >= 0)
{
wt = MyEnum.Haze;
}
if (xmlNode.Attributes["someAttr"].Value.ToLower().IndexOf("rain") >= 0)
{
wt = MyEnum.Rainy;
}
...and this went on for 40+ values. There's a few problems with this.
First, he's using IndexOf() and ToLower() to when he's trying to say "ignoring case, does this string contain this other string?" Using ToLower() for a string comparison is always a bad idea, and not just because of the Turkish i problem (details here, here, here and here). Be sure to check out the Recommendations for Strings in .NET.
We could make this simpler and more generic with a helper method that we'll use later. It expresses what we want to do pretty well. If we were using .NET 3.5 we could make this an extension method, but he's on 2.0.
private static bool ContainsIgnoreCase(string s, string searchingFor)
{
return s.IndexOf(searchingFor, StringComparison.OrdinalIgnoreCase) >= 0;
}
Second, he's indexing into the Attributes collection over and over again, and he's hasn't "else cased" the other ifs, so every indexing into Attributes and every other operation runs every time. He can do the indexing once, pull it out, then check his values.
string ws = xmlNode.Attributes["someAttr"].Value;
if (ContainsIgnoreCase(ws, "cloud"))
wt = MyEnum.Cloudy;
else if (ContainsIgnoreCase(ws, "fog"))
wt = MyEnum.Fog;
else if (ContainsIgnoreCase(ws, "haze"))
wt = MyEnum.Haze;
else if (ContainsIgnoreCase(ws, "dust"))
wt = MyEnum.Dust;
else if (ContainsIgnoreCase(ws, "rain"))
wt = MyEnum.Rainy;
else if (ContainsIgnoreCase(ws, "shower"))
...and again, as a reminder, this goes on for dozens and dozens of lines.
We were talking this through step by step to explain my "from point a to point d" wait of thinking. I tend to skip b and c, so it's useful to show these incremental steps, kind of like showing all your work in Math class when doing long division.
At this point, I pointed out that he was clearly mapping the strings to the enums. Now, if the mapping was direct (and it's not for a variety of horrible many-to-one reasons that are spec-specific as well as that this a "contains" operations rather than a direct equality comparison) he could have parsed the string an enum like this where the last parameter ignores case:
wt = (MyEnum)Enum.Parse(typeof(MyEnum), ws, true);
However, his mapping has numerous exceptions and the XML is messy. Getting one step simpler, I suggested making a map. There's a lot of folks who use Hashtable all the time, as they have for years in .NET 1.1, but haven't realized how lovely Dictionary<TKey, TValue> is.
var stringToMyEnum = new Dictionary<string, MyEnum>()
{
{ "fog", MyEnum.Fog},
{ "haze", MyEnum.Fog},
{ "fred", MyEnum.Funky},
//and on and on
};
foreach (string key in stringToMyEnum.Keys)
{
if (ContainsIgnoreCase(ws, key))
{
wt = stringToMyEnum[key];
break;
}
}
Because his input data is gross, he spins through the Keys collection and calls ContainsIgnoreCase on each key until settling on the right Enum. I suggested he clean up his input data to avoid the for loop, turning the whole operation into a simple mapping. At this point, of course, the Dictionary can be shoved off into some global readonly static resource.
string ws = xmlNode.Attributes["someAttr"].Value;
//preproccessing over ws to tidy up
var stringToMyEnum = new Dictionary<string, MyEnum>()
{
{ "fog", MyEnum.Fog},
{ "haze", MyEnum.Fog},
{ "fred", MyEnum.Funky},
//and on and on
};
wt = stringToMyEnum[key];
When Switches Attack
Often large switches "just happen." What I mean is that one introduces a switch to deal with some uncomfortable and (apparently) unnatural mapping between two things and then it just gets out of hand. They tell themselves they'll come back and fix it soon, but by then it's grown into a hairball.
My buddy had a method that was supposed to remove an icon from his WinForms app. The intent is simple, but the implementation became another mapping between a fairly reasonable Enum that he couldn't control, and a number of named icons that he could.control.
The key here is that he could control the icons and he couldn't easily control the enum (someone else's code, etc). That the mapping was unnatural was an artifact of his design.
The next thing he knew he was embroiled in a switch statement without giving it much thought.
private void RemoveCurrentIcon()
{
switch (CurrentMyEnumIcon)
{
case MyEnum.Cloudy:
CustomIcon1 twc = (CustomIcon1)FindControl("iconCloudy");
if (twc != null)
{
twc.Visible = false;
this.Controls.Remove(twc);
}
break;
case MyEnum.Dust:
CustomIcon2 twd = (CustomIcon2)FindControl("iconHaze");
if (twd != null)
{
twd.Visible = false;
this.Controls.Remove(twd);
}
break;
//etc...for 30+ other case:
There's a few things wrong here, besides the switch is yucky. (Yes, I know there are uses for switches, just not here.) First, it's useful to remember when dealing with a lot of custom classes, in this case CustomIcon1 and CustomIcon2 that they likely have a common ancestor. In this case a common ancestor is Control.
Next, he can control the name of his controls. For all intents in his WinForms app, the Control's name is arbitrary. Because his controls map directly to his Enum, why not literally map them directly?
private void RemoveCurrentIcon(MyEnumIcon CurrentMyEnumIcon)
{
Control c = FindControl("icon" + CurrentMyEnumIcon);
if(c != null)
{
c.Visible = false;
Controls.Remove(c);
}
}
The recognition of the shared base class along with the natural Enum to Control name mapping turns 160 lines of switch into a single helper method.
It's very easy, seductive even, to try to explain to the computer to "do it like this" using if, for and switch. However, using even the basic declarative data structures to describe the shape of the data can help you avoid unnecessary uses of the classic procedural keywords if, for and switch.
FINAL NOTE: We worked for a while and turned and 8100 line long program into 3950 lines. I think we can squeeze another 1500-2000 lines out of it. We did the refactoring with the new Resharper 4.0 and a private daily build of CodeRush/Refactor!Pro. I'll do some writeups on both wonderful tools in the coming weeks.
About Scott
Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.
About Newsletter
i ended up with a big honking switch, just today that had a lot of:
case MyEnum.Cloudy: return "I think it will rain";
case MyEnum.Drizzle: return "there's some light drizzle already";
}
but maybe declaring a dictionary would've been more succinct.
wt = (from key in stringToMyEnum.Keys where ContainsIgnoreCase(ws, key) select stringToMyEnum[key]).SingleOrDefault();
I personally struggle getting away from the {if, for, switch } coding thought process. Even when I refactor code to get it the way you describe, my mind is still thinking "switch". How do you suggest getting past this block? Or do you think it's reasonable to code the switch and then refactor?
I liken this dilema to learning a new language (verbal or code I guess). I'm a fluent English speaker, but I do know some French. When I come up with a French sentence I'm constructing it from English... I'm not thinking in French. Then again, I'm not a fluent French speaker. Je ne parle pas francais. Do you think this analogy works?
Scott
I think it's only when you know there will be growth in a particular area like that is it worth it to implement this type of thing up front....that or you know that it will be much more difficult to change it to a more extensible method down the road (which is not often for this particular problem IMO).
James
var stringToMyEnum = new Dictionary<string, MyEnum>() { { "fog", MyEnum.Fog}, { "haze", MyEnum.Fog}, { "fred", MyEnum.Funky} };
I think the language analogy is appropiate. I have being studying Japanese on and off for a number of years and I still go through the laborious English to Japanese translation process but every once and a while I find myself thinking in japanese (though these moments are still few and far between). Coding is much the same. It is a language with which we do things. Sometimes our writing is long winded and other times we can code per haiku.
Bill
The urls were http://www.panopticoncentral.net/archive/2008/05/05/23212.aspx and http://www.panopticoncentral.net/archive/2008/05/14/23348.aspx
public enum GlobalProductPackageCodeand then used reflection to locate the appropriate enum value. This was far superior to writing a switch for the 50-odd entries for the PackageCode or even worse the several hundred (I think) of ways to measure things (Hertz, Ounce, Troy Ounce, Pound, etc).
{
[Description("6 Pack")] SixPack,
[Description("Waffle Pack")] WafflePack,
...and so on for about 50 more entries
}
In this case, I think the description makes much more sense than creating an artificial statically-coded lookup table, as it let's me now do two-way translation from description to enum and enum back to description in a clean way, and the two values are associated in the same physical location in code. I could have done some caching, ending up with the same type of Dictionary lookup to handle any performance concerns over reflection, but my needs weren't that demanding.
This was nothing new - there are several examples around the net of people doing this... The neat thing (for me, anyway) was that I originally starting writing separate functions for each of the enums I had, leading to the same issues you are describing... But before I got to #3, I realized this was ridiculous and I ended up writing a generic version so I could call:
string gptcDescription = GetGPTCDescription(...);(I'll leave the implementation up to the reader.)
GlobalProductPackageCode gptc = EnumHelper.ParseDescription<GlobalProductPackageCode>(gptcDescription);
-mdb
Do you have any recommendations on other resources for this audience? I'm reading blogs and working my way through the written classics on programming, but I still often find myself getting hung up on implementations (like a recent architecture problem of mine)
-Scott
Check out the book .NET Domain-Driven Design with C#: Problem - Design - Solution by Tim McCarthy (ISBN 0470147563). I found it very useful for questions about architecture. He walks the reader through thinking about an entire application and coding it, so the parts all fit together and you can see it all, top to bottom. The code is available for download, of course. Good book.
Oooh, that looks really enticing. Hope you get commission, as there's one coming my way.
Seriously though, thanks for the recommendation. That seems to be exactly what I was looking for.
-Scott
You can codify any idea with the "bread and butter" constructs, but there's certainly a tipping point at which it all becomes excessive and flat-out confusing.
I'm looking at 200~ lines of nested conditionals right now. It's repellent, but I understand how it came about. It's the path of least resistance for anyone to tack on "just one or two more conditionals." Like, it's not that hard, you know?
Looking forward to continued posts in this category!
For one default(T) where T is a enum might not be a defined value, that is Enum.IsDefined(typeof(T),default(T)) might be false.
So your code should be something like:
class Weather {
public static readonly Weather Fog = new Weather();
public static readonly Weather Haze = new Weather();
public static readonly Weather Dust = new Weather();
public static readonly Weather Rain = new Weather();
public static readonly Weather Shower = new Weather();
private static readonly Dictionary<string, Weather> _allWeather = new Dictionary.....
private Weather() {}
public static Weather Parse(string serialized) {
return _allWeather[serialized];
}
}
[Aside: Using a Map is just a new fangled way of saying table-driven logic from yesteryear. And surprisingly, if you're switching on ints, the C# compiler creates an MSIL switch() which is a fast table-lookup system (it rearranges the cases and inserts NOP cases so the int is an index into the table).]
At the end of the day though many times code quality is more a reflection of "time allocated" then developer skill. Its easy to fix code after the fact but many times shipping is quality :).
-Alex
"We were talking this through step by step to explain my "from point a to point d" wait of thinking. I tend to skip b and c, so it's useful to show these incremental steps, kind of like showing all your work in Math class when doing long division."
I think showing your work is so important when explaining solutions, and I seem to rarely see it online or in real life.
I find that the good to great developers go straight to step d when explaining a solution. I think that's because they see the problem so clearly, that the proceeding steps are trivial, if not invisible to them. That's not a bad thing, it's intelegent and efficient. But, to show all your work, whether in a blog or in a code review or pair programming session can be invaluable, not only to your audience/team mates, but to yourself as well (catch errors and assumptions, etc).
I'm very much a middle of the road journeyman in the bell curve distribution of developers, so sometimes I consume solutions from those ahead of me, and sometimes I explain things to those behind me. I find in both cases that the more "work shown" the better for all involved. I think it is kind of like teaching a man to fish vs just giving a man step D.
Just wanted to say I really appreciate the extra effort, I think it goes a long way. I hope more folks adopt a "show your work" attitude to posting and coding.
Thanks,
-Andy
Great post! Thanks for sharing.
1. If you see the same code over and over again, it's time to refactor.
2. Constantly studying the capabilities of your language (increasing your vocabulary) can only make you a better code writer (speaker). You learn to say things more succinctly and efficiently.
3. A comment about what the code is doing and why, written before actually coding, can sometimes help you write the code better. Separating the comment into several lines, might have helped your friend start off better (maybe at step c or d). For Example:
//Get attribute value from xml node.
//See if my string is in the value, ignoring case.
//If so, assign the return value to the corresponding enum value.
As soon as you write something like this, you might think to a) get the attribute value once instead of repeatedly, b) see if there's a comparison function which ignores the case, c) see if there's a data structure you can use to equate the two pieces of information you have so you can do a generic lookup. Although you might not end up with the most elegant of all possible solutions, you'll probably start from a better place.
Once you start by thinking at a conceptual level, you start thinking about writing code instead of just writing it. Not only that, but maybe when you're done, you'll have some documentation for the next guy who has to figure out what you were thinking.
I think one of the best ways to learn is to see how other people code. I would like to see more of the "Back to Basics" series.
Justin
The reason I don't use words like maps, is because some day we might not travel in code or in life, through space but may require some other concept of transition to get there. You could replace transition with "Transform" I suppose.
Great stuff!!
"but he's on 2.0."
Well, if you're going to use C# 3 syntax, note that you can still use C# 3 extension methods on .NET 2.0; just declare:
namespace System.Runtime.CompilerServices {
public class ExtensionAttribute : Attribute { }
}
Also - re the use of Dictionary<,> here - you're not actually using it as a Dictionary - just a list; you could just as easily use a List<T>/T[] for some T. For the data in question this isn't an issue of course. An anon-type (for T) would work well, but then it couldn't be stored in a static field - but maybe (euch) KeyValuePair<,> [not pretty...].
Getting a bit more esoteric - and stress I don't for a moment suggest that it is a good idea in this case - but if you *do* have a scenario where you want the power of anon-types, but available on a static field, then one option is captured variables - for example, the following works fine on .NET 2.0 (with System.Func<,> declared):
static Func<string, MyEnum?> stringToMyEnum;
static Program() { // static ctor
// substrings to match to enums
var lookup = new[] {
new {Key="fog", Value = MyEnum.Fog},
new {Key="haze", Value = MyEnum.Fog},
new {Key="fred", Value = MyEnum.Funky},
};
// create a delegate to lookup match from array
stringToMyEnum = ws => {
foreach (var pair in lookup) {
// declared (not shown) as extension method
if (ws.ContainsIgnoreCase(pair.Key)) {
return pair.Value;
}
}
return null; // default
};
}
Comments are closed.