Back to Basics - Trust Nothing as User Input Comes from All Over
There was an interesting bug recently that was initially blamed on Bing. Basically someone searched for something, clicked the first result and got a YSOD (Yellow Screen of Death.)
They were searching Bing.com for this term:
"Eugene Myers's O(ND) Diff algorithm"
When they clicked on a link that looked like a good result, they got a scary YSOD like this:
Server Error in '/' Application.
'/t:tracking/t:referrer[@url='http://www.bing.com/search?q=eugene myers's o(nd) diff algorithm&form=qblh']' has an invalid token.
Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.
Exception Details: System.Xml.XPath.XPathException: '/t:tracking/t:referrer[@url='http://www.bing.com/search?q=eugene myers's o(nd) diff algorithm&form=qblh']' has an invalid token.
Source Error:
Stack Trace:
[XPathException: '/t:tracking/t:referrer[@url='http://www.bing.com/search?q=eugene myers's o(nd) diff algorithm&form=qblh']' has an invalid token.]
MS.Internal.Xml.XPath.XPathParser.ParseStep(AstNode qyInput) +539
...snip...
Eek! That is scary. Because the user clicked a link on Bing and the next thing they got was an error, they figured it was Bing that caused it. Well, indirectly. What went wrong here?
The target site the user was visiting is tracking their visitors, as many sites do and should. When you visit a site from another, HTTP includes a header called "Referer" (yes, it's actually misspelled in the spec, and is misspelled in reality. Welcome to the Web.)
Since they were visiting from here:
http://www.bing.com/search?q=eugene myers's o(nd) diff algorithm&form=qblh
...then that was the referrer. However, the trouble happened when the program took the HTTP Referer header blindly and used it directly as input to build up an XPath expression.
It appears that this website is storing its tracking details in an XML file, and the programmer is trying to do a lookup on the referrer so he/she can increment a visit.
Notice that they've used single quotes around the string, but the original search included an additional quote in the string "Eugene Myers's." The resulting concatenated XPath isn't valid XPath, and the system fails.
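To make the failure concrete, here is a minimal sketch of what the concatenation does to the string. It is not the site's actual code (which we can't see); `naive_xpath` is a hypothetical stand-in for whatever the tracking code does, and the slicing below only imitates how a parser would read the quoted literal.

```python
def naive_xpath(referrer: str) -> str:
    # Hypothetical reconstruction: the referrer is pasted straight
    # between two single quotes.
    return "/t:tracking/t:referrer[@url='" + referrer + "']"


referrer = ("http://www.bing.com/search?q=eugene myers's "
            "o(nd) diff algorithm&form=qblh")
expr = naive_xpath(referrer)

# A parser reads the string literal from the first single quote up to the
# NEXT single quote, which is the apostrophe inside "myers's".
start = expr.index("'") + 1
end = expr.index("'", start)
literal_seen_by_parser = expr[start:end]
leftover = expr[end + 1:]

print(literal_seen_by_parser)  # the literal ends early, at "myers"
print(leftover)                # the rest is no longer valid XPath
```

Everything after that early closing quote is junk as far as the parser is concerned, which is where "has an invalid token" comes from.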
Just in case you care, the same problem happens to this poor site when searching from Google:
http://www.google.com/search?q=Eugene+Myers's+O(ND)+Diff+algorithm
Yields:
Server Error in '/' Application.
'/t:tracking/t:referrer[@url='http://www.google.com/search?q=eugene myers's o(nd) diff algorithm']' has an invalid token.
Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.
Exception Details: System.Xml.XPath.XPathException: '/t:tracking/t:referrer[@url='http://www.google.com/search?q=eugene myers's o(nd) diff algorithm']' has an invalid token.
What's the Back to Basics lesson? Well, there are a few:
- Trust no user input.
- Input comes from many locations.
- There's explicit input like Form POSTs, but also implicit input like HTTP Referers and Cookies.
- "Injection" attacks aren't just about SQL Injection.
- You can inject things into XPath and Regular expressions just as easily and possibly bring down or hang sites, as well as potentially expose private information.
- Any time you take a string from input of any kind and concatenate it into any language, you're giving bad people the opportunity to be bad.
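One way to act on that last point is to never put the input into a query language at all and compare it as plain data instead. The sketch below assumes a tracking file shaped roughly like the XPath in the error message suggests; the namespace URI, the `visits` attribute, and the `record_visit` helper are all made up for illustration.

```python
import xml.etree.ElementTree as ET

NS = "urn:example:tracking"  # hypothetical namespace URI

XML = f"""<t:tracking xmlns:t="{NS}">
  <t:referrer url="http://example.com/" visits="3"/>
</t:tracking>"""


def record_visit(root, referrer):
    """Increment the visit count for a referrer, adding it if it is new.

    The referrer is only ever compared as a string value, so quotes,
    brackets, and other odd characters cannot change the lookup.
    """
    tag = f"{{{NS}}}referrer"
    for node in root.iter(tag):
        if node.get("url") == referrer:
            node.set("visits", str(int(node.get("visits", "0")) + 1))
            return node
    return ET.SubElement(root, tag, {"url": referrer, "visits": "1"})


root = ET.fromstring(XML)
nasty = ("http://www.bing.com/search?q=eugene myers's "
         "o(nd) diff algorithm&form=qblh")
first = record_visit(root, nasty)
second = record_visit(root, nasty)
known = record_visit(root, "http://example.com/")
```

The trade-off is a linear scan over the referrer nodes, which is fine for a small tracking file and less fine for a large one.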
Interesting (and obscure) stuff!
About Scott
Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.
And here comes a quote from Douglas Adams:
The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at or repair.
On my current project my paranoia of exactly such a thing happening has produced some strange user input validation. So I am really contemplating where the golden middle is.
Oh yes, I forgot, this is a Webforms app we're talking about. God forbid we should have to do some actual work. ;)
By the way -- exposing the stack trace like this is a far worse security risk than having your XPath parser choke.
Good reminder, however, that user input is not to be trusted and that you should always parse and validate any input from any source with exceptional paranoia; not just to prevent hackers, but in an attempt to catch as many of these little types of gotchas as possible.
I always enjoy conversations with developers, QA, and product managers that explore the edge cases with questions like "What happens if we give it all spaces, high-order characters, all quotes, or just a whole bunch of junk?" It seems to me that many "edge cases" occur all too commonly to be ignored as often as they are.
<compilation debug="true"/>
is on within an internet facing application.
Also wonder why they never thought of installing ELMAH.
Who knows, maybe they will get some feedback from a user with Scott's blog post and fix their mistakes.
J.Ja
People think that determining (say) the DB to connect to based on hostname is a fine idea. What they're forgetting is that the hostname is supplied by the client too.
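A minimal sketch of the safer version of that idea: treat the Host header as untrusted and only accept values from a fixed allowlist. The hostnames, database names, and `database_for_host` helper here are hypothetical.

```python
# Hypothetical allowlist mapping known hostnames to database names.
ALLOWED_HOSTS = {
    "www.example.com": "db_main",
    "staging.example.com": "db_staging",
}


def database_for_host(host_header):
    """Return the database name for a known host, or None otherwise.

    The header value is never used to build a connection string; it is
    only used as a lookup key after stripping the port and lowercasing.
    """
    host = host_header.split(":", 1)[0].strip().lower()
    return ALLOWED_HOSTS.get(host)


ok = database_for_host("WWW.Example.com:8080")
bad = database_for_host("evil.example.net")
weird = database_for_host("www.example.com'; DROP TABLE users;--")
```

Anything that isn't on the list comes back as `None`, and the caller can reject the request instead of guessing.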
"SELECT something FROM table WHERE column = '" + userInput.Replace("'", "''") + "'"
Sure, we can use XLinq, but since there's no IQueryable support, it seems that the code could end up traversing the entire XML DOM to find the matching node. (I may be wrong on this - please feel free to correct me if I am!)
Microsoft did start to write an XQuery implementation for .NET, but the site (xqueryservices.com) has been down since 2003. According to the XML Team blog, they were considering Linq to XQuery for a future release, but that seems to have dropped off the radar - the blog makes no mention of it after February 2007.
IMHO, we either need IQueryable support in XLinq, so that a query for a node gets compiled to a proper XPath query with robust literal escaping, or we need an XPath equivalent to the SQL command and parameter constructs.
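For what it's worth, here is a rough sketch of what "parameters for XPath" could look like if you had to roll it yourself. XPath 1.0 string literals have no escape character, so a value containing both quote styles has to be assembled with `concat()`. The `xpath_literal` and `bind` helpers and the `$name` placeholder convention are my own invention for this example, not part of any library.

```python
import re


def xpath_literal(value: str) -> str:
    """Quote a string as an XPath 1.0 literal, using concat() if needed."""
    if "'" not in value:
        return "'" + value + "'"
    if '"' not in value:
        return '"' + value + '"'
    # Both quote styles present: split on single quotes and glue the
    # pieces back together with a double-quoted single quote in between.
    pieces = []
    for i, part in enumerate(value.split("'")):
        if i:
            pieces.append('"\'"')
        if part:
            pieces.append("'" + part + "'")
    return "concat(" + ", ".join(pieces) + ")"


def bind(template: str, **params: str) -> str:
    """Replace $name placeholders with safely quoted literals."""
    return re.sub(r"\$(\w+)",
                  lambda m: xpath_literal(params[m.group(1)]),
                  template)


query = bind(
    "/t:tracking/t:referrer[@url=$url]",
    url="http://www.bing.com/search?q=eugene myers's "
        "o(nd) diff algorithm&form=qblh",
)
both = xpath_literal("a'b\"c")
```

XPath itself does have a `$variable` syntax, but whether you can bind values to it depends on the API you are using, so treat this as a fallback rather than the preferred route.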
Bing looks like it was technically "ready," meaning it served up results to a search phrase, but it looks like they missed a lot of the polish that's required to really impress.
I've done searches before from Google Chrome that don't get escaped correctly (which I think is Chrome's fault actually), but Bing just displays a blank page -- no errors, no suggestions, no fixes...