Scott Hanselman

Hanselminutes Podcast 134 - StackOverflow uses ASP.NET MVC - Jeff Atwood and his technical team

October 19, 2008 Posted in ASP.NET | ASP.NET MVC | Podcast
My one-hundred-and-thirty-fourth podcast is up.

Well, actually it went up a few weeks ago, but I totally forgot to update my website with the details. You'd think that somewhere around 100 shows I'd have automated this somehow. Hm. If only I knew a programmer and the data was available in some kind of universal structured syndication format… ;)

Scott chats with Jeff Atwood of CodingHorror.com and most recently, StackOverflow.com. Jeff and Joel Spolsky and their technical team have created a new class of application using ASP.NET MVC. What works, what doesn't, and how did it all go down?

Subscribe: Subscribe to Hanselminutes Subscribe to my Podcast in iTunes

Do also remember the complete archives are always up and they have PDF Transcripts, a little-known feature that shows up a few weeks after each show.

Telerik is our sponsor for this show!

Building quality software is never easy. It requires skills and imagination. We cannot promise to improve your skills, but when it comes to User Interface, we can provide the building blocks to take your application a step closer to your imagination. Explore the leading UI suites for ASP.NET and Windows Forms. Enjoy the versatility of our new-generation Reporting Tool. Dive into our online community. Visit www.telerik.com.

As I've said before this show comes to you with the audio expertise and stewardship of Carl Franklin. The name comes from Travis Illig, but the goal of the show is simple. Avoid wasting the listener's time. (and make the commute less boring)

Enjoy. Who knows what'll happen in the next show?

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.

October 19, 2008 16:25
Hey Now Scott,

Another Great Show. Codinghorror 2.0!

Thanks for the info,
Catto


October 19, 2008 17:14
I really enjoyed this episode and especially a few of the harder questions for Jeff. I do hope he will open up and share more of the basic structure and setup of the site on his blog and SO podcasts. Not so much the source, but at least the software and hardware stack.

Hopefully we can catch them on DotNetRocks soon and listen to an even more detailed discussion.
October 19, 2008 18:40
Brian: Jeff talked quite a bit about the Soft- and Hardware on the Stackoverflow Podcasts and the blog. See http://blog.stackoverflow.com/category/server/

I find the Podcast interesting because on one side, one could argue that Jeff had quite a few WTF?? moments (Production, Dev and SQL Server running on the same machine?), on the other hand I think he has a point there: Why buy additional servers just now? Startups do not succeed because they chuck out craploads of money for an infrastructure that's oversized by a factor of 10; startups succeed by actually bringing out products and building a community, and then scaling the infrastructure when needed.
October 19, 2008 23:22
Great podcast. Answered some of the questions I'd been pondering in my head. Excellent site stackoverflow is. I look forward to using it more. Anyway, back to some code and the stackoverflow. :)
October 20, 2008 12:01
Re: Gzipping cache entries

It's only the same as the meme that goes around every few years that NTFS compression speeds up disk access if the ratio of CPU to HDD tips.

Re: Open RDP ports

If your hosting supplier is a cloud supplier, you have a couple of choices to admin your server - SSH or RDP. Did you really expect StackOverflow to have a leased line to their data center?
October 20, 2008 14:04
Great podcast. I really love what Jeff Atwood and his team have accomplished with Stackoverflow and the way they've done it. Lots of interesting things to think about regarding ASP.NET MVC and web development for large consumer sites in general.
BG
October 20, 2008 17:04
Hi Scott,

I've launched a few sites myself running the whole thing on one machine. This is not due to not knowing how to do it correctly, but to financial realities. When you are deploying a free site and don't have a revenue story, it can be kind of difficult to justify the expense of a fully segmented production platform. They would probably not have given us the benefit of StackOverflow if their barrier to entry was a multi-server deployment.

I think you were kind of a little snobbish and harsh, when it was clear that he knew how to do it right but did not have the funds to invest in it. After all, it is a free question-asking site, not a financial institution.

Love your podcast, but have to let you know when I don't agree with you.


October 20, 2008 17:16
Nice Show,

But c'mon, the way you talked about Windows was like the olden days when admins actually automated nightly reboots on machines... I think you can secure production/dev pretty well on modern Windows machines--

I find SO wonderfully refreshing, & a nice enema to those silly "scalability zealots"

October 20, 2008 19:29

I too had a production server with the web server and database on the same server due to economics. It was also a test bed for me to learn about security. I had this running for years under NT 4.0 and 2000 and the only incident I had was the Slammer. I didn't even have a firewall on it for a long time because firewalls were expensive back then! I had so many people say I am crazy but it worked. I trusted my machine and my experience.

I was up to date with all the Windows patches as soon as they came out. My worry was only the 0day exploits and Slammer was the only one I got hit with. Luckily I was able to contain it.

A web server and database can work very well on the same machine. At least I get rid of any network latency. The trick is to put the database on a different hard drive and the database log on a third hard drive. This way you have 3 drives spinning at the same time for maximum efficiency (OS, database & log). I also use SCSI drives only. They are the fastest with fast seek times. Plus 4G RAM which is the maximum Windows 32bit can use.

However, since Jeff is using 64bit, he should be using at least 8G. RAM is so cheap and RAM really makes a big difference. He mentions he's using caching a lot, and caches love memory. Using RAM drives can boost performance too. Make sure only temp files, or files you don't care about losing, reside there.
October 20, 2008 22:22
Agreed, way too harsh for a startup. I mean I don't even see ads on their site. Why on earth would they want to pay for multiple servers out of their own pocket?
October 20, 2008 22:54
Great show again !

For me the biggest WTF moment was when I heard Joel Spolsky freely admit (on the Stackoverflow podcast) that they didn't use a bug tracking application. If it were anybody else I might let this one go by, but Joel 'if you don't use a bugtracking app you're going to hell' Spolsky ???? :-)
October 20, 2008 23:34
Surreal podcast. That really did have a lot of worst practices, but they have a functioning, useful site.

Just a comment about databases -- SQL Server doesn't default to the snapshot isolation mode that they used because it would break legacy applications that depended on readers blocking writers and writers blocking readers. SQL Server 2000, and pretty much all other RDBMS other than Oracle, didn't support a mode where readers weren't blocked by writers and writers weren't blocked by readers. This was a new feature of SQL Server 2005, and IMHO is the only way to go for most new applications.

Oracle has worked this way since at least version 6.0 (1988 time frame). Oracle does not support the nolock keyword because it is not needed. You don't have to make a trade-off between being fast and being correct. The only other RDBMS I know that supported this kind of multi-versioning read consistency was PostgreSQL.
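
The idea of readers not blocking writers can be illustrated with a toy sketch. This is only an assumption-laden model of multi-version reads in general, not how SQL Server or Oracle implement it; the class and method names (`VersionedStore`, `Snapshot`, `read_as_of`) are hypothetical.

```python
class VersionedStore:
    """Toy multi-version store: writers append versions, readers use a snapshot."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._clock = 0      # logical commit counter (stands in for a real timestamp)

    def write(self, key, value):
        """Commit a new version; older versions stay available to open readers."""
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def snapshot(self):
        """Start a reader that only sees data committed up to this moment."""
        return Snapshot(self, self._clock)

    def read_as_of(self, key, ts):
        """Return the newest value committed at or before ts, or None."""
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= ts:
                return value
        return None


class Snapshot:
    """A reader pinned to one point in the store's commit history."""

    def __init__(self, store, ts):
        self._store = store
        self._ts = ts

    def read(self, key):
        return self._store.read_as_of(key, self._ts)


store = VersionedStore()
store.write("votes", 10)
reader = store.snapshot()       # reader starts here
store.write("votes", 11)        # writer proceeds without waiting for the reader
old_view = reader.read("votes")             # still the value from the snapshot
new_view = store.snapshot().read("votes")   # a fresh reader sees the new value
```

Neither side takes a lock on the other: the writer adds a version, and the open reader keeps a consistent view of the older one.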
October 21, 2008 12:08
In several cases I've actually seen a significant decrease in performance after moving the database to a dedicated server. The problems were the latency and slow speed of ethernet compared to memory and inter-process communication. As long as your application fits in RAM and CPU, and most do in a quad core 8GB setting, the performance is hard to beat.
Security wise it may not be the smartest configuration though.
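
A rough back-of-the-envelope sketch shows why latency can dominate once the database moves to another machine. The numbers below are illustrative assumptions, not measurements from any real site, and the model assumes queries run one after another with one network round trip each.

```python
def added_latency_ms(queries_per_page, round_trip_ms):
    """Extra time per page if every query waits for one network round trip.

    Assumes the queries are issued sequentially (no batching or pipelining).
    """
    return queries_per_page * round_trip_ms


# Hypothetical page that issues 200 small queries.
same_box = added_latency_ms(200, 0.0)     # local connection: network cost treated as zero here
over_network = added_latency_ms(200, 0.5) # assumed 0.5 ms round trip on a LAN
```

Under these assumptions the remote database adds 100 ms per page purely from round trips, regardless of how fast the link's bandwidth is. Real figures depend on the network, the driver, and how chatty the page is, so measuring remains the only reliable answer.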
October 21, 2008 12:40
In several cases I've actually seen a significant decrease in performance after moving the database to a dedicated server.


Jonas, comments like this scare me.. I too was worried about the cost of inter-machine communication, as we do LOTS (really, LOTS) of tiny queries for every page. Sort of a LINQ 2 SQL thing, I guess..

We're negotiating with the provider now to get a loopback gigabit ethernet connector for the 2nd server as we grow -- this will be the DB server. Do you think that's enough?

Oh, and ironically, it is CHEAPER for us to get a 2nd server than it is to upgrade to 8GB on the single server we have. Lame, but they won't budge on this. Upgrading from 4 GB to 8 GB on the same server will just about double our monthly hosting costs. Might as well just go for server no. 2 with 4 GB at that point..
October 21, 2008 13:24
I too was worried about the cost of inter-machine communication, as we do LOTS (really, LOTS) of tiny queries for every page. Sort of a LINQ 2 SQL thing, I guess..


But as I understand you also do a lot of caching and that might very well save you.

We're negotiating with the provider now to get a loopback gigabit ethernet connector for the 2nd server as we grow -- this will be the DB server. Do you think that's enough?


One could take a wild guess but measuring is the way to go.
October 21, 2008 21:21
Hey, Jeff & Scott, what's up?! An ongoing problem right now is how to get Linq to SQL to scale and to perform well on enterprise-grade applications (i.e., something bigger than the usual RAD examples out there).

Would it be possible for you, Jeff, to open-source your Linq to SQL pattern implementation or for you, Scott, to blog about a suggested implementation? Microsoft at New Zealand implemented some patterns in their Backgroundmotion project, but somehow I don't think it works right.

I would love to know more about stackoverflow's implementation.
October 21, 2008 21:38
Hi Jeff,

Also, in the meantime, could you put the DB on a different spinning disk platter from IIS? That would probably ease some disk contention... (if that's not how it is already)
October 22, 2008 2:54
Watch out if someone puts a firewall between your database server and your web server. So-called "wire speed" firewalls usually never come anywhere near doing gigabit ethernet, and performance degrades based on packet sizes. I saw a large drop in performance (90%) when using a quite expensive firewall.
October 22, 2008 3:20
Jeff:

It's silly to pay a recurring extra monthly fee for going from 4GB to 8GB! I use my own servers and I can put in whatever hardware I want. It's the same monthly fee. I suggest you go that route instead of being at the mercy of your host. You live in/near San Francisco? There must be lots of ISPs where you can colo your own machines.
October 22, 2008 23:40
Scott,
Regarding gzipping cache data:

If they are caching final page output in this way and they are already sending HTTP responses across the wire as gzip, then gzipping the output cache would make great sense. But only if they are NOT first unzipping when taking the page out of cache and then re-zipping it up again before sending the response.

If they check the 'Accept-Encoding' header to make sure gzip is supported, then they can just send the gzipped output cache directly into the response stream and set the 'Content-Encoding' header to 'gzip'. If Accept-Encoding doesn't include gzip, then the cache would have to be unzipped, but this is a very small percentage of browsers.

All this logic could go into a wrapper to the Output Cache.
What do you think?
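
The wrapper idea above could be sketched roughly as follows. This is a minimal illustration in Python rather than ASP.NET, and it says nothing about how StackOverflow actually caches; `GzipOutputCache`, `get_response` and `client_accepts_gzip` are hypothetical names, and the Accept-Encoding parsing is deliberately simplified.

```python
import gzip


def client_accepts_gzip(accept_encoding):
    """Simplified check: gzip is listed and not disabled with q=0."""
    for part in accept_encoding.lower().split(","):
        token, _, params = part.strip().partition(";")
        if token.strip() != "gzip":
            continue
        params = params.strip()
        if params.startswith("q="):
            try:
                return float(params[2:]) > 0
            except ValueError:
                return False
        return True
    return False


class GzipOutputCache:
    """Hypothetical output cache that stores each page once, already gzipped."""

    def __init__(self):
        self._pages = {}  # cache key -> gzip-compressed bytes

    def put(self, key, html):
        self._pages[key] = gzip.compress(html.encode("utf-8"))

    def get_response(self, key, accept_encoding):
        """Return (headers, body) for a cached page, or None on a cache miss."""
        compressed = self._pages.get(key)
        if compressed is None:
            return None
        if client_accepts_gzip(accept_encoding):
            # Common path: send the stored bytes as-is, no unzip and re-zip.
            return {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}, compressed
        # Rare path: the client cannot handle gzip, so decompress for it.
        return {"Vary": "Accept-Encoding"}, gzip.decompress(compressed)


cache = GzipOutputCache()
cache.put("/questions/1", "<html>hello</html>")
headers, body = cache.get_response("/questions/1", "gzip, deflate")
```

The design choice is to pay the compression cost once at cache-fill time; the trade-off is a decompression on the uncommon request that does not advertise gzip support.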

BTW: Yahoo Open ID implementation doesn't seem to work. I get a 404 when yahoo tries to redirect back to your site.
October 23, 2008 1:11
Scott,

I think that was my favorite Hanselminutes of all time. What a fantastic show it was. I'm mostly a winforms developer with just a bit of web dev on the side but I totally loved that show. It might have something to do with the fact that I think StackOverflow is a genius idea.

I love how pragmatic the StackOverflow team has been. I often feel guilty about not implementing best practices but it often boils down to cost (either actual cost or dev time cost). I could tell the guys felt bad about some of their architectural decisions but I think it's great to get some high-profile guys out there talking about how they shortcut their design. I often feel a big separation between real-world development and the stuff I read in blogs. I know there are a lot of people doing everything they can to make their designs great but I work for a very small company and when the rubber meets the road I still have to finish the app within the (often too short) timeline and the (often too small) budget.
October 23, 2008 3:50
Great show. Loved the tension between Startup-Mode and Banker-Mode.

Scott's incredulity about some of the worst-practices was hilarious. Podcasting Gold.

And I agree with Abdu:

silly to pay a recurring extra monthly fee for going from 4GB to 8GB!


It would be much nicer if you could pay them once for the RAM, plus a mildly outrageous installation fee, and then forget about it.




October 24, 2008 3:57
This was a great show. As I listen to you whenever I do my workout, I couldn't help but laugh as you guys commented on how, at the end of the day, the site just has to work. As a self-taught developer I can relate to the horror stories that happen all the time, especially when it comes to scalability. (Like when I broke Access and found out we weren't supposed to use it for handling a corporate storefront. Apparently it maxes out at 20 users and switches into throttling mode.. But SQL Server has since solved that problem.) What caching provider did they use for stackoverflow?

Here is my big question today. Caching!!! I am trying to figure out what the best way is to implement caching out of process on a single server so that I can run a local web farm and also distributed caching so that I can cache objects across more than one server. What are the best products (free preferably) to do this and also why has Microsoft not put together an out of process caching provider like they did for session state. This just seems so intuitive to me but it's been missed! Scott Gu should put some time into that.

Thanks,

Andrew

PS. As for expanding the capacity via another server, that might not necessarily work as expected. They will likely get far less than a doubling of performance, as there are latency issues between the DB and the webserver. I would definitely advocate for more memory first, as it won't increase latency and they aren't maxing out the server on the CPU end. We load our entire 4GB DB in memory and it works great on our webserver (yes, we use one machine for both; it's probably more common than you think), and when you have 11GB of memory you can afford to run SQL 2005 and IIS all at the same time. Perhaps you can do a show about some of the more common approaches that the at-home developer takes where cost is an issue and the client is clearly more concerned with the deadline than with the security behind the product.
October 24, 2008 17:10
Excellent podcast, Scott. Really enjoyed the discussion of how the StackOverflow team has architected their hardware schema. It's obvious you and Jeff come from different schools of thought (I probably fall somewhere in between myself), neither of which is right or wrong but which each have their own advantages/disadvantages.

I work in the tax and accounting business, so it was an equal shock to me to hear that they have everything running on a single server. However, their environment is very different, probably more "from the hip", whereas ours is shackled by Sarbanes-Oxley and other compliance req's.
October 24, 2008 17:12
Just a note: the above link to "digg this post" yields on the Digg page:

"Podcasts have left the building (at least on Digg)"
October 28, 2008 7:30
Wow, I think I suffered the same shock as Scott at the architectural decisions. I have supported some scary implementations like that, although the show didn't really mention how much traffic they are getting - any idea on the uniques?
October 28, 2008 14:40
Yes, I just love it, nice to hear we are not in the perfect world.

Great to hear the shock from Scott, and to make it public too.
October 30, 2008 4:41
The second part of this podcast was gold! I loved the candidness of it. It was awesome to hear some smart guys shooting the breeze and dropping the occasional f-bomb. Encore, Encore!
October 30, 2008 4:43
Best Hanselminutes ever!

Everyone can learn a lot from the "methodology" the stack overflow team followed. They got the site out, and it works.

Nobody is storing confidential information on the site, and nobody is paying to use the site. They can shoot from the hip a lot more freely.

IMO the team set their priorities appropriately for what they were delivering. I used to be surprised by how many companies that have a lot more at risk, actually shoot from the hip even more than SO.

Go Stackoverflow team!!

BTW: Scott you sound exactly like my brother in this podcast. My brother is older than me and deals in a much more "strict (not quite banking but close)" area of software development. My phone call with him when I launched Vast Rank http://www.vastrank.com (a much smaller site I just launched as a side project) sounded just like this call. It is great stuff and balances things out. (I have to be "strict" for my day job work too).

Keep up the great podcasts! You produce so much great content, I greatly appreciate it!

October 31, 2008 0:53
Scott

A server running IIS and SQL Server on the same box visible to the web seems to be secure enough without going through the "Enterprise level security" guidelines you mentioned. I am mentioning this from the point of someone who can't afford all the extra hardware and software.

If I have a software firewall running on the same machine, have all patches installed, have all ports blocked except for the standard ones (like 80,443,110,25,53, 21, 20), block ports 1433, 1434 but let local ip addresses be trusted, explain to me how this server is susceptible to hacks or exploits.

I would sleep very well if I have such a server on the web.
I applaud Jeff for believing that what he's doing is sufficient without being paranoid about security.

Also you said that you won't add any value if you checked SO's code. Why not? I believe you have enough skills to find issues with anyone's code, at least in terms of security.





November 03, 2008 20:00
I found the second interview very interesting, and feel that it has added more value to me than the first one. Even though both interviews taught me things, I think the second one might have taught me more :)

Thanks for always having such inspiring and positive podcasts!
November 05, 2008 18:57
Part Two was excellent. It felt like Scott was really saying what he thought instead of being a host.
November 09, 2008 0:10
Another awesome show!

The only comment I can make about SQL on the same box (since I know nothing) is that Microsoft seems to be OK with including a SQL Express database with every copy of Office Professional, so they must be fairly confident that it is secure. Jeff could treat his own SQL server as an embedded database and just turn off TCP/IP access and allow local connections only. Another benefit of keeping SQL on the same box is that you can forget about all the data caching infrastructure programming and just let the SQL server cache be your data cache.

The overclocker in me wants to know how extreme you can take this. How much performance can you squeeze out of a single server setup? Has anyone built a server out of an overclocked PC with lots of RAM so that everything is cached? Is it possible to run ASP.NET within the SQL server CLR so that all data stays in-proc?
Jim
November 10, 2008 9:47
Wow, what an episode. I am a fan of StackOverflow, and I've listened to the associated StackOverflow podcast. It generally consists of Joel Spolsky and Jeff Atwood chatting about the genesis of the site and how they plan to go forward with it. I've gotten a sense of how the project has developed. For certain, the entire StackOverflow team has put a great deal of thought into how the site should work, and the result is spectacular.

But your podcast had plenty of surprises. I thought I had misheard Jeff when he said that the whole thing is hosted on one box. My shock wasn't from the security angle (like yours, Scott) -- it was more from having been in innumerable capacity planning meetings. I've noticed a tendency among enterprise stakeholders -- they swing for the fence on hardware configurations, even if it's an application whose user base is limited. And, even after explaining to stakeholders that the cost of application downtime/maintenance is often a lot less than the cost of big redundancy or extreme availability, they're still going to fight for uptime guarantees with infinite nines. I've seen a lot of "fat" hardware configurations.
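
To put the "nines" in perspective, a quick calculation converts an availability target into the downtime it permits per year. This is plain arithmetic under a simplifying assumption (a 365-day year); the function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(availability_percent):
    """Hours of downtime per year permitted by an availability target."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


# Each extra nine cuts the permitted downtime by a factor of ten.
three_nines = allowed_downtime_hours(99.9)   # roughly 8.8 hours per year
four_nines = allowed_downtime_hours(99.99)   # under an hour per year
```

Whether the hardware needed to move from one row to the next is worth it depends on what an hour of downtime actually costs the application in question.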

So my mental model of the StackOverflow hardware was a rack of servers, cranking away, rather than one box, with specs not too different from my desktop machine. There may be consequences down the road for StackOverflow -- from security, to the introduction of database latency when scaling out, to the possibility of performance degradation as the site grows and it has to be maintained on that one box. But it made me realize how far "stock" hardware can take you. Because the site works for a demanding audience.

This also adds a new consideration to the concepts behind cloud computing. If you look at Azure, one of its driving ideas is that you can add hardware capacity as you need it, and that applications should be architected with principles of scalability being paramount. But how many sites are really going to need that, if StackOverflow can run with its current architecture and hardware setup? I'm sure that StackOverflow doesn't get the same kind of traffic as, say, MSN.com, but still -- it's a very active site and probably has a user base that most sites would kill for.

I went to PDC, and there seemed to be a consistent message that the cloud computing/scalability principles are not a big change from current models that most developers are using. My own investigations make me think otherwise. The move away from relational databases, for instance, is a non-trivial change in thinking. That style is the right answer for the giants of the internet. Google, for instance, seems to be having a successful beta test. But is it right for the local optician's web site? We'll see, I suppose. But what I know now is that StackOverflow is getting it done with what they got.

Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.