Hanselminutes Podcast 134 - StackOverflow uses ASP.NET MVC - Jeff Atwood and his technical team
My one-hundred-and-thirty-fourth podcast is up.
Well, actually a few weeks ago, but I totally forgot to update my website with the details. You'd think somewhere around 100 shows I'd have automated this somehow. Hm. If only I knew a programmer and the data were available in some kind of universal structured syndication format… ;)
Scott chats with Jeff Atwood of CodingHorror.com and most recently, StackOverflow.com. Jeff and Joel Spolsky and their technical team have created a new class of application using ASP.NET MVC. What works, what doesn't, and how did it all go down?
- Download: MP3 Full Show #134
- Play in your browser.
- ACTION: Please vote for us on Podcast Alley! Digg us at Digg Podcasts!
Do also remember the complete archives are always up, and they have PDF Transcripts, a little-known feature that shows up a few weeks after each show.
Telerik is our sponsor for this show!
Building quality software is never easy. It requires skills and imagination. We cannot promise to improve your skills, but when it comes to User Interface, we can provide the building blocks to take your application a step closer to your imagination. Explore the leading UI suites for ASP.NET and Windows Forms. Enjoy the versatility of our new-generation Reporting Tool. Dive into our online community. Visit www.telerik.com.
As I've said before this show comes to you with the audio expertise and stewardship of Carl Franklin. The name comes from Travis Illig, but the goal of the show is simple. Avoid wasting the listener's time. (and make the commute less boring)
Enjoy. Who knows what'll happen in the next show?
About Scott
Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.
Hopefully we can catch them on DotNetRocks soon and listen to an even more detailed discussion.
I find the podcast interesting because on one side, one could argue that Jeff provided quite a few WTF?? moments (production, dev, and SQL Server all running on the same machine?), but on the other hand I think he has a point: Why buy additional servers just now? Startups do not succeed because they chuck out craploads of money for an infrastructure that's oversized by a factor of 10; startups succeed by actually bringing out products and building a community, and then scaling the infrastructure when needed.
It's the same as the meme that goes around every few years claiming NTFS compression speeds up disk access once the ratio of CPU speed to HDD speed tips far enough.
Re: Open RDP ports
If your hosting supplier is a cloud supplier, you have a couple of choices to admin your server - SSH or RDP. Did you really expect StackOverflow to have a leased line to their data center?
I've launched a few sites myself running the whole thing on one machine. This is not due to not knowing how to do it correctly, but to financial realities. When you are deploying a free site and don't have a revenue story, it can be kind of difficult to justify the expense of a fully segmented production platform. We would probably not have gotten the benefit of Stackoverflow if their barrier to entry had been a multi-server deployment.
I think you were a little snobbish and harsh, when it was clear that he knew how to do it right but did not have the funds to invest in it. After all, it is a free question-asking site, not a financial institution.
Love your podcast, but have to let you know when I don't agree with you.
But c'mon, the way you talked about Windows was like the olden days when admins actually automated nightly reboots on machines... I think you can secure production/dev pretty well on modern Windows machines.
I find SO wonderfully refreshing, & a nice enema for those silly "scalability zealots".
I too had a production server with the web server and database on the same box, due to economics. It was also a test bed for me to learn about security. I had it running for years under NT 4.0 and 2000, and the only incident I had was Slammer. I didn't even have a firewall on it for a long time, because firewalls were expensive back then! I had so many people tell me I was crazy, but it worked. I trusted my machine and my experience.
I was up to date with all the Windows patches as soon as they came out. My worry was only the 0day exploits and Slammer was the only one I got hit with. Luckily I was able to contain it.
A web server and database can work very well on the same machine. At least you get rid of any network latency. The trick is to put the database on a second hard drive and the database log on a third. This way you have three drives spinning at the same time for maximum efficiency (OS, database & log). I also use SCSI drives only; they are the fastest, with fast seek times. Plus 4 GB of RAM, which is the maximum 32-bit Windows can use.
However, since Jeff is using 64-bit, he should be using at least 8 GB. RAM is so cheap, and RAM really makes a big difference. He mentions he's using caching a lot, and caches love memory. Using RAM drives can boost performance too. Just make sure only temp files, or files you don't care about losing, reside there.
For me the biggest WTF moment was when I heard Joel Spolsky freely admit (on the Stackoverflow podcast) that they didn't use a bug tracking application. If it were anybody else I might let this one go by, but Joel 'if you don't use a bugtracking app you're going to hell' Spolsky ???? :-)
Just a comment about databases -- SQL Server doesn't default to the snapshot isolation mode that they used because it would break legacy applications that depended on readers blocking writers and writers blocking readers. SQL Server 2000, and pretty much all other RDBMS other than Oracle, didn't support a mode where readers weren't blocked by writers and writers weren't blocked by readers. This was a new feature of SQL Server 2005, and IMHO is the only way to go for most new applications.
Oracle has worked this way since at least version 6.0 (1988 time frame). Oracle does not support the nolock keyword because it is not needed. You don't have to make a trade-off between being fast and being correct. The only other RDBMS I know that supported this kind of multi-versioning read consistency was PostgreSQL.
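For anyone wanting to try it, the row-versioning modes are opt-in per database in SQL Server 2005 and later. Roughly (the database name here is illustrative):

```sql
-- Opt in to row versioning so readers don't block writers (SQL Server 2005+).
ALTER DATABASE StackOverflow SET ALLOW_SNAPSHOT_ISOLATION ON;

-- Make plain READ COMMITTED use row versioning as well; this one needs
-- exclusive access to the database to take effect.
ALTER DATABASE StackOverflow SET READ_COMMITTED_SNAPSHOT ON;
```

With READ_COMMITTED_SNAPSHOT on, existing applications keep their default isolation level but stop taking shared locks for reads, which is why it is such a good fit for new applications that have no legacy code depending on blocking behavior.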
Security wise it may not be the smartest configuration though.
In several cases I've actually seen a significant decrease in performance after moving the database to a dedicated server.
Jonas, comments like this scare me.. I too was worried about the cost of inter-machine communication, as we do LOTS (really, LOTS) of tiny queries for every page. Sort of a LINQ 2 SQL thing, I guess..
We're negotiating with the provider now to get a loopback gigabit ethernet connector for the 2nd server as we grow -- this will be the DB server. Do you think that's enough?
Oh, and ironically, it is CHEAPER for us to get a 2nd server than it is to upgrade to 8GB on the single server we have. Lame, but they won't budge on this. Upgrading from 4 GB to 8 GB on the same server will just about double our monthly hosting costs. Might as well just go for server no. 2 with 4 GB at that point..
I too was worried about the cost of inter-machine communication, as we do LOTS (really, LOTS) of tiny queries for every page. Sort of a LINQ 2 SQL thing, I guess..
But as I understand you also do a lot of caching and that might very well save you.
We're negotiating with the provider now to get a loopback gigabit ethernet connector for the 2nd server as we grow -- this will be the DB server. Do you think that's enough?
One could take a wild guess but measuring is the way to go.
Would it be possible for you, Jeff, to open-source your LINQ to SQL pattern implementation, or for you, Scott, to blog about a suggested implementation? Microsoft New Zealand implemented some patterns in their Backgroundmotion project, but somehow I don't think it works right.
I would love to know more about stackoverflow's implementation.
Also, in the meantime, could you put the DB on a different spinning disk from IIS? That would probably ease some disk contention... (if that's not how it is already)
It's silly to pay a recurring extra monthly fee for going from 4 GB to 8 GB! I use my own servers, so I can put in whatever hardware I want for the same monthly fee. I suggest you go that route instead of being at the mercy of your host. You live in/near San Francisco? There must be lots of ISPs where you can colo your own machines.
Regarding gzipping cache data:
If they are caching final page output in this way and are already sending HTTP responses across the wire as gzip, then gzipping the output cache would make great sense. But only if they are NOT first unzipping the page when taking it out of cache and then re-zipping it again before sending the response.
If they check the 'Accept-Encoding' header to make sure gzip is supported, then they can send the gzipped output cache directly into the response stream and set the 'Content-Encoding' header to 'gzip'. If Accept-Encoding doesn't include gzip, then the cache entry would have to be unzipped, but that covers a very small percentage of browsers.
All this logic could go into a wrapper to the Output Cache.
What do you think?
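A minimal sketch of that wrapper logic, in Python for brevity (the function name and return shape are made up for illustration; a real version would wrap the ASP.NET Output Cache):

```python
import gzip

def serve_from_cache(gzipped_page: bytes, accept_encoding: str):
    """Return (body, extra_headers) for a page cached in gzip-compressed form.

    If the client advertises gzip support, send the cached bytes untouched and
    mark the response as gzip-encoded; otherwise decompress once on the way out.
    """
    if "gzip" in (accept_encoding or "").lower():
        # Fast path: bytes go from cache to wire with no recompression.
        return gzipped_page, {"Content-Encoding": "gzip"}
    # Rare path: a client that can't handle gzip gets the plain bytes.
    return gzip.decompress(gzipped_page), {}
```

So the cache stores each page exactly once, already compressed, and the common case never touches the compressor at all.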
BTW: the Yahoo OpenID implementation doesn't seem to work. I get a 404 when Yahoo tries to redirect back to your site.
I think that was my favorite Hanselminutes of all time. What a fantastic show it was. I'm mostly a winforms developer with just a bit of web dev on the side but I totally loved that show. It might have something to do with the fact that I think StackOverflow is a genius idea.
I love how pragmatic the StackOverflow team has been. I often feel guilty about not implementing best practices but it often boils down to cost (either actual cost or dev time cost). I could tell the guys felt bad about some of their architectural decisions but I think it's great to get some high-profile guys out there talking about how they shortcut their design. I often feel a big separation between real-world development and the stuff I read in blogs. I know there are a lot of people doing everything they can to make their designs great but I work for a very small company and when the rubber meets the road I still have to finish the app within the (often too short) timeline and the (often too small) budget.
Scott's incredulity about some of the worst-practices was hilarious. Podcasting Gold.
And I agree with Abdu:
silly to pay a recurring extra monthly fee for going from 4GB to 8GB!
It would be much nicer if you could pay them once for the RAM, plus a mildly outrageous installation fee, and then forget about it.
Here is my big question today. Caching!!! I am trying to figure out the best way to implement caching out of process on a single server, so that I can run a local web farm, and also distributed caching, so that I can cache objects across more than one server. What are the best products (free, preferably) to do this, and why has Microsoft not put together an out-of-process caching provider like they did for session state? This just seems so intuitive to me, but it's been missed! Scott Gu should put some time into that.
Thanks,
Andrew
PS. As for expanding capacity via another server, that might not work as expected. They will likely get far less than a doubling of performance, as there are latency issues between the DB and the web server. I would definitely advocate for more memory first, as it won't increase latency, and they aren't maxing out the server on the CPU end. We load our entire 4 GB DB into memory and it works great on our web server (yes, we use one machine for both; it's probably more common than you think), and when you have 11 GB of memory you can afford to run SQL 2005 and IIS at the same time. Perhaps you could do a show about some of the more common approaches the at-home developer takes, where cost is an issue and the client is clearly more concerned with the deadline than with the security behind the product.
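On the out-of-process caching question: memcached is the usual free answer for distributed object caching. The cache-aside pattern underneath it is simple enough to sketch; here is an in-process stand-in (class and method names are hypothetical, in Python for brevity):

```python
import time

class TtlCache:
    """A tiny cache-aside helper with per-entry expiry.

    An out-of-process store like memcached does the same dance, just over a
    socket and shared between web servers; this in-memory version only shows
    the lookup/compute/store pattern itself.
    """

    def __init__(self):
        self._store = {}  # key -> (value, expiry timestamp)

    def get_or_add(self, key, factory, ttl_seconds=60):
        entry = self._store.get(key)
        now = time.monotonic()
        if entry is not None and entry[1] > now:
            return entry[0]                      # cache hit, still fresh
        value = factory()                        # miss: compute (e.g. hit the DB)
        self._store[key] = (value, now + ttl_seconds)
        return value
```

The point of the pattern is that the expensive `factory` call (a database query, say) runs at most once per TTL window per key, no matter how many pages ask for the value.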
I work in the tax and accounting business, so it was an equal shock to me to hear that they have everything running on a single server. However, their environment is very different, probably more "from the hip", whereas ours is shackled by Sarbanes-Oxley and other compliance req's.
"Podcasts have left the building (at least on Digg)"
Great to hear the shock from Scott, and to make it public too.
Everyone can learn a lot from the "methodology" the stack overflow team followed. They got the site out, and it works.
Nobody is storing confidential information on the site, and nobody is paying to use the site. They can shoot from the hip a lot more freely.
IMO the team set their priorities appropriately for what they were delivering. I used to be surprised by how many companies that have a lot more at risk, actually shoot from the hip even more than SO.
Go Stackoverflow team!!
BTW: Scott you sound exactly like my brother in this podcast. My brother is older than me and deals in a much more "strict (not quite banking but close)" area of software development. My phone call with him when I launched Vast Rank http://www.vastrank.com (a much smaller site I just launched as a side project) sounded just like this call. It is great stuff and balances things out. (I have to be "strict" for my day job work too).
Keep up the great podcasts! You produce so much great content, I greatly appreciate it!
A server running IIS and SQL Server on the same box, visible to the web, seems secure enough without going through the "Enterprise-level security" guidelines you mentioned. I am saying this as someone who can't afford all the extra hardware and software.
If I have a software firewall running on the same machine, have all patches installed, have all ports blocked except the standard ones (80, 443, 110, 25, 53, 21, 20), and block ports 1433 and 1434 while trusting local IP addresses, explain to me how this server is susceptible to hacks or exploits.
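On newer Windows (Vista/Server 2008), that port policy can be scripted. Since the built-in firewall already blocks unsolicited inbound traffic by default, it is enough to open SQL Server's port only to the local subnet; something roughly like this, run from an elevated prompt (the rule name is illustrative):

```
rem The built-in firewall denies inbound by default, so no explicit block rule
rem for 1433/1434 is needed; just allow SQL from the local subnet only.
netsh advfirewall firewall add rule name="SQL Server local subnet only" dir=in action=allow protocol=TCP localport=1433 remoteip=localsubnet
```

One caveat: in Windows Firewall, explicit block rules override allow rules, so relying on the default-deny plus a scoped allow is safer than stacking a broad block rule on top of a local allow.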
I would sleep very well if I have such a server on the web.
I applaud Jeff for his efforts for believing in what he's doing is sufficient without being paranoid about security.
Also, you said that you wouldn't add any value if you checked SO's code. Why not? I believe you have enough skill to find issues with anyone's code, at least in terms of security.
Thanks for always having such inspiring and positive podcasts!
The only comment I can make about SQL on the same box (since I know nothing) is that Microsoft seems to be OK with including a SQL Express database with every copy of Office Professional, so they must be fairly confident that it is secure. Jeff could treat his own SQL Server as an embedded database and just turn off TCP/IP access, allowing local connections only. Another benefit of keeping SQL on the same box is that you can forget about all the data-caching infrastructure programming and just let the SQL Server cache be your data cache.
The overclocker in me wants to know how extreme you can take this. How much performance can you squeeze out of a single server setup? Has anyone built a server out of an overclocked PC with lots of RAM so that everything is cached? Is it possible to run ASP.NET within the SQL server CLR so that all data stays in-proc?
But your podcast had plenty of surprises. I thought I had misheard Jeff when he said that the whole thing is hosted on one box. My shock wasn't from the security angle (like yours, Scott) -- it was more from having been in innumerable capacity planning meetings. I've noticed a tendency among enterprise stakeholders -- they swing for the fence on hardware configurations, even if it's an application whose user base is limited. And, even after explaining to stakeholders that the cost of application downtime/maintenance is often a lot less than the cost of big redundancy or extreme availability, they're still going to fight for uptime guarantees with infinite nines. I've seen a lot of "fat" hardware configurations.
So my mental model of the StackOverflow hardware was a rack of servers, cranking away, rather than one box, with specs not too different from my desktop machine. There may be consequences down the road for StackOverflow -- from security, to the introduction of database latency when scaling out, to the possibility of performance degradation as the site grows and it has to be maintained on that one box. But it made me realize how far "stock" hardware can take you. Because the site works for a demanding audience.
This also adds a new consideration to the concepts behind cloud computing. If you look at Azure, one of its driving ideas is that you can add hardware capacity as you need it, and that applications should be architected with principles of scalability being paramount. But how many sites are really going to need that, if StackOverflow can run with its current architecture and hardware setup? I'm sure that StackOverflow doesn't get the same kind of traffic as, say, MSN.com, but still -- it's a very active site and probably has a user base that most sites would kill for.
I went to PDC, and there seemed to be a consistent message that the cloud computing/scalability principles are not a big change from current models that most developers are using. My own investigations make me think otherwise. The move away from relational databases, for instance, is a non-trivial change in thinking. That style is the right answer for the giants of the internet. Google, for instance, seems to be having a successful beta test. But is it right for the local optician's web site? We'll see, I suppose. But what I know now is that StackOverflow is getting it done with what they got.
Another Great Show. Codinghorror 2.0!
Thanks for the info,
Catto