Creating an OData API for StackOverflow including XML and JSON in 30 minutes
I emailed Jeff Atwood last night a one line email. "You should make a StackOverflow API using OData." Then I realized that, as Linus says, Talk is Cheap, Show me the Code. So I created an initial prototype of a StackOverflow API using OData on an Airplane. I allocated the whole 12 hour flight. Unfortunately it took 30 minutes so I watched movies the rest of the time.
You can follow along and do this yourself if you like.
Preparation
Before I left for my flight, I downloaded two things.
First, I got Sam Saffron's "So Slow" StackOverflow SQL Server Importer. This is a little spike of Sam's that takes the 3gigs of XML Dump Files from StackOverflow's monthly dump and imports it into SQL Server.
Second, I got the StackOverflow Monthly Dump. I downloaded it with uTorrent and unzipped it in preparation for the flight.
Importing into SQL Server
I went into Visual Studio 2010 (although I could have used 2008, I like the Entity Framework improvements in 2010 enough that it made this job easier). I right clicked on the Data Connections node in the Server Explorer and created a database in SQL Express called, ahem, "StackOverflow."
Next, I opened up Sam's RecreateDB.sql file from his project in Visual Studio (I avoid using SQL Server Management Studio when I can) and connected to the ".\SQLEXPRESS" instance, selected the new StackOverflow database and hit "execute."
One nit about Sam's SQL file, it creates tables that line up nicely with the dump, but it includes no referential integrity. The tables don't know about each other and there's no cardinality setup. I've overwritten the brain cells in my head that know how to do that stuff without Google Bing so I figured I'd deal with it later. You will too.
Next, I opened Sam's SoSlow application and ran it. Lovely little app that works as advertised with a gloriously intuitive user interface. I probably would have named the "Import" button something like "Release the Hounds!" but that's just me.
At this point I have a lovely database of a few hundred megs filled with StackOverflow's public data.
Making a Web Project and an Entity Model
Now, from within Visual Studio I selected File | New Project | ASP.NET Web Application. Then I right clicked on the resulting project and selected Add | New Item, then clicked Data, then ADO.NET Entity Data Model.
What's the deal with that, Hanselman? You know StackOverflow uses LINQ to SQL? Have you finally sold out and are trying to force Entity Framework on us sneakily within this cleverly disguised blog post?
No. I used EF for a few reasons. One, it's fast enough (both at runtime and at design time) in Visual Studio 2010 that I don't notice the difference anymore. Two, I knew that the lack of formal referential integrity was going to be a problem (remember I mentioned that earlier?) and since LINQ to SQL is 1:1 physical/logical and EF offers flexible mapping, I figured it be easier with EF. Thirdly, "WCF Data Services" (the data services formerly known as ADO.NET Data Services or "Astoria") maps nicely to EF.
I named it StackOverflowEntities.edmx and selected "Update Model from Database" and selected all the tables just to get started. When the designer opened, I noticed there were no reference lines, just tables in islands by themselves.
So I was right about there being no relationships between the tables in SQL Server. If I was a smarter person, I'd have hooked up the SQL to include these relationships, but I figured I could add them here as well as a few other things that would make our OData Service more pleasant to use.
I started by looking at Posts and thinking that if I was looking at a Post in this API, I'd want to see Comments. So, I right-clicked on a Post and click Add | Association. The dialog took me a second to understand (I'd never seen it before) be then I realized that it was creating an English sentence at the bottom, so I just focused on getting that sentence correct.
In this case, "Post can have * (Many) instances of Comment. Use Post.Comments to access the Comment instances. Comment can have 1 (One) instance of Post. Use Comment.Post to access the Post instance." was exactly what I wanted. I also already had the foreign keys properties, so I unchecked that and clicked OK.
That got me here in the Designer. Note the line with the 1...* and the Comments Navigation Property on Post and the Post Navigation Property on Comment. That all came from that dialog.
Next, I figured since I didn't have it auto-generate the foreign key properties, I'd need to map them myself. I double clicked on the Association Line. I selected Post as the Principal and mapped its Id to the PostId property in Comments.
Having figured this out, I just did the same thing a bunch more times for the obvious stuff, as seen in this diagram where Users have Badges, and Posts have Votes, etc.
Now, let's make a service.
Creating an OData Service
Right-click on the Project in Solution Explorer and select Add | New Item | Web | WCF Data Service. I named mine Service.svc. All you technically need to do to have a full, working OData service is add a class in between the angle brackets (DataService<YourTypeHere>) and include one line for config.EntitySetAccessRule. Here's my initial minimal class. I added the SetEntitySetPageSize after I tried to get all the posts. ;)
public class Service : DataService<StackOverflowEntities>
{
// This method is called only once to initialize service-wide policies.
public static void InitializeService(DataServiceConfiguration config)
{
config.SetEntitySetAccessRule("*", EntitySetRights.AllRead);
//Set a reasonable paging site
config.SetEntitySetPageSize("*", 25);
config.DataServiceBehavior.MaxProtocolVersion = DataServiceProtocolVersion.V2;
}
}
Expanding on this class, I added caching, and an example Service Operation, as well as WCF Data Services support for JSONP. Note that the Service Operation is just an example there to show StackOverflow that they CAN have total control. Using OData doesn't mean checking a box and putting your database on the web. It means exposing specific entities with as much or as little granularity as you like. You can intercept queries, make custom behaviors (like the JSONP one), make custom Service Operations (they can include query strings, of course), and much more. OData supports JSON natively and will return JSON when an accept: header is set, but I added the JSONP support to allow cross-domain use of the service as well as allow the format parameter in the URL, which is preferred by man as it's just easier.
namespace StackOveflow
{
[JSONPSupportBehavior]
public class Service : DataService<StackOverflowEntities>
{
// This method is called only once to initialize service-wide policies.
public static void InitializeService(DataServiceConfiguration config)
{
config.SetEntitySetAccessRule("*", EntitySetRights.AllRead);
//This could be "*" and could also be ReadSingle, etc, etc.
config.SetServiceOperationAccessRule("GetPopularPosts", ServiceOperationRights.AllRead);
//Set a reasonable paging site
config.SetEntitySetPageSize("*", 25);
config.DataServiceBehavior.MaxProtocolVersion = DataServiceProtocolVersion.V2;
}
protected override void OnStartProcessingRequest(ProcessRequestArgs args)
{
base.OnStartProcessingRequest(args);
//Cache for a minute based on querystring
HttpContext context = HttpContext.Current;
HttpCachePolicy c = HttpContext.Current.Response.Cache;
c.SetCacheability(HttpCacheability.ServerAndPrivate);
c.SetExpires(HttpContext.Current.Timestamp.AddSeconds(60));
c.VaryByHeaders["Accept"] = true;
c.VaryByHeaders["Accept-Charset"] = true;
c.VaryByHeaders["Accept-Encoding"] = true;
c.VaryByParams["*"] = true;
}
[WebGet]
public IQueryable<Post> GetPopularPosts()
{
var popularPosts = (from p in this.CurrentDataSource.Posts
orderby p.ViewCount
select p).Take(20);
return popularPosts;
}
}
}
But what does this get us? So what?
Accessing StackOverflow's Data via OData
Well, if I hit http://mysite/service.svc I see this service. Note the relative HREFs.
If I hit http://173.46.159.103/service.svc/Posts I get the posts (paged, as I mentioned). Look real close in there. Notice the <link> stuff before the content? Notice the relative href="Posts(23)"?
Remember all those associations I set up before? Now I can see:
- Get a post: http://173.46.159.103/service.svc/Posts(23)/
- Votes on a post http://173.46.159.103/service.svc/Posts(23)/Votes
- Get a post and get all the comments at the same time: http://173.46.159.103/service.svc/Posts(23)?$expand=Comments
- I can even navigate to a user's details off a comment off a post: http://173.46.159.103/service.svc/Posts(23)/Comments(55049)/User
But that's just navigation. I can also do queries. Go download LINQPad Beta for .NET 4. Peep this. Click on Add Connection, and put in my little Orcsweb test server.
Disclaimer: This is a test server that Orcsweb may yank at any moment. Note also, that you can sign up for your own at http://www.vs2010host.com or find a host at ASP.NET or host your own OData in the cloud.
I put this in and hit OK.
Now I'm writing LINQ queries against StackOverflow over the web. No Twitter-style API, JSON or otherwise can do this. StackOverflow data was meant for OData. The more I mess around with this, the more I realize it's true.
This LINQ query actually turns into this URL. Again, you don't need .NET for this, it's just HTTP:
Try the same thing with an accept header of accept: application/json or just add $format=json
It'll automatically return the same data as JSON or Atom, as you like.
If you've got Visual Studio, just go bust out a Console App real quick. File | New Console App, then right-click in references and hit Add Service Reference. Put in http://173.46.159.103/service.svc and hit OK.
Try something like this. I put the URIs in comments to show you there's no trickery.
class Program
{
static void Main(string[] args)
{
StackOverflowEntities so = new StackOverflowEntities(new Uri("http://173.46.159.103/service.svc"));
//{http://173.46.159.103/service.svc/Users()?$filter=substringof('Hanselman',DisplayName)}
var user = from u in so.Users
where u.DisplayName.Contains("Hanselman")
select u;
//{http://173.46.159.103/service.svc/Posts()?$filter=OwnerUserId eq 209}
var posts =
from p in so.Posts
where p.OwnerUserId == user.Single().Id
select p;
foreach (Post p in posts)
{
Console.WriteLine(p.Body);
}
Console.ReadLine();
}
}
I could keep going with examples in PHP, JavaScript, etc, but you get the point.
Conclusion
StackOverflow has always been incredibly open and generous with their data. I propose that an OData endpint would give us much more flexible access to their data than a custom XML and/or JSON API that they'll need be constantly rev'ing.
With a proprietary API, folks will rush to create StackOverflow clients in many languages, but that work is already done with OData including libraries for iPhone, PHP and Java. There's a growing list of OData SDKs that could all be used to talk to a service like this. I could load it into Excel using PowerPivot if I like as well.
Also, this service could totally be extended beyond this simple GET example. You can do complete CRUD with OData and it's not tied to .NET in anyway. TweetDeck for StackOverflow perhaps?
I propose we encourage StackOverflow to put more than the 30 minutes that I have put into it and make a proper OData service for their data, rather than a custom API. I volunteer to help. If not, we can do it ourselves with their dump data (perhaps weekly if they can step it up?) and a cloud instance.
Thoughts?
About Scott
Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.
About Newsletter
I do think that opening StackOverflow via any medium, be it OData or any other type of service, will bring StackOverflow to new heights.
I think as a community, we could easily help Jeff and the team create an open source SO API. If you would like some help, I'm sure lots of volunteers would jump at the opportunity.
Entity Framework is awesome, since the announcement of OData, I think it's elevated or re-energized EF and lots of devs are revisiting this technology.
Thank you once again for an awesome and informative post
It does indeed show off OData with a flourish. I've always found real world examples of technology much easier to grok than abstract examples.
Furthermore, you do make a compelling case for the use of OData as an API in and of itself, and for where it fits in with other technologies. I know that alot of people, myself included, watched the Mix10 keynote and groaned. Since this had every chance of being Yet Another Protocol.
I must say that you guys have pitched OData just right, appealing to those, like me, who are in want of something better, while respecting the adherents of everything else. You certainly didn't come across and say that OData is the One True Way.
I look forward to more OData posts from you, and I'll definitely be using it in my own projects.
Thanks a lot.
Roberto
Great post, Scott, but I agree with Frank that controls will have to be put in place to prevent a DOS attack.
Does that help?
What that means is that if you're using a modern client-side JScript library, you don't have to set the header or put the $format=JSON on the end of the URL at all, e.g. here's how you'd do it in jQuery:
<html>
<head>
<title>OData JSON Test</title>
<script src="http://ajax.microsoft.com/ajax/jquery/jquery-1.3.2.js" type="text/javascript"></script>
</head>
<body>
<script type="text/javascript">
$(document).ready(function () {
$("#foo").text("fetching...");
var url = "http://173.46.159.103/service.svc/Posts(23)/";
$.getJSON(url, function (result) {
$("#foo").html(result.d.Body);
});
});
</script>
<p id="foo">loading page...</p>
</body>
</html>
If you save this HTML to your hard drive and serve it up from your web server or just surf to it directly, you'll see the Body of Post 23 as fetched from Scott's OData endpoint. Enjoy.
1. Costly Queries: The accesss policies are a huge help here, I think the right use of QueryInterceptors and ChangeInterceptors is another useful option here. I'd love to see a simple way of limiting the set of valid predicates in a way that surfaces the "rules" in metadata. I don't see how to do this with a QueryInterceptor attribute today. One option instead is to set ReadSingle as your policy and the expose the entity set in queryable ways through additional service operations that support querying with predicates that you know your database has indexes for.
2. Cacheable Results: I'd like to be able to have an approach to lists that is more like IEnumerable than IList. Also, if I could create a set of named projections, my freemium data feed could limit free queries to cache-friendly sets. Again, I think you could do this with custom service operations today.
Question: Can you explain a bit more or point to more info about how to do odata caching? Re: your example OnStartProcessingRequest, there's a "return" on the second line so your caching logic doesn't actually get executed, right? Also I presume this is browser/client-side caching that's intended?
Thanks,
-djl
Similar to the first post by Frank Krueger, I too felt that there was a lot of smell with both the result xml and the url to request the data. Is it possible to maybe customise the response .. to make it a wee bit cleaner? I feel that having a clean and short response structure helps make the API more usable (lower barrier to entry).
Secondly, is it possible to add some authentication against service calls? For example, simple querystring authentication or even better, basic authentication against particular services only (instead of the entire service) .. (and yes, I understand the security implications of using Basic Auth over no-SSL).
Lastly .... will OData be the recommendation for ASP.NET MVC sites who wish to have a json/xml API ?
Thanks Scott for the post -- please keep these OData + ASP.NET MVC posts coming .. and in the short term :)
-J-
Chris- JSON != JSONP. I realize that JSON support is built in, sorry if it wasn't clear from the post. JSON can only be used on the same domain, while JSONP is needed to access this data from the web from another domain. However, many folks *like* the ease of $format=json feature and it's used for cross domain calls. I can use straight JSON locally, sure, but as the community will want to use this from *their* domain as it's a service, therefore I *do* need JSONP added.
Great example of using OData. Looking forward to your sessions on DevDays 2010 tomorrow.
Kind regards
I absolutely agree that this could be an excellent interface on Stack Overflow. Rather than designing a complete interface from scratch, which Jeff and Joel admit in their podcast is very difficult. I think SO's data is very well suited for a data centric interface like this.
Thanks for the nice tutorial! Have fun in my little country this week @ DevDays 2010.
Great article! I will definitely take a look at the new EF and at OData.
Christina
It seems to me that if you change the definition of "Post" then any consumer of the interface will break once you post that change. This is the same problem we have with SOAP web services bound directly to the domain model -- you inevitably end up writing some "message" objects that you translate from domain <-> message for use in your service layer.
Do you suggest that we need to have two Models? One that we use internally and one that we use to bind our services to? You're then in the same boat as if you'd created the interface from scratch (double maintenance, etc.)...
As for changing stuff, there's two things that could change - the database or the entity model. Because I'm using the Entity Framework (or if I was using any ORM), I can always map the Entity Model to the physical database to maintain things. I could change a table name or a datatype and map them for the most part. If the service itself changes/versions, it would depend (as it does with all web services.) There's no silver bullet. I like to version my services with dates in the URL, so http://foo.com/2010/03/service.svc, etc.
Great article, would also like to thank you for the link to: http://www.orcsweb.com, been researching ASP.NET Hosting and they seem great. Is that a Microsoft hosting venture? They offer a pretty stellar package for pretty cheap!
Thanks,
Eric
Thanks for a great article - not sure what's changed but
http://173.46.159.103/service.svc/Posts()?$filter=substringof('SQL',Title) or substringof('<sql-server>',Tags) is returning
ie returning 500 : A potentially dangerous Request.QueryString value was detected from the client ($filter="...stringof('<sql-server>',Tags)").
Here are links to a couple of my tutorials for OData from SQL Azure, which eliminate the need for rolling your own Entity Data Model if your database has relationships defined:
Querying OData.org’s SQL Azure Northwind Sample Data Service
Enabling and Using the OData Protocol with SQL Azure
Cheers,
--rj
It may be an operation, but GetPopularPosts must already be accessed with the HTTP Get verb. Don't you think it would be more RESTful to change its name to PopularPosts?
Patrick - Under what context are you running it? If it's under IIS or the VS Web Server it'll be not-null. You can remove it if it's a problem, but it enables caching.
Roger - Thanks for sharing the links!
Chris Eargle - Agreed, that would be more RESTful. That's my old WS-*.* personality showing.
Otherwise, a read-only and slightly dated snapshot of the data isn't as useful.
But this showcases something else too: you CAN continue to use whatever technology you use internally, and still leverage EF+WCF Data Services to quickly expose what you want (even allow controlled writes with interceptors that call your existing biz logic if it's properly isolated behind entity interfaces!).
I'd love to see this adopted for StackOverflow. I'm willing to help and build the coolest VS integration for it you can think of ;)
The PowerPivot link at the end of the post seem to be referring to the service on the OrcsWeb test site.
<system.serviceModel>
<serviceHostingEnvironment aspNetCompatibilityEnabled="true" />
</system.serviceModel>
So my question is which is better, to keep it as IQueryable or change it to something else e.g. ICollection or IEnumerable?
Here is an alternate dump loader that is pretty darn quick and supports sql server, mysql and sqlite.
http://meta.stackoverflow.com/questions/45333/fast-multi-platform-data-dump-import-sql-2000-05-08-sqlite-mysql
If the Service Operation returns an IQueryable , then you can further compose on top of the results and the query will be executed on the server side.
You can filter the results via the $filter query operation on the server side , projection works too .
For example , you can filter the popular posts by the view count
http://173.46.159.103/service.svc/GetPopularPosts?filter=ViewCount eq 0
get only the body of the Popular posts.
http://173.46.159.103/service.svc/GetPopularPosts?filter=ViewCount eq 0&$select=Body
Phani
http://blogs.msdn.com/PhaniRaj
I was confused till I noticed that no service operations are being generated on service client code!!
And I noticed that Excute method returns IEnumrable!
So I'm supposed to form my Request Uri that I want to execute
Thank you for the hint.
e.g. http://173.46.159.103/service.svc/Posts(23)/
are returning xml based response. Shouldn't they be returning JSON format as scott has already put [JSONPSupportBehavior] at the top? Or does the service return response based on the client requesting the response? Got confusion here?
I am used to JSON writing asp.net mvc apps but I am new to WCF, data services, EF, and OData. I also read on www.OData.org website that OData supports two formats for representing the resources (Collections, Entries, Links, etc) it exposes: the XML-based AtomPub format and the JSON format.
Regards,
Nabeel
I got another question though. As you know JSONP is for cross domain calls. The following JQuery code, posted earlier by Dan, seems to be working from my machine though. And its returning valid JSON response without appending url with "?$format=json" in $.getJSON() url.
<html>
<head>
<title>OData JSON Test</title>
<script src="http://ajax.microsoft.com/ajax/jquery/jquery-1.3.2.js" type="text/javascript"></script>
</head>
<body>
<script type="text/javascript">
$(document).ready(function () {
$("#foo").text("fetching...");
var url = "http://173.46.159.103/service.svc/Posts(23)/";
$.getJSON(url, function (result) {
$("#foo").html(result.d.Body);
});
});
</script>
<p id="foo">loading page...</p>
</body>
</html>
Cool idea, took your idea and extended it further by including reactive extensions to it.
Using OData , LINQPad, Reactive Extensions (Rx) to query stackoverflow
Thanks
Naveen
However, from a practical point of view (e.g. if i was writing my own interface to a system) i'm not sure i like it - certainly not in a public sense and as a default data access mechanism.
I still see more value in structured API's that encapsulate this kind of data - where there could be value is in *others* being able to write API's over your oData services. Effectively you could have distributed API's for your system.
I can imagine this being powerful for data integration but less so for normal devs.
Given me a bit to think on.
The service is not local..It's been hosted by you on some machine...isn't it?..look at the service url http://173.46.159.103/service.svc/Posts(23)/ ... Its already on the internet?
Regards
Nabeel
http://localhost/ or http://[my IP machine]/ or http://[my machine name]/
is exactly the same call, but to access local websites is always better to NOT use localhost as, for example, IE bypasses all proxies you might have (so it will not work with Fidller) and for services is always better and more reliable to use IP or machine name, never localhost.
Thanks for your reply... I tried LINQ to SQL too which I've read supports SQL Server 2000 (with some caveats), but I couldn't get a connection to the database -- the same error : "This server is not supported. You must have Microsoft SQL Server 2005 or later". I'm using VS2010, don't know if that makes any difference. Would I have to manually configure this somewhere? Sorry I'm being a bit dim. Thanks again.
Turns out that VS2010 has removed support for SQL 2000, so I'll probably have to reinstall VS2008 to get the connectivity...
oData combined with Entity Framework and WCF Data Services is SUCH a new powerful way to expose and consume data. Love the post, keep up the great work.
Why is the Tags property one big string with tags delimited by angle brackets? Was this to work around a limitation in OData or it just how StackOverflow stores tags?
I'm asking because I've been trying to figure out how you can use OData to test for the presence of an element in a collection. Something like:
http://173.46.159.103/service.svc/Posts()?$filter=contains('sql-server',Tags)
where Tags is an array, instead of:
http://173.46.159.103/service.svc/Posts()?$filter=substringof('<sql-server>',Tags)
I've seen other solutions to this problem where they invert the query like this:
http://173.46.159.103/service.svc/Tags('sql-server')/Posts
but I'm still curious how we can perform queries that say, get me all the X, where its one-to-many property Y contains a Z.
If you have any insight, it's much appreciated.
I have taken your sample and generated the OData and EF against a local database and it is sort of working. When I do the "View In Browser" on the .svc file, all works as expected, but... when I add the name of one of the database tables to the end of the query string, I get an HTTP 500 error. The same project works on another machine without any problems. I am running Windows XP with VS 2010.
Any idea of what might be causing the issue?
Thanks,
Eric
But never the less, I get a hit on the server side for each and every request.
My other data service is based on a Linq-to-sql and I can see that it reaches the database everytime, even with those caching settings when the request starts.
However, it would be interesting how we can define SqlCachdeDependency and the server to get returning 304-not modified results!
Any advices/examples are welcome!
For anyone following along ...
If you get the following error msg when trying to save your .edmx after adding associations:
"Error 113: Multiplicity is not valid in Role ....."
You have to go the the foreign key fields (property) and set them to nullable=false. Ex. for the UserPost association, a Post MUST contain a UserID (OwnerUserID). Right click on OwnerUserID, choose properties, and set Nullable = false. Repeat for all foreign keys used in your associations.
a quick question: is it possible to specify that certain fields on an entity/table are not exposed? It seems like the config.SetEntitySetAccessRule does only apply to the entity/table itself and does not support more granular control.
Thanks,
/Jesper
Copenhagen, Denmark
I've used the JSONPBehavior attribute that Adam Kahtava created, but wondered if there was a way right out of the box.
Thanks for your help.
Nathan
Thanks
@nato24
http://odata.stackexchange.com/
Comments are closed.
This really shows off the power of OData and the VS tooling.
But what are the performance implications of allowing people to run arbitrary queries on your DB? DOS seems inevitable.
I was initially turned off by OData for its horrendous URL query syntax. But I find that if I just don't look at those (treat them as assembler), then OData is quite nice. Funny how a small detail like that can make you dislike an entire technology.