HTTP, HEAD, and Range Requests...
Venkat writes that he has a text file (CSV) containing over 50,000 URLs: "I want to run a program that will take this file as input and output a text file which contains only the valid URLs. Basically I need a URL/Link Validator that can perform this job. I tried to put together a custom C# program to do this, but it takes several minutes just to do a hundred URLs. Is there any program/code you are aware of that can do this?"
I recommended a Range Retrieval Request, such as those used by GetRight. You can do this in .NET by setting a Range header on the request (HttpWebRequest exposes AddRange for this). NOTE: The server CAN (and many will) ignore this request. If you get partial content, you won't get a 200 OK; you'll get a 206, and the Content-Length will be the amount of data actually included in the response.
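Here's a rough sketch of the Range idea, in Python rather than .NET just to keep it short. The function names (build_range_request, describe_status, probe) and the 10-second timeout are my own placeholders, not anything from GetRight or from Venkat's program. It asks for only the first byte and then looks at whether the server answered 206 (it honored the range) or 200 (it ignored the range and sent everything).

```python
import urllib.request
import urllib.error


def build_range_request(url, first_byte=0, last_byte=0):
    """Build a GET request asking only for bytes first_byte..last_byte (inclusive)."""
    request = urllib.request.Request(url)
    request.add_header("Range", f"bytes={first_byte}-{last_byte}")
    return request


def describe_status(status):
    """Interpret the status code of a response to a ranged GET."""
    if status == 206:
        return "partial"  # server honored the Range header
    if status == 200:
        return "full"     # server ignored Range and sent the whole body
    return "other"        # redirect, client error, server error, ...


def probe(url, timeout=10):
    """Send the ranged GET and return (status, Content-Length header or None).

    Network-level failures (DNS, refused connection) are not caught here
    and will raise urllib.error.URLError.
    """
    try:
        with urllib.request.urlopen(build_range_request(url), timeout=timeout) as response:
            return response.status, response.headers.get("Content-Length")
    except urllib.error.HTTPError as error:
        # The server answered, just not with a success status (404, 500, ...).
        return error.code, error.headers.get("Content-Length")
```

Calling probe on a URL and passing the status to describe_status tells you which case you hit. On a 206, the Content-Length should match the size of the range you asked for, not the size of the whole resource.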
However, another fellow, more clever than I am, wrote me to say that a HEAD (rather than a GET) should provide enough information, namely the headers, to determine whether a page exists, without the trouble of the HTTP body content. Good stuff!
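A minimal sketch of the HEAD approach, again in Python and again with made-up names (head_status, filter_valid) and guessed defaults (32 worker threads, 10-second timeout). Since the original complaint was speed, it also checks many URLs at once with a thread pool instead of one after another. Treating "valid" as "any 2xx status" is my assumption; adjust to taste.

```python
import http.client
import urllib.request
import urllib.error
from concurrent.futures import ThreadPoolExecutor


def head_status(url, timeout=10):
    """Return the HTTP status for a HEAD request, or None if no usable response arrived."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # the server answered, e.g. 404 or 500
    except (urllib.error.URLError, http.client.HTTPException, ValueError, OSError):
        return None        # DNS failure, refused connection, timeout, malformed URL


def filter_valid(urls, check=head_status, workers=32):
    """Keep the URLs whose status is 2xx, checking up to `workers` of them concurrently."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        statuses = list(pool.map(check, urls))  # results come back in input order
    return [
        url
        for url, status in zip(urls, statuses)
        if status is not None and 200 <= status < 300
    ]
```

You would read the URLs out of the CSV, pass the list to filter_valid, and write the result to the output file. One caveat: some servers answer HEAD differently than GET (a 405, for example), so URLs that fail the HEAD check may deserve a second look with a GET or a ranged GET before you throw them out.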
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35.2
http://www.vbip.com/winsock/winsock_http_08_01.asp
About Scott
Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.