Writing a URL Checker in C#

The .NET Framework includes classes for making web requests, so it is straightforward to write programs that interact with the web. In this post we will build a simple application that checks whether a link is valid.

What counts as a valid link depends on who you ask. For some, a valid link is any URL that takes the user somewhere, anywhere at all. For others, the URL must point to a real page, so a link that lands on an error page (404 Not Found, for example) does not count. We will mainly cover the first definition and touch on the second.

The first condition is whether a link takes the user anywhere. To check it in C#, the application tries to visit the page. If it receives data, the link points to at least some page on the web. If it cannot connect, the link does not point to a valid location. This assumes that the machine running the application has a working internet connection: without one, the application cannot receive any page's data and will declare every link invalid.
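One way to keep an offline machine from marking every link as broken is to check for a network connection before starting. The sketch below is an illustration, not the article's code: the class name LinkBatch and the probe delegate are hypothetical, and the probe stands in for whatever per-link check you use. NetworkInterface.GetIsNetworkAvailable only reports that some non-loopback interface is up; it does not prove the internet itself is reachable.

```csharp
using System;
using System.Collections.Generic;
using System.Net.NetworkInformation;

// Hypothetical helper: runs a per-link probe over many URLs, but refuses to
// run when the machine reports no active network interface.
public static class LinkBatch
{
    public static Dictionary<string, bool> CheckAll(IEnumerable<string> links, Func<string, bool> probe)
    {
        // True if any interface is up and is not a loopback or tunnel interface.
        // This catches the obvious "offline" case, not every connectivity problem.
        if (!NetworkInterface.GetIsNetworkAvailable())
        {
            throw new InvalidOperationException(
                "No network connection; every link would be reported as invalid.");
        }

        var results = new Dictionary<string, bool>();
        foreach (string link in links)
        {
            // The probe decides validity, e.g. a try-catch around GetResponse.
            results[link] = probe(link);
        }
        return results;
    }
}
```

Throwing instead of returning results makes the "no connection" case impossible to confuse with "all links are dead".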

To download data from a link, we use the WebRequest class from the System.Net namespace. WebRequest.Create builds a request for the given URL, and the request's GetResponse method sends it and returns a WebResponse object holding the server's answer. Wrapping the GetResponse call in a try-catch block gives us the behavior we are after: if the call succeeds, the code can go on to further checks; if a WebException is caught, the request did not go through. For HTTP requests, GetResponse also throws a WebException (with a Status of ProtocolError) when the server answers with an error status code such as 404, so the catch block handles those responses as well as unreachable hosts.
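The try-catch approach can be sketched as follows. This is a minimal illustration, assuming a plain GET request is acceptable and a 10-second timeout is reasonable; the class name UrlChecker and the timeoutMs parameter are hypothetical. On .NET 6 and later, WebRequest is marked obsolete (compiler warning SYSLIB0014) and HttpClient is the recommended replacement, but the sketch keeps WebRequest to match the text.

```csharp
using System;
using System.Net;

public static class UrlChecker
{
    // Returns true when a request to the URL gets a successful response.
    // Unreachable hosts, timeouts, and HTTP error statuses (404, 500, ...)
    // all surface as a WebException from GetResponse, so they return false.
    public static bool IsReachable(string url, int timeoutMs = 10000)
    {
        // Reject malformed or non-HTTP(S) links before making any request.
        if (!Uri.TryCreate(url, UriKind.Absolute, out Uri uri) ||
            (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps))
        {
            return false;
        }

        try
        {
            WebRequest request = WebRequest.Create(uri);
            request.Timeout = timeoutMs;

            // Dispose the response so the underlying connection is released.
            using (WebResponse response = request.GetResponse())
            {
                return true;
            }
        }
        catch (WebException)
        {
            return false;
        }
    }
}
```

For example, UrlChecker.IsReachable("http://example.com/") should return true on a connected machine, while a link to a host that does not resolve returns false.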

If the request does go through, we can go further and check whether the link leads to a real page or to an error page. This is a bit tricky, because some sites serve their error page with a success status code, so the request alone does not reveal it. A quick approach is to check the ResponseUri property of the WebResponse object. This property holds the URL the application ended up at, which differs from the original link if the server redirected the request somewhere else. Since many sites name their error pages after the error (404.php or 404.html, for example), checking the response URL catches links that were redirected to such a page.
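A sketch of the ResponseUri check follows. It assumes the error page can be recognized by the last segment of its path, compared without the file extension; the list of names ("404", "error", "notfound" and so on) and the class name ErrorPageHeuristic are hypothetical and would need tuning for real sites.

```csharp
using System;
using System.Collections.Generic;

public static class ErrorPageHeuristic
{
    // Hypothetical file names that suggest an error page, compared without
    // the extension so that 404.php and 404.html both match "404".
    private static readonly HashSet<string> ErrorNames =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "404", "error", "notfound", "not-found", "pagenotfound" };

    // finalUri would normally be WebResponse.ResponseUri, the URL the
    // request ended up at after any redirects. The query string is ignored.
    public static bool LooksLikeErrorPage(Uri finalUri)
    {
        string[] segments = finalUri.Segments;

        // Last path segment, e.g. "404.html" or "404/" ("/" for the site root).
        string last = segments[segments.Length - 1].TrimEnd('/');

        // Strip the extension, if any ("404.html" -> "404").
        int dot = last.LastIndexOf('.');
        string name = dot > 0 ? last.Substring(0, dot) : last;

        return ErrorNames.Contains(name);
    }
}
```

With a WebResponse in hand, you would pass its ResponseUri to LooksLikeErrorPage. Matching whole names rather than substrings avoids flagging pages such as /terror-movies.html.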

Another option is to download the content of the response and scan the page for signs of an error, such as "404" or "Not Found" in its title. This is more complex, because the application now has to make assumptions about how error pages are worded.
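A rough sketch of content scanning is shown below. It assumes that looking only at the page's title element is a reasonable signal, since body text often mentions "404" for legitimate reasons; the phrase list and the class name ErrorContentHeuristic are hypothetical, and a regular expression is a crude stand-in for a real HTML parser.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ErrorContentHeuristic
{
    // Hypothetical phrases; real sites word their error pages in many ways.
    private static readonly string[] Phrases =
        { "404", "not found", "page cannot be found", "does not exist" };

    // Returns true if the page's <title> contains one of the phrases.
    public static bool LooksLikeErrorContent(string html)
    {
        Match m = Regex.Match(html, @"<title[^>]*>(.*?)</title>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        if (!m.Success)
        {
            return false; // No title to inspect.
        }

        string title = m.Groups[1].Value.ToLowerInvariant();
        foreach (string phrase in Phrases)
        {
            if (title.Contains(phrase))
            {
                return true;
            }
        }
        return false;
    }
}
```

The HTML itself can be read from a WebResponse with new StreamReader(response.GetResponseStream()).ReadToEnd() before passing it to LooksLikeErrorContent.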
