Scraping Cities
Hacks are always hilarious and brute force. One of the companies that I work with needs a list of every business, everywhere. To do this, we first need to know all the cities and their corresponding GPS coordinates.
I could geocode them, but for fun, I’ll ruin somebody’s web server.
Ehhh, the design on this site needs work :/… Whatever. Next step is to map the general structure of the site:
The first page is a list of states.
One link deep, we have all the cities.
Another link deep, we have the coordinates.
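That three-level structure (index of states, then a page of cities per state, then one page per city) maps directly onto two nested loops. Below is a minimal sketch, not the original scraper: the names `crawl`, `links_on`, `keep_state`, `keep_city` and `delay` are hypothetical, and `fetch(url) -> str` is left to the caller so the walk works with any HTTP client or a folder of saved pages. The filters exist because real pages also link to navigation, ads and so on.

```python
import time
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def links_on(html, base_url):
    """Return every link on the page as an absolute URL."""
    collector = LinkCollector()
    collector.feed(html)
    return [urljoin(base_url, href) for href in collector.hrefs]


def crawl(fetch, index_url, keep_state=lambda url: True,
          keep_city=lambda url: True, delay=1.0):
    """Walk index -> state pages -> city pages, yielding (city_url, city_html).

    fetch(url) -> str is supplied by the caller (hypothetical interface).
    keep_state / keep_city drop links that are not states or cities.
    delay pauses before every request so the server is not hammered.
    """
    def get(url):
        time.sleep(delay)
        return fetch(url)

    for state_url in links_on(get(index_url), index_url):
        if not keep_state(state_url):
            continue
        for city_url in links_on(get(state_url), state_url):
            if keep_city(city_url):
                yield city_url, get(city_url)
```

For live pages, something like `fetch = lambda url: urllib.request.urlopen(url).read().decode("utf-8")` would do; injecting `fetch` also makes the walk easy to test against a dict of canned HTML.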
A quick look at the source gives us a CSS selector for each coordinate:
Longitude: td div td:nth-child(2) strong
Latitude: td div td:nth-child(1) strong
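With BeautifulSoup, each selector is a single `soup.select_one("td div td:nth-child(2) strong")` call. To keep this sketch dependency-free, it instead builds a tiny tree with the standard library's `html.parser` and emulates just the selector `td div td:nth-child(N) strong`, where N = 1 is latitude and N = 2 is longitude. It assumes the page closes its tags properly (messy real-world HTML is where a proper parser earns its keep), and the sample markup in the usage below is hypothetical, not copied from the site.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []  # mix of Node objects and text strings

    def elements(self):
        return [c for c in self.children if isinstance(c, Node)]

    def text(self):
        return "".join(c if isinstance(c, str) else c.text() for c in self.children)


class TreeBuilder(HTMLParser):
    """Builds a minimal DOM; assumes non-void tags are explicitly closed."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:  # stray end tags are ignored
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


def nth_child(node):
    """1-based position of node among its parent's element children."""
    return node.parent.elements().index(node) + 1


def ancestors_match(node, chain):
    # With only descendant combinators, matching each compound selector
    # against the nearest qualifying ancestor (right to left) is correct.
    ancestor = node.parent
    for tag, position in chain:
        while ancestor is not None and not (
            ancestor.tag == tag and (position is None or nth_child(ancestor) == position)
        ):
            ancestor = ancestor.parent
        if ancestor is None:
            return False
        ancestor = ancestor.parent
    return True


def select_strong(root, column):
    """All nodes matching 'td div td:nth-child(column) strong'."""
    chain = [("td", column), ("div", None), ("td", None)]  # right to left
    matches = []

    def walk(node):
        for child in node.elements():
            if child.tag == "strong" and ancestors_match(child, chain):
                matches.append(child)
            walk(child)

    walk(root)
    return matches


def extract_coordinates(html):
    """Return the raw (latitude, longitude) strings from a city page."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    lat, lon = select_strong(builder.root, 1), select_strong(builder.root, 2)
    if not lat or not lon:
        raise ValueError("coordinate cells not found")
    return lat[0].text().strip(), lon[0].text().strip()
```

Against a hypothetical page shaped like `<td><div><table><tr><td>Latitude: <strong>43.6532 N</strong></td><td>Longitude: <strong>79.3832 W</strong></td></tr></table></div></td>`, this returns the two strings inside the `strong` tags, ready for the conversion step.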
Then some simple string substitution gets them into the format we want (North and East are positive values; West and South are negative values).
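Here is a sketch of that substitution. The site's exact text format isn't shown here, so this assumes decimal degrees followed by a hemisphere letter, such as `43.6532 N` or `79.3832° W`; degrees-minutes-seconds would need extra parsing. The function name `to_signed` is hypothetical.

```python
import re

# Assumed format: decimal degrees, optional degree sign, then N/S/E/W.
_COORD = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*°?\s*([NSEW])\s*$", re.IGNORECASE)


def to_signed(text):
    """Convert '43.6532 N' style text to a signed float (S and W are negative)."""
    match = _COORD.match(text)
    if not match:
        raise ValueError(f"unrecognised coordinate: {text!r}")
    value = float(match.group(1))
    hemisphere = match.group(2).upper()
    return -value if hemisphere in ("S", "W") else value
```

Raising on anything unexpected, rather than guessing, means a page with an odd layout shows up as an error instead of a city silently placed in the wrong hemisphere.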
Now all we have to do is loop over every city, write each result into a database, and we’re laughing. Below’s the code, and the video.
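The original code isn't reproduced here, so as a stand-in, this is a minimal sketch of the database step using the standard library's `sqlite3`. The `cities` table, its columns and the `save_cities` helper are hypothetical; keying on (state, city) with `INSERT OR REPLACE` means a re-run of the scrape updates rows instead of duplicating them.

```python
import sqlite3


def save_cities(conn, rows):
    """Store (state, city, latitude, longitude) tuples in a 'cities' table."""
    with conn:  # commits on success, rolls back if anything raises
        conn.execute(
            """CREATE TABLE IF NOT EXISTS cities (
                   state     TEXT NOT NULL,
                   city      TEXT NOT NULL,
                   latitude  REAL NOT NULL,
                   longitude REAL NOT NULL,
                   PRIMARY KEY (state, city))"""
        )
        conn.executemany("INSERT OR REPLACE INTO cities VALUES (?, ?, ?, ?)", rows)
```

`executemany` accepts any iterable of tuples, including a generator, so rows can be written as the crawl produces them rather than held in memory first.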