Hacking Data from Web Sites

Sometimes I hate “the cloud”, and other times I love it. One thing I really love about the cloud is that it is more often than not just storage with no actual computing. This makes it much nicer when web sites try to hide data from you, because they simply put the data in the cloud, and all you need to do is pluck it out from there.

I’ve come across this situation a few times in the last week or so. Some web site tries to hide information that I’m looking for. Rather than give me simple data, they fart around with it and put it in pretty Flash SWFs or in a graphic. Since I really do prefer the actual raw data so that I can look at it and manipulate it however I like, I need to hack through some of the most pathetic security you can imagine.

It’s pretty simple, and anyone can do it. You only need to know enough HTML to locate where the data is on the page, and then know enough to pick out a URL to the information that’s stored in the cloud or on a CDN (Content Delivery Network, e.g. Akamai).

Here’s how…

  1. Mouse over the data, or if it’s a Flash file, mouse over just below or above or to the side of it.
  2. Right-click and choose “Inspect” or “Inspect Element”, or whatever they call it in your browser.
  3. In the developer tools panel that opens (usually at the bottom of the page), mouse over the elements until the data portion is highlighted above in your browser window.
  4. Drill down through the elements by clicking the little triangles. Do this until you get to some kind of an anchor tag or an object tag.
  5. Copy the element into a simple text editor. One with syntax highlighting is nicer.
  6. Look for URLs with a different domain name. Often they look bizarre, with apparently random characters in them.
  7. Copy the URL and then go paste it into a new tab in your browser. If you picked the right URL, then you’ll get the right data. If not, go back to step 4 and repeat until you get the data you want.
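If you'd rather not eyeball the markup, steps 5 and 6 can be roughed out in a few lines of Python. This is only a sketch: the `foreign_urls` function, the `UrlCollector` class, and the choice of attributes to check are my own assumptions, and the example URLs are made up. It takes the element you copied plus the address of the page it came from, and lists every URL that points at a different domain.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Attributes that commonly carry URLs on anchor, object, embed, and param tags.
# This list is an assumption; add to it if your page hides URLs elsewhere.
URL_ATTRS = {"href", "src", "data", "value"}


class UrlCollector(HTMLParser):
    """Collects absolute http(s) URLs found in the attributes of any tag."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in URL_ATTRS and value:
                # Resolve relative and protocol-relative URLs against the page.
                absolute = urljoin(self.page_url, value)
                if urlparse(absolute).scheme in ("http", "https"):
                    self.urls.append(absolute)


def foreign_urls(element_html, page_url):
    """Return URLs in element_html whose host differs from the page's host."""
    page_host = urlparse(page_url).hostname
    collector = UrlCollector(page_url)
    collector.feed(element_html)
    collector.close()
    return [u for u in collector.urls if urlparse(u).hostname != page_host]


if __name__ == "__main__":
    # Hypothetical element copied out of the browser's inspector.
    snippet = (
        '<embed src="https://d1a2b3.cdn.example.net/acct/9f3k/data.xml" '
        'type="application/x-shockwave-flash">'
        '<a href="/about">About</a>'
    )
    for url in foreign_urls(snippet, "https://www.example.com/charts"):
        print(url)
```

Anything it prints is a candidate for step 7. It only looks at tag attributes, so a URL buried inside a script or a `flashvars` string will still need the manual approach.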

I’ve done this for XML files that held the raw data I wanted as well as video files (MP4s). Who in their right mind would choose to watch a video that they paid for in their browser when they can simply download the video and watch it in a decent video player?
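Once you have the URL, you don't even need the browser to save the file. Here's a minimal sketch using only Python's standard library; the `download` function name and the fallback file name are my own inventions, and it assumes the URL is reachable without cookies or a login.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def download(url, dest_dir="."):
    """Save whatever is at url into dest_dir and return the local path."""
    # Use the last path segment as the file name, e.g. "data.xml" or "clip.mp4".
    name = Path(urlparse(url).path).name or "download.bin"
    dest = Path(dest_dir) / name
    # Stream the response to disk so large videos are not held in memory.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

Call it as `download(url)` with the URL you found in step 7, and the file lands in the current directory.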

Here’s an example of an XML file with data that I plucked out of a site:

The data was from a Flash file which made it useless to me as I actually want to manipulate the data so that I can look at it from different perspectives.

But, you can see the funky URL there. It’s probably an account number or something. Who knows? Who cares?

Here’s a screenshot of where I found the URL:

You can see how the chart is highlighted there, and the URL is in the EMBED tag.

This works for a surprising number of sites. They seem to make it very hard to actually get the data in anything other than a format that they find pretty. Some sites do this to prevent you from uploading videos somewhere else. Others want to present the data in their own formatted way and probably don’t want you doing the same, because then you’d be competition. So, stopping you from getting the raw data is their only way to prevent that.

Anyways. That’s just a quick way to get data in a raw, more usable format.



