19 July 2011

HTML-scraping

Ever needed to process pages from a website that does not support any form of structured system integration, such as Web Services, RSS, or REST? The only information available is ill-formed HTML; not even XHTML!
I have always used the HTML Agility Pack on the .NET platform for this kind of HTML screen-scraping.
Recently, I found a number of equivalent Java toolkits that do the same:
I also found a site that collects various toolkits for this purpose here.
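
To give a feel for what these toolkits do, here is a minimal sketch of scraping links out of ill-formed HTML. It does not use HTML Agility Pack or any of the Java toolkits; it swaps in the lenient `html.parser` module from Python's standard library so that it runs without extra dependencies. The names `LinkScraper`, `scrape_links`, and `SAMPLE`, and the sample markup itself, are made up for illustration.

```python
from html.parser import HTMLParser


class LinkScraper(HTMLParser):
    """Collect (href, text) pairs from <a> tags, tolerating sloppy markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the anchor currently open, if any
        self._text = []      # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Treat a new <a> as the end of any anchor left unclosed.
            self._flush()
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_endtag(self, tag):
        if tag == "a":
            self._flush()

    def handle_data(self, data):
        # Only keep text that appears inside an anchor with an href.
        if self._href is not None:
            self._text.append(data)

    def _flush(self):
        if self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
        self._href = None
        self._text = []

    def close(self):
        super().close()
        self._flush()  # pick up an anchor still open at end of input


def scrape_links(html):
    """Return every (href, text) pair found in the given HTML string."""
    parser = LinkScraper()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page fragment: unquoted attribute, unclosed <a>, <p>, <b>, <html>.
SAMPLE = """
<html><body>
<p>Toolkits<br>
<a href=one.html>First toolkit
<a href='two.html'>Second <b>toolkit</a>
</body>
"""

if __name__ == "__main__":
    for href, text in scrape_links(SAMPLE):
        print(href, "->", text)
```

The parser never builds a tree; it just reacts to tags as they stream past, which is why missing close tags do not stop it. A dedicated toolkit goes further by repairing the markup into a proper document that can be queried, which is usually what you want for anything beyond simple extraction.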
