Blog Objective

This is a blog that attempts to make life easier by noting down the author's accrued knowledge and experiences.
The author has dealt with several IT projects (in Java EE and .NET) and is a specialist in system development.

19 July 2011


Ever had a need to process pages from a website that does not support any form of structured system integration such as Web Services, RSS or REST, where the only information available is ill-formed HTML, not even XHTML?
I have always used the HTML Agility Pack on the .NET platform to perform this kind of HTML screen-scraping.
Recently, I found a number of equivalent Java toolkits that do the same.
I also found a site that collects various toolkits for this purpose here.
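
To give a feel for what screen-scraping ill-formed HTML looks like in Java, here is a minimal sketch. It is not one of the toolkits mentioned above: it assumes only the lenient HTML parser that ships with the JDK (javax.swing.text.html.parser.ParserDelegator), and the class name LinkScraper and the method extractLinks are hypothetical names made up for this illustration. It collects the href of every anchor tag, even when tags are left unclosed.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper for illustration only; not part of any toolkit.
final class LinkScraper {

    // Returns the href value of every <a> start tag found in the given HTML.
    static List<String> extractLinks(String html) {
        final List<String> links = new ArrayList<>();

        // The callback is notified for each tag the parser recognises,
        // including tags in markup that is not well-formed.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };

        try {
            // The third argument tells the parser to ignore any charset
            // declared in the document, since we already have a String.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            // Reading from a StringReader should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return links;
    }
}
```

Calling LinkScraper.extractLinks(pageSource) on the downloaded page text returns the link targets in document order. A dedicated toolkit would typically add selectors or XPath-style queries on top of this, which is the main reason to prefer one for anything beyond a quick extraction.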

1 comment:

Carly Fiorina said...

Hi all,

Screen scraping means acquiring data displayed on screen, either by capturing the text manually with the copy command or via software. Web pages are constantly being screen-scraped in order to save meaningful data for later use. Thanks for sharing it.
