scraplab — Extractomatic in Sinatra on JRuby on Google App Engine on the Internet

by mort

The other day I threw together a little service which I’ve nicknamed Extractomatic. It’s a very simple web-based API that detects and extracts the main content from a web page, stripping out clutter such as headers, footers and advertising. I guess it’s somewhat similar to Readability or Instapaper, but better suited to building into your own applications.
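The post doesn't say how Extractomatic decides what counts as "main content", so what follows is not its algorithm. It is a minimal Ruby sketch of one common heuristic for the same job, assuming only the standard library. The page is split into block-level chunks, and only chunks with enough words and a low proportion of link text are kept. The method name `extract_main_content`, the tag list and both thresholds are hypothetical choices made for illustration.

```ruby
# Naive boilerplate filter (hypothetical sketch, NOT Extractomatic's method).
# Regex-based HTML handling is deliberately crude; a real service would use a parser.

# Opening or closing block-level tags that delimit candidate chunks (assumed list).
BLOCK_TAGS = %r{</?(?:div|p|header|footer|nav|aside|article|section|ul|ol|li|td|h[1-6])\b[^>]*>}i

# Remove all tags and collapse runs of whitespace into single spaces.
def strip_tags(html)
  html.gsub(/<[^>]*>/, " ").gsub(/\s+/, " ").strip
end

# Count the visible characters that sit inside <a> elements.
def link_chars(html)
  html.scan(%r{<a\b[^>]*>(.*?)</a>}im).sum { |(inner)| strip_tags(inner).length }
end

# Keep chunks with at least min_words words whose link-text ratio is at most
# max_link_density. Both thresholds are arbitrary assumptions.
def extract_main_content(html, min_words: 10, max_link_density: 0.33)
  # Drop <script> and <style> bodies entirely; their text is never content.
  cleaned = html.gsub(%r{<(script|style)\b[^>]*>.*?</\1>}im, " ")
  kept = cleaned.split(BLOCK_TAGS).filter_map do |chunk|
    text = strip_tags(chunk)
    next if text.split.size < min_words
    density = link_chars(chunk).to_f / text.length
    text if density <= max_link_density
  end
  kept.join("\n\n")
end
```

Navigation bars and "related links" boxes tend to be short or made almost entirely of links, so both filters usually drop them. Real extractors such as Readability combine many more signals (class names, DOM depth, punctuation counts) before settling on an answer.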

via scraplab — Extractomatic in Sinatra on JRuby on Google App Engine on the Internet.