About InfoExtractor

InfoExtractor is a framework to extract relevant information from various sources such as blogs, YouTube, and Twitter.
As a web service, InfoExtractor currently understands video pages and user profile pages on YouTube, Wikipedia entries, Blogcatalog, Huffington Post, The Foundry, and Facebook profile pages.

  1. How do I use it?
    Simply enter a URL in the textbox below and InfoExtractor will extract relevant information from that page. You can download this information in text, CSV, or XML format. Alternatively, you can upload a text file with a set of URLs to process and download the extracted information in XML.
  2. How much does it cost?
    Nothing! Yes, InfoExtractor is absolutely free to use for non-commercial purposes. See the license page for more details.
  3. I want to extract information from some other source. Can InfoExtractor help?
    We are constantly working to support sources beyond those InfoExtractor currently handles. If you have a specific requirement, you can send us a request.


Enter a web address (URL)

InfoExtractor currently understands the following URLs:
  • YouTube video pages
  • YouTube user profile pages
  • Facebook profiles and pages
  • Wikipedia entries
  • Huffington Post posts
  • Blogcatalog blog posts
  • The Heritage Foundation blog (The Foundry)

Upload a file to process

The file should be in plain text format.
Put one URL per line.
See an example file.
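
As an illustration of the upload format described above (plain text, one URL per line), here is a minimal Python sketch that prepares such a file. The function name `build_url_file`, the output filename `urls.txt`, and the sample URLs are all assumptions made for this example; they are not part of InfoExtractor. The check that each entry is an http or https address is likewise a convenience of the sketch, not a documented requirement of the service.

```python
from urllib.parse import urlparse


def build_url_file(urls, path):
    """Write URLs to a plain-text file, one per line (hypothetical helper)."""
    cleaned = []
    for url in urls:
        url = url.strip()
        if not url:
            continue  # skip blank entries so the file has no empty lines
        parts = urlparse(url)
        # Assumption of this sketch: only http(s) web addresses are accepted.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not a web address: {url!r}")
        cleaned.append(url)
    with open(path, "w", encoding="utf-8") as handle:
        for url in cleaned:
            handle.write(url + "\n")
    return cleaned


if __name__ == "__main__":
    # Placeholder URLs for illustration only.
    example_urls = [
        "https://www.youtube.com/watch?v=VIDEO_ID",
        "https://en.wikipedia.org/wiki/Information_extraction",
    ]
    build_url_file(example_urls, "urls.txt")
```

The resulting `urls.txt` can then be uploaded through the form above.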

Check out our Facebook Harvester, soon to be integrated into InfoExtractor.
Also coming soon - The New York Times Crawler.


Chirag Shah