InfoExtractor is a framework to extract relevant information from various sources such as blogs, YouTube, and Twitter.

As a web service, InfoExtractor extracts structured information from a URL you supply. For example, given the URL of a YouTube video, it returns a number of associated attributes (title, tags, view count, comments, etc.) in a format that can easily be exported, analyzed, or plugged into another tool. Try it right here!


Enter a web address (URL)

InfoExtractor currently understands the following URLs:
  • YouTube video pages
  • YouTube user profile pages
  • Facebook profiles and pages
  • Wikipedia entries
  • Huffington Post posts
  • BlogCatalog blog posts
  • The Heritage Foundation blog (The Foundry)
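
To decide ahead of time whether a URL falls into one of the categories above, a client could check its hostname. The following is only a sketch: `SUPPORTED_HOSTS` and `classify_url` are hypothetical names, the domain matching is an assumption and not InfoExtractor's actual logic, and The Foundry is left out because its hostname is not given here.

```python
from urllib.parse import urlparse

# Hypothetical mapping from a site's domain to a source label.
# The real service's matching rules are not documented on this page.
SUPPORTED_HOSTS = {
    "youtube.com": "youtube",
    "facebook.com": "facebook",
    "wikipedia.org": "wikipedia",
    "huffingtonpost.com": "huffingtonpost",
    "blogcatalog.com": "blogcatalog",
}


def classify_url(url):
    """Return a source label for a recognized URL, or None otherwise."""
    host = (urlparse(url).hostname or "").lower()
    for domain, label in SUPPORTED_HOSTS.items():
        # Match the bare domain or any subdomain of it (www., en., ...).
        if host == domain or host.endswith("." + domain):
            return label
    return None
```

A hostname check like this cannot tell a YouTube video page from a user profile page; that would need a further look at the URL path.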

Upload a file to process

The file should be in plain text format.
Put one URL per line.
See an example file.
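
As an illustration of the file format described above (plain text, one URL per line), the sketch below reads such a listing and keeps only lines that look like absolute web addresses. `parse_url_list` is a hypothetical helper, the sample URLs are placeholders, and skipping blank or malformed lines is an assumption about how such a file might reasonably be handled, not a description of what InfoExtractor does.

```python
from urllib.parse import urlparse


def parse_url_list(text):
    """Return the URLs found in a plain-text listing, one per line."""
    urls = []
    for line in text.splitlines():
        candidate = line.strip()
        if not candidate:
            continue  # skip blank lines
        parts = urlparse(candidate)
        # Assumption: only absolute http(s) URLs are accepted.
        if parts.scheme in ("http", "https") and parts.netloc:
            urls.append(candidate)
    return urls


# Placeholder content standing in for an uploaded file.
sample = """https://www.youtube.com/watch?v=VIDEO_ID
https://en.wikipedia.org/wiki/Information_extraction

not-a-url
"""

print(parse_url_list(sample))
```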

Check out our Facebook Harvester, soon to be integrated into InfoExtractor.
Also coming soon: The New York Times Crawler.
Chirag Shah