Algorithmia Blog

Rapidly Extract Information from Public Websites

We have a lot of fun, heavy-hitting algorithms in our marketplace: deep-learning tools like Image Tagger and pipelining mechanisms such as Video Metadata Extraction are designed to bring the power of Machine Learning to your app via easy-to-use APIs.

But sometimes, all you need to do is extract some simple information from publicly available sources: for example, finding all the email addresses of a company’s C-Suite, or summarizing the topic pages of a FAQ. You could accomplish some of it with a Python script and some RegEx magic, but that wouldn’t bring the benefits of a remote API: datacenter-grade network connections, multiple IPs, and distributed parallel processing. And it wouldn’t give you access to more complex algos such as automatic tagging or sentiment analysis. With Algorithmia, you get all the benefits of the cloud without having to build and host your own workers, plus the combined expertise of our growing network of algorithm developers.
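For comparison, here is roughly what the do-it-yourself "Python script and some RegEx magic" route might look like. This is only a minimal sketch: the `EMAIL_RE` pattern and the `extract_emails` helper are our own illustrative names (not part of any Algorithmia API), the pattern is deliberately simplistic, and it runs locally on a string you have already downloaded, so it gets none of the remote-API benefits described above.

```python
import re

# Deliberately simple pattern (an assumption for illustration);
# real-world address syntax is far more permissive than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(text):
    """Return the unique email-like strings in text, in order of first appearance."""
    seen = set()
    found = []
    for match in EMAIL_RE.findall(text):
        address = match.lower()  # treat CTO@... and cto@... as the same address
        if address not in seen:
            seen.add(address)
            found.append(address)
    return found


# Hypothetical page content; in practice this would be HTML you fetched yourself.
sample_html = """
<p>Contact our CEO at <a href="mailto:ceo@example.com">ceo@example.com</a>
or the CTO at CTO@example.com.</p>
"""

print(extract_emails(sample_html))
```

A script like this is fine for a single page, but you would still have to write the crawling, retrying, and parallelization yourself.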

Site Spidering

Page Summarization, Tagging, and Analysis

Data Extraction

Miscellany

As with any content you might acquire, be sure you have permission to use and/or republish anything you grab. We make it easy to pull down information, but we don’t want to see you caught up in a copyright dispute or fielding complaints about sending unsolicited email!

Have fun out there, and if you build something awesome, let us know!

-icon comes from nounproject