Algorithmia Blog

Introduction to Automatic Text Summarization

Sifting through lots of documents can be difficult and time consuming. Without an abstract or summary, it can take minutes just to figure out what the heck someone is talking about in a paper or report.

And, if you need to get through hundreds of documents – good luck.

Summarizer is an algorithm that extracts sentences from a text document, determines which are most important, and returns them in a readable and structured way.

Automatic text summarization is part of the field of natural language processing, which studies how computers can analyze, understand, and derive meaning from human language.

What Is Automatic Text Summarization?

Summarizer is a microservice that uses the Classifier4J framework and its summarization module to scan through large documents and return the sentences that are most likely to be useful for generating a summary.

Automatic summarization of text works by first calculating the word frequencies for the entire text document. Then, the 100 most common words are stored and sorted. Each sentence is then scored based on how many high-frequency words it contains, with higher-frequency words being worth more. Finally, the X highest-scoring sentences are selected and returned in the order in which they appear in the original text.
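To make those steps concrete, here is a minimal Python sketch of a frequency-based summarizer. It follows the description above rather than Classifier4J's actual source code, so treat it as an illustration only. The function name summarize, its parameters, the naive sentence-splitting rule, and the exact scoring formula (summing the document-wide count of each frequent word in a sentence) are all assumptions made for this example.

```python
import re
from collections import Counter


def summarize(text, num_sentences=3, top_words=100):
    """Return the highest-scoring sentences of `text`, in their original order.

    Hypothetical helper for illustration; not the Summarizer implementation.
    """
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # Step 1: word frequencies for the entire document.
    words = re.findall(r"\w+", text.lower())

    # Step 2: keep only the most common words (100 by default) and their counts.
    frequent = dict(Counter(words).most_common(top_words))

    # Step 3: a sentence's score is the sum of the counts of the frequent
    # words it contains, so higher-frequency words are worth more.
    def score(sentence):
        return sum(frequent.get(w, 0) for w in re.findall(r"\w+", sentence.lower()))

    # Step 4: take the top N sentences by score, then restore document order.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]),
                    reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)


if __name__ == "__main__":
    sample = "Cats purr. Cats and dogs play. Birds sing. Cats chase birds and dogs."
    print(summarize(sample, num_sentences=2))
    # prints: Cats and dogs play. Cats chase birds and dogs.
```

Note that this sketch does no stop-word filtering and does not normalize for sentence length, so long sentences full of common words tend to win. A production summarizer would likely handle both.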

By keeping things simple and general purpose, the automatic text summarization algorithm is able to function in a variety of situations that other implementations might struggle with, such as documents containing foreign languages or unique word associations that aren’t found in standard English-language corpora.

Why You Need Text Summarization

Business leaders, analysts, paralegals, and academic researchers need to comb through huge numbers of documents every day to keep ahead, and a large portion of their time is spent just figuring out which documents are relevant and which aren’t. By extracting important sentences and creating comprehensive summaries, it’s possible to quickly assess whether or not a document is worth reading.

Automatic text summarization is also useful for students and authors. Imagine being able to automatically generate a clear and concise abstract for your research paper or book chapter that is faithful to the original source material!

How Do I Use Summarizer?

Using Summarizer is easy: all you need to do is provide the text you want to summarize as a string, and it’ll take it from there. Don’t forget: you need a free Algorithmia API key.

Sample Input

import Algorithmia

input = "In the history of artificial intelligence, an AI winter is a period of " + \
"reduced funding and interest in artificial intelligence research. The term was coined by analogy " + \
"to the idea of a nuclear winter. The field has experienced several hype cycles, followed by disappointment and criticism, " + \
"followed by funding cuts, followed by renewed interest years or decades later. " + \
"The term first appeared in 1984 as the topic of a public debate at the annual meeting of AAAI "+\
"(then called the \"American Association of Artificial Intelligence\"). "+\
"It is a chain reaction that begins with pessimism in the AI community, followed by pessimism in the press, "+\
"followed by a severe cutback in funding, followed by the end of serious research. "+\
"At the meeting, Roger Schank and Marvin Minsky—two leading AI researchers who had survived "+\
"the \"winter\" of the 1970s—warned the business community that enthusiasm for AI had spiraled out of control in "+\
"the '80s and that disappointment would certainly follow. Three years later, the billion-dollar AI industry began "+\
"to collapse. Hypes are common in many emerging technologies, such as the railway mania or the dot-com bubble. "+\
"An AI winter is primarily a collapse in the perception of AI by government bureaucrats and venture capitalists. "+\
"Despite the rise and fall of AI's reputation, it has continued to develop new and successful technologies. "+\
"AI researcher Rodney Brooks would complain in 2002 that \"there's this stupid myth out there that AI has failed, "+\
"but AI is around you every second of the day. In 2005, Ray Kurzweil agreed: Many observers still think that the "+\
"AI winter was the end of the story and that nothing since has come of the AI field. "+\
"Yet today many thousands of AI applications are deeply embedded in the infrastructure of every industry. "+\
"He added: the AI winter is long since over."
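# Replace the placeholder with your own Algorithmia API key.
client = Algorithmia.client('API KEY HERE')
algo = client.algo('nlp/Summarizer/0.1.3')
# pipe() sends the text to the algorithm; .result holds the returned summary.
print(algo.pipe(input).result)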

Sample Output

"In the history of artificial intelligence, an AI winter is a period of reduced funding and interest in artificial intelligence research. The term was coined by analogy to the idea of a nuclear winter. The field has experienced several hype cycles, followed by disappointment and criticism, followed by funding cuts, followed by renewed interest years or decades later."

That was pretty painless. Now you have a tool for automatic text summarization you can use to summarize any kind of text in any language.

Want to get even more information from your text? Take a look at our implementations of the Named Entity Recognition and Parsey McParseface algorithms.