All posts by James Sutton

Introduction to Language Identification

Identifying the language of text programmatically
Quick, what language is this sentence written in?

“Hey bana bir sorununuz olur mu?”

What about this one?

“Halló ég er með vandamál getur þú hjálpað mér?”

Not easy, right?

Figuring out a document’s source language is an essential first step for many cross-language tools, and that’s why we’ve implemented a Language Identification algorithm.
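To make the idea concrete, here is a toy sketch of one simple way to guess a language: count how many of a text's words appear in a short list of very common words for each candidate language. This is not the Language Identification algorithm from the post, whose internals are not described here. The function name `identify_language`, the language codes, and the tiny hand-picked word lists are all assumptions for illustration only.

```python
import re

# Hand-picked, illustrative lists of very common words (an assumption,
# far from exhaustive; real systems learn such statistics from large corpora).
COMMON_WORDS = {
    "en": {"the", "is", "and", "to", "of", "you", "a"},
    "tr": {"bir", "bu", "ve", "mu", "bana", "ne", "için"},
    "is": {"ég", "er", "með", "þú", "og", "að", "ekki"},
}


def identify_language(text):
    """Return the code of the language whose common words match most often,
    or None when no word in the text matches any list."""
    # \w is Unicode-aware in Python 3, so letters like "þ" and "ð" are kept.
    tokens = re.findall(r"\w+", text.lower())
    scores = {
        lang: sum(token in words for token in tokens)
        for lang, words in COMMON_WORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(identify_language("Hey bana bir sorununuz olur mu?"))
print(identify_language("Halló ég er með vandamál getur þú hjálpað mér?"))
```

A word-list approach like this falls apart on short or unusual text, which is one reason practical identifiers tend to rely on statistics over character sequences instead.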

Introduction to Automatic Text Summarization

Sifting through lots of documents can be difficult and time-consuming. Without an abstract or summary, it can take minutes just to figure out what the heck someone is talking about in a paper or report.

And, if you need to get through hundreds of documents – good luck.

Summarizer is an algorithm that extracts sentences from a text document, determines which are most important, and returns them in a readable and structured way.
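
The three steps described above (extract sentences, decide which matter most, return them in readable order) can be sketched with a simple word-frequency heuristic. This is a minimal illustration and not the Summarizer algorithm itself, whose scoring method is not described here. The function name `summarize`, the `max_sentences` parameter, and the small stopword list are assumptions.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
    "that", "this", "for", "on", "with", "as", "are", "was", "be",
}


def content_words(sentence):
    """Lowercased words of a sentence, with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOPWORDS]


def summarize(text, max_sentences=2):
    """Return up to max_sentences of the highest-scoring sentences,
    kept in their original order."""
    # Step 1: extract sentences with a naive split on ., ! or ?
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text)
                 if s.strip()]

    # Step 2: score each sentence by the average document-wide
    # frequency of its content words.
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)

    # Step 3: return the top sentences in document order for readability.
    return [sentences[i] for i in sorted(ranked[:max_sentences])]


text = "Cats sleep a lot. Cats chase mice. Dogs bark."
print(summarize(text, max_sentences=2))
```

Averaging by sentence length keeps long sentences from winning just because they contain more words.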