Algorithmia Blog

Understand Customer Data Using Time Series and Sentiment Analysis

While data science offers many ways to visualize and make predictions with your customer data, most can be time-consuming. Worse, you often can’t reuse your code with other datasets.

Sentiment Time Series is a microservice that can be used on a variety of datasets to process unstructured text and return a sentiment time series plot along with the sentiment frequencies. Since the microservice handles most of the data processing via an API call, you can spend more time concentrating on your analysis and less time writing code.

By combining time series analysis with natural language processing, we’re able to show how the sentiment of unstructured text data changes over time, as well as use it to predict future data trends.

For instance, we could look at the sentiment time series plot of customer support questions to understand the positive, negative, and neutral trends. This allows us to make better business decisions (e.g. identify the impact of new features in our product) and improve our interactions with customers by providing better support. This same pipeline could be used in numerous situations, from brand managers wanting to track the social sentiment of mentions on Twitter, to analyzing the message history of a Slack channel.

What is the Sentiment Time Series Algorithm?

The Sentiment Time Series algorithm is a microservice that combines the Social Sentiment Analysis algorithm with the R libraries dplyr, plyr, and rjson to produce a sentiment plot showing positive, negative, and neutral trends. The API returns a JSON file with the frequencies grouped by sentiment and the corresponding dates.

The microservice groups the data by date and sentiment to create counts of the positive, negative, and neutral sentiments, which can be plotted together on a single plot. This also makes the data ready for use with algorithms like Forecast, Simple Moving Average, Linear Detrend, or Remove Seasonality to predict trends and understand how sentiment is changing with our data over time.
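
The grouping step described above can be sketched in Python. This is a minimal illustration, not the microservice's actual R implementation: the function name `count_by_date_and_sentiment` and the input shape (a list of date/label pairs) are assumptions made for the example. The output mirrors the pos/neg/neu layout of the frequency file shown later in this post.

```python
from collections import Counter

def count_by_date_and_sentiment(rows):
    """Group (date, sentiment) pairs into per-sentiment date/frequency lists.

    `rows` is an iterable of (date_string, label) tuples, where label is
    one of "pos", "neg", or "neu". Both the function name and the input
    shape are hypothetical, chosen for illustration.
    """
    counts = Counter(rows)  # (date, label) -> number of comments
    result = {label: {"tm": [], "freq": []} for label in ("pos", "neg", "neu")}
    # Sort for a stable order (string order here, not necessarily chronological).
    for (date, label), n in sorted(counts.items()):
        result[label]["tm"].append(date)
        result[label]["freq"].append(n)
    return result

rows = [
    ("01/01/2016", "pos"),
    ("01/01/2016", "pos"),
    ("01/21/2016", "pos"),
    ("02/13/2016", "neg"),
    ("01/02/2016", "neu"),
]
frequencies = count_by_date_and_sentiment(rows)
print(frequencies["pos"])
```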

With Social Sentiment Analysis, we’re using an implementation of VADER, a rule-based model, to get the sentiment score of unstructured but cleaned text data. The model was designed for short text documents, like tweets and chat logs.
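
To show how a VADER-style score could become one of the three labels, here is a small Python sketch. It assumes the score is a compound value between -1 and 1 and uses the 0.05 cutoffs commonly suggested for VADER. The cutoffs that the Social Sentiment Analysis algorithm actually applies are not stated in this post, so treat the function name `label_from_compound` and the threshold as assumptions.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a compound sentiment score in [-1, 1] to "pos", "neg", or "neu".

    The 0.05 threshold is an assumed convention, not a value confirmed
    for the Social Sentiment Analysis algorithm.
    """
    if compound >= threshold:
        return "pos"
    if compound <= -threshold:
        return "neg"
    return "neu"

labels = [label_from_compound(score) for score in (0.62, -0.4, 0.01)]
print(labels)
```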

How to Use Sentiment Time Series

This microservice is provided as an API endpoint. To use it, you’ll need a free Algorithmia account. Next, choose and install the Algorithmia client for the programming language you’re comfortable with. Now you’re ready to make your first API call.

To check for sentiment over time, we pass the algorithm a JSON object with several required arguments, as in the sample input below.

Sample Input

{
    "input_file": "data://username/data_collection_name/time_comments.csv",
    "output_plot": "data://username/data_collection_name/sent_timeseries_plot.png",
    "output_file": "data://username/data_collection_name/sent_freq_file.json",
    "start": [2015, 4],
    "end": [2016, 9],
    "freq": 12,
    "dt_format": "%m/%Y",
    "tm_zone": "GMT"
}

Sample API Call in Python

import Algorithmia

# `input` holds the sample input above, written as a Python dict
client = Algorithmia.client('your_api_key')
algo = client.algo('nlp/SentimentTimeSeries/0.1.0')
print(algo.pipe(input).result)
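
The Python call above expects `input` to be defined already. One way to build it is as a plain dict mirroring the sample input. The helper `build_input` and its required-key check are illustrative assumptions, not part of the Algorithmia client, and the comments on `start`, `end`, and `freq` reflect how R's `ts()` reads such values rather than documented behavior of this algorithm.

```python
REQUIRED_KEYS = (
    "input_file", "output_plot", "output_file",
    "start", "end", "freq", "dt_format", "tm_zone",
)

def build_input(collection, **overrides):
    """Return an input dict shaped like the sample input above.

    `collection` is a data URI prefix such as
    "data://username/data_collection_name". This helper is hypothetical.
    """
    params = {
        "input_file": f"{collection}/time_comments.csv",
        "output_plot": f"{collection}/sent_timeseries_plot.png",
        "output_file": f"{collection}/sent_freq_file.json",
        "start": [2015, 4],    # assumed to be [year, month], as in R's ts()
        "end": [2016, 9],      # assumed to be [year, month], as in R's ts()
        "freq": 12,            # assumed to mean observations per year
        "dt_format": "%m/%Y",
        "tm_zone": "GMT",
    }
    params.update(overrides)
    # Treating every key as required is an assumption for this sketch.
    missing = [key for key in REQUIRED_KEYS if params.get(key) is None]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return params

# Named `input` to match the sample call (this shadows Python's built-in).
input = build_input("data://username/data_collection_name")
```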

Sample API Call in R 

library(algorithmia)

input <- list(input_file="data://username/data_collection_name/time_comments.csv",
              output_plot="data://username/data_collection_name/sent_timeseries_plot.png",
              output_file="data://username/data_collection_name/sent_freq_file.json",
              start=c(2015, 4),
              end=c(2016, 9),
              freq=12,
              dt_format="%m/%Y",
              tm_zone="GMT")

client <- getAlgorithmiaClient("your_algorithmia_api_key")
algo <- client$algo("nlp/SentimentTimeSeries/0.1.0")
results <- algo$pipe(input)$result

Sentiment Time Series Output

This algorithm outputs two files: an R-generated plot of the sentiment time series and a JSON file of sentiment frequencies.

Sentiment Time Series Plot

The sentiment time series plot shows the frequency on the y-axis and the time in numeric form (e.g. 2016.05 for May of 2016) on the x-axis. The positive sentiment line is shown in green, the negative in red, and the neutral in blue.

Sentiment Time Series Frequency JSON File

The JSON file can be used with any of the forecasting and smoothing algorithms mentioned above. It is split into positive, negative, and neutral time series, each with its dates and the corresponding frequencies.

{
   "pos":{"tm":["01/01/2016","01/21/2016"], "freq":[2,1]},
   "neg":{"tm":["02/13/2016","02/18/2016"], "freq":[1,1]},
   "neu":{"tm":["01/02/2016","01/05/2016"], "freq":[1,1]}
}
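
Before handing this file to a forecasting or smoothing algorithm, it can help to load it and check each series. The Python sketch below parses the sample output above. It assumes the dates follow the month/day/year pattern seen in the sample, and the helper name `load_series` is hypothetical.

```python
import json
from datetime import datetime

SAMPLE = """
{
   "pos":{"tm":["01/01/2016","01/21/2016"], "freq":[2,1]},
   "neg":{"tm":["02/13/2016","02/18/2016"], "freq":[1,1]},
   "neu":{"tm":["01/02/2016","01/05/2016"], "freq":[1,1]}
}
"""

def load_series(text, date_format="%m/%d/%Y"):
    """Parse the frequency JSON into {label: [(date, count), ...]}.

    `date_format` is an assumption based on the sample output; adjust it
    to match the dates in your own file.
    """
    raw = json.loads(text)
    series = {}
    for label, data in raw.items():
        dates = [datetime.strptime(tm, date_format).date() for tm in data["tm"]]
        # Pair each date with its count and sort chronologically.
        series[label] = sorted(zip(dates, data["freq"]))
    return series

series = load_series(SAMPLE)
# Total number of comments per sentiment across all dates.
totals = {label: sum(count for _, count in points) for label, points in series.items()}
print(totals)
```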

In a future post, we’ll show you how to tie this to the Forecast algorithm to create a full pipeline. The pipeline will read your customer comments, tweets, Slack chats, or other unstructured text data, find the positive, negative, and neutral frequencies, plot them, and predict future sentiment frequencies.

Happy coding!