Algorithmia

Understand Customer Data Using Time Series and Sentiment Analysis

While data science offers many ways to visualize and make predictions with your customer data, most can be time-consuming. Worse, you often can’t reuse your code with other datasets.

Sentiment Time Series is a microservice that can be used on a variety of datasets to process unstructured text and return a sentiment time series plot and frequency. Since the microservice handles most of the data processing via an API call, you can spend more time concentrating on your analysis and less time writing code. 

By combining time series analysis with natural language processing, we’re able to show how the sentiment of unstructured text data changes over time, as well as use it to predict future data trends.

For instance, we could look at the sentiment time series plot of customer support questions to understand the positive, negative, and neutral trends. This allows us to make better business decisions (e.g. identify the impact of new features in our product) and improve our interactions with customers by providing better support. The same pipeline could be used in numerous situations, from tracking the social sentiment of brand mentions on Twitter to analyzing the message history of a Slack channel.

What is the Sentiment Time Series Algorithm?

The Sentiment Time Series algorithm is a microservice that combines the Social Sentiment Analysis algorithm with R's time series tooling and the R libraries dplyr, plyr, and rjson to produce a sentiment plot showing positive, negative, and neutral trends. The API returns a JSON file with the frequencies grouped by sentiment and the corresponding dates.

The microservice groups the data by date and sentiment to create counts of the positive, negative, and neutral sentiments, which can be drawn together on a single plot. This also makes the data ready for use with algorithms like Forecast, Simple Moving Average, Linear Detrend, or Remove Seasonality to predict trends and understand how sentiment in our data is changing over time.
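
To make the grouping step concrete, here is a minimal Python sketch of counting labeled comments by date and sentiment. It is not the microservice's actual R code: the function name sentiment_frequencies and the sample records are made up for illustration, and the output layout simply mirrors the frequency JSON shown later in this post.

```python
from collections import Counter

def sentiment_frequencies(records):
    """Count how often each sentiment label occurs on each date.

    `records` is an iterable of (date_string, label) tuples, where label is
    "pos", "neg", or "neu". The function name and layout are illustrative.
    """
    counts = Counter(records)  # maps (date, label) -> number of occurrences
    result = {label: {"tm": [], "freq": []} for label in ("pos", "neg", "neu")}
    # Plain string sorting is only chronological for this small sample
    # (same year, zero-padded dates); real code should parse the dates first.
    for (date, label), n in sorted(counts.items()):
        result[label]["tm"].append(date)
        result[label]["freq"].append(n)
    return result

# Hypothetical, already-labeled comments.
records = [
    ("01/01/2016", "pos"),
    ("01/01/2016", "pos"),
    ("01/02/2016", "neu"),
    ("01/21/2016", "pos"),
    ("02/13/2016", "neg"),
]
freqs = sentiment_frequencies(records)
print(freqs["pos"])
```

Each sentiment ends up with two parallel lists, dates and counts, which is the shape a plotting or forecasting step can consume directly.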

With Social Sentiment Analysis, we’re using an implementation of VADER, a rule-based model, to get the sentiment score of unstructured but cleaned text data. The model was developed for short text documents, like tweets and chat logs.
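
A sentiment score still has to be turned into a positive, negative, or neutral label before it can be counted. The sketch below uses the cutoffs commonly suggested for VADER's compound score (at or above 0.05 is positive, at or below -0.05 is negative). Whether the microservice applies these exact cutoffs is an assumption, and label_from_compound is a hypothetical helper name.

```python
def label_from_compound(score, threshold=0.05):
    """Map a compound sentiment score in [-1, 1] to "pos", "neg", or "neu".

    The 0.05 threshold is the conventional VADER cutoff; treating it as the
    microservice's own rule is an assumption made for illustration.
    """
    if score >= threshold:
        return "pos"
    if score <= -threshold:
        return "neg"
    return "neu"

# Hypothetical compound scores for three comments.
labels = [label_from_compound(s) for s in (0.62, -0.41, 0.0)]
print(labels)
```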

How to Use Sentiment Time Series

To check for sentiment over time, we pass the algorithm a JSON object with several required arguments:

  • input_file: Path to your CSV file containing the timestamps and text data (e.g. comments or messages). The timestamp comes first. There cannot be any headers in this file.
  • output_plot: The location to save the sentiment time series plot. It will need to be stored in your Algorithmia data collection. For more, check out the hosted data docs.
  • output_file: The location to save the sentiment time series file that holds the JSON data. Like the output_plot, this needs to be saved in your data collection.
  • start: A year-month array, such as [2015, 4], that marks the start date in the sentiment time series plot. You can also pass just the year.
  • end: A year-month array, such as [2016, 9], that marks the end date in the sentiment time series plot.
  • freq: The number of observations per unit of time. For example, if your data was collected once per day, then the frequency would be 365. If it was once per month, the freq would be 12.
  • dt_format: The date format that your data is in. For example, if your dates look like 5/26/2016, then your dt_format should be: "dt_format": "%m/%d/%Y"
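
Since the input file has to be header-less with the timestamp in the first column, it can help to check your data before uploading it. Here is a small Python sketch that does so; write_input_csv and the sample rows are hypothetical, and it assumes Python's strptime codes match the dt_format codes the algorithm expects, which holds for common ones like %m, %d, and %Y.

```python
import csv
import io
from datetime import datetime

def write_input_csv(rows, dt_format, out):
    """Write (timestamp, text) rows with no header row, timestamp first."""
    writer = csv.writer(out)
    for timestamp, text in rows:
        # Raises ValueError if the timestamp does not match dt_format.
        datetime.strptime(timestamp, dt_format)
        writer.writerow([timestamp, text])

# Hypothetical customer comments.
rows = [
    ("5/26/2016", "Love the new dashboard!"),
    ("5/27/2016", "Export keeps failing, please fix."),
]
buffer = io.StringIO()
write_input_csv(rows, "%m/%d/%Y", buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Writing to an in-memory buffer keeps the sketch self-contained; in practice you would pass an open file and then upload it to your data collection.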

This microservice is provided as an API endpoint. To use it, you’ll need a free Algorithmia account. Next, choose and install the Algorithmia client for the programming language you’re comfortable with. Now you’re ready to make your first API call.

Sample Input

{
    "input_file": "data://username/data_collection_name/time_comments.csv",
    "output_plot": "data://username/data_collection_name/sent_timeseries_plot.png",
    "output_file": "data://username/data_collection_name/sent_freq_file.json",
    "start": [2015, 4],
    "end": [2016, 9],
    "freq": 12,
    "dt_format": "%m/%Y",
    "tm_zone": "GMT"
}

Sample API Call in Python

import Algorithmia

# `input` is the Sample Input above, written as a Python dict.
client = Algorithmia.client('your_api_key')
algo = client.algo('nlp/SentimentTimeSeries/0.1.0')
print(algo.pipe(input).result)
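
If you call the algorithm for several data collections, it may be convenient to build the input from a few parameters. The following is a sketch under stated assumptions: build_input and run_sentiment_time_series are hypothetical helpers, the file names are taken from the sample input above, and the API call itself is wrapped in a function so nothing hits the network until you call it with a real key.

```python
def build_input(collection, start, end, freq=12, dt_format="%m/%Y", tm_zone="GMT"):
    """Assemble the JSON input as a Python dict (hypothetical helper).

    `collection` looks like "username/data_collection_name"; the file names
    below are the ones used in the sample input and can be changed freely.
    """
    base = "data://" + collection.strip("/")
    return {
        "input_file": base + "/time_comments.csv",
        "output_plot": base + "/sent_timeseries_plot.png",
        "output_file": base + "/sent_freq_file.json",
        "start": list(start),
        "end": list(end),
        "freq": freq,
        "dt_format": dt_format,
        "tm_zone": tm_zone,
    }

def run_sentiment_time_series(api_key, payload):
    """Send the payload to the algorithm and return its result."""
    import Algorithmia  # imported here so building the payload works without the client installed
    client = Algorithmia.client(api_key)
    algo = client.algo("nlp/SentimentTimeSeries/0.1.0")
    return algo.pipe(payload).result

payload = build_input("username/data_collection_name", [2015, 4], [2016, 9])
print(payload["input_file"])
# With a real key: run_sentiment_time_series("your_api_key", payload)
```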

Sample API Call in R 

library(algorithmia)

# Same fields as the Sample Input above
input <- list(input_file="data://username/data_collection_name/time_comments.csv",
              output_plot="data://username/data_collection_name/sent_timeseries_plot.png",
              output_file="data://username/data_collection_name/sent_freq_file.json",
              start=c(2015, 4),
              end=c(2016, 9),
              freq=12,
              dt_format="%m/%Y",
              tm_zone="GMT")

client <- getAlgorithmiaClient("your_algorithmia_api_key")
algo <- client$algo("nlp/SentimentTimeSeries/0.1.0")
results <- algo$pipe(input)$result

Sentiment Time Series Output

This algorithm outputs two files: an R-generated plot of the sentiment time series and a JSON file containing the sentiment frequencies.

Sentiment Time Series Plot

The sentiment time series plot shows the frequency on the y-axis and the time in numeric form (e.g. 2016.05 for May of 2016) on the x-axis. The positive sentiment line is shown in green, the negative in red, and the neutral in blue.

Figure: sentiment time series plot

Sentiment Time Series Frequency JSON File

The JSON file can be used with any of the forecasting and smoothing algorithms mentioned above. It is split into positive, negative, and neutral time series, each with its dates and the corresponding frequencies.

{
   "pos":{"tm":["01/01/2016","01/21/2016"], "freq":[2,1]},
   "neg":{"tm":["02/13/2016","02/18/2016"], "freq":[1,1]},
   "neu":{"tm":["01/02/2016","01/05/2016"], "freq":[1,1]}
}
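
Before handing this file to another algorithm, you may want to inspect it locally. Here is a minimal Python sketch that parses the sample output above into sorted (date, frequency) pairs and totals each sentiment. The load_series helper is hypothetical, and the date format is assumed to be %m/%d/%Y to match the sample.

```python
import json
from datetime import datetime

# The sample output shown above, as a JSON string.
sample = """
{
   "pos":{"tm":["01/01/2016","01/21/2016"], "freq":[2,1]},
   "neg":{"tm":["02/13/2016","02/18/2016"], "freq":[1,1]},
   "neu":{"tm":["01/02/2016","01/05/2016"], "freq":[1,1]}
}
"""

def load_series(raw, dt_format="%m/%d/%Y"):
    """Parse the frequency JSON into {label: [(date, freq), ...]}, sorted by date."""
    data = json.loads(raw)
    series = {}
    for label, values in data.items():
        dates = [datetime.strptime(t, dt_format).date() for t in values["tm"]]
        series[label] = sorted(zip(dates, values["freq"]))
    return series

series = load_series(sample)
# Total number of comments per sentiment across the whole period.
totals = {label: sum(freq for _, freq in points) for label, points in series.items()}
print(totals)  # {'pos': 3, 'neg': 2, 'neu': 2}
```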

In a future post, we’ll show you how to tie this to the Forecast algorithm to create a full pipeline. It will read your customer comments, tweets, Slack chats, or other unstructured text data, find the positive, negative, and neutral frequencies, plot them, and predict future sentiment frequencies.

Happy coding!