Algorithmia Blog - Deploying AI at scale

Using R to Build a Sentiment Analysis Forecasting Pipeline

Time series forecasting algorithms are a common method for predicting future values from historical, sequential data, such as snowfall per hour (anyone ready for snowboarding season?), customer sign-ups per day, or quarterly sales. In this R recipe, we'll show how to link algorithms together into a data analysis pipeline for forecasting sentiment time series.

In a previous post, we introduced the Sentiment Time Series algorithm, which scores the sentiment of unstructured text and creates a time series object. Its output is a sentiment time series plot and a JSON file containing the positive, neutral, and negative sentiment frequency counts along with their timestamps.

Now we'll show how to integrate it into your R project and build a pipeline that forecasts sentiment time series with the Forecast algorithm. Forecasting sentiment is useful when the data has a seasonal component, in use cases such as scheduling call center employees for a retail business, gauging market sentiment for stock market prediction, or adjusting social media marketing campaigns based on sentiment forecasts.
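The Forecast algorithm's internal method isn't described in this post, but as a local point of comparison, here is a minimal sketch of a seasonal forecast in base R using HoltWinters() from the stats package. It is not the Forecast algorithm: the daily counts are synthetic, and frequency = 7 assumes a weekly season.

```r
# Sketch: seasonal forecasting with base R's HoltWinters(), NOT the Forecast algorithm.
library(stats)

# Hypothetical daily sentiment counts with a weekly pattern (frequency = 7)
set.seed(42)
weekly_pattern <- c(20, 22, 25, 24, 30, 40, 35)
counts <- rep(weekly_pattern, times = 8) + rpois(7 * 8, lambda = 3)
sent_ts <- ts(counts, frequency = 7)

# Fit level, trend and additive seasonal components
fit <- HoltWinters(sent_ts, seasonal = "additive")

# Forecast the next two weeks (14 daily values)
fc <- predict(fit, n.ahead = 14)
print(round(fc, 1))
```

HoltWinters() needs at least two full seasons of data, which is one reason data without seasonality or trend gives such methods little to work with.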

Let’s get started!

Prerequisites:

You'll need a dataset of sentiment frequencies to use with the Forecast algorithm. If you don't have one handy, try our Twitter search algorithm to pull data into a CSV, which you can then pass into the Sentiment Time Series algorithm. Or browse our blog post on machine learning datasets. Note that this analysis won't perform well if your data has no seasonality or linear trend.
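Before running the pipeline, you can get a rough sense of whether your series has a trend or seasonal pattern with decompose() from the stats package. This is a minimal sketch with synthetic data, and the "seasonal share of variance" ratio is an informal indicator chosen for illustration, not a formal seasonality test.

```r
# Sketch: inspect trend and seasonal strength before forecasting.
library(stats)

# Synthetic daily counts: upward trend plus a weekly cycle (hypothetical data)
days <- 1:56
counts <- 50 + 0.5 * days + 10 * sin(2 * pi * days / 7)
sent_ts <- ts(counts, frequency = 7)

# decompose() needs frequency > 1 and at least two full periods
parts <- decompose(sent_ts, type = "additive")

# Informal check: share of total variance carried by the seasonal component
seasonal_share <- var(parts$seasonal) / var(sent_ts)
# The trend estimate has NAs at both ends from the moving average
trend_range <- range(parts$trend, na.rm = TRUE)
cat("seasonal share of variance:", round(seasonal_share, 2), "\n")
cat("trend range:", round(trend_range, 1), "\n")
```

A seasonal share near zero and a flat trend range suggest the data may not suit this kind of forecast.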

Step 1: Install the Algorithmia Client

Let's start by installing the algorithmia package from CRAN and loading it, along with the stats package, in your R environment. The stats package ships with base R, so it doesn't need to be installed:
install.packages("algorithmia")

library(algorithmia)
library(stats)
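If you re-run the script often, a common pattern is to install a package only when requireNamespace() reports it missing. A minimal sketch; ensure_package() is a hypothetical helper name, not part of the Algorithmia client.

```r
# Sketch: install a package only if it is missing, then attach it.
ensure_package <- function(pkg) {
  # requireNamespace() returns FALSE (instead of erroring) when pkg is absent
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # character.only = TRUE lets library() take the package name as a string
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# Usage:
# ensure_package("algorithmia")
ensure_package("stats")
```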

Now grab your Algorithmia API key, found on your profile page under the Credentials tab.

Then create a client object by plugging in your API key:
client <- getAlgorithmiaClient("your_api_key")
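To keep your key out of the script itself, you could read it from an environment variable, for example one set in your ~/.Renviron file. This is a sketch under the assumption that you choose the variable name; ALGORITHMIA_API_KEY and get_api_key() are hypothetical names used only for illustration.

```r
# Sketch: read the API key from an environment variable instead of hard-coding it.
# ALGORITHMIA_API_KEY is a hypothetical variable name chosen for this example.
get_api_key <- function(var = "ALGORITHMIA_API_KEY") {
  key <- Sys.getenv(var)  # returns "" when the variable is unset
  if (!nzchar(key)) {
    stop("Set the ", var, " environment variable to your Algorithmia API key")
  }
  key
}

# Usage (requires the algorithmia package and a valid key):
# client <- getAlgorithmiaClient(get_api_key())
```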

Step 2: Analyze the Time Series Sentiment

Before we run the Forecast algorithm, we need sentiment score frequencies, which we get by running the Sentiment Time Series algorithm. If your time series dataset contains observations that aren't equally spaced or in sequential order, don't worry: the algorithm takes care of that for you. Learn more about using the Sentiment Time Series algorithm.
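The algorithm does this ordering and counting for you, but the idea behind "sentiment frequency counts" can be sketched locally: sort labelled observations by time, bucket them by period, and count each label. This is an illustration only, assuming the text has already been labelled pos, neu, or neg; the column names, labels, and daily bucketing are hypothetical and don't describe the algorithm's internals.

```r
# Sketch: turn unordered, unevenly spaced labelled comments into daily counts.
# Column names and labels are hypothetical, for illustration only.
comments <- data.frame(
  timestamp = as.POSIXct(c("2016-11-03 14:10:00", "2016-11-01 09:00:00",
                           "2016-11-01 17:45:00", "2016-11-03 08:30:00",
                           "2016-11-02 12:00:00"), tz = "UTC"),
  sentiment = c("pos", "neg", "pos", "neu", "pos")
)

# Sort chronologically, then bucket each observation by calendar day
comments <- comments[order(comments$timestamp), ]
comments$day <- as.Date(comments$timestamp, tz = "UTC")

# Cross-tabulate: one row per day, one column per sentiment label
daily <- table(day = comments$day,
               sentiment = factor(comments$sentiment, levels = c("pos", "neu", "neg")))
print(daily)
```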

Now let's run the algorithm. Remember to set the output_file and output_plot paths to the locations where you want the files written. This example uses Algorithmia's hosted data source, which lets you store files and data models; Dropbox and S3 data connections are also supported.

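# This is input for the Sentiment Time Series algorithm.
# data_start_date, data_end_date, observations_per_season, date_format and timezone
# are placeholders: define them for your dataset before calling sent_freq().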
sent_freq <- function(){
  sent_input <- list(input_file="data://username/data_collection_name/time_comments.csv",
      output_plot="data://username/data_collection_name/sent_timeseries_plot.png",
      output_file="data://username/data_collection_name/sent_freq_file.json",
      start=data_start_date,
      end=data_end_date,
      freq=observations_per_season,
      dt_format=date_format,
      tm_zone=timezone)

  # Call the Sentiment Time Series algorithm
  sent_algo <- client$algo("nlp/SentimentTimeSeries/0.1.0")
  # Pipe in sent_input to write the files to your stated directories in output_plot and output_file paths
  sent_algo$pipe(sent_input)$result
}
sent_freq()
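Because Step 3 reads the JSON file back from wherever output_file points, it can help to build the data:// paths in one place so the write and read locations can't drift apart. A minimal sketch; data_uri() is a hypothetical helper, and "username" and "data_collection_name" remain placeholders as in the example above.

```r
# Sketch: build data:// paths in one place so the write and read steps agree.
# "username" and "data_collection_name" are placeholders, as in the example above.
data_uri <- function(file, user = "username", collection = "data_collection_name") {
  sprintf("data://%s/%s/%s", user, collection, file)
}

sent_freq_path <- data_uri("sent_freq_file.json")
# Usage:
# inside sent_freq():  output_file = sent_freq_path
# in Step 3:           forecast_input <- client$file(sent_freq_path)$getJson()
```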

Most of the arguments passed into this algorithm, such as start, end, and freq, are used to create a time series object in R. See the documentation for ts() in the stats package to learn more about time series objects.
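As a quick illustration of what a ts object records, here is a base R sketch with hypothetical monthly counts: the start period, the frequency (observations per season), and the end period derived from them.

```r
library(stats)

# Hypothetical monthly counts covering January 2016 through December 2017
monthly_counts <- c(12, 15, 14, 18, 21, 25, 24, 22, 19, 16, 13, 11,
                    14, 17, 16, 20, 23, 27, 26, 24, 21, 18, 15, 13)
# start = c(year, period); frequency = observations per season (12 per year here)
sent_ts <- ts(monthly_counts, start = c(2016, 1), frequency = 12)

start(sent_ts)      # 2016 1
end(sent_ts)        # 2017 12
frequency(sent_ts)  # 12
```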

Step 3: Forecast the Sentiment Score Frequencies

Next, let's read the JSON file written in the previous step. The path must match the output_file path you set there.
# Read the JSON file into forecast_input, an R list with pos, neg and neu elements
forecast_input <- client$file("data://username/data_collection_name/sent_freq_file.json")$getJson()

Then we create a function that pairs the timestamps with the newly generated forecast frequencies:
restructure_df <- function(sent_tm, results){
  # Map results of forecast with original timestamp
  structure(do.call(rbind.data.frame, Map('c', results, tm=sent_tm)),names=c('forecast_freq','timestamp'))
}
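Forecast values describe periods after the last observation, so one alternative to reusing the original timestamps is to generate future ones. A minimal sketch, assuming daily observations stored as Date values; future_timestamps() and the forecast values are hypothetical and don't reflect the Forecast algorithm's output format.

```r
# Sketch: pair h forecast values with the h dates that follow the last observation.
future_timestamps <- function(last_date, h, by = "day") {
  # seq() steps forward from last_date; [-1] drops last_date itself
  seq(last_date, by = by, length.out = h + 1)[-1]
}

last_obs <- as.Date("2016-11-30")
forecast_freq <- c(31, 28, 35)  # hypothetical forecast values
forecast_df <- data.frame(
  timestamp = future_timestamps(last_obs, length(forecast_freq)),
  forecast_freq = forecast_freq
)
print(forecast_df)
```

Passing by = "week" or by = "month" adapts the sketch to other sampling intervals.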

Now let's call the Forecast algorithm and pass in the sentiment frequencies from our JSON file. To get only the forecast results without any metadata, append $result to algo$pipe(input).

plot_sent_ts <- function(){
  # Call Forecast algorithm and retrieve result for pos, neg, neu sentiment
  algo <- client$algo("TimeSeries/Forecast/0.2.0")
  pos_results <- algo$pipe(forecast_input$pos$freq)$result
  neg_results <- algo$pipe(forecast_input$neg$freq)$result
  neu_results <- algo$pipe(forecast_input$neu$freq)$result

  pos_df <- restructure_df(forecast_input, pos_results)
  neg_df <- restructure_df(forecast_input, neg_results)
  neu_df <- restructure_df(forecast_input, neu_results)

  # Creates time series objects
  neu_ts <- ts(neu_df$forecast_freq, start=start(pos_df$timestamp), end=end(pos_df$timestamp))
  pos_ts <- ts(pos_df$forecast_freq, start=start(pos_df$timestamp), end=end(pos_df$timestamp))
  neg_ts <- ts(neg_df$forecast_freq, start=start(pos_df$timestamp), end=end(pos_df$timestamp))

  # Option to plot all three sentiment forecasts on one plot
  # plot_ts <- ts.plot(pos_ts, neg_ts, neu_ts, gpars = list(col = c("green", "red", "blue")))
  # Plot just the neutral sentiment
  plot_ts <- ts.plot(neu_ts)
  
  # ts.plot() draws on the active graphics device, such as the PNG device opened in Step 4
  return(plot_ts)
}
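The three nearly identical pipe calls can also be folded into one loop over the sentiment names. This sketch assumes, as the code above does, that forecast_input has pos, neg, and neu elements, each with a freq vector; forecast_all() is a hypothetical helper, and the naive stand-in forecaster exists only so the sketch runs offline.

```r
# Sketch: forecast every sentiment series with one call instead of three.
# `forecaster` is any function mapping a numeric vector to a forecast vector.
forecast_all <- function(input, forecaster, sentiments = c("pos", "neg", "neu")) {
  # setNames() keeps the sentiment labels as the names of the returned list
  lapply(setNames(sentiments, sentiments), function(s) forecaster(input[[s]]$freq))
}

# Offline stand-in data shaped like forecast_input (hypothetical values)
toy_input <- list(pos = list(freq = c(5, 6, 7)),
                  neg = list(freq = c(1, 2, 3)),
                  neu = list(freq = c(9, 9, 9)))
# Stand-in forecaster: repeat the last value twice (naive forecast), for testing only
naive <- function(x) rep(x[length(x)], 2)
results <- forecast_all(toy_input, naive)

# With the real client:
# algo <- client$algo("TimeSeries/Forecast/0.2.0")
# results <- forecast_all(forecast_input, function(x) algo$pipe(x)$result)
```

Passing the forecaster in as a function keeps the loop testable without network access.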

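The start and end arguments are intended to span the first and last observations. Note that pos_df$timestamp is an ordinary data frame column rather than a ts object, so start() and end() return its index positions (1 and the number of rows), and each resulting series is indexed from 1 to n with a frequency of 1.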
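For a plain (non-ts) vector, start() and end() report index positions rather than the values stored in it, which matters when passing them to ts(). A quick check in base R, using hypothetical epoch-second timestamps:

```r
library(stats)

# A plain numeric vector of hypothetical Unix epoch timestamps (not a ts object)
timestamp <- c(1477958400, 1478044800, 1478131200)

start(timestamp)  # 1 1 : position of the first element, not its value
end(timestamp)    # 3 1 : position of the last element
```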

Step 4: Save the Time Series Plot

We'll save the plot as a PNG in the current working directory, which is the script's directory if you run the script from there.

plot_forecast <- function(){
  # Open a PNG graphics device with the filename of your choice
  png(filename="neutral_forecast.png")
  # Create the plot
  plot_sent_ts()
  # Close the graphics device, which writes the file
  dev.off()
}

plot_forecast()
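If plot_sent_ts() fails partway through, plot_forecast() never reaches dev.off() and the PNG device stays open. One common safeguard is on.exit(), which runs when the function exits for any reason. A minimal sketch; save_plot() and its device argument are hypothetical names for illustration.

```r
# Sketch: open a graphics device, draw, and always close the device,
# even if the plotting function throws an error.
save_plot <- function(filename, plot_fn, device = grDevices::png) {
  device(filename)  # png() and pdf() both take the output path as their first argument
  on.exit(grDevices::dev.off(), add = TRUE)
  plot_fn()
  invisible(filename)
}

# Usage with the function defined in Step 3:
# save_plot("neutral_forecast.png", plot_sent_ts)
```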

If everything went as planned, you should get a plot of the forecast sentiment time series that looks something like this:

[Figure: neutral sentiment forecast plot]

While the end result here is a simple plot, you could also pipe the accompanying JSON file into Tableau or Plotly for richer visualizations.
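The forecast results themselves live only in R variables in this recipe, so if you want them in an external tool too, one option is a flat CSV, which tools like Tableau and Plotly can import. A minimal sketch using base R's write.csv(); forecast_df and its columns are hypothetical stand-ins for your forecast output, and tempdir() is used only so the example doesn't write into your working directory.

```r
# Sketch: write forecast results to a flat CSV for use in other tools.
# forecast_df and its columns are hypothetical stand-ins for your forecast output.
forecast_df <- data.frame(
  timestamp = as.Date("2016-12-01") + 0:2,
  sentiment = "neu",
  forecast_freq = c(31, 28, 35)
)
out_file <- file.path(tempdir(), "neutral_forecast.csv")
write.csv(forecast_df, out_file, row.names = FALSE)

# Read it back to confirm the round trip
check <- read.csv(out_file)
print(check)
```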

Conclusion

In this post, we showed you how to easily link algorithms together to create a data analysis pipeline in R. The algorithms used in this recipe were Sentiment Time Series and Forecast.

Get the complete Sentiment Analysis Forecasting Pipeline on GitHub, and then run it from your console or IDE with:

Rscript name_of_script.R

Here's the complete code snippet for trying this recipe out yourself:
# stats ships with base R, so only algorithmia needs installing
install.packages("algorithmia")

library(algorithmia)
library(stats)

client <- getAlgorithmiaClient("your_api_key")
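# This is input for the Sentiment Time Series algorithm.
# data_start_date, data_end_date, observations_per_season, date_format and timezone
# are placeholders: define them for your dataset before calling sent_freq().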
sent_freq <- function(){
  sent_input <- list(input_file="data://username/data_collection_name/time_comments.csv",
      output_plot="data://username/data_collection_name/sent_timeseries_plot.png",
      output_file="data://username/data_collection_name/sent_freq_file.json",
      start=data_start_date,
      end=data_end_date,
      freq=observations_per_season,
      dt_format=date_format,
      tm_zone=timezone)

  # Call the Sentiment Time Series algorithm
  sent_algo <- client$algo("nlp/SentimentTimeSeries/0.1.0")
  # Pipe in sent_input to write the files to your stated directories in output_plot and output_file paths
  sent_algo$pipe(sent_input)$result
}
sent_freq()

restructure_df <- function(sent_tm, results){
  # Map results of forecast with original timestamp
  structure(do.call(rbind.data.frame, Map('c', results, tm=sent_tm)),names=c('forecast_freq','timestamp'))
}

plot_sent_ts <- function(){
  # Read the JSON file written by sent_freq() into forecast_input, an R list
  # (the path must match output_file in sent_freq())
  forecast_input <- client$file("data://username/data_collection_name/sent_freq_file.json")$getJson()
  
  # Call Forecast algorithm and retrieve result for pos, neg, neu sentiment
  algo <- client$algo("TimeSeries/Forecast/0.2.0")

  # Pipe each sentiment frequency input into algo$pipe() and retrieve the results
  pos_results <- algo$pipe(forecast_input$pos$freq)$result
  neg_results <- algo$pipe(forecast_input$neg$freq)$result
  neu_results <- algo$pipe(forecast_input$neu$freq)$result

  # Map each sentiment result with their corresponding timestamp
  pos_df <- restructure_df(forecast_input, pos_results)
  neg_df <- restructure_df(forecast_input, neg_results)
  neu_df <- restructure_df(forecast_input, neu_results)

  # Creates time series objects
  neu_ts <- ts(neu_df$forecast_freq, start=start(pos_df$timestamp), end=end(pos_df$timestamp))
  pos_ts <- ts(pos_df$forecast_freq, start=start(pos_df$timestamp), end=end(pos_df$timestamp))
  neg_ts <- ts(neg_df$forecast_freq, start=start(pos_df$timestamp), end=end(pos_df$timestamp))

  # Plot all three sentiment forecasts together (green = positive, red = negative, blue = neutral);
  # pass a single series to ts.plot() to plot only one sentiment
  plot_ts <- ts.plot(pos_ts, neg_ts, neu_ts, gpars = list(col = c("green", "red", "blue")))

  # ts.plot() draws on the active graphics device, such as the PNG device opened in plot_forecast()
  return(plot_ts)
}

plot_forecast <- function(){
  # Open a PNG graphics device with the filename of your choice
  png(filename="sentiment_forecast.png")
  # Create the plot
  plot_sent_ts()
  # Close the graphics device, which writes the file
  dev.off()
}
plot_forecast()