
Improving Customer Retention Analytics With Machine Learning

ML allows companies to base their product and marketing retention strategies on predictive customer analytics

Customers have an abundance of options when it comes to products for purchase. This excess of options, however, increases the risk of poor customer retention. Since acquiring new customers costs much more than keeping current customers, a higher retention rate is always better.

Customer retention represents the number of customers who continue purchasing from a company after their first purchase. This is usually measured as the customer retention rate, which is the percentage of customers your company has retained over a certain time period. The opposite of retention rate is churn rate, which represents the percentage of customers a company has lost over a given time period.
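
To make the two rates concrete, here is a minimal sketch of how they are commonly computed; the function and variable names are ours, for illustration only:

    def retention_rate(start_customers, end_customers, new_customers):
        """Percentage of starting customers still active at period end."""
        retained = end_customers - new_customers
        return retained / start_customers * 100

    def churn_rate(start_customers, end_customers, new_customers):
        """Percentage of starting customers lost over the period."""
        return 100 - retention_rate(start_customers, end_customers, new_customers)

    # Example: start with 1,000 customers, end with 1,050, of which 150 are new.
    print(retention_rate(1000, 1050, 150))  # 90.0
    print(churn_rate(1000, 1050, 150))      # 10.0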

Customer retention analytics can be done through machine learning, allowing companies to base their product and marketing strategies on predictive customer analytics rather than less reliable predictions made manually.

In a survey of more than 500 business decision-makers that Algorithmia conducted in the fall of 2018, 59 percent of large companies said that customer retention was their primary use case for machine learning technology. 

What Is Customer Retention Analysis?

Customer retention analysis is the application of statistics to understand how long customers are retained before churning and to identify trends in customer retention. This type of analysis reveals how long customers typically stick around, whether seasonality affects retention, and which behaviors and factors differentiate retained customers from churned ones.

Why Is Customer Retention Analysis Important For Your Company?

Customer retention analysis is important for your company because it helps you understand which personas have higher retention rates and discern which features impact retention. This provides actionable insights that can help you make more effective product and marketing decisions. 

It can be difficult for a product or sales team to know how well a product is actually performing with the target audience. They may assume that features and messaging are on brand and clear because acquisition numbers are growing. However, just because new customers are purchasing a product does not necessarily mean they like the product or service enough to stick around.

That is where customer retention analytics comes in. Every company needs data in order to make effective business and marketing decisions. Machine learning makes this easier than it has ever been before, which is great news for companies that wish to leverage this data.

How Do You Analyze Customer Retention?

Use past customer data to predict future customer behavior

Machine learning for customer retention analytics uses past customer data to predict future customer behavior. In today’s data-driven world, companies can track hundreds of data points about thousands of customers, so the input data for a customer retention model could be any combination of the following:

  • Customer demographics
  • Membership/loyalty rewards
  • Transaction/purchase history
  • Email/phone call history
  • Any other relevant customer data

During the model training process, this data will be used to find correlations and patterns to create the final trained model to predict customer retention. Not only does this tell you the overall churn risk of your customer base, but it can determine churn risk down to the individual customer level. You could use this data to proactively market to those customers with higher churn risk or find ways to improve your product, customer service, messaging, etc. in order to lower your overall churn rate.
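
As a rough sketch of what this looks like in practice, the following trains a simple churn classifier with scikit-learn and scores each customer’s churn risk; the CSV file and column names are hypothetical placeholders:

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Hypothetical table of per-customer features with a churn label.
    data = pd.read_csv("customers.csv")
    features = data[["tenure_months", "monthly_spend", "support_calls", "loyalty_member"]]
    labels = data["churned"]  # 1 = churned, 0 = retained

    X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.2)

    model = RandomForestClassifier(n_estimators=100)
    model.fit(X_train, y_train)

    # Churn risk down to the individual customer: probability of the "churned" class.
    churn_risk = model.predict_proba(X_test)[:, 1]
    print("Hold-out accuracy:", model.score(X_test, y_test))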

How Do You Improve Retention?

To improve retention, you first have to understand the cause of your retention issues. As discussed, machine learning models are an efficient way to analyze customer retention and identify risks and solutions.

Data science teams can build the machine learning models necessary for this type of predictive analytics, but there are challenges associated with developing machine learning processes. For example, deploying models written in different languages is not easy, to say the least. Algorithmia’s AI Layer solves these issues using a serverless microservice architecture, which allows each service to be deployed independently with options to pipeline them together. 

Another challenge is the time cost of building, training, testing, deploying, and managing a single model, let alone the many models in a full machine learning program.

Improving customer retention is one of the main use cases Algorithmia’s early adopters focused on, because it is one of the simpler machine learning models to build and use, and it’s even easier with the serverless microservices framework provided by the AI Layer. Our platform has built-in tools for versioning, deployment, pipelining, and integrating with customers’ current workflows. The AI Layer integrates with whatever technology your organization currently uses, fitting in seamlessly to make machine learning easier and getting you from data collection to model deployment and analysis much faster.

To learn more about how the AI Layer can benefit your company, watch a demo to see how much easier your machine learning projects can be.

Connectivity in Machine Learning Infrastructure 

ML Life Cycle | Connect, deploy, scale, and manage

As companies begin developing use cases for machine learning, the infrastructure to support their plans must be able to adapt as data scientists experiment with new and better processes and solutions. Concurrently, organizations must connect a variety of systems into a platform that delivers consistent results.

Machine learning architecture consists of four main groups:

  • Data and Data Management Systems
  • Training Platforms and Frameworks
  • Serving and Life Cycle Management
  • External Systems 

ML-focused projects generate value only after these functional areas connect into a workflow.

In part 3 of our Machine Learning Infrastructure whitepaper series, “Connectivity,” we discuss how those functional areas fit together to power the ML life cycle. 

It all starts with data

Most data management systems include built-in authentication, role access controls, and data views. In more advanced cases, an organization will have a data-as-a-service engine that allows for querying data through a unified interface. 

Even in the simplest cases, ML projects likely rely on a variety of data formats—different types of data stores from many different vendors. For example, one model might train on images from a cloud-based Amazon S3 bucket, while another pulls rows from on-premises PostgreSQL and SQL Server databases, while a third interprets streaming transactional data from a Kafka pipeline.  
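
A sketch of what that variety can look like in code, using common Python client libraries (the bucket, host, and topic names are invented for illustration):

    import boto3                     # Amazon S3
    import psycopg2                  # PostgreSQL
    from kafka import KafkaConsumer  # Kafka (kafka-python)

    # Images from a cloud object store.
    s3 = boto3.client("s3")
    image_bytes = s3.get_object(Bucket="training-images", Key="field/0001.jpg")["Body"].read()

    # Rows from an on-premises relational database.
    conn = psycopg2.connect(host="db.internal", dbname="sales", user="ml")
    with conn.cursor() as cur:
        cur.execute("SELECT customer_id, total FROM transactions LIMIT 1000")
        rows = cur.fetchall()

    # Streaming transactional events.
    consumer = KafkaConsumer("transactions", bootstrap_servers="kafka.internal:9092")
    for message in consumer:
        print(message.value)  # hand off to training or scoring here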

[Figure: machine learning architecture]

Select a training platform

Training platforms and frameworks comprise a wide variety of tools used for model building and training. Different training platforms offer unique features. Libraries like TensorFlow, Caffe, and PyTorch offer toolsets to train models. 

The freedom of choice is paramount, as each tool specializes in certain tasks. Models can be trained locally on a GPU and then deployed, or they can be trained directly in the cloud using Dataiku, Amazon SageMaker, Azure ML Studio, or other platforms.
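
For instance, a minimal local training loop in PyTorch might look like the following; the model and data are toy placeholders, and the saved weights file could then be handed off for deployment:

    import torch
    import torch.nn as nn

    # Use the local GPU when one is available.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Toy model and data.
    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1)).to(device)
    inputs = torch.randn(256, 10, device=device)
    targets = torch.randn(256, 1, device=device)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    for epoch in range(10):
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()

    # Save the trained weights so the model can be deployed.
    torch.save(model.state_dict(), "model.pt")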

Life cycle management systems

Model serving encompasses all the services that allow data scientists to deliver trained models into production and maintain them. Such services include the abilities to ingest models, catalog them, integrate them into DevOps workflows, and manage the ML life cycle. 

Fortunately, each ML architecture component is fairly self-contained, and the interactions between those components are fairly consistent:

  • Data informs all systems through queries.
  • Training systems export model files and dependencies.
  • Serving and life cycle management systems return inferences to applications and model pipelines, and export logs to systems of record.
  • External systems call models, trigger events, and capture and modify data.

It becomes easy to take in data and deploy ML models when these functions are grouped together. 

External systems

External systems can consume model output and integrate it in other places. Based on the type of deployment, we can create different user interfaces. For example, the model output can be exposed through a REST API or another web application. RESTful APIs let us call our output from any language and integrate it into new or existing projects.
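
For example, a client in any language can retrieve an inference with a single HTTP call. Here is a sketch in Python; the endpoint URL, key, and payload shape are illustrative, not a real API:

    import requests

    response = requests.post(
        "https://models.example.com/v1/churn-predictor",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json={"tenure_months": 18, "monthly_spend": 42.50},
    )
    print(response.json())  # e.g. {"churn_risk": 0.27}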


Connectivity and machine learning sophistication

Data has made the jobs of business decision-makers easier. But data is only useful after models interpret it, and model inference only generates value when external apps can integrate and consume it. That journey toward integration has two routes: horizontal integration and loosely coupled, tight integration.

The quickest way to develop a functioning ML platform is to support only a subset of solutions from each functional group and integrate them into a single horizontal platform. Doing so requires no additional workforce training and adds speed to workflows already in place.

Unfortunately, horizontal integration commits an organization to full-time software development rather than building and training models to add business value. An architecture that allows each system to evolve independently, however, can help organizations choose the right components for today without sacrificing the flexibility to rethink those choices tomorrow. 

To enable a loosely coupled, tightly integrated approach, a deployment platform must support three kinds of connectivity: 

  • Publish/Subscribe 
  • Data Connectors
  • RESTful APIs

Publish/subscribe

Publish/subscribe (pub/sub) is an asynchronous, message-oriented notification pattern. In such a model, one system acts as a publisher, sending events to a message broker. Subscriber systems explicitly enroll in a channel on the broker, and the broker forwards and verifies delivery of publisher notifications, which subscribers can then use as event triggers.
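
The pattern itself is simple. Here is a minimal in-memory sketch in Python; a production system would use a real broker such as Kafka or a cloud pub/sub service rather than this toy class:

    from collections import defaultdict

    class Broker:
        """Toy message broker: subscribers enroll in a channel, publishers send to it."""
        def __init__(self):
            self.subscribers = defaultdict(list)

        def subscribe(self, channel, callback):
            self.subscribers[channel].append(callback)

        def publish(self, channel, event):
            for callback in self.subscribers[channel]:
                callback(event)  # each delivery can trigger downstream work

    broker = Broker()
    # e.g. re-score or retrain a model whenever new data lands.
    broker.subscribe("new-data", lambda event: print("triggering model run on", event))
    broker.publish("new-data", {"table": "transactions", "rows": 5000})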

Algorithmia’s AI Layer has configurable event listeners that allow users to trigger actions based on input from pub/sub systems. 

[Figure: pub/sub approach]

Data connectors

While the model is the engine of any machine learning system, data is both the fuel and the driver. Data feeds the model during training, influences the model in production, then retrains the model in response to drift. 

As data changes, so does its interaction with the model, and to support that iterative process, an ML deployment and management system must integrate with every relevant data connector.

RESTful APIs

Because requesting platforms vary widely and unpredictably, loose coupling is, again, the answer, and RESTful APIs are its most elegant implementation, thanks to these required REST constraints:

  • Uniform interface: requests adhere to a standard format
  • Client-server: the server interacts with the client only through requests
  • Stateless: all necessary information must be included within a request
  • Layered system: the client cannot tell whether it is connected directly to the server or through intermediate layers
  • Cacheable: developers can store certain responses
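
To make these constraints concrete, here is a minimal sketch of a stateless, versioned inference endpoint using Flask; the stand-in model and route name are illustrative assumptions, not a prescribed design:

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    class DummyModel:
        """Stand-in for a trained model; swap in your real model here."""
        def predict(self, features):
            return 0.27  # hypothetical churn risk

    model = DummyModel()

    @app.route("/v1/predict", methods=["POST"])
    def predict():
        # Stateless: everything the model needs arrives in the request body.
        features = request.get_json()
        return jsonify({"prediction": model.predict(features)})

    if __name__ == "__main__":
        app.run(port=8080)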

To learn more about how connectivity feeds into the machine learning life cycle, download the full whitepaper.

And visit our website to read parts 1 and 2 of the Machine Learning Infrastructure whitepaper series.

We got a new look!

Algorithmia developers unveiled a new UI on our user platform this week. Now, when you log into Algorithmia, you’ll see a personalized dashboard with your account information and recent activity, enabling easy account management and navigation.

[Image: the new Algorithmia UI]

This change arrives as we mark the four-year anniversary of the Algorithmia website, but the redesign of the platform was driven by our goal of creating a more user-friendly and intuitive experience for our users.

A Mini Tour

There are two new menus: global and navigation. The global menu (the purple one at the left) is designed to keep primary actions readily available (creating new algorithms and data sources, searching, and accessing your notifications and profile actions).

The navigation menu (the light gray one) provides quick access to the main pages of the site (your user pages, algorithms, data, and the accompanying technical docs).

"We were excited to move to a drawer-based navigation style and to surface application functionality more consistently. We're making the app more intuitive and usable for everyone."
      -Ryan Miller, Algorithmia Frontend Engineering Lead

What’s New?

We designed the new UI with user control in mind. Some of the features include:

  • List organization: the new UI provides easy access to saved work, API keys, and the organizations you’ve set up for model sharing.
  • Dashboard navigation: easy access via two nav panes.
  • Mobile integration: many of the updated drawer-navigation features of the site are now available to mobile users as well.

“A good UI should get out of the way, but as the app grew we began to see the opposite—we were inhibiting user effectiveness. So we broke it down and are iteratively building it back up with usability at the foundation so users can focus on their work—not how to use the app.”
      -James Hoover, Algorithmia Product Designer

We welcome any feedback you have as we continue to make the Algorithmia user platform ever more intuitive and usable.

Check out our updated UI today!


Data Scientists and Deploying Models From any Framework


Asking a data scientist to work with only one framework is like asking a carpenter to work with only a hammer. It’s essential that professionals have access to all the right tools for the job.

It’s time to rethink best practices for leveraging and building ML infrastructure and set the precedent that data scientists should be able to use whichever tools they need at any time.

Now, certainly some ML frameworks are better suited to solve specific problems or perform specific tasks, and as projects become more complex, being able to work across multiple frameworks and with various tools will be paramount.  

For now, machine learning is still in its pioneering days, and though tech behemoths have created novel approaches to ML (Google’s TensorFlow, Amazon’s SageMaker, Uber’s Michelangelo), most ML infrastructure is still immature or inflexible at best, which severely limits data scientists and DevOps. This should change.

Flexible frameworks and ML investment

Most companies don’t have dozens of systems engineers who can devote several years to building and maintaining their own custom ML infrastructure or learning to work within new frameworks, and sometimes, open-source models are only available in a specific framework. This could restrict some ML teams from using them if the models don’t work with their pre-existing infrastructure. Companies can, and should, have the freedom to work concurrently across all frameworks. The benefits of doing so are multifold:

Increase interoperability of ML teams

Machine learning is often conducted in different parts of an organization by data scientists seeking to automate their own processes, and these silos do not collaborate with other teams doing similar work. Being able to blend teams together while still retaining the merits of their individual work is key: it de-duplicates effort as ML work becomes more transparent within an organization.

Allow for vendor flexibility and pipelining

You don’t want to end up locked-in to only one framework or only one cloud provider. The best framework for a specific task today may be overtaken next month or next year by a better product, and businesses should be able to scale and adapt as they grow. Pipelining different frameworks together creates the environment for using the best tools.


Reduce the time from model training to deployment

Data scientists write models in the framework they know best and hand them over to DevOps, who rewrite them to work within their infrastructure. Not only does this usually decrease the quality of a model, it creates a huge iteration delay.

Enable collaboration and prevent wasted resources

If a data scientist is accustomed to PyTorch but her colleague has only used TensorFlow, a platform that supports both people’s work saves time and money. Forcing work to be done with tools that aren’t optimal for a given project is like bringing a knife to a gunfight.

Leverage off-the-shelf products

There’s no need to constantly reinvent the wheel; if an existing open-source service or dataset can do the job, then that becomes the right tool.

Position your team for future innovations

Because the ML story is far from complete, being flexible now will enable a company to pivot more easily as new technical developments arise.

How to deploy models in any framework

Attaining framework flexibility, however, is no small feat. The main steps to enable deploying ML models from multiple frameworks are as follows:

Dependency management

It’s fairly simple to run a variety of frameworks on a laptop, but trying to productionize them requires a way to manage all dependencies for running each model, in addition to interfacing with other tools to manage compute.

Containerization and orchestration

Putting a model in a container is straightforward. When companies have only a handful of models, they often task junior engineers with manually containerizing the models, putting them into production, and managing scale-up. This process unravels as usage increases, model counts grow, and multiple versions of models run in parallel to serve various applications.

Many companies are using Kubernetes to orchestrate containers, and a variety of open-source projects provide components that will do some of the drudge work of container orchestration for machine learning. Teams that have attempted to do this in-house have found that it requires constant maintenance and becomes a Frankenstein of modular components and spaghetti code that falls over when trying to scale. Worse still, after models are in production, you discover that Kubernetes doesn’t deal well with many machine learning use cases.

API creation and management

Handling many frameworks requires a disciplined API design and seamless management practice. When data scientists begin to work faster, a growing portfolio of models with an ever-increasing number of versions needs to be continuously managed, and that can be difficult.

Languages and DevOps

Machine learning has vastly different requirements than traditional compute, including the freedom to support models written in many languages. A problem arises, however, when data scientists working in R or Python sync with DevOps teams who then need to rewrite or wrap the models to work in the language of their current infrastructure.

Down the road

Eventually, every company that wants to extend its capabilities with ML is going to have to choose between enabling a multi-framework solution like the AI Layer or undertaking a massive, ongoing investment in building and maintaining an in-house DevOps platform for machine learning.

We Run the World’s Machine Learning. Literally.

[Image: satellite imagery]

Algorithmia’s AI Layer Powers the UN Methods Service

Algorithmia is renewing its commitment to global humanitarian efforts by making powerful ML tools available to everyone.

Economic and population data are key elements of planning and decision-making in first-world countries, but access to sophisticated analytic and compute power is limited or non-existent in developing countries.

Meeting the Problem Head On

Working in conjunction with the United Nations Global Platform for Official Statistics, Algorithmia built a repository of algorithms that are readily available to any data scientist of any member state at any time. There are models for predicting economic, environmental, and social trends to enable smarter decision-making for strategies like agricultural planning, flooding probabilities, and curbing deforestation.

The United Nations Global Platform for Official Statistics sought to build the algorithm repository as part of the Sustainable Development Goals (SDGs). The SDGs aim to meet global challenges in healthcare, poverty, environmental degradation, and inequality by 2030. The algorithm repository will serve member states to “establish strategies to reuse and adapt algorithms across topics and to build implementations for large volumes of data” (UN Big Data).

Building an Algorithm Marketplace for the Developing World

The UN wanted a way to share models with underdeveloped countries to curate economic, environmental, and social data to save lives and improve health and environmental conditions. Using the UN algorithm repository, for example, a developing country could model farmland satellite imagery to predict droughts, urbanization trends, or migration patterns.

Such statistical information can be used in myriad ways by both humanitarian organizations and policy-making governmental bodies to make smarter resource-allocation decisions, better understand urban planning needs from population data, and even predict migration and crop cycles using geospatial imagery.

The UN’s partnership with Algorithmia demonstrates our dedication to leveraging AI and machine learning to solve global problems. We look forward to empowering the developing world, one algorithm at a time.

Read the Case Study