Algorithmia Blog - Deploying AI at scale

Connectivity in Machine Learning Infrastructure 

ML Life Cycle | Connect, deploy, scale, and manage

As companies begin developing use cases for machine learning, the infrastructure to support their plans must be able to adapt as data scientists experiment with new and better processes and solutions. Concurrently, organizations must connect a variety of systems into a platform that delivers consistent results.

Machine learning architecture consists of four main groups:

  • Data and Data Management Systems
  • Training Platforms and Frameworks
  • Serving and Life Cycle Management
  • External Systems 

ML-focused projects generate value only after these functional areas connect into a workflow.

In part 3 of our Machine Learning Infrastructure whitepaper series, “Connectivity,” we discuss how those functional areas fit together to power the ML life cycle. 

It all starts with data

Most data management systems include built-in authentication, role access controls, and data views. In more advanced cases, an organization will have a data-as-a-service engine that allows for querying data through a unified interface. 

Even in the simplest cases, ML projects likely rely on a variety of data formats—different types of data stores from many different vendors. For example, one model might train on images from a cloud-based Amazon S3 bucket, while another pulls rows from on-premises PostgreSQL and SQL Server databases, while a third interprets streaming transactional data from a Kafka pipeline.  

machine learning architecture

Select a training platform

Training platforms and frameworks comprise a wide variety of tools used for model building and training. Different training platforms offer unique features. Libraries like TensorFlow, Caffe, and PyTorch offer toolsets to train models. 

The freedom of choice is paramount, as each tool specializes in certain tasks. Models can be trained locally on a GPU and then deployed or they can be trained directly in the cloud using Dataiku, Amazon, SageMaker, Azure ML Studio, or other platforms or processors.

Life cycle management systems

Model serving encompasses all the services that allow data scientists to deliver trained models into production and maintain them. Such services include the abilities to ingest models, catalog them, integrate them into DevOps workflows, and manage the ML life cycle. 

Fortunately, each ML architecture component is fairly self-contained, and the interactions between those components are fairly consistent:

  • Data informs all systems through queries.
  • Training systems export model files and dependencies.
  • Serving and life cycle management systems return inferences to applications and model pipelines, and export logs to systems of record.
  • External systems call models, trigger events, and capture and modify data.

It becomes easy to take in data and deploy ML models when these functions are grouped together. 

External systems

External Systems can consume model output and integrate it in other places. Based on the type of deployment, we can create different user interfaces. For example, the model output can integrate into a REST API or another web application. RESTful APIs assist us in calling our output from any language and integrating it into new or existing project. 


Connectivity and machine learning sophistication

Data have made the jobs of business decision makers easier. But data is only useful after models interpret it, and model inference only generates value when external apps can integrate and consume it. That journey toward integration has two routes: horizontal integration and loosely coupled, tight integration.  

The quickest way to develop a functioning ML platform is by supporting only a subset of solutions from each of the functional groups to more quickly integrate each into a horizontal platform. Doing so requires no additional workforce training and adds speed to workflows already in place. 

Unfortunately, horizontal integration commits an organization to full-time software development rather than building and training models to add business value. An architecture that allows each system to evolve independently, however, can help organizations choose the right components for today without sacrificing the flexibility to rethink those choices tomorrow. 

To enable a loosely coupled, tightly integrated approach, a deployment platform must support three kinds of connectivity: 

  • Publish/Subscribe 
  • Data Connectors
  • RESTful APIs

Publish/subscribe

Publish/subscribe (pub/sub) is an asynchronous, message-oriented notification pattern. In such a model, one system acts as a publisher, sending events to a message broker. Through the message broker, subscriber systems explicitly enroll in a channel, and the hub forwards and verifies delivery of publisher notifications, which can then be used by subscribers as event triggers. 

Algorithmia’s AI Layer has configurable event listeners that allow users to trigger actions based on input from pub/sub systems. 

Pub/sub approach

Data connectors

While the model is the engine of any machine learning system, data is both the fuel and the driver. Data feeds the model during training, influences the model in production, then retrains the model in response to drift. 

As data changes, so does its interaction with the model, and to support that iterative process, an ML deployment and management system must integrate with every relevant data connector.

RESTful APIs

Because there is a variety of requesting platforms and high unpredictability therein, a loose coupling is, again, the most elegant answer. RESTful APIs are the most elegant implementation, due to these required REST constraints:

  • Uniform interface: requests adhere to a standard format
  • Clint-Server: the server only interacts with the client through requests
  • Stateless: all necessary information must be included within a request
  • Layered system: the REST client passes any layers between itself and the server
  • Cacheable: Developers can store certain responses

To learn more about how connectivity feeds into the machine learning life cycle, download the full whitepaper.

And visit our website to read parts 1 and 2 of the Machine Learning Infrastructure whitepaper series.