Serving GPflow models

Deploying models as web services has become straightforward with TensorFlow: exporting the graph as a SavedModel allows easy deployment with TensorFlow Serving or the Cloud ML Engine. But how can you obtain a SavedModel when using the GPflow library for Gaussian processes?

Many of the AI solutions we develop for customers are ultimately deployed on Google Cloud Platform. Containerized applications serve APIs and interfaces, and internally use the APIs of GCP products for storage, large-scale data processing and machine learning.

One such API is the ML Engine online prediction service, which makes it possible to quickly deploy a TensorFlow model as a scalable REST API web service. Even though we do not host all our models using this service (sometimes it is easier to keep the model in a container as part of the application), using the ML Engine offers a few advantages:

  • Online migration: when we retrain a model, a new model version can be created and traffic can be migrated to it without downtime.
  • Decoupling: keeping the software and the model as two separate services makes it easier for software engineers and data scientists to work together.
  • Scalability: especially when models grow larger, the ML Engine can scale the evaluation of the model on its own. If the model is part of the application code, the entire application needs to scale as well, which adds complexity when synchronization between the replicas cannot be avoided.

The GCP documentation of the ML Engine covers how to host models for online prediction: after exporting the tf.Graph in SavedModel format, deploying is straightforward with the gcloud command line tool. Alternatively, an API is available to automate the deployment process.

For Gaussian processes we use GPflow, a TensorFlow-based framework that makes using these models quite straightforward. Unfortunately, exporting a SavedModel is not supported out of the box, but it can be accomplished fairly easily, as shown below.

Creating a GP

We’ll create a simple model to get started:

import gpflow
import numpy as np

# data
X = np.random.rand(20, 2) * 2 - 1
Y = np.sum(np.square(X), axis=1, keepdims=True)

# setup model
kernel = gpflow.kernels.RBF(2, ARD=False)
gp = gpflow.models.GPR(X, Y, kernel)

# Train
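# Assumed: the right-hand side was missing here; GPflow 1.x's ScipyOptimizer
# matches the minimize(model, maxiter=..., disp=...) call on the next line
optimizer = gpflow.train.ScipyOptimizer()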
optimizer.minimize(gp, maxiter=20, disp=True)

This was quite straightforward: some data is generated, a model is constructed and optimized. GPflow comes with an auto-build feature which, in this case, uses the default tf.Graph and creates a session.

Now that we have trained our model, we’d like to export a SavedModel for hosting an online prediction model on the ML Engine. It is important to note that the steps taken so far only construct the computation graph for the likelihood function, which is used to optimize the hyperparameters of the model. Predicting requires a separate computation. This is unlike, for instance, neural networks, where training itself is based on making predictions (the forward pass) and applying backpropagation to tune the network weights.
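To make that separate prediction computation concrete, here is a minimal NumPy sketch of the standard GP regression predictive equations. It is an illustration only, not GPflow's implementation: the helper names rbf and gp_predict are hypothetical, and the kernel variance, lengthscale and noise are fixed assumptions rather than optimized hyperparameters.

```python
import numpy as np

def rbf(A, B, variance=1.0, lengthscale=1.0):
    """Squared-exponential kernel matrix between the rows of A and B."""
    sq_dist = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / lengthscale**2)

def gp_predict(X, Y, Xnew, noise=1e-6):
    """Predictive mean and marginal variance of the latent function at Xnew."""
    K = rbf(X, X) + noise * np.eye(len(X))   # training covariance plus noise
    Ks = rbf(X, Xnew)                        # cross-covariance train/test
    L = np.linalg.cholesky(K)
    A = np.linalg.solve(L, Ks)               # L^-1 K(X, Xnew)
    V = np.linalg.solve(L, Y)                # L^-1 Y
    mean = A.T @ V
    var = np.diag(rbf(Xnew, Xnew)) - np.sum(A**2, axis=0)
    return mean, var[:, None]

# Same kind of toy data as in the model above (fixed seed for repeatability)
rng = np.random.RandomState(0)
X = rng.rand(20, 2) * 2 - 1
Y = np.sum(np.square(X), axis=1, keepdims=True)

# Predict at the first three training inputs
mean, var = gp_predict(X, Y, X[:3])
```

None of these quantities is needed to evaluate the likelihood, which is why the prediction graph has to be built and exported explicitly.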

Exporting a SavedModel for prediction

The computation of predictions in GPflow is implemented by the predict_f method. This method is, however, decorated with AutoFlow, meaning its inputs and outputs are not tf.Tensors. Instead, we’ll use the (private) _build_predict method. The following function can be used to obtain a SavedModel for the predictions:

import tensorflow as tf

def save_build_method(method, inputs, outputs, export_dir='model'):
   # Get the model instance from the bound method
   obj = method.__self__
   assert isinstance(obj, gpflow.models.Model)

   sess = obj.enquire_session(session=None)
   with sess.graph.as_default():
       # Create placeholders for the inputs (much like the AutoFlow decorator)
       names, specs = zip(*inputs)
       arguments = [tf.placeholder(*spec) for spec in specs]

       # Obtain the output tensor(s)
       result = method(*arguments)
       assert len(result) == len(outputs)

       # Initialize the graph
       obj.initialize(session=sess, force=False)

       # For the export we need to create build_tensor_info for the I/O
       inputs_info = dict(zip(names, map(tf.saved_model.utils.build_tensor_info, arguments)))
       outputs_info = dict(zip(outputs, map(tf.saved_model.utils.build_tensor_info, result)))
       # Create the signature definition
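       # Assumed: the arguments of this call were cut off in the listing;
       # these are the usual ones for a TF 1.x prediction signature
       signature_def = tf.saved_model.signature_def_utils.build_signature_def(
           inputs=inputs_info,
           outputs=outputs_info,
           method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME)

       signature_def_map = {
           tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: signature_def
       }
       # Now we are ready to export the graph along with the session values
       # The SavedModel is saved in its own directory
       builder = tf.saved_model.builder.SavedModelBuilder(export_dir)
       # Assumed: the standard SavedModelBuilder calls that write the export
       builder.add_meta_graph_and_variables(
           sess, [tf.saved_model.tag_constants.SERVING],
           signature_def_map=signature_def_map)
       builder.save()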

Now we can proceed and export the prediction for our two-dimensional model. The input specification is a list of tuples of the format (name, AutoFlow specification). The output specification is simply a list of names. As GPs are probabilistic models, the outcome is a predictive distribution: in this case, a Normal distribution represented by its mean and variance.

# In _build_predict the prediction inputs are passed as a 2D matrix,
# so we declare a single float input of shape [None, None]
inputs = [('Xnew', (gpflow.settings.tf_float, [None, None]))]
# Two outputs (mean and variance of the predictive distribution)
outputs = ['mean', 'variance']
save_build_method(gp._build_predict, inputs, outputs)

This creates a new directory “model” which contains the SavedModel.
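Before uploading, it can be worth checking that the export directory looks as expected. The sketch below assumes the usual TF 1.x SavedModel layout (a saved_model.pb or saved_model.pbtxt file next to a variables directory); the helper name looks_like_saved_model is hypothetical and the check is only a quick sanity test, not a validation of the model itself.

```python
import os

def looks_like_saved_model(export_dir):
    """Return True if export_dir has the files a TF 1.x SavedModel normally contains."""
    has_graph = any(
        os.path.isfile(os.path.join(export_dir, name))
        for name in ('saved_model.pb', 'saved_model.pbtxt'))
    has_variables = os.path.isdir(os.path.join(export_dir, 'variables'))
    return has_graph and has_variables

# 'model' is the default export_dir used above
ready_to_upload = looks_like_saved_model('model')
```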

Deploy on ML Engine

Deploying the model for online predictions on Google Cloud can now be accomplished by uploading the model to Cloud Storage, creating a model resource and a model version:

gsutil cp -r model gs://my-bucket
gcloud ml-engine models create "tutorial"
gcloud ml-engine versions create v1 \
  --model tutorial \
  --origin gs://my-bucket/model \
  --runtime-version=1.13 \
  --framework TENSORFLOW
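When the deployment is automated through the API instead of gcloud, the same version can be described as a request body. The following is a rough sketch: the helper version_body is hypothetical, the field names are the ones we understand the v1 version resource to use, and the actual API call is left as a comment because it needs credentials.

```python
def version_body(name, deployment_uri, runtime_version='1.13', framework='TENSORFLOW'):
    """Build a request body describing a model version."""
    return {
        'name': name,
        'deploymentUri': deployment_uri,
        'runtimeVersion': runtime_version,
        'framework': framework,
    }

# Same project, model and bucket names as used elsewhere in this post
parent = 'projects/{}/models/{}'.format('gcp-project', 'tutorial')
body = version_body('v1', 'gs://my-bucket/model')

# With credentials configured, the request could be sent like this (not run here):
#   from googleapiclient import discovery
#   service = discovery.build('ml', 'v1')
#   service.projects().models().versions().create(parent=parent, body=body).execute()
```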

The model is now deployed and online. The following code can be used to send JSON requests to obtain predictions (it requires permissions to make requests to this resource):

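from googleapiclient import discovery

# Each instance is one row of the Xnew matrix
instances = [
    {'Xnew': [0, 1]},
    {'Xnew': [0.5, 0.5]}
]

# Build a client for the ML Engine API
service = discovery.build('ml', 'v1')
name = 'projects/{}/models/{}'.format('gcp-project', 'tutorial')

# Send the prediction request to the default version of the model
response = service.projects().predict(
    name=name,
    body={'instances': instances}
).execute()

if 'error' in response:
    raise RuntimeError(response['error'])

print(response['predictions'])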
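The request body and the handling of the response can be tried out without network access. The sketch below uses two hypothetical helpers, make_request_body and unpack_predictions, and assumes the service returns one dictionary per instance keyed by the output names chosen at export time. The response shown is a made-up placeholder, not real model output.

```python
import json

def make_request_body(rows, input_name='Xnew'):
    """Wrap each input row in a dict keyed by the exported input name."""
    return {'instances': [{input_name: list(row)} for row in rows]}

def unpack_predictions(response, outputs=('mean', 'variance')):
    """Collect each named output across all returned predictions."""
    if 'error' in response:
        raise RuntimeError(response['error'])
    return {name: [p[name] for p in response['predictions']] for name in outputs}

body = make_request_body([[0, 1], [0.5, 0.5]])
payload = json.dumps(body)  # the JSON that would be sent to the service

# Placeholder response with invented numbers, for illustration only
fake_response = {'predictions': [
    {'mean': [1.0], 'variance': [0.01]},
    {'mean': [0.5], 'variance': [0.02]},
]}
result = unpack_predictions(fake_response)
```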


That’s it! Note that a similar method can be used to obtain a SavedModel for use with TensorFlow Serving or Kubeflow.