TensorFlow Serving — Deployment of deep learning model

Ravi Valecha
4 min read · Nov 21, 2020

Deployment of an ML model or deep learning model simply means integrating the model into an existing production environment, where it can take in input and return output that can be used to make practical business decisions. It is the last stage of a machine learning or deep learning project.

TensorFlow Serving is a powerful library, built as an extension of the TensorFlow ecosystem, for putting models on a production server. It handles a lot of the serving work in the background.

Some of the key features of TensorFlow Serving -

  1. It can serve multiple models, or multiple versions of the same model
  2. Exposes both gRPC and HTTP inference endpoints
  3. Allows deployment of a new model version without any change in client code
  4. Can be installed using Docker
  5. Can be deployed on Kubernetes

Apart from TF Serving, we could also use Flask to expose the deployed model to any web application, but TF Serving has some advantages over Flask -

  • Scalability
  • Low latency
  • Handles multiple models
  • Handles multiple versions of the same model

TF Serving Architecture -

The Servable handler serves the model; it is essentially the lookup or inference component.

The version manager is responsible for publishing a new model or a new version of the same model.

The loader scans and loads the model. It gets a message from the version manager about which model to load, reaches out to the file system, loads the model, and reports back to the version manager. The version manager then informs the servable handler that the model has been loaded and is ready to be served.

How to use TensorFlow Serving

Below are the Installation Steps for TensorFlow Model Server

echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | sudo tee /etc/apt/sources.list.d/tensorflow-serving.list && \
curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | sudo apt-key add -
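
The model server itself can then be installed with apt; this is the standard install command from the TensorFlow Serving documentation:

sudo apt-get update && sudo apt-get install tensorflow-model-server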

After installing the model server, install TensorFlow GPU.

Install the Requests module. Requests allows you to send HTTP/1.1 requests extremely easily.
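
Assuming a notebook environment, both installs can be done with pip; a minimal sketch (package names as published at the time of writing):

!pip install tensorflow-gpu
!pip install requests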

Import all required dependencies
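
A sketch of the imports used throughout this walkthrough (the original post may include a slightly different set):

import json
import os

import numpy as np
import requests
import tensorflow as tf
from tensorflow import keras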

Now load the dataset and split it into train and test data. The dataset used here is CIFAR-10: https://www.cs.toronto.edu/~kriz/cifar.html. You can find the class names on the same dataset page.
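
CIFAR-10 ships with Keras, so loading and splitting it is a one-liner; the class names below are taken from the dataset page:

# Load CIFAR-10: 50,000 training and 10,000 test images of shape 32x32x3
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

# Class names in label order, from the CIFAR-10 page
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']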

Before training and evaluating the model, normalize the images; scaling pixel values into the [0, 1] range helps training converge to a lower loss.
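
A minimal normalization step:

# Scale pixel intensities from [0, 255] to [0, 1]
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0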

Now create a sequential model, add layers, and compile it.
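
The exact architecture from the original post is not reproduced here; the sketch below is a small convolutional network that works for CIFAR-10 and compiles with the usual settings:

# A small CNN: two conv/pool blocks followed by a dense classifier
model = keras.Sequential([
    keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Conv2D(64, (3, 3), activation='relu'),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])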

Train the model for 10 epochs.

Evaluate the model on the test data and check the model accuracy.
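
A sketch of the training and evaluation calls:

# Train for 10 epochs, using the test set as validation data
model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test))

# Evaluate on the test data and print the accuracy
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=2)
print('Test accuracy:', test_acc)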

Save the model for production. Create a directory named ‘model’ (or any name of your choice) and save the model inside a numbered version subdirectory within it for TensorFlow Serving.
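
TensorFlow Serving expects the SavedModel to live inside a numbered version subdirectory, so a sketch of the export step (assuming version 1 under a directory named ‘model’) looks like this:

MODEL_DIR = 'model'
version = 1
export_path = os.path.join(MODEL_DIR, str(version))

# Export in SavedModel format, e.g. model/1/
model.save(export_path, save_format='tf')
print('Model saved to:', export_path)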

After saving the model, set up the production environment. Export MODEL_DIR to the system environment variables, then run the TensorFlow Serving model server on port 8052 from Python. Here we have to use ‘%%bash --bg’ to execute the Linux command as a background service. server.log is used to capture the logs, and with the tail command we can inspect them. Finally the server is up and running for image classification.
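
A sketch of this setup, following the standard TensorFlow Serving notebook workflow; the model name cifar10_model is an assumption, while port 8052 comes from the text above:

# Make the export directory visible to the server process
os.environ['MODEL_DIR'] = os.path.abspath(MODEL_DIR)

Then start the model server as a background service from a notebook cell:

%%bash --bg
nohup tensorflow_model_server \
  --rest_api_port=8052 \
  --model_name=cifar10_model \
  --model_base_path="${MODEL_DIR}" >server.log 2>&1

Tail the log to confirm the server came up:

!tail server.log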

Now, to test the deployed model, we have to create a POST request: a JSON object that is sent to the model server.
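
A sketch of building the JSON payload from the first few test images:

# Serialize the first three test images into a TF Serving predict request
data = json.dumps({
    'signature_name': 'serving_default',
    'instances': x_test[0:3].tolist()
})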

After creating the JSON request, send the first POST request to the model deployed on the TensorFlow model server. Set the content type to application/json, since we are sending a JSON request. Use the same model name and port that were used to start the TensorFlow model server.
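
A sketch using the Requests module; the model name (cifar10_model) and port (8052) must match whatever was used to start the server:

headers = {'content-type': 'application/json'}
response = requests.post(
    'http://localhost:8052/v1/models/cifar10_model:predict',
    data=data, headers=headers)

# The response body is JSON with a 'predictions' field
predictions = json.loads(response.text)['predictions']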

Use the predictions to check the results against the actual labels.
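
Each prediction is a list of 10 class probabilities, so the predicted class is the argmax; a quick check against the true labels:

for i, pred in enumerate(predictions):
    predicted = class_names[int(np.argmax(pred))]
    actual = class_names[int(y_test[i][0])]
    print('Predicted: {}, Actual: {}'.format(predicted, actual))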

This is how we can create a scalable image classification API. Now the question is how to switch the model, or the version of the same model. For that, only the URL sent with the POST request changes. To target a specific version, just append ‘/versions/1’ (or 2, …) to the model part of the URL.
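
For example, pinning the request to version 1 of the model (same payload and headers as before):

versioned_url = 'http://localhost:8052/v1/models/cifar10_model/versions/1:predict'
response = requests.post(versioned_url, data=data, headers=headers)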

GET http://host:port/v1/models/${MODEL_NAME}[/versions/${VERSION}|/labels/${LABEL}]/metadata
