Using TensorFlow Lite to Speed up Predictions
Speed up predictions on individual records or small batches by converting a Keras/TensorFlow 2.1 model to TensorFlow Lite
I found myself in a situation where I had to run predictions on large numbers of individual data records using Keras/TensorFlow 2.1 models where creating batches was impractical.
Calling model.predict(input) incurs a large overhead and is very slow unless it is used on large batches. The recommended method of calling the model directly, model(input), improves performance, but it is still far from ideal. It turns out that we can do much better by converting the model to TensorFlow Lite.
For this benchmark I used the following simple model:
import tensorflow as tf

# Five dense layers: 256 inputs -> 384 -> 384 -> 256 -> 128 -> 32 outputs
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(384, activation='elu', input_shape=(256,)),
    tf.keras.layers.Dense(384, activation='elu'),
    tf.keras.layers.Dense(256, activation='elu'),
    tf.keras.layers.Dense(128, activation='elu'),
    tf.keras.layers.Dense(32, activation='tanh')
])
Then I invoked it in several different ways:
- kmodel-predict-batch: Call model.predict(input) on a complete batch.
- kmodel-predict-single: Call model.predict(input) in a loop on single records.
- kmodel-direct-single: Call model(input) in a loop on single records.
- lmodel-single: Use TensorFlow Lite in a loop on single records.
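The single-record variants above differ only in which callable is invoked inside the loop, so they can share one timing helper. The following is a minimal sketch of such a helper, not the benchmark code itself: time_per_call and dummy_predict are hypothetical names, and dummy_predict is only a stand-in for model.predict, model(...), or a TensorFlow Lite wrapper.

```python
import time

def time_per_call(predict_fn, records, repeats=3):
    """Return the best average time per call (in seconds) over `repeats` passes."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for record in records:
            predict_fn(record)  # one call per single record, no batching
        elapsed = time.perf_counter() - start
        best = min(best, elapsed / len(records))
    return best

# Hypothetical stand-in for the real prediction call (assumption for illustration).
def dummy_predict(record):
    return [2.0 * x for x in record]

records = [[float(i)] * 256 for i in range(1000)]
seconds = time_per_call(dummy_predict, records)
print(f"{seconds * 1e6:.1f} us per call")
```

Taking the best of several passes is one way to reduce noise from warm-up effects and background load. Averaging over all passes would be an equally reasonable choice.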
Here are the results on a 6-core Intel i7-8750H CPU @ 2.20GHz (Windows 10, Python 3.7, TensorFlow 2.1.0):
The overhead of a call to model.predict(input) is 18 ms, while a call to model(input) takes 1.3 ms (a 14x speedup). A call to the TensorFlow Lite model takes 43 µs (an additional 30x speedup). However, the conversion of the Keras model to TensorFlow Lite takes approximately 2 seconds.
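Because the conversion is a one-time cost, it only pays off after enough predictions. As a rough back-of-the-envelope sketch using the figures reported above (break_even_calls is a hypothetical helper, and the 2-second conversion time is approximate), the break-even point is the smallest n for which conversion time plus n Lite calls is no slower than n baseline calls:

```python
import math

# Per-call times and conversion cost as reported above, in seconds.
T_PREDICT = 18e-3   # model.predict(input) on a single record
T_DIRECT = 1.3e-3   # model(input) on a single record
T_LITE = 43e-6      # TensorFlow Lite on a single record
T_CONVERT = 2.0     # one-time conversion (approximate)

def break_even_calls(baseline, lite=T_LITE, convert=T_CONVERT):
    """Smallest n with convert + n * lite <= n * baseline."""
    return math.ceil(convert / (baseline - lite))

print(break_even_calls(T_PREDICT))  # versus model.predict(input)
print(break_even_calls(T_DIRECT))   # versus model(input)
```

With these numbers, conversion pays for itself after roughly a hundred calls compared with model.predict(input), and after roughly 1,600 calls compared with model(input).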
I did not see any difference with or without GPU support (possibly because a model this small, run on single records, gains little from a GPU). Also, there was only a negligible difference between a Keras model before and after compilation.
Below is the code used for this benchmark. The class LiteModel encapsulates the TensorFlow Lite model and provides a predict() method that works like the Keras equivalent. (Note that it would need to be modified for use with models that don’t just have a single 1-D input and a single 1-D output.)