tl;dr: ML Functions as a Service: Envoy-proxy-powered machine learning lambdas.

Once a machine learning model has been trained, it can be used to generate predictions for online requests. Real-world use cases require different machine learning frameworks and ad-hoc transformers to be chained into a single prediction pipeline. Imagine an NLP use case with custom feature extractors, followed by a weak random forest classifier and a fastText model that prepares input vectors for a final neural network built with TensorFlow. All these stages have to be executed in real time as a multi-framework, cross-runtime serving pipeline.

One solution is to export every model into PMML/PFA and import it into a separate scoring engine. This approach has multiple challenges:

- Code duplication. Two implementations of the same model, plus exporting functions, must be maintained in different languages. This leads to inconsistencies, bugs and a lot of excess work.
- Limited extensibility. A prediction pipeline is not only a machine learning model: it has pre-processing steps, ad-hoc logic and very specific dependencies that cannot be encoded in XML. The myriad of ML frameworks such as fastText and Deeplearning4j cannot fit into a single scoring engine a priori.
- Inconsistency. Different implementations for streaming, large-scale and scoring use cases produce different outputs for the same inputs in different contexts.
- Extra moving parts. Exporting/importing pipelines adds complexity and points of failure; it should be eliminated rather than automated.
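The kind of chaining described above can be sketched as stages sharing one callable interface. This is a minimal illustration only: the stage bodies below are hypothetical stand-ins, where a real deployment would wrap each framework (scikit-learn, fastText, TensorFlow) behind the same signature.

```python
from typing import Callable, List

# A stage takes a batch and returns a transformed batch.
Stage = Callable[[list], list]

def extract_features(texts: list) -> list:
    # Stand-in for ad-hoc feature extraction: [char count, word count].
    return [[len(t), t.count(" ") + 1] for t in texts]

def weak_classifier(features: list) -> list:
    # Stand-in for a weak random forest: appends a coarse score.
    return [f + [1.0 if f[0] > 10 else 0.0] for f in features]

def final_model(vectors: list) -> list:
    # Stand-in for the final neural network producing one score per input.
    return [sum(v) / len(v) for v in vectors]

def make_pipeline(*stages: Stage) -> Stage:
    # Compose stages left-to-right into a single prediction function.
    def run(batch: list) -> list:
        for stage in stages:
            batch = stage(batch)
        return batch
    return run

pipeline = make_pipeline(extract_features, weak_classifier, final_model)
print(pipeline(["hello world", "a much longer input sentence here"]))
```

The point of the sketch is the interface, not the models: because every stage is just a batch-in/batch-out callable, heterogeneous frameworks can be chained without exporting anything to an intermediate format.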
In this demo-based talk with live coding, we will propose an alternative solution and build a real-life use case that chains different machine learning frameworks and runtimes to generate real-time predictions.