What if you had to build more machine learning models than there are data scientists in the world? At enterprise companies like Salesforce, customer data comes in vastly different shapes and forms, making it impossible to build one catch-all model even when focusing on a single problem. Instead, it becomes necessary to build thousands of personalized, per-customer models for any single data-driven application. At Salesforce, we have built solutions to these problems into a project called Optimus Prime, which we are using to develop robust, production-quality machine learning applications much more quickly than with Spark alone.
In this talk, we will demonstrate two applications of this platform. The first is AutoML, which enables building simple yet powerful models for any use case, even for users without a background in data science. We will describe the underlying challenges of automating machine learning, ranging from the user interface to data extraction and model building, and touch more deeply on how we automate feature selection and model selection. The result is a system where users need only domain expertise to build production-ready machine learning applications.
The second demonstration will be of a data product more finely tuned to a specific application. We will demonstrate a product currently in development, Case Classification: automatic classification of service cases. This application is built not only to train and predict on each customer’s individual data, but also to scale the ML pipeline dynamically to accommodate any number of prediction fields; it supports multi-tenant, multi-label, multi-model, multi-class predictions. We’ll contrast our implementation using Optimus Prime against one in pure Spark and then show the resulting pipeline performance on real customer data.
DeepLearning4J (Deep Learning for Java - DL4J, inception 2013) was designed specifically with enterprise and production use in mind, as a first-class citizen of the JVM. Skymind develops and maintains the complete DL4J stack and its Scala abstraction, ScalNet, with a focus on scalability and vendor integrations.
This session will focus on the challenges of migrating a research prototype to a more production-ready system within the JVM; specifically, migrating and importing models from an alternative, Python-based deep learning framework (e.g., Keras via TensorFlow) into DL4J/ScalNet in a distributed environment using Apache Spark.
We will walk through a temporal IoT use case that models an LSTM network, demonstrating the different phases of such a project, as well as the workflow capabilities for crossing language boundaries.