Wednesday, November 15
 

9:00am PST

Workshop (Separate Registration): Building Reliable, Transparent and Performant Cloud Native Applications with gRPC and Istio
Together with leaders in full-stack architectures, we bring you an all-star workshop on modern microservice architectures as practiced by Google, Salesforce, Lyft, Starbucks, and other great companies.

This is a paid workshop preceding the conference. It requires a separate registration; discounted packages are available together with the conference.
In this special one-day, hands-on workshop you will learn how to take a Cloud Native Application from inception to production. Starting with a base sample application, we will learn how to break the application into separate services that communicate via gRPC. We will learn how to take that application into production using Kubernetes. Next we will look at the challenges of reliable service communication in a complex topology of services. The last part of the workshop will layer in Istio to create a service mesh for advanced security, traffic management and telemetry.
This workshop will be taught by a unique team of engineers who have helped build Istio and taken it into production.
Some of the topics to be covered include:
  • Service communication using Protobuf 3 and gRPC
  • Overview of Kubernetes
  • Building a Service Mesh with Istio
  • Using Envoy to create reliable service to service communication
  • Advanced usages of Envoy for traffic management and secure communication
  • In-depth observability using telemetry and distributed tracing
  • Releasing new services with Canary deployments

Location: Google Launchpad, 301 Howard St, San Francisco, CA 94105
Workshop Coverage:
All Day:
Ryan Knight - Grand Cloud
Ben Edwards - Grand Cloud
James Ward - Salesforce

Morning - gRPC and Kubernetes
Mark Mandel - Google
Mehrdad Afshari - Google

Afternoon - Kubernetes and Istio
Matt Klein - Lyft
Mandar Jog - Google

9:00am - doors open, coffee/bagels, intros
9:30am - training begins - gRPC Workshop
10:30am - coffee break
10:45am - training continues - Advanced gRPC / Intro to Kubernetes
12:00pm - 1:00pm - lunch
1:00pm - training continues - Istio Workshop
2:30pm - coffee break
2:45pm - training continues - Istio Workshop
4:00pm - 5:00pm - wrap up

Workshop Setup:
At the workshop I will have a username and password for every participant to access Google Cloud. Most of the exercises will be run either on a local JVM or in Google Cloud.
Download the Google Cloud SDK. During the workshop we will walk through authorizing and initializing the Cloud SDK.
Exercises are in either Java or Scala, so have whichever one you prefer set up:
  • Java: OpenJDK 8 or Oracle Java SDK, and Maven 3+
  • Scala: sbt
  • A good Java / Scala editor like IntelliJ


Speakers

Ben Edwards

Tech Lead, Starbucks

Matt Klein

Software Engineer, Lyft
Matt Klein is a software engineer at Lyft and the creator of Envoy. He has been working on operating systems, virtualization, distributed systems, networking, and making systems easy to operate for nearly 20 years across a variety of companies. Some highlights include leading the...

Ryan Knight

Principal Software Architect / CEO, Grand Cloud
Ryan Knight is Principal Solution Architect at Grand Cloud. He is a passionate technologist with extensive experience in large scale distributed systems and data pipelines. He first started Java Consulting at the Sun Java Center and has since worked at a wide variety of companies...

James Ward

Engineering and Open Source Ambassador, Salesforce
James Ward (www.jamesward.com) is the Engineering and Open Source Ambassador at Salesforce.com. James frequently presents at conferences around the world such as JavaOne, Devoxx, and many other Java get-togethers. Along with Bruce Eckel, James co-authored First Steps in Flex. He has...


Wednesday November 15, 2017 9:00am - 5:00pm PST
Google Launchpad
 
Thursday, November 16
 

8:00am PST

Breakfast + Great Coffee
We serve a full breakfast every morning, and excellent coffee all day, every day.

Thursday November 16, 2017 8:00am - 9:00am PST
Functional

9:00am PST

Disorder & Tolerance in Distributed Systems at Scale

Re-framing problems changes how we see and solve them. The intersection of scientific thought and principles parallels much of what we solve as engineers of information (e.g. uncertainty, time, distribution) and need. This talk is an interdisciplinary look at complex adaptive systems and how they innately solve things like resource distribution, growth and rebalancing. From the context of intelligence and systems, this talk will look at ideas around entropy and time, ensemble forecasting, self-organization theory, the butterfly effect, virus-human co-evolution and adaptation, natural feedback loops, and self-balancing.

  • Can we leverage these principles, behaviors and strategies to design intelligent systems at scale?
  • Can seeing things in an interdisciplinary way benefit solving common problems and speed innovation?

 

 


Speakers

Helena Edelson

CEO, The Axis Initiative
Helena is using AI and complex adaptive systems to study and help endangered species under climate change, biodiversity loss, human-wildlife conflict and illegal wildlife trade. Bridging academia and industry, she is a member of the Environmental Intelligence team of the Interagency...


Thursday November 16, 2017 9:00am - 9:50am PST
Functional

9:50am PST

Turning a Relational RDBMS Table into a Spark Datasource
This session presents a Spark DataSource implementation for integrating (joining) Big Data in HDFS or a NoSQL DBMS with Master Data in RDBMS tables. The session describes how to allow parallel and direct access to RDBMS tables from Spark, generate partitions of Spark JDBCRDDs based on the split pattern, and rewrite Spark SQL queries into the RDBMS SQL dialect. The session also describes the performance optimizations, including hooks in the JDBC driver for faster type conversions, pushing down predicates to the RDBMS, pruning partitions based on the WHERE clause, and projecting columns to the RDBMS table to reduce the amount of data returned and processed on Spark.

Speakers

Kuassi Mensah

Director Product Management, Oracle Corporation
Kuassi is Director of Product Management at Oracle. He looks after the following product areas (i) Java connectivity to DB (Cloud, on-premises), in-place processing with DB embedded JVM (ii) MicroServices and DB connectivity, and related topics (Data & Tx models, Kubernetes, SAGAs...


Thursday November 16, 2017 9:50am - 10:30am PST
Data

9:50am PST

Cloud Science
At GRAIL, our mission is to detect cancer early, when it can be cured. Our approach is data intensive: we sequence cell-free DNA in blood in order to detect minuscule evidence of tumors.
In order to support our myriad workloads—among them ad-hoc analyses, model training, and complicated bioinformatics pipelines—we built Reflow, a system and language for scientific computing in the cloud. Reflow’s data processing engine is fully incremental, and focuses on efficiency, reproducibility, and ease-of-use.
With Reflow, scientists and engineers write ordinary programs that compose existing tools; these programs are then transparently parallelized, memoized, and distributed across many workers using your favorite cloud computing provider. Reflow is vertically integrated: a single binary evaluates the program and is also responsible for elastic cluster management and execution coordination — this makes Reflow very simple to deploy, operate, and retarget to different cloud providers.
I’ll describe how Reflow’s language semantics and runtime are co-designed to yield a simple, robust implementation. I’ll also talk about how Reflow is used for data intensive scientific computing at GRAIL, primarily to analyze next generation sequencing (NGS) data sets.

Speakers

Marius Eriksen

infrastructure-infrastructure engineer, Grail
Marius is the author of such ideas as Finagle, Zipkin, Your Server is a Function, and many others.


Thursday November 16, 2017 9:50am - 10:30am PST
Functional

9:50am PST

Return of the Transaction King
There has been a long-held belief, based on the CAP Theorem, that it is impossible for a database to have strong consistency along with global scaling and high availability. This has led to a whole category of databases that lack one of the most fundamental features of a database: transactions. A whole new generation of databases is rising that brings back transactions and proves that we can have our cake and eat it too! This talk will focus on one example of this, FaunaDB, which provides ACID consistency at a global scale. We will look at Scala code examples of using the FaunaDB functional query language to write transactions in a flexible and type-safe style.

Speakers

Ryan Knight

Principal Software Architect / CEO, Grand Cloud
Ryan Knight is Principal Solution Architect at Grand Cloud. He is a passionate technologist with extensive experience in large scale distributed systems and data pipelines. He first started Java Consulting at the Sun Java Center and has since worked at a wide variety of companies...


Thursday November 16, 2017 9:50am - 10:30am PST
Reactive

10:40am PST

An introduction to Xtract
Xtract is a simple, easy-to-use Scala XML extraction/deserialization library modeled after the JSON library in the Play framework. It uses functional-style composition to combine simple parsers into more complex parsers. Lucid Software has successfully used it in our implementation of importing InDesign documents into Lucidpress.

Speakers

Thayne McCombs

Senior Software Engineer, Lucid Software
I am currently a DevOps engineer at Lucid Software, Inc., but previously worked as a predominantly front-end engineer for Lucidpress, where I led the implementation of importing InDesign documents into Lucidpress. I majored in Astrophysics at Brigham Young University.



Thursday November 16, 2017 10:40am - 11:00am PST
Data

10:40am PST

Scala for the Rest of Us
Scala can be a difficult language to get into, especially for people coming from an OO Java background. Thinking functionally doesn't come naturally to people who haven't worked in it, and a lot of the concepts and patterns are foreign and often heavily abstract. Just getting started can seem like an overwhelming effort. This talk discusses the barrier to entry in moving from OO Java to Scala and what we can do to lower that barrier. I will review the Scala Pet Store and its motivation as a means to help those new to Scala understand what a complete application looks like.

Speakers

Paul Cleary

Senior Principal Engineer, Comcast
20+ years of software development experience, spent most of the last 5 years in Scala. Most of my career is building OO systems, recently converted to FP. After all this time I am still learning. Talk to me if you are struggling with Scala or Functional Programming or if you are...


Thursday November 16, 2017 10:40am - 11:00am PST
Functional

10:40am PST

Spark Datasets: why they aren't great and what could be done about it
The Dataset API enabled Spark to implement advanced optimizations that were not feasible with RDDs, but it also introduced a less intuitive and less type-safe user-facing API. This talk will explore the reasons for the DataFrame/Dataset compromises and present a potential solution that would enable Spark to provide a fully type-safe and more intuitive Scala API through quotations without losing the ability to apply advanced optimizations.

Speakers

Flavio Brasil

Software Engineer, Twitter


Thursday November 16, 2017 10:40am - 11:00am PST
Reactive

11:10am PST

Scala DSL for ML Training Set Stratification
Building flexible machine learning libraries adapted for Netflix’s use cases is paramount in our continued efforts to better model user behaviors and provide great personalized recommendations. This talk introduces one such Scala-based DSL library to aid “User Training Set Stratification” in our offline machine learning workflows. Originally created to improve user stratification while building our fact store, the library has evolved to cater to other general-purpose stratification use cases in our ML applications. We will talk about how, using the library’s Scala-based DSL and its underlying Apache Spark-based implementation, one can easily express and dynamically generate the required training data sets for different ML experiment needs by specifying the desired distributions of user attributes such as country, tenure, play frequency, etc. The demo section of the talk will showcase how we were able to utilize idiomatic Scala, with several API examples in a Zeppelin notebook.

Speakers

Shiva Chaitanya

Senior Software Engineer, Netflix


Thursday November 16, 2017 11:10am - 11:30am PST
Data

11:10am PST

Type Classes for the Rest of Us
Scala as an Object-Functional language has an almost endless supply of options to tackle the day-to-day problems that we run into as developers. However, it's often not clear when to choose one option over another. Type-classes provide a simple solution to several problem spaces -- often filling the gaps that other traditional solutions fail to fill.

In my talk, I will iterate over a real-world problem using common tools made available by Scala to help gain some intuition about how type-classes work and why they are useful. Most importantly, since type-classes can be used to completely replace entire features of the Scala language, I will give some concrete examples of pragmatic use-cases to show where and when to apply type-classes to day-to-day problems. Attendees will leave with a good intuition about when and where to apply type-classes to a problem - and even more importantly when *not* to apply them.

Speakers

Andrew Kuhnhausen

Software Engineer, Domino Data Lab
Andrew Kuhnhausen is a developer at Domino Data Lab in San Francisco. He did his thesis on static analysis of Dalvik byte code for malware detection. When not futzing around with functional programming or distributed computing, you can find him riding his bike and enjoying the fine...


Thursday November 16, 2017 11:10am - 11:30am PST
Functional

11:10am PST

IoT, Wavelets and Machine Learning in the Smart Home
The smart home, driven by AI, is a quickly developing sector. Security in the home is a hot sub-sector of the smart home.

This talk will focus on a project that uses vibration sensors, wavelet transformations, and machine learning techniques to perform identification and intruder detection in the home.

This talk will focus on:
  • the sensor considerations and the HW stack, 
  • upgrading the project from a Python to a Spark stack, and
  • an examination of the data from raw input through the wavelet transformations. 

A live demo will be given, showing live predictions of this system.

Speakers

Keita Broadwater, PhD, MBA

Founder, Oxygen AI


Thursday November 16, 2017 11:10am - 11:30am PST
Reactive

11:40am PST

Bringing Deep Learning into the Big Data Analytics: BigDL on Apache Spark
Deep Learning is rapidly spreading as one of the most successful and widely applicable sets of techniques across a range of applications (vision, language, speech, computational biology, robotics, etc.), leading to some significant commercial success. While cutting-edge research is emerging at breathtaking speed, in industry we often see gaps between papers and prototypes, and even larger gaps between prototypes and production. In this talk, we present several key deep learning applications we built successfully with BigDL, an emerging distributed deep learning framework on Apache Spark. For perception applications, we will introduce the object detection pipeline with SSD and how it’s used in production scenarios. For industry applications, we’ll present the neural recommendation system and fraud detection system we built with BigDL and the Spark ML pipeline.

Speakers

Yuhao Yang

Yuhao Yang is a senior software engineer on the Intel BigDL team, focusing on BigDL development and deep learning applications. His area of focus is distributed deep learning/machine learning, and he has accumulated rich solution experience, including fraud detection, recommendation, speech...


Thursday November 16, 2017 11:40am - 12:00pm PST
Reactive

11:40am PST

Building a High-Performance Database with Scala, Akka, and Spark
#distributedsystems #scala #akka #spark #FiloDB #cassandra

Scala and its large ecosystem of libraries are increasingly being used to build highly scalable and performant data systems. In this talk, I share years of experience building high performance data systems using Scala, Akka, and Spark, plus recent experience building FiloDB, a high performance analytics database built on these technologies. How do we balance Scala and functional programming with very high performance demands? What are some tips for building very, very fast Scala code?
  • Why build a new database for streaming applications?
  • Why Scala and Akka make a great foundation for building a database
  • When to use Futures, Actors, Reactive Streams
  • Using Akka Cluster to coordinate and implement distributed ingestion
  • Monix and use of reactive streams
  • Reactive/async tracing and production metrics
  • Filo: summing integers at billions of ops per second, taking advantage of processor cache and SIMD with super fast vector operations
  • Serialization, GC, and off-heap: how to leverage binary data structures for the win
  • JVM method dispatch, inlining, and writing lots of small methods

Speakers

Evan Chan

Senior Data Engineer, UrbanLogiq
Evan is currently Senior Data Engineer at UrbanLogiq, where he is using Rust, among other tools, in building robust data platforms to help public servants build better communities. Evan has been a distributed systems / data / software engineer for twenty years. He led a team developing...


Thursday November 16, 2017 11:40am - 12:20pm PST
Data

11:40am PST

The Design of the Scalaz 8 Effect System

Purely functional Scala code needs something like Haskell's IO monad—a construct that allows functional programs to interact with external, effectful systems in a referentially transparent way. To date, most effect systems for Scala have fallen into one of two categories: pure, but slow or inexpressive; or fast and expressive, but impure and unprincipled. In this talk, John A. De Goes, the architect of Scalaz 8’s new effect system, introduces a novel solution that’s up to 100x faster than Future and Cats Effect, in a principled, modular design that ships with all the powerful primitives necessary for building complex, real-world, high-performance, concurrent functional programs.

Thanks to built-in concurrency, high performance, lawful semantics, and rich expressivity, Scalaz 8's effect system may just be the effect system to attract the mainstream Scala developers who aren't familiar with functional programming.


Speakers

John A. De Goes

Solution Architect, De Goes Consulting
John A. De Goes has been writing Scala software for more than eight years at multiple companies, and has assembled world-renowned Scala engineering teams, trained new developers in Scala, and developed several successful open source Scala projects. Known for his ability to take very...


Thursday November 16, 2017 11:40am - 12:20pm PST
Functional

12:00pm PST

Reasonable Scala Compiler
Reasonable Scala compiler is an experimental Scala compiler focused on compilation speed. We believe that it is possible to achieve dramatic compilation speedups (5-10x) for typical Scala codebases, and we are currently well on track to realizing this vision. Join us at Scala by the Bay to see this with your own eyes.

Speakers

Eugene Burmako

Language tools lead, Twitter
Language tools lead at Twitter, member of the Scala language committee, founder of Rsc, Scalameta and Scala Macros.


Thursday November 16, 2017 12:00pm - 12:20pm PST
Reactive

12:00pm PST

Unconference
We'll have an Unconference track where you can propose a talk and others can hear it!  The talk selection will be done by the attendees in the track from the submissions, moderated by Remy, Travis, and Jon.  Please submit talks at

http://chief.sc/unconf2017

Speakers

Travis Brown

Software Engineer at Stripe, previously OSS at Twitter

Remy DeCausemaker

Open Source Program Manager, Twitter
As a Civic Hacker, Hackademic, and Program Manager of Twitter Open Source, @Remy_D builds communities that use their powers for good.

Jon Pretty

Software Engineer, Propensive


Thursday November 16, 2017 12:00pm - 5:00pm PST
Unconference

12:20pm PST

Lunch
We serve a great lunch every day.

Thursday November 16, 2017 12:20pm - 1:10pm PST
Functional

1:10pm PST

Avoiding Spark Pitfalls at Scale
There’s no doubt that Apache Spark is a very powerful tool for scalable data, but beware: forces lurk in the shadows to bring about your downfall, especially at scale! In this talk, we’ll cover some of the challenges and pitfalls encountered when writing data pipelines with Spark and how we’ve learned to deal with them. Our tales will involve battles with memory management, dataset type safety, lazy versus strict evaluation, and beyond. This talk will use only the Scala API of Spark, but the tips and tricks presented will apply to Spark in general.

Speakers

Long Cao

Software Engineer, Coatue Management
Long is a software engineer on the data science team at Coatue Management, where he builds scalable data pipelines in Scala and Spark that consume alternative data to provide insight and market signals. He has been based in New York for the last 5 years by way of Texas and obsesses...


Thursday November 16, 2017 1:10pm - 1:30pm PST
Data

1:10pm PST

Teaching Scala to the statically challenged
Most developers at SBTB came to Scala already knowing at least one statically typed language. There is an ever-expanding pool of potential Scala learners who know only Python, JavaScript and/or Ruby, but they are too often stymied by “introductory” resources that assume familiarity with Java or C. We must decouple Scala scholarship from legacy languages in order to benefit from this diverse pool of talented, enthusiastic, and impressionable developers.

This talk is for experienced Scala coders who care about growing and diversifying our community. We will cover general guidelines for being a great teammate/mentor without committing too much of your own time, as well as specific language features to emphasize or avoid with beginners.

Speakers

Rebecca Ely

Spark Platform Engineer, Bloomberg
Software Engineer at Bloomberg


Thursday November 16, 2017 1:10pm - 1:30pm PST
Functional

1:10pm PST

Tame Your Data with Akka Streams
Stream processing is a hot topic today and it’s easy to get lost among all the possibilities. In this live coding session we will explore the Reactive Streams approach used by the Akka Streams project. Using an almost real-life example, we’re going to walk through both the basics and some more advanced uses of the library.

Speakers

Jacek Kunicki

Passionate Software Engineer, SoftwareMill
I'm a passionate software engineer living in the JVM land - mainly, but not limited to. I also tend to play with electronics and hardware. When sharing my knowledge, I always keep in mind that a working example is worth a thousand words.


Thursday November 16, 2017 1:10pm - 1:50pm PST
Reactive

1:40pm PST

Satellite data monitoring for analytical models
The Astro Digital API streams satellite data for persistent change monitoring of live resources such as agriculture, forests, and algae, and of urban construction such as buildings, roads, and pipelines. The API lets you set up monitoring of any location in the world and get a time series of daily-updated satellite images at scientific quality, ready for mapping and plugging into analytical models. All the necessary cross-calibration and imagery processing is done by the Astro Digital platform, so any developer can stream the data directly to users via imagery files and cloud-based web maps. The API's output dataset is ready to be built into analytical algorithms, and it enables ML and AI developers to add context and a history of changes to the process of building models.
The API is the interface to the cloud-based Astro Digital platform for searching, processing, and distributing satellite data, which allows monitoring of territories of any size in any location in the world. The data feed is based on public domain sources and Astro Digital's own constellation of Landmapper satellites, with the launch of the first batch in June 2017. The API provides data points going back up to 17 years, delivers new data points less than 24 hours after the shot is taken, and supports forward-looking "alerts" that are updated once a new image for the location is available.

Speakers

Alex Kudriashova

Integration Lead, Astro Digital


Thursday November 16, 2017 1:40pm - 2:00pm PST
Data

1:40pm PST

Inventory History as Pure Functions
As a proudly data-driven company dealing with physical goods, Stitch Fix has put lots of effort into inventory management. Tracer, the inventory history service, is a new project we have been building to enable more fine-grained analytics by providing precise inventory state at any given point in time. I will talk about how we designed this "time machine" with functional programming principles to help us reason about stateful data, and how we implemented it in Spark using Scala for the core and Python for the end-user APIs. This project also gives us a chance to explore a new way to build complex ETL pipelines: by chaining testable and composable transforming functions as a data flow.

Speakers

Sky Yin

Data Scientist, Stitch Fix
Data paranoid, failed entrepreneur, former stock trader, Canadian in the US, Shanghainese. A husband and a father of a daughter. Currently working on data science and engineering in the world of fashion.


Thursday November 16, 2017 1:40pm - 2:00pm PST
Functional

2:10pm PST

Featran77 - Generic Feature Transformer for Data Pipelines
Featran, a.k.a. Feature Transformer, Featran77 or F77, is a generic feature engineering library for Scala data pipeline frameworks, including Scio, Spark, Scalding and Flink. We'll talk about the design and implementation of the library, including uses of Algebird Semigroups, Aggregators, Breeze, and Scalacheck. We'll also cover other relevant topics that make our data/ML pipelines scalable and type safe.

Speakers

Neville Li

Software Engineer, Spotify
Neville is a software engineer at Spotify who works mainly on data infrastructure and tools for machine learning and advanced analytics. In the past few years he has been driving the adoption of Scala and new data tools for music recommendation, including Scalding, Spark, Storm and...


Thursday November 16, 2017 2:10pm - 2:50pm PST
Data

2:10pm PST

The Functor, Applicative, Monad talk
Functors, applicatives, and monads are fundamental tools for some programmers, yet for many others they are considered immaterial. Indeed there are extremely few languages which offer support for even talking about these concepts. Why then are these programmers so fixated on them? What about them makes them so desirable and necessary? In this talk we will explore the what and why of these concepts and hopefully leave you understanding, if not convinced of, their utility.

Speakers

Adelbert Chang

Lead Data Engineer, Target
Adelbert Chang is a Lead Data Engineer at Target where he works on infrastructure systems for the Data Science and Optimization team. Previously he worked at U.C. Santa Barbara doing research in large-scale graph querying and modeling, and in industry on machine learning systems...


Thursday November 16, 2017 2:10pm - 2:50pm PST
Functional

2:10pm PST

Introduction To Deep Learning

The speaker will provide a primer on Deep Learning.  The following topics will be covered:

1) What is deep learning?

2) What is deep learning capable of and what are the limits of deep learning in terms of technological advancement?

3) How is deep learning related to machine learning and artificial intelligence?

4) How did deep learning originate and progress to its present state?

5) How does deep learning work?

6) How can deep learning lead to the automation of intellectual and non-intellectual tasks and processes?

7) What are some barriers to entry and how can these barriers to entry be overcome?


Speakers

Alexander Tsyplikhin

Chief Operating Monster, Data Monsters
Alexander Tsyplikhin, PhD, has been working in the area of data science for fifteen years, building consumer and enterprise products. Alexander is currently the Chief Operating Monster of Data Monsters, a machine intelligence and data engineering consulting and implementation company...


Thursday November 16, 2017 2:10pm - 2:50pm PST
Reactive

3:00pm PST

Multi Runtime Serving Pipelines
tl;dr: ML Functions as a Service: Envoy proxy powered Machine Learning Lambdas. 
Once a machine learning model has been trained, it can be used to generate predictions for online requests. Real world use cases require different machine learning frameworks and ad-hoc transformers to be chained into a single prediction pipeline.
Imagine an NLP use case where we have custom feature extractors followed by a weak random forest classifier and a fastText model that prepares input vectors for the final neural network built with TensorFlow. All these stages have to be executed in real-time, multi-framework, cross-runtime serving meta-pipelines.
One solution is to export every single model into PMML/PFA and then import it into a separate scoring engine. This approach has multiple challenges, such as:
- Code duplication. Two different implementations and exporting functions of the same model in different languages need to be maintained. This leads to inconsistencies, bugs and a lot of excess work.
- Limited extensibility. The prediction pipeline is not only a machine learning model. It has pre-processing steps, ad-hoc logic and very specific dependencies that could not be encoded in XML. A myriad of ML frameworks, such as fastText and deeplearning4j, could not fit into a single scoring engine a priori.
- Inconsistency. Different implementations for streaming, large scale and scoring use cases cause different outputs in different contexts for the same inputs.
- Extra moving parts. Exporting/importing pipelines create additional complexity and points of failure. This step should be eliminated rather than automated.

In this demo based talk with live coding we’ll propose an alternative solution and build a real life use case chaining different machine learning frameworks and runtimes that generate real time predictions.

Speakers
avatar for Stepan Pushkarev

Stepan Pushkarev

CTO, Hydrosphere.io
hydrosphere.io CTO. Automation of AI/ML Operations: deployment, serving, monitoring, subsampling, retraining.


Thursday November 16, 2017 3:00pm - 3:20pm PST
Reactive

3:00pm PST

Real Time ML Pipelines in Multi-Tenant Environments
Serving Machine Learning results in real time has always been a difficult process. The inherent sophistication required by ML models is naturally at odds with the low latency requirements of a real-time pipeline. Multi-tenancy adds yet another level of complexity: instead of a few global models, tenants each require their own model trained on their respective datasets, resulting in potentially hundreds of thousands of models (at Salesforce scale) that current big data frameworks are not designed for. On top of that, one pitfall of many ML pipelines lies in the divergence of feature engineering logic between the online and offline worlds: even though they accept the same type of data, the formats and the processes handling them can be drastically different, resulting in incorrect application of models. To address these concerns, we designed and implemented a system that isolates feature engineering of incoming data into a separate process that updates a global feature store while keeping the computation consistent with the offline batch training process. The pre-computed features in the feature store, as well as the multi-tenant models, are then fed into a machine learning framework we developed on top of Spark Streaming, generating scores and insights for thousands of tenants concurrently in real time.

Speakers
avatar for Karl Skucha

Karl Skucha

Director of Engineering, Einstein, Salesforce
avatar for Yan Yang

Yan Yang

Lead Data Engineer, Einstein, Salesforce
Lead Data Engineer, Einstein


Thursday November 16, 2017 3:00pm - 3:40pm PST
Data

3:00pm PST

Scala Meta Live Coding Session
In this talk, we will have an interactive live coding session where we will use Scala Meta to code something useful (a code generator for your REST API). Hopefully this will embolden other Scala users to explore macros for profit and fun without fear.

Speakers
avatar for Pathikrit Bhowmick

Pathikrit Bhowmick

Head of Engineering, Coatue Management
Pathikrit writes Scala full-time at a hedge fund. He is also the author of many widely used Scala libraries (https://github.com/pathikrit), such as better-files and metarest, and is a committee member of the Scala Platform.


Thursday November 16, 2017 3:00pm - 3:40pm PST
Functional

3:20pm PST

Serverless Type Safe Data Pipelines With OpenWhisk
Operationalizing data pipelines is difficult at scale. It often requires standing up large compute and memory resources around the clock in order to do small computations on demand. This architecture makes it difficult to isolate pain points in the pipeline, understand resource usage and orchestrate events in the pipeline. Functions As A Service (FaaS) has helped many in the Internet of Things and web applications space to operationalize usage patterns and overcome this problem. In this talk, I discuss how I used OpenWhisk to operationalize data pipelines, increase orchestration and decrease costs. I talk about the benefits of using Scala and its functional paradigms and type safety for communication across functions, building event-driven data pipelines and deploying them with open source technologies. Half of the talk will be background information about OpenWhisk and the other half will be deploying a machine learning model using Spark and OpenWhisk.

Speakers
avatar for Jowanza Joseph

Jowanza Joseph

Data Architect, One Click Retail
Jowanza Joseph is a senior software engineer at One Click Retail, a Business Intelligence company in Salt Lake City. Jowanza's work is focused on distributed data and streaming architectures.


Thursday November 16, 2017 3:20pm - 3:40pm PST
Reactive

4:00pm PST

Introduction to Machine Learning
Machine Learning is all the rage today with many different options and paradigms. This session will walk through the basics of Machine Learning and show how to get started with the open source Spark ML framework. Through Scala code examples you will learn how to build and deploy learning systems like recommendation engines.

Speakers
avatar for James Ward

James Ward

Engineering and Open Source Ambassador, Salesforce
James Ward (www.jamesward.com) is the Engineering and Open Source Ambassador at Salesforce.com. James frequently presents at conferences around the world such as JavaOne, Devoxx, and many other Java get-togethers. Along with Bruce Eckel, James co-authored First Steps in Flex. He has... Read More →


Thursday November 16, 2017 4:00pm - 4:40pm PST
Data

4:00pm PST

How to Elm-ify Your ML
Functional Reactive Programming for Feature Engineering in Machine Learning

I will discuss the system we built at Stripe to enable modelers to quickly define complex features and have them available both for training and in real time for scoring.

Speakers
avatar for Oscar Boykin

Oscar Boykin

Machine Learning Infrastructure, Stripe
Oscar is the creator of Scalding, Summingbird, and Algebird, and is an overall professor and mathematician turned software magician.


Thursday November 16, 2017 4:00pm - 4:40pm PST
Functional

4:00pm PST

Scaling to Billions of Real-time Recommendations Daily
Twitter is a very real-time product, where people from across the world are talking about what is happening. Recommendation systems (delivered via push notifications, for example) often need to trade off relevance against real-timeness. In this talk, we will discuss a state-of-the-art system built in-house at Twitter for generating highly relevant candidates, with low latency and at scale.

Speakers
avatar for Ajeet Grewal

Ajeet Grewal

Sr. Manager, Twitter


Thursday November 16, 2017 4:00pm - 4:40pm PST
Reactive

5:00pm PST

The Legends of Twitter and Beyond: Real-World Architectures at Scale
For the first time in history, we bring you the top engineers who built storage, cluster management, RPC stacks, and engineering glue to run the largest OSS-based stacks at Twitter and more.  The majority of the panelists started leading engineering companies based on their favorite place in the stack.

Speakers
avatar for Helena Edelson

Helena Edelson

CEO, The Axis Initiative
Helena is using AI and complex adaptive systems to study and help endangered species under climate change, biodiversity loss, human-wildlife conflict and illegal wildlife trade. Bridging academia and industry, she is a member of the Environmental Intelligence team of the Interagency... Read More →
avatar for Marius Eriksen

Marius Eriksen

infrastructure-infrastructure engineer, Grail
Marius is the author of such ideas as Finagle, Zipkin, Your Server is a Function, and many others.
avatar for Ben Hindman

Ben Hindman

Mesosphere Founder - Apache Mesos Co-Creator, Mesosphere
Ben is one of the creators of Apache Mesos, a platform for building and running resource-efficient distributed systems at scale. Ben started working on Mesos as a PhD student at Berkeley before he brought it to Twitter where it runs on thousands of machines. An academic at heart... Read More →
avatar for Mark McBride

Mark McBride

Founder and CEO, Turbine Labs, Inc
avatar for William Morgan

William Morgan

CEO, Buoyant
William Morgan is the CEO of Buoyant. Prior to founding Buoyant, he was an infrastructure engineer at Twitter, where he ran several teams building on product-facing backend infrastructure. He has worked at Powerset, Microsoft, adap.tv, and MITRE Corp, and has been contributing to... Read More →
avatar for Evan Weaver

Evan Weaver

CEO, Fauna
Evan is cofounder and CEO of Fauna, makers of FaunaDB, the first adaptive operational database. FaunaDB was inspired by Evan's experience leading the team that scaled Twitter, where he was Director of Infrastructure and employee #15.


Thursday November 16, 2017 5:00pm - 6:00pm PST
Functional

6:00pm PST

Happy Hour
We have a Happy Hour every day.  We're happy people!

Thursday November 16, 2017 6:00pm - 8:00pm PST
Functional

6:30pm PST

Fireside Chat with Ben Hindman and Ian Downes: Mesos, Then and Now
In this fireside chat, Ben Hindman, the creator of Apache Mesos and the original lead at Twitter, will discuss Mesos history, present, and future with Ian Downes, the current Mesos team lead.

Speakers
avatar for Ian Downes

Ian Downes

Director of Engineering, Twitter
Head of Mesos team, Twitter
avatar for Ben Hindman

Ben Hindman

Mesosphere Founder - Apache Mesos Co-Creator, Mesosphere
Ben is one of the creators of Apache Mesos, a platform for building and running resource-efficient distributed systems at scale. Ben started working on Mesos as a PhD student at Berkeley before he brought it to Twitter where it runs on thousands of machines. An academic at heart... Read More →
avatar for Alexy Khrabrov

Alexy Khrabrov

Program Chair, Reactive Summit


Thursday November 16, 2017 6:30pm - 7:00pm PST
Reactive
 
Friday, November 17
 

8:00am PST

Breakfast + Great Coffee
We serve full breakfast every day in the morning, and excellent coffee all day, every day.

Friday November 17, 2017 8:00am - 9:00am PST
Functional

9:00am PST

Composable Parallel Processing in Apache Spark and Weld
The main reason people are productive writing software is composability: engineers can take libraries and functions written by other developers and easily combine them into a program. However, composability has taken a back seat in early parallel processing APIs. For example, composing MapReduce jobs required writing the output of every job to a file, which was both error-prone and slow. Apache Spark helped simplify cluster programming largely because it enabled efficient composition of parallel functions, leading to a large standard library and high-level APIs in various languages. In this talk, I'll explain how composability has evolved in Spark's newer APIs, and I’ll present Weld, a new research project I'm leading at Stanford to enable much richer composition of software on emerging parallel hardware (multicores, GPUs, etc). Systems like Weld and Spark will allow engineers to focus on building their application rather than the intricacies of parallel hardware, and might represent one of the best ways we have to tame the ever-diversifying hardware landscape.

Speakers
avatar for Matei Zaharia

Matei Zaharia

Chief Technologist, Databricks
Matei Zaharia is an Assistant Professor of Computer Science at Stanford and Co-founder and Chief Technologist at Databricks. He started the Apache Spark project during his PhD at UC Berkeley, and has worked on other widely used open source data analytics and AI software including... Read More →


Friday November 17, 2017 9:00am - 9:40am PST
Functional

9:50am PST

Apache SystemML: State of the Project and Future Plans
Apache SystemML is a system and language that supports rapid development of custom machine learning algorithms for large scale problems. SystemML allows data scientists to write code once in terms of high-level linear algebra operations, then automatically generate low-level parallel versions of the program that are tuned to the characteristics of the data and different parallel execution frameworks. The system consists of two major components: An optimizer that automatically parallelizes high-level code; and a runtime that evaluates the resulting execution plans at scale on Apache Hadoop, on Apache Spark, on large multi-core systems, and, more recently, on GPUs. This talk will start by describing the history of the project. I'll explain how the original research team from IBM advanced the state of the art in automatic parallelization and scalable linear algebra to build the optimizer and runtime, and how we turned the resulting research code into Apache SystemML. I'll describe how Apache SystemML has been used to implement state-of-the-art algorithms in the field. Finally, I'll talk about recent work on enhancing the system with compressed linear algebra, automatic generation of custom linear algebra kernels, and support for deep learning.

Speakers
avatar for Fred Reiss

Fred Reiss

Chief Architect, IBM Spark Technology Center


Friday November 17, 2017 9:50am - 10:30am PST
Data

9:50am PST

Adjunctions in Everyday Life
This talk introduces adjunctions, a category theory concept underlying and unifying monads, products, coproducts, algebraic data types, and folds. All monads have adjoint functors underlying them, as do products, coproducts, algebraic data types, and folds. We'll see that adjoint functors really do arise everywhere. Adjunctions are really about finding efficient solutions, which is something close to the hearts of software developers. When we ask either "what is the most efficient solution to this problem?" or "what is the most general problem that this solves?" we are really looking for an adjunction. The talk will give a crash course on categories and functors, followed by lots of examples of adjunctions, a particular kind of relationship between functors that arises virtually everywhere. Then we'll discuss some practical implications. The idea of adjunctions gives us an abstract and precise concept we can leverage when designing and using libraries.

Speakers
avatar for Rúnar Bjarnason

Rúnar Bjarnason

Cofounder, Unison
My name is Rúnar. I’m a software engineer in Boston, an author of a book, Functional Programming in Scala, and cofounder of Unison Computing. We're making a distributed programming language called Unison. Talk to me about functional programming, relational database theory, compilers... Read More →


Friday November 17, 2017 9:50am - 10:30am PST
Functional

9:50am PST

Streamlio: Towards a Unified End-to-end Real Time Solution

Handling real-time data has become a critical capability for data-driven organizations. However, today’s reality is a disconnected patchwork of incomplete technologies that make it a struggle to deliver real-time solutions because of frustrating complexity, inefficiency, and incompleteness. In this talk, we address these challenges with a unified solution for real-time data. An end-to-end real time system needs:

• Messaging: receive and distribute streaming data with support for publish-subscribe and queuing scenarios with built-in durability, scalability, and performance using the Apache Pulsar (incubating) messaging solution.

• Processing: process data transformations and analytics with the Heron real-time processing engine, built for performance and scalability.

• Storage: leverage Apache BookKeeper streaming log storage to ensure durability, resiliency, and performance for streaming data.

In our talk, we will provide an overview of the three underlying systems and how they are used as the core of a unified end-to-end real time solution.

Speakers
avatar for Sijie Guo

Sijie Guo

Co-Founder, Streamlio
Sijie Guo is the cofounder of Streamlio, a company focused on building next generation real time processing engines. Streamlio mainly focuses on three open source projects: Apache BookKeeper, Apache Pulsar, and Heron. Previously, he was the tech lead for messaging group... Read More →
avatar for Sanjeev Kulkarni

Sanjeev Kulkarni

CTO, Streamlio
Sanjeev Kulkarni is the co-founder of Streamlio, which focuses on building next generation real time processing engines. Before Streamlio, he was the technical lead for real-time analytics at Twitter where he co-created Twitter Heron. Before that, he was at Locomatix where he handled... Read More →


Friday November 17, 2017 9:50am - 10:30am PST
Reactive

10:40am PST

Does Your Privacy Scale?
The state and nature of privacy is changing, getting more difficult for companies but better for consumers. This is not a Product Management issue. These new regulations are putting restrictions on how Engineers build systems. You hear terms like Privacy by Design from regulators. What does this mean? In this talk, I will survey the overall Privacy trends and dig into specific regulations that impact how systems are designed and built.

Speakers
avatar for Devin Loftis

Devin Loftis

VP Engineering, ValiMail


Friday November 17, 2017 10:40am - 11:00am PST
Data

10:40am PST

Adaptive Scrooge - Adaptive Thrift Decoding
Deserialization of Thrift blobs is an important cost for our realtime data processing jobs. Many of the jobs read only a small subset of the fields but pay the price of deserializing the entire payload. AdaptiveScrooge reduces this cost in both CPU and memory. The basic idea of AdaptiveScrooge is that we should pay a lower price for deserializing data that we don’t use. AdaptiveScrooge relies on the fact that we can simply find out which fields are being accessed by sampling a few events. Based on this we modify the parsing logic to cheaply skip unaccessed fields and thus reduce CPU cost. By not creating objects for the skipped portions we also reduce GC pressure. For workflows where a very small portion of the entire event is accessed this is an order of magnitude cheaper.
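The sample-then-skip idea can be illustrated with a minimal sketch. It assumes a toy length-prefixed encoding rather than the real Thrift wire format, and it is not the AdaptiveScrooge implementation; the names encode, decode, and AdaptiveDecoder are hypothetical.

```python
import struct

HEADER = ">HI"  # toy header: 2-byte field id + 4-byte payload length (assumption)
HEADER_SIZE = struct.calcsize(HEADER)


def encode(record):
    """Encode {field_id: bytes} as repeated (id, length, payload) triples."""
    out = bytearray()
    for field_id, payload in record.items():
        out += struct.pack(HEADER, field_id, len(payload)) + payload
    return bytes(out)


def decode(blob, wanted=None):
    """Decode a blob; when `wanted` is given, skip payloads of other fields."""
    fields, pos = {}, 0
    while pos < len(blob):
        field_id, length = struct.unpack_from(HEADER, blob, pos)
        pos += HEADER_SIZE
        if wanted is None or field_id in wanted:
            fields[field_id] = blob[pos:pos + length]  # materialize the value
        pos += length  # a skipped field costs only an offset bump
    return fields


class AdaptiveDecoder:
    """Fully decode the first `sample_size` events while recording which
    fields the caller reads, then decode only those fields afterwards."""

    def __init__(self, sample_size=3):
        self.sample_size = sample_size
        self.seen = 0
        self.accessed = set()

    def read(self, blob, field_ids):
        sampling = self.seen < self.sample_size
        self.seen += 1
        if sampling:
            self.accessed.update(field_ids)
            fields = decode(blob)
        else:
            if not set(field_ids) <= self.accessed:
                self.accessed.update(field_ids)  # new access pattern: widen
            fields = decode(blob, wanted=self.accessed)
        return {f: fields.get(f) for f in field_ids}
```

In this sketch the caller names the fields it reads; a production decoder would more likely observe accesses through generated accessors, which is outside the scope of this illustration.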

Speakers
avatar for Pankaj Gupta

Pankaj Gupta

Engineering Manager, Twitter


Friday November 17, 2017 10:40am - 11:00am PST
Functional

10:40am PST

Serverless Tensorflow 1.0.0 on AWS. Wait what?
This year TensorFlow 1.0.0 was finally released. I managed to fit it into AWS Lambda and built image recognition with a model trained on ImageNet. Why? At Lambda's prices, one run (recognition of one picture) costs $0.00005. That is, for a dollar you can recognize 20,000 images. It is much cheaper than almost any alternative, yet completely scalable (1000 functions can be run in parallel), and can be easily integrated into cloud infrastructure. If you have a trained model in your pocket, my talk will help you fit it into AWS Lambda.

Update: I've ported TensorFlow 1.4 to AWS Lambda, so I will include content about it and Keras.

Link to github: https://github.com/ryfeus/lambda-packs/tree/master/Tensorflow
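
The cost figure quoted in the abstract can be checked with a short calculation. This is only a sketch of the arithmetic, using the $0.00005 per-run price stated above; the function name runs_per_budget is hypothetical and no current AWS pricing is implied.

```python
from decimal import Decimal

# Per-run price taken from the abstract (an assumption, not current AWS pricing).
COST_PER_RUN_USD = Decimal("0.00005")


def runs_per_budget(budget_usd, cost_per_run=COST_PER_RUN_USD):
    """Return how many whole runs fit into the given budget in USD."""
    return int(Decimal(str(budget_usd)) / cost_per_run)


print(runs_per_budget(1))  # prints 20000
```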

Speakers
avatar for Rustem Feyzkhanov

Rustem Feyzkhanov

-, -
I am a data scientist at Astro Digital and work on creating analytical models on top of satellite imagery. I am also passionate about serverless infrastructure and AI deployment on it. I have ported several packages to AWS Lambda, from TensorFlow/Keras/Sklearn for ML to PhantomJS/Selenium/WRK... Read More →



Friday November 17, 2017 10:40am - 11:00am PST
Reactive

11:10am PST

Futures: Twitter vs Scala
There's been enough confusion around Twitter Futures. Let's clear the air and talk frankly about the historical and technical reasons they exist. We'll see how the differences in API, behavior, and performance not only make Twitter Futures competitive with Scala Futures but also the obvious choice for IO systems with correspondingly high throughput requirements (i.e., Finagle).

Speakers
avatar for Vladimir Kostyukov

Vladimir Kostyukov

Software Engineer, Twitter, Inc
Hacking Finagle @Twitter.


Friday November 17, 2017 11:10am - 11:30am PST
Data

11:10am PST

Auto code generation using AI & Scala Macros
Ever wished for an AI that automatically writes programs for you, given some requirements? With the awesome Scala Macros, this is totally possible. In this talk, we will discuss how to use Scala Macros to convert natural language requirements into Scala programs. There will be a live demo at the end.

Speakers
avatar for Rahul Chitturi

Rahul Chitturi

Principal Software Engineer, Coatue


Friday November 17, 2017 11:10am - 11:30am PST
Functional

11:10am PST

Fearless Multitenancy with Istio
Ever wished you could let your users run their own event processing code on your logging platform? Or give ecommerce customers the ability to run their own analytics backend? Container orchestration and service meshes offer seamless ways to run untrusted code in multitenant platforms. This talk will offer a brief overview of what service meshes can offer, along with a live demonstration of sandboxing untrusted services according to policy at both the container and network level. Leave this talk with newfound confidence in your ability to grant more interesting and powerful opportunities on your own platform without compromising security.

Speakers
avatar for Ben Edwards

Ben Edwards

Tech Lead, Starbucks


Friday November 17, 2017 11:10am - 11:30am PST
Reactive

11:40am PST

Fantastic ML apps and how to build them
Building efficient machine learning applications is not a simple task. The typical engineering process is an iteration of data wrangling, feature generation, model selection, hyperparameter tuning and evaluation. The number of possible variations of input features, algorithms and parameters makes it too complex to perform efficiently, even for experts. Automating this process is especially important when building machine learning applications for thousands of customers. In this talk I demonstrate how we build effective ML models using AutoML capabilities we develop at Salesforce. Our AutoML capabilities include techniques for automatic data processing, feature generation, model selection, hyperparameter tuning and evaluation. I present several of the solutions implemented with Scala and Spark.

Speakers
avatar for Matthew Tovbin

Matthew Tovbin

Principal Engineer, Salesforce Einstein, Salesforce
Matthew Tovbin is a Principal Member of Technical Staff at Salesforce, engineering Salesforce Einstein AI platform, which powers the world’s smartest CRM. Before joining Salesforce, he acted as a Director of Engineering at Badgeville, implementing scalable and highly available real-time... Read More →


Friday November 17, 2017 11:40am - 12:20pm PST
Data

11:40am PST

Functional Programming with Effects
Effects are good, but side-effects are bad. Well, ok. What's the difference? What do we mean when we talk about effects, and what are the rules of the game? In this beginner/intermediate talk we will talk about some familiar effects like `Option` and `List`, build up some more interesting effects of our own, and talk about how to think about programs as "computations" that are values we can manipulate like any other kind of data. Along the way we will do some equational reasoning and talk about applicative and traversable functors, monads, and other intimidating words.

Speakers
avatar for Rob Norris

Rob Norris

Programmer, Gemini Observatory
Software Engineer


Friday November 17, 2017 11:40am - 12:20pm PST
Functional

11:40am PST

Immutable AWS Deployments with Packer and Jenkins
In this session I will talk about Immutable Deployments - which have become almost essential in the world of Microservices. As the frequency of deployments across multiple services increases with increasing granularity, it is critical to have repeatable, predictable and immutable deployments serving our customers. In practice, this is achieved via several DevOps tools. We will use Hashicorp Packer (packer.io) and Jenkins to build a simple, immutable AWS deployment of a hello-world microservice. Familiarity with AWS is recommended but not required for this talk.

Speakers
avatar for Manish Pandit

Manish Pandit

Director of Engineering, Marqeta
Manish is Director of Platform Engineering at Marqeta, a FinTech startup, where he is responsible for the Hybrid Cloud Architecture and Delivery of the Payments Platform. Prior to Marqeta, he has worked at Capital One, Netflix, IGN, E*Trade, and Accenture in various engineering leadership... Read More →



Friday November 17, 2017 11:40am - 12:20pm PST
Reactive

12:00pm PST

Unconference
We'll have an Unconference track where you can propose a talk and others can hear it!  The talk selection will be done by the attendees in the track from the submissions, moderated by Remy, Travis, and Jon.  Please submit talks at

http://chief.sc/unconf2017

Speakers
TB

Travis Brown

Software Engineer at Stripe, previously OSS at Twitter
avatar for Remy DeCausemaker

Remy DeCausemaker

Open Source Program Manager, Twitter
As a Civic Hacker, Hackademic, and Program Manager of Twitter Open Source, @Remy_D builds communities that use their powers for good.
avatar for Jon Pretty

Jon Pretty

Software Engineer, Propensive


Friday November 17, 2017 12:00pm - 5:00pm PST
Unconference

12:20pm PST

Lunch
We serve great lunch every day.

Friday November 17, 2017 12:20pm - 1:10pm PST
Functional

1:10pm PST

VEGAS: The missing Matplotlib for Spark
In this talk, we'll present techniques for visualizing large scale machine learning systems in Spark. These are techniques that are employed by Netflix to understand and refine the machine learning models behind Netflix’s famous recommender systems that are used to personalize the Netflix experience for their 99 million members around the world. Essential to these techniques is Vegas, a new OSS Scala library that aims to be the “missing MatPlotLib” for Spark/Scala. We’ll talk about the design of Vegas and its usage in Scala notebooks to visualize Machine Learning Models.

Speakers
avatar for Roger Menezes

Roger Menezes

Senior Research Engineer, Netflix


Friday November 17, 2017 1:10pm - 1:30pm PST
Data

1:10pm PST

Real World Serverless
Serverless is a hot topic in the software architecture world and also one of the points of contention. Serverless lets us run our code without provisioning or managing servers. We don't have to think about servers at all. Things like elasticity or resilience might no longer be our problem. On the other hand, we have to embrace a slightly different approach to designing our applications. We also have to give up a lot of control we might want and, most importantly, we have to use technology which just might not be ready. In this talk, I’d like to discuss whether it is worth using serverless in our applications, and what the advantages and disadvantages of this approach are. Secondly, I'd like to describe various use cases for which we considered serverless and what the results were. And finally, I’d like to talk about how Scala fits into this. This talk should be interesting for everyone who is considering using serverless or just heard this word somewhere and would like to learn more. The talk is a little bit more focused on AWS but the understanding of the concepts I’m going to talk about should be beneficial even if you prefer a different service provider.

Speakers
avatar for Petr Zapletal

Petr Zapletal

Lead Consultant, Cake Solutions
Petr is a Software Engineer who specialises in the design and implementation of highly scaleable, reactive and resilient distributed systems. He is a functional programming and open source enthusiast and has expertise in the area of big data and machine classification techniques. Petr... Read More →


Friday November 17, 2017 1:10pm - 1:30pm PST
Reactive

1:10pm PST

Scala 2.13
... and beyond! Join me for the latest on our progress towards Scala 2.13.0, which will bring you a faster compiler, a nicer REPL and a completely reworked collections library. Actually, we couldn't wait to deliver the faster compiler, so all improvements that we felt could safely land in the 2.12 series are available there! 2.13.0-RC1 is scheduled for April 2018, so there's still time to share your feedback on the collections rework, or request a few last-minute improvements for the REPL, before we go into feature freeze with 2.13.0-M4 (end of January).

Speakers
avatar for Adriaan Moors

Adriaan Moors

Scala Team Lead, Lightbend, Inc
Scala Team Lead, Lightbend


Friday November 17, 2017 1:10pm - 1:50pm PST
Functional

1:40pm PST

Lawful AI
Modern practical data science, NLP, and AI have almost zero overlap with pure functional programming.  Why would this be a good thing and what can the Scala community do to help?

Speakers
avatar for Adam Pingel

Adam Pingel

Senior Director of Software Engineering, LexisNexis (via Ravel Law)


Friday November 17, 2017 1:40pm - 2:00pm PST
Data

1:40pm PST

Intelligent system optimizations
This talk introduces engineering and machine learning techniques for achieving optimized performance, resilience, availability, cost or other attributes of a large scale distributed system. Firstly, the presentation introduces the topic by discussing the complexities of large scale production system properties as well as important optimization techniques used in state of the art distributed systems, microservice architectures, databases and stream and fast data processing frameworks such as Akka Streams, Spark or Flink. As the next step, a framework using machine learning and artificial intelligence approaches - specifically supervised, reinforcement and meta deep learning - is introduced as a tool for optimization, continuous evolution, learning, and improvement of the specific system and in the specific runtime environment and conditions. The discussion includes details of the problems, data, techniques, code in Keras or TensorFlow, examples and experience with some other modern machine learning and data processing tools. The attendees will gain an understanding of optimization approaches, including some novel machine learning techniques, and ultimately will be able to apply some of the techniques to optimize their own system, whether general distributed systems written in Scala or any other language, systems based on Akka or any of the aforementioned technologies.

Speakers
avatar for Martin Zapletal

Martin Zapletal

CTO, Cake Solutions Inc.
Martin Zapletal is heading up the technical team of Cake Solutions Inc. in New York. Martin specialises in the design and implementation of reactive, scalable, resilient distributed systems, machine learning and working with large amounts of data. He also has background in functional... Read More →


Friday November 17, 2017 1:40pm - 2:00pm PST
Reactive

2:10pm PST

A framework for rapid reporting API development
We’ll briefly talk about the reasons behind our decision to develop a framework to rapidly develop APIs for reporting. We’ll talk about the features for such a framework and how we make use of it. We’ll dive into an example application and talk through configuration, definitions, and how you can also easily embed this in an existing API. We’ll also talk through some of the design decisions and our ability to do resource isolation at the API layer to adhere to SLAs across multiple use cases. We'll also talk about how we analyze the usage of these APIs in order to do continuous optimizations and identify bugs.

Speakers
avatar for Hiral Patel

Hiral Patel

Director of Engineering, Oath
Hiral's been working with Scala for the past 7 years and Big Data for the past 13 years. He's built data platforms, data intensive applications, and real-time analytics frameworks. Hiral is currently a Senior Principal Architect/Director of Engineering at Yahoo/Oath (a Verizon c... Read More →


Friday November 17, 2017 2:10pm - 2:30pm PST
Reactive

2:10pm PST

A Tale of Two Graph Engines on Spark: GraphFrames and TinkerPop OLAP
Graph is on the rise and it's time to start learning about scalable graph analytics! In this session we will go over two Spark-based graph analytics frameworks: TinkerPop and GraphFrames. While both frameworks can express very similar traversals, they have different performance characteristics and APIs. In this deep-dive-by-example presentation, we will demonstrate some common traversals and explain how, at a Spark level, each traversal is actually computed under the hood! Learn both the fluent Gremlin API and the powerful GraphFrame Motif API as we show examples of both side by side. No need to be familiar with graphs or Spark for this presentation as we'll be explaining everything from the ground up!

Speakers

Russell Spitzer

Software Engineer, DataStax
Spark, Cassandra, or Dogs.


Friday November 17, 2017 2:10pm - 2:50pm PST
Data

2:10pm PST

Defusing the configuration time bomb with PureConfig and Refined
Hiding inside your configuration files are latent errors waiting to take down your app in a series of cascading failures. Let’s take a look at the academic paper which made that claim and the authors’ suggestion for how to avoid those problems. Then we'll look at how we can use PureConfig and Refined to automate those safer practices.

Speakers

Leif Wickland

Software, Rubicon Project
In one of his first jobs, Leif wrote software for livestock auction yards. His boss also owned the town newspaper. When the only reporter fell ill, Leif played journalist for a couple days. He published pieces he's glad don't appear online. His second most noteworthy accomplishment...


Friday November 17, 2017 2:10pm - 2:50pm PST
Functional

2:30pm PST

Druid Lookups for High Cardinality Dimensions
Druid is a high-performance, column-oriented, distributed data store. Lookups are a concept in Druid where dimension values are (optionally) replaced with new values. The common use case of query-time lookups is to replace one dimension value (e.g. an ID) with another value (e.g. a human-readable name). This is similar to a star-schema join. Druid has limited support for joins through query-time lookups. Very small lookups (count of keys on the order of a few dozen to a few hundred) can be passed at query time as a "map" lookup as per dimension specs. For large lookups, Druid has an extension called namespaced lookups. Namespaced lookups are appropriate for lookups that cannot be passed at query time due to their size, or are not desired to be passed at query time because the data is to reside in and be handled by the Druid servers. But Druid’s namespaced lookups have the following limitations:
  • They are not suitable for high cardinality dimensions
  • They do not scale to large data on the order of hundreds of millions of rows
  • Namespaced lookup support is limited to one key column with a corresponding value column
  • Real-time updates to the lookup data are not possible
These limitations encouraged us to develop a highly scalable, multi-column, configurable Druid lookup framework that supports real-time updates on lookup data. The framework uses an embeddable persistent key-value data store, Kafka for messaging and HDFS for deep storage.

Speakers

Pavan

Principal Software Development Engineer, Oath


Friday November 17, 2017 2:30pm - 2:50pm PST
Reactive

3:00pm PST

Continuous Delivery Principles for Machine Learning
Real-world software engineering is an iterative process, and one of its main objectives is to get changes of all types (including new features, configuration changes, bug fixes and experiments) into production and into the hands of the users safely, quickly and in a sustainable way. Continuous Delivery (CD), a software engineering discipline, offers a principled approach to exactly this problem. The core idea of CD is to create a repeatable, reliable and incrementally improving process for taking software from concept to the end user. Like software development, building real-world machine learning (ML) algorithms is also an iterative process with a similar objective: how do I get my ML algorithms into production and into the hands of the users in a safe, quick and sustainable way? The current process of building models, testing them and deploying them into production is at best ad hoc in most companies.

At Indix, while building the Google of Products, we have had some good success in applying the best practices of continuous delivery to building our machine learning pipelines using open source tools and frameworks. The talk will not focus on the theory of ML or on choosing the right ML algorithm, but specifically on the last-mile problem of taking models to production and the lessons learned while applying the concept of CD to ML. Here are some of the key questions that the talk will try to answer.

  1. ML Models Repository as analogous to Software Artifacts Repository - Similar to a software repository, what are the features of a Models Repository to aid traceability and reproducibility? Specifically, how do you manage models end to end - managing model metadata, visualization and lineage etc? 
  2. ML Pipelines to orchestrate and visualize the end to end flow - A typical ML workflow has multiple stages. How do you model your entire workflow as a pipeline (similar to Build Pipeline in CD) to automate the entire process and help visualize the entire end to end flow? 
  3. Model Quality Assurance - What quality gates and evaluation metrics, either manual or automated, should be used before exporting (promoting) models for serving in production? What happens when several different models are in play? How do you measure the models individually and then also in combination?
  4. Serving Models in Production - How do you serve and scale these models in production? What happens when these models are heterogeneous (built using different languages - Scala, Python etc.)?
  5. Regression Testing of Models - When exporting a new model, what's the best way to compare the performance of the newer model to the one already deployed on real-world (production) data?
  6. Maintenance and Monitoring of Models in production - Deploying models to production is only half the job done. How do you measure the performance of your model while it's running in production?

Speakers

Rajesh Muppalla

Co-Founder & Senior Director of Engineering, Indix
Rajesh Muppalla is a co-founder and Senior Director of Engineering at Indix, where he leads the data platform team that is responsible for collecting, organizing and structuring all the product related data collected from the web.


Friday November 17, 2017 3:00pm - 3:40pm PST
Data

3:00pm PST

FS2 Internals: Performance
Functional Streams for Scala 0.10 was recently released, featuring a streamlined API, major performance improvements, and first-class Cats integration. In this talk, we’ll explore the techniques used to make FS2 0.10 fast, including a unified stream algebra and partial evaluation via staged computation.

Speakers

Michael Pilquist

Distinguished Engineer, Comcast
Michael Pilquist is the author of Scodec, a suite of open source Scala libraries for working with binary data, and Simulacrum, a library that simplifies working with type classes. He is a committer/maintainer on a number of other projects in the Scala ecosystem, including Cats and...


Friday November 17, 2017 3:00pm - 3:40pm PST
Functional

3:00pm PST

Build a Modern, End-to-End, Fully Open Source, Big Data Scala Reference Application
Developing an end-to-end data pipeline can be difficult, especially when it comes to integrating multiple Big Data services. Current end-to-end reference applications are extremely outdated or not Scala-focused. We will walk through a modern real-time streaming application which serves as a reference framework for Scala developers wanting to develop a big data pipeline. We review code, best practices and considerations involved when integrating different components into a complete data platform (think Kafka, Spark, Storm, NiFi, Avro serialization, HBase, etc.). From IoT data collection from the edge, to flow management, real-time stream processing and analytics, through to machine learning and prediction, this reference application aims to help developers seed their own open source solutions – fast. Developers looking for reference code or documentation for integrating platform-agnostic tools will walk away with an actively supported resource. This talk hopes to empower developers to leverage open source efforts, and not be discouraged by the sometimes overwhelming undertaking that is trying out Big Data projects. We may not be able to live code or go over developing an *entire* pipeline, but the things we don't cover will be up in a repo, 100% working and documented, for people to check out. For Scala developers that may not necessarily have much experience with things Hadoop, I can also make available some kick-start tutorials and documentation that I have written while at Hortonworks.

Speakers

Edgar Orendain

Software Engineer, Hortonworks


Friday November 17, 2017 3:00pm - 3:40pm PST
Reactive

4:00pm PST

Fireworks - lighting up the sky with millions of Sparks
The Salesforce Einstein platform is used by internal developers to create predictive applications for Salesforce customers. The platform uses Spark as its data processing engine, and runs a very large number of data flows with very large variance in size and complexity. Both large and complex flows, such as running modeling for a customer who needs tens of millions of entities scored, and small time-sensitive flows, such as incrementally processing and scoring object changes for a customer with only thousands of entities in total, must be supported. These diverse and complex flows arise for every application added to Salesforce, and the platform handles many applications, magnifying the importance of appropriate scaling and time sensitivity. In this talk, I'll present how we handle that large amount of diversity in data flows while keeping cost to serve to a minimum. I will detail where and how we chose to leverage open source and where we decided it was important to implement our own solutions.

Speakers

Thomas Gerber

Director of Engineering, Salesforce


Friday November 17, 2017 4:00pm - 4:40pm PST
Data

4:00pm PST

Functional Linear Algebra in Scala
It is curious that a bunch of Linear Algebra implementations are written as if it is Fortran all the way down. We can do better, and a functional implementation, with less mutability and more abstractions, may save effort, space, and processor time. I will show pieces of easy-to-read code that work with vectors and matrices in a functional way. As an example, I will show an efficient and practical implementation of PCA.

Speakers

Vlad

contributor, Patryshev
Software developer with experience in categories and toposes. Teaching logic and formal methods at Santa Clara University. Working as a data engineer at Salesforce.


Friday November 17, 2017 4:00pm - 4:40pm PST
Functional

4:00pm PST

Building a high-performance Future
Asynchronous programming is a popular choice for high-scale systems, but it often comes with a hidden performance cost. This talk will present a study of the performance characteristics of the asynchronous programming solutions on the JVM and the benefits that their abstractions provide to the user. Based on this study, the talk will present a detailed report of the techniques and tools used to build the new trane.io library, a high-performance Future implementation, and how they were recently applied to optimize Twitter's Future. It will include an overview of high-level and low-level profiling tools, JIT optimizations, and micro-benchmarking.

Speakers

Flavio Brasil

Software Engineer, Twitter


Friday November 17, 2017 4:00pm - 4:40pm PST
Reactive

5:00pm PST

Functional Programming for Machine Learning
Functional Programming is the next frontier in Machine Learning.  We see a tremendous uptick in the use of Scala, Haskell, Rust, and other FP languages for ML/AI, and we are starting to learn best practices.  How does FP help scale ML?  What is it about the common proximity to math and physics that so many folks in the space share?  How does FP help in designing, building, and deploying AI, whatever it will become?  These are the topics of this panel, moderated by Vitaly Gordon, VP Data Engineering and Data Science, Salesforce Einstein, himself a hands-on FP+ML developer and speaker at both Scala and AI By the Bay.

Moderators

Vitaly Gordon

VP of Engineering and Data Science, Salesforce Einstein
VP of Engineering and Data Science at Salesforce Einstein

Speakers

David Andrzejewski

Engineering, Sumo Logic
David Andrzejewski is a Senior Engineering Manager at Sumo Logic, where he works on applying statistical modeling and analysis techniques to machine data such as logs and metrics. He also co-organizes the SF Bay Area Machine Learning meetup group. David holds a PhD in Computer Sciences...

Oscar Boykin

Machine Learning Infrastructure, Stripe
Oscar is the creator of Scalding, Summingbird, and Algebird, and is an overall professor and mathematician turned software magician.

Justin Coffey

Engineer, Criteo
Justin Coffey is an engineering director in the SRE department of Criteo where he has led efforts in building out much of Criteo's data processing platform. In past lives he has built ecommerce, emailing and real estate platforms. He got his start in the industry way back in 1996...

Ajeet Grewal

Sr. Manager, Twitter

Leah McGuire

Principal Member of Technical Staff, Salesforce
Leah McGuire is a Principal Member of Technical Staff at Salesforce, working on automating as many of the steps involved in machine learning as possible. Before joining Salesforce, Leah was a Senior Data Scientist on the data products team at LinkedIn. She completed a PhD and a Postdoctoral...

Chris McKinlay

Director of Engineering, Takt
Machine learning in Haskell!


Friday November 17, 2017 5:00pm - 6:00pm PST
Functional

6:00pm PST

Happy Hour
We have a Happy Hour every day.  We're happy people!

Friday November 17, 2017 6:00pm - 8:00pm PST
Functional

6:30pm PST

Entertainment
An awesome chance to catch an amazing Accordion performance!

Friday November 17, 2017 6:30pm - 7:00pm PST
Functional

7:00pm PST

Fireside Chat with Evan Weaver and Boaz Avital: Storage, Then and Now
In this fireside chat, Evan Weaver, the original storage lead at Twitter and cofounder of FaunaDB based on his experience there, talks with Boaz Avital, the current head of Twitter's storage team, about their systems, then and now.

Moderators

Alexy Khrabrov

Program Chair, Reactive Summit

Speakers

Boaz Avital

Tech Lead of Storage at Twitter, Twitter
Head of Storage Systems, Twitter

Evan Weaver

CEO, Fauna
Evan is cofounder and CEO of Fauna, makers of FaunaDB, the first adaptive operational database. FaunaDB was inspired by Evan's experience leading the team that scaled Twitter, where he was Director of Infrastructure and employee #15.


Friday November 17, 2017 7:00pm - 7:30pm PST
Functional
 
Saturday, November 18
 

8:00am PST

Breakfast + Great Coffee
We serve full breakfast every day in the morning, and excellent coffee all day, every day.

Saturday November 18, 2017 8:00am - 9:00am PST
Functional

9:00am PST

Apache Flink and the Next Wave of Stream Processing Applications
Over the last few years, data stream processing has redefined how many of us build data pipelines. Apache Flink is one of the systems at the forefront of that development: with its versatile APIs (event-time streaming, Stream SQL, events/state) and powerful execution model, Flink has been part of redefining what stream processing can do. By now, Apache Flink powers some of the largest pipelines in open source data stream processing.
Recently, stream processing has started to also turn the space of event-driven applications inside out: We see more and more developers building applications and microservices directly on data streams, making use of sophisticated stream processors as the foundation and blurring the boundary between streaming analytics and applications.
We will discuss the key concepts behind Apache Flink's approach to stream processing and how it is a powerful abstraction for stateful event-driven applications. We will present how the Flink community sees this next wave of streaming applications changing some of the core principles of how we build applications today.

Speakers

Stephan Ewen

Stephan Ewen is a PMC member and one of the original creators of Apache Flink, and co-founder and CTO of data Artisans. He holds a Ph.D. from the Berlin University of Technology.


Saturday November 18, 2017 9:00am - 9:40am PST
Functional

9:50am PST

Democratizing data with an internal data pipeline platform
This talk is about why and how we built our internal data pipeline platform.

At Indix we have data in different formats - HTML pages, Thrift records, Avro records and the usual culprits - CSVs and other plain text formats. We have data in TBs and in a few KBs, data consisting of billions of records and data consisting of a few hundred rows. And all this data - in one form or another - is consumed by the engineers, the product managers, the customer success team and even our CEO.

Our biggest challenge was knowing which data exists and where, and how to access it efficiently while balancing costs and the productivity of the people involved. We had to make do with ad hoc Scalding jobs. There was no single place where people could discover the different "datasets" that we had, what format they were in, where they were stored and how frequently a new version was published. Running jobs was also not straightforward, since things like finding a cluster to use were not trivial. In order to democratize access to data and make it easy for anyone within the organization to work and play with the data we had, we went about building a data pipeline platform for our internal users.

Leveraging the power of Spark, the platform allows users to define datasets (along with their schema) and create pipelines to work with the datasets. The pipelines can be configured via a wizard-based UI or a JSON config, and all the jobs run on dedicated and auto-scaled Spark clusters. Predefined transformations to filter, project, sample and even type in SQL queries have made it powerful but simple to use for any type of user. Support for S3, SFTP and even Google Sheets has made it usable for different internal and customer use cases. The platform also enables us to load the same data and perform similar operations on it via notebooks with just a couple of lines of client code. Today we run over 300 pipelines across over 100 datasets and thousands of versions of the datasets using this platform.

The data pipeline platform has truly changed the way we ingest, manipulate, analyze and egress data across the organization, and is on course to be converted into a self-serve platform for our (external) customers too.

Speakers

Manoj Mahalingam

Principal Engineer, Indix


Saturday November 18, 2017 9:50am - 10:30am PST
Data

9:50am PST

Functional Patterns and Gems in Scala
Scala has evolved from being "a more concise, more strongly typed, Java" to being a deeply powerful language, capable of utilizing instruments of purely functional programming, with the help of richly featureful open-source libraries. A random walk through one of these open-source projects, whether by newcomers to the Scala language or by those with a theoretical background in Category Theory, is sure to be met with both astonishment and confusion. In this talk we will delve into some FP patterns and little-known gems that are available to both simplify and enrich our lives as we go about our day-to-day routine of building software.

Speakers

Ryan Delucchi

Principal Engineer, Infrastructure, Verizon Labs
Specializes in building scalable web services and application back-ends using composable streaming, computation, algebraic data types and pure functional programming.


Saturday November 18, 2017 9:50am - 10:30am PST
Functional

9:50am PST

Nelson: Rigorous Deployment for a Functional World

Functional programming finds its roots in mathematics - the pursuit of purity and completeness. We functional programmers look to formalize system behaviors in an algebraic and total manner. Despite this, when it comes time to deploy one's beautiful monadic ivory towers to production, most organizations cast caution to the wind and use a myriad of bash scripts and sticky tape to get the job done. In this talk, the speaker will introduce you to Nelson, an open-source project from Verizon that looks to provide rigor to your large distributed system, whilst offering best-in-class security, runtime traffic shifting and a fully immutable approach to application lifecycle. Nelson itself is entirely composed of free algebras and coproducts, and the speaker will show not only how this has enabled development, but also how it provided a frame with which to reason about solutions to fundamental operational problems.

 


Speakers

Tim Perrett

Head of Infrastructure Engineering, Verizon
Avid functional programmer, experienced distributed systems engineer and published author. Primarily interested in schedulers, datacenter design, low-latency data access and the application of functional paradigms in large enterprise applications.


Saturday November 18, 2017 9:50am - 10:30am PST
Reactive

10:40am PST

Deep distributed decision trees on Apache Spark
Deep distributed decision trees and tree ensembles have grown in importance due to the need to model increasingly large datasets. We present Yggdrasil, a new tree learning method implemented in Scala on Apache Spark which scales favorably as data dimensionality and tree depth grows. By partitioning the dataset by columns rather than rows, training directly on compressed data, and minimizing communication using sparse bitvectors, Yggdrasil outperforms existing distributed tree learning algorithms by up to 24x. On a high-dimensional dataset at Yahoo, Yggdrasil is shown to be faster by up to an order of magnitude.

Speakers

Feynman Liang

Director of Engineering, Gigster
Feynman is the engineering manager at Gigster and a statistics PhD student at UC Berkeley. His research lies at the intersection between industry and academia, focusing on distributed machine learning and practical systems for deploying machine learning in production. He is a contributor...


Saturday November 18, 2017 10:40am - 11:00am PST
Data

10:40am PST

Index your State for Safer Functional APIs
What if there was a way to enforce the order in which people can call into your API? Most folks are familiar with or have at least heard of the State monad, but here I will show how an IndexedState is even more powerful and lets you craft APIs with even more compile-time safety and constraints.

Speakers

Vincent Marquez

Lead Software Engineer, YTel


Saturday November 18, 2017 10:40am - 11:00am PST
Functional

10:40am PST

Building systems on top of Kubernetes
With the rise of microservices and IaaS, interest in container orchestration systems - like Kubernetes, Mesos, and Docker Swarm - has increased significantly. While many engineers will use these systems as a target to deploy their service to, they also expose an API that can be integrated into an application, allowing the application to interact with the orchestrator. In this talk, we will create a simple reverse proxy in Scala that can act as a Kubernetes Ingress Controller, similar to Traefik and linkerd. We are going to use some features from Kubernetes such as its event streams, ingresses, and services. This will provide a good opportunity to show how an engineer can use the Kubernetes API to build systems that need to be both dynamic and responsive to what is happening in the cluster.

Speakers

Alexandre Bergeron

Software Engineer


Saturday November 18, 2017 10:40am - 11:00am PST
Reactive

11:10am PST

End-to-End Computation on the GPU with a GPU Data Frame
A revolution is occurring across the GPU software stack, driven by the disruptive performance gains GPUs have seen generation after generation. The modern field of deep learning would not have been possible without GPUs, and as a database we are often seeing two-or-more orders of magnitude performance gains compared to CPU systems - but for all of the innovation occurring in the GPU software ecosystem, the systems and platforms themselves still remain isolated from each other. Even though the individual components are seeing significant acceleration from running on the GPU, they must intercommunicate over the relatively thin straw of PCIe and then through CPU memory. In this session, Todd Mostak will make a case for the open source community to enable efficient intra-GPU communication between different processes running on the GPUs. He will discuss (and provide examples of) how this integration will allow developers to build new functions to cluster or perform analysis on queries, and will make seamless workflows that combine data processing, machine learning (ML), and visualization possible without ever needing to leave the GPU.

Speakers

Todd Sundsted

CTO, SumAll
Todd Sundsted is a hands-on technical leader with 25 years of professional experience covering all aspects of software development, machine learning and engineering. He currently serves as the CTO of SumAll, an award winning analytics and business intelligence tool used by brands...


Saturday November 18, 2017 11:10am - 11:30am PST
Data

11:10am PST

Playing with Shapeless
Play-Formless is a small library for Play Framework that automatically generates type-safe Form Mappings for case classes. We use Shapeless’s LabelledGeneric representation to abstract over the structure of a case class; and Shapeless record types to specify mappings in a type-checked way. Some additional features such as type-driven default field mappings, the use of refinement types, and type-safe named parameters will be demonstrated. More broadly, we want to show how Shapeless can be a practical tool in real-world applications.

Speakers

Thomas Kim

Principal Engineer, Iterable
Tom is a senior engineer at Iterable where he focuses on data engineering with Elasticsearch. He's a long-time Scala and FP enthusiast. He was previously at Workday and Salesforce. He's lived in San Francisco since 2001. Being a bandwagon Warriors fan makes his wife laugh.

Charles Ruhland

Software Engineer, Iterable
Charles is a software engineer at Iterable, focusing on scaling their backend architecture with Elasticsearch, RabbitMQ, and Postgres. Previously he worked at Mesosphere on Cosmos, the open-source package management API for DC/OS. He became hooked on functional programming while getting...



Saturday November 18, 2017 11:10am - 11:30am PST
Functional

11:10am PST

Just enough DevOps for data scientists.
In this talk, I've distilled the best practices we've learned while building the Spark & microservice architecture for Salesforce Einstein.  You'll come away from this talk with a peek into the secret life of SREs.

Speakers

Anya Bida

Senior Member of Technical Staff, Salesforce
Anya loves her position as Senior Member of Technical Staff (SRE) at Salesforce. She's also a co-organizer of the SF Big Analytics meetup group, and is always looking for ways to make platforms more scalable / cost efficient / secure. Before Salesforce, Anya enjoyed contributing...


Saturday November 18, 2017 11:10am - 11:30am PST
Reactive

11:40am PST

Stream All The Things!
While stream processing is now popular, streaming architectures must be highly reliable and scalable as never before, more like microservice architectures. Using specific use cases, I'll define the requirements for streaming systems and how they are met by popular tools like Kafka, Spark, Flink, and Akka. I'll argue that streaming and microservice architectures are actually converging.

Speakers

Dean Wampler

VP of Rocket Surgery, Lightbend
Dean Wampler, Ph.D., is the VP of Fast Data Engineering at Lightbend. He leads the development of Lightbend Fast Data Platform, a distribution of scalable, distributed stream processing tools including Spark, Flink, Kafka, and Akka, with machine learning and management tools. Dean...


Saturday November 18, 2017 11:40am - 12:20pm PST
Data

11:40am PST

Declarative concurrent programming with Chymyst
Chymyst is a new open-source framework for industry-strength declarative concurrent programming in Scala. Chymyst implements the Abstract Chemical Machine (a.k.a. Join Calculus) concurrency paradigm, which radically improves upon the well-known Actor model by making actors type-safe, stateless, and automatically managed. I show concise and fully declarative Chymyst solutions for classic concurrency problems such as the "dining philosophers" or recursive "fork/join". Chymyst is in active development; next steps on the roadmap include providing comprehensive industry-friendly features such as APIs for unit testing, performance monitoring, and fault tolerance.

Speakers

Sergei Winitzki

Senior Software Engineer, Workday Inc.
Theoretical physicist turned software engineer, passionate for functional programming, functional type theory, and declarative domain-specific languages.


Saturday November 18, 2017 11:40am - 12:20pm PST
Functional

11:40am PST

Building model testing infrastructure and scaling AI predictions with AWS Lambda
AI model training and validation is just a small part of the story. After the model is built, the real work begins. To serve millions of customers seamlessly, every application must scale. In this talk I will showcase how customers can build a multi-armed bandit strategy to test their models at scale and also scale their prediction pipeline using AWS Lambda.

Speakers

Sunil Mallya

Deep Learning, Amazon Web Services


Saturday November 18, 2017 11:40am - 12:20pm PST
Reactive

12:20pm PST

Lunch
We serve great lunch every day.

Saturday November 18, 2017 12:20pm - 1:10pm PST
Functional

1:10pm PST

Strato: Twitter’s Virtual Database Powered by Microservices
Developing software against a large collection of services is often harder than it needs to be. Inconsistencies between service interfaces mean each type of data is queried in a slightly different way, making abstraction difficult and leading to boilerplate. By enforcing a consistent data and access model over these heterogeneous microservices we can simplify feature development. Strato exposes data from other services according to a unified data model and a single logical interface, enabling generic infrastructure for automatically generated GraphQL, REST, and Thrift APIs; drop-in caching; simplified access control; deploy-free updates; and more! Come see how we simplify life for data owners and data consumers alike!


Speakers

Michael Solomon

Software Engineer, Twitter
Mike Solomon is a software engineer on Twitter's Strato team where he uses Scala to generate uniform GraphQL, REST, and Scala APIs, and tries to make building new API services unnecessary. In his spare time he makes an audio-based choose-your-own adventure mobile game called Road Trip...


Saturday November 18, 2017 1:10pm - 1:30pm PST
Data

1:10pm PST

Error Handling Without Throwing Exceptions
In a microservice environment, it is painful to handle exceptions thrown by external services. We designed a mechanism for type-safe error handling using the EitherT monad transformer to stack Future and Either. Calls to external services are implemented in a stack of EitherT for fail-fast error handling. Errors are separated for each logical external service. An error hierarchy is defined to expose different layers of the system. The location and reason of an error are provided for debugging. In addition, we validate the input to each external service call with Validated to accumulate errors. Validation rules are tested by property-based testing with ScalaCheck. We implemented both fail-fast and accumulated error handling with Cats in the payment service for shopbot.

Speakers

Haeley Yao

Software Engineer, eBay
Haeley is a software engineer at eBay who works on building a human assistant shopping bot and a shopping app targeted at Chinese customers, using Scala, Cats and a microservice-based architecture. Haeley is interested in applying pure FP in large enterprise applications.


Saturday November 18, 2017 1:10pm - 1:30pm PST
Functional

1:10pm PST

How to Pick Your Next Streaming Architecture
Streaming is the major trend in the evolution of Big Data technologies and it is becoming a necessary part of systems designed to handle a high volume of events. In this talk, we begin with the characteristics of Stream Processing, then discuss the dominant architectures for Streaming systems, and continue into examples of technologies and application solutions for streaming applications. We compare the important technologies, such as Akka Streams, Storm, Kafka Streaming, Spark, Flink, and Beam. We explore Quality Attributes of such systems as a mechanism for making architectural decisions and then discuss an approach for choosing a streaming system.

Outline:
  • Stream Processing: What is it?
  • Architectures for Stream Processing
  • Akka, Streaming, and Backpressure
  • Apache Storm: The Dedicated Stream system
  • Kafka Streaming: Taking Streaming Mainstream
  • Apache Spark Streaming: A versatile in-memory batch/streaming system
  • Apache Flink: Novel integration of batch and streaming
  • Apache Beam: The common API layer
  • Architectural Scenarios for Streaming
  • Choosing your Streaming System (Without Remorse!)

Speakers

Vladimir Bacvanski

Principal Architect, Strategic Architecture, PayPal
Dr. Vladimir Bacvanski's interest is in better and more productive ways to develop highly scalable and reliable software systems. Before joining PayPal, he was the CTO and founder of SciSpike, a company doing custom development and consulting. His recent projects include Big Data...


Saturday November 18, 2017 1:10pm - 1:30pm PST
Reactive

1:40pm PST

Magnolia: Generic Derivation 2.0


In the last few years, typeclasses have become an increasingly popular tool for solving a wide variety of problems Scala developers encounter every day. But while typeclasses can be composed in entirely predictable ways from smaller primitive types to support larger abstract datatypes, there's no support offered from the language to do this.

For a while, Shapeless has been an admirable enabler, leveraging implicit search and a couple of macros to provide automatic derivations, but its approach to the problem is the cause of very slow compile times; the definitions needed to derive a typeclass are often verbose, type-heavy and complex; and, when derivation fails, the user gets no debugging feedback.

Magnolia basically fixes all of these problems.
The talk will explore how Impromptu is implemented, and show how dependent types allow the framework to be written in just 30 lines of code. I will then demonstrate how a similar approach may be used to concisely implement typed actors.

Furthermore, we take advantage of current research into implicit functions in Dotty to remove the last remaining boilerplate from Impromptu's API.

Speakers

Jon Pretty

Software Engineer, Propensive


Saturday November 18, 2017 1:40pm - 2:00pm PST
Data

1:40pm PST

Fixing the rusty core of the internet with functional programming.
The security of the internet is based on crusty old libraries with poor test coverage and code which cannot be reasoned about. Types bring more safety to our code and functional programming allows us to translate equations directly, so why are domains which are critical to the safety of the internet not using either? The idea that old code is secure because it's been around for a while is not true, and every day new vulnerabilities are found. Cryptography, for example, is just math, which is easily translated using functional programming techniques. Using these techniques we can make the code easier to read, test and prove, safer to use, and easier to debug. In this talk we'll dive into some best practices for providing maximal safety and show some of the common cryptographic primitives written this way. We'll discuss why it makes so much sense to rewrite critical pieces of Internet infrastructure using these techniques. Finally we'll talk about tradeoffs in speed, security and readability.

Speakers

Colt Frederickson

Chief Types Officer, IronCore Labs


Saturday November 18, 2017 1:40pm - 2:00pm PST
Functional

1:40pm PST

Managing Nation-Wide Traffic Cameras and Sensors
Belgium is deploying a large network of traffic cameras and sensors. This nation-wide infrastructure aims at implementing smart city initiatives, better controlling air pollution, responding faster to emergency situations, and increasing security on the roads by better enforcing traffic law. This presentation describes the big data architecture put in place to collect and process, in near real time, the data generated by thousands of cameras and sensors, and how this architecture is implemented using Apache Kafka and the Play! and Akka Scala frameworks. It discusses the main challenges met during the project and the approaches selected to overcome them and ensure the reliability, scalability, and robustness of the overall infrastructure.

Speakers

David Massart

Solution Architect, D.E.Solution



Saturday November 18, 2017 1:40pm - 2:00pm PST
Reactive

2:10pm PST

Complex Machine Learning Pipelines Made Easy

What if you had to build more machine learnt models than there are data scientists in the world? At enterprise companies like Salesforce, customer data comes in vastly different shapes and forms, making it impossible to build one catch-all model even when focusing on a single problem. Instead, it becomes necessary to build thousands of personalized, per-customer models for any single data-driven application.  At Salesforce, we have built solutions to these problems into a project called Optimus Prime which we are using to develop robust, production-quality machine learning applications much more quickly than using Spark alone. 

In this talk, we will demonstrate two applications of this platform. The first is AutoML which enables building simple yet powerful models for any use case even without having any background in data science. We will describe the underlying challenges of automating machine learning ranging from the user interface to data extraction and model building, touching more deeply on how we automate feature selection and model selection. The result is a system where users only need domain expertise to build production-ready machine learning applications.

The second demonstration will be of a data product more finely tuned to a specific application. We will demonstrate a product currently in development, Case Classification: automatic classification of service cases. This application is built not only to train and predict on each customer’s individual data, but also to scale the ML pipeline dynamically to accommodate any number of prediction fields; it supports multi-tenant, multi-label, multi-model, multi-class predictions. We’ll contrast our implementation using Optimus Prime against one in pure Spark and then show the resulting pipeline performance on real customer data.


Speakers

Till Bergmann

Sr. Data Scientist, Salesforce

Chris Rupley

Sr. Data Scientist, Salesforce


Saturday November 18, 2017 2:10pm - 2:50pm PST
Data

2:10pm PST

Analyzing Functional Programs
Techniques like the Free Monad and Tagless Final allow us to build algebras that abstract Monad choices from our business logic. This abstraction allows unit tests to be run in a simple synchronous Monad while production code can still take advantage of complex asynchronous Monads. What else can we do with this abstracted business logic? In this talk, I'll explore the use of the Gen Monad from ScalaCheck to produce random walks over our business logic consisting of the steps taken in our abstract algebras.

Speakers

David Cleaver

Senior Principal Engineer, Comcast
Dave Cleaver is a Senior Principal Engineer at Comcast designing and implementing scalable Web Services and Platforms. He has spent the last two years developing and championing solutions in Scala. His interests include AI planning, distributed systems, programming languages, and...


Saturday November 18, 2017 2:10pm - 2:50pm PST
Functional

2:10pm PST

Live coding server-less, single codebase iOS, Android and Web Apps using ScalaJS, React-Native and GraphQL
Scala-JS just hit 1.0.0 and React-native is maturing well. Combining the advantages of both yields the promise of end-to-end FP to generate 3 different apps from a single code base. Add to the mix GraphQL in a server-less configuration and we have a real shot at building a meaningful app in 40 minutes. Come early.

Speakers

Irfan Ahmad

Founder, Magnition
Irfan Ahmad is a full-stack founder of two Silicon Valley startups. He loves new code smell and started developing with Scala back before it was cool. Prior to this, Irfan was at VMware hacking on hypervisors and at Transmeta working on the Crusoe software microprocessor. Irfan...


Saturday November 18, 2017 2:10pm - 2:50pm PST
Reactive

3:00pm PST

Deep Dive: Continuous Delivery for AI Applications with ECS
Deep learning (DL) is a computer science field derived from the Artificial Intelligence discipline. DL systems are usually developed by data scientists, who are good at mathematics and computer science. But to deploy and operationalize these models for broader use, you need the DevOps mindset and tools. In this tech talk, we’ll show you how to connect the workflow between the data scientists and DevOps. We’ll explore basic continuous integration and delivery concepts and how they can be applied to deep learning models. Using a number of AWS services, we will showcase how you can take the output of a deep learning model and deploy it to perform predictions in real time with low latency and high availability. In particular, we will showcase the ease of deploying DL predict functions using Apache MXNet (a deep learning library), Amazon ECS, Amazon S3, Amazon ECR, Amazon developer tools, and AWS CloudFormation.

Speakers

Asif Khan

Containers, Deep learning, Amazon Web Service
Asif Khan is a Cloud Architect with Amazon Web Services. He provides technical guidance, design advice and thought leadership to some of the largest and most successful AWS customers and partners on the planet. His deepest expertise spans application architecture, containers, devops...


Saturday November 18, 2017 3:00pm - 3:40pm PST
Data

3:00pm PST

Building a Tagless Final DSL for WebGL in Scala
In functional programming we very often find ourselves wanting to use some kind of library that doesn’t really expose a functional API. That’s where embedded domain specific languages come to the rescue. Embedded Domain Specific Languages or eDSLs allow us to build a data structure that represents the expressions of the target language. In this talk we’re going to discover the tagless final approach for building DSLs. We will also compare other styles of DSLs like ADTs and Free Monads and have a look at the respective trade-offs. Finally we will build a purely functional DSL for WebGL using Scala.js and create a small but awesome 3D app in the browser.

Speakers

Luka Jacobowitz

Software Engineer, codecentric AG


Saturday November 18, 2017 3:00pm - 3:40pm PST
Functional

3:00pm PST

Declare, verify and execute microservices-based process flows with Baker
At ING we have a microservices-based architecture. It certainly sounds cool, but how can we enable developers to create innovative customer experiences even faster? How can they compose APIs in a functional fashion, reason about their system complexity and change functionality comfortably without breaking it? I'll talk about an open-source Scala library we developed to declare, verify and execute microservices-based orchestration flows. I'll demonstrate how to write a recipe for such a flow with an internal DSL (domain specific language), visualize the recipe as a graph to communicate the steps of the process with business and technical stakeholders, and run the resulting flow within a RESTful API in an event-driven architectural model. I'll also talk about near-future plans to develop an external DSL and host recipes in a serverless architecture.

Speakers

Nikola Kasev

Software Engineer, ING Bank


Saturday November 18, 2017 3:00pm - 3:40pm PST
Reactive

4:00pm PST

Scaling From Research to Production with Skymind DL4J and ScalNet

DeepLearning4J (Deep Learning for Java - DL4J, inception 2013) was specifically designed with Enterprise and Production in mind, as a first-class citizen to the JVM.  Skymind develops and maintains the complete DL4J stack and the abstraction for Scala (ScalNet) with a focal point on scalability and vendor integrations.  

This session will focus on the challenges in migrating a research prototype to a more production-ready system within the JVM. Specifically, it covers migrating/importing a model from an alternative Deep Learning framework based on Python bindings (e.g. Keras via TensorFlow) to DL4J/ScalNet within a distributed environment using Apache Spark.

A walkthrough of a temporal IoT use case modeling an LSTM Network will demonstrate the different phases of a project. It will also cover the different workflow capabilities for crossing the language boundaries.

 


Speakers

Ari Kamlani

AI Technology Strategist and Architect


Saturday November 18, 2017 4:00pm - 4:40pm PST
Data

4:00pm PST

The Futures are Calling and I Must Choose
Scala has a Future type in the standard library, but when we use other libraries we have competing abstractions: Scala Futures, Twitter Futures, Java CompletableFutures, Guava ListenableFutures. Other libraries like Scalaz, Monix, and FS2 bring us Task abstractions, which are similar to Futures but lazily executed. How do all these variants relate to one another? When should we use one over another? How do we interoperate when we have to deal with code that uses a different variant? Starting from the basics of Scala Futures, we'll build up an understanding of how to select the best one for a codebase, how to make them work together, and some of the pitfalls. Along the way we'll even see some other tricks for making these useful abstractions when combined with other effects.

Speakers

Chris Phelps

Principal Software Engineer, Splunk
I've been coding in Java since the early days of the language, and in Scala for the last 4 years. My main areas of focus are in microservices and reactive approaches. As our organization is a polyglot development environment, I'd also love to talk to you about adopting and evangelizing...


Saturday November 18, 2017 4:00pm - 4:40pm PST
Functional

4:00pm PST

Implementing a Lazy Functional Language with Combinatory Logic
Topic Change: Slick Schema Code Generation

In this session, we discuss the use of code generation to remove the pain of writing out Slick database schema when working with or evolving an existing database schema. We'll demo a plugin that, with only a few lines of configuration, enables:
  • generation of tagged id types for primary and foreign keys
  • restricting schema generation to specific tables
  • separation of row and table specifications into distinct compilation units
  • customizable mappings between application and database types

This work arises from real-world experience working with Slick and legacy database schemas. The feelings of safety of having a database schema described in Scala are sometimes dwarfed by the pain of actually having to write and maintain the Scala mappings. This becomes obvious when building a new project with an existing database.

The code generation component is bundled with Slick. However, the documentation and default implementation show only the most trivial use-case and require a more complicated build. With every new project we have updated and expanded on the use-cases the plugin covers. Now, we would like to share what we have built and learned.


Speakers

Stewart Stewart

Software Consultant, Inner Product LLC
Stewart Stewart is a software developer at Driver, a San Francisco based startup that analyzes tumors and connects cancer patients with personalized medicine. He also helps organize events at SF Scala.


Saturday November 18, 2017 4:00pm - 4:40pm PST
Reactive

5:00pm PST

The Fun of Functional Programming
This panel will close the conference by exploring the root cause -- why are we doing all of this, and what about Functional Programming makes it such fun? What drives innovation, rigor, why are we insisting it all be done this way, and what makes us tick?

Moderators

Dick Wall

Scala Community Guy
Dick founded the Bay Area Scala Enthusiasts (BASE), one of the first Scala user groups. Dick is also the first recipient of the Phil Bagwell Memorial Scala Community Award. He is a committer on several Scala open source projects and creator of SubCut - a dependency injection...

Speakers

John A. De Goes

Solution Architect, De Goes Consulting
John A. De Goes has been writing Scala software for more than eight years at multiple companies, and has assembled world-renowned Scala engineering teams, trained new developers in Scala, and developed several successful open source Scala projects. Known for his ability to take very...

Stew O'Connor

Stew O'Connor learned about and quickly became addicted to Scala about 3 years ago, and was drawn right to the pure functional programming endeavors in Scala. He made many contributions to the scalaz library, and is now one of the primary contributors to the Cats library. He works...

Michael Pilquist

Distinguished Engineer, Comcast
Michael Pilquist is the author of Scodec, a suite of open source Scala libraries for working with binary data, and Simulacrum, a library that simplifies working with type classes. He is a committer/maintainer on a number of other projects in the Scala ecosystem, including Cats and...

Julie Pitt

Director, Machine Learning Infrastructure, Netflix
Julie leads the Machine Learning Infrastructure at Netflix, with the goal of scaling Data Science while increasing innovation. She previously built streaming infrastructure behind the "play" button while Netflix was transitioning from domestic DVD-by-mail service to international...

Bill Venners

Principal, Artima
Bill Venners is president of Artima, Inc., publisher of Scala consulting, training, books, and developer tools. He is the lead developer and designer of ScalaTest, an open source testing tool for Scala and Java developers, and Scalactic, a library of utilities related to quality...


Saturday November 18, 2017 5:00pm - 6:00pm PST
Functional
 