Friday, November 17 • 3:00pm - 3:40pm
Build a Modern, End-to-End, Fully Open Source, Big Data Scala Reference Application

Developing an end-to-end data pipeline can be difficult, especially when it involves integrating multiple Big Data services. Existing end-to-end reference applications are either badly outdated or not Scala-focused. We will walk through a modern real-time streaming application that serves as a reference framework for Scala developers who want to build a big data pipeline. We review the code, best practices, and considerations involved in integrating different components into a complete data platform (think Kafka, Spark, Storm, NiFi, Avro serialization, HBase, etc.). From IoT data collection at the edge, to flow management, real-time stream processing and analytics, through to machine learning and prediction, this reference application aims to help developers seed their own open source solutions quickly. Developers looking for reference code or documentation for integrating platform-agnostic tools will walk away with an actively supported resource.

This talk aims to empower developers to leverage open source efforts and not be discouraged by the sometimes overwhelming task of trying out Big Data projects. We may not be able to live code or cover the development of an *entire* pipeline, but whatever we don't cover will be available in a repo, 100% working and documented, for people to check out. For Scala developers who may not have much experience with the Hadoop ecosystem, I can also make available some kick-start tutorials and documentation that I have written while at Hortonworks.

Edgar Orendain

Software Engineer, Hortonworks

Friday November 17, 2017 3:00pm - 3:40pm PST