Saturday, November 18 • 10:40am - 11:00am
Deep distributed decision trees on Apache Spark

Deep distributed decision trees and tree ensembles have grown in importance due to the need to model increasingly large datasets. We present Yggdrasil, a new tree learning method implemented in Scala on Apache Spark that scales favorably as data dimensionality and tree depth grow. By partitioning the dataset by columns rather than rows, training directly on compressed data, and minimizing communication using sparse bitvectors, Yggdrasil outperforms existing distributed tree learning algorithms by up to 24x. On a high-dimensional dataset at Yahoo, Yggdrasil is shown to be faster by up to an order of magnitude.
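
The column-partitioning idea in the abstract can be illustrated with a minimal single-process sketch. This is not Yggdrasil's code: it is plain Python rather than Scala on Spark, it assumes binary 0/1 labels and Gini impurity, and every function name here (`gini`, `best_split_for_column`, `split_bitvector`, `find_split`) is hypothetical. Each inner list in `columns` stands in for a worker that owns one feature column. The workers each propose their best split, the best proposal wins, and the only thing that would need to be shared afterwards is one bit per row. The sketch uses an uncompressed list of bits where the abstract mentions sparse bitvectors.

```python
from typing import List, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split_for_column(values: List[float], labels: List[int]) -> Tuple[float, float]:
    """Best split of the form 'value <= threshold' on a single column.

    Returns (weighted impurity, threshold). A constant column has no
    valid split and yields (inf, nan).
    """
    n = len(values)
    best = (float("inf"), float("nan"))
    # Skip the largest value: using it as a threshold sends every row left.
    for threshold in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[0]:
            best = (score, threshold)
    return best


def split_bitvector(values: List[float], threshold: float) -> List[int]:
    """One bit per row: 1 if the row goes to the left child, else 0."""
    return [1 if v <= threshold else 0 for v in values]


def find_split(columns: List[List[float]], labels: List[int]) -> Tuple[int, float, List[int]]:
    """Pick the best split across column-partitioned data.

    Each entry of `columns` plays the role of one worker's column.
    Only the winning column's bitvector has to reach the other workers.
    """
    candidates = [(best_split_for_column(col, labels), j) for j, col in enumerate(columns)]
    (_, threshold), feature = min(candidates, key=lambda c: c[0][0])
    return feature, threshold, split_bitvector(columns[feature], threshold)


if __name__ == "__main__":
    columns = [[1.0, 2.0, 3.0, 4.0], [5.0, 1.0, 5.0, 1.0]]
    labels = [0, 0, 1, 1]
    print(find_split(columns, labels))
```

In this toy setup the per-node communication is one candidate split per column plus a single bitvector of length equal to the number of rows, independent of how many features there are.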

Speakers
Feynman Liang

Director of Engineering, Gigster
Feynman is the engineering manager at Gigster and a statistics PhD student at UC Berkeley. His research lies at the intersection of industry and academia, focusing on distributed machine learning and practical systems for deploying machine learning in production. He is a contributor...


Saturday November 18, 2017 10:40am - 11:00am PST
Data