Loading…
Thursday, November 16 • 1:10pm - 1:30pm
Avoiding Spark Pitfalls at Scale

Sign up or log in to save this to your schedule, view media, leave feedback and see who's attending!

Feedback form is now closed.
There’s no doubt that Apache Spark is a very powerful tool for scalable data, but beware, forces lurk in the shadows to bring upon your downfall, especially so at scale! In this talk, we’ll talk about some of the challenges and pitfalls encountered when writing data pipelines with Spark and how we’ve learned to deal with them. Our tales will involve battles with memory management, dataset typesafety, lazy versus strict evaluations, and beyond. This talk will use only the Scala API of Spark, but the tips and tricks presented will apply to Spark in general.

Speakers
avatar for Long Cao

Long Cao

Software Engineer, Coatue Management
Long is a software engineer on the data science team at Coatue Management, where he builds scalable data pipelines in Scala and Spark that consume alternative data to provide insight and market signals. He has been based in New York for the last 5 years by way of Texas and obsesses... Read More →


Thursday November 16, 2017 1:10pm - 1:30pm PST
Data