Thursday, November 16 • 1:10pm - 1:30pm
Avoiding Spark Pitfalls at Scale

There’s no doubt that Apache Spark is a very powerful tool for scalable data processing, but beware: forces lurk in the shadows to bring about your downfall, especially at scale! In this talk, we’ll discuss some of the challenges and pitfalls encountered when writing data pipelines with Spark and how we’ve learned to deal with them. Our tales will involve battles with memory management, Dataset type safety, lazy versus strict evaluation, and beyond. This talk will use only the Scala API of Spark, but the tips and tricks presented will apply to Spark in general.

Speakers

Long Cao

Software Engineer, Coatue Management
Long is a software engineer on the data science team at Coatue Management, where he builds scalable data pipelines in Scala and Spark that consume alternative data to provide insight and market signals. He has been based in New York for the last 5 years by way of Texas and obsess...


Thursday, November 16, 2017 1:10pm - 1:30pm
Data
