Name: Avoiding Spark Pitfalls at Scale
Start: 2017-11-16T13:10:00-0800
End: 2017-11-16T13:30:00-0800

Avoiding Spark Pitfalls at Scale

There’s no doubt that Apache Spark is a very powerful tool for scalable data processing, but beware: forces lurk in the shadows to bring about your downfall, especially at scale! In this talk, we’ll cover some of the challenges and pitfalls encountered when writing data pipelines with Spark and how we’ve learned to deal with them. Our tales will involve battles with memory management, dataset type safety, lazy versus strict evaluation, and beyond. This talk will use only the Scala API of Spark, but the tips and tricks presented will apply to Spark in general.

Speakers

Long Cao

Software Engineer, Coatue Management

Long is a software engineer on the data science team at Coatue Management, where he builds scalable data pipelines in Scala and Spark that consume alternative data to provide insight and market signals. He has been based in New York for the last 5 years by way of Texas and obsesses...

Thursday November 16, 2017 1:10pm - 1:30pm PST
Data

Scale By the Bay 2017
