Scaling with Apache Spark (or a lesson in unintended consequences)

Location: Salon C
April 19th, 2017
11:30 AM - 12:30 PM

Apache Spark is one of the most popular general-purpose distributed systems of the past few years. Apache Spark has APIs in Scala, Java, and Python, and more recently there have been a few different attempts to provide support for R, C#, and Julia. This talk looks at Apache Spark from a performance and scaling point of view, and at the work we need to do to handle large datasets. In essence, parts of this talk could be considered "the impact of design decisions from years ago and how to work around them." It's not all doom and gloom, though: we will explore the new

Holden Karau

co-author, Learning Spark and High Performance Spark; engineer, Spark Technology Center, IBM