Name: Large Scale Data Analysis with Spark
Start: 2014-08-13T10:30:00-0700
End: 2014-08-13T12:00:00-0700

Large Scale Data Analysis with Spark

Spark is a programming model for large-scale data analysis in parallel that does not require you to focus on the details of distributed computing: the same program you write for one computer will also work across many computers. Spark builds on the MapReduce framework by providing an interactive environment with a more general set of functions for manipulating data efficiently in memory. The result is a highly scalable way to explore large data sets quickly and interactively. This tutorial will give you a general overview of the Spark programming model. There will also be several hands-on exercises, including a few that use Spark's machine learning library, all run in an IPython Notebook with the PySpark API.
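As a rough illustration of the map-then-reduce style of data manipulation described above, here is a minimal sketch in plain Python. It is not taken from the tutorial and does not use Spark itself; the function name word_count and the sample data are hypothetical, and the comments only point to the PySpark RDD operations that play the corresponding role.

```python
from collections import defaultdict
from functools import reduce


def word_count(lines):
    """Count words with a flat-map -> map -> reduce-by-key pipeline."""
    # Flat-map step (PySpark analogue: RDD.flatMap): split lines into words.
    words = (word for line in lines for word in line.split())
    # Map step (PySpark analogue: RDD.map): pair each word with a count of 1.
    pairs = ((word, 1) for word in words)
    # Reduce-by-key step (PySpark analogue: RDD.reduceByKey):
    # group the counts by word, then sum each group.
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return {
        word: reduce(lambda a, b: a + b, counts)
        for word, counts in grouped.items()
    }


# Hypothetical sample input, standing in for a large distributed data set.
sample_lines = ["spark makes big data simple", "big data big insight"]
counts = word_count(sample_lines)
```

In PySpark the same chain of steps would be applied to a distributed collection (for example one created with SparkContext.parallelize), so the program keeps the same shape whether it runs on one machine or many.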

Speakers

Monte Lunacek

University of Colorado Boulder

Wednesday August 13, 2014 10:30am - 12:00pm PDT
Wolf 305

Intermediate HPC

RMACC HPC Symposium
