RMACC HPC Symposium has ended
Wednesday, August 13 • 10:30am - 12:00pm
Large Scale Data Analysis with Spark

Spark is a programming model for large-scale parallel data analysis that hides most of the details of distributed computing: the same program you write for one computer will also run across many computers. Spark builds on the MapReduce framework by providing an interactive environment with a more general set of functions for manipulating data efficiently in memory. The result is a highly scalable way to explore large data sets quickly and interactively. This tutorial gives a general overview of the Spark programming model. It also includes several hands-on exercises, a few of which use Spark's machine learning library, run in an IPython Notebook with the PySpark API.
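
To make the map-and-reduce style described above concrete, here is a minimal plain-Python sketch of a word count. It is not taken from the tutorial materials and does not use Spark itself: the helpers flat_map and reduce_by_key are hypothetical stand-ins written for this illustration, loosely mirroring the shape of the PySpark RDD methods flatMap and reduceByKey, and the sample lines are invented.

```python
from collections import defaultdict
from functools import reduce
from operator import add

# Invented sample data; in Spark this would be a distributed dataset.
lines = ["spark makes big data simple", "big data needs big tools"]


def flat_map(func, items):
    """Apply func to each item and flatten the results.

    Hypothetical helper, similar in spirit to RDD.flatMap.
    """
    return [out for item in items for out in func(item)]


def reduce_by_key(func, pairs):
    """Combine all values that share a key using func.

    Hypothetical helper, similar in spirit to RDD.reduceByKey.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(func, values) for key, values in groups.items()}


# Step 1: split every line into words.
words = flat_map(str.split, lines)

# Step 2: map each word to a (word, 1) pair.
pairs = [(word, 1) for word in words]

# Step 3: sum the ones for each distinct word.
counts = reduce_by_key(add, pairs)

print(counts["big"])
```

In PySpark the same three steps would typically be chained on an RDD, for example with flatMap, map, and reduceByKey starting from a SparkContext's parallelize call. The program text stays the same whether the data lives on one machine or on many, which is the point the abstract makes.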


Monte Lunacek

University of Colorado Boulder

Wednesday August 13, 2014 10:30am - 12:00pm PDT
Wolf 305
