Spark is a programming model for large-scale parallel data analysis that abstracts away the details of distributed computing: the same program you write for one machine also runs across many machines. Spark builds on the MapReduce framework by providing an interactive environment with a more general set of functions for manipulating data efficiently in memory. The result is a highly scalable way to explore large data sets interactively. This tutorial gives a general overview of the Spark programming model and includes several hands-on exercises, a few of which use Spark's machine learning library, all worked through in an IPython Notebook with the PySpark API.