The ubiquity of mobile devices, social networks, sensors, advanced computing, and instruments has created a complex, data-rich environment ripe for new scientific and engineering advances. In this world of computational and data-enabled science and engineering, a dynamic yet cohesive cyberinfrastructure of technologies, services, and people is fundamental to all aspects of the discovery process. This talk will focus on NSF’s vision, strategy, and support of collaborative cyberinfrastructure.
Topics to be covered include delivering your elevator speech, honing your presentation skills, working with others, establishing credibility, and finding strong mentors. Attendees will have the opportunity to practice their elevator speech; learn why presentation skills are essential for sharing knowledge and expertise, demonstrating leadership, and creating visibility for their contributions; discover how their preferred work style meshes with those of their colleagues; and learn the secret of approaching a potential mentor.
Once you understand the basics of Linux, it's time to learn how to create, remove, and manipulate files and the data within them. We'll cover a range of filesystem-related commands, plus explore utilities for pattern matching, editing, searching, and more.
If you would like to follow along with the examples, please bring a laptop that a) runs Linux or Mac OS X, or b) allows you to log in to a Linux server using ssh.
Peter Ruprecht from CU's Research Computing will again be giving this tutorial.
The workshop will introduce faculty to materials and methods for incorporating computational science into the curriculum. This will include a review of existing educational materials, an introduction to simple tools that can be used to teach modeling and simulation principles, and approaches to integrating those materials into the undergraduate curriculum. The workshop will also provide a guide to educational resources and opportunities available through the XSEDE project, as well as a review of emerging efforts to share technical courses online with the broader community.
The Raspberry Pi platform removes the cost barriers long associated with introducing high performance computing (HPC) education and research into smaller institutions. A Raspberry Pi is a credit-card-sized computer that contains an ARM-based processor and runs a full Linux operating system. With it, one can construct miniature compute clusters and Hadoop clusters, create software environments analogous to those found at major supercomputing centers, and deploy applications that also run on larger Linux systems. In this workshop we will provide detailed information on how to use the Raspberry Pi platform to teach Linux cluster administration, parallel computing, and big data concepts (Hadoop, Pig, …). At the completion of the workshop, participants will be able to download images provided by an effort led by NCAR and deploy their own Raspberry Pi clusters.
Immersive environments have historically required significant investments in both specialized hardware and expert operational staff. The current generation of 3D display hardware and graphics cards has created an environment in which this capability is now available and practical for essentially all users. This tutorial will present an overview of the critical components of an immersive environment: 3D display/projection, 3D glasses, 6DOF head and wand tracking, and representative virtual reality toolkits that very effectively connect the user to 3D renditions of their data. A representative system will be assembled and demonstrated in order to best communicate the capabilities and value of immersive environments as tools for scientific data exploration, discovery, and communication.
In the first session we will discuss the importance of parallel computing to high performance computing. We will show, by example, the basic concepts of parallel computing, and we will discuss the advantages and disadvantages of the parallel approach. We will also present an overview of current and future trends in HPC hardware.
The second session will provide an introduction to MPI, the most common library used to write parallel programs for HPC platforms. As tradition dictates, we will show how to write "Hello World" in MPI. Attendees will be shown how to build and run relatively simple examples on a consortium resource, and will have the opportunity to do so themselves.
The third session will briefly cover other important HPC topics, including OpenMP and hybrid programming, which combines MPI and OpenMP. Some computational libraries available for HPC will be highlighted, and we will briefly discuss parallel computing on graphics processing units (GPUs).
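The MPI "Hello World" follows the SPMD pattern: every process runs the same program and learns its own rank and the total process count. As a local sketch only, the same idea can be shown with Python's standard multiprocessing module in place of MPI (the function and rank/size names here mirror MPI_Comm_rank and MPI_Comm_size but are illustrative, not MPI calls):

```python
# Stand-in for the MPI "Hello World": each worker reports its rank,
# mirroring MPI's rank/size concepts. Uses Python's multiprocessing
# module rather than MPI so it runs on any machine without an MPI stack.
from multiprocessing import Pool

def hello(rank, size):
    # In real MPI, rank and size would come from MPI_Comm_rank/MPI_Comm_size.
    return f"Hello from rank {rank} of {size}"

if __name__ == "__main__":
    size = 4
    with Pool(processes=size) as pool:
        # Each (rank, size) pair is handled by a separate worker process.
        messages = pool.starmap(hello, [(r, size) for r in range(size)])
    for msg in messages:
        print(msg)
```

In actual MPI the runtime launches the processes (e.g. via mpiexec) and each one prints its own message; here a process pool plays that role.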
Given recent initiatives from funding agencies and a push to make academic research more openly accessible, managing research data has become a critical part of the research process. This tutorial will discuss how to manage your data both to ensure optimal visibility for you and your project and to be more competitive when applying for research grants. Topics will include: data storage, metadata, writing a successful data management plan, accessibility, and ways to use data to promote your research.
Spark is a programming model for doing large-scale data analysis in parallel, without focusing on the details of distributed computing; The same program you write for one computer will also work across many computers. Spark builds on the MapReduce framework by providing an interactive environment that has a more general set of functions for manipulating data efficiently in-memory. The result is a highly scalable way of quickly exploring large data sets interactively. This tutorial will give you a general overview of the Spark programming model. There will also be several hands-on exercises, including a few that use Spark's machine learning library, using an IPython Notebook and the PySpark API.
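The core idea — chaining transformations such as flatMap, map, and reduceByKey over a collection — can be sketched in plain Python. This local stand-in (the helper name reduce_by_key is ours, not PySpark's) shows the classic word count that Spark distributes across many machines:

```python
# Word count in the MapReduce style that Spark generalizes.
# Pure-Python sketch: the comments note the PySpark operation
# each step corresponds to; no Spark installation is needed.
def reduce_by_key(pairs, fn):
    # Combine values that share a key, like Spark's reduceByKey.
    acc = {}
    for key, value in pairs:
        acc[key] = fn(acc[key], value) if key in acc else value
    return acc

lines = ["to be or not to be", "that is the question"]
words = (w for line in lines for w in line.split())   # flatMap
pairs = ((w, 1) for w in words)                       # map
counts = reduce_by_key(pairs, lambda a, b: a + b)     # reduceByKey
print(counts["to"], counts["be"])  # → 2 2
```

In PySpark the same pipeline is written against an RDD, and the runtime partitions the data and the reduction across the cluster.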
The Xeon Phi 3100 will be capable of more than 1 teraflop of double-precision floating-point performance with 240 GB/sec memory bandwidth at 300 W. The Xeon Phi 5110P will be capable of 1.01 teraflops of double-precision floating-point performance with 320 GB/sec memory bandwidth at 225 W. The Xeon Phi 7120P will be capable of 1.2 teraflops of double-precision floating-point performance with 352 GB/sec memory bandwidth at 300 W.
The Xeon Phi uses a 22 nm process. The Xeon Phi 3100 will be priced at under US$2,000, the Xeon Phi 5110P at US$2,649, and the Xeon Phi 7120P at US$4,129. On June 17, 2013, the Tianhe-2 supercomputer was announced by TOP500 as the world's fastest; it uses Intel Ivy Bridge Xeon and Xeon Phi processors to achieve 33.86 petaFLOPS.
This workshop will give an overview of Hadoop, an open source software framework for large-scale data processing, and the Hadoop Distributed File System (HDFS). Pig, a high-level data processing language, will be used to perform data analysis exercises. Please bring your own laptop; a virtual machine with a single-node Hadoop installation will be provided.
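Hadoop's map and reduce phases can be previewed without a cluster. The sketch below follows the Hadoop Streaming convention — the mapper emits tab-separated key/value lines, the framework sorts them by key, and the reducer sums runs of equal keys — but runs entirely locally in Python, with a plain sort() standing in for Hadoop's shuffle/sort:

```python
# Local sketch of Hadoop Streaming word count. On a real cluster the
# mapper and reducer would be separate scripts reading stdin, and HDFS
# plus the Hadoop framework would handle data distribution and sorting.
from itertools import groupby

def mapper(lines):
    # Emit one "word\t1" record per word, as a Streaming mapper would.
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_records):
    # Records arrive sorted by key; sum each run of equal keys.
    parsed = (rec.split("\t") for rec in sorted_records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield word, sum(int(v) for _, v in group)

data = ["big data big ideas", "big clusters"]
shuffled = sorted(mapper(data))      # stands in for Hadoop's shuffle/sort
counts = dict(reducer(shuffled))
print(counts["big"])  # → 3
```

The same counting job is a few lines of Pig (a LOAD, a GROUP, and a COUNT), which is the form the workshop's exercises will use.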
Many of today’s most difficult computing problems require petabyte-scale search and analysis on unstructured data, which may be text or other symbolic data. This class of computation is not handled well by traditional CPU and memory system architectures; it requires a fundamentally new approach to computing. The Micron Automata Processor (AP) is a completely new architecture for accelerating the analysis of information and generating statistical characterizations of that data. It scales to tens of thousands, even millions of processing elements for the largest challenges, with energy efficiency far greater than traditional CPUs and GPUs. It is much easier to program than FPGAs. The AP adds new thrust to this class of computing. It’s a disruptive acceleration technology that can dramatically improve throughput in many Big Data application domains.
The Automata Processor (AP) is a software-programmable silicon device providing massively parallel search, pattern matching, and analysis. It is designed for complex, unstructured data streams, such as text or other symbolic data. The processor leverages Micron’s expertise in the intrinsic parallelism of DRAM architectures to provide uniquely fast and highly scalable throughput, plus extreme cost-effectiveness and energy efficiency. It has a linearly scalable, two-dimensional fabric composed of thousands to millions of interconnected symbol-processing elements. What is unique is that each incoming symbol can be accessed by any of the compute elements in the huge array on every clock edge. Combining simultaneous delivery of input symbols with single-clock-cycle processing enables predictable, finite execution time and massive throughput. Micron’s Software Development Kit (SDK) allows modular macros to be created, perfected, and replicated, enabling collaborative reuse at increasing scales of parallelism. The SO-DIMM board form factor makes it easy to provision PCIe adapters with the compute power needed, from full-size GPU slots down to the smallest server mezzanine slots. Micron’s initial development board will be a PCIe board loaded with up to 32 AP processors. The AP is a truly massively parallel and powerful computing system available at a fraction of the cost and power of conventional computational clusters.
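The processing model — every element examines each incoming symbol on every clock edge — is the hardware analogue of simulating a nondeterministic finite automaton in software. This small sketch (plain Python, not Micron's SDK) shows the idea: on each "clock" the symbol is broadcast to all active states at once, and the next active set is computed in a single step; the example automaton and its state names are invented for illustration:

```python
# Software sketch of the automata-processing model: each input symbol
# is delivered to every active state element simultaneously, and the
# whole next active set is computed in one step per symbol.
def run_automaton(transitions, start, accepting, symbols):
    active = {start}
    for sym in symbols:
        # "Clock edge": every active element sees the symbol at once.
        active = {nxt for state in active
                      for nxt in transitions.get((state, sym), ())}
    return bool(active & accepting)

# Toy automaton recognizing the pattern a (a|b)* c
transitions = {
    ("start", "a"):  {"middle"},
    ("middle", "a"): {"middle"},
    ("middle", "b"): {"middle"},
    ("middle", "c"): {"done"},
}
print(run_automaton(transitions, "start", {"done"}, "abbac"))  # → True
```

A CPU must iterate over the active states in this loop; the AP's fabric evaluates all of them in parallel in hardware, which is where its throughput advantage comes from.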
The Linux shell is much more than just a way to enter individual commands. In this session, we'll learn to use bash's built-in programming elements, including loops, tests and conditions, variables, and functions. With the full power of the shell at your fingertips, your efficiency and productivity will skyrocket!
If you would like to follow along with the examples, please bring a laptop that a) runs Linux or Mac OS X, or b) allows you to log in to a Linux server using ssh.
Peter Ruprecht from CU's Research Computing will again be giving this great Linux tutorial.
Fortran 2003 is a programming language that is efficient for numerical computation on supercomputers and has all the elements of a modern general-purpose programming language. In this tutorial we will introduce features of Fortran 2003 including: coding style, allocatable arrays, structures, array syntax, and module-based programming. The performance implications and pitfalls of some of these features will be demonstrated through several examples that will be available for download.
The goal of the tutorial is to introduce researchers and systems administrators to the easy-to-use Globus services for moving, sharing, and publishing large amounts of data. Increasingly computational- and data-intensive science makes data movement and sharing across organizations inevitable. The cloud-hosted Globus service offers Dropbox-like simplicity for big data.
In this tutorial, attendees will learn how to perform fire-and-forget file transfer, sharing, and synchronization between their local machine, campus clusters, regional supercomputers and national cyberinfrastructure using Globus, via both Web and command line interfaces.
Tutorial attendees will also learn how to install Globus Connect Server on their campus cluster to provide data transfer endpoints for their users. The tutorial will include instruction on using Globus via the CLI and scripts to control Globus operations, and on using the Globus transfer REST API for programmatic interaction with Globus. By the end of the tutorial, participants will have the tools and information required to provide their users with Globus’s full range of benefits. Attendees will also get a preview of new Globus data publication and discovery functionality that will be delivered later this year.
There are numerous reports identifying a critical need to prepare current and future generations of researchers and practitioners to utilize high performance computing systems to advance scientific discovery.
Programs to teach the skills needed by the HPC community are inadequate to address the needs of the international community. Very few colleges and universities teach the advanced topics needed to prepare the HPC workforce. As a result, a handful of research universities and HPC centers are developing and offering formal and informal sessions to address the gap. In order to engage and prepare a larger and more diverse community of practitioners, considerable effort is being devoted to developing online, web-based learning opportunities. Numerous academic, governmental, and industrial organizations are responding to this need nationally and internationally through formal credit-bearing courses, MOOCs, summer schools, professional development programs, and so forth.
The goal of this session is to engage the community in discussing the challenges and opportunities for providing high quality on-line training and education programs. The session will include a presentation and discussion of the following topics:
There are many recent additions to Python that make it an excellent programming language for data analysis. This tutorial has two goals. First, we introduce several of the recent Python modules for data analysis. We provide hands-on exercises for manipulating and analyzing data using pandas, scikit-learn, and other modules. Second, we execute examples using the IPython notebook, a web-based interactive development environment that facilitates documentation, sharing, and remote execution. Together these tools create a powerful, new way to approach scientific workflows for data analysis on HPC systems.
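A minimal taste of the pandas portion of the tutorial: load a small table and compute a grouped summary, the kind of one-line analysis the hands-on exercises build on. This assumes pandas is installed, and the DataFrame here is made-up example data (the column and cluster names are illustrative only):

```python
import pandas as pd

# Toy job-count data standing in for the tutorial's exercise datasets.
df = pd.DataFrame({
    "cluster": ["janus", "janus", "summit", "summit"],
    "jobs":    [120, 80, 200, 100],
})

# Group by cluster and sum the job counts -- a typical first
# pandas operation when summarizing tabular data.
totals = df.groupby("cluster")["jobs"].sum()
print(totals["janus"], totals["summit"])  # → 200 300
```

In the IPython notebook the result renders as a labeled table alongside the code and narrative, which is what makes the notebook a convenient record of an analysis.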
This tutorial will present new users with a background on general parallel computing practices, and the specifics of using parallel computing in the Matlab programming language. This discussion is intended for users who are new to parallel computing in general but not new to Matlab. Examples of converting serial code to parallel code will be given.