Stanford CME 213/ME 339 Spring 2021 homepage
Introduction to parallel computing using MPI, OpenMP, and CUDA
This is the website for CME 213, Introduction to Parallel Computing using MPI, OpenMP, and CUDA. This material was created by Eric Darve with the help of course staff and students.
Syllabus
Policy for late assignments
Extensions can be requested in advance for exceptional circumstances (e.g., travel, sickness, injury, COVID-related issues) and for OAE-approved accommodations.
Submissions that are at most two days late (within 48 hours of the deadline) will be accepted with a 10% penalty. No submissions will be accepted more than two days after the deadline.
See Gradescope for all the current assignments and their due dates. Post on Slack if you cannot access the Gradescope class page. The 6-letter code to join the class is given on Canvas.
Datasheet on the Quadro RTX 6000
Final Project
Final project instructions and starter code:
Slides and videos explaining the final project:
- Overview of the final project; Slides
- 33 Final Project 1, Overview; Video
- 34 Final Project 2, Regularization; Video
- 35 Final Project 3, CUDA GEMM and MPI; Video
See also the Module 8 videos on MPI.
Class modules and learning material
Introduction to the class
CME 213 First Live Lecture; Video, Slides
C++ tutorial
Module 1 Introduction to Parallel Computing
- Slides
- 01 Homework 1; Video
- 02 Why Parallel Computing; Video
- 03 Top 500; Video
- 04 Example of Parallel Computation; Video
- 05 Shared memory processor; Video
- Reading assignment 1
- Homework 1; starter code
Module 2 Shared Memory Parallel Programming
- C++ threads; Slides; Code
- Introduction to OpenMP; Slides; Code
- 06 C++ threads; Video
- 07 Promise and future; Video
- 08 mutex; Video
- 09 Introduction to OpenMP; Video
- 10 OpenMP Hello World; Video
- 11 OpenMP for loop; Video
- 12 OpenMP clause; Video
- Reading assignment 2
Module 3 Shared Memory Parallel Programming, OpenMP, advanced OpenMP
- OpenMP, for loops, advanced OpenMP; Slides; Code
- OpenMP, sorting algorithms; Slides; Code
- 13 OpenMP tasks; Video
- 14 OpenMP depend; Video
- 15 OpenMP synchronization; Video
- 16 Sorting algorithms Quicksort Mergesort; Video
- 17 Sorting Algorithms Bitonic Sort; Video
- 18 Bitonic Sort Exercise; Video
- Reading assignment 3
- Homework 2; starter code; radix sort tutorial
Module 4 Introduction to CUDA programming
- Introduction to GPU computing; Slides
- Introduction to CUDA and nvcc; Slides; Code
- 19 GPU computing introduction; Video
- 20 Graphics Processing Units; Video
- 21 Introduction to GPU programming; Video
- 22 icme-gpu; Video
- 23 a First CUDA program; Video
- 23 b First CUDA program part 2; Video
- 24 nvcc CUDA compiler; Video
- Reading assignment 4
- Homework 3; starter code
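Videos 23a and 23b walk through writing a first CUDA program. A minimal sketch in that spirit (a SAXPY kernel with unified memory; illustrative, not the course starter code) shows the three essentials: a `__global__` kernel, a launch configuration, and a synchronization before reading results on the host:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// y[i] = a * x[i] + y[i], one thread per element.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];  // guard: the grid may overshoot n
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));  // unified memory: no explicit copies
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // round up so every element is covered
    saxpy<<<blocks, threads>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();  // wait for the kernel before touching y on the host

    printf("y[0] = %f\n", y[0]);  // 2*1 + 2 = 4
    cudaFree(x);
    cudaFree(y);
}
```

Compile with `nvcc saxpy.cu -o saxpy`; video 24 covers what nvcc does under the hood.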
Module 5 Code performance on NVIDIA GPUs
- GPU memory and matrix transpose; Slides; Code
- CUDA occupancy, branching, homework 4; Slides
- 25 GPU memory; Video
- 26 Matrix transpose; Video
- 27 Latency, concurrency, and occupancy; Video
- 28 CUDA branching; Video
- 29 Homework 4; Video
- Reading assignment 5
- Homework 4; starter code
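Video 26 covers matrix transpose, where naive code reads rows but writes columns, so one side of the copy always strides through memory. The GPU fix in the slides stages tiles in shared memory; the same tiling idea in plain C++ (illustrative sketch, not the course code) keeps both the reads and writes of each step inside a small working set:

```cpp
#include <algorithm>
#include <vector>

constexpr int TILE = 32;  // tile edge; on the GPU this matches the thread block

// Transpose a rows x cols row-major matrix, processing TILE x TILE blocks
// so the strided writes stay within one tile at a time (better cache reuse).
std::vector<float> transpose(const std::vector<float>& a, int rows, int cols) {
    std::vector<float> out(a.size());
    for (int bi = 0; bi < rows; bi += TILE)
        for (int bj = 0; bj < cols; bj += TILE)
            for (int i = bi; i < std::min(bi + TILE, rows); ++i)
                for (int j = bj; j < std::min(bj + TILE, cols); ++j)
                    out[j * rows + i] = a[i * cols + j];
    return out;
}
```

In the CUDA version, each block loads its tile into shared memory and writes it back transposed, so both the global-memory read and write are coalesced.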
Module 6 NVIDIA guest lectures, OpenACC, CUDA optimization
- 30 NVIDIA guest lecture, OpenACC; Video; Slides
- 31 NVIDIA guest lecture, CUDA optimization; Video; Slides
- Reading assignment 6
Module 7 NVIDIA guest lectures, CUDA profiling
- 32 NVIDIA guest lecture, CUDA profiling; Video; Slides
- Reading assignment 7
Module 8 Group activity and introduction to MPI
The slides and videos below are needed for the final project.
- Introduction to MPI; Slides; Code
- 37 MPI Introduction; Video
- 38 MPI Hello World; Video
- 39 MPI Send Recv; Video
- 40 MPI Collective Communications; Video
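Videos 37 to 39 introduce the basic MPI pattern used throughout the final project: initialize, query rank and size, exchange point-to-point messages, finalize. A minimal sketch (illustrative; run it with at least two processes) is:

```cpp
#include <cstdio>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  // this process's id, 0..size-1
    MPI_Comm_size(MPI_COMM_WORLD, &size);  // total number of processes

    if (rank == 0 && size >= 2) {
        int msg = 42;
        MPI_Send(&msg, 1, MPI_INT, /*dest=*/1, /*tag=*/0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int msg;
        MPI_Recv(&msg, 1, MPI_INT, /*src=*/0, /*tag=*/0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d from rank 0\n", msg);
    }

    MPI_Finalize();
}
```

Build and run with `mpic++ hello.cpp -o hello && mpirun -np 2 ./hello`. Module 9 covers why a blind exchange of blocking `MPI_Send`/`MPI_Recv` calls can deadlock and how non-blocking variants avoid it.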
Material for the May 17 group activity:
- generate_sequence.cpp
- 36 Instructions for Monday, May 17 group activity; Video; Slides
Module 9 Advanced MPI
- MPI Advanced Send and Recv; Slides; Code
- 41 MPI Process Mapping; Video
- 42 MPI Buffering; Video
- 43 MPI Send Recv Deadlocks; Video
- 44 MPI Non-blocking; Video
- 45 MPI Send Modes; Video
- Parallel efficiency and MPI communicators; Slides; Code
- 46 MPI Matrix-vector product 1D schemes; Video
- 47 MPI Matrix vector product 2D scheme; Video
- 48 Parallel Speed-up; Video
- 49 Isoefficiency; Video
- 50 MPI Communicators; Video
- Reading assignment 8
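Videos 48 and 49 cover parallel speed-up and isoefficiency. The standard definitions, stated here for reference (not taken from the course slides), are:

```latex
% Speed-up and parallel efficiency on p processes,
% where T_1 is the serial time and T_p the time on p processes:
S(p) = \frac{T_1}{T_p}, \qquad E(p) = \frac{S(p)}{p} = \frac{T_1}{p\,T_p}.

% Writing the total overhead as T_o(W, p) = p\,T_p - T_1 for problem size W,
E = \frac{1}{1 + T_o(W, p)/T_1(W)},

% so holding E fixed as p grows requires growing W such that
T_1(W) = \frac{E}{1 - E}\, T_o(W, p) \quad \text{(the isoefficiency relation).}
```

The faster W must grow with p to satisfy this relation, the less scalable the algorithm; the 1D and 2D matrix-vector schemes in videos 46 and 47 differ in exactly this way.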
Module 10 SLAC guest lecture, Task-based parallel programming
Reading and links
Lawrence Livermore National Lab Resources
- LLNL Tutorial and Training Materials
- LLNL Introduction to Parallel Computing tutorial
- LLNL POSIX threads programming
- LLNL OpenMP tutorial
- LLNL MPI tutorial
- LLNL Advanced MPI slides
C++ threads
OpenMP
- OpenMP LLNL guide
- OpenMP guide by Yliluoma
- OpenMP 5.0 Reference Guide
- OpenMP API Specification
- Tutorials
CUDA
- CUDA Programming Guides and References
- CUDA C++ Programming Guide
- CUDA C++ Best Practices Guide
- CUDA occupancy calculator
- CUDA compiler driver NVCC
- OpenACC
- OpenACC Programming and Best Practices Guide
- OpenACC 2.7 API Reference Card
- Compilers that support OpenACC
- OpenACC Specification (Version 3.0)
MPI
- Open MPI hwloc documentation
Task-based parallel languages and APIs
- Legion and Regent
- StarPU
- Charm++
- PaRSEC
- Chapel
- X10
- TaskTorrent and documentation