The assignments below are due on the dates listed. Please come to each class having completed the reading and assignment(s) shown for that date.

Friday January 17 – A canonical problem: matrix-matrix multiplication

Read: Chacon & Straub Chapters 1, 2 and Eijkhout (HPC) Chapter 20

Assignment: As you read Eijkhout (HPC) Chapter 20, use one of the workstations (or your own Linux or Mac computer) to try as many of the in-line exercises as you need so that you are comfortable working at the Linux command line.

Wednesday, January 22 – Introduction to HPC (KOS 244)

Read: Eijkhout (HPC) Chapters 21–22

Assignment: Using one of the workstations (or your own Linux or Mac computer), create a git repository and practice working with it as outlined in Chapter 2 of Chacon and Straub. In particular, practice adding files, making commits, editing files, committing the changes, examining the status, etc.

Friday, January 24 – A brief history and overview of HPC

Read: Barlas 1.1–1.3

Assignment: (HW01) Finish the hands-on exercise: using your notes from the hands-on exercise, write a report that provides your observed timings and FLOP rates along with your observations about the runs. Include what seems important to you, but be sure to address at least these questions: Which parallel program and which ijk ordering performed the best? Did others observe similar behavior? What conclusions can you draw?

Also, explore HPC University's student internship listings.

Monday, January 27 – Performance metrics, prediction, and measurement

Read: Barlas 1.4–1.5 and Eijkhout (HPC) 2.2 through 2.2.3

Wednesday, January 29 – Debugging and profiling programs (KOSC 244)

Read: Familiarize yourself with the Gprof and Valgrind user manuals. Spend enough time to develop a basic understanding of the purpose of each tool.

Assignment: (HW02) Barlas Chapter 1: Exercises 4, 5, 6 (pg 26)

Friday, January 31 – Dense matrix algebra and libraries

Read: Eijkhout (HPC) 2.4

Assignment: Finish the hands-on exercise if you were not able to complete it on Wednesday. Also, check out the assignment for Monday (listed just below) and start working on it as you are able.

Monday, February 3 – Dealing with data; data files and HDF5

Read: Eijkhout (HPC) Chapter 24 and familiarize yourself with Chapter 1 of the HDF5 Users Guide.

Assignment: (HW03) Modify the program mult-template.c so that it uses the CBLAS routine cblas_dgemm() to perform the matrix-matrix product. Documentation on CBLAS is a little sketchy, but if you search the web for “cblas examples” you should find what you need. Be sure to load the atlas module so that the CBLAS library is accessible:

module load atlas

You'll need to link your program with the CBLAS and ATLAS libraries; do this by including -lcblas -latlas on the compiler command line.

If you want to use smake to compile the program you'll need to make sure the smake module is loaded:

module load smake

The Smake line to compile the program will be something like
/*
 * $Smake: gcc -Wall -O3 -o %F %f -lcblas -latlas
 */

Wednesday, February 5 – Reading and writing HDF5 files (KOSC 244)

Assignment: (HW04) Barlas Chapter 1 Exercise 7 (pg 26), modified as follows: (1) Rather than reading and writing data from/to files, create and sort a list of random integers. The file makeList.cc can be used as a starter program for this assignment. (2) Feel free to base your mergesort functions on the functions mergesort() and the “Efficient variant” of merge() found at http://www.iti.fh-flensburg.de/lang/algorithmen/sortieren/merge/mergen.htm. Note: You should modify both functions so that the first parameter is int a[] (or, equivalently, int* a), a pointer to the array to be sorted. You should also modify merge() to add the line int* b = new int [m-lo+1]; before the first loop and add the line delete [] b; just before the end of the function.

Friday, February 7 – A model HPC problem: Finite difference solution of the unsteady heat equation in multiple dimensions

Read: The first two pages (up to 15.2.1) of Finite Difference Approximation of Derivatives

Assignment: (HW05) Do both the following:

  1. Finish the HDF5 hands-on exercise and email the professor with (1) the full paths to the source and executable versions of your program, and (2) the full paths to the HDF5 files produced by your program when run with the small.h5 and big.h5 files.
  2. Modify the program you wrote for Barlas Chapter 1 Exercise 7 so that it reads integer data from an HDF5 file and writes the sorted list to another HDF5 file. The file containing the unsorted data is stored in /gc/cps343/random_list.dat on the workstations. You can use the h5dump utility with the -H flag to examine the file header and determine the dataset name.

Monday, February 10 – Parallel algorithm analysis and design

Read: Barlas 2.1–2.3. See also Eijkhout (HPC) 2.3

Assignment: Start working on Barlas Chapter 2 Exercise 1 (pg 54). (See the note immediately below that clarifies this assignment).

Wednesday, February 12 – Memory hierarchy & data organization (KOS 244)

Read: Eijkhout (HPC) 1.7

Assignment: (HW06) Barlas Chapter 2 Exercise 1 (pg 54). Note: Compute the total communication volume, as was done in equation (2.4). This is not the same as the number of communication operations. You need to compute the total amount of data that is communicated.

Friday, February 14 – Parallel algorithm analysis and design; Parallel architectures

Read: Barlas 2.4–2.5

Assignment: (HW07) Working with a partner, complete the hands-on exercise for Memory hierarchy and data organization. Your team should submit a report as described in the assignment section of the exercise.

Monday, February 17 – Shared memory programming: threads, semaphores & monitors

Read: Barlas 3.1–3.5

Assignment: Start working on Parallel program design: PCAM

Wednesday, February 19 – Using threads and OpenMP (KOS 244)

Read: Barlas 3.6–3.7

Assignment: (HW08) Parallel program design: PCAM

Friday, February 21 – Shared memory programming made easy: OpenMP

Read: Barlas 4.1–4.4

Assignment: Project 1 Due. Also, turn in (HW09), your report from the hands-on exercise Using OpenMP.

Monday, February 24 – Distributed memory programming: Introduction to MPI

Read: Barlas 5.1–5.3

Wednesday, February 26 – Introduction to cluster computing with MPI (KOS 244)

Read: Barlas 5.4–5.7

Friday, February 28 – MPI collective communication

Read: Barlas 5.8–5.11

Assignment: (HW10) Complete the hands-on exercise Introduction to cluster computing with MPI and turn in your report, including printed copies of your well-documented source code for the ring-pass3 program.

Monday, March 2 – MPI derived datatypes

Read: Barlas 5.12–5.13

Assignment: (HW11) Consider the problem of forming the transpose of an N×N matrix A. Suppose that there are N processes and that process i contains an N-element array u that is the ith row of A. Write a single MPI_Alltoall() function call so that after the function is called, process i contains the ith row of A^T stored in the N-element array v. Draw “before” and “after” diagrams to show the contents of u and v and show that A is transformed into A^T.

Wednesday, March 4 – Midterm Exam

Assignment: Review for exam

Monday, March 16 – COVID-19 Reset day

Wednesday, March 18 – Project 2 work day

Friday, March 20 – MPI derived datatypes example: Cartesian grids
Read: Barlas 5.12–5.13

Monday, March 23 – Parallel I/O in MPI with HDF5
Read: Barlas 5.15

Wednesday, March 25 – Working with Cartesian grids in MPI (Zoom)

Friday, March 27 – MPI Example: Parallel sorting

Monday, March 30 – Project 3 work day

Wednesday, April 1 – Parallel sorting with MPI on Canaan cluster (Zoom)

Friday, April 3 – Introduction to GPU programming and CUDA
Read: Barlas 6.1–6.3
Assignment: Complete hands-on exercise Using the Canaan parallel cluster.

Monday, April 6 – CUDA memory types
Read: Barlas 6.4–6.5

Wednesday, April 8 – Introduction to CUDA (Zoom)
Read: Barlas 6.6 (focus on 6.6.1 and 6.6.2)

Friday, April 10 – Good Friday

Monday, April 13 – Easter Monday

Wednesday, April 15 – Global and shared memory in CUDA (Zoom)
Assignment: Complete hands-on exercise Introduction to CUDA.

Friday, April 17 – CUDA optimization
Read: Barlas 6.7.1–6.7.3
Assignment: Complete hands-on exercise CUDA shared memory.

Monday, April 20 – CUDA optimization example: parallel reduction
Read: Barlas 6.7.4–6.7.7

Wednesday, April 22 – CUDA profiling and debugging
Read: Barlas 6.9–6.10

Friday, April 24 – Introduction to the Thrust template library
Read: Barlas 7.1–7.4.1

Monday, April 27 – Thrust algorithms
Read: Barlas 7.4.2–7.4.5

Wednesday, April 29 – Using Thrust (Zoom)

Friday, May 1 – OpenACC: Accelerator programming made easy

Monday, May 4 – More about OpenACC

Wednesday, May 6 – Using OpenACC (Zoom)