Make sure your repository is up-to-date:
git fetch && git pull
Change into the 09-cuda-shmem directory.
Examine the source code in matmul.cu. It has two
different kernels that perform matrix-matrix multiplication. The
first uses only the GPU device's global memory while the second
uses the GPU's shared memory.
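As a rough illustration of the difference between the two approaches, here is a minimal sketch. It is not the code in matmul.cu. It assumes the shared-memory kernel uses the common tiling technique, and the names matmulGlobal, matmulShared, maxDifference, and BLOCK_DIM are hypothetical. Return values of the CUDA calls are ignored here to keep the sketch short.

```cuda
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

#ifndef BLOCK_DIM
#define BLOCK_DIM 16  // assumed tile width; must equal the launch block dimension
#endif

// Global-memory kernel: each thread computes one element of C = A * B,
// reading every operand directly from global memory.
__global__ void matmulGlobal(const float* A, const float* B, float* C, int n)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float sum = 0.0f;
        for (int k = 0; k < n; k++)
            sum += A[row * n + k] * B[k * n + col];
        C[row * n + col] = sum;
    }
}

// Shared-memory kernel (tiled): the threads of a block cooperatively copy one
// tile of A and one tile of B into shared memory, then reuse those values.
__global__ void matmulShared(const float* A, const float* B, float* C, int n)
{
    __shared__ float As[BLOCK_DIM][BLOCK_DIM];
    __shared__ float Bs[BLOCK_DIM][BLOCK_DIM];

    int row = blockIdx.y * BLOCK_DIM + threadIdx.y;
    int col = blockIdx.x * BLOCK_DIM + threadIdx.x;
    float sum = 0.0f;

    int numTiles = (n + BLOCK_DIM - 1) / BLOCK_DIM;
    for (int t = 0; t < numTiles; t++) {
        int aCol = t * BLOCK_DIM + threadIdx.x;
        int bRow = t * BLOCK_DIM + threadIdx.y;
        // Out-of-range entries are padded with zeros so n need not be a
        // multiple of BLOCK_DIM.
        As[threadIdx.y][threadIdx.x] = (row < n && aCol < n) ? A[row * n + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (bRow < n && col < n) ? B[bRow * n + col] : 0.0f;
        __syncthreads();  // the whole tile must be loaded before it is used
        for (int k = 0; k < BLOCK_DIM; k++)
            sum += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // finish using the tile before it is overwritten
    }
    if (row < n && col < n)
        C[row * n + col] = sum;
}

// Runs both kernels on the same n x n inputs and returns the largest
// absolute difference between the two results.
float maxDifference(int n)
{
    size_t count = (size_t)n * n;
    size_t bytes = count * sizeof(float);
    std::vector<float> a(count), b(count), c1(count), c2(count);
    for (size_t i = 0; i < count; i++) {
        a[i] = (float)(i % 7) * 0.25f;
        b[i] = (float)(i % 5) * 0.5f;
    }

    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(BLOCK_DIM, BLOCK_DIM);
    dim3 grid((n + BLOCK_DIM - 1) / BLOCK_DIM, (n + BLOCK_DIM - 1) / BLOCK_DIM);

    matmulGlobal<<<grid, block>>>(dA, dB, dC, n);
    cudaMemcpy(c1.data(), dC, bytes, cudaMemcpyDeviceToHost);
    matmulShared<<<grid, block>>>(dA, dB, dC, n);
    cudaMemcpy(c2.data(), dC, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);

    float maxDiff = 0.0f;
    for (size_t i = 0; i < count; i++)
        maxDiff = fmaxf(maxDiff, fabsf(c1[i] - c2[i]));
    return maxDiff;
}

int main(int argc, char* argv[])
{
    int n = (argc > 1) ? atoi(argv[1]) : 100;
    if (n < 1) n = 100;
    printf("n = %d, max difference = %g\n", n, maxDifference(n));
    return 0;
}
```

In the tiled version each value loaded from global memory is reused by BLOCK_DIM threads, which is one plausible reason a shared-memory kernel can run faster than a global-memory one.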
Most of the CUDA runtime calls in the main() function should look
familiar, but now they are “wrapped” in a function that checks their
return value to make sure no error occurred. Two CUDA functions not
seen in the previous hands-on exercise are cudaDeviceSynchronize()
(blocks the host until the kernel has finished) and cudaGetLastError()
(returns the most recent error, so it can reveal a kernel that failed
to launch or run).
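One way such a wrapper might look is sketched below. The names cudaCheck, checkCuda, fill, and launchAndCheck are hypothetical and are not taken from matmul.cu; the wrapper there may behave differently (for example, by exiting on the first error). The second launch in main() asks for 2048 threads per block, which exceeds the 1024-thread limit of current GPUs, so it is expected to be reported as a failure.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical wrapper macro: passes the call's status, its source text,
// and its location to checkCuda().
#define cudaCheck(call) checkCuda((call), #call, __FILE__, __LINE__)

// Returns true if the CUDA call succeeded; otherwise prints a message
// describing the failure and returns false.
static bool checkCuda(cudaError_t status, const char* expr,
                      const char* file, int line)
{
    if (status != cudaSuccess) {
        fprintf(stderr, "%s:%d: %s failed: %s\n",
                file, line, expr, cudaGetErrorString(status));
        return false;
    }
    return true;
}

// Trivial kernel: sets every element of data to value.
__global__ void fill(float* data, int n, float value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = value;
}

// Launches fill() with the given block size and reports whether the
// launch and the kernel's execution both succeeded.
bool launchAndCheck(int n, int threadsPerBlock)
{
    float* d = nullptr;
    if (!cudaCheck(cudaMalloc(&d, n * sizeof(float))))
        return false;

    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    fill<<<blocks, threadsPerBlock>>>(d, n, 1.0f);

    // A kernel launch returns no status, so ask the runtime for the most
    // recent error; this catches problems such as a bad launch configuration.
    bool ok = cudaCheck(cudaGetLastError());

    // Launches are asynchronous: wait for the kernel to finish. Errors that
    // occur while the kernel runs are reported here.
    ok = cudaCheck(cudaDeviceSynchronize()) && ok;

    cudaCheck(cudaFree(d));
    return ok;
}

int main()
{
    bool good = launchAndCheck(1000, 256);
    bool bad = launchAndCheck(1000, 2048);  // too many threads per block
    printf("256 threads per block:  %s\n", good ? "ok" : "failed");
    printf("2048 threads per block: %s\n", bad ? "ok" : "failed");
    return good ? 0 : 1;
}
```

Checking cudaGetLastError() right after the launch and again synchronizing afterwards separates launch-time problems from errors that only appear while the kernel is running.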
Compile the program with make. This will build
three different executables, each with a different CUDA block
size. Each program takes the matrix dimension as a command-line
argument. Run the programs with several different values; try 10,
100, 1000, 2000, and others, but do not exceed 4000. Notice the
times reported for the matrix-matrix products and the speed-up
afforded by using the GPU device's shared memory.
Examine the shell script collect_data.sh. This will
run each of the three programs with a range of matrix dimensions
(1580 through 1620). The tee program will split the
output stream into two streams, one directed to the terminal (or
wherever standard output is going) and the other to the named file.
The script then separates the global-memory kernel and shared-memory
kernel timing data into individual files.
Run the shell script with
./collect_data.sh
This will take a minute or so. Once it's finished you'll find a
bunch of new files in your directory with names like
Global.QuadroK620.16 and Shared.QuadroK620.16. The first part of the name
indicates whether the data is from global or shared memory
kernels. The second part indicates the GPU device, and the third
part is the CUDA block dimension.
Assuming you've done this on Zechariah (a.k.a. nabi.cs.gordon.edu),
use ssh to connect to Haggai, Hosea, Joel, Jonah, Malachi, Micah, Nahum, or
Obadiah and run the command again. Once you've collected the data
you can exit the shell to get back to Zechariah. For the curious,
you can also try this on Amos.
Start the gnuplot program and try the following graphing commands
typed at the gnuplot> prompt (replace “QuadroK620” with
“QuadroK2000” or “QuadroP400” as appropriate):
set key center left
set style data linespoints
set xlabel "Matrix dimension"
set ylabel "Time (seconds)"
set title "QuadroK620 matrix-matrix timing data"
plot 'Global.QuadroK620.16','Global.QuadroK620.32','Shared.QuadroK620.16','Shared.QuadroK620.32'
Be sure you plot data from each of the data files and compare them with one another. Look for interesting patterns. If possible, compare your results with someone who is working on a machine with the other type of GPU.