Using the Canaan parallel cluster

Before we start

Open the Canaan user guide in a new browser window (hold the shift key down while you click on the link). Also open a terminal window (Ctrl-Alt-T) and arrange your windows so you have this page and the other two windows all visible at the same time.

Set up SSH

Create an SSH public/private key pair on your workstation account

To facilitate logging in and transferring files to and from remote machines like Canaan, it is useful to create an SSH public/private key pair for your account on the workstations. After installing your public key on a remote machine you will be able to make secure connections authenticated by the key-pair rather than a password. The instructions here presume you are working on one of our Minor Prophet workstations, but should also work from any other Linux machine or Mac running macOS.

Before beginning, it's worth checking to see if you already have a key-pair. This is easy; just list your ~/.ssh directory with

ls -l ~/.ssh
and see if the files id_rsa and id_rsa.pub are present. If so then you already have a key-pair. If not, type
ssh-keygen
and press Enter at each prompt to accept the offered defaults.

It's important that the ownership and permissions of the ~/.ssh directory and its contents are set correctly. You can do this with the commands

  chown -R $(whoami) ~/.ssh
  chmod 0700 ~/.ssh
  chmod 0600 ~/.ssh/id_rsa
The first of these commands will generate an error if you are not the owner of your ~/.ssh directory and all its contents. If this happens, let the instructor know so the ownerships can be set correctly.

Create an SSH config file

Create or edit the file ~/.ssh/config and make sure it includes the lines (replacing firstname.lastname with your Gordon username):

Host canaan.gordon
    HostName canaan.phys.gordon.edu
    User firstname.lastname
    ServerAliveInterval 120
    ServerAliveCountMax 2
    GSSAPIAuthentication no
    ForwardX11 yes

The first line, starting with Host, not only starts the configuration block for this host, but also defines an alias for it. You can now use ssh to connect to either canaan.phys.gordon.edu or canaan.gordon.
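For example, once this block is saved, either of the following commands opens the same connection (at this point you will still be asked for your password):

ssh canaan.phys.gordon.edu
ssh canaan.gordon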

The User line sets the username to use when connecting to the host. This is necessary when your usernames differ across machines. For example, you can configure your accounts on your personal computer and the workstations so that you can connect via ssh without having to type a password, but you will probably need to add lines like those above to ~/.ssh/config on your personal computer so that your Gordon username is used when connecting to the workstations.
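As a sketch, the corresponding block on your personal computer might look like the following; the Host alias and HostName are placeholders here, so substitute the actual name of the workstation you connect to:

Host workstation
    # placeholder hostname -- replace with the real workstation name
    HostName workstation.cs.gordon.edu
    User firstname.lastname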

Enabling passwordless login from the workstations to Canaan

Now let's try logging in to Canaan. Type

ssh canaan.phys.gordon.edu

(You could use canaan.gordon if you set that alias in your ~/.ssh/config file.) You may be warned that this is a new connection to an unknown host; just answer “yes” and keep going. Type in your ID number when prompted for your password. If all goes well you should then be logged in. Please change your password now to something other than your ID number; do this with the yppasswd command (please do not use passwd):

yppasswd

You'll be asked for your old password and then a new password twice. Be sure at the end of the process you are told your password was updated successfully.

It's very convenient to configure your account so you can log in from your account on the workstations without having to type your password. To do this we need to copy some files from the workstations to Canaan. Log out of Canaan to get back to your workstation account prompt.

Be sure you're working on one of the workstations and type the following commands to configure your account on Canaan to accept logins from your account on the workstations. You will be prompted for the password to your account on Canaan several times; use the password you just chose on Canaan. The first of these commands will likely generate an error message since you probably already have a ~/.ssh directory on Canaan; you can just ignore it.

ssh canaan.phys.gordon.edu mkdir .ssh
ssh canaan.phys.gordon.edu chmod go-rwx .ssh
scp ~/.ssh/id_rsa.pub canaan.phys.gordon.edu:.ssh/authorized_keys
ssh canaan.phys.gordon.edu chmod go-rwx .ssh/authorized_keys

You should now be able to type

ssh canaan.phys.gordon.edu
and be immediately logged in without having to type your password.

Logging into the Canaan cluster remotely

It is not possible to connect directly to Canaan from outside Gordon's network. To work on Canaan, then, you can (1) use SSH to connect to the workstations as you normally do, and then (2) use SSH to connect to Canaan. This works, but has several limitations, the most notable being that you're limited to working in a non-GUI terminal environment. You also have to remember to log off twice, once from Canaan and then again from the workstations.

It is possible to configure some Remote Desktop clients to connect directly to Canaan via an SSH tunnel; contact the professor if you want to know more. Another alternative is X2Go, whose setup is described next. You're welcome to try other approaches than those mentioned here (terminal/PuTTY ssh connections and X2Go), but it will be important for you to find a system that works for you.

Using X2Go

X2Go provides a remote-desktop-like experience for connecting to Linux/Unix machines running an X server. One advantage it has over Remote Desktop is that it is optimized for WAN and slower LAN connections, which makes it quite useful when working remotely.

One-time setup

Starting an X2Go session

Configured X2Go sessions will appear on the right side of the X2Go window.

Ending an X2Go session

To end your session, click on your name in the upper right of the virtual desktop, select Quit..., and then click on Log Out. Important: Please be sure to do this otherwise your session will remain active.

Cloning our class Git repository on Canaan

Let's get a copy of the class repository on Canaan. This is done in three familiar steps: (1) decide where you want it (I suggest creating the directory ~/cps343 and putting it there), (2) change to the destination location, and (3) use the git clone command. The following steps assume you use the suggested location:

mkdir ~/cps343
cd ~/cps343
git clone https://github.com/gordon-cs/cps343-hoe
Although this clones the current version of the repository, it doesn't include any of the files you've created on the minor prophet workstations. If you want to “mirror” your repository from there, you can use the rsync command. I suggest doing this in two steps: (1) check what will be copied, and (2) do the copy. In the steps below, replace "sally.smith" with your username on the workstations. We also assume that your repository on the workstations is stored in ~/cps343/cps343-hoe; you will need to modify the commands below if it's stored at another location.
cd ~/cps343/cps343-hoe
rsync -nav sally.smith@files.cs.gordon.edu:cps343/cps343-hoe/ ./
Note: the trailing slashes are important! The -nav switch is actually three different switches: -n (this is “dry run”; nothing is actually transferred), -a (archive mode - preserve file permissions and dates), and -v (be verbose; show what files are being copied). If the list of files that would be copied seems reasonable, you can reissue the same command but without the -n option:
rsync -av sally.smith@files.cs.gordon.edu:cps343/cps343-hoe/ ./
Now your repository files on Canaan should match those on the workstation cluster.
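As an optional sanity check, you can ask Git to summarize the state of the working tree on Canaan; since the entire repository directory was mirrored, the output should match what git status shows in the same directory on the workstations:

cd ~/cps343/cps343-hoe
git status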

Cluster orientation

Okay, let's explore a little bit. Take a look at the Canaan cluster configuration description. You'll notice that the cluster has a head node (this is "Canaan"), a storage/administration node, 18 compute nodes, and two network switches.

The compute nodes are arranged into two partitions, one called phys with 16 compute nodes and the other called chem with only 2 compute nodes. The sixteen phys nodes each have 24 GB of RAM; most have 12 cores, while three have only 8. The chem partition, on the other hand, has only two nodes, but each has 16 cores and 64 GB of RAM.

You've already learned a little about the Lmod Environment Modules and the SLURM resource manager back in the Cluster Computing with MPI hands-on exercise. Much of the same material is included in the Canaan user guide; please find and scan it quickly now.

Try using the module avail and module list commands to see what modules are available and loaded.
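For example:

module avail   # list every module the cluster provides
module list    # list the modules currently loaded in your session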

Next, use sinfo to explore the available partitions and use squeue to see if any jobs are currently running. Notice that the output from sinfo shows that there is an additional partition called allNodes: this includes all 18 compute nodes, meaning all 212 compute cores can potentially be used on a single job.
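For example, the following commands show the partition layout and the job queue; the -p switch restricts sinfo to a single partition:

sinfo               # summary of all partitions and node states
squeue              # jobs currently queued or running
sinfo -p allNodes   # just the allNodes partition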

If you're using X2Go, try starting sview. You can leave this running to provide a visual snapshot of the cluster's job status.

Next we'll be using srun to run parallel programs, but as noted in our previous hands-on exercise, this command can be used to run sequential programs on the cluster nodes as well. Try the following and talk about the results of each command with someone near you.

srun --ntasks=4 hostname
srun -n 4 hostname
srun --nodes=4 hostname
srun -N 4 hostname
srun --ntasks=4 --tasks-per-node=2 hostname
srun --ntasks=4 --tasks-per-node=1 hostname
You are encouraged to read the srun manual page and try other switches.

Running parallel programs

Okay, time to do something in parallel! We'll start by running the parallel Laplace solvers from last week's exercise. First, let's make sure the OpenMPI and Parallel HDF5 modules are loaded

module load openmpi hdf5

Next, change into the cps343-hoe/06-mpi-cartgrid directory and type make to build the programs. As a quick check, run the cart program with

srun -n 4 ./cart

You should see the same output as you saw last week.

Note: We used salloc rather than srun on the minor prophets cluster because srun does not work properly there in certain situations. If you prefer you can continue to use commands like

salloc -Q -n 4 mpiexec ./cart

on Canaan so everything works the same as on the minor prophets cluster. In the examples that follow I'll be using srun.

Last week, when we were working on the minor prophets cluster, we found that the Laplace MPI program ran much faster when four processes (tasks) ran on a single node than when the four processes were placed on different nodes. Let's see if that's true on Canaan. Type

  srun --ntasks=8 --exclusive ./laplace-mpi -n 100
  srun --nodes=8 --exclusive ./laplace-mpi -n 100
(remember that each node has at least eight CPU cores). You should see that the second command takes more than twice as long to solve the problem; the slowdown is due solely to interconnect communication time, although its relative impact is exacerbated by the relatively small problem size. When we increase the grid size, the runs across nodes are still slower but the relative impact is less:
  srun --ntasks=8 --exclusive ./laplace-mpi -n 200
  srun --nodes=8 --exclusive ./laplace-mpi -n 200

If you want to go back to the minor prophets cluster and try this experiment (but remember you can use only 4 tasks per node there), you'll see that the relative slow-down due to network communication is much worse.

Now that we know that communication between nodes is much faster on Canaan than on the minor prophets cluster, let's see what impact using nonblocking communication can have. To do this we'll generate some data and then plot it. To start, use cut-and-paste to run the following two shell loop commands:

for ((n=1;n<=16;n++))
do
    echo $n $(srun --nodes=$n --exclusive ./laplace-mpi -n 200 | awk '{print $7}')
done | tee bb-200-nodes.dat

and

for ((n=1;n<=16;n++))
do
    echo $n $(srun --nodes=$n --exclusive ./laplace-mpi-nb -n 200 | awk '{print $7}')
done | tee nb-200-nodes.dat

To plot the data we've just created, start gnuplot and type the following commands at the gnuplot> prompt:

set xlabel "Nodes (tasks)"
set ylabel "Time (seconds)"
set title "Blocking vs Nonblocking communication in Laplace Solver"
set key top right
plot "bb-200-nodes.dat" with linespoints, "nb-200-nodes.dat" with linespoints

Notice that the times in bb-200-nodes.dat (blocking) are nearly always larger than the times in nb-200-nodes.dat (nonblocking).

To save your graph as a PNG image, type something like

set term png
set output "graph.png"

then reissue the plot command (you can use the up-arrow to recall previous commands). To get back to plotting on the screen use

set term x11
set output

To quit gnuplot, type exit at the prompt.

You can view the PNG file from the command line using the display program:

display graph.png

A parallel sorting program

Change into the cps343-hoe/07-parallel-sorting directory of the class repository. Take some time to examine the source code of psrs_qsort_timing.cc, which you'll find there. In particular, notice the sequence of steps the psrsSort() function carries out.

Compile the program with

mpic++ -O2 -o psrs_qsort_timing psrs_qsort_timing.cc
(don't use smake for this). To run the program you will need to supply a single positive integer argument that is the total length of the list. Start small (less than 100) and gradually increase the list size.

srun -n 8 ./psrs_qsort_timing 10
srun -n 8 ./psrs_qsort_timing 10000
srun -n 8 ./psrs_qsort_timing 100000000

When the list is short you'll see the final sorted list displayed. In each case you'll see the overall time required by the psrsSort() function.

Now recompile the program, either using smake or by adding the -DSHOW_TIMING_DATA flag to the compiler command line.
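A sketch of the manual recompile, mirroring the earlier compile command (this assumes no other flags are needed):

mpic++ -O2 -DSHOW_TIMING_DATA -o psrs_qsort_timing psrs_qsort_timing.cc

Once the program is rebuilt, try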

srun -n 8 ./psrs_qsort_timing 100000000
and observe the timing data that is displayed. You should see that most of the time is spent doing the serial quicksorts. The final column displays the ratio of (communication time) / (sort time + communication time).

Run the program with even larger list sizes and request more CPU cores. Notice how the timing data, especially the ratio in the right column, changes. Some examples might be

srun -n 20 ./psrs_qsort_timing 400000000
srun -n 60 ./psrs_qsort_timing 400000000
srun -p chem -n 32 ./psrs_qsort_timing 400000000
where the last example uses the two 16-core computers in the chem partition. Play about a bit!

Assignment

Submit an image or PDF version of your labeled graph showing the timing comparison between the blocking and nonblocking Laplace solvers. To do this, log into Blackboard from Canaan and submit your image directly from there. Alternatively, transfer the image file to the workstations and then to your computer using scp, sftp, or some other transfer mechanism.
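For instance, assuming the graph was saved as graph.png in the 06-mpi-cartgrid directory on Canaan, a command like the following (typed on a workstation) will copy it over; adjust the path if you saved the file elsewhere:

scp canaan.phys.gordon.edu:cps343/cps343-hoe/06-mpi-cartgrid/graph.png .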