Pawsey

Info about Pawsey can be found here.

There are 3 supercomputers available for use: magnus, zeus, and zythos.
magnus is the machine used for running parallel applications such as WRF.
zeus and zythos are intended more for single-processor and memory-intensive jobs.

Extensive documentation for the supercomputers can be found here

Detailed documentation for magnus can be found here

To log on to magnus, zeus, or zythos:

ssh -Y user@magnus.ivec.org
ssh -Y user@zeus.ivec.org
ssh -Y user@zythos.ivec.org

data.pawsey.org.au houses our data

If you want to access our data, you can do this via a web interface (go to https://data.pawsey.org.au/ and click on the "My Data" link at the top of the page), or you can use the command-line tool to access the data from magnus. Instructions for the command-line tool are available from data.ivec.org under the "Tools" link at the top of the page.

(if you are not confident in Linux, please refer to our Linux page)

Filesystems

  • All 3 supercomputers (magnus, zeus, zythos) share the same 3 filesystems: /home, /scratch, and /group
    • /home/your_username/ contains system files such as your .bashrc file
    • /scratch/y98 is where we run jobs. /scratch has a 30-day purge policy, so nothing you want to keep should live there. Use /scratch to process data and run jobs, then move the data somewhere else (see the sketch after this list).
    • /group/y98 has group read, write, and execute permissions by default, to make sharing easier, which /scratch does not. We currently have an allocation of only 1 TB on /group, so this space fills up quickly.
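As a minimal sketch (the directory names are assumptions), moving finished output off /scratch before the 30-day purge while keeping it shareable on /group might look like this:

cp -r /scratch/y98/your_username/finished_run /group/y98/your_username/
chmod -R g+rwX /group/y98/your_username/finished_run    # keep the group read/write defaults on /group
du -sh /group/y98/*                                     # check how much of the 1 TB /group allocation is in use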

Software

  • All 3 supercomputers use the module tool to manage software (see the example after this list)
    • module list - list currently loaded modules
    • module avail - list available modules
    • module load/unload module_name - load or unload a module
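For example, the WRF job script further down this page swaps the programming environment and loads netCDF with the module tool:

module list                            # show what is currently loaded
module avail cray-netcdf               # search the available netCDF modules
module swap PrgEnv-cray PrgEnv-intel   # switch from the Cray to the Intel programming environment
module load cray-netcdf                # load the Cray netCDF module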

Running Jobs

There is extensive documentation available online for running jobs on magnus. The Magnus user guide can be found on the Pawsey web site.

Unlike the old supercomputer, epic, which used PBS Pro as its job scheduler, Magnus uses SLURM. More information on SLURM specific to Magnus can be found on the SLURM pages at ivec.org. Here is a quick example of what a job script to run on magnus should look like.

#!/bin/bash -l
#SBATCH --account=y98
#SBATCH --ntasks=168
#SBATCH --ntasks-per-node=24
#SBATCH --time=18:00:00
#SBATCH --mail-type=END
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=juliaandrys@gmail.com
#SBATCH --export=NONE
# Switch to the Intel programming environment and load the matching Cray netCDF module
module swap PrgEnv-cray PrgEnv-intel
module load cray-netcdf
export WRFIO_NCD_LARGE_FILE_SUPPORT=1
export NETCDF=/opt/cray/netcdf/4.3.0/INTEL/130/
# Move to the run directory and copy in this month's real.exe output, dropping the date suffix from the filenames
cd /scratch/y98/julia/WRF-ERA-10Y/WRFV3/run/MI/
cp /scratch/y98/julia/WRF-ERA-10Y/WRFV3/run/run_real_2000/wrfbdy_d01_2000_06 /scratch/y98/julia/WRF-ERA-10Y/WRFV3/run/MI/wrfbdy_d01
cp /scratch/y98/julia/WRF-ERA-10Y/WRFV3/run/run_real_2000/wrffdda_d01_2000_06 /scratch/y98/julia/WRF-ERA-10Y/WRFV3/run/MI/wrffdda_d01
cp /scratch/y98/julia/WRF-ERA-10Y/WRFV3/run/run_real_2000/wrfinput_d01_2000_06 /scratch/y98/julia/WRF-ERA-10Y/WRFV3/run/MI/wrfinput_d01
cp /scratch/y98/julia/WRF-ERA-10Y/WRFV3/run/run_real_2000/wrfinput_d02_2000_06 /scratch/y98/julia/WRF-ERA-10Y/WRFV3/run/MI/wrfinput_d02
cp /scratch/y98/julia/WRF-ERA-10Y/WRFV3/run/run_real_2000/wrfinput_d03_2000_06 /scratch/y98/julia/WRF-ERA-10Y/WRFV3/run/MI/wrfinput_d03
cp /scratch/y98/julia/WRF-ERA-10Y/WRFV3/run/run_real_2000/wrflowinp_d01_2000_06 /scratch/y98/julia/WRF-ERA-10Y/WRFV3/run/MI/wrflowinp_d01
cp /scratch/y98/julia/WRF-ERA-10Y/WRFV3/run/run_real_2000/wrflowinp_d02_2000_06 /scratch/y98/julia/WRF-ERA-10Y/WRFV3/run/MI/wrflowinp_d02
cp /scratch/y98/julia/WRF-ERA-10Y/WRFV3/run/run_real_2000/wrflowinp_d03_2000_06 /scratch/y98/julia/WRF-ERA-10Y/WRFV3/run/MI/wrflowinp_d03
# Put the namelist for this month in place and launch WRF across the allocated cores
cp tmp_namelist_2000_06 namelist.input
aprun -B ./wrf.exe >& wrf_2000_06.out </dev/null
# Keep the rank-0 logs, remove the rest, and start the run script for the next month
mv rsl.error.0000 rsl_error_2000_06; mv rsl.out.0000 rsl_out_2000_06; rm rsl.*
./run_wrf_2000_07 >& run_wrf_2000_07.out; exit
  • This job will run on 7 nodes (168 cores / 24 cores per node). The queue the job runs in is determined by how you submit it to the scheduler. Assuming I am logged into magnus and want the job to run in the magnus work queue, I would submit it with the following:
sbatch era_MI_2000_06
  • If you want to submit this job while logged into another supercomputer, you need to specify which system and which queue the job should run on. It is good practice to always fully specify where you want a job to run, so that your scripts behave the same no matter where you submit them from. To submit the above script to magnus with the options spelled out:
sbatch -M magnus -p workq era_MI_2000_06
  • To query jobs on either work queue:
squeue -M magnus -p workq -u <your username>
squeue -M zeus -p work -u <your username>
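A couple of other SLURM commands are handy alongside squeue; these are standard SLURM rather than anything Pawsey-specific, and the -M flag selects the cluster in the same way:

scancel -M magnus <jobid>             # cancel a queued or running job
scontrol -M magnus show job <jobid>   # print the full details of a job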

Moving Data

  • If you have a lot of data to move, whether to or from data.pawsey.org.au or any other source, you need to use the copy queue (copyq) on zeus. Here is a script that moves some data from /scratch to data.pawsey.org.au:
#!/bin/bash
#SBATCH --account=y98
#SBATCH --ntasks=1 --ntasks-per-node=1
#SBATCH --time=12:00:00
#SBATCH --mail-type=END --mail-type=FAIL
#SBATCH --mail-user=juliaandrys@gmail.com
#SBATCH --export=NONE
# Gather the April 2000 restart and output files and tar them up
cd /scratch/y98/julia/WRF-ERA-10Y/WRFV3/run/MI/
mkdir -p /scratch/y98/julia/WRF-ERA-10Y/WRFV3/run/MI/2000
mv wrfout_d0*_2000-04-* /scratch/y98/julia/WRF-ERA-10Y/WRFV3/run/MI/2000/
tar cfvp era_wrfrst_2000-04-01.tar wrfrst_d0*_2000-04-01_00:00:00 >& tar_era_wrfrst_2000-04-01.out
tar cfvp era_wrfrst_2000-04-15.tar wrfrst_d0*_2000-04-15_00:00:00 >& tar_era_wrfrst_2000-04-15.out
cd /scratch/y98/julia/WRF-ERA-10Y/WRFV3/run/MI/2000/
tar cfvp era_wrfout_d01-2000-04.tar wrfout_d01_2000-04-* >& tar_era_wrfout_d01-2000-04.out
tar cfvp era_wrfout_d02-2000-04.tar wrfout_d02_2000-04-* >& tar_era_wrfout_d02-2000-04.out
tar cfvp era_wrfout_d03-2000-04.tar wrfout_d03_2000-04-* >& tar_era_wrfout_d03-2000-04.out
# Upload each archive to data.pawsey.org.au with ashell ("put" uploads the file to the selected project folder)
cd /home/julia/bin/
module load java
ashell.py "cf /projects/SWWA Downscaled Climate/WRF-CLIM-OUT/ERA-10Y-MI/wrf_rst + put /scratch/y98/julia/WRF-ERA-10Y/WRFV3/run/MI/era_wrfrst_2000-04-01.tar" >& /scratch/y98/julia/WRF-ERA-10Y/WRFV3/run/MI/rsync_wrfrst_2000_04-01.out
ashell.py "cf /projects/SWWA Downscaled Climate/WRF-CLIM-OUT/ERA-10Y-MI/wrf_rst + put /scratch/y98/julia/WRF-ERA-10Y/WRFV3/run/MI/era_wrfrst_2000-04-15.tar" >& /scratch/y98/julia/WRF-ERA-10Y/WRFV3/run/MI/rsync_wrfrst_2000_04-15.out
ashell.py "cf /projects/SWWA Downscaled Climate/WRF-CLIM-OUT/ERA-10Y-MI/wrf_out + put /scratch/y98/julia/WRF-ERA-10Y/WRFV3/run/MI/2000/era_wrfout_d01-2000-04.tar" >& /scratch/y98/julia/WRF-ERA-10Y/WRFV3/run/MI/rsync_wrfout_d01_2000_04.out
ashell.py "cf /projects/SWWA Downscaled Climate/WRF-CLIM-OUT/ERA-10Y-MI/wrf_out + put /scratch/y98/julia/WRF-ERA-10Y/WRFV3/run/MI/2000/era_wrfout_d02-2000-04.tar" >& /scratch/y98/julia/WRF-ERA-10Y/WRFV3/run/MI/rsync_wrfout_d02_2000_04.out
ashell.py "cf /projects/SWWA Downscaled Climate/WRF-CLIM-OUT/ERA-10Y-MI/wrf_out + put /scratch/y98/julia/WRF-ERA-10Y/WRFV3/run/MI/2000/era_wrfout_d03-2000-04.tar" >& /scratch/y98/julia/WRF-ERA-10Y/WRFV3/run/MI/rsync_wrfout_d03_2000_04.out
  • To submit this job to the copy queue:
sbatch -M zeus -p copyq <job name>
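To check on jobs waiting in or running on the copy queue, the same squeue pattern shown earlier applies:

squeue -M zeus -p copyq -u <your username>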

data.pawsey.org.au

data.pawsey.org.au is the new interface through which we will access our data stored on Pawsey infrastructure. Data can be viewed, managed and shared via the online interface accessible at data.pawsey.org.au or it can be accessed from magnus and zeus using a program called ashell.py. This program, and the instructions on how to use it, can be found in the Tools section here. Some more detailed documentation on ashell can be found via the help pages on the Pawsey website. Look for the "How do I use the Command Line Tool" button at the bottom of the page.

The documentation available on the Pawsey website covers ashell in detail, so I am not going to go over it here. If you do use ashell, you will need to set up a delegate; information on how to do this is available near the bottom of the help page. A delegate allows you to use ashell without having to log in every time.

The script above uses the "put" command in ashell to move data to data.pawsey.org.au (here's a line of that script):

ashell.py "cf /projects/SWWA Downscaled Climate/WRF-CLIM-OUT/ERA-10Y-MI/wrf_rst + put /scratch/y98/julia/WRF-ERA-10Y/WRFV3/run/MI/era_wrfrst_2000-04-01.tar" >& /scratch/y98/julia/WRF-ERA-10Y/WRFV3/run/MI/rsync_wrfrst_2000_04-01.out

and here is an example of a script which uses "get" to download data from data.pawsey.org.au:

#!/bin/bash
#SBATCH --account=y98
#SBATCH --ntasks=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=02:00:00
#SBATCH --mail-type=END
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=juliaandrys@gmail.com
cd $SLURM_SUBMIT_DIR

# Months of 1970 and the number of daily files expected in each (1970 is not a leap year)
mm=(01 02 03 04 05 06 07 08 09 10 11 12)
dd=(31 28 31 30 31 30 31 31 30 31 30 31)
for i in {0..11}
do
        # Download this month's archive from data.pawsey.org.au and unpack it on /scratch
        cd /home/julia/bin
        ashell.py "cd /scratch/y98/julia/WRF_CLIM_PROCESSING/CCSM + get /projects/SWWA Downscaled Climate/WRF-CLIM-OUT/CCSM-20C/wrf_hrly/20C_wrfhrly_d02-1970-${mm[$i]}.tar"
        cd /scratch/y98/julia/WRF_CLIM_PROCESSING/CCSM
        tar -xvf 20C_wrfhrly_d02-1970-${mm[$i]}.tar
        rm 20C_wrfhrly_d02-1970-${mm[$i]}.tar
        # Check that the expected number of files arrived for this month
        files=$(ls -l wrfhrly_d02_1970-${mm[$i]}* | wc -l)
        if [ "$files" -ne ${dd[$i]} ]; then
                echo "File issue in ${mm[$i]} 1970"
                echo "File issue in ${mm[$i]} 1970" | mailx -s "ISSUE WITH FILES" juliaandrys@gmail.com
        fi
        # Check that the files are in sequence
        if ./issequential.sh "wrfhrly_d02_1970-${mm[$i]}*" 21-22; then
                echo "Files are in sequence"
        else
                echo "Files are not in sequence"
                echo "File sequence issue in ${mm[$i]} 1970" | mailx -s "ISSUE WITH FILE SEQUENCE" juliaandrys@gmail.com
        fi
done

Virtual Machine (Ubuntu Server)

When you log on to the Virtual Machine (VM), you will automatically land in your $HOME directory. Do not put any data under your $HOME; instead, create a directory under /data.

The /data drive has only 10 TB of space allocated, and /home has only 39 GB. To check disk space:

df -h /data
df -h /home

Please do NOT fill up /home, as the system will become unusable until space is freed on /home.

To run NCL on the VM, add the following to your $HOME/.bashrc at the very end:

export NCARG_ROOT=/usr/local/ncl-6.4.0
export PATH=$NCARG_ROOT/bin:$PATH
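After adding these lines, open a new shell (or source your .bashrc) and check that NCL is being picked up:

source $HOME/.bashrc
which ncl    # should point at /usr/local/ncl-6.4.0/bin/ncl
ncl -V       # should print the NCL version (6.4.0)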

To run Python on the VM, it is best to install Miniconda, which you can then use to install whatever Python libraries you need.
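A minimal sketch of a Miniconda install follows; the download URL and the install path under /data are assumptions, so check the Miniconda website for the current installer:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p /data/your_username/miniconda3   # install under /data rather than /home
source /data/your_username/miniconda3/bin/activate
conda install numpy netcdf4   # for example, whatever libraries your scripts need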

To run scripts in the background, e.g. scripts which will take a long time to run, use nohup as follows. Do not close the terminal on your machine; let it lose its ssh connection on its own.

nohup ./some_script.sh >& some_script.log &
nohup ncl some_script.ncl >& some_script.log &
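Once a script is running in the background you can keep an eye on it without tying up the terminal, for example:

tail -f some_script.log   # follow the log as it is written (Ctrl-C stops following, not the script)
htop                      # confirm the process is running and watch its CPU and memory use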

The VM has 16 cores, and you can run multiple scripts at the same time, in parallel. Before running anything in parallel, please check how many cores are in use using the htop command.

Additionally, before running several scripts in parallel, check how much memory a single script uses by running it and watching htop, which will show you its memory use. The VM has only 47.2 GB of RAM, so if you run several scripts and each takes a lot of RAM, the system may crash once you exceed what is available.

Please do not use all 16 cores; leave at least 1 or 2 cores free. To run multiple scripts all at once, use the following syntax (this example will leave 2 cores free):

nohup parallel -j -2 < list_of_commands.txt 2>&1 &

The list_of_commands.txt file may look something like this:

ncl tmax_1981.ncl >& tmax_1981.log
ncl tmax_1982.ncl >& tmax_1982.log
ncl tmax_1983.ncl >& tmax_1983.log
ncl tmax_1984.ncl >& tmax_1984.log
ncl tmax_1985.ncl >& tmax_1985.log
ncl tmax_1986.ncl >& tmax_1986.log
ncl tmax_1987.ncl >& tmax_1987.log
ncl tmax_1988.ncl >& tmax_1988.log
ncl tmax_1989.ncl >& tmax_1989.log
ncl tmax_1990.ncl >& tmax_1990.log
ncl tmax_1991.ncl >& tmax_1991.log
ncl tmax_1992.ncl >& tmax_1992.log
ncl tmax_1993.ncl >& tmax_1993.log
ncl tmax_1994.ncl >& tmax_1994.log
ncl tmax_1995.ncl >& tmax_1995.log
ncl tmax_1996.ncl >& tmax_1996.log
ncl tmax_1997.ncl >& tmax_1997.log
ncl tmax_1998.ncl >& tmax_1998.log
ncl tmax_1999.ncl >& tmax_1999.log
ncl tmax_2000.ncl >& tmax_2000.log
ncl tmax_2001.ncl >& tmax_2001.log
ncl tmax_2002.ncl >& tmax_2002.log
ncl tmax_2003.ncl >& tmax_2003.log
ncl tmax_2004.ncl >& tmax_2004.log
ncl tmax_2005.ncl >& tmax_2005.log
ncl tmax_2006.ncl >& tmax_2006.log
ncl tmax_2007.ncl >& tmax_2007.log
ncl tmax_2008.ncl >& tmax_2008.log
ncl tmax_2009.ncl >& tmax_2009.log
ncl tmax_2010.ncl >& tmax_2010.log
ncl tmax_2011.ncl >& tmax_2011.log
ncl tmax_2012.ncl >& tmax_2012.log
ncl tmax_2013.ncl >& tmax_2013.log
ncl tmax_2014.ncl >& tmax_2014.log
ncl tmax_2015.ncl >& tmax_2015.log
ncl tmax_2016.ncl >& tmax_2016.log
ncl tmax_2017.ncl >& tmax_2017.log

The list_of_commands.txt file can have as many commands as you want. In the example above, 14 commands run at the same time; as soon as one finishes, parallel starts the next command in the list.
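Rather than typing the list by hand, a one-line loop can generate it (a sketch, assuming your scripts follow the tmax_YYYY.ncl naming used above):

for y in {1981..2017}; do echo "ncl tmax_${y}.ncl >& tmax_${y}.log"; done > list_of_commands.txt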

It can be useful to run WRF on the VM for quick troubleshooting. WRFv3.8.1 has already been compiled and is available here:

/data/WRF_BUILDS/WRFv3.8/Build_WRF/

To compile WRF yourself on the VM, please refer to this web-page. Please note that all libraries have already been compiled under:

/data/WRF_BUILDS/WRFv3.8/Build_WRF/LIBRARIES

Rather than rebuilding all the libraries, you can simply define the necessary environment variables as follows:

#!/bin/bash -l
export DIR=/data/WRF_BUILDS/WRFv3.8/Build_WRF/LIBRARIES
export CC=gcc
export CXX=g++
export FC=gfortran
export FCFLAGS=-m64
export F77=gfortran
export FFLAGS=-m64
export PATH=$DIR/netcdf/bin:$PATH
export NETCDF=$DIR/netcdf
export PATH=$DIR/mpich/bin:$PATH
export LDFLAGS=-L$DIR/grib2/lib 
export CPPFLAGS=-I$DIR/grib2/include 
export JASPERLIB=$DIR/grib2/lib
export JASPERINC=$DIR/grib2/include
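With these variables set (for example saved to a hypothetical build_env.sh and sourced), the usual WRF build steps apply; the configure option to pick depends on your menu, so treat this as a sketch:

source build_env.sh        # hypothetical file containing the exports above
cd /path/to/your/WRFV3     # your own copy of the WRF source
./configure                # choose a gfortran (dmpar) option
./compile em_real >& compile.log &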