
Running Production Jobs

Overview of the MOAB Batch System

All production jobs on the kronos cluster are run using the MOAB batch system. By default, each user may submit a maximum of 2 jobs to MOAB. If you need a higher job limit, please contact us with your request. You can run interactive jobs (MOAB queue "gdebug") to debug your program before submitting it to the cluster. Unlike interactive jobs, batch jobs are controlled via scripts. These scripts tell the system which resources a job requires and for how long, and are then submitted to the MOAB queue manager for processing. Table 9 lists the common MOAB commands.

Table 9: Common MOAB commands and their purpose

Command              Purpose
msub                 Submits a job to MOAB to be run.
mshow/showq          Displays the jobs in the queue, including running, waiting, and held jobs (see also man mshow).
canceljob            Removes one or more jobs from the queue (see also man canceljob).
showstart            Gives an estimate of when your job will start to run.
checkjob [-v] jobid  Shows the status of a job, which helps when diagnosing problems with it.


Example Usage:

The command msub "script" will submit the given script for processing. You must write a script containing the information MOAB needs to allocate the resources your job requires, to handle standard I/O streams, and to run the job. Please see the example scripts below. On submission, MOAB will return the job id.

[user@kronos]:>msub test.job

3676
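
When msub is called from a wrapper script, it can be handy to capture the job id it prints. The sketch below assumes the output looks like the example above (a blank line followed by the numeric id). The helper name extract_jobid is hypothetical, and the printf line merely stands in for a real msub call so that the snippet runs anywhere.

```shell
#!/bin/bash
# Hypothetical helper: read msub output on stdin and print the job id.
# Assumes the id is the last non-blank line, as in the example above.
extract_jobid() {
    awk 'NF { id = $1 } END { if (id != "") print id }'
}

# Simulated msub output; on the cluster you would instead run:
#   jobid=$(msub test.job | extract_jobid)
jobid=$(printf '\n3676\n' | extract_jobid)
echo "Submitted job $jobid"
```

The captured id can then be passed to checkjob or canceljob.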

The commands mshow/showq will show all jobs currently running or queued on the system.

[user@kronos ~]$ showq

active jobs------------------------
JOBID              USERNAME      STATE PROCS   REMAINING            STARTTIME

67986                 shong    Running     1    20:16:25  Thu Aug  6 09:51:46
67760               mnswain    Running    24  1:19:25:49  Tue Aug  4 22:51:10
67542                 xchen    Running     1  1:23:36:47  Sun Aug  2 22:57:08
68154                murthy    Running     8  2:05:48:33  Wed Aug  5 19:23:54
68155                murthy    Running     8  2:05:50:15  Wed Aug  5 19:25:36
67572              sschurer    Running    32  3:21:45:28  Mon Aug  3 11:20:49
67630                akumar    Running    18  5:00:37:42  Tue Aug  4 14:13:03
67729                akumar    Running     6  5:02:54:51  Tue Aug  4 16:30:12
67999              sschurer    Running    32  6:02:01:28  Wed Aug  5 15:36:49
68157                akumar    Running    18  6:08:11:57  Wed Aug  5 21:47:18

10 active jobs          148 of 240 processors in use by local jobs (61.67%)
                          21 of 30 nodes active      (70.00%)

eligible jobs----------------------
JOBID              USERNAME      STATE PROCS     WCLIMIT            QUEUETIME

68008                   him       Idle   144    23:50:00  Wed Aug  5 15:45:02

1 eligible job    

blocked jobs-----------------------
JOBID              USERNAME      STATE PROCS     WCLIMIT            QUEUETIME


0 blocked jobs   

Total jobs:  11
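
On a busy system the showq listing can be long. The following sketch filters it for a single user and totals the PROCS column. It assumes the column layout shown above (JOBID, USERNAME, STATE, PROCS, ...); the helper name user_summary is hypothetical, and the sample lines are copied from the listing above so the snippet can be tried without access to the cluster.

```shell
#!/bin/bash
# Hypothetical helper: read showq output on stdin and print the job count
# and total PROCS for one user. Assumes the column layout shown above.
user_summary() {
    awk -v u="$1" '$2 == u && $1 ~ /^[0-9]+$/ { n++; p += $4 }
                   END { printf "%d jobs, %d procs\n", n, p }'
}

# A few lines copied from the sample output; on the cluster you would run:
#   showq | user_summary akumar
sample='67630                akumar    Running    18  5:00:37:42  Tue Aug  4 14:13:03
67729                akumar    Running     6  5:02:54:51  Tue Aug  4 16:30:12
67999              sschurer    Running    32  6:02:01:28  Wed Aug  5 15:36:49
68157                akumar    Running    18  6:08:11:57  Wed Aug  5 21:47:18'

printf '%s\n' "$sample" | user_summary akumar
```

For the four sample lines this prints "3 jobs, 42 procs".
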
For details about a particular job, issue the command checkjob jobid, where jobid is taken from the JOBID column of the showq output. To remove a job, issue the command canceljob jobid with the same id; this removes the job from the queue and terminates it if it is running.

[user@kronos]:>canceljob 3676
job '3676' cancelled

An example script for a serial Job

#!/bin/bash
#
#MOAB -l nodes=1:ppn=1
#MOAB -l walltime=2:00:00
#MOAB -l mem=1GB
#MOAB -o output_filename
#MOAB -j oe
#MOAB -m bea
#MOAB -M uid@miami.edu
#MOAB -V
#MOAB -q gsmall
cd ${HOME}/sample_prog
sample_prog a b c

Here is a line-by-line breakdown of the keywords and their assigned values listed in this script:

#!/bin/bash

Specifies the shell to be used when executing the command portion of the script. The default is the Korn shell.

#MOAB -l nodes=1:ppn=1

Specifies a resource requirement of 1 compute node and 1 processor per node.

#MOAB -l walltime=2:00:00

Specifies a resource requirement of 2 hours of wall clock time to run the job.

#MOAB -l mem=1GB

Specifies a resource requirement of at least 1 GB of memory to run the job.

#MOAB -o output_filename

Specifies the name of the file where job output is to be saved. If this directive is omitted, a default filename with the job id appended is generated.

#MOAB -j oe

Specifies that job output and error messages are to be joined in one file.

#MOAB -m bea

Specifies that MOAB send email notification when the job begins (b), ends (e), or aborts (a).

#MOAB -M uid@miami.edu

Specifies the email address where MOAB notification is to be sent.

#MOAB -V

Specifies that all environment variables are to be exported to the batch job.

#MOAB -q gsmall

The -q directive specifies the queue for job submission; here the job is submitted to the gsmall queue.

MOAB stops reading directives at the first executable line (that is, the first line that is non-blank and does not begin with #). The last two lines change to the directory ${HOME}/sample_prog and then run the executable sample_prog with arguments a b c.
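
A directive placed after the first command is therefore silently ignored. The sketch below mimics that rule with awk so that a script can be checked before submission. It is only an approximation of the rule as described above, not MOAB's actual parser, and the helper name list_directives is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the #MOAB directives that would be honoured,
# i.e. those that appear before the first executable line.
list_directives() {
    awk '/^#MOAB/           { print; next }   # a directive
         /^[[:space:]]*$/   { next }          # blank line: keep scanning
         /^#/               { next }          # comment or shebang: keep scanning
                            { exit }          # first executable line: stop
        ' "$1"
}

# Build a small script in which one directive is placed too late.
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/bash
#MOAB -l nodes=1:ppn=1
#MOAB -q gsmall
cd ${HOME}/sample_prog
#MOAB -l walltime=2:00:00
sample_prog a b c
EOF

list_directives "$script"
rm -f "$script"
```

Only the nodes and queue directives are listed; the walltime line follows the cd command and is skipped.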

An example script for an MPI Job

#!/bin/bash
#MOAB -l nodes=8:ppn=2
#MOAB -l walltime=2:00:00
#MOAB -l mem=1GB
#MOAB -o output_filename
#MOAB -j oe
#MOAB -m bea
#MOAB -M uid@miami.edu
#MOAB -V
#MOAB -q gsmall

#! Full path to executable + executable name
executable="<executable>"

#! Run options for the application
options="<options>"

#! Work directory
workdir="<work dir>"

###############################################################
### You should not have to change anything below this line ####
###############################################################

#! change the working directory (default is home directory)

cd $workdir

echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`
echo MOAB job ID is $PBS_JOBID
echo This job runs on the following machines:
echo `cat $PBS_NODEFILE | uniq`

#! Create a machine file for MPI
cat $PBS_NODEFILE | uniq > machine.file.$PBS_JOBID

numnodes=`wc $PBS_NODEFILE | awk '{ print $1 }'`

#! Run the parallel MPI executable (nodes*cores/node)

echo "Running $executable -procs $numnodes -hostfile machine.file.$PBS_JOBID $options"
mpirun $executable  -procs $numnodes -hostfile machine.file.$PBS_JOBID $options

Here is a line-by-line breakdown of the keywords and their assigned values listed in this script:

#!/bin/bash

Specifies the shell to be used when executing the command portion of the script. The default is the Korn shell.

#MOAB -l nodes=8:ppn=2

Specifies a resource requirement of 8 compute nodes and 2 cores per node.

#MOAB -l walltime=2:00:00

Specifies a resource requirement of 2 hours of wall clock time to run the job.

#MOAB -l mem=1GB

Specifies a resource requirement of at least 1 GB of memory per task to run the job.

#MOAB -o output_filename

Specifies the name of the file where job output is to be saved. If this directive is omitted, a default filename with the job id appended is generated.

#MOAB -j oe

Specifies that job output and error messages are to be joined in one file.

#MOAB -m bea

Specifies that MOAB send email notification when the job begins (b), ends (e), or aborts (a).

#MOAB -M uid@miami.edu

Specifies the email address where MOAB notification is to be sent.

#MOAB -V

Specifies that all environment variables are to be exported to the batch job.

#MOAB -q gsmall

The -q directive specifies the queue for job submission; here the job is submitted to the gsmall queue.

At the end of the MOAB portion of the script, a machine file is built in the work directory from the file named by the $PBS_NODEFILE variable. This variable is supplied by the queueing system and points to a file listing the nodes that are reserved for the particular job. The machine file is then passed as an argument to the mpirun command to launch the job.
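
To see what the machine file and the process count look like, the sketch below runs the same steps on a stand-in node file. It assumes the node file holds one line per allocated core, so a request of nodes=2:ppn=2 yields four lines; the node names are made up. It also uses sort -u in place of the script's uniq, a slightly more defensive variant because uniq only collapses adjacent duplicates.

```shell
#!/bin/bash
# Stand-in for $PBS_NODEFILE with nodes=2:ppn=2. The node names are made up,
# and the one-line-per-core layout is an assumption about the node file.
nodefile=$(mktemp)
printf '%s\n' node01 node01 node02 node02 > "$nodefile"

# One line per distinct node: this is what goes into the machine file.
machinefile=$(mktemp)
sort -u "$nodefile" > "$machinefile"

# Total number of MPI processes = number of lines in the node file.
numprocs=$(wc -l < "$nodefile")
numprocs=$((numprocs))       # strip any padding some wc versions add
numnodes=$(wc -l < "$machinefile")
numnodes=$((numnodes))

echo "nodes=$numnodes procs=$numprocs"
rm -f "$nodefile" "$machinefile"
```

For the stand-in file this prints "nodes=2 procs=4".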

An example script for an OpenMP Job

#!/bin/bash
#MOAB -l nodes=1:ppn=8
#MOAB -l walltime=2:00:00
#MOAB -l mem=1GB
#MOAB -o output_filename
#MOAB -j oe
#MOAB -m bea
#MOAB -M uid@miami.edu
#MOAB -V
#MOAB -q gsmall

cd ${HOME}/sample_OpenMP_prog
export OMP_NUM_THREADS=8
sample_OpenMP_prog a b c

Here is a line-by-line breakdown of the keywords and their assigned values listed in this script:

#!/bin/bash

Specifies the shell to be used when executing the command portion of the script. The default is the Korn shell.

#MOAB -l nodes=1:ppn=8

Specifies a resource requirement of 1 compute node and 8 cores on that node.

#MOAB -l walltime=2:00:00

Specifies a resource requirement of 2 hours of wall clock time to run the job.

#MOAB -l mem=1GB

Specifies a resource requirement of at least 1 GB of memory per task to run the job.

#MOAB -o output_filename

Specifies the name of the file where job output is to be saved. If this directive is omitted, a default filename with the job id appended is generated.

#MOAB -j oe

Specifies that job output and error messages are to be joined in one file.

#MOAB -m bea

Specifies that MOAB send email notification when the job begins (b), ends (e), or aborts (a).

#MOAB -M uid@miami.edu

Specifies the email address where MOAB notification is to be sent.

#MOAB -V

Specifies that all environment variables are to be exported to the batch job.

#MOAB -q gsmall

The -q directive specifies the queue for job submission; here the job is submitted to the gsmall queue.

At the end of the MOAB portion of the script, the environment variable OMP_NUM_THREADS is set to 8, so the job runs with 8 threads, matching the 8 cores requested with ppn=8.
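
Hard-coding the thread count means it must be edited whenever ppn changes. As a sketch of one alternative, the count can be derived from the node file. This assumes a single-node job whose node file holds one line per allocated core; the helper name threads_from_nodefile is hypothetical, and it falls back to 1 thread when no node file is available (for example, when run outside a batch job).

```shell
#!/bin/bash
# Hypothetical helper: count the cores listed in a node file, falling back
# to 1 when the file is missing (for example, outside a batch job).
threads_from_nodefile() {
    if [ -n "$1" ] && [ -r "$1" ]; then
        echo $(( $(wc -l < "$1") ))
    else
        echo 1
    fi
}

export OMP_NUM_THREADS=$(threads_from_nodefile "${PBS_NODEFILE:-}")
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```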