bsub

Submits a job to LSF

Synopsis

bsub [options] command [arguments]
bsub -pack job_submission_file
bsub [-h | -V]

Option list

-ar
-B
-H
-I | -Ip | -Is [-tty]
-IS | -ISp | -ISs | -IX [-tty]
-K
-N
-r | -rn
-ul
-x
-a "esub_application ..."
-app application_profile_name
-b [[year:][month:]day:]hour:minute
-C core_limit
-c [hour:]minute[/host_name | /host_model]
-clusters "all [~cluster_name] ... | cluster_name[+[pref_level]] ... [others[+[pref_level]]]"
-cwd "current_working_directory"
-D data_limit
-E "pre_exec_command [arguments ...]"
-Ep "post_exec_command [arguments ...]"
-e error_file
-eo error_file
-ext[sched] "external_scheduler_options"
-F file_limit
-f "local_file operator [remote_file]" ...
-G user_group
-g job_group_name
-i input_file | -is input_file
-J job_name | -J "job_name[index_list]%job_slot_limit"
-Jd "job_description"
-jsdl file_name | -jsdl_strict file_name
-k "checkpoint_dir [init=initial_checkpoint_period]
[checkpoint_period] [method=method_name]"
-L login_shell
-Lp ls_project_name
-M mem_limit
-m "host_name[@cluster_name][[!] | +[pref_level]] | host_group[[!] | +[pref_level]] | compute_unit[[!] | +[pref_level]] ..."
-mig migration_threshold
-n min_proc[,max_proc]
-network "network_res_req"
-o output_file
-oo output_file
-outdir output_directory
-P project_name
-p process_limit
-pack job_submission_file
-Q "[exit_code ...] [EXCLUDE(exit_code ...)]"
-q "queue_name ..."
-R "res_req" [-R "res_req" ...]
-rnc resize_notification_cmd
-S stack_limit
-s signal
-sla service_class_name
-sp priority
-T thread_limit
-t [[[year:]month:]day:]hour:minute
-U reservation_ID
-u mail_user
-v swap_limit
-W [hour:]minute[/host_name | /host_model]
-We [hour:]minute[/host_name | /host_model]
-w 'dependency_expression'
-wa 'signal'
-wt '[hour:]minute'
-XF
-Zs
-h
-V

Description

Submits a job for execution and assigns it a unique numerical job ID.

Runs the job on a host that satisfies all requirements of the job, when all conditions on the job, host, queue, application profile, and cluster are satisfied. If LSF cannot run all jobs immediately, LSF scheduling policies determine the order of dispatch. Jobs are started and suspended according to the current system load.

Sets the user’s execution environment for the job, including the current working directory, file creation mask, and all environment variables, and sets LSF environment variables before starting the job.

When a job is run, the command line and stdout/stderr buffers are stored in the directory home_directory/.lsbatch on the execution host. If this directory is not accessible, /tmp/.lsbtmpuser_ID is used as the job’s home directory. If the current working directory is under the home directory on the submission host, then the current working directory is also set to be the same relative directory under the home directory on the execution host.

By default, if the current working directory is not accessible on the execution host, LSF searches for a working directory in which to run the job, in the following order:

  1. $HOME on this host

  2. $PWD

  3. Strip /tmp_mnt if it exists in the path

  4. Replace the first component with a key in /etc/auto.master and try each key

  5. Replace the first two components with a key in /etc/auto.master and try each key

  6. Strip the first level of the path and try the rest (for example, if the current working directory is /abc/x/y/z, try to change directory to the path /x/y/z)

  7. /tmp

If the environment variable LSB_EXIT_IF_CWD_NOTEXIST is set to Y and the current working directory is not accessible on the execution host, the job exits with the exit code 2.

If no command is supplied, bsub prompts for the command from the standard input. On UNIX, the input is terminated by entering CTRL-D on a new line. On Windows, the input is terminated by entering CTRL-Z on a new line.

To kill a job submitted with bsub, use bkill.

Use bmod to modify jobs submitted with bsub. bmod takes similar options to bsub.

Jobs submitted to a chunk job queue with the following options are not chunked; they are dispatched individually:

  • -I (interactive jobs)

  • -c (jobs with a CPU limit greater than 30 minutes)

  • -W (jobs with run limit greater than 30 minutes)

To submit jobs from UNIX to display GUIs through Microsoft Terminal Services on Windows, submit the job with bsub and define the environment variables LSF_LOGON_DESKTOP=1 and LSB_TSJOB=1 on the UNIX host. Use tssub to submit a Terminal Services job from Windows hosts. See Using LSF on Windows for more details.

If the parameter LSB_STDOUT_DIRECT in lsf.conf is set to Y or y, and you use the -o or -oo option, the standard output of a job is written to the file you specify as the job runs. If LSB_STDOUT_DIRECT is not set, and you use -o or -oo, the standard output of a job is written to a temporary file and copied to the specified file after the job finishes. LSB_STDOUT_DIRECT is not supported on Windows.

Default behavior

LSF assumes that uniform user names and user ID spaces exist among all the hosts in the cluster. That is, a job submitted by a given user runs under the same user’s account on the execution host. For situations where nonuniform user names and user ID spaces exist, account mapping must be used to determine the account used to run a job.

bsub uses the command name as the job name. Quotation marks are significant.

Options related to file names and job spooling directories support paths that contain up to 4094 characters for UNIX and Linux, or up to 255 characters for Windows.

Options related to command names and job names can contain up to 4094 characters for UNIX and Linux, or up to 255 characters for Windows.

Options for the following resource usage limits are specified in KB:
  • Core limit (-C)

  • Memory limit (-M)

  • Stack limit (-S)

  • Swap limit (-v)

Use LSF_UNIT_FOR_LIMITS in lsf.conf to specify a larger unit for the limit (MB, GB, TB, PB, or EB).
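For example, to interpret these limits in megabytes cluster-wide, a single line in lsf.conf suffices (a configuration sketch):

```shell
# lsf.conf: interpret -C, -M, -S, and -v limit values in MB instead of KB
LSF_UNIT_FOR_LIMITS=MB
```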

If fairshare is defined and you belong to multiple user groups, the job is scheduled under the user group that allows the quickest dispatch.

The job is not checkpointable.

bsub automatically selects an appropriate queue. If you defined a default queue list by setting the LSB_DEFAULTQUEUE environment variable, the queue is selected from your list. If LSB_DEFAULTQUEUE is not defined, the queue is selected from the system default queue list specified by the LSF administrator with the DEFAULT_QUEUE parameter in lsb.params.

LSF tries to obtain resource requirement information for the job from the remote task list that is maintained by the load sharing library. If the job is not listed in the remote task list, the default resource requirement is to run the job on a host or hosts that are of the same host type as the submission host.

bsub assumes only one processor is requested.

bsub does not start a login shell but runs the job file under the execution environment from which the job was submitted.

The input file for the job is /dev/null (no input).

bsub sends mail to you when the job is done. The default destination is defined by LSB_MAILTO in lsf.conf. The mail message includes the job report, the job output (if any), and the error message (if any).

bsub charges the job to the default project. The default project is the project you define by setting the environment variable LSB_DEFAULTPROJECT. If you do not set LSB_DEFAULTPROJECT, the default project is the project specified by the LSF administrator with the DEFAULT_PROJECT parameter in lsb.params. If DEFAULT_PROJECT is not defined, then LSF uses default as the default project name.

Options

-ar

Specifies that the job is autoresizable.

-B

Sends mail to you when the job is dispatched and begins execution.

-H

Holds the job in the PSUSP state when the job is submitted. The job is not scheduled until you tell the system to resume the job (see bresume(1)).
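For example (the job name and ID shown are illustrative):

```shell
# Submit the job in the held (PSUSP) state
bsub -H ./myjob
# Later, release it for scheduling by its job ID (for example, 1234)
bresume 1234
```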

-I | -Ip | -Is [-tty]

Submits an interactive job. A new job cannot be submitted until the interactive job is completed or terminated.

Sends the job’s standard output (or standard error) to the terminal. Does not send mail to you when the job is done unless you specify the -N option.

Terminal support is available for an interactive job.

When you specify the -Ip option, submits an interactive job and creates a pseudo-terminal when the job starts. Some applications (for example, vi) require a pseudo-terminal in order to run correctly.

When you specify the -Is option, submits an interactive job and creates a pseudo-terminal with shell mode support when the job starts. This option should be specified for submitting interactive shells, or applications which redefine the CTRL-C and CTRL-Z keys (for example, jove).

If the -i input_file option is specified, you cannot interact with the job’s standard input via the terminal.

If the -o out_file option is specified, sends the job’s standard output to the specified output file. If the -e err_file option is specified, sends the job’s standard error to the specified error file.

If used with -tty, also displays output/error (except pre-exec output/error) on the screen.

You cannot use -I, -Ip, or -Is with the -K option.

Interactive jobs cannot be checkpointed.

Interactive jobs cannot be rerunnable (bsub -r).

The options that create a pseudo-terminal (-Ip and -Is) are not supported on Windows.
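For example (illustrative commands):

```shell
# Interactive remote shell with pseudo-terminal and shell mode support
bsub -Is /bin/bash
# Full-screen editor; vi needs a pseudo-terminal to run correctly
bsub -Ip vi myfile.txt
```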

-IS | -ISp | -ISs | -IX [-tty]

Submits an interactive job under a secure shell (ssh). A new job cannot be submitted until the interactive job is completed or terminated.

Sends the job’s standard output (or standard error) to the terminal. Does not send mail to you when the job is done unless you specify the -N option.

Terminal support is available for an interactive job.

When you specify the -ISp option, submits an interactive job and creates a pseudo-terminal when the job starts. Some applications (for example, vi) require a pseudo-terminal to run correctly.

When you specify the -ISs option, submits an interactive job and creates a pseudo-terminal with shell mode support when the job starts. This option should be specified for submitting interactive shells, or applications that redefine the CTRL-C and CTRL-Z keys (for example, jove).

When you specify the -IX option, submits an interactive X-window job. The session between X-client and X-server is encrypted; the session between the execution host and submission host is also encrypted. The following must be satisfied:

  • openssh must be installed and sshd must be running on the X-server

  • xhost + localhost or xhost + displayhost.domain.com on the X-server

  • ssh must be configured to run without a password or passphrase ($HOME/.ssh/authorized_keys must be set up)

    Note: In most cases ssh can be configured to run without a password by copying id_rsa.pub to authorized_keys with permission 600 (-rw-------). Test by manually running ssh host.domain.com between the two hosts in both directions, using fully qualified host names, and confirm that there are no prompts.

If the -i input_file option is specified, you cannot interact with the job’s standard input via the terminal.

If the -o out_file option is specified, sends the job’s standard output to the specified output file. If the -e err_file option is specified, sends the job’s standard error to the specified error file.

If used with -tty, also displays output/error on the screen.

You cannot use -IS, -ISp, -ISs, or -IX with the -K option.

Interactive jobs cannot be checkpointed.

Interactive jobs cannot be rerunnable (bsub -r).

The options that create a pseudo-terminal (-ISp and -ISs) are not supported on Windows.

-K

Submits a job and waits for the job to complete. Sends the message "Waiting for dispatch" to the terminal when you submit the job. Sends the message "Job is finished" to the terminal when the job is done. If LSB_SUBK_SHOW_EXEC_HOST is enabled in lsf.conf, also sends the message "Starting on execution_host" when the job starts running on the execution host.

You are not able to submit another job until the job is completed. This is useful in job scripts, where completion of one job may be required before the next step can proceed. If the job needs to be rerun due to transient failures, bsub returns after the job finishes successfully. bsub exits with the same exit code as the job so that job scripts can take appropriate actions based on the exit codes. bsub exits with value 126 if the job was terminated while pending.
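For example, a job script can chain steps through -K (the step commands are illustrative):

```shell
#!/bin/sh
# Each bsub -K call blocks until its job completes and exits with the
# job's own exit code, so the script can gate later steps on earlier ones.
bsub -K ./prepare_data || exit $?   # abort if the first job fails
bsub -K ./analyze_data              # runs only after prepare_data succeeds
```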

You cannot use the -K option with the -I, -Ip, or -Is options.

-N

Sends the job report to you by mail when the job finishes. When used without any other options, behaves the same as the default.

Use only with -o, -oo, -I, -Ip, and -Is options, which do not send mail, to force LSF to send you a mail message when the job is done.

-r | -rn

Reruns a job if the execution host or the system fails; it does not rerun a job if the job itself fails.

  • If the execution host becomes unavailable while a job is running, specifies that the job be rerun on another host. LSF requeues the job in the same job queue with the same job ID. When an available execution host is found, reruns the job as if it were submitted new, even if the job has been checkpointed. You receive a mail message informing you of the host failure and requeuing of the job.

  • If the system goes down while a job is running, specifies that the job is requeued when the system restarts.

-rn specifies that the job is never rerunnable. bsub –rn disables job rerun if the job was submitted to a rerunnable queue or application profile with job rerun configured. The command level job rerunnable setting overrides the application profile and queue level setting. bsub –rn is different from bmod -rn, which cannot override the application profile and queue level rerunnable job setting.

Members of a chunk job can be rerunnable. If the execution host becomes unavailable, rerunnable chunk job members are removed from the queue and dispatched to a different execution host.

Interactive jobs (bsub -I) cannot be rerunnable.

-ul

Passes the current operating system user shell limits for the job submission user to the execution host. User limits cannot override queue hard limits. If user limits exceed queue hard limits, the job is rejected.

Restriction: UNIX and Linux only. -ul is not supported on Windows.
The following bsub options for job-level runtime limits override the value of the user shell limits:
  • Per-process (soft) core file size limit (-C)

  • CPU limit (-c)

  • Per-process (soft) data segment size limit (-D)

  • File limit (-F)

  • Per-process (soft) memory limit (-M)

  • Process limit (-p)

  • Per-process (soft) stack segment size limit (-S)

  • Limit of the number of concurrent threads (-T)

  • Total process virtual memory (swap space) limit (-v)

  • Runtime limit (-W)

LSF collects the user limit settings that the operating system supports from the user's running environment, and applies each value to the corresponding submission option if the value is not unlimited. If the operating system has other kinds of shell limits, LSF does not collect them. LSF collects the following operating system user limits:
  • CPU time in milliseconds

  • Maximum file size

  • Data size

  • Stack size

  • Core file size

  • Resident set size

  • Open files

  • Virtual (swap) memory

  • Process limit

  • Thread limit

-x

Puts the host running your job into exclusive execution mode.

In exclusive execution mode, your job runs by itself on a host. It is dispatched only to a host with no other jobs running, and LSF does not send any other jobs to the host until the job completes.

To submit a job in exclusive execution mode, the queue must be configured to allow exclusive jobs.

When the job is dispatched, bhosts(1) reports the host status as closed_Excl, and lsload(1) reports the host status as lockU.

Until your job is complete, the host is not selected by LIM in response to placement requests made by lsplace(1), lsrun(1) or lsgrun(1) or any other load sharing applications.

You can force other jobs to run on the host by using the -m host_name option of brun(1) to explicitly specify the locked host.

You can force LIM to run other interactive jobs on the host by using the -m host_name option of lsrun(1) or lsgrun(1) to explicitly specify the locked host.

-a "esub_application ..."

Specifies one or more application-specific esub executables that you want LSF to associate with the job.

The value of -a must correspond to the application name of an actual esub file. For example, to use bsub -a fluent, the file esub.fluent must exist in LSF_SERVERDIR.

For example, to submit a job that invokes two application-specific esub executables named esub.license and esub.fluent, enter:
bsub -a "license fluent" my_job

mesub uses the method name license to invoke the esub named LSF_SERVERDIR/esub.license, and the method name fluent to invoke the esub named LSF_SERVERDIR/esub.fluent.

The name of an application-specific esub program is passed to the master esub. The master esub program (LSF_SERVERDIR/mesub) handles job submission requirements of the application. Application-specific esub programs can specify their own job submission requirements. The value of -a is set in the LSB_SUB_ADDITIONAL option in the LSB_SUB_PARM file used by esub.

If an LSF administrator specifies one or more mandatory esub executables using the parameter LSB_ESUB_METHOD, LSF invokes the mandatory executables first, followed by the executable named esub (without .esub_application in the file name) if it exists in LSF_SERVERDIR, and then any application-specific esub executables (with .esub_application in the file name) specified by -a.

The name of the esub program must be a valid file name. It can contain only alphanumeric characters, underscore (_) and hyphen (-).

Restriction: After LSF version 5.1, the value of -a and LSB_ESUB_METHOD must correspond to an actual esub file in LSF_SERVERDIR. For example, to use bsub -a fluent, the file esub.fluent must exist in LSF_SERVERDIR.

If you have an esub that runs an interactive or X-window job and you have SSH enabled in lsf.conf, the communication between hosts is encrypted.

-app application_profile_name

Submits the job to the specified application profile. You must specify an existing application profile. If the application profile does not exist in lsb.applications, the job is rejected.

-b [[year:][month:]day:]hour:minute

Dispatches the job for execution on or after the specified date and time. The date and time are in the form of [[year:][month:]day:]hour:minute where the number ranges are as follows: year after 1970, month 1-12, day 1-31, hour 0-23, minute 0-59.

At least two fields must be specified. These fields are assumed to be hour:minute. If three fields are given, they are assumed to be day:hour:minute, four fields are assumed to be month:day:hour:minute, and five fields are assumed to be year:month:day:hour:minute.

If the year field is specified and the specified time is in the past, the start time condition is considered reached and LSF dispatches the job if slots are available.
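For example (job names are illustrative):

```shell
# Dispatch no earlier than 8:00 p.m. today (hour:minute)
bsub -b 20:00 ./myjob
# Dispatch no earlier than December 31, 11:45 p.m. (month:day:hour:minute)
bsub -b 12:31:23:45 ./myjob
```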

-C core_limit

Sets a per-process (soft) core file size limit for all the processes that belong to this job (see getrlimit(2)).

By default, the limit is specified in KB. Use LSF_UNIT_FOR_LIMITS in lsf.conf to specify a larger unit for the limit (MB, GB, TB, PB, or EB).

The behavior of this option is platform dependent on UNIX and Linux systems.

In some cases, the process is sent a SIGXFSZ signal if the job attempts to create a core file larger than the specified limit. The SIGXFSZ signal normally terminates the process.

In other cases, the writing of the core file terminates at the specified limit.

-c [hour:]minute[/host_name | /host_model]

Limits the total CPU time the job can use. This option is useful for preventing runaway jobs or jobs that use up too many resources. When the total CPU time for the whole job has reached the limit, a SIGXCPU signal is first sent to the job, then SIGINT, SIGTERM, and SIGKILL.

If LSB_JOB_CPULIMIT in lsf.conf is set to n, LSF-enforced CPU limit is disabled and LSF passes the limit to the operating system. When one process in the job exceeds the CPU limit, the limit is enforced by the operating system.

The CPU limit is in the form of [hour:]minute. The minutes can be specified as a number greater than 59. For example, three and a half hours can either be specified as 3:30, or 210.

The CPU time you specify is the normalized CPU time. This is done so that the job does approximately the same amount of processing for a given CPU limit, even if it is sent to a host with a faster or slower CPU. Whenever a normalized CPU time is given, the actual time allowed on the execution host is the specified time multiplied by the CPU factor of the normalization host and then divided by the CPU factor of the execution host.

Optionally, you can supply a host name or a host model name defined in LSF. You must insert a slash (/) between the CPU limit and the host name or model name. If a host name or model name is not given, LSF uses the default CPU time normalization host defined at the queue level (DEFAULT_HOST_SPEC in lsb.queues) if it has been configured, otherwise uses the default CPU time normalization host defined at the cluster level (DEFAULT_HOST_SPEC in lsb.params) if it has been configured, otherwise uses the submission host.
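The normalization arithmetic can be sketched as follows (the CPU factors are assumed integer values for illustration; real factors are configured per host model):

```shell
limit=60        # normalized CPU limit in minutes, e.g. from: bsub -c 60 ./myjob
norm_factor=2   # CPU factor of the normalization host (assumed)
exec_factor=4   # CPU factor of the execution host (assumed)
# actual limit = normalized limit * normalization factor / execution factor
actual=$(( limit * norm_factor / exec_factor ))
echo "$actual"  # 30: a host twice as fast gets half the wall-clock CPU time
```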

Jobs submitted to a chunk job queue are not chunked if the CPU limit is greater than 30 minutes.

-clusters "all [~cluster_name] ..."
-clusters "cluster_name[+[pref_level]] ... [others[+[pref_level]]]"

You can specify cluster names when submitting jobs for LSF MultiCluster.

The -clusters option has four keywords:

  • all: Specifies both local cluster and all remote clusters in the SNDJOBS_TO parameter of the target queue in lsb.queues. For example:

    bsub -clusters all -q <send queue>

    LSF checks the SNDJOBS_TO parameter in lsb.queues to verify that the requested clusters (other than the local cluster) are members of SNDJOBS_TO. If any requested cluster except the local cluster does not exist in SNDJOBS_TO, the job is rejected with an error message.

  • others: Sends the job to all clusters except for the clusters you specify. For example:

    bsub -clusters "c1+3 c2+1 others+2"

  • ~: Must be used with all to indicate the rest of the clusters, excluding the specified clusters.

  • +: When followed by a positive integer, specifies job level preference for requested clusters. For example:

    bsub -clusters "c1+2 c2+1"

If the local cluster name is local_c1, and SNDJOBS_TO=q1@rmt_c1 q2@rmt_c2 q3@rmt_c3, then the requested clusters are local_c1 and rmt_c3. For example:

bsub -clusters "all ~rmt_c1 ~rmt_c2"

-clusters local_cluster restricts the job for dispatch to local hosts. To run a job on remote clusters only, use:

bsub -clusters "all ~local_cluster"

A job that only specifies remote clusters will not run on local hosts. Similarly, a job that only specifies local clusters will not run on remote hosts. If a job specifies local and remote clusters, the job tries local hosts first, then remote clusters.

If there are multiple default queues, then when bsub -clusters remote_clusters is issued, the job is sent to the queue whose SNDJOBS_TO contains the requested clusters. For example:

bsub -clusters "c2"

where DEFAULT_QUEUE=q1 q2, q1 has SNDJOBS_TO=recvQ1@c1 recvQ2@c3, and q2 has SNDJOBS_TO=recvQ1@c1 recvQ2@c2. The job is sent to q2.

To have the job try to run on only local hosts:

bsub -q mc -clusters local_c1

To have the job try to run on only remote hosts (e.g., rmt_c1 and rmt_c2):

bsub -q mc -clusters "rmt_c1 rmt_c2"

To have the job first try local hosts, then rmt_c1 and rmt_c2:

bsub -q mc -clusters "local_c1 rmt_c1 rmt_c2"

To ignore preference for the local cluster (because the local cluster is always tried first, even though remote clusters have higher job level preference) and try remote clusters:

bsub -q mc -clusters "local_c1 rmt_c1+2 rmt_c2+1"

To have the job first try the local cluster, then remote clusters:

bsub -q mc -clusters all

To have the job first try the local cluster, then try all remote clusters except for rmt_c1:

bsub -q mc -clusters "all ~rmt_c1"

To have the job try only remote clusters:

bsub -q mc -clusters "all ~local_c1"

The -clusters option is supported in esub, so LSF administrators can direct jobs to specific clusters if they want to implement flow control. The -m option and -clusters option cannot be used together.

-cwd "current_working_directory"

Specifies the current working directory for job execution. If the path includes dynamic patterns, the system creates the CWD; this applies to both absolute and relative paths. LSF cleans up the created CWD based on the time-to-live value set in the JOB_CWD_TTL parameter of the application profile or in lsb.params.

The path can include the following dynamic patterns:

  • %J - job ID

  • %JG - job group (if not specified, it will be ignored)

  • %I - index (default value is 0)

  • %EJ - execution job ID

  • %EI - execution index

  • %P - project name

  • %U - user name

  • %G - user group

For example, the following command creates /scratch/jobcwd/user1/<jobid>_0/ for the job CWD:

bsub -cwd "/scratch/jobcwd/%U/%J_%I" myjob

The system creates submission_dir/user1/<jobid>_0/ for the job's CWD with the following command:

bsub -cwd "%U/%J_%I" myprog

If the cluster wide CWD was defined and there is no default application profile CWD defined:

DEFAULT_JOB_CWD=/scratch/jobcwd/%U/%J_%I

then the system creates: /scratch/jobcwd/user1/<jobid>_0/ for the job's CWD.

If the job is submitted with -app but without -cwd, and LSB_JOB_CWD is not defined, then the application profile defined JOB_CWD will be used. If JOB_CWD is not defined in the application profile, then the DEFAULT_JOB_CWD value is used.

In forwarding mode, if a job is not submitted with the -cwd option and LSB_JOB_CWD is not defined, then JOB_CWD in the application profile or the DEFAULT_JOB_CWD value for the execution cluster is used.

LSF does not allow environment variables that contain other environment variables to be expanded on the execution side.

By default, if the current working directory is not accessible on the execution host, the job runs in /tmp (on UNIX) or c:\LSFversion_num\tmp (on Windows). If the environment variable LSB_EXIT_IF_CWD_NOTEXIST is set to Y and the current working directory is not accessible on the execution host, the job exits with the exit code 2.

-D data_limit

Sets a per-process (soft) data segment size limit for each of the processes that belong to the job (see getrlimit(2)). The limit is specified in KB.

This option affects calls to sbrk() and brk(). An sbrk() or malloc() call to extend the data segment beyond the data limit returns an error.

Note: Linux does not use sbrk() and brk() within its calloc() and malloc() implementations. Instead, it uses mmap() to create memory. DATALIMIT therefore cannot be enforced on Linux applications that allocate memory through calloc() and malloc().
-E "pre_exec_command [arguments ...]"

Runs the specified pre-execution command on the execution host before actually running the job. For a parallel job, the pre-execution command runs on the first host selected for the parallel job. If you want the pre-execution command to run on a specific first execution host, specify one or more first execution host candidates at the job level using -m, at the queue level with PRE_EXEC in lsb.queues, or at the application level with PRE_EXEC in lsb.applications.

If the pre-execution command returns a zero (0) exit code, LSF runs the job on the selected host. Otherwise, the job and its associated pre-execution command go back to PEND status and are rescheduled. LSF keeps trying to run pre-execution commands and pending jobs. After the pre-execution command runs successfully, LSF runs the job. You must ensure that the pre-execution command can run multiple times without causing side effects, such as reserving the same resource more than once.

The standard input and output for the pre-execution command are directed to the same files as the job. The pre-execution command runs under the same user ID, environment, home, and working directory as the job. If the pre-execution command is not in the user’s usual execution path (the $PATH variable), the full path name of the command must be specified.

Note: The command path can contain up to 4094 characters for UNIX and Linux, or up to 255 characters for Windows, including the directory and file name.
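For example, a pre-execution check that is safe to run repeatedly (the path is illustrative):

```shell
# Dispatch the job only where the scratch dataset is visible; test -d
# is idempotent, so rerunning the pre-exec has no side effects
bsub -E "test -d /scratch/dataset" ./myjob
```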
-Ep "post_exec_command [arguments ...]"

Runs the specified post-execution command on the execution host after the job finishes.

If both application-level (POST_EXEC in lsb.applications) and job-level post-execution commands are specified, job level post-execution overrides application-level post-execution commands. Queue-level post-execution commands (POST_EXEC in lsb.queues) run after application-level post-execution and job-level post-execution commands.

Note: The command path can contain up to 4094 characters for UNIX and Linux, or up to 255 characters for Windows, including the directory and file name.
-e error_file

Specify a file path. Appends the standard error output of the job to the specified file.

If the parameter LSB_STDOUT_DIRECT in lsf.conf is set to Y or y, the standard error output of a job is written to the file you specify as the job runs. If LSB_STDOUT_DIRECT is not set, it is written to a temporary file and copied to the specified file after the job finishes. LSB_STDOUT_DIRECT is not supported on Windows.

If you use the special character %J in the name of the error file, then %J is replaced by the job ID of the job. If you use the special character %I in the name of the error file, then %I is replaced by the index of the job in the array if the job is a member of an array. Otherwise, %I is replaced by 0 (zero).

If the current working directory is not accessible on the execution host after the job starts, LSF writes the standard error output file to /tmp/.

If the specified error_file path is not accessible, the output will not be stored.

Note: The file path can contain up to 4094 characters for UNIX and Linux, or up to 255 characters for Windows, including the directory, file name, and expanded values for %J (job_ID) and %I (index_ID).
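For example, combining %J and %I with a job array (names are illustrative):

```shell
# One error file per array element: err.<job_ID>.<index>
bsub -J "sim[1-10]" -e err.%J.%I ./myjob
```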
-eo error_file

Specify a file path. Overwrites the standard error output of the job to the specified file.

If the parameter LSB_STDOUT_DIRECT in lsf.conf is set to Y or y, the standard error output of a job is written to the file you specify as the job runs, which occurs every time the job is submitted with the overwrite option, even if it is requeued manually or by the system. If LSB_STDOUT_DIRECT is not set, it is written to a temporary file and copied to the specified file after the job finishes. LSB_STDOUT_DIRECT is not supported on Windows.

If you use the special character %J in the name of the error file, then %J is replaced by the job ID of the job. If you use the special character %I in the name of the error file, then %I is replaced by the index of the job in the array if the job is a member of an array. Otherwise, %I is replaced by 0 (zero).

If the current working directory is not accessible on the execution host after the job starts, LSF writes the standard error output file to /tmp/.

If the specified error_file path is not accessible, the output will not be stored.

Note: The file path can contain up to 4094 characters for UNIX and Linux, or up to 255 characters for Windows, including the directory, file name, and expanded values for %J (job_ID) and %I (index_ID).
-ext[sched] "external_scheduler_options"

Application-specific external scheduling options for the job.

To enable jobs to accept external scheduler options, set LSF_ENABLE_EXTSCHEDULER=y in lsf.conf.

You can abbreviate the -extsched option to -ext.

You can specify only one type of external scheduler option in a single -extsched string.

For example, Linux hosts and AlphaServer SC hosts running RMS can exist in the same cluster, but they accept different external scheduler options. Use external scheduler options to define job requirements for either Linux or RMS, but not both. Your job runs either on Linux hosts or on RMS hosts. If external scheduler options are not defined, the job may run on a Linux host but does not run on an RMS host.

The options set by -extsched can be combined with the queue-level MANDATORY_EXTSCHED or DEFAULT_EXTSCHED parameters. If -extsched and MANDATORY_EXTSCHED set the same option, the MANDATORY_EXTSCHED setting is used. If -extsched and DEFAULT_EXTSCHED set the same options, the -extsched setting is used.

Use DEFAULT_EXTSCHED in lsb.queues to set default external scheduler options for a queue.

To make certain external scheduler options mandatory for all jobs submitted to a queue, specify MANDATORY_EXTSCHED in lsb.queues with the external scheduler options you need for your jobs.

-F file_limit

Sets a per-process (soft) file size limit for each of the processes that belong to the job (see getrlimit(2)). The limit is specified in KB.

If a job process attempts to write to a file that exceeds the file size limit, then that process is sent a SIGXFSZ signal. The SIGXFSZ signal normally terminates the process.
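For example, the following submission sets a 20000 KB per-process file size limit (the job name is a placeholder):

```shell
# Soft per-process file size limit of 20000 KB for every process in the
# job; a process writing past this limit receives SIGXFSZ.
bsub -F 20000 myjob
```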

-f "local_file operator [remote_file]" ...

Copies a file between the local (submission) host and the remote (execution) host. Specify absolute or relative paths, including the file names. You should specify the remote file as a file name with no path when running in non-shared systems.

If the remote file is not specified, it defaults to the local file, which must be given. Use multiple -f options to specify multiple files.

Note: The file path can contain up to 4094 characters for UNIX and Linux, or up to 255 characters for Windows, including the directory and file name.
operator

An operator that specifies whether the file is copied to the remote host, or whether it is copied back from the remote host. The operator must be surrounded by white space.

The following describes the operators:

> Copies the local file to the remote file before the job starts. Overwrites the remote file if it exists.

< Copies the remote file to the local file after the job completes. Overwrites the local file if it exists.

<< Appends the remote file to the local file after the job completes. The local file must exist.

>< Copies the local file to the remote file before the job starts. Overwrites the remote file if it exists. Then copies the remote file to the local file after the job completes. Overwrites the local file.

<> Copies the local file to the remote file before the job starts. Overwrites the remote file if it exists. Then copies the remote file to the local file after the job completes. Overwrites the local file.

All of the above involve copying the files to the output directory defined in DEFAULT_JOB_OUTDIR or with bsub -outdir instead of the submission directory, as long as the path is relative. The output directory is created at the start of the job, and also applies to jobs that are checkpointed, migrated, requeued, or rerun.

If you use the -i input_file option, then you do not have to use the -f option to copy the specified input file to the execution host. LSF does this for you, and removes the input file from the execution host after the job completes.

If you use the -o out_file, -e err_file, -oo out_file, or -eo err_file option, and you want the specified file to be copied back to the submission host when the job completes, then you must use the -f option.

If the submission and execution hosts have different directory structures, you must make sure that the directory where the remote file and local file are placed exists.

If the local and remote hosts have different file name spaces, you must always specify relative path names. If the local and remote hosts do not share the same file system, you must make sure that the directory containing the remote file exists. It is recommended that only the file name be given for the remote file when running in heterogeneous file systems. This places the file in the job’s current working directory. If the file is shared between the submission and execution hosts, then no file copy is performed.

LSF uses lsrcp to transfer files (see lsrcp(1) command). lsrcp contacts RES on the remote host to perform the file transfer. If RES is not available, rcp is used (see rcp(1)). The user must make sure that the rcp binary is in the user’s $PATH on the execution host.

Jobs that are submitted from LSF client hosts should specify the -f option only if rcp is allowed. Similarly, rcp must be allowed if account mapping is used.
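A sketch combining the operators above (file names and the job name are placeholders):

```shell
# "data.in > data.in" copies the local file to the execution host
# before the job starts; "data.out < data.out" copies the remote file
# back to the submission host after the job completes.
bsub -f "data.in > data.in" -f "data.out < data.out" myjob
```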

-G user_group

Only useful with fairshare scheduling.

Associates the job with the specified group. Specify any group that you belong to. You must be a direct member of the specified user group.

If ENFORCE_ONE_UG_LIMITS is enabled in lsb.params, using the -G option enforces any limits placed on the specified user group only even if the user or user group belongs to more than one group.

If ENFORCE_ONE_UG_LIMITS is disabled in lsb.params (default), using the -G option enforces the strictest limit that is set on any of the groups that the user or user group belongs to.

-g job_group_name
Submits jobs in the job group specified by job_group_name. The job group does not have to exist before submitting the job. For example:
bsub -g /risk_group/portfolio1/current myjob
Job <105> is submitted to default queue.

Submits myjob to the job group /risk_group/portfolio1/current.

If group /risk_group/portfolio1/current exists, job 105 is attached to the job group.

Job group names can be up to 512 characters long.

If group /risk_group/portfolio1/current does not exist, LSF checks its parent recursively, and if no groups in the hierarchy exist, all three job groups are created with the specified hierarchy and the job is attached to the group.

You can use -g with -sla. All jobs in a job group attached to a service class are scheduled as SLA jobs. It is not possible to have some jobs in a job group not part of the service class. Multiple job groups can be created under the same SLA. You can submit additional jobs to the job group without specifying the service class name again. You cannot use job groups with resource-based SLAs that have guarantee goals.

For example, the following attaches the job to the service class named opera, and the group /risk_group/portfolio1/current:
bsub -sla opera -g /risk_group/portfolio1/current myjob
To submit another job to the same job group, you can omit the SLA name:
bsub -g /risk_group/portfolio1/current myjob2
-i input_file | -is input_file

Gets the standard input for the job from specified file. Specify an absolute or relative path. The input file can be any type of file, though it is typically a shell script text file.

Unless you use -is, you can use the special characters %J and %I in the name of the input file. %J is replaced by the job ID. %I is replaced by the index of the job in the array, if the job is a member of an array, otherwise by 0 (zero). The special characters %J and %I are not valid with the -is option.

Note: The file path can contain up to 4094 characters for UNIX and Linux, or up to 255 characters for Windows, including the directory, file name, and expanded values for %J (job_ID) and %I (index_ID).

If the file exists on the execution host, LSF uses it. Otherwise, LSF attempts to copy the file from the submission host to the execution host. For the file copy to be successful, you must allow remote copy (rcp) access, or you must submit the job from a server host where RES is running. The file is copied from the submission host to a temporary file in the directory specified by the JOB_SPOOL_DIR parameter in lsb.params, or your $HOME/.lsbatch directory on the execution host. LSF removes this file when the job completes.

By default, the input file is spooled to LSB_SHAREDIR/cluster_name/lsf_indir. If the lsf_indir directory does not exist, LSF creates it before spooling the file. LSF removes the spooled file when the job completes. Use the -is option if you need to modify or remove the input file before the job completes. Removing or modifying the original input file does not affect the submitted job.

If JOB_SPOOL_DIR is specified, the -is option spools the input file to the specified directory and uses the spooled file as the input file for the job.

JOB_SPOOL_DIR can be any valid path up to 4094 characters for UNIX and Linux, or up to 255 characters for Windows.

JOB_SPOOL_DIR must be readable and writable by the job submission user, and it must be shared by the master host and the submission host. If the specified directory is not accessible or does not exist, bsub -is cannot write to the default directory LSB_SHAREDIR/cluster_name/lsf_indir and the job fails.

-J job_name | -J "job_name[index_list]%job_slot_limit"

Assigns the specified name to the job, and, for job arrays, specifies the indices of the job array and optionally the maximum number of jobs that can run at any given time.

The job name does not need to be unique.

Job names can contain up to 4094 characters.

To specify a job array, enclose the index list in square brackets, as shown, and enclose the entire job array specification in quotation marks, as shown. The index list is a comma-separated list whose elements have the syntax [start-end[:step]] where start, end and step are positive integers. If the step is omitted, a step of one is assumed. By default, the job array index starts at one.

By default, the maximum number of jobs in a job array is 1000, which means the maximum size of a job array (that is, the maximum job array index) can never exceed 1000 jobs.

To change the maximum job array value, set MAX_JOB_ARRAY_SIZE in lsb.params to any positive integer between 1 and 2147483646. The maximum number of jobs in a job array cannot exceed the value set by MAX_JOB_ARRAY_SIZE.

You may also use a positive integer to specify the system-wide job slot limit (the maximum number of jobs that can run at any given time) for this job array.

All jobs in the array share the same job ID and parameters. Each element of the array is distinguished by its array index.

After a job is submitted, you use the job name to identify the job. Specify "job_ID[index]" to work with elements of a particular array. Specify "job_name[index]" to work with elements of all arrays with the same name. Since job names are not unique, multiple job arrays may have the same name with a different or same set of indices.
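For example, a hypothetical array submission, and a sketch of how its index list expands (seq stands in here for LSF's own expansion; the array and job names are placeholders):

```shell
# Submit a job array with indices 1, 4, 7, 10 and a job slot limit of 2:
#   bsub -J "myarray[1-10:3]%2" myjob
# The index list "1-10:3" (start-end:step) expands as:
indices=$(seq 1 3 10 | xargs)
echo "$indices"   # 1 4 7 10
```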

-Jd "job_description"

Assigns the specified description to the job; for job arrays, specifies the same job description for all elements in the job array.

The job description does not need to be unique.

Job descriptions can contain up to 4094 characters.

After a job is submitted, you can use bmod -Jd to change the job description for any specific job array element, if required.

-jsdl file_name | -jsdl_strict file_name

Submits a job using a JSDL file to specify job submission options.

LSF provides an extension to the JSDL specification so that you can submit jobs using LSF features not defined in the JSDL standard schema. The JSDL schema (jsdl.xsd), the POSIX extension (jsdl-posix.xsd), and the LSF extension (jsdl-lsf.xsd) are located in the LSF_LIBDIR directory.

  • To submit a job that uses the LSF extension, use the -jsdl option.

  • To submit a job that uses only standard JSDL elements and POSIX extensions, use the -jsdl_strict option. You can use the -jsdl_strict option to verify that your file contains only valid JSDL elements and POSIX extensions. Error messages indicate invalid elements, including:
    • Elements that are not part of the JSDL specification

    • Valid JSDL elements that are not supported in this version of LSF

    • Extension elements that are not part of the JSDL standard and POSIX extension schemas

Note: For more information about submitting jobs using JSDL, including a detailed mapping of JSDL elements to LSF submission options and a complete list of supported and unsupported elements, see Administering IBM Platform LSF.
If you specify duplicate or conflicting job submission parameters, LSF resolves the conflict by applying the following rules:
  1. The parameters specified in the command line override all other parameters.

  2. A job script or user input for an interactive job overrides parameters specified in the JSDL file.

-k "checkpoint_dir [init=initial_checkpoint_period] [checkpoint_period] [method=method_name]"

Makes a job checkpointable and specifies the checkpoint directory. Specify a relative or absolute path name. The quotation marks (") are required if you specify a checkpoint period, initial checkpoint period, or custom checkpoint and restart method name.

The job ID and job file name are concatenated to the checkpoint directory when creating a checkpoint file.

Note: The file path of the checkpoint directory can contain up to 4000 characters for UNIX and Linux, or up to 255 characters for Windows, including the directory and file name.

When a job is checkpointed, the checkpoint information is stored in checkpoint_dir/job_ID/file_name. Multiple jobs can checkpoint into the same directory. The system can create multiple files.

The checkpoint directory is used for restarting the job (see brestart(1)). The checkpoint directory can be any valid path.

Optionally, specifies a checkpoint period in minutes. Specify a positive integer. The running job is checkpointed automatically every checkpoint period. The checkpoint period can be changed using bchkpnt. Because checkpointing is a heavyweight operation, you should choose a checkpoint period greater than half an hour.

Optionally, specifies an initial checkpoint period in minutes. Specify a positive integer. The first checkpoint does not happen until the initial period has elapsed. After the first checkpoint, the job checkpoint frequency is controlled by the normal job checkpoint interval.

Optionally, specifies a custom checkpoint and restart method to use with the job. Use method=default to use the default LSF checkpoint and restart programs for the job, echkpnt.default and erestart.default.

The echkpnt.method_name and erestart.method_name programs must be in LSF_SERVERDIR or in the directory specified by LSB_ECHKPNT_METHOD_DIR (environment variable or set in lsf.conf).

If a custom checkpoint and restart method is already specified with LSB_ECHKPNT_METHOD (environment variable or in lsf.conf), the method you specify with bsub -k overrides this.

Process checkpointing is not available on all host types, and may require linking programs with special libraries (see libckpt.a(3)). LSF invokes echkpnt (see echkpnt(8)) found in LSF_SERVERDIR to checkpoint the job. You can override the default echkpnt for the job by defining LSB_ECHKPNT_METHOD and LSB_ECHKPNT_METHOD_DIR (as environment variables or in lsf.conf) to point to your own echkpnt. This allows you to use other checkpointing facilities, including application-level checkpointing.

The checkpoint method directory should be accessible by all users who need to run the custom echkpnt and erestart programs.

Only running members of a chunk job can be checkpointed.
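Putting the pieces together, a hedged example (the directory, periods, method name, and job name are all illustrative):

```shell
# Checkpoint into /share/ckpt every 60 minutes, with the first
# checkpoint after 120 minutes, using a custom method "myapp"
# (echkpnt.myapp and erestart.myapp must exist in LSF_SERVERDIR or
# LSB_ECHKPNT_METHOD_DIR).
bsub -k "/share/ckpt init=120 60 method=myapp" myjob
```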

-L login_shell

Initializes the execution environment using the specified login shell. The specified login shell must be an absolute path. This is not necessarily the shell under which the job is executed.

Login shell is not supported on Windows.

On UNIX and Linux, the file path of the login shell can contain up to 58 characters.

-Lp ls_project_name

Assigns the job to the specified License Scheduler project.

-M mem_limit

Sets a per-process (soft) memory limit for all the processes that belong to this job (see getrlimit(2)).

By default, the limit is specified in KB. Use LSF_UNIT_FOR_LIMITS in lsf.conf to specify a larger unit for the limit (MB, GB, TB, PB, or EB).

If LSB_MEMLIMIT_ENFORCE or LSB_JOB_MEMLIMIT are set to y in lsf.conf, LSF kills the job when it exceeds the memory limit. Otherwise, LSF passes the memory limit to the operating system. UNIX operating systems that support RUSAGE_RSS for setrlimit() can apply the memory limit to each process.

The following operating systems do not support the memory limit at the OS level:

- Windows

- Sun Solaris 2.x
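For example, assuming the default KB unit (the job name is a placeholder):

```shell
# Request a 500000 KB (about 488 MB) per-process memory limit.
bsub -M 500000 myjob
```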

-m "host_name[@cluster_name][[!] | +[pref_level]] | host_group[[!] |+[pref_level]] | compute_unit[[!] |+[pref_level]] ..."

Runs the job on one of the specified hosts, or host groups, or within the specified compute units.

By default, if multiple hosts are candidates, runs the job on the least-loaded host.

When a compute unit requirement is specified along with a host or host group preference, the host or host group preference only affects the host order within the compute unit. In addition, the job is rejected unless:

  • A host in the list belongs to a compute unit, and

  • A host in the first execution list belongs to a compute unit.

When used with a compound resource requirement, the first host allocated must satisfy the simple resource requirement string appearing first in the compound resource requirement.

To change the order of preference, put a plus (+) after the names of hosts or host groups that you would prefer to use, optionally followed by a preference level. For preference level, specify a positive integer, with higher numbers indicating greater preferences for those hosts. For example, -m "hostA groupB+2 hostC+1" indicates that groupB is the most preferred and hostA is the least preferred.

The keyword others can be specified with or without a preference level to refer to other hosts not otherwise listed. The keyword others must be specified with at least one host name or host group, it cannot be specified by itself. For example, -m "hostA+ others" means that hostA is preferred over all other hosts.

If you also use -q, the specified queue must be configured to include at least a partial list of the hosts in your host list. Otherwise, the job is not submitted. To find out what hosts are configured for the queue, use bqueues -l.

If the host group contains the keyword all, LSF dispatches the job to any available host, even if the host is not defined for the specified queue.

To display configured host groups and compute units, use bmgroup.

For the MultiCluster job forwarding model, you cannot specify a remote host by name.

For parallel jobs, specify first execution host candidates when you want to ensure that a host has the required resources or runtime environment to handle processes that run on the first execution host.

To specify one or more hosts or host groups as first execution host candidates, add the (!) symbol after the host name, as shown in the following example:

bsub -n 2 -m "host1 host2! hostgroupA! host3 host4" my_parallel_job

LSF runs my_parallel_job according to the following steps:
  1. LSF selects either host2 or a host defined in hostgroupA as the first execution host for the parallel job.
    Note: First execution host candidates specified at the job-level (command line) override candidates defined at the queue level (in lsb.queues).
  2. If any of the first execution host candidates have enough processors to run the job, the entire job runs on the first execution host, and not on any other hosts.

    In the example, if host2 or a member of hostgroupA has two or more processors, the entire job runs on the first execution host.

  3. If the first execution host does not have enough processors to run the entire job, LSF selects additional hosts that are not defined as first execution host candidates.

    Follow these guidelines when you specify first execution host candidates:
    • If you specify a host group, you must first define the host group in the file lsb.hosts.

    • Do not specify a dynamic host group as a first execution host.

    • Do not specify all, allremote, or others, or a host partition as a first execution host.

    • Do not specify a preference (+) for a host identified by (!) as a first execution host candidate.

    • For each parallel job, specify enough regular hosts to satisfy the processor requirement for the job. Once LSF selects a first execution host for the current job, the other first execution host candidates become unavailable to the current job, but remain available to other jobs as either regular or first execution hosts.

    • You cannot specify first execution host candidates when you use the brun command.

In a MultiCluster environment, insert the (!) symbol after the cluster name, as shown in the following example:

bsub -n 2 -m "host2@cluster2! host3@cluster2" my_parallel_job

When specifying compute units, the job runs within the listed compute units. Used in conjunction with a mandatory first execution host, the compute unit containing the first execution host is given preference.

In the following example, one host from host group hg appears first, followed by other hosts within the same compute unit. Remaining hosts from other compute units appear grouped by compute unit, in the same order as configured in the ComputeUnit section of lsb.hosts.

bsub -n 64 -m "hg! cu1 cu2 cu3 cu4" -R "cu[pref=config]" my_job

-mig migration_threshold

Enables automatic job migration for checkpointable or rerunnable jobs and specifies the migration threshold, in minutes. A value of 0 (zero) specifies that a suspended job is migrated immediately.

Command-level job migration threshold overrides application profile and queue-level settings.

Where a host migration threshold is also specified, and is lower than the job value, the host value is used.
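For example (the threshold and job name are illustrative):

```shell
# Make the job rerunnable (-r) so it qualifies for migration, and
# migrate it automatically after it has been suspended for 30 minutes.
bsub -r -mig 30 myjob
```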

-n min_proc[,max_proc]

Submits a parallel job and specifies the number of processors required to run the job (some of the processors may be on the same multiprocessor host).

You can specify a minimum and maximum number of processors to use. For example, this job requests a minimum of 4, but can use up to 6 slots/processors:

bsub -n 4,6 a.out

The job can start if at least the minimum number of processors is available. If you do not specify a maximum, the number you specify represents the exact number of processors to use.

If PARALLEL_SCHED_BY_SLOT=Y in lsb.params, this option specifies the number of slots required to run the job, not the number of processors.

When used with the -R option and a compound resource requirement, the number of slots in the compound resource requirement must be compatible with the minimum and maximum specified.

Jobs that request fewer slots than the minimum PROCLIMIT defined for the queue or application profile to which the job is submitted, or more slots than the maximum PROCLIMIT are rejected. If the job requests minimum and maximum job slots, the maximum slots requested cannot be less than the minimum PROCLIMIT, and the minimum slots requested cannot be more than the maximum PROCLIMIT.

For example, if the queue defines PROCLIMIT=4 8:
  • bsub -n 6 is accepted because it requests slots within the range of PROCLIMIT

  • bsub -n 9 is rejected because it requests more slots than the PROCLIMIT allows

  • bsub -n 1 is rejected because it requests fewer slots than the PROCLIMIT allows

  • bsub -n 6,10 is accepted because the minimum value 6 is within the range of the PROCLIMIT setting

  • bsub -n 1,6 is accepted because the maximum value 6 is within the range of the PROCLIMIT setting

  • bsub -n 10,16 is rejected because its range is outside the range of PROCLIMIT

  • bsub -n 1,3 is rejected because its range is outside the range of PROCLIMIT

See the PROCLIMIT parameter in lsb.queues(5) and lsb.applications(5) for more information.

In a MultiCluster environment, if a queue exports jobs to remote clusters (see the SNDJOBS_TO parameter in lsb.queues), then the process limit is not imposed on jobs submitted to this queue.

Once the required number of processors is available, the job is dispatched to the first host selected. The list of selected host names for the job is specified in the environment variables LSB_HOSTS and LSB_MCPU_HOSTS. The job itself is expected to start parallel components on these hosts and establish communication among them, optionally using RES.

Specify first execution host candidates using the -m option when you want to ensure that a host has the required resources or runtime environment to handle processes that run on the first execution host.

If you specify one or more first execution host candidates, LSF looks for a first execution host that satisfies the resource requirements. If the first execution host does not have enough processors or job slots to run the entire job, LSF looks for additional hosts.

-network "network_res_req"

For LSF IBM Parallel Environment (PE) integration. Specifies the network resource requirements to enable network-aware scheduling for PE jobs.

If any network resource requirement is specified in the job, queue, or application profile, the job is treated as a PE job. PE jobs can only run on hosts where IBM PE pnsd daemon is running.

The network resource requirement string network_res_req has the same syntax as the NETWORK_REQ parameter defined in lsb.applications or lsb.queues.

network_res_req has the following syntax:

[type=sn_all | sn_single] [:protocol=protocol_name[(protocol_number)][,protocol_name[(protocol_number)]] [:mode=US | IP] [:usage=shared | dedicated] [:instance=positive_integer]

LSF_PE_NETWORK_NUM must be defined to a non-zero value in lsf.conf for LSF to recognize the -network option. If LSF_PE_NETWORK_NUM is not defined or is set to 0, the job submission is rejected with a warning message.

The -network option overrides the value of NETWORK_REQ defined in lsb.applications or lsb.queues.

The default network resource requirement string is:

-network "protocol=mpi: mode=US: usage=shared: instance=1"

The following network resource requirement options are supported:
type=sn_all | sn_single
Specifies the adapter device type to use for message passing: either sn_all or sn_single.
sn_single

When used for switch adapters, specifies that all windows are on a single network.

sn_all

Specifies that one or more windows are on each network, and that striped communication should be used over all available switch networks. The networks specified must be accessible by all hosts selected to run the PE job. See the Parallel Environment Runtime Edition for AIX: Operation and Use guide (SC23-6781-05) for more information about submitting jobs that use striping.

If mode is IP and type is specified as sn_all or sn_single, the job will only run on IB adapters (IPoIB). If mode is IP and type is not specified, the job will only run on Ethernet adapters (IPoEth). For IPoEth jobs, LSF ensures the job is running on hosts where pnsd is installed and running. For IPoIB jobs, LSF ensures the job is running on hosts where pnsd is installed and running, and that the InfiniBand networks are up. Because IP jobs do not consume network windows, LSF does not check whether all network windows are used up or the network is already occupied by a dedicated PE job.

Equivalent to the PE MP_EUIDEVICE environment variable and the -euidevice PE flag. See the Parallel Environment Runtime Edition for AIX: Operation and Use guide (SC23-6781-05) for more information. Only sn_all and sn_single are supported by LSF. The other types supported by PE are not supported for LSF jobs.

protocol=protocol_name[(protocol_number)]
Network communication protocol for the PE job, indicating which message passing API is being used by the application. The following protocols are supported by LSF:
mpi

The application makes only MPI calls. This value applies to any MPI job regardless of the library that it was compiled with (PE MPI, MPICH2).

pami

The application makes only PAMI calls.

lapi

The application makes only LAPI calls.

shmem

The application makes only OpenSHMEM calls.

user_defined_parallel_api

The application makes only calls from a parallel API that you define. For example: protocol=myAPI or protocol=charm.

The default value is mpi.

LSF also supports an optional protocol_number (for example, mpi(2)), which specifies the number of contexts (endpoints) per parallel API instance. The number must be a power of 2, but no greater than 128 (1, 2, 4, 8, 16, 32, 64, 128). LSF passes the communication protocols to PE without any change, and reserves network windows for each protocol.

When you specify multiple parallel API protocols, you cannot make calls to both LAPI and PAMI (lapi, pami) or LAPI and OpenSHMEM (lapi, shmem) in the same application. Protocols can be specified in any order.

See the MP_MSG_API and MP_ENDPOINTS environment variables and the -msg_api and -endpoints PE flags in the Parallel Environment Runtime Edition for AIX: Operation and Use guide (SC23-6781-05) for more information about the communication protocols that are supported by IBM PE.

mode=US | IP

The network communication system mode used by the specified communication protocol: US (User Space) or IP (Internet Protocol). The default value is US. A US job can only run with adapters that support user space communications, such as the IB adapter. IP jobs can run with either Ethernet adapters or IB adapters. When IP mode is specified, the instance number cannot be specified, and network usage must be unspecified or shared.

Each instance of US mode requested by a task running on switch adapters requires an adapter window. For example, if a task requests both the MPI and LAPI protocols such that both protocol instances require US mode, two adapter windows are used.

usage=dedicated | shared

Specifies whether the adapter can be shared with tasks of other job steps: dedicated or shared. Multiple tasks of the same job can share one network even if usage is dedicated.

instance=positive_integer

The number of parallel communication paths (windows) per task made available to the protocol on each network. The number actually used depends on the implementation of the protocol subsystem.

The default value is 1.

If the specified value is greater than MAX_PROTOCOL_INSTANCES in lsb.params or lsb.queues, LSF rejects the job.

The following IBM LoadLeveller job command file options are not supported in LSF:
  • collective_groups

  • imm_send_buffers

  • rcxtblocks

See Administering IBM Platform LSF for more information about network-aware scheduling and running and managing workload through IBM Parallel Edition.
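A hedged example combining these options (the job name and the specific values are placeholders):

```shell
# PE job using striped User Space communication over all switch
# networks (type=sn_all), the MPI and LAPI protocols, dedicated adapter
# usage, and two windows per task per network. Requires
# LSF_PE_NETWORK_NUM set to a non-zero value in lsf.conf and pnsd
# running on the execution hosts.
bsub -n 8 -network "type=sn_all:protocol=mpi,lapi:mode=US:usage=dedicated:instance=2" my_pe_job
```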

-o output_file

Specify a file path. Appends the standard output of the job to the specified file. Sends the output by mail if the file does not exist, or the system has trouble writing to it.

If only a file name is specified, LSF writes the output file to the current working directory. If the current working directory is not accessible on the execution host after the job starts, LSF writes the standard output file to /tmp/.

If the specified output_file path is not accessible, the output will not be stored.

If you use the special character %J in the name of the output file, then %J is replaced by the job ID of the job. If you use the special character %I in the name of the output file, then %I is replaced by the index of the job in the array, if the job is a member of an array. Otherwise, %I is replaced by 0 (zero).

Note: The file path can contain up to 4094 characters for UNIX and Linux, or up to 255 characters for Windows, including the directory, file name, and expanded values for %J (job_ID) and %I (index_ID).

If the parameter LSB_STDOUT_DIRECT in lsf.conf is set to Y or y, the standard output of a job is written to the file you specify as the job runs. If LSB_STDOUT_DIRECT is not set, it is written to a temporary file and copied to the specified file after the job finishes. LSB_STDOUT_DIRECT is not supported on Windows.

If you use -o without -e or -eo, the standard error of the job is stored in the output file.

If you use -o without -N, the job report is stored in the output file as the file header.

If you use both -o and -N, the output is stored in the output file and the job report is sent by mail. The job report itself does not contain the output, but the report advises you where to find your output.

-oo output_file

Specify a file path. Overwrites the standard output of the job to the specified file if it exists, or sends the output to a new file if it does not exist. Sends the output by mail if the system has trouble writing to the file.

If only a file name is specified, LSF writes the output file to the current working directory. If the current working directory is not accessible on the execution host after the job starts, LSF writes the standard output file to /tmp/.

If the specified output_file path is not accessible, the output will not be stored.

If you use the special character %J in the name of the output file, then %J is replaced by the job ID of the job. If you use the special character %I in the name of the output file, then %I is replaced by the index of the job in the array, if the job is a member of an array. Otherwise, %I is replaced by 0 (zero).

Note: The file path can contain up to 4094 characters for UNIX and Linux, or up to 255 characters for Windows, including the directory, file name, and expanded values for %J (job_ID) and %I (index_ID).

If the parameter LSB_STDOUT_DIRECT in lsf.conf is set to Y or y, the standard output of a job overwrites the output file you specify as the job runs, which occurs every time the job is submitted with the overwrite option, even if it is requeued manually or by the system. If LSB_STDOUT_DIRECT is not set, the output is written to a temporary file that overwrites the specified file after the job finishes. LSB_STDOUT_DIRECT is not supported on Windows.

If you use -oo without -e or -eo, the standard error of the job is stored in the output file.

If you use -oo without -N, the job report is stored in the output file as the file header.

If you use both -oo and -N, the output is stored in the output file and the job report is sent by mail. The job report itself does not contain the output, but the report advises you where to find your output.

The default network resource requirement string for the -network option is:

-network "protocol=mpi: mode=US: usage=shared: instance=1"

-outdir output_directory

Creates the job output directory.

The -outdir option supports the following dynamic patterns for the output directory:

  • %J - job ID

  • %JG - job group (if not specified, it will be ignored)

  • %I - index (default value is 0)

  • %EJ - execution job ID

  • %EI - execution index

  • %P - project name

  • %U - user name

  • %G - user group

For example, the system creates the submission_dir/user1/jobid_0/ output directory for the job with the following command:

bsub -outdir "%U/%J_%I" myprog

If the cluster wide output directory was defined but the outdir option was not set, for example, DEFAULT_JOB_OUTDIR=/scratch/joboutdir/%U/%J_%I in lsb.params, the system creates the /scratch/joboutdir/user1/jobid_0/ output directory for the job with the following command:

bsub myprog

If the submission directory is /scratch/joboutdir/ on the shared file system and you want the system to create /scratch/joboutdir/user1/jobid_0/ for the job output directory, then run the following command:

bsub -outdir "%U/%J_%I" myjob

The output directory path, like the command path, can contain up to 4094 characters for UNIX and Linux, or up to 255 characters for Windows, including the directory and file name; the -outdir option has no additional length limitation of its own. The -outdir option supports mixed UNIX and Windows paths when LSB_MIXED_PATH_ENABLE=Y|y. LSB_MIXED_PATH_DELIMITER controls the delimiter.

The following assumptions and dependencies apply to the outdir command option:

  • The execution host has access to the submission host.

  • The submission host should be running RES. Otherwise, LSF uses the command defined by EGO_RSH to create the output directory; if that parameter is not defined, rsh is used. On Windows, RES must be running on the submission host in order to create the output directory.

-P project_name

Assigns the job to the specified project. The project does not have to exist before submitting the job.

Project names can be up to 59 characters long.

On IRIX 6, you must be a member of the project as listed in /etc/project(4). If you are a member of the project, then /etc/projid(4) maps the project name to a numeric project ID. Before the submitted job executes, a new array session (newarraysess(2)) is created and the project ID is assigned to it using setprid(2).

-p process_limit

Sets the limit of the number of processes to process_limit for the whole job. The default is no limit. Exceeding the limit causes the job to terminate.

-pack job_submission_file

Submits job packs instead of an individual job. Specify the full path to the job submission file. The job packs feature must be enabled.

In the command line, this option is not compatible with any other bsub options.

In the job submission file, define one job request per line, using normal bsub syntax but omitting the word "bsub". For requests in the file, the following bsub options are not supported:

-I -Ip -Is -IS -ISp -ISs -IX -XF -K -jsdl -h -V -pack
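For example, a job submission file for -pack might look like the following sketch, with one request per line and the word "bsub" omitted (queue names, resource strings, and commands are illustrative):

```
-q normal -o step1.%J.out ./sim --step 1
-q normal -o step2.%J.out ./sim --step 2
-n 4 -R "rusage[mem=512]" -o mpi.%J.out ./mpi_task
```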

-Q "[exit_code ...] [EXCLUDE(exit_code ...)]"

Specify automatic job requeue exit values. Use spaces to separate multiple exit codes. The reserved keyword all specifies all exit codes. Exit codes are typically between 0 and 255. Use a tilde (~) to exclude specified number or numbers from the list.

exit_code has the following form:
"[all] [~number ...] | [number ...]"

Job level exit values override application-level and queue-level values.

Jobs running with the specified exit code share the same application and queue with other jobs.

Define an exit code as EXCLUDE(exit_code) to enable exclusive job requeue. Exclusive job requeue does not work for parallel jobs.

If mbatchd is restarted, it does not remember the previous hosts from which the job exited with an exclusive requeue exit code. In this situation, it is possible for a job to be dispatched to hosts on which the job has previously exited with an exclusive exit code.
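The requeue decision can be sketched in shell. The following hypothetical illustration shows how -Q "5 8 EXCLUDE(9)" would be interpreted; it is not LSF code:

```shell
# Hypothetical sketch of the -Q "5 8 EXCLUDE(9)" requeue decision.
# Exit codes 5 and 8 trigger a normal requeue; 9 triggers an exclusive
# requeue (the job avoids the host it exited from); other codes do not requeue.
requeue_action() {
    case "$1" in
        5|8) echo "requeue" ;;
        9)   echo "requeue_exclusive" ;;
        *)   echo "none" ;;
    esac
}

requeue_action 8   # requeue
requeue_action 9   # requeue_exclusive
requeue_action 0   # none
```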

-q "queue_name ..."

Submits the job to one of the specified queues. Quotes are optional for a single queue. The specified queues must be defined for the local cluster. For a list of available queues in your local cluster, use bqueues.

When a list of queue names is specified, LSF attempts to submit the job to the first queue listed. If that queue cannot be used because of the job's resource limits or other restrictions (such as the requested hosts, your access to the queue, or the queue status, open or closed), the next queue listed is considered. Queues are considered in the same order in which they are listed.

-R "res_req" [-R "res_req" ...]

Runs the job on a host that meets the specified resource requirements. A resource requirement string describes the resources a job needs. LSF uses resource requirements to select hosts for job execution. Resource requirement strings can be simple (applying to the entire job), compound (applying to the specified number of slots), or alternative.

Simple resource requirement strings are divided into the following sections. Each section has a different syntax.
  • A selection section (select). The selection section specifies the criteria for selecting execution hosts from the system.

  • An ordering section (order). The ordering section indicates how the hosts that meet the selection criteria should be sorted.

  • A resource usage section (rusage). The resource usage section specifies the expected resource consumption of the task.

  • A job spanning section (span). The job spanning section indicates if a parallel job should span across multiple hosts.

  • A same resource section (same). The same section indicates that all processes of a parallel job must run on the same type of host.

  • A compute unit resource section (cu). The compute unit section specifies topological requirements for spreading a job over the cluster.

  • A CPU and memory affinity resource section (affinity). The affinity section specifies CPU and memory binding requirements for tasks of a job.
The resource requirement string sections have the following syntax:
select[selection_string] order[order_string] rusage[usage_string[,usage_string][|| usage_string] ...] span[span_string] same[same_string] cu[cu_string] affinity[affinity_string]

The square brackets must be typed as shown for each section. A blank space must separate each resource requirement section.

You can omit the select keyword and the square brackets, but the selection string must be the first string in the resource requirement string. If you do not give a section name, the first resource requirement string is treated as a selection string (select[selection_string]).

For example:

bsub -R "type==any order[ut] same[model] rusage[mem=1]" myjob

is equivalent to the following:

bsub -R "select[type==any] order[ut] same[model] rusage[mem=1]" myjob

Compound resource requirement strings are made up of one or more simple resource requirement strings as follows:

num1*{simple_string1} + num2*{simple_string2} + ...

where numx is the number of slots affected and simple_stringx is a simple resource requirement string.

For jobs without the number of total slots specified using bsub -n, the final numx can be omitted. The final resource requirement is then applied to the zero or more slots not yet accounted for as follows:

  • (final res_req number of slots) = (total number of job slots)-(num1+num2+ ...)

For jobs with the total number of slots specified using bsub -n num_slots, the total number of slots must match the number of slots in the resource requirement as follows:

  • num_slots=(num1+num2+num3+ ...)

For jobs with the minimum and maximum number of slots specified using bsub -n min,max, the number of slots in the compound resource requirement must be compatible with the minimum and maximum specified.

You can specify the number of slots or processors through the resource requirement specification. For example, you can specify a job that requests 10 slots or processors: 1 on a host that has more than 5000 MB of memory, and an additional 9 on hosts that have more than 1000 MB of memory:

bsub -R "1*{mem>5000} + 9*{mem>1000}" a.out

Compound resource requirements do not support use of the || operator within the component rusage simple resource requirements, multiple -R options, or use of the cu section.

Each simple resource requirement string must be contained in curly brackets. Each section has a different syntax.

The size of the resource requirement string cannot exceed 512 characters. If you need to include a hyphen (-) or other non-alphabet characters within the string, enclose the text in single quotation marks, for example, bsub -R "select[hname!='host06-x12']".

If LSF_STRICT_RESREQ=Y in lsf.conf, the selection string must conform to the stricter resource requirement string syntax described in Administering IBM Platform LSF. The strict resource requirement syntax only applies to the select section. It does not apply to the other resource requirement sections (order, rusage, same, span, or cu). When LSF_STRICT_RESREQ=Y in lsf.conf, LSF rejects resource requirement strings where an rusage section contains a non-consumable resource.

If RESRSV_LIMIT is set in lsb.queues, the merged application-level and job-level rusage consumable resource requirements must satisfy any limits set by RESRSV_LIMIT, or the job will be rejected.

Any resource for run queue length, such as r15s, r1m or r15m, specified in the resource requirements refers to the normalized run queue length.

By default, memory (mem) and swap (swp) limits in select[] and rusage[] sections are specified in KB. Use LSF_UNIT_FOR_LIMITS in lsf.conf to specify a larger unit for these limits (MB, GB, TB, PB, or EB).

For example, to submit a job that runs on Solaris 10 or Solaris 11:
bsub -R "sol10 || sol11" myjob
The following command runs the job called myjob on an HP-UX host that is lightly loaded (CPU utilization) and has at least 15 MB of swap memory available.
bsub -R "swp > 15 && hpux order[ut]" myjob

bsub also accepts multiple -R options for the order, same, rusage (not multi-phase), and select sections. You can specify multiple strings instead of using the && operator:

bsub -R "select[swp > 15]" -R "select[hpux] order[r15m]" -R "rusage[mem=100]" -R "order[ut]" -R "same[type]" -R "rusage[tmp=50:duration=60]" -R "same[model]" myjob

LSF merges the multiple -R options into one string and selects a host that meets all of the resource requirements. The number of -R option sections is unlimited, up to a maximum of 512 characters for the entire string.
Remember: Use multiple -R options only with the order, same, rusage (not multi-phase), and select sections of simple resource requirement strings and with the bsub and bmod commands.

When application-level, and queue-level cu sections are also defined, the job-level cu section takes precedence and overwrites both the application-level and queue-level requirement definitions.

Suppose EXCLUSIVE=CU[enclosure] is defined in lsb.queues, the compute unit type enclosure is defined in lsb.params, and a ComputeUnit section is defined in lsb.hosts. Use the following command to submit a job that runs on 64 slots over 4 enclosures or fewer, and uses the enclosures exclusively:
bsub -n 64 -R "cu[excl:type=enclosure:maxcus=4]" myjob
A resource called bigmem is defined in lsf.shared as an exclusive resource for hostE in lsf.cluster.mycluster. Use the following command to submit a job that runs on hostE:
bsub -R "bigmem" myjob
or
bsub -R "defined(bigmem)" myjob
A static shared resource is configured for licenses for the Verilog application as a resource called verilog_lic. To submit a job that runs on a host when there is a license available:
bsub -R "select[defined(verilog_lic)] rusage[verilog_lic=1]" myjob
The following job requests 20 MB memory for the duration of the job, and 1 license for 2 minutes:
bsub -R "rusage[mem=20, license=1:duration=2]" myjob
The following job requests 20 MB of memory and 50 MB of swap space for 1 hour, and 1 license for 2 minutes:
bsub -R "rusage[mem=20:swp=50:duration=1h, license=1:duration=2]" myjob
The following job requests 20 MB of memory for the duration of the job, 50 MB of swap space for 1 hour, and 1 license for 2 minutes.
bsub -R "rusage[mem=20,swp=50:duration=1h, license=1:duration=2]" myjob
The following job requests 50 MB of swap space, linearly decreasing the amount reserved over a duration of 2 hours, and requests 1 license for 2 minutes:
bsub -R "rusage[swp=50:duration=2h:decay=1, license=1:duration=2]" myjob
The following job requests two resources with same duration but different decay:
bsub -R "rusage[mem=20:duration=30:decay=1, lic=1:duration=30]" myjob
The following job uses a multi-phase rusage string to request 50 MB of memory for 10 minutes, followed by 10 MB of memory for the duration of the job:
bsub -R "rusage[mem=(50 10):duration=(10):decay=(0)]" myjob

You are running an application version 1.5 as a resource called app_lic_v15 and the same application version 2.0.1 as a resource called app_lic_v201. The license key for version 2.0.1 is backward compatible with version 1.5, but the license key for version 1.5 does not work with 2.0.1.

Job-level resource requirement specifications that use the || operator take precedence over any queue-level resource requirement specifications.

  • If you can only run your job using one version of the application, submit the job without specifying an alternative resource. To submit a job that only uses app_lic_v201:
    bsub -R "rusage[app_lic_v201=1]" myjob
  • If you can run your job using either version of the application, try to reserve version 2.0.1 of the application. If it is not available, you can use version 1.5. To submit a job that tries app_lic_v201 before trying app_lic_v15:
    bsub -R "rusage[app_lic_v201=1||app_lic_v15=1]" myjob
  • If different versions of an application require different system resources, you can specify other resources in your rusage strings. To submit a job that uses 20 MB of memory for app_lic_v201 or 20 MB of memory and 50 MB of swap space for app_lic_v15:
    bsub -R "rusage[mem=20:app_lic_v15=1||mem=20:swp=50:app_lic_v201=1]" myjob

You can specify a threshold that a resource must meet before an allocation is made. For example:

bsub -R "rusage[bwidth=1:threshold=5]" myjob

A job is submitted that consumes 1 unit of bandwidth (the resource bwidth), but the job should not be scheduled to run unless the bandwidth on the host is equal to or greater than 5.

In this example, bwidth is a decreasing resource and the threshold value is interpreted as a floor. If the resource in question was increasing, then the threshold value would be interpreted as a ceiling.

An alternative resource requirement consists of two or more individual resource requirements. Each separate resource requirement describes an alternative. If the resources cannot be found that satisfy the first resource requirement, then the next resource requirement is tried, and so on until the requirement is satisfied.

Alternative resource requirements are defined in terms of a compound resource requirement, or an atomic resource requirement:

bsub -R "{C1 | R1} || {C2 | R2}@D2 || ... || {Cn | Rn}@Dn"

Where:

  • || separates one alternative resource from the next

  • C is a compound resource requirement

  • R is a resource requirement, the same as a regular LSF resource requirement except that it cannot contain:
    • An rusage OR (||) construct.

    • A compute unit requirement cu[...]

  • D is a positive integer:
    • @D is optional: Do not evaluate the alternative resource requirement until 'D' minutes after submission time, and requeued jobs still use submission time instead of requeue time. There is no D1 because the first alternative is always evaluated immediately.

    • D2 <= D3 <= ... <= Dn

    • Not specifying @D means that the alternative will be evaluated without delay if the previous alternative could not be used to obtain a job’s allocation.

For example, you may have a sequential job, but you want alternative resource requirements (that is, if LSF fails to match your resource, try another one).

bsub -R "{ select[type==any] order[ut] same[model] rusage[mem=1] } ||  
{ select[type==any] order[ls] same[ostype] rusage[mem=5] }" myjob

You can also add a delay before trying the second alternative:

bsub -R "{ select[type==any] order[ut] same[model] rusage[mem=1] } || 
{ select[type==any] order[ls] same[ostype] rusage[mem=5] }@4" myjob

You can also have more than 2 alternatives:

bsub -R "{select[type==any] order[ut] same[model] rusage[mem=1] } || 
{ select[type==any] order[ut] same[model] rusage[mem=1] } || 
{ select[type==any] order[ut] same[model] rusage[mem=1] }@3 || 
{ select[type==any] order[ut] same[model] rusage[mem=1] }@6" myjob

Some parallel jobs might need compound resource requirements. You can specify alternatives for parallel jobs in the same way; that is, you can have several alternative sections, each enclosed in braces ({ }) and separated by ||:

bsub -n 2 -R "{ 1*{ select[type==any] order[ut] same[model] rusage[mem=1]} + 1
*{ select[type==any] order[ut] same[model] rusage[mem=1] } } ||
{  1*{ select[type==any] order[ut] same[model] rusage[mem=1]} + 
1*{ select[type==any] order[ut] same[model]
rusage[mem=1] } }@6" myjob

Alternatively, the compound resource requirement section can have both slots requiring the same resource:

bsub -n 2 -R "{ 1*{ select[type==any] order[ut] same[model] rusage[mem=1]} 
+1*{ select[type==any] order[ut] same[model] rusage[mem=1] } } || 
{  2*{ select[type==any] order[ut] same[model] rusage[mem=1] } }@10" myjob 

An alternative resource requirement can be used to indicate how many slots/processors the job requires. For example, a job may request 4 slots/processors on SOLARIS host types, or 8 slots/processors on LINUX86 host types. If the -n parameter is provided at the job level, then the values specified must be consistent with the values implied by the resource requirement string:

bsub -R " {8*{type==LINUX86}} || {4*{type==SOLARIS}}" a.out

If they conflict, the job submission will be rejected. For example:

bsub -n 3 -R " {8*{type==LINUX86}} || {4*{type==SOLARIS}}" a.out

An affinity resource requirement string specifies CPU and memory binding requirements for a resource allocation that is topology aware. An affinity[] resource requirement section controls the allocation and distribution of processor units within a host according to the hardware topology information that LSF collects.
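For example, assuming a cluster with affinity scheduling enabled, a job could request that each of its 4 tasks be bound to one core (a sketch of the affinity[] syntax; the job name is illustrative):

```
bsub -n 4 -R "affinity[core(1)]" myjob
```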

-S stack_limit

Sets a per-process (soft) stack segment size limit for each of the processes that belong to the job (see getrlimit(2)).

By default, the limit is specified in KB. Use LSF_UNIT_FOR_LIMITS in lsf.conf to specify a larger unit for the limit (MB, GB, TB, PB, or EB).

-s signal

Sends the specified signal when a queue-level run window closes.

By default, when the window closes, LSF suspends jobs running in the queue (job state becomes SSUSP) and stops dispatching jobs from the queue.

Use -s to specify a signal number; when the run window closes, the job is signalled by this signal instead of being suspended.

-rnc resize_notification_cmd

Specify the full path of an executable to be invoked on the first execution host when the job allocation has been modified (both shrink and grow). -rnc overrides the notification command specified in the application profile (if specified). The maximum length of the notification command is 4 KB.

-sla service_class_name

Specifies the service class where the job is to run.

If the SLA does not exist or the user is not a member of the service class, the job is rejected.

If EGO-enabled SLA scheduling is configured with ENABLE_DEFAULT_EGO_SLA in lsb.params, jobs submitted without -sla are attached to the configured default SLA.

You can use -g with -sla. All jobs in a job group attached to a service class are scheduled as SLA jobs. It is not possible to have some jobs in a job group not part of the service class. Multiple job groups can be created under the same SLA. You can submit additional jobs to the job group without specifying the service class name again. You cannot use job groups with resource-based SLAs that have guarantee goals.

Tip:

Submit your velocity, deadline, and throughput SLA jobs with a runtime limit (-W option) or specify RUNLIMIT in the queue definition in lsb.queues or RUNLIMIT in the application profile definition in lsb.applications. If you do not specify a runtime limit for velocity SLAs, LSF automatically adjusts the optimum number of running jobs according to the observed run time of finished jobs.

Use bsla to display the properties of service classes configured in LSB_CONFDIR/cluster_name/configdir/lsb.serviceclasses (see lsb.serviceclasses) and dynamic information about the state of each service class.

-sp priority

Specifies user-assigned job priority that orders all jobs (from all users) in a queue. Valid values for priority are any integers between 1 and MAX_USER_PRIORITY (configured in lsb.params, displayed by bparams -l). Job priorities that are not valid are rejected. LSF and queue administrators can specify priorities beyond MAX_USER_PRIORITY.

The job owner can change the priority of their own jobs. LSF and queue administrators can change the priority of all jobs in a queue.

Job order is the first consideration in determining job eligibility for dispatch. Jobs are still subject to all scheduling policies regardless of job priority. Jobs are scheduled based first on queue priority, then job priority, and finally in first-come, first-served order.

User-assigned job priority can be configured with automatic job priority escalation to automatically increase the priority of jobs that have been pending for a specified period of time (JOB_PRIORITY_OVER_TIME in lsb.params).

When absolute priority scheduling is configured in the submission queue (APS_PRIORITY in lsb.queues), the user-assigned job priority is used for the JPRIORITY factor in the APS calculation.

-T thread_limit

Sets the limit of the number of concurrent threads to thread_limit for the whole job. The default is no limit.

Exceeding the limit causes the job to terminate. The system sends the following signals in sequence to all processes belonging to the job: SIGINT, SIGTERM, and SIGKILL.

-t [[[year:]month:]day:]hour:minute

Specifies the job termination deadline.

If a UNIX job is still running at the termination time, the job is sent a SIGUSR2 signal, and is killed if it does not terminate within ten minutes.

If a Windows job is still running at the termination time, it is killed immediately. (For a detailed description of how these jobs are killed, see bkill.)

In the queue definition, a TERMINATE action can be configured to override the bkill default action (see the JOB_CONTROLS parameter in lsb.queues(5)).

In an application profile definition, a TERMINATE_CONTROL action can be configured to override the bkill default action (see the TERMINATE_CONTROL parameter in lsb.applications(5)).

The format for the termination time is [[[year:]month:]day:]hour:minute, where the number ranges are as follows: year after 1970, month 1-12, day 1-31, hour 0-23, minute 0-59.

At least two fields must be specified, and are assumed to be hour:minute. If three fields are given, they are assumed to be day:hour:minute; four fields are assumed to be month:day:hour:minute; and five fields are assumed to be year:month:day:hour:minute.

If the year field is specified and the specified time is in the past, the job submission request is rejected.
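The field interpretation above can be sketched with a small shell illustration (interpret_time is a hypothetical helper, not an LSF command):

```shell
# Hypothetical sketch of how bsub -t interprets the number of colon-separated fields.
interpret_time() {
    n=$(printf '%s\n' "$1" | awk -F: '{print NF}')
    case "$n" in
        2) echo "hour:minute" ;;
        3) echo "day:hour:minute" ;;
        4) echo "month:day:hour:minute" ;;
        5) echo "year:month:day:hour:minute" ;;
        *) echo "invalid" ;;
    esac
}

interpret_time "18:30"            # hour:minute
interpret_time "3:18:30"          # day:hour:minute
interpret_time "2024:5:3:18:30"   # year:month:day:hour:minute
```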

-U reservation_ID

If an advance reservation has been created with the brsvadd command, the -U option makes use of the reservation.

For example, if the following command was used to create the reservation user1#0,
brsvadd -n 1024 -m hostA -u user1 -b 13:0 -e 18:0
Reservation "user1#0" is created
The following command uses the reservation:
bsub -U user1#0 myjob

The job can only use hosts reserved by the reservation user1#0. LSF only selects hosts in the reservation. You can use the -m option to specify particular hosts within the list of hosts reserved by the reservation, but you cannot specify other hosts not included in the original reservation.

If you do not specify hosts (bsub -m) or resource requirements (bsub -R), the default resource requirement is to select hosts that are of any host type (LSF assumes "type==any" instead of "type==local" as the default select string).

If you later delete the advance reservation while it is still active, any pending jobs still keep the "type==any" attribute.

A job can only use one reservation. There is no restriction on the number of jobs that can be submitted to a reservation; however, the number of slots available on the hosts in the reservation may run out. For example, reservation user2#0 reserves 128 slots on hostA. When all 128 slots on hostA are used by jobs referencing user2#0, hostA is no longer available to other jobs using reservation user2#0. Any single user or user group can have a maximum of 100 reservation IDs.

Jobs referencing the reservation are killed when the reservation expires. LSF administrators can prevent running jobs from being killed when the reservation expires by changing the termination time of the job using the reservation (bmod -t) before the reservation window closes.

To use an advance reservation on a remote host, submit the job and specify the remote advance reservation ID. For example:
bsub -U user1#01@cluster1

In this example, we assume the default queue is configured to forward jobs to the remote cluster.

-u mail_user

Sends mail to the specified email destination. To specify a Windows user account, include the domain name in uppercase letters and use a single backslash (DOMAIN_NAME\user_name) in a Windows command line or a double backslash (DOMAIN_NAME\\user_name) in a UNIX command line.

-v swap_limit

Sets the total process virtual memory limit to swap_limit for the whole job. The default is no limit. Exceeding the limit causes the job to terminate.

By default, the limit is specified in KB. Use LSF_UNIT_FOR_LIMITS in lsf.conf to specify a larger unit for the limit (MB, GB, TB, PB, or EB).

-W [hour:]minute[/host_name | /host_model]

Sets the runtime limit of the job. If a UNIX job runs longer than the specified run limit, the job is sent a SIGUSR2 signal, and is killed if it does not terminate within ten minutes. If a Windows job runs longer than the specified run limit, it is killed immediately. (For a detailed description of how these jobs are killed, see bkill.)

In the queue definition, a TERMINATE action can be configured to override the bkill default action (see the JOB_CONTROLS parameter in lsb.queues(5)).

In an application profile definition, a TERMINATE_CONTROL action can be configured to override the bkill default action (see the TERMINATE_CONTROL parameter in lsb.applications(5)).

If you want to provide LSF with an estimated run time without killing jobs that exceed this value, submit the job with -We, or define the RUNTIME parameter in lsb.applications and submit the job to that application profile. LSF uses the estimated runtime value for scheduling purposes only.

The run limit is in the form of [hour:]minute. The minutes can be specified as a number greater than 59. For example, three and a half hours can either be specified as 3:30, or 210.

The run limit you specify is the normalized run time. This is done so that the job does approximately the same amount of processing, even if it is sent to host with a faster or slower CPU. Whenever a normalized run time is given, the actual time on the execution host is the specified time multiplied by the CPU factor of the normalization host then divided by the CPU factor of the execution host.
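The normalization arithmetic can be sketched as follows (actual_minutes is a hypothetical helper; in a real cluster the CPU factors come from LSF host configuration):

```shell
# Hypothetical sketch of run-limit normalization:
# wall-clock minutes on the execution host =
#     run limit * CPU factor of normalization host / CPU factor of execution host
actual_minutes() {
    awk -v limit="$1" -v cpuf_norm="$2" -v cpuf_exec="$3" \
        'BEGIN { printf "%.0f\n", limit * cpuf_norm / cpuf_exec }'
}

# A 60-minute limit normalized to a host with CPU factor 10, executed on a
# host twice as fast (CPU factor 20), allows 30 wall-clock minutes:
actual_minutes 60 10 20
```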

If ABS_RUNLIMIT=Y is defined in lsb.params, the runtime limit and the runtime estimate are not normalized by the host CPU factor. Absolute wall-clock run time is used for all jobs submitted with a runtime limit or runtime estimate.

Optionally, you can supply a host name or a host model name defined in LSF. You must insert ‘/’ between the run limit and the host name or model name.

If no host or host model is given, LSF uses the default runtime normalization host defined at the queue level (DEFAULT_HOST_SPEC in lsb.queues) if it has been configured; otherwise, LSF uses the default CPU time normalization host defined at the cluster level (DEFAULT_HOST_SPEC in lsb.params) if it has been configured; otherwise, LSF uses the submission host.

For MultiCluster jobs, if no other CPU time normalization host is defined and information about the submission host is not available, LSF uses the host with the largest CPU factor (the fastest host in the cluster).

If the job also has termination time specified through the bsub -t option, LSF determines whether the job can actually run for the specified length of time allowed by the run limit before the termination time. If not, then the job is aborted.

If the IGNORE_DEADLINE parameter is set in lsb.queues(5), this behavior is overridden and the run limit is ignored.

Jobs submitted to a chunk job queue are not chunked if the run limit is greater than 30 minutes.

-We [hour:]minute[/host_name | /host_model]

Specifies an estimated run time for the job. LSF uses the estimated value for job scheduling purposes only, and does not kill jobs that exceed this value unless the jobs also exceed a defined runtime limit. The format of runtime estimate is same as run limit set by the -W option.

Use JOB_RUNLIMIT_RATIO in lsb.params to limit the runtime estimate users can set. If JOB_RUNLIMIT_RATIO is set to 0 no restriction is applied to the runtime estimate.

The job-level runtime estimate setting overrides the RUNTIME setting in an application profile in lsb.applications.

-w 'dependency_expression'

LSF does not place your job unless the dependency expression evaluates to TRUE. If you specify a dependency on a job that LSF cannot find (such as a job that has not yet been submitted), your job submission fails.

The dependency expression is a logical expression composed of one or more dependency conditions. To build a dependency expression from multiple conditions, use the following logical operators:

&& (AND)

|| (OR)

! (NOT)

Use parentheses to indicate the order of operations, if necessary.

Enclose the dependency expression in single quotes (') to prevent the shell from interpreting special characters (space, any logic operator, or parentheses). If you use single quotes for the dependency expression, use double quotes (") for quoted items within it, such as job names.

In a Windows environment with multiple job dependencies, use only double quotes.
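Putting the quoting rules together, the following sketch prints a combined dependency (the job names 2013_run and cleanup_job and the command my_post are hypothetical): single quotes protect the whole expression from the shell, and double quotes mark the job name that begins with a number.

```shell
# Hypothetical job names. The expression is single-quoted for the shell;
# the name beginning with a digit is double-quoted inside it.
# Printed only; no LSF cluster is assumed available.
cmd="bsub -w 'done(\"2013_run\") && started(cleanup_job)' my_post"
echo "$cmd"
```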

In dependency conditions, job names specify only your own jobs. By default, if you use the job name to specify a dependency condition, and more than one of your jobs has the same name, all of your jobs that have that name must satisfy the test. If JOB_DEP_LAST_SUB in lsb.params is set to 1, the test is done on the job submitted most recently.

Use double quotes (") around job names that begin with a number. In the job name, specify the wildcard character asterisk (*) at the end of a string, to indicate all jobs whose name begins with the string. For example, if you use jobA* as the job name, it specifies jobs named jobA, jobA1, jobA_test, jobA.log, etc.

Use the * with dependency conditions to define one-to-one dependency among job array elements such that each element of one array depends on the corresponding element of another array. The job array size must be identical.

For example:
bsub -w "done(myArrayA[*])" -J "myArrayB[1-10]" myJob2

indicates that before element 1 of myArrayB can start, element 1 of myArrayA must be completed, and so on.

You can also use the * to establish one-to-one array element dependencies with bmod after an array has been submitted.

If you want to specify array dependency by array name, set JOB_DEP_LAST_SUB in lsb.params. If you do not have this parameter set, the job is rejected if one of your previous arrays has the same name but a different index.

In dependency conditions, operator represents one of the following relational operators:

>

>=

<

<=

==

!=

Use the following conditions to form the dependency expression.
done(job_ID |"job_name" ...)

The job state is DONE.

LSF refers to the oldest job of job_name in memory.

ended(job_ID | "job_name")

The job state is EXIT or DONE.

exit(job_ID | "job_name" [,[operator] exit_code])

The job state is EXIT, and the job’s exit code satisfies the comparison test.

If you specify an exit code with no operator, the test is for equality (== is assumed).

If you specify only the job, any exit code satisfies the test.

external(job_ID | "job_name", "status_text")

The job has the specified job status. (Commands bstatus and bpost set, change, and retrieve external job status messages.)

If you specify the first word of the job status description (no spaces), the text of the job’s status begins with the specified word. Only the first word is evaluated.

job_ID | "job_name"

If you specify a job without a dependency condition, the test is for the DONE state (LSF assumes the “done” dependency condition by default).

numdone(job_ID, operator number | *)

For a job array, the number of jobs in the DONE state satisfies the test. Use * (with no operator) to specify all the jobs in the array.

numended(job_ID, operator number | *)

For a job array, the number of jobs in the DONE or EXIT states satisfies the test. Use * (with no operator) to specify all the jobs in the array.

numexit(job_ID, operator number | *)

For a job array, the number of jobs in the EXIT state satisfies the test. Use * (with no operator) to specify all the jobs in the array.

numhold(job_ID, operator number | *)

For a job array, the number of jobs in the PSUSP state satisfies the test. Use * (with no operator) to specify all the jobs in the array.

numpend(job_ID, operator number | *)

For a job array, the number of jobs in the PEND state satisfies the test. Use * (with no operator) to specify all the jobs in the array.

numrun(job_ID, operator number | *)

For a job array, the number of jobs in the RUN state satisfies the test. Use * (with no operator) to specify all the jobs in the array.

numstart(job_ID, operator number | *)

For a job array, the number of jobs in the RUN, USUSP, or SSUSP states satisfies the test. Use * (with no operator) to specify all the jobs in the array.

post_done(job_ID | "job_name")

The job state is POST_DONE (post-execution processing of the specified job has completed without errors).

post_err(job_ID | "job_name")

The job state is POST_ERR (post-execution processing of the specified job has completed with errors).

started(job_ID | "job_name")

The job state is:

  • USUSP, SSUSP, DONE, or EXIT

  • RUN and the job has a pre-execution command (bsub -E) that is done.
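The array counters above can be combined in a single expression. In this sketch, the job ID 123 stands for a previously submitted job array (a hypothetical ID): the follow-up job starts once at least 8 elements are DONE and none have exited abnormally.

```shell
# 123 is a hypothetical job ID of an existing job array;
# collect_results is a hypothetical command.
# Printed only; no LSF cluster is assumed available.
cmd="bsub -w 'numdone(123, >= 8) && numexit(123, == 0)' collect_results"
echo "$cmd"
```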

-wa 'signal'

Specifies the job action to be taken before a job control action occurs.

A job warning action must be specified with a job action warning time in order for job warning to take effect.

If -wa is specified, LSF sends the warning action to the job before the actual control action is taken. This allows the job time to save its result before being terminated by the job control action.

The warning action specified by -wa option overrides JOB_WARNING_ACTION in the queue. JOB_WARNING_ACTION is used as the default when no command line option is specified.

For example, the following specifies that 2 minutes before the job reaches its runtime limit, an URG signal is sent to the job:
bsub -W 60 -wt '2' -wa 'URG' myjob
-wt '[hour:]minute'

Specifies the amount of time before a job control action occurs that a job warning action is to be taken. Job action warning time is not normalized.

A job action warning time must be specified with a job warning action in order for job warning to take effect.

The warning time specified by the bsub -wt option overrides JOB_ACTION_WARNING_TIME in the queue. JOB_ACTION_WARNING_TIME is used as the default when no command line option is specified.

For example, the following specifies that 2 minutes before the job reaches its runtime limit, an URG signal is sent to the job:
bsub -W 60 -wt '2' -wa 'URG' myjob
-XF

Submits a job using SSH X11 forwarding.

A job submitted with SSH X11 forwarding cannot be used with job arrays, job chunks, or user account mapping.

Jobs with SSH X11 forwarding cannot be checked or modified by an esub.

Use -XF with -I to submit an interactive job using SSH X11 forwarding. The session displays throughout the job lifecycle.

Optionally, specify LSB_SSH_XFORWARD_CMD in lsf.conf. You can replace the default value with an SSH command (full PATH and options allowed).

Cannot be used with -K, -IX, or -r.

For more information, see the LSF Configuration Reference.
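A sketch of an interactive X11-forwarded submission; xclock is a stand-in for any X11 program, not part of the option's syntax.

```shell
# xclock stands in for any X11 program; -I makes the job interactive
# and -XF tunnels its display back through SSH X11 forwarding.
# Printed only; no LSF cluster is assumed available.
cmd='bsub -XF -I xclock'
echo "$cmd"
```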

-Zs

Spools a job command file to the directory specified by the JOB_SPOOL_DIR parameter in lsb.params, and uses the spooled file as the command file for the job.

By default, the command file is spooled to LSB_SHAREDIR/cluster_name/lsf_cmddir. If the lsf_cmddir directory does not exist, LSF creates it before spooling the file. LSF removes the spooled file when the job completes.

If JOB_SPOOL_DIR is specified, the -Zs option spools the command file to the specified directory and uses the spooled file as the input file for the job.

JOB_SPOOL_DIR can be any valid path up to a maximum length of 4094 characters on UNIX and Linux, or up to 255 characters on Windows.

JOB_SPOOL_DIR must be readable and writable by the job submission user, and it must be shared by the master host and the submission host. If the specified directory is not accessible or does not exist, bsub -Zs cannot write to the default directory LSB_SHAREDIR/cluster_name/lsf_cmddir and the job fails.

The -Zs option is not supported for embedded job commands because LSF is unable to determine the first command to be spooled in an embedded job command.
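A minimal sketch of command-file spooling (my_script is a hypothetical script name). Because -Zs copies the file to the spool directory at submission time, edits made to my_script afterwards do not affect the already-submitted job:

```shell
# my_script is a hypothetical script. -Zs spools a snapshot of the
# command file, isolating the job from later edits to the original.
# Printed only; no LSF cluster is assumed available.
cmd='bsub -Zs my_script'
echo "$cmd"
```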

-h

Prints command usage to stderr and exits.

-V

Prints LSF release version to stderr and exits.

command [argument]

The job can be specified by a command line argument command, or through the standard input if the command is not present on the command line. The command can be anything that is provided to a UNIX Bourne shell (see sh(1)). command is assumed to begin with the first word that is not part of a bsub option. All arguments that follow command are provided as the arguments to the command. Use single quotation marks around the expression if the command or arguments contain special characters.

The job command can be up to 4094 characters long for UNIX and Linux or up to 255 characters for Windows. If no job name is specified with -J, bjobs, bhist, and bacct display the command as the job name.

If the job is not given on the command line, bsub reads the job commands from standard input. If the standard input is a controlling terminal, the user is prompted with bsub> for the commands of the job. The input is terminated by entering CTRL-D on a new line. You can submit multiple commands through standard input.

The commands are executed in the order in which they are given. bsub options can also be specified in the standard input if the line begins with #BSUB; for example, #BSUB -x. If an option is given on both the bsub command line, and in the standard input, the command line option overrides the option in the standard input. The user can specify the shell to run the commands by specifying the shell path name in the first line of the standard input, such as #!/bin/csh. If the shell is not given in the first line, the Bourne shell is used. The standard input facility can be used to spool a user’s job script; such as bsub < script.

Output

If the job is successfully submitted, displays the job ID and the queue to which the job has been submitted.

Examples

bsub sleep 100

Submit the UNIX command sleep together with its argument 100 as a job.

bsub -q short -o my_output_file "pwd; ls"

Submit the UNIX commands pwd and ls as a job to the queue named short and store the job output in my_output_file.

bsub -m "host1 host3 host8 host9" my_program

Submit my_program to run on one of the candidate hosts: host1, host3, host8, and host9.

bsub -q "queue1 queue2 queue3" -c 5 my_program

Submit my_program to one of the candidate queues: queue1, queue2, and queue3, selected according to the CPU time limit specified by -c 5.

bsub -I ls

Submit an interactive job that displays the output of ls at the user’s terminal.

bsub -Ip vi myfile

Submit an interactive job to edit myfile.

bsub -Is csh

Submit an interactive job that starts csh as an interactive shell.

bsub -b 20:00 -J my_job_name my_program

Submit my_program to run after 8 p.m. and assign it the job name my_job_name.

bsub my_script

Submit my_script as a job. Since my_script is specified as a command line argument, the my_script file is not spooled. Later changes to the my_script file before the job completes may affect this job.

bsub < default_shell_script

where default_shell_script contains:

sim1.exe
sim2.exe

The file default_shell_script is spooled, and the commands are run under the Bourne shell since a shell specification is not given in the first line of the script.

bsub < csh_script

where csh_script contains:

#!/bin/csh
sim1.exe
sim2.exe

csh_script is spooled and the commands are run under /bin/csh.

bsub -q night < my_script

where my_script contains:

#!/bin/sh
#BSUB -q test
#BSUB -m "host1 host2" # my default candidate hosts
#BSUB -f "input > tmp" -f "output << tmp"
#BSUB -D 200 -c 10/host1
#BSUB -t 13:00
#BSUB -k "dir 5"
sim1.exe
sim2.exe

The job is submitted to the night queue instead of test, because the command line overrides the script.

bsub -b 20:00 -J my_job_name

bsub> sleep 1800
bsub> my_program
bsub> CTRL-D

The job commands are entered interactively.

bsub -T 4 myjob

Submits myjob with a maximum number of concurrent threads of 4.

bsub -W 15 -sla Duncan sleep 100

Submit the UNIX command sleep together with its argument 100 as a job to the service class named Duncan.

The following example submits an IBM PE job and assumes two hosts in the cluster, hostA and hostB, each with 4 cores and 2 networks. Each network has one IB adapter with 64 windows.

bsub -n2 -R "span[ptile=1]" -network "protocol=mpi,lapi: type=sn_all: instances=2: usage=shared" poe /home/user1/mpi_prog

For this job running on hostA and hostB, each task reserves 8 windows (2 x 2 x 2): 2 protocols, 2 instances, and 2 networks. If enough network windows are available, other network jobs with usage=shared can run on hostA and hostB, because the networks used by this job are shared.

Limitations

When using account mapping, the command bpeek does not work. File transfer via the -f option to bsub requires rcp to be working between the submission and execution hosts. Use the -N option to request mail, and/or the -o and -e options to specify an output file and error file, respectively.

See also

bjobs, bkill, bqueues, bhosts, bmgroup, bmod, bchkpnt, brestart, bgadd, bgdel, bjgroup, sh, getrlimit, sbrk, libckpt.a, lsb.users, lsb.queues, lsb.params, lsb.hosts, lsb.serviceclasses, mbatchd