Metajob Plugin Configuration and User Manual

This document details the configuration and use of the 3G Bridge Metajob feature.

User Manual

The Metajob feature of the 3G Bridge enables the user to submit a batch of jobs exactly as they would a single job. Because of this transparency, the Metajob feature can be used even when jobs are submitted to the Bridge through the gLite infrastructure.

Formally, a Metajob is a 3G Bridge job with an extra input file, called the Metajob file, whose name has the prefix '_3gb-metajob'. This extra input file contains the definition of the sub-jobs of the Metajob.

The Metajob itself is used as a template for its sub-jobs. The Metajob file contains instructions that modify the current template, and instructions to create sub-jobs from the current template.

Submitting a Metajob

To submit a Metajob, you must first create the Metajob definition and then submit it to the 3G Bridge. Because the information specified in the submission is used as the initial template for the sub-jobs, the example submission is presented first:

wsclient -m add -e 'http://example.com:8091/' \
  -g test -n app \
  -i _3gb-metajob-example=http://example.com/inputs/_3gb-metajob-example \
  -i alpha.txt=http://example.com/inputs/alpha_1.txt \
  -i beta.txt=http://example.com/inputs/beta_1.txt \
  -a '--p=1 --in1=alpha.txt --in2=beta.txt' \
  -o result.txt \
  -o stats.txt

Notice that aside from the extra input file '_3gb-metajob-example', this submission is just an ordinary job submission. All attributes, including the target queue (grid+algorithm name), are the same as if you were submitting a single job. This submission, excluding the Metajob definition file, will be used as the initial template. The current state of the template can be changed in the Metajob definition:

_3gb-metajob-example:

...
Input=alpha.txt=http://example.com/inputs/alpha_1.txt=76ebd8e3bcb00ca8a47db97cabfc6f15=1536
Input=beta.txt=http://example.com/inputs/beta_1.txt=6e70ca01869f23676840f7d9fb51312a=9586
Arguments=--p=2 --in1=alpha.txt --in2=beta.txt
...

These commands can be given multiple times; each occurrence changes the current template, overwriting its previous state. When the current template describes a sub-job we want to submit, a sub-job (or several identical sub-jobs) can be instantiated with the Queue command:

Queue

or, for example,

Queue 10

Note that:

  1. For the input files, only their location can be redefined. The 'Input=alpha.txt' command sets the location of alpha.txt in the current template (same for beta.txt). No new input files can be defined and none of them can be removed from the template. This implies that
    1. All sub-jobs have to have the same logical set of input files.
    2. Only remote files can be used as input files.
  2. The output file set cannot be changed either.
  3. Remote files can be specified with the BOINC syntax (with MD5 and size).
  4. The Arguments command does not need parentheses.
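
The template mechanism described above can be modelled in a few lines of Python. This is only an illustrative sketch, not the parser used by the Bridge: the function name unfold, the dict-based template and the error handling are assumptions made for the example, and lines starting with % are simply skipped.

```python
import copy

def unfold(template, metajob_text):
    """Model of how a Metajob file turns one template into sub-jobs.

    `template` is a dict with 'inputs' (logical name -> location) and
    'arguments' (a string). This structure is hypothetical.
    """
    current = copy.deepcopy(template)
    subjobs = []
    for raw in metajob_text.splitlines():
        line = raw.strip()
        if not line or line.startswith('%'):
            continue  # blank lines and %Minimum/%Maximum are not modelled here
        if line.startswith('Input='):
            # Split only once: the location may itself contain '=' (MD5, size).
            name, location = line[len('Input='):].split('=', 1)
            if name not in current['inputs']:
                raise ValueError('new input files cannot be defined: ' + name)
            current['inputs'][name] = location
        elif line.startswith('Arguments='):
            current['arguments'] = line[len('Arguments='):]
        elif line.split()[0] == 'Queue':
            parts = line.split()
            count = int(parts[1]) if len(parts) > 1 else 1
            # Each sub-job is a snapshot of the template at this point.
            subjobs.extend(copy.deepcopy(current) for _ in range(count))
        else:
            raise ValueError('unrecognised line: ' + line)
    return subjobs
```

Because every Queue takes a snapshot of the current template, a later Input= or Arguments= line affects only the sub-jobs queued after it.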

Controlling the execution of the batch

As the Metajob is itself a 3G Bridge job, it has a status attribute, which must be determined from the states of its sub-jobs. The trivial case is when all sub-jobs have finished successfully; in this case, the status of the Metajob is FINISHED. The case when all sub-jobs have failed is equally trivial. But what happens when some of the sub-jobs have finished successfully while others have failed? The 3G Bridge lets the user control its behaviour in these intermediate cases: in the Metajob file, the user can specify the minimum and maximum number of sub-jobs that need to finish successfully.

The lower limit tells the Bridge that the whole Metajob has to be considered failed, and no output is produced, if fewer than this number of sub-jobs finish successfully. If the number of failed sub-jobs reaches the point where the lower limit can no longer be reached, the Bridge cancels all pending sub-jobs early and the Metajob fails immediately. This is useful when only the result of the whole batch is of value and partial results are not needed. If this limit is set to 1, any successful sub-result will be available after the Metajob has finished.

The upper limit tells the Bridge that no more than this number of sub-results is needed. If the number of successfully finished sub-jobs reaches this number, the Bridge cancels all pending sub-jobs, and the Metajob finishes immediately. This is useful for introducing redundancy at user level when the results are interchangeable (for example, Monte Carlo simulations).
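
The effect of the two limits can be summarised as a small decision function. The following Python sketch is a model of the rules described above, not the Bridge's code: the function name, the status names other than FINISHED, and the order of the checks are assumptions.

```python
def metajob_outcome(total, minimum, maximum, succeeded, failed):
    """Return 'FINISHED', 'FAILED' or 'RUNNING' for a Metajob.

    `total` is the number of sub-jobs defined; `minimum` and `maximum` are
    the limits as absolute counts. Illustrative model only.
    """
    pending = total - succeeded - failed
    if succeeded >= maximum:
        return 'FINISHED'  # upper limit reached: pending sub-jobs are cancelled
    if succeeded + pending < minimum:
        return 'FAILED'    # lower limit can no longer be reached
    if pending == 0:
        return 'FINISHED'  # all sub-jobs ended and the lower limit is met
    return 'RUNNING'
```

With both limits left at the total number of sub-jobs, a single failed sub-job already makes the lower limit unreachable, so in this model the whole Metajob fails at once.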

These limits are specified in the Metajob file; %Minimum and %Maximum can each be specified once. The term All and percentages refer to the number of sub-jobs defined in the Metajob file. The default value for both limits is All. The following lines show the accepted forms:

%Minimum 5
%Minimum 20%
%Minimum All
%Maximum 50
%Maximum 80%
%Maximum All
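
To make the three forms concrete, the following Python sketch converts a limit value into an absolute number of sub-jobs. It is a hedged illustration only: the function name is hypothetical, whole-number percentages are assumed, and rounding percentages up is an assumption, as this manual does not state how the Bridge rounds.

```python
def resolve_limit(spec, total):
    """Convert '5', '20%' or 'All' into an absolute sub-job count."""
    spec = spec.strip()
    if spec == 'All':
        return total
    if spec.endswith('%'):
        percent = int(spec[:-1])
        # Integer ceiling division; rounding up is an assumption.
        return -(-total * percent // 100)
    return int(spec)
```

For a Metajob file that defines 400 sub-jobs, this model maps '%Maximum 80%' to 320 sub-jobs and '%Minimum All' to 400.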

Example Metajob submission

The following example shows a correct Metajob submission. The first Queue creates 100 instances of the initial template, matching the submission information. The next 100 sub-jobs have the source of alpha.txt changed, with all other attributes unchanged, and so on.

wsclient -m add -e 'http://example.com:8091/' \
  -g test -n app \
  -i _3gb-metajob-example=http://example.com/inputs/_3gb-metajob-example \
  -i alpha.txt=http://example.com/inputs/alpha_1.txt \
  -i beta.txt=http://example.com/inputs/beta_1.txt \
  -a '--p=1 --in1=alpha.txt --in2=beta.txt' \
  -o result.txt \
  -o stats.txt

_3gb-metajob-example:

%Minimum 1
%Maximum 80%

Queue 100

Input=alpha.txt=http://example.com/inputs/alpha_2.txt=76ebd8e3bcb00ca8a47db97cabfc6f15=1536

Queue 100

Input=beta.txt=http://example.com/inputs/beta_2.txt=6e70ca01869f23676840f7d9fb51312a=9586

Queue 100

Arguments=--p=2 --in1=alpha.txt --in2=beta.txt

Queue 100
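
For larger batches, a Metajob file like the one above is easier to generate than to write by hand. The following Python sketch builds one for a sweep over the --p argument; the function name, its parameters and the choice of sweeping --p are assumptions made for illustration, and only the commands documented in this manual are emitted.

```python
def build_metajob(p_values, copies, minimum='All', maximum='All'):
    """Return the text of a Metajob file queuing `copies` sub-jobs per value.

    Hypothetical helper: it varies only the --p argument of the running
    example and leaves the input files of the template untouched.
    """
    lines = ['%Minimum ' + str(minimum), '%Maximum ' + str(maximum), '']
    for p in p_values:
        lines.append('Arguments=--p=%d --in1=alpha.txt --in2=beta.txt' % p)
        lines.append('Queue %d' % copies)
        lines.append('')
    return '\n'.join(lines)
```

The returned text would then be saved under a name starting with '_3gb-metajob' and published at a URL that can be passed to wsclient with -i, as in the submission above.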

Gathering information about the Metajob

The status of a submitted Metajob can be queried in the same way as that of a normal job (wsclient -m status). However, the status of a Metajob is only an aggregation of the statuses of its sub-jobs. Detailed information about the sub-jobs is provided through a URL, which is stored in the Metajob's griddata attribute:

wsclient -m griddata -e 'http://example.com:8091/' -j <MetaJobID>

Obtaining results

The output files produced by sub-jobs cannot be accessed separately. Instead, when the Metajob has finished successfully, an archive containing the output of the successful sub-jobs is created for each logical output file. In our running example, two output files will be created, result.txt and stats.txt; each is a .tar.gz archive (without the extension) containing the respective output files of the sub-jobs.

wsclient -m griddata -e 'http://example.com:8091/' -j <MetaJobID>

Output:

<MetaJobID> result.txt http://example.com/download/<...>/result.txt
<MetaJobID> stats.txt http://example.com/download/<...>/stats.txt

To obtain the results, download and extract them:

wget http://example.com/download/<...>/result.txt
wget http://example.com/download/<...>/stats.txt
tar xzf result.txt
tar xzf stats.txt

The result of these commands is a directory structure in which the directory names are UUIDs (those of the sub-jobs), grouped under higher-level directories named after the first two characters of the UUID. Each sub-job directory contains the corresponding result.txt and stats.txt.

tree
.
├── 0b/
│   └── 0b3e6bd3-f8b4-4d60-84a9-8c9c5855ffce/
│       ├── result.txt
│       └── stats.txt
├── 26/
│   └── 263a6140-cf01-4889-ad21-210ecd3d41c4/
│       ├── result.txt
│       └── stats.txt
├── 47/
│   ├── 475ea5d0-982d-45d2-8260-7eb8493851e5/
│   │   ├── result.txt
│   │   └── stats.txt
│   └── 4787ab3d-594f-4691-96db-b826c3d63b61/
│       ├── result.txt
│       └── stats.txt
└── 82/
    └── 826640ec-f67a-4a1f-a937-35edcd00d4b3/
        ├── result.txt
        └── stats.txt
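
Post-processing usually starts by mapping each sub-job to its output files. The following Python sketch extracts the downloaded archives and walks the two-level directory layout shown above; the function names and the destination directory are assumptions, and only the Python standard library is used.

```python
import os
import tarfile

def extract_outputs(archives, dest):
    """Extract each downloaded archive (a gzip-compressed tar despite its name)."""
    for path in archives:
        with tarfile.open(path, 'r:gz') as tar:
            tar.extractall(dest)

def collect(dest):
    """Map sub-job UUID -> {output file name: path} from the extracted tree."""
    results = {}
    for prefix in sorted(os.listdir(dest)):
        prefix_dir = os.path.join(dest, prefix)
        if not os.path.isdir(prefix_dir):
            continue  # skip plain files, such as the archives themselves
        for uuid in sorted(os.listdir(prefix_dir)):
            job_dir = os.path.join(prefix_dir, uuid)
            if os.path.isdir(job_dir):
                results[uuid] = {name: os.path.join(job_dir, name)
                                 for name in sorted(os.listdir(job_dir))}
    return results
```

Calling extract_outputs(['result.txt', 'stats.txt'], 'out') followed by collect('out') would yield one entry per successful sub-job, keyed by its UUID.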

Configuration

The Metajob feature is implemented as a plugin for the 3G Bridge. To enable the Metajob feature:

  • A queue handler must be defined in the configuration.
  • A queue using this handler must be defined in the 3G Bridge database for each algorithm.

The name of the queue must be 'Metajob' (case-sensitive).

mysql boinc_<projectname> << EOF
INSERT INTO cg_algqueue (grid, alg, batchsize) VALUES ('Metajob', 'algname_1', 10);
INSERT INTO cg_algqueue (grid, alg, batchsize) VALUES ('Metajob', 'algname_2', 10);
INSERT INTO cg_algqueue (grid, alg, batchsize) VALUES ('Metajob', 'algname_3', 10);
...
EOF

3g-bridge.conf:

...
[Metajob]
handler = Metajob
maxJobsAtOnce = 100

The only configuration option is maxJobsAtOnce, which determines the number of sub-jobs to unfold in one run. After unfolding this many jobs, the state is saved and control is returned to the QueueManager. In the next iteration, the unfolding continues from the saved state. Higher values allow faster unfolding of individual Metajobs, but may impair the overall throughput of the QueueManager.
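
The resumable unfolding can be pictured with the following Python sketch. It is a simplified model rather than the plugin's implementation: the function name and the (Queue index, instances created) representation of the saved state are assumptions.

```python
def unfold_step(queue_counts, state, max_jobs_at_once):
    """Create at most `max_jobs_at_once` sub-jobs, resuming from `state`.

    `queue_counts` lists the instance count of each Queue command in order;
    `state` is (index of the current Queue command, instances already made).
    Returns (batch of (queue index, instance number), new state, done flag).
    """
    index, created = state
    batch = []
    while index < len(queue_counts) and len(batch) < max_jobs_at_once:
        if created < queue_counts[index]:
            batch.append((index, created))
            created += 1
        else:
            index, created = index + 1, 0
    # Skip Queue commands that are already exhausted so that the done flag
    # is accurate even when the batch filled up exactly at a boundary.
    while index < len(queue_counts) and created >= queue_counts[index]:
        index, created = index + 1, 0
    return batch, (index, created), index >= len(queue_counts)
```

In this model, the Metajob from the example (four Queue commands of 100 instances each) with maxJobsAtOnce = 100 is fully unfolded after four iterations, and other work can be scheduled between the iterations.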

Don't forget to configure the DownloadManager plugin too, if you need the Metajob functionality.

manual/3gbplugin-metajob.txt · Last modified: 2013/01/18 09:22 by a.visegradi