The Job Pack

The job pack contains all the information needed for creating and running a Disco job.

The first time any task of a job executes on a Disco node, the job pack for the job is retrieved from the master, and The Job Home is unzipped into a job-specific directory.

See also

The Python disco.job.JobPack class.

File format:

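+---------------- 4
| magic / version |
|---------------- 8 -------------- 12 ------------- 16 ------------- 20
| jobdict offset  | jobenvs offset | jobhome offset | jobdata offset |
|------------------------------------------------------------------ 128
|                           ... reserved ...                         |
|--------------------------------------------------------------------|
|                               jobdict                              |
|--------------------------------------------------------------------|
|                               jobenvs                              |
|--------------------------------------------------------------------|
|                               jobhome                              |
|--------------------------------------------------------------------|
|                               jobdata                              |
+--------------------------------------------------------------------+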
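The layout can be illustrated with a minimal Python sketch that packs and unpacks the 128-byte header. The field widths (4 bytes each) and the 128-byte header size come from the diagram; the big-endian byte order and the magic value used here are assumptions made for illustration and are not stated in this document.

```python
import struct

HEADER_SIZE = 128            # from the diagram: the sections start at byte 128
HEADER_FORMAT = "!IIIII"     # ASSUMPTION: five big-endian unsigned 32-bit fields
EXAMPLE_MAGIC = 0x12345678   # HYPOTHETICAL placeholder, not Disco's real magic/version


def pack_jobpack(jobdict, jobenvs, jobhome, jobdata, magic=EXAMPLE_MAGIC):
    """Concatenate the four sections (bytes) behind a 128-byte header."""
    sections = [jobdict, jobenvs, jobhome, jobdata]
    offsets = []
    position = HEADER_SIZE
    for section in sections:
        offsets.append(position)   # each section starts where the previous one ended
        position += len(section)
    header = struct.pack(HEADER_FORMAT, magic, *offsets)
    # Fill the reserved area with zero bytes up to byte 128.
    header = header.ljust(HEADER_SIZE, b"\0")
    return header + b"".join(sections)


def unpack_jobpack(pack):
    """Split a job pack back into its sections using the header offsets."""
    size = struct.calcsize(HEADER_FORMAT)   # 20 bytes
    magic, dict_off, envs_off, home_off, data_off = struct.unpack(
        HEADER_FORMAT, pack[:size])
    return {
        "magic": magic,
        "jobdict": pack[dict_off:envs_off],
        "jobenvs": pack[envs_off:home_off],
        "jobhome": pack[home_off:data_off],
        "jobdata": pack[data_off:],   # the last section runs to the end of the pack
    }
```

Each section ends where the next one begins, so the four offsets are enough to recover every section without storing lengths.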

The Job Dict

The job dict is a JSON dictionary.


jobdict.input

A list of urls or a list of lists of urls. Each url is a string.


An inner list of urls gives replica urls for the same data. This lets you specify redundant versions of an input file. If a list of redundant inputs is specified, the scheduler chooses the input that is located on the node with the lowest load at the time of scheduling. Redundant inputs are tried one by one until the task succeeds. Redundant inputs require that jobdict.map? is specified.
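The two accepted input shapes can be illustrated with a small Python sketch that turns either form into a list of replica lists. The helper name normalize_inputs and the example hostnames are hypothetical; the helper also tolerates a mix of both forms, which this document does not say is allowed.

```python
def normalize_inputs(inputs):
    """Return every input as a list of replica urls (hypothetical helper)."""
    normalized = []
    for item in inputs:
        # A plain url string is treated as an input with a single replica.
        replicas = [item] if isinstance(item, str) else list(item)
        if not replicas or not all(isinstance(url, str) for url in replicas):
            raise ValueError(
                "each input must be a url string or a non-empty list of url strings")
        normalized.append(replicas)
    return normalized


# One input without replicas, and one input available from two nodes.
plain = normalize_inputs(["http://node1.example/part-0"])
redundant = normalize_inputs(
    [["http://node1.example/part-1", "http://node2.example/part-1"]])
```

After normalization, every entry has the same shape, so code that tries replicas one by one needs only a single loop.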


jobdict.worker

The path to the worker binary, relative to the job home. The master will execute this binary after it unpacks it from The Job Home.

jobdict.map?

Boolean telling whether or not this job should have a map phase.


jobdict.reduce?

Boolean telling whether or not this job should have a reduce phase.


jobdict.nr_reduces

Non-negative integer telling the master how many reduces to run.


This attribute will soon be removed, as the number of reduces can be inferred in all cases.


jobdict.prefix

String giving the prefix the master should use for assigning a unique job name.


Only characters in [a-zA-Z0-9_] are allowed in the prefix.
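A client can check the prefix before submitting the job. The following Python sketch uses a hypothetical helper, is_valid_prefix; rejecting the empty string is an assumption, since this document only restricts which characters may appear.

```python
import re

# Character class taken from the rule above; "+" (non-empty) is an ASSUMPTION.
PREFIX_PATTERN = re.compile(r"[a-zA-Z0-9_]+")


def is_valid_prefix(prefix):
    """Return True if the whole prefix consists of allowed characters."""
    return PREFIX_PATTERN.fullmatch(prefix) is not None
```

With this check, a prefix such as word_count_2 passes, while word-count is rejected because of the hyphen.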


jobdict.scheduler

Dictionary of options for the job scheduler. Currently supports the following keys:

  • max_cores - use at most this many cores (applies to both map and reduce). Default is 2**31.
  • force_local - always run task on the node where input data is located; never use HTTP to access data remotely.
  • force_remote - never run task on the node where input data is located; always use HTTP to access data remotely.

New in version 0.2.4.


jobdict.owner

String name of the owner of the job.
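As an illustration, a complete job dict could be assembled and serialized in Python as sketched below. The key names (input, worker, map?, reduce?, nr_reduces, prefix, scheduler, owner) are assumptions based on the attributes described above, and every value, including the hostnames, the worker path, and the owner, is a hypothetical example.

```python
import json

jobdict = {
    # One input with two replica urls (hypothetical hosts).
    "input": [["http://node1.example/part-0", "http://node2.example/part-0"]],
    # Path relative to the job home (hypothetical location).
    "worker": "bin/worker",
    "map?": True,
    "reduce?": True,
    "nr_reduces": 1,
    # Only characters in [a-zA-Z0-9_] are allowed here.
    "prefix": "word_count",
    "scheduler": {"max_cores": 4},
    "owner": "alice",
}

# The job dict is stored in the job pack as JSON.
encoded_jobdict = json.dumps(jobdict).encode("utf-8")
```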

Job Environment Variables

A JSON dictionary of environment variables (with string keys and string values). The master will set these in the environment before running the jobdict.worker.
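Because both keys and values must be strings, it can help to validate the environment before encoding it. The following Python sketch uses a hypothetical helper, encode_jobenvs; the variable names in the usage example are illustrative only.

```python
import json


def encode_jobenvs(envs):
    """Serialize environment variables to JSON, insisting on string keys and values."""
    for key, value in envs.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError(
                "environment variable %r must have a string key and a string value" % (key,))
    return json.dumps(envs).encode("utf-8")


# Numbers must be passed as strings, for example "4" rather than 4.
encoded_jobenvs = encode_jobenvs({"PYTHONPATH": "lib", "WORKER_THREADS": "4"})
```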

The Job Home

The job home directory serialized into ZIP format. The master will unzip this before running the jobdict.worker. The worker is run with this directory as its working directory.

In addition to the worker executable, the job home can be populated with files that the worker needs at runtime, such as shared libraries, helper scripts, or parameter data.


The .disco subdirectory of the job home is reserved by Disco.
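A job home archive can be built in memory with Python's zipfile module, as in the sketch below. The helper name build_jobhome and the file names are hypothetical. Note that ZipFile.writestr called with a plain name does not record executable permission bits, so how the worker ends up executable after unpacking is outside the scope of this sketch.

```python
import io
import zipfile


def build_jobhome(files):
    """Serialize {relative path: bytes content} into an in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for path, content in files.items():
            # The .disco subdirectory is reserved, so refuse to write into it.
            if path.split("/")[0] == ".disco":
                raise ValueError("the .disco subdirectory is reserved by Disco")
            archive.writestr(path, content)
    return buffer.getvalue()


jobhome = build_jobhome({
    "bin/worker": b"#!/bin/sh\necho hypothetical worker\n",
    "lib/params.json": b"{}",
})
```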

The job home is shared by all tasks of the same job on the same node. That is, if the job requires two map task and two reduce task executions on a particular node, then the job home will be unpacked only once on that node, but the worker executable will be executed four times in the job home directory, and it is also possible for some of these executions to be concurrent. Thus, the worker should take care to use unique filenames as needed.
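One way for a worker to avoid filename collisions between concurrent executions is to let the tempfile module pick the names. The sketch below is one possible approach, not something Disco prescribes; the helper name open_scratch_file and the prefix are hypothetical.

```python
import os
import tempfile


def open_scratch_file(directory=None, prefix="task-"):
    """Create and open a uniquely named scratch file.

    By default the file is created in the current working directory,
    which for a worker is the job home.
    """
    return tempfile.NamedTemporaryFile(
        mode="w+b",
        prefix=prefix,
        dir=directory or os.getcwd(),
        delete=False,   # keep the file after it is closed
    )
```

The file is created atomically under a name that did not exist before, so two task executions calling this helper at the same time receive different files.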

Additional Job Data

Arbitrary data included in the job pack, used by the worker. A running worker can access the job pack at the path specified by jobfile in the response to the TASK message.
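Given the path from jobfile, a worker could locate the additional job data through the header offsets, as in the Python sketch below. It assumes big-endian 32-bit header fields (not stated in this document) and relies on jobdata being the last section of the file; the helper name read_jobdata is hypothetical, and parsing the TASK message itself is not shown.

```python
import struct


def read_jobdata(jobfile):
    """Return the raw jobdata section of the job pack stored at path jobfile."""
    with open(jobfile, "rb") as pack:
        header = pack.read(20)   # magic/version plus four offsets, 4 bytes each
        if len(header) < 20:
            raise ValueError("file is too short to be a job pack")
        # ASSUMPTION: the fields are big-endian unsigned 32-bit integers.
        _magic, _dict_off, _envs_off, _home_off, data_off = struct.unpack(
            "!IIIII", header)
        pack.seek(data_off)
        return pack.read()       # jobdata is the last section, so read to the end
```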

Creating and submitting a Job Pack

The job pack can be constructed and submitted using the disco job command.
