disco.job – Disco Jobs
This module contains the core objects for creating and interacting with Disco jobs. Often, Job is the only thing you need in order to start running distributed computations with Disco.
Jobs in Disco are used to encapsulate and schedule computation pipelines. A job specifies a worker, the worker environment, a list of inputs, and some additional information about how to run the job. For a full explanation of how the job is specified to the Disco master, see The Job Pack.
A typical pattern in Disco scripts is to run a job synchronously, that is, to block the script until the job has finished. This can be accomplished using the Job.wait() method:
from disco.job import Job
results = Job(name).run(**jobargs).wait()
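As a fuller illustration of this pattern, the sketch below counts words. It assumes Disco is installed, a master is configured, and the classic worker's `input`, `map`, and `reduce` job arguments are in use. The names `map_words`, `reduce_counts`, and `count_words`, and the job name "wordcount", are placeholders chosen for this example. The Disco imports are deferred into `count_words` so that the map and reduce functions can be tried locally without a cluster.

```python
def map_words(line, params):
    # Emit a (word, 1) pair for every whitespace-separated word in the line.
    for word in line.split():
        yield word, 1


def reduce_counts(pairs, params):
    # Sum the counts for each word; sorting makes equal words adjacent.
    from itertools import groupby
    from operator import itemgetter
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def count_words(inputs):
    # Requires Disco to be installed and a master to be configured.
    from disco.core import result_iterator
    from disco.job import Job

    # run() submits the job; wait() blocks until it has finished.
    results = Job("wordcount").run(input=inputs,
                                   map=map_words,
                                   reduce=reduce_counts).wait()
    return dict(result_iterator(results))
```

Calling `count_words` with a list of input URLs would block until the job completes and then return a dictionary of word counts.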
- class disco.job.Job(name=None, master=None, worker=None, settings=None)
Creates a Disco Job with the given name, master, worker, and settings. Use Job.run() to start the job.
Parameters:
- name (string) – the job name. When you create a handle for an existing job, the name is used as given. When you create a new job, the given name is used as the jobdict.prefix from which a unique name is constructed; that unique name is then stored in the instance.
- master (url of master or disco.core.Disco) – the Disco master to use for submitting or querying the job.
- worker (disco.worker.Worker) – the worker instance used to create and run the job. If none is specified, the job creates a worker using its Job.Worker attribute.
- Worker
Defaults to disco.worker.classic.worker.Worker. If no worker parameter is specified, Worker is called with no arguments to construct the worker.
Note
Because of the mechanism used to submit jobs to the Disco cluster, the submitted job class cannot belong to the __main__ module; it must be qualified with a module name. See examples/faq/chain.py for a simple solution that covers most cases.
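One way to catch the restriction described in the note early is to check a class's module before submitting it. The helper below is hypothetical and not part of Disco; it only inspects `__module__`, which is where Python records the module a class was defined in. The stand-in classes and the module name `myjobs` are made up for the example.

```python
def check_submittable(job_class):
    # Hypothetical helper, not a Disco API: reject classes defined in the
    # script being executed, since their module name is '__main__'.
    if job_class.__module__ == "__main__":
        raise ValueError(
            "%s is defined in __main__; move it to an importable module"
            % job_class.__name__)
    return "%s.%s" % (job_class.__module__, job_class.__name__)


# Stand-in classes built with type() so the module name is explicit.
ScriptJob = type("ScriptJob", (object,), {"__module__": "__main__"})
ModuleJob = type("ModuleJob", (object,), {"__module__": "myjobs"})
```

Here `check_submittable(ModuleJob)` returns the qualified name, while `check_submittable(ScriptJob)` raises `ValueError`.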
- proxy_functions = ('clean', 'events', 'kill', 'jobinfo', 'jobpack', 'oob_get', 'oob_list', 'profile_stats', 'purge', 'results', 'stageresults', 'wait')
These methods of disco.core.Disco, each of which takes a jobname as its first argument, are also accessible through the Job object.
For instance, you can use job.wait() instead of disco.wait(job.name). The job methods in disco.core.Disco come in handy if you want to manipulate a job that is identified by a jobname instead of a Job object.
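The forwarding described above can be pictured with a small stand-alone sketch. It is not Disco's source: `FakeMaster` and `ProxyingJob` are hypothetical stand-ins, and the use of `__getattr__` with `functools.partial` is one plausible way to bind the job name as the first argument.

```python
from functools import partial


class FakeMaster(object):
    # Stand-in for disco.core.Disco; it only reports what it was asked to do.
    def wait(self, jobname):
        return "waited for %s" % jobname

    def kill(self, jobname):
        return "killed %s" % jobname


class ProxyingJob(object):
    # Illustrative subset of the proxied method names.
    proxy_functions = ("kill", "wait")

    def __init__(self, name, master):
        self.name = name
        self.master = master

    def __getattr__(self, attr):
        # Called only when normal attribute lookup fails.
        if attr in self.proxy_functions:
            # Bind the job name as the first argument of the master method.
            return partial(getattr(self.master, attr), self.name)
        raise AttributeError(attr)


job = ProxyingJob("wordcount@1", FakeMaster())
```

With this sketch, `job.wait()` and `job.master.wait(job.name)` return the same value, mirroring the equivalence between `job.wait()` and `disco.wait(job.name)`.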
- run(**jobargs)
Creates the JobPack for the worker using disco.worker.Worker.jobdict(), disco.worker.Worker.jobenvs(), disco.worker.Worker.jobhome(), and disco.task.jobdata(), then attempts to submit it. This method executes on the client that submits the job. More information on how job inputs are specified is available in disco.worker.Worker.jobdict(). The default worker implementation is called classic, and is implemented by disco.worker.classic.worker.
Parameters: jobargs (dict) – runtime parameters for the job. Passed to the disco.worker.Worker methods listed above, along with the job itself. The interpretation of the jobargs is performed by the worker interface in disco.worker.Worker and the class implementing that interface (which defaults to disco.worker.classic.worker).
Raises: disco.error.JobError if the submission fails.
Returns: the Job, with a unique name assigned by the master.
- class disco.job.JobPack(version, jobdict, jobenvs, jobhome, jobdata)
This class implements The Job Pack in Python. The attributes correspond to the fields in the job pack file. Use dumps() to serialize the JobPack for sending to the master.
- jobdict
The dictionary of job parameters for the master.
See also The Job Dict.
- jobenvs
The dictionary of environment variables to set before the worker is run.
See also Job Environment Variables.
- jobhome
The zipped archive to use when initializing the job home. This field should contain the contents of the serialized archive.
See also The Job Home.
- jobdata
Binary data that the builtin disco.worker.Worker uses for serializing itself.
See also Additional Job Data.
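To make the relationship between the four fields and a serialized job pack more concrete, here is a rough sketch of a header-plus-fields layout. It does not use JobPack or dumps(). The magic value, the 128-byte header, the big-endian offsets, and the JSON encoding of the two dictionaries are assumptions made for illustration and should be checked against The Job Pack. The function names `pack_job` and `unpack_job` are hypothetical.

```python
import json
import struct

# Assumed layout constants; check The Job Pack before relying on them.
MAGIC = (0xD5C0 << 16) + 0x0001      # magic number and version
HEADER_FORMAT = "!IIIII"             # magic plus four big-endian offsets
HEADER_SIZE = 128                    # header is zero-padded to this size


def pack_job(jobdict, jobenvs, jobhome=b"", jobdata=b""):
    # Encode the two dictionaries as JSON; the other fields are raw bytes.
    fields = [json.dumps(jobdict).encode("utf-8"),
              json.dumps(jobenvs).encode("utf-8"),
              jobhome,
              jobdata]
    # Each field starts where the previous one ended.
    offsets, position = [], HEADER_SIZE
    for field in fields:
        offsets.append(position)
        position += len(field)
    header = struct.pack(HEADER_FORMAT, MAGIC, *offsets)
    header += b"\0" * (HEADER_SIZE - len(header))
    return header + b"".join(fields)


def unpack_job(pack):
    # Read the offsets back and slice the four fields out of the payload.
    size = struct.calcsize(HEADER_FORMAT)
    magic, *offsets = struct.unpack(HEADER_FORMAT, pack[:size])
    if magic != MAGIC:
        raise ValueError("not a job pack")
    bounds = offsets + [len(pack)]
    raw = [pack[start:end] for start, end in zip(bounds, bounds[1:])]
    return (json.loads(raw[0].decode("utf-8")),
            json.loads(raw[1].decode("utf-8")),
            raw[2], raw[3])
```

Storing offsets in a fixed-size header lets a reader locate each field without parsing the ones before it, which matters when jobhome holds a large archive.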