client

The program which submits a job to the master.

blob

An arbitrary file stored in the Disco Distributed Filesystem.

See also Blobs.

data locality

Performing computation over a set of data near where the data is located. Disco preserves data locality whenever possible, since transferring data over a network can be prohibitively expensive when operating on massive amounts of data.

See locality of reference.

DDFS

See Disco Distributed Filesystem.

Erlang

See Erlang.
garbage collection (GC)
DDFS has a tag-based filesystem, which means that a given blob can be addressed via multiple tags. A blob can therefore only be deleted once the last tag referring to it is deleted. DDFS uses a garbage collection procedure to detect and delete such unreferenced data.
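The idea amounts to a set difference between all stored blobs and those still reachable from some tag. A minimal sketch, using hypothetical tag and blob names and an in-memory model rather than DDFS's actual implementation:

```python
# Minimal sketch of tag-based garbage collection: a blob is garbage
# once no tag references it. Tag and blob names here are hypothetical.

def unreferenced_blobs(tags, blobs):
    """tags: dict of tag name -> set of referenced blob ids;
    blobs: set of all stored blob ids."""
    referenced = set().union(*tags.values()) if tags else set()
    return blobs - referenced

tags = {"data:logs": {"b1", "b2"}, "data:archive": {"b2"}}
blobs = {"b1", "b2", "b3"}
print(unreferenced_blobs(tags, blobs))  # -> {'b3'}
```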

immutable

See immutable object.

map

The first phase of a job, in which tasks are usually scheduled on the same node where their input data is hosted, so that local computation can be performed.

Also refers to an individual task in this phase, which produces records that may be partitioned, and reduced. Generally there is one map task per input.
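As an illustration, the familiar word-count map yields one record per word. This sketch follows the (input, params) style of the map functions in Disco's classic worker examples, but runs standalone:

```python
# A word-count map function: one call per input record (a line of
# text), yielding key/value pairs that can be partitioned and reduced.

def map(line, params):        # shadows the builtin, as in Disco examples
    for word in line.split():
        yield word, 1

print(list(map("to be or not to be", None)))
# -> [('to', 1), ('be', 1), ('or', 1), ('not', 1), ('to', 1), ('be', 1)]
```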


master

Distributed core that takes care of managing jobs, garbage collection for DDFS, and other central processes.

See also Technical Overview.


job

A set of map and/or reduce tasks, coordinated by the Disco master. When the master receives a disco.job.JobPack, it assigns a unique name to the job, and assigns the tasks to workers until they are all completed.

See also disco.job.

job functions
Job functions are the functions that the user can specify for a disco.worker.classic.worker. For example, disco.worker.classic.func.reduce(), disco.worker.classic.func.combiner(), and disco.worker.classic.func.partition() are job functions.
job dict

The first field in a job pack, which contains parameters needed by the master for job execution.

See also The Job Dict and disco.job.JobPack.jobdict.
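An illustrative fragment may help. The field names below are examples chosen for this sketch, not the complete or authoritative schema; see The Job Dict for that:

```python
import json

# An illustrative job dict fragment. Field names are examples only.
jobdict = {
    "prefix": "wordcount",          # used to derive the unique job name
    "worker": "lib/worker",         # worker executable inside the job home
    "input": ["ddfs://data:logs"],  # hypothetical input address
    "nr_reduces": 4,
}

# Serialized (here as JSON) into the first field of the job pack:
packed = json.dumps(jobdict)
assert json.loads(packed)["nr_reduces"] == 4
```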

job home

The working directory in which a worker is executed. The master creates the job home from a job pack, by unzipping the contents of its jobhome field.

See also The Job Home and disco.job.JobPack.jobhome.

job pack

The packed contents sent to the master when submitting a new job. Includes the job dict and job home, among other things.

See also The Job Pack and disco.job.JobPack.
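The packing idea can be sketched as length-prefixed payloads: the serialized job dict followed by the zipped job home. This mimics the spirit of disco.job.JobPack but is NOT the actual wire format:

```python
import json
import struct

# Simplified job pack sketch: two big-endian lengths, then the JSON
# job dict, then the zipped job home bytes. Illustrative only.

def pack(jobdict, jobhome_zip):
    body = json.dumps(jobdict).encode()
    header = struct.pack("!II", len(body), len(jobhome_zip))
    return header + body + jobhome_zip

def unpack(blob):
    dict_len, home_len = struct.unpack("!II", blob[:8])
    jobdict = json.loads(blob[8:8 + dict_len].decode())
    jobhome_zip = blob[8 + dict_len:8 + dict_len + home_len]
    return jobdict, jobhome_zip

packed = pack({"prefix": "wordcount"}, b"PK...")  # b"PK..." stands in for real zip bytes
assert unpack(packed) == ({"prefix": "wordcount"}, b"PK...")
```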


JSON

JavaScript Object Notation.

See Introducing JSON.


mapreduce

A paradigm and associated framework for distributed computing, which decouples application code from the core challenges of fault tolerance and data locality. The framework handles these issues, so that jobs can focus on what is specific to their application.

See MapReduce.

partitioning

The process of dividing output records into a set of labelled bins, much like tags in DDFS. Typically, the output of the map phase is partitioned, and each reduce operates on a single partition.
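A common scheme hashes the key modulo the number of partitions, so equal keys always share a bin and each reduce can consume one complete partition. A sketch (illustrative, not necessarily Disco's exact default):

```python
# Hash partitioning sketch: records with equal keys land in the same
# labelled bin.

def partition_records(records, nr_partitions):
    bins = {p: [] for p in range(nr_partitions)}
    for key, value in records:
        bins[hash(key) % nr_partitions].append((key, value))
    return bins

bins = partition_records([("be", 1), ("to", 1), ("be", 1)], 2)
# Both ("be", 1) records share a bin:
assert any(b.count(("be", 1)) == 2 for b in bins.values())
```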

pid

A process identifier. In Disco, this usually refers to the worker pid.

See process identifier.


reduce

The last phase of a job, in which non-local computation is usually performed.

Also refers to an individual task in this phase, which usually has access to all values for a given key produced by the map phase. Grouping data for reduce is achieved via partitioning.
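Continuing the word-count sketch, a reduce consumes all records of one partition, sorted so values for the same key are adjacent, and emits one aggregated record per key (illustrative, not Disco's actual API):

```python
from itertools import groupby

# Word-count reduce sketch: sort the partition, group by key, sum values.

def reduce_records(records):
    for key, group in groupby(sorted(records), key=lambda kv: kv[0]):
        yield key, sum(v for _, v in group)

print(list(reduce_records([("be", 1), ("to", 1), ("be", 1)])))
# -> [('be', 2), ('to', 1)]
```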

replica

Multiple copies (or replicas) of blobs are stored on different cluster nodes, so that blobs remain available in spite of a small number of nodes going down.
When a node goes down, the system tries to create additional replicas to replace the copies that were lost with that node.
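The placement idea can be sketched as choosing k distinct nodes per blob, so losing a few nodes still leaves a live copy (illustrative only; DDFS has its own placement policy):

```python
import random

# Replica placement sketch: pick k distinct nodes for a blob.
# Node names are hypothetical.

def place_replicas(nodes, k):
    return random.sample(nodes, min(k, len(nodes)))

replicas = place_replicas(["node1", "node2", "node3", "node4"], 3)
assert len(set(replicas)) == 3
```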

ssh

Network protocol used by Erlang to start slaves.

See SSH.


slave

The process started by the Erlang slave module.

See also Technical Overview.


stdin

The standard input file descriptor. The master responds to the worker over stdin.

See standard streams.


stdout

The standard output file descriptor. Initially redirected to stderr for a Disco worker.

See standard streams.


stderr

The standard error file descriptor. The worker sends messages to the master over stderr.

See standard streams.
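The stream entries above describe the channels of the worker/master exchange: requests go out on stderr, responses come back on stdin. One way to frame such a message is as a name, a payload length, and a payload; the framing below is only a sketch of the idea, not the authoritative protocol format:

```python
import json

# Illustrative message framing: "<name> <payload-length> <payload>\n".
# See the Disco worker protocol documentation for the real format.

def frame(name, payload):
    body = json.dumps(payload)
    return "%s %d %s\n" % (name, len(body), body)

assert frame("MSG", "starting up") == 'MSG 13 "starting up"\n'
```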


tag

A labelled collection of data in DDFS.

See also Tags.


task

A task is essentially a unit of work provided to a worker. A Disco job is made of map and reduce tasks.

See also disco.task.


worker

A worker is responsible for carrying out a task. A Disco job specifies the executable that is the worker. Workers are scheduled to run on nodes close to the data they are supposed to process.


ZIP

Archive/compression format, used e.g. for the job home.

See ZIP.
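Building a small zip archive in memory, as one might when assembling a job home. This is a generic zipfile sketch with a hypothetical file path, not Disco's actual packing code:

```python
import io
import zipfile

# Create an in-memory zip containing one (hypothetical) worker script,
# then reopen it and list its contents.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("lib/worker", "#!/usr/bin/env python\n")

with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    assert zf.namelist() == ["lib/worker"]
```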
