An arbitrary file stored in the Disco Distributed Filesystem (DDFS).
See also Blobs.
Performing computation over a set of data near where the data is located. Disco preserves data locality whenever possible, since transferring data over a network can be prohibitively expensive when operating on massive amounts of data.
The first phase of a job, in which tasks are usually scheduled on the same node where their input data is hosted, so that local computation can be performed.
Also refers to an individual task in this phase, which produces records that may be partitioned and reduced. Generally there is one map task per input.
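A map task can be pictured as a function that takes one input record and yields zero or more key/value records. The sketch below follows the shape of Disco's Python map functions but is plain Python, so it runs without a Disco cluster; the function name is illustrative.

```python
# A minimal sketch of a map function in the style of Disco's Python API:
# one input record in, zero or more (key, value) records out.
def word_count_map(line, params):
    # Emit one record per word; values are grouped later, in the reduce phase.
    for word in line.split():
        yield word, 1

records = list(word_count_map("to be or not to be", None))
```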
The distributed core process that manages jobs, performs garbage collection for DDFS, and runs other central services.
See also Technical Overview.
A set of map and/or reduce tasks, coordinated by the Disco master. When the master receives a disco.job.JobPack, it assigns a unique name to the job, and assigns the tasks to workers until they are all completed.
See also disco.job
The first field in a job pack, which contains parameters needed by the master for job execution.
See also The Job Dict and disco.job.JobPack.jobdict.
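Since the job dict is structured data that the master parses before running any tasks, it can be illustrated as a JSON-encodable dictionary. The field names and values below are assumptions for illustration only, not the authoritative job dict schema; see The Job Dict for the real fields.

```python
import json

# Illustrative only: these field names are assumptions, not the
# authoritative job dict schema.
jobdict = {
    "prefix": "WordCount",                        # hypothetical job name prefix
    "input": ["http://example.com/input.txt"],    # hypothetical input URL
    "worker": "job/worker",                       # hypothetical worker path
}

# The job dict travels inside the job pack in a serialized form the
# master can decode; JSON round-trips it losslessly here.
encoded = json.dumps(jobdict, sort_keys=True)
decoded = json.loads(encoded)
```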
The working directory in which a worker is executed. The master creates the job home from a job pack, by unzipping the contents of its jobhome field.
See also The Job Home and disco.job.JobPack.jobhome.
The packed contents sent to the master when submitting a new job. Includes the job dict and job home, among other things.
See also The Job Pack and disco.job.JobPack.
JavaScript Object Notation.
See Introducing JSON.
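As a quick illustration, Python's standard library converts between native objects and JSON text:

```python
import json

# Encode a Python dict as JSON text, then decode it back.
message = {"ok": True, "count": 3}
text = json.dumps(message, sort_keys=True)
roundtrip = json.loads(text)
```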
A paradigm and associated framework for distributed computing, which decouples application code from the core challenges of fault tolerance and data locality. The framework handles these issues so that jobs can focus on what is specific to their application.
See MapReduce.
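The paradigm can be simulated in a few lines: map each input record to key/value pairs, group values by key, then reduce each group. This is a single-process sketch of the data flow only; Disco's real API distributes these steps across nodes.

```python
from collections import defaultdict

# A self-contained, single-process simulation of the
# map -> group-by-key -> reduce data flow.
def run_mapreduce(inputs, map_fn, reduce_fn):
    groups = defaultdict(list)
    for record in inputs:
        for key, value in map_fn(record):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}

# Classic word count: the map emits (word, 1), the reduce sums the ones.
counts = run_mapreduce(
    ["to be or not", "to be"],
    lambda line: ((word, 1) for word in line.split()),
    lambda key, values: sum(values),
)
```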
A process identifier. In Disco this usually refers to the worker pid.
See process identifier.
The last phase of a job, in which non-local computation is usually performed.
Also refers to an individual task in this phase, which usually has access to all values for a given key produced by the map phase. Grouping data for reduce is achieved via partitioning.
Network protocol used by Erlang to start slaves.
See SSH.
The process started by the Erlang slave module.
See also Technical Overview.
The standard input file descriptor. The master responds to the worker over stdin.
See standard streams.
The standard output file descriptor. Initially redirected to stderr for a Disco worker.
See standard streams.
The standard error file descriptor. The worker sends messages to the master over stderr.
See standard streams.
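The worker-to-master messages sent over stderr are line-oriented: a message name, a payload length, and a payload. The framing below is a sketch in that spirit; treat the exact message names and format as assumptions and consult the worker protocol documentation for the authoritative definition.

```python
import json

# Sketch of framing a worker-to-master message as a single line:
# name, payload length in bytes, JSON payload, newline.
# The exact framing is an assumption here, not the authoritative protocol.
def frame_message(name, payload):
    body = json.dumps(payload)
    return "%s %d %s\n" % (name, len(body), body)

line = frame_message("MSG", "status: map done")
```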
A labelled collection of data in DDFS.
See also Tags.
A task is essentially a unit of work provided to a worker. A Disco job is made up of map and reduce tasks.
See also disco.task.
A worker is responsible for carrying out a task. A Disco job specifies the executable that is the worker. Workers are scheduled to run on nodes close to the data they are supposed to process.
See also
Archive/compression format, used e.g. for the job home.
See ZIP.
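For example, a job home can be packed as a ZIP archive in memory with Python's standard zipfile module; the archive member name below is a hypothetical layout, not the required one.

```python
import io
import zipfile

# Pack a (hypothetical) job home layout into a ZIP archive in memory,
# the same archive format the master unzips into the worker's
# working directory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("job/worker", "#!/usr/bin/env python\n")

# Reading the archive back lists the packed members.
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as archive:
    names = archive.namelist()
```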