disco.settings – Disco Settings
Settings can be specified in a Python file and/or using environment variables.
Settings specified in environment variables override those stored in a file.
The default settings are intended to make it easy to get Disco running on a single node.
Running make install creates a settings file better suited to a cluster environment
and puts it in /etc/disco/settings.py.
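The override rule can be sketched as follows. This is only an illustration of the precedence described above, not Disco's actual loading code; the function name effective_settings and the prefix filter are assumptions made for the example.

```python
import os


def effective_settings(file_settings, environ=None):
    """Merge settings so that environment variables win over file values.

    Hypothetical helper: `file_settings` stands for the names defined in a
    settings file, `environ` for the process environment.
    """
    environ = os.environ if environ is None else environ
    merged = dict(file_settings)
    for name, value in environ.items():
        # Assumption for this sketch: only DISCO_* and DDFS_* names are considered.
        if name.startswith(("DISCO_", "DDFS_")):
            merged[name] = value
    return merged


file_settings = {"DISCO_PORT": "8989", "DISCO_DEBUG": "1"}
fake_env = {"DISCO_DEBUG": "2", "HOME": "/home/user"}
merged = effective_settings(file_settings, fake_env)
print(merged["DISCO_DEBUG"])  # value taken from the environment
print(merged["DISCO_PORT"])   # value taken from the file
```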
Disco looks in the following places for a settings file:
- The settings file specified using the command line utility --settings option.
- ~/.disco
- /etc/disco/settings.py
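A minimal sketch of this lookup, assuming the list order is a priority order and the first existing file wins (the text above lists the places but does not spell this out). The function find_settings_file and its exists parameter are hypothetical names used only for illustration.

```python
import os


def find_settings_file(cli_path=None, exists=os.path.exists):
    """Return the first settings file found, or None.

    Assumption: the documented locations are checked in the order listed.
    `exists` is injectable so the lookup can be tried without touching disk.
    """
    candidates = []
    if cli_path:
        candidates.append(cli_path)  # path given via --settings
    candidates.append(os.path.expanduser("~/.disco"))
    candidates.append("/etc/disco/settings.py")
    for path in candidates:
        if exists(path):
            return path
    return None


# Pretend only the system-wide file exists.
found = find_settings_file(
    "/tmp/my_settings.py",
    exists=lambda p: p == "/etc/disco/settings.py",
)
print(found)
```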
Possible settings for Disco are as follows:
DISCO_DATA
Directory to use for writing data. Default obtained using os.path.join(DISCO_ROOT, 'data').
DISCO_DEBUG
Sets the debugging level for Disco. Default is 1.
DISCO_ERLANG
Command used to launch Erlang on all nodes in the cluster. Default is usually erl, but depends on the OS.
DISCO_EVENTS
If set, events are logged to stdout. If set to json, events will be written as JSON strings. If set to nocolor, ANSI color escape sequences will not be used, even if the terminal supports them. Default is unset (the empty string).
DISCO_FLAGS
Default is the empty string.
DISCO_HOME
The directory which Disco runs out of. If you run Disco out of the source directory, you shouldn’t need to change this. If you use make install to install Disco, it will be set properly for you in /etc/disco/settings.py.
DISCO_HTTPD
Command used to launch lighttpd. Default is lighttpd.
DISCO_MASTER_HOME
Directory containing the Disco master directory. Default is obtained using os.path.join(DISCO_HOME, 'master').
DISCO_MASTER_HOST
The hostname of the master. Default obtained using socket.gethostname().
DISCO_MASTER_ROOT
Directory to use for writing master data. Default obtained using os.path.join(DISCO_DATA, '_%s' % DISCO_NAME).
DISCO_MASTER_CONFIG
Directory to use for writing cluster configuration. Default obtained using os.path.join(DISCO_ROOT, '%s.config' % DISCO_NAME).
DISCO_NAME
A unique name for the Disco cluster. Default obtained using 'disco_%s' % DISCO_PORT.
DISCO_LOG_DIR
Directory where log-files are created. The same path is used for all nodes in the cluster. Default is obtained using os.path.join(DISCO_ROOT, 'log').
DISCO_PID_DIR
Directory where pid-files are created. The same path is used for all nodes in the cluster. Default is obtained using os.path.join(DISCO_ROOT, 'run').
DISCO_PORT
The port the workers use for HTTP communication. Default is 8989.
DISCO_ROOT
Root directory for Disco-written data and metadata. Default is obtained using os.path.join(DISCO_HOME, 'root').
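Several of the defaults above are derived from one another. The sketch below chains the documented expressions together to show what the resulting paths look like. The function derive_defaults and the example DISCO_HOME value /opt/disco are assumptions for illustration, not part of Disco.

```python
import os


def derive_defaults(disco_home, disco_port=8989):
    """Compute the path defaults using the expressions quoted above."""
    disco_root = os.path.join(disco_home, "root")
    disco_data = os.path.join(disco_root, "data")
    disco_name = "disco_%s" % disco_port
    return {
        "DISCO_ROOT": disco_root,
        "DISCO_DATA": disco_data,
        "DISCO_NAME": disco_name,
        "DISCO_MASTER_HOME": os.path.join(disco_home, "master"),
        "DISCO_MASTER_ROOT": os.path.join(disco_data, "_%s" % disco_name),
        "DISCO_MASTER_CONFIG": os.path.join(disco_root, "%s.config" % disco_name),
        "DISCO_LOG_DIR": os.path.join(disco_root, "log"),
        "DISCO_PID_DIR": os.path.join(disco_root, "run"),
    }


# "/opt/disco" is a made-up DISCO_HOME used only for this example.
defaults = derive_defaults("/opt/disco")
for name in sorted(defaults):
    print("%s = %s" % (name, defaults[name]))
```

Changing DISCO_PORT changes DISCO_NAME, and with it the default DISCO_MASTER_ROOT and DISCO_MASTER_CONFIG.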
DISCO_ROTATE_LOG
Whether to rotate the master log on startup. Default is False.
DISCO_USER
The user Disco should run as. Default obtained using os.getenv('LOGNAME').
DISCO_JOB_OWNER
User name shown on the job status page for the user who submitted the job. Default is the login name @ host.
DISCO_WWW_ROOT
Directory that is the document root for the master HTTP server. Default obtained using os.path.join(DISCO_MASTER_HOME, 'www').
DISCO_GC_AFTER
How long to wait before garbage collecting job-generated intermediate and result data. Only results explicitly saved to DDFS won’t be garbage collected. Default is 100 * 365 * 24 * 60 * 60 (100 years). (Note that this setting does not affect data in DDFS.)
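The default expression reads as a number of seconds (100 years of 365 days). A short sketch of the arithmetic, with a hypothetical shorter retention period shown for comparison:

```python
SECONDS_PER_DAY = 24 * 60 * 60

# The documented default: 100 years of 365 days each.
DISCO_GC_AFTER = 100 * 365 * SECONDS_PER_DAY
print(DISCO_GC_AFTER)  # 3153600000

# Hypothetical alternative for illustration: keep job data for 30 days.
THIRTY_DAYS = 30 * SECONDS_PER_DAY
print(THIRTY_DAYS)  # 2592000
```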
DISCO_PROFILE
Whether Disco should start profiling applications and send profiling data to a Graphite server.
GRAPHITE_HOST
If DISCO_PROFILE is set, some performance data from Disco will be sent to the Graphite host. The default is localhost. The listening port is assumed to be the default Graphite port.
SYSTEMD_ENABLED
This adds -noshell to the Erlang process. It provides compatibility for running Disco using a non-forking process type in the service definition.
DISCO_WORKER_MAX_MEM
How much memory a worker can use in total. The worker calls resource.setrlimit(RLIMIT_AS, limit) to set the limit when it starts. The value can be either a percentage of total available memory or an exact number of bytes. Note that setrlimit behaves differently on Linux and Mac OS X; see man setrlimit for more information. Default is 80%, i.e. 80% of the total available memory.
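The two accepted forms of the value could be turned into a byte limit roughly as follows. This is a sketch under assumptions, not Disco's worker code: parse_max_mem and apply_limit are hypothetical helpers, and the total memory is passed in rather than detected.

```python
def parse_max_mem(value, total_bytes):
    """Turn a setting such as '80%' or '1048576' into a byte count."""
    value = str(value).strip()
    if value.endswith("%"):
        percent = float(value[:-1])
        return int(total_bytes * percent / 100)
    return int(value)


def apply_limit(limit_bytes):
    """Apply the address-space limit to the current process (Unix only)."""
    import resource  # imported here because the module is not available on Windows

    resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))


# With 1000 bytes of (pretend) total memory, 80% is 800 bytes.
print(parse_max_mem("80%", 1000))        # 800
print(parse_max_mem("1048576", 10**9))   # 1048576
```

apply_limit is deliberately not called here, since lowering the limit affects the running interpreter.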
Settings to control the proxying behavior:
DISCO_PROXY_ENABLED
If set, enable proxying through the master. This is a master-side setting (set in master:/etc/disco/settings.py). Default is ''.
DISCO_PROXY
The address of the proxy to use on the client side. This is in the format http://<proxy-host>:<proxy-port>, where <proxy-port> normally matches the value of DISCO_PROXY_PORT set on the master. Default is ''.
DISCO_PROXY_PORT
The port the master proxy should run on. This is a master-side setting (set in master:/etc/disco/settings.py). Default is 8999.
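As an illustration of the address format, the sketch below builds a client-side DISCO_PROXY value and checks that its port matches the master's DISCO_PROXY_PORT. The helper names and the host master.example.com are made up for the example.

```python
from urllib.parse import urlsplit


def proxy_address(host, port=8999):
    """Build a value in the http://<proxy-host>:<proxy-port> format."""
    return "http://%s:%d" % (host, port)


def proxy_port_matches(disco_proxy, master_proxy_port=8999):
    """Check that the client-side address uses the master's proxy port."""
    return urlsplit(disco_proxy).port == master_proxy_port


address = proxy_address("master.example.com")
print(address)                      # http://master.example.com:8999
print(proxy_port_matches(address))  # True
```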
Settings to control the scheduler behavior:
Settings used by the testing environment:
Settings used by DDFS:
DDFS_ROOT
Deprecated since version 0.4. Use DDFS_DATA instead. Only provided as a default for backwards compatibility. Default is obtained using os.path.join(DISCO_ROOT, 'ddfs').
DDFS_DATA
The root data directory for DDFS. Default is the value of DDFS_ROOT.
DDFS_PUT_PORT
The port to use for writing to DDFS nodes. Must be open to the Disco client unless proxying is used. Default is 8990.
DDFS_PUT_MAX
The maximum default number of retries for a PUT operation. Default is 3.
DDFS_GET_MAX
The maximum default number of retries for a GET operation. Default is 3.
DDFS_READ_TOKEN
The default read authorization token to use. Default is None.
DDFS_WRITE_TOKEN
The default write authorization token to use. Default is None.
DDFS_GC_INITIAL_WAIT
The amount of time to wait after startup before running GC (in minutes). Default is '', which triggers an internal default of 5 minutes.
DDFS_GC_BALANCE_THRESHOLD
The distance a node's disk utilization can be from the average disk utilization of the cluster before the node is considered to be over-utilized or under-utilized. Default is 0.1.
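One way to read the threshold is sketched below. It assumes utilization is expressed as a fraction between 0 and 1; the function classify_node is hypothetical and does not reflect how the DDFS garbage collector is actually implemented.

```python
def classify_node(utilization, cluster_average, threshold=0.1):
    """Classify a node by how far its disk utilization is from the average."""
    if utilization > cluster_average + threshold:
        return "over-utilized"
    if utilization < cluster_average - threshold:
        return "under-utilized"
    return "balanced"


# Cluster average of 50% disk utilization, default threshold of 0.1.
print(classify_node(0.75, 0.5))  # over-utilized
print(classify_node(0.45, 0.5))  # balanced
print(classify_node(0.30, 0.5))  # under-utilized
```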
DDFS_PARANOID_DELETE
Instead of deleting unneeded files, DDFS garbage collector prefixes obsolete files with !trash., so they can be safely verified/deleted by an external process. For instance, the following command can be used to finally delete the files (assuming that DDFS_DATA = "/srv/disco/ddfs"):
find /srv/disco/ddfs/ -perm 600 -iname '!trash*' -exec rm {} \;
Default is ''.
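An external verification step might want to list the trashed files before removing them. The following is a rough Python counterpart to the find command above, offered as a sketch: find_trash is a hypothetical helper, it only considers regular file entries, and it deletes nothing unless delete=True is passed.

```python
import os
import stat


def find_trash(ddfs_data, delete=False):
    """List files named '!trash*' (case-insensitive) with mode 600.

    With delete=True the matching files are also removed.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(ddfs_data):
        for name in filenames:
            if not name.lower().startswith("!trash"):
                continue
            path = os.path.join(dirpath, name)
            # Compare only the permission bits, like `-perm 600`.
            mode = stat.S_IMODE(os.lstat(path).st_mode)
            if mode != 0o600:
                continue
            matches.append(path)
            if delete:
                os.remove(path)
    return sorted(matches)
```

Calling find_trash("/srv/disco/ddfs") first and inspecting the result gives a dry run; a second call with delete=True performs the cleanup.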
The following settings are used by DDFS to determine the number of replicas for data/metadata to keep (it is not recommended to use the provided defaults in a multinode cluster):
DDFS_TAG_MIN_REPLICAS
The minimum number of replicas for a tag operation to succeed. Default is 1.
DDFS_TAG_REPLICAS
The number of replicas of tags that DDFS should aspire to keep. Default is 1.
DDFS_BLOB_REPLICAS
The number of replicas of blobs that DDFS should aspire to keep. Default is 1.
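Since the settings file is a Python file, raising the replica counts for a multinode cluster could look like the fragment below. The numbers are illustrative assumptions only, not recommended values from the Disco documentation; choose them according to the size of the cluster.

```python
# Hypothetical fragment of /etc/disco/settings.py for a multinode cluster.
# The values are examples chosen for illustration, not official guidance.
DDFS_TAG_MIN_REPLICAS = 2   # a tag operation must reach at least this many replicas
DDFS_TAG_REPLICAS = 3       # replicas of tags DDFS aspires to keep
DDFS_BLOB_REPLICAS = 3      # replicas of blobs DDFS aspires to keep
```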
DDFS_SPACE_AWARE
Whether DDFS should take the amount of free space in the nodes into account when choosing the nodes to write to. Default is '' (unset).
DDFS_ABSOLUTE_SPACE
Only effective in the space-aware mode. If set, the nodes with the higher absolute free space will be given precedence for hosting replicas. If unset, the nodes with the highest ratio of the free space to the total space will be given precedence for hosting the replicas.
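The difference between the two orderings can be sketched as follows. The function rank_nodes and the node figures are invented for illustration and do not reflect how DDFS actually selects nodes internally.

```python
def rank_nodes(nodes, absolute_space=False):
    """Order node names by preference for hosting a replica.

    `nodes` is a list of (name, free_bytes, total_bytes) tuples.
    """
    if absolute_space:
        key = lambda node: node[1]            # absolute free space
    else:
        key = lambda node: node[1] / node[2]  # free-to-total ratio
    return [node[0] for node in sorted(nodes, key=key, reverse=True)]


# Made-up figures: (name, free, total)
nodes = [("node1", 400, 1000), ("node2", 300, 500), ("node3", 100, 200)]

print(rank_nodes(nodes))                       # ['node2', 'node3', 'node1']
print(rank_nodes(nodes, absolute_space=True))  # ['node1', 'node2', 'node3']
```

Here node1 has the most free space in absolute terms but the lowest free-to-total ratio, so the two modes rank it at opposite ends.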