scheduler.node package

Submodules

scheduler.node.agent module

Defines the class that represents an agent in the scheduler

class scheduler.node.agent.Agent(agent_id, hostname)

Bases: object

This class represents an agent available to Scale.

scheduler.node.conditions module

Defines the class that holds a node’s current conditions

class scheduler.node.conditions.NodeConditions(hostname)

Bases: object

This class represents the set of current conditions that apply to a node.

BAD_DAEMON_ERR = NodeError(name=u'BAD_DAEMON', title=u'Docker Not Responding', description=u'The Docker daemon on this node is not responding.', daemon_bad=True, pull_bad=True)
BAD_LOGSTASH_ERR = NodeError(name=u'BAD_LOGSTASH', title=u'Fluentd Not Responding', description=u'The Scale fluentd is not responding to this node.', daemon_bad=False, pull_bad=False)
CLEANUP_ERR = NodeError(name=u'CLEANUP', title=u'Cleanup Failure', description=u'The node failed to clean up some Scale Docker containers and volumes.', daemon_bad=False, pull_bad=False)
CLEANUP_FAILURE = NodeWarning(name=u'CLEANUP_FAILURE', title=u'Cleanup Failure', description=u'There was a failure cleaning up some of the following jobs: %s')
CLEANUP_TIMEOUT = NodeWarning(name=u'CLEANUP_TIMEOUT', title=u'Cleanup Timeout', description=u'There was a timeout cleaning up some of the following jobs: %s')
HEALTH_ERRORS = [NodeError(name=u'BAD_DAEMON', title=u'Docker Not Responding', description=u'The Docker daemon on this node is not responding.', daemon_bad=True, pull_bad=True), NodeError(name=u'BAD_LOGSTASH', title=u'Fluentd Not Responding', description=u'The Scale fluentd is not responding to this node.', daemon_bad=False, pull_bad=False), NodeError(name=u'HEALTH_FAIL', title=u'Health Check Failure', description=u'The last node health check failed with an unknown exit code.', daemon_bad=False, pull_bad=False), NodeError(name=u'HEALTH_TIMEOUT', title=u'Health Check Timeout', description=u'The last node health check timed out.', daemon_bad=False, pull_bad=False), NodeError(name=u'LOW_DOCKER_SPACE', title=u'Low Docker Disk Space', description=u'The free disk space available to Docker is low.', daemon_bad=False, pull_bad=True)]
HEALTH_FAIL_ERR = NodeError(name=u'HEALTH_FAIL', title=u'Health Check Failure', description=u'The last node health check failed with an unknown exit code.', daemon_bad=False, pull_bad=False)
HEALTH_TIMEOUT_ERR = NodeError(name=u'HEALTH_TIMEOUT', title=u'Health Check Timeout', description=u'The last node health check timed out.', daemon_bad=False, pull_bad=False)
IMAGE_PULL_ERR = NodeError(name=u'IMAGE_PULL', title=u'Image Pull Failure', description=u'The node failed to pull the Scale Docker image from the registry.', daemon_bad=False, pull_bad=False)
LOW_DOCKER_SPACE_ERR = NodeError(name=u'LOW_DOCKER_SPACE', title=u'Low Docker Disk Space', description=u'The free disk space available to Docker is low.', daemon_bad=False, pull_bad=True)
SLOW_CLEANUP = NodeWarning(name=u'SLOW_CLEANUP', title=u'Slow Cleanup', description=u'There are %s job executions waiting to be cleaned up on this node.')
generate_status_json(node_dict)

Generates the portion of the status JSON that describes these node conditions

Parameters: node_dict (dict) – The dict for this node within the status JSON
handle_cleanup_task_completed()

Handles the successful completion of a node cleanup task

handle_cleanup_task_failed(job_exes)

Handles the failure of a node cleanup task

handle_cleanup_task_timeout(job_exes)

Indicates that a node cleanup task has timed out

handle_health_task_completed()

Handles the successful completion of a node health check task

handle_health_task_failed(task_update)

Handles the given failed task update for a node health check task

Parameters: task_update (job.tasks.update.TaskStatusUpdate) – The health check task update
handle_health_task_timeout()

Indicates that a node health check task has timed out

handle_pull_task_completed()

Handles the successful completion of a node image pull task

handle_pull_task_failed()

Handles the failure of a node image pull task

handle_pull_task_timeout()

Indicates that a node image pull task has timed out

has_active_errors()

Indicates if any errors are currently active

Returns: True if at least one error is active, False otherwise
Return type: bool
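
The handler methods above suggest that each failure or timeout activates an error and each successful completion clears it. The following is a minimal sketch of that bookkeeping, not the actual NodeConditions implementation; the class ActiveErrors and its activate and deactivate methods are hypothetical names introduced only for illustration.

```python
class ActiveErrors(object):
    """Hypothetical stand-in for the error bookkeeping in NodeConditions."""

    def __init__(self):
        # Maps an error name (e.g. 'HEALTH_TIMEOUT') to its error object
        self._active = {}

    def activate(self, name, error):
        """Marks the named error as active (e.g. after a task timeout)."""
        self._active[name] = error

    def deactivate(self, name):
        """Clears the named error (e.g. after a task completes); no-op if absent."""
        self._active.pop(name, None)

    def has_active_errors(self):
        """Returns True if at least one error is active, False otherwise."""
        return len(self._active) > 0


errors = ActiveErrors()
errors.activate('HEALTH_TIMEOUT', 'The last node health check timed out.')
was_active = errors.has_active_errors()
errors.deactivate('HEALTH_TIMEOUT')
is_active_now = errors.has_active_errors()
```

Keying the active errors by name keeps repeated failures of the same kind from being counted twice.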
last_cleanup_task_error()

Returns the time of the last cleanup task failure, or None if the last cleanup task succeeded

Returns: The time of the last cleanup task failure, possibly None
Return type: datetime.datetime
last_image_pull_task_error()

Returns the time of the last image pull task failure, or None if the last image pull task succeeded

Returns: The time of the last image pull task failure, possibly None
Return type: datetime.datetime
update_cleanup_count(num_job_exes)

Updates the number of job executions that need to be cleaned up

Parameters: num_job_exes (int) – The number of job executions that need to be cleaned up
class scheduler.node.conditions.NodeError(name, title, description, daemon_bad, pull_bad)

Bases: tuple

daemon_bad

Alias for field number 3

description

Alias for field number 2

name

Alias for field number 0

pull_bad

Alias for field number 4

title

Alias for field number 1
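
The "Alias for field number" entries indicate that NodeError behaves like a named tuple. The listing only confirms that it derives from tuple, so the sketch below assumes it is built with collections.namedtuple and takes the field order from the constructor signature above.

```python
from collections import namedtuple

# Assumption: field order follows the documented constructor signature
NodeError = namedtuple('NodeError', ['name', 'title', 'description', 'daemon_bad', 'pull_bad'])

low_space = NodeError(
    name='LOW_DOCKER_SPACE',
    title='Low Docker Disk Space',
    description='The free disk space available to Docker is low.',
    daemon_bad=False,
    pull_bad=True,
)

# Each field can be read by name or by its position in the tuple
pull_bad_by_name = low_space.pull_bad
pull_bad_by_index = low_space[4]
```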

class scheduler.node.conditions.NodeWarning(name, title, description)

Bases: tuple

description

Alias for field number 2

name

Alias for field number 0

title

Alias for field number 1
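
Several NodeWarning constants carry a %s placeholder in their description. How Scale fills that placeholder is not shown in this listing; the sketch below assumes a named tuple and uses a hypothetical helper, render_warning, with made-up job IDs.

```python
from collections import namedtuple

# Assumption: NodeWarning is a named tuple with the documented field order
NodeWarning = namedtuple('NodeWarning', ['name', 'title', 'description'])

CLEANUP_FAILURE = NodeWarning(
    name='CLEANUP_FAILURE',
    title='Cleanup Failure',
    description='There was a failure cleaning up some of the following jobs: %s',
)


def render_warning(template, detail):
    """Hypothetical helper: returns a copy of the warning with %s filled in."""
    return template._replace(description=template.description % detail)


job_ids = [101, 102]  # hypothetical job IDs
warning = render_warning(CLEANUP_FAILURE, ', '.join(str(j) for j in job_ids))
```

Because tuples are immutable, _replace returns a new warning and leaves the shared CLEANUP_FAILURE template unchanged.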

scheduler.node.manager module

Defines the class that manages the scheduler nodes

class scheduler.node.manager.NodeManager

Bases: object

This class manages the scheduler nodes. This class is thread-safe.

clear()

Clears all node data from the manager. This method is intended for testing only.

generate_status_json(status_dict)

Generates the portion of the status JSON that describes the nodes

Parameters: status_dict (dict) – The status JSON dict
get_next_tasks(when)

Returns the next node tasks to schedule

Parameters: when (datetime.datetime) – The current time
Returns: A list of the next node tasks to schedule
Return type: [job.tasks.base_task.Task]
get_node(agent_id)

Returns the node with the given agent ID, possibly None

Parameters: agent_id (string) – The agent ID of the node
Returns: The node, possibly None
Return type: scheduler.node.node_class.Node
get_nodes()

Returns a list of all nodes

Returns: The list of all nodes
Return type: [scheduler.node.node_class.Node]
handle_task_timeout(task)

Handles the timeout of the given task

Parameters: task (job.tasks.base_task.Task) – The task
handle_task_update(task_update)

Handles the given task update for a task

Parameters: task_update (job.tasks.update.TaskStatusUpdate) – The task update
lost_node(agent_id)

Informs the manager that the node with the given agent ID was lost and has gone offline

Parameters: agent_id (string) – The agent ID of the lost node
register_agents(agents)

Adds the list of online agents to the manager so they can be registered

Parameters: agents (list) – The list of online agents to register
sync_with_database(scheduler_config)

Syncs with the database to retrieve updated node models and queries Mesos for unknown agent IDs

Parameters: scheduler_config (scheduler.configuration.SchedulerConfiguration) – The scheduler configuration
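
NodeManager is documented as thread-safe and as looking up nodes by agent ID. The sketch below shows one common way to get that behavior, a dict guarded by a lock. It is an assumption about the approach, not the actual implementation; NodeRegistry and add_node are hypothetical names.

```python
import threading


class NodeRegistry(object):
    """Hypothetical, simplified stand-in for NodeManager."""

    def __init__(self):
        self._lock = threading.Lock()
        self._nodes = {}  # agent ID -> node object

    def add_node(self, agent_id, node):
        """Stores the node under its agent ID."""
        with self._lock:
            self._nodes[agent_id] = node

    def get_node(self, agent_id):
        """Returns the node with the given agent ID, possibly None."""
        with self._lock:
            return self._nodes.get(agent_id)

    def get_nodes(self):
        """Returns a copy of the node list so callers cannot mutate internal state."""
        with self._lock:
            return list(self._nodes.values())

    def clear(self):
        """Clears all node data; intended for testing only."""
        with self._lock:
            self._nodes.clear()


registry = NodeRegistry()
registry.add_node('agent-1', 'node for host-1')  # hypothetical agent ID and node
found = registry.get_node('agent-1')
missing = registry.get_node('agent-2')
```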

scheduler.node.node_class module

Defines the class that represents nodes in the scheduler

class scheduler.node.node_class.Node(agent_id, node, scheduler_config)

Bases: object

This class represents a node in the scheduler. It combines information retrieved from the database node models as well as run-time information retrieved from Mesos. This class is thread-safe.

CLEANUP_ERR_THRESHOLD = datetime.timedelta(0, 120)
DEGRADED = NodeState(state=u'DEGRADED', title=u'Degraded', description=u'Node has an error condition, putting it in a degraded state. New jobs will not be scheduled, and the node will attempt to continue to run existing jobs.')
DEPRECATED = NodeState(state=u'DEPRECATED', title=u'Deprecated', description=u'Node is deprecated and will not be used by Scale. Existing jobs on the node will be failed.')
HEALTH_ERR_THRESHOLD = datetime.timedelta(0, 120)
IMAGE_PULL = NodeState(state=u'IMAGE_PULL', title=u'Pulling image', description=u'Node is pulling the Scale Docker image.')
IMAGE_PULL_ERR_THRESHOLD = datetime.timedelta(0, 300)
INITIAL_CLEANUP = NodeState(state=u'INITIAL_CLEANUP', title=u'Cleaning up', description=u'Node is performing an initial cleanup step to remove existing Scale Docker containers and volumes.')
NORMAL_HEALTH_THRESHOLD = datetime.timedelta(0, 300)
OFFLINE = NodeState(state=u'OFFLINE', title=u'Offline', description=u'Node is offline/unavailable, so no jobs can currently run on it.')
PAUSED = NodeState(state=u'PAUSED', title=u'Paused', description=u'Node is paused, so no new jobs will be scheduled. Existing jobs will continue to run.')
READY = NodeState(state=u'READY', title=u'Ready', description=u'Node is ready to run new jobs.')
SCHEDULER_STOPPED = NodeState(state=u'SCHEDULER_STOPPED', title=u'Scheduler Stopped', description=u'Scheduler is paused, so no new jobs will be scheduled. Existing jobs will continue to run.')
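
The threshold constants are datetime.timedelta values written with positional arguments (days, seconds), so timedelta(0, 120) is 120 seconds and timedelta(0, 300) is 300 seconds. This listing does not say how Node applies them; the sketch below assumes they are compared against the time elapsed since the last failure, using a hypothetical helper named is_within_threshold.

```python
import datetime

CLEANUP_ERR_THRESHOLD = datetime.timedelta(0, 120)     # 0 days, 120 seconds
IMAGE_PULL_ERR_THRESHOLD = datetime.timedelta(0, 300)  # 0 days, 300 seconds


def is_within_threshold(last_error, when, threshold):
    """Hypothetical helper: True if last_error happened less than threshold before when."""
    if last_error is None:
        # No recorded failure (the last task succeeded)
        return False
    return when - last_error < threshold


now = datetime.datetime(2018, 1, 1, 12, 0, 0)
recent_failure = now - datetime.timedelta(seconds=30)
older_failure = now - datetime.timedelta(seconds=200)

cleanup_recent = is_within_threshold(recent_failure, now, CLEANUP_ERR_THRESHOLD)
cleanup_older = is_within_threshold(older_failure, now, CLEANUP_ERR_THRESHOLD)
pull_older = is_within_threshold(older_failure, now, IMAGE_PULL_ERR_THRESHOLD)
```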
add_job_execution(job_exe)

Adds a job execution that needs to be cleaned up

Parameters: job_exe (job.execution.job_exe.RunningJobExecution) – The job execution to add
agent_id

Returns the agent ID of the node

Returns: The agent ID
Return type: string
cleanup_desc = u'Node is performing an initial cleanup step to remove existing Scale Docker containers and volumes.'
degraded_desc = u'Node has an error condition, putting it in a degraded state. New jobs will not be scheduled, and the node will attempt to continue to run existing jobs.'
deprecated_desc = u'Node is deprecated and will not be used by Scale. Existing jobs on the node will be failed.'
generate_status_json(nodes_list)

Generates the portion of the status JSON that describes this node

Parameters: nodes_list (list) – The list of nodes within the status JSON
get_next_tasks(when)

Returns the next node tasks to launch

Parameters: when (datetime.datetime) – The current time
Returns: The list of node tasks to launch
Return type: [job.tasks.base_task.Task]
handle_task_timeout(task)

Handles the timeout of the given node task

Parameters: task (job.tasks.base_task.Task) – The task
handle_task_update(task_update)

Handles the given task update

Parameters: task_update (job.tasks.update.TaskStatusUpdate) – The task update
hostname

Returns the hostname of the node

Returns: The hostname
Return type: string
id

Returns the ID of the node

Returns: The node ID
Return type: int
is_active

Indicates whether this node is active (True) or not (False)

Returns: Whether this node is active
Return type: bool
is_ready_for_new_job()

Indicates whether this node is ready to launch a new job execution

Returns: True if this node is ready to launch a new job execution, False otherwise
Return type: bool
is_ready_for_next_job_task()

Indicates whether this node is ready to launch the next task of a job execution

Returns: True if this node is ready to launch a job task, False otherwise
Return type: bool
is_ready_for_system_task()

Indicates whether this node is ready to launch a new system task

Returns: True if this node is ready to launch a new system task, False otherwise
Return type: bool
paused_desc = u'Node is paused, so no new jobs will be scheduled. Existing jobs will continue to run.'
pull_desc = u'Node is pulling the Scale Docker image.'
should_be_removed()

Indicates whether this node should be removed from the scheduler. If the node is no longer active and is also no longer online, there’s no reason for the scheduler to continue to track it.

Returns: True if this node should be removed from the scheduler, False otherwise
Return type: bool
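
The removal rule described for should_be_removed() can be written as a single boolean expression. The sketch below is a free function with hypothetical parameters is_active and is_online; the real method presumably reads the equivalent state from the Node instance.

```python
def should_be_removed(is_active, is_online):
    """Sketch of the rule: drop a node only when it is both inactive and offline."""
    return not is_active and not is_online


# Truth table for the four possible combinations
results = {
    (active, online): should_be_removed(active, online)
    for active in (True, False)
    for online in (True, False)
}
```

Only the (False, False) combination yields True, so an inactive node that is still online, or an active node that is temporarily offline, continues to be tracked.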
stopped_desc = u'Scheduler is paused, so no new jobs will be scheduled. Existing jobs will continue to run.'
update_from_mesos(agent_id=None, is_online=None)

Updates this node’s data from Mesos

Parameters:
  • agent_id (string) – The Mesos agent ID for the node
  • is_online (bool) – Whether the Mesos agent is online
update_from_model(node, scheduler_config)

Updates this node’s data from the database models

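Parameters:
  • node – The database model for the node
  • scheduler_config (scheduler.configuration.SchedulerConfiguration) – The scheduler configuration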
class scheduler.node.node_class.NodeState(state, title, description)

Bases: tuple

description

Alias for field number 2

state

Alias for field number 0

title

Alias for field number 1

Module contents