scheduler.node package
Submodules
scheduler.node.agent module
Defines the class that represents an agent in the scheduler
-
class scheduler.node.agent.Agent(agent_id, hostname)
Bases: object
This class represents an agent available to Scale.
scheduler.node.conditions module
Defines the class that holds a node’s current conditions
-
class scheduler.node.conditions.NodeConditions(hostname)
Bases: object
This class represents the set of current conditions that apply to a node.
-
BAD_DAEMON_ERR
= NodeError(name=u'BAD_DAEMON', title=u'Docker Not Responding', description=u'The Docker daemon on this node is not responding.', daemon_bad=True, pull_bad=True)¶
-
BAD_LOGSTASH_ERR
= NodeError(name=u'BAD_LOGSTASH', title=u'Fluentd Not Responding', description=u'The Scale fluentd is not responding to this node.', daemon_bad=False, pull_bad=False)¶
-
CLEANUP_ERR
= NodeError(name=u'CLEANUP', title=u'Cleanup Failure', description=u'The node failed to clean up some Scale Docker containers and volumes.', daemon_bad=False, pull_bad=False)¶
-
CLEANUP_FAILURE
= NodeWarning(name=u'CLEANUP_FAILURE', title=u'Cleanup Failure', description=u'There was a failure cleaning up some of the following jobs: %s')¶
-
CLEANUP_TIMEOUT
= NodeWarning(name=u'CLEANUP_TIMEOUT', title=u'Cleanup Timeout', description=u'There was a timeout cleaning up some of the following jobs: %s')¶
-
HEALTH_ERRORS
= [NodeError(name=u'BAD_DAEMON', title=u'Docker Not Responding', description=u'The Docker daemon on this node is not responding.', daemon_bad=True, pull_bad=True), NodeError(name=u'BAD_LOGSTASH', title=u'Fluentd Not Responding', description=u'The Scale fluentd is not responding to this node.', daemon_bad=False, pull_bad=False), NodeError(name=u'HEALTH_FAIL', title=u'Health Check Failure', description=u'The last node health check failed with an unknown exit code.', daemon_bad=False, pull_bad=False), NodeError(name=u'HEALTH_TIMEOUT', title=u'Health Check Timeout', description=u'The last node health check timed out.', daemon_bad=False, pull_bad=False), NodeError(name=u'LOW_DOCKER_SPACE', title=u'Low Docker Disk Space', description=u'The free disk space available to Docker is low.', daemon_bad=False, pull_bad=True)]¶
-
HEALTH_FAIL_ERR
= NodeError(name=u'HEALTH_FAIL', title=u'Health Check Failure', description=u'The last node health check failed with an unknown exit code.', daemon_bad=False, pull_bad=False)¶
-
HEALTH_TIMEOUT_ERR
= NodeError(name=u'HEALTH_TIMEOUT', title=u'Health Check Timeout', description=u'The last node health check timed out.', daemon_bad=False, pull_bad=False)¶
-
IMAGE_PULL_ERR
= NodeError(name=u'IMAGE_PULL', title=u'Image Pull Failure', description=u'The node failed to pull the Scale Docker image from the registry.', daemon_bad=False, pull_bad=False)¶
-
LOW_DOCKER_SPACE_ERR
= NodeError(name=u'LOW_DOCKER_SPACE', title=u'Low Docker Disk Space', description=u'The free disk space available to Docker is low.', daemon_bad=False, pull_bad=True)¶
-
SLOW_CLEANUP
= NodeWarning(name=u'SLOW_CLEANUP', title=u'Slow Cleanup', description=u'There are %s job executions waiting to be cleaned up on this node.')¶
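The reprs of the constants above suggest that NodeError and NodeWarning are named tuples. The following is a minimal sketch under that assumption: the field lists are copied from the reprs, while the `blocks_pulls` helper and the way the `%s` placeholder is filled are hypothetical and not taken from Scale.

```python
from collections import namedtuple

# Assumption: field names come from the reprs shown above; the real
# definitions in scheduler.node.conditions may differ.
NodeError = namedtuple('NodeError', ['name', 'title', 'description', 'daemon_bad', 'pull_bad'])
NodeWarning = namedtuple('NodeWarning', ['name', 'title', 'description'])

BAD_DAEMON_ERR = NodeError(name='BAD_DAEMON', title='Docker Not Responding',
                           description='The Docker daemon on this node is not responding.',
                           daemon_bad=True, pull_bad=True)
IMAGE_PULL_ERR = NodeError(name='IMAGE_PULL', title='Image Pull Failure',
                           description='The node failed to pull the Scale Docker image from the registry.',
                           daemon_bad=False, pull_bad=False)
SLOW_CLEANUP = NodeWarning(name='SLOW_CLEANUP', title='Slow Cleanup',
                           description='There are %s job executions waiting to be cleaned up on this node.')


def blocks_pulls(errors):
    """Hypothetical helper: names of the errors flagged with pull_bad."""
    return [err.name for err in errors if err.pull_bad]


# One plausible way to fill the %s placeholder in a warning description
message = SLOW_CLEANUP.description % 12
```

Because named tuples are immutable, constants defined this way can be shared between threads and reused in lists such as HEALTH_ERRORS without copying.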
-
generate_status_json
(node_dict)¶ Generates the portion of the status JSON that describes these node conditions
Parameters: node_dict (dict) – The dict for this node within the status JSON
-
handle_cleanup_task_completed
()¶ Handles the successful completion of a node cleanup task
-
handle_cleanup_task_failed
(job_exes)¶ Handles the failure of a node cleanup task
-
handle_cleanup_task_timeout
(job_exes)¶ Indicates that a node cleanup task has timed out
-
handle_health_task_completed
()¶ Handles the successful completion of a node health check task
-
handle_health_task_failed
(task_update)¶ Handles the given failed task update for a node health check task
Parameters: task_update (job.tasks.update.TaskStatusUpdate) – The health check task update
-
handle_health_task_timeout
()¶ Indicates that a node health check task has timed out
-
handle_pull_task_completed
()¶ Handles the successful completion of a node image pull task
-
handle_pull_task_failed
()¶ Handles the failure of a node image pull task
-
handle_pull_task_timeout
()¶ Indicates that a node image pull task has timed out
-
has_active_errors
()¶ Indicates if any errors are currently active
Returns: True if at least one error is active, False otherwise Return type: bool
-
last_cleanup_task_error
()¶ Returns the last time that the cleanup task failed, None if the last cleanup task succeeded
Returns: The time of the last cleanup task failure, possibly None Return type: datetime.datetime
-
last_image_pull_task_error
()¶ Returns the last time that the image pull task failed, None if the last image pull task succeeded
Returns: The time of the last image pull task failure, possibly None Return type: datetime.datetime
-
update_cleanup_count
(num_job_exes)¶ Updates the number of job executions that need to be cleaned up
Parameters: num_job_exes (int) – The number of job executions that need to be cleaned up
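The interplay between the handler methods, has_active_errors and last_cleanup_task_error can be illustrated with a toy class. This is a sketch only: `ToyNodeConditions` is a hypothetical stand-in, and its failure handler takes an explicit `when` timestamp (an assumption made so the example is deterministic) rather than the `job_exes` argument documented above.

```python
import datetime


class ToyNodeConditions(object):
    """Hypothetical stand-in for NodeConditions; not Scale's implementation."""

    def __init__(self, hostname):
        self.hostname = hostname
        self._active_errors = {}  # error name -> time the error was recorded

    def handle_cleanup_task_failed(self, when):
        # Record the failure time under the CLEANUP error name
        self._active_errors['CLEANUP'] = when

    def handle_cleanup_task_completed(self):
        # A successful cleanup clears the error, if one was active
        self._active_errors.pop('CLEANUP', None)

    def has_active_errors(self):
        return bool(self._active_errors)

    def last_cleanup_task_error(self):
        # None when the last cleanup task succeeded
        return self._active_errors.get('CLEANUP')


conditions = ToyNodeConditions('host1')
failed_at = datetime.datetime(2018, 1, 1, 12, 0, 0)
conditions.handle_cleanup_task_failed(failed_at)
had_error = conditions.has_active_errors()
error_time = conditions.last_cleanup_task_error()
conditions.handle_cleanup_task_completed()
```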
-
scheduler.node.manager module
Defines the class that manages the scheduler nodes
-
class scheduler.node.manager.NodeManager
Bases: object
This class manages the scheduler nodes. This class is thread-safe.
-
clear
()¶ Clears all node data from the manager. This method is intended for testing only.
-
generate_status_json
(status_dict)¶ Generates the portion of the status JSON that describes the nodes
Parameters: status_dict (dict) – The status JSON dict
-
get_next_tasks
(when)¶ Returns the next node tasks to schedule
Parameters: when (datetime.datetime) – The current time
Returns: A list of the next node tasks to schedule Return type: [job.tasks.base_task.Task]
-
get_node
(agent_id)¶ Returns the node with the given agent ID, possibly None
Parameters: agent_id (string) – The agent ID of the node
Returns: The node, possibly None Return type: scheduler.node.node_class.Node
-
get_nodes
()¶ Returns a list of all nodes
Returns: The list of all nodes Return type: [scheduler.node.node_class.Node]
-
handle_task_timeout
(task)¶ Handles the timeout of the given task
Parameters: task (job.tasks.base_task.Task) – The task
-
handle_task_update
(task_update)¶ Handles the given task update for a task
Parameters: task_update (job.tasks.update.TaskStatusUpdate) – The task update
-
lost_node
(agent_id)¶ Informs the manager that the node with the given agent ID was lost and has gone offline
Parameters: agent_id (string) – The agent ID of the lost node
-
register_agents
(agents)¶ Adds the list of online agents to the manager so they can be registered
Parameters: agents (list) – The list of online agents to register
-
sync_with_database
(scheduler_config)¶ Syncs with the database to retrieve updated node models and queries Mesos for unknown agent IDs
Parameters: scheduler_config (scheduler.configuration.SchedulerConfiguration) – The scheduler configuration
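NodeManager is documented as thread-safe, but the page does not say how that is achieved. One common approach is to guard a dictionary of nodes with a single lock, sketched below. `ToyNodeManager`, the `(agent_id, hostname)` pair shape and the per-node dict are all assumptions for illustration, not Scale's code.

```python
import threading


class ToyNodeManager(object):
    """Hypothetical lock-guarded registry; not Scale's NodeManager."""

    def __init__(self):
        self._lock = threading.Lock()
        self._nodes = {}  # agent_id -> dict describing the node

    def register_agents(self, agents):
        # agents: iterable of (agent_id, hostname) pairs (an assumed shape)
        with self._lock:
            for agent_id, hostname in agents:
                self._nodes[agent_id] = {'hostname': hostname, 'is_online': True}

    def lost_node(self, agent_id):
        # Mark the node offline; unknown agent IDs are ignored
        with self._lock:
            node = self._nodes.get(agent_id)
            if node is not None:
                node['is_online'] = False

    def get_node(self, agent_id):
        # Returns the node dict, or None if the agent ID is unknown
        with self._lock:
            return self._nodes.get(agent_id)

    def get_nodes(self):
        # Return a copy of the list so callers can iterate without the lock
        with self._lock:
            return list(self._nodes.values())


manager = ToyNodeManager()
manager.register_agents([('agent_1', 'host1'), ('agent_2', 'host2')])
manager.lost_node('agent_2')
```

Every public method acquires the lock before touching the shared dictionary, so callers on different threads never observe a partially applied update.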
-
scheduler.node.node_class module
Defines the class that represents nodes in the scheduler
-
class scheduler.node.node_class.Node(agent_id, node, scheduler_config)
Bases: object
This class represents a node in the scheduler. It combines information retrieved from the database node models as well as run-time information retrieved from Mesos. This class is thread-safe.
-
CLEANUP_ERR_THRESHOLD
= datetime.timedelta(0, 120)¶
-
DEGRADED
= NodeState(state=u'DEGRADED', title=u'Degraded', description=u'Node has an error condition, putting it in a degraded state. New jobs will not be scheduled, and the node will attempt to continue to run existing jobs.')¶
-
DEPRECATED
= NodeState(state=u'DEPRECATED', title=u'Deprecated', description=u'Node is deprecated and will not be used by Scale. Existing jobs on the node will be failed.')¶
-
HEALTH_ERR_THRESHOLD
= datetime.timedelta(0, 120)¶
-
IMAGE_PULL
= NodeState(state=u'IMAGE_PULL', title=u'Pulling image', description=u'Node is pulling the Scale Docker image.')¶
-
IMAGE_PULL_ERR_THRESHOLD
= datetime.timedelta(0, 300)¶
-
INITIAL_CLEANUP
= NodeState(state=u'INITIAL_CLEANUP', title=u'Cleaning up', description=u'Node is performing an initial cleanup step to remove existing Scale Docker containers and volumes.')¶
-
NORMAL_HEALTH_THRESHOLD
= datetime.timedelta(0, 300)¶
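The positional arguments of datetime.timedelta are (days, seconds), so the threshold constants in this class are 120 seconds (2 minutes) and 300 seconds (5 minutes). The page does not state how the thresholds are applied; the `within_threshold` comparison below is a hypothetical illustration of one way such a value could be used.

```python
import datetime

CLEANUP_ERR_THRESHOLD = datetime.timedelta(0, 120)     # 0 days, 120 seconds
IMAGE_PULL_ERR_THRESHOLD = datetime.timedelta(0, 300)  # 0 days, 300 seconds


def within_threshold(last_error, when, threshold):
    """Hypothetical check: True if last_error is less than threshold before when."""
    return last_error is not None and when - last_error < threshold


now = datetime.datetime(2018, 1, 1, 12, 0, 0)
recent = now - datetime.timedelta(seconds=90)
old = now - datetime.timedelta(minutes=10)
```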
-
OFFLINE
= NodeState(state=u'OFFLINE', title=u'Offline', description=u'Node is offline/unavailable, so no jobs can currently run on it.')¶
-
PAUSED
= NodeState(state=u'PAUSED', title=u'Paused', description=u'Node is paused, so no new jobs will be scheduled. Existing jobs will continue to run.')¶
-
READY
= NodeState(state=u'READY', title=u'Ready', description=u'Node is ready to run new jobs.')¶
-
SCHEDULER_STOPPED
= NodeState(state=u'SCHEDULER_STOPPED', title=u'Scheduler Stopped', description=u'Scheduler is paused, so no new jobs will be scheduled. Existing jobs will continue to run.')¶
-
add_job_execution
(job_exe)¶ Adds a job execution that needs to be cleaned up
Parameters: job_exe (job.execution.job_exe.RunningJobExecution) – The job execution to add
-
agent_id
¶ Returns the agent ID of the node
Returns: The agent ID Return type: string
-
cleanup_desc
= u'Node is performing an initial cleanup step to remove existing Scale Docker containers and volumes.'¶
-
degraded_desc
= u'Node has an error condition, putting it in a degraded state. New jobs will not be scheduled, and the node will attempt to continue to run existing jobs.'¶
-
deprecated_desc
= u'Node is deprecated and will not be used by Scale. Existing jobs on the node will be failed.'¶
-
generate_status_json
(nodes_list)¶ Generates the portion of the status JSON that describes this node
Parameters: nodes_list (list) – The list of nodes within the status JSON
-
get_next_tasks
(when)¶ Returns the next node tasks to launch
Parameters: when (datetime.datetime) – The current time
Returns: The list of node tasks to launch Return type: [job.tasks.base_task.Task]
-
handle_task_timeout
(task)¶ Handles the timeout of the given node task
Parameters: task (job.tasks.base_task.Task) – The task
-
handle_task_update
(task_update)¶ Handles the given task update
Parameters: task_update (job.tasks.update.TaskStatusUpdate) – The task update
-
hostname
¶ Returns the hostname of the node
Returns: The hostname Return type: string
-
id
¶ Returns the ID of the node
Returns: The node ID Return type: int
-
is_active
¶ Indicates whether this node is active (True) or not (False)
Returns: Whether this node is active Return type: bool
-
is_ready_for_new_job
()¶ Indicates whether this node is ready to launch a new job execution
Returns: True if this node is ready to launch a new job execution, False otherwise Return type: bool
-
is_ready_for_next_job_task
()¶ Indicates whether this node is ready to launch the next task of a job execution
Returns: True if this node is ready to launch a job task, False otherwise Return type: bool
-
is_ready_for_system_task
()¶ Indicates whether this node is ready to launch a new system task
Returns: True if this node is ready to launch a new system task, False otherwise Return type: bool
-
paused_desc
= u'Node is paused, so no new jobs will be scheduled. Existing jobs will continue to run.'¶
-
pull_desc
= u'Node is pulling the Scale Docker image.'¶
-
should_be_removed
()¶ Indicates whether this node should be removed from the scheduler. If the node is no longer active and is also no longer online, there’s no reason for the scheduler to continue to track it.
Returns: True if this node should be removed from the scheduler Return type: bool
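The removal rule described for should_be_removed can be written as a single boolean expression. This is a sketch of the stated rule as a standalone function with hypothetical `is_active` and `is_online` arguments; the real method reads the node's own state.

```python
def should_be_removed(is_active, is_online):
    """Sketch of the documented rule: remove only if inactive AND offline."""
    return not is_active and not is_online


# The four possible combinations of (is_active, is_online)
results = {
    (True, True): should_be_removed(True, True),
    (True, False): should_be_removed(True, False),
    (False, True): should_be_removed(False, True),
    (False, False): should_be_removed(False, False),
}
```

Only the inactive and offline combination yields True; an inactive node that is still online stays tracked.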
-
stopped_desc
= u'Scheduler is paused, so no new jobs will be scheduled. Existing jobs will continue to run.'¶
-
update_from_mesos
(agent_id=None, is_online=None)¶ Updates this node’s data from Mesos
Parameters: - agent_id (string) – The Mesos agent ID for the node
- is_online (bool) – Whether the Mesos agent is online
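Both parameters of update_from_mesos default to None. A common reading of such a signature is that an omitted argument leaves the corresponding field unchanged. The page does not confirm this, so the sketch below is an assumption, and `ToyNode` is a hypothetical class.

```python
class ToyNode(object):
    """Hypothetical node holding only the two Mesos-derived fields."""

    def __init__(self, agent_id, is_online=False):
        self.agent_id = agent_id
        self.is_online = is_online

    def update_from_mesos(self, agent_id=None, is_online=None):
        # Assumption: an argument left as None means "leave this field unchanged"
        if agent_id is not None:
            self.agent_id = agent_id
        if is_online is not None:
            self.is_online = is_online


node = ToyNode('agent_1')
node.update_from_mesos(is_online=True)  # agent_id is left untouched
```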
-
update_from_model
(node, scheduler_config)¶ Updates this node’s data from the database models
Parameters: - node (node.models.Node) – The node model
- scheduler_config (scheduler.configuration.SchedulerConfiguration) – The scheduler configuration