job.configuration.data package

Submodules

job.configuration.data.data_file module

Defines the data file inputs and data file outputs that are contained within job data

class job.configuration.data.data_file.AbstractDataFileParseSaver

Bases: object

Abstract base class for a data file parse saver. A data file parse saver provides a way to save parse results for input data files.

save_parse_results(parse_results, input_file_ids)

Saves the given parse results

Parameters:
  • parse_results (dict of str -> tuple(str, datetime.datetime, datetime.datetime, list, str, str)) – Dict with each input file name mapping to a tuple of GeoJSON containing GIS meta-data (optionally None), the start time of the data contained in the file (optionally None), the end time of the data contained in the file (optionally None), the list of data types, and the new workspace path (optionally None)
  • input_file_ids (list of long) – List of IDs for all input files
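
The interface above could be implemented along the following lines. This is only a sketch: `InMemoryParseSaver` and the local stand-in base class are hypothetical, and the tuple follows the six-position type shown in the signature, with None in the optional positions.

```python
import datetime


class AbstractDataFileParseSaver(object):
    """Local stand-in for the documented abstract base class (assumption)."""

    def save_parse_results(self, parse_results, input_file_ids):
        raise NotImplementedError()


class InMemoryParseSaver(AbstractDataFileParseSaver):
    """Hypothetical saver that keeps parse results in a dict for inspection."""

    def __init__(self):
        self.saved = {}

    def save_parse_results(self, parse_results, input_file_ids):
        # Record each file's result tuple together with the IDs it was saved for
        for file_name, result in parse_results.items():
            self.saved[file_name] = (result, sorted(input_file_ids))


saver = InMemoryParseSaver()
start = datetime.datetime(2015, 1, 1)
end = datetime.datetime(2015, 1, 2)
# GeoJSON and the trailing optional positions are left as None here
saver.save_parse_results({'scene.tif': (None, start, end, ['landsat'], None, None)}, [42])
```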
class job.configuration.data.data_file.AbstractDataFileStore

Bases: object

Abstract base class for a data file store. A data file store provides a way to validate data file output configuration and store output data files.

get_workspaces(workspace_ids)

Retrieves the workspaces with the given IDs. IDs that do not match any workspace are omitted from the result.

Parameters:workspace_ids (set of int) – The set of workspace IDs
Returns:Dict with each workspace ID mapping to a bool indicating if it is active (True)
Return type:dict of int -> bool
store_files(data_files, input_file_ids, job_exe)

Stores the given data files and writes them to the given workspaces.

Parameters:
  • data_files (dict of int -> list of product.types.ProductFileMetadata) – Dict with each workspace ID mapping to a list of ProductFileMetadata elements with absolute local file paths and media type (media type is optionally None)
  • input_file_ids (set of long) – Set of input file IDs
  • job_exe (job.models.JobExecution) – The job execution model (with related job and job_type fields) that is storing the files
Returns:

Dict with each local file path mapping to its new file ID

Return type:

dict of str -> long
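
As an illustration of the two methods above, a data file store might look roughly like this. `InMemoryDataFileStore` and `FileMeta` are hypothetical; `FileMeta` merely stands in for product.types.ProductFileMetadata, whose real attributes are not shown in this reference.

```python
from collections import namedtuple

# Hypothetical stand-in for product.types.ProductFileMetadata
FileMeta = namedtuple('FileMeta', ['local_path', 'media_type'])


class InMemoryDataFileStore(object):
    """Sketch of a data file store that assigns file IDs without any I/O."""

    def __init__(self, workspaces):
        # workspaces: dict of workspace ID -> bool (True if active)
        self._workspaces = dict(workspaces)
        self._next_file_id = 1

    def get_workspaces(self, workspace_ids):
        # IDs with no matching workspace are left out of the result
        return {ws_id: self._workspaces[ws_id]
                for ws_id in workspace_ids if ws_id in self._workspaces}

    def store_files(self, data_files, input_file_ids, job_exe):
        # Map each local file path to a newly assigned file ID
        new_ids = {}
        for workspace_id in sorted(data_files):
            for meta in data_files[workspace_id]:
                new_ids[meta.local_path] = self._next_file_id
                self._next_file_id += 1
        return new_ids


store = InMemoryDataFileStore({1: True, 2: False})
active = store.get_workspaces({1, 2, 99})
file_ids = store.store_files(
    {1: [FileMeta('/tmp/a.txt', 'text/plain'), FileMeta('/tmp/b.bin', None)]},
    {10}, None)
```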

job.configuration.data.exceptions module

Defines exceptions that can occur when interacting with job data

exception job.configuration.data.exceptions.InvalidConfiguration

Bases: exceptions.Exception

Exception indicating that the provided job configuration was invalid

exception job.configuration.data.exceptions.InvalidConnection

Bases: exceptions.Exception

Exception indicating that the provided job connection was invalid

exception job.configuration.data.exceptions.InvalidData

Bases: exceptions.Exception

Exception indicating that the provided job data was invalid

exception job.configuration.data.exceptions.StatusError

Bases: exceptions.Exception

Exception indicating that an operation cannot be completed due to the current job status.

job.configuration.data.job_connection module

Defines connections that will provide data to execute jobs

class job.configuration.data.job_connection.JobConnection

Bases: object

Represents a connection that will provide data to execute jobs. This class contains the necessary description needed to ensure the data provided by the connection will be sufficient to execute the given job.

add_input_file(file_name, multiple, media_types, optional, partial)

Adds a new file parameter to this connection

Parameters:
  • file_name (str) – The file parameter name
  • multiple (bool) – Whether the file parameter provides multiple files (True)
  • media_types (list of str) – The possible media types of the file parameter (unknown if None or [])
  • optional (bool) – Whether the file parameter is optional and may not be provided (True)
  • partial (bool) – Flag indicating if the parameter only requires a small portion of the file
add_property(property_name)

Adds a new property parameter to this connection

Parameters:property_name (str) – The property parameter name
add_workspace()

Indicates that this connection provides a workspace for storing output files

has_workspace()

Indicates whether this connection provides a workspace for storing output files

Returns:True if this connection provides a workspace, False otherwise
Return type:bool
validate_input_files(files)

Validates the given file parameters to make sure they are valid with respect to the job interface.

Parameters:files (dict of str -> tuple(bool, bool, job.configuration.interface.scale_file.ScaleFileDescription)) – Dict of file parameter names mapped to a tuple with three items: whether the parameter is required (True), if the parameter is for multiple files (True), and the description of the expected file meta-data
Returns:A list of warnings discovered during validation.
Return type:list[job.configuration.data.job_data.ValidationWarning]

Raises:job.configuration.data.exceptions.InvalidConnection – If there is a configuration problem.

validate_properties(property_names)

Validates the given property names to make sure all properties exist if they are required.

Parameters:property_names (dict of str -> bool) – Dict of property names mapped to a bool indicating if they are required
Returns:A list of warnings discovered during validation.
Return type:list[job.configuration.data.job_data.ValidationWarning]

Raises:job.configuration.data.exceptions.InvalidConnection – If there is a configuration problem.
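
The required-property check described above can be sketched as follows. `SketchConnection` is a hypothetical, much simplified model, and the local `InvalidConnection` class only stands in for job.configuration.data.exceptions.InvalidConnection; the real implementation may check more than this.

```python
class InvalidConnection(Exception):
    """Local stand-in for the documented InvalidConnection exception."""


class SketchConnection(object):
    """Hypothetical model of the property side of a job connection."""

    def __init__(self):
        self._properties = set()

    def add_property(self, property_name):
        self._properties.add(property_name)

    def validate_properties(self, property_names):
        # property_names: dict of property name -> bool (True if required)
        warnings = []
        for name in sorted(property_names):
            required = property_names[name]
            if required and name not in self._properties:
                raise InvalidConnection(
                    'Property %s is required and was not provided' % name)
        return warnings


conn = SketchConnection()
conn.add_property('threshold')
warnings = conn.validate_properties({'threshold': True, 'label': False})

try:
    conn.validate_properties({'region': True})
    error = None
except InvalidConnection as ex:
    error = str(ex)
```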

job.configuration.data.job_data module

Defines the data needed for executing a job

class job.configuration.data.job_data.JobData(data=None)

Bases: object

Represents the data needed for executing a job. Data includes details about the data inputs, links needed to connect shared resources to resource instances in Scale, and details needed to store all resulting output.

add_file_input(input_name, file_id)

Adds a new file parameter to this job data. This method does not perform validation on the job data.

Parameters:
  • input_name (string) – The file parameter name
  • file_id (long) – The ID of the file
add_file_list_input(input_name, file_ids)

Adds a new files parameter to this job data. This method does not perform validation on the job data.

Parameters:
  • input_name (string) – The files parameter name
  • file_ids ([long]) – The IDs of the files
add_file_output(data, add_to_internal=True)

Adds a new output files parameter to this job data with a workspace ID.

Parameters:
  • data (dict) – The output parameter dict
  • add_to_internal (bool) – Whether to also add the output to the internal data dict. Not needed when called from __init__
add_output(output_name, workspace_id)

Adds a new output parameter to this job data with a workspace ID. This method does not perform validation on the job data.

Parameters:
  • output_name (string) – The output parameter name
  • workspace_id (int) – The ID of the workspace
add_property_input(input_name, value)

Adds a new property parameter to this job data. This method does not perform validation on the job data.

Parameters:
  • input_name (string) – The property parameter name
  • value (string) – The value of the property
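
The add methods above can be pictured as appending entries to an internal dictionary without checking them. The sketch below assumes a dictionary layout (`input_data` and `output_data` lists of dicts) that this reference does not confirm; `SketchJobData` and its key names are hypothetical.

```python
class SketchJobData(object):
    """Hypothetical model of job data that appends entries without validation."""

    def __init__(self, data=None):
        self._data = data if data is not None else {}
        # Assumed layout: two lists of parameter dicts
        self._data.setdefault('input_data', [])
        self._data.setdefault('output_data', [])

    def add_file_input(self, input_name, file_id):
        self._data['input_data'].append({'name': input_name, 'file_id': file_id})

    def add_file_list_input(self, input_name, file_ids):
        self._data['input_data'].append({'name': input_name, 'file_ids': list(file_ids)})

    def add_property_input(self, input_name, value):
        self._data['input_data'].append({'name': input_name, 'value': value})

    def add_output(self, output_name, workspace_id):
        self._data['output_data'].append({'name': output_name, 'workspace_id': workspace_id})

    def get_dict(self):
        return self._data


job_data = SketchJobData()
job_data.add_file_input('image', 7)
job_data.add_file_list_input('tiles', [8, 9])
job_data.add_property_input('band', 'red')
job_data.add_output('thumbnail', 3)
snapshot = job_data.get_dict()
```

Because nothing is validated at this stage, a caller would normally run the validate methods of the job data before relying on the result.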
static create_output_workspace_dict(output_params, job_data, job_exe)

Creates the mapping from output to workspace both ways: the old way from job data and the new way from job configuration

Parameters:
  • output_params (list of str) – The list of output parameter names
  • job_data (type not specified in the source; job data version 1.0 or 2.0) – The job data
  • job_exe (job.models.JobExecution) – The job execution model (with related job and job_type fields)
Returns:

Dict where output param name maps to workspace ID

Return type:

dict

get_all_properties()

Retrieves all properties from this job data and returns them in ascending order of their names

Returns:List of strings containing name=value
Return type:[string]
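
A minimal sketch of the name=value formatting and ordering described above, assuming the properties are available as a plain dict (`format_properties` is a hypothetical helper, not part of the package):

```python
def format_properties(properties):
    """Return 'name=value' strings in ascending order of property name."""
    return ['%s=%s' % (name, properties[name]) for name in sorted(properties)]


formatted = format_properties({'zoom': '4', 'band': 'red'})
```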
get_dict()

Returns the internal dictionary that represents this job data

Returns:The internal dictionary
Return type:dict
get_injected_env_vars(input_files_dict)

Applies all execution-time values to this job data

Parameters:input_files_dict ({str: job.execution.configuration.input_file.InputFile}) – Mapping of input names to InputFiles
Returns:Mapping of all input keys to their true file / property values
Return type:{str: str}
get_injected_input_values(input_files_dict)

Applies all execution-time values to this job data

Parameters:input_files_dict ({str: job.execution.configuration.input_file.InputFile}) – Mapping of input names to InputFiles
Returns:Mapping of all input keys to their true file / property values
Return type:{str: str}
get_input_file_ids()

Returns a set of scale file identifiers for each file in the job input data.

Returns:Set of scale file identifiers
Return type:{int}
get_input_file_ids_by_input()

Returns the list of file IDs for each input that holds files

Returns:Dict where each file input name maps to its list of file IDs
Return type:dict
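
The relationship between the per-input lists and the overall set of file IDs can be sketched like this. The input layout (a dict of input name to either one file ID or a list of IDs) and both helper names are assumptions made for illustration.

```python
def file_ids_by_input(inputs):
    """Map each file input name to its list of file IDs.

    inputs: hypothetical dict of input name -> single file ID or list of IDs.
    """
    result = {}
    for name, value in inputs.items():
        result[name] = list(value) if isinstance(value, (list, tuple)) else [value]
    return result


def all_file_ids(inputs):
    """Collect the unique file IDs across every file input."""
    ids = set()
    for file_ids in file_ids_by_input(inputs).values():
        ids.update(file_ids)
    return ids


inputs = {'image': 7, 'tiles': [8, 9, 7]}
by_input = file_ids_by_input(inputs)
unique_ids = all_file_ids(inputs)
```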
get_input_file_info()

Returns a set of scale file identifiers and input names for each file in the job input data.

Returns:Set of scale file identifiers and names
Return type:set[tuple]
get_output_workspace_ids()

Returns a list of the IDs for every workspace used to store the output files for this data

Returns:List of workspace IDs
Return type:[int]
get_output_workspaces()

Returns a dict of the output parameter names mapped to their output workspace ID

Returns:A dict mapping output parameters to workspace IDs
Return type:dict
get_property_values(property_names)

Retrieves the values contained in this job data for the given property names. If no value is available for a property name, it will not be included in the returned dict.

Parameters:property_names ([string]) – List of property names
Returns:Dict with each property name mapping to its value
Return type:{string: string}
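
The lookup behaviour described above, where names without a value are simply left out, can be sketched with a plain dict. `select_property_values` is a hypothetical helper that takes the stored properties as an argument.

```python
def select_property_values(properties, property_names):
    """Return only the requested properties that actually have a value."""
    return {name: properties[name] for name in property_names if name in properties}


values = select_property_values({'band': 'red', 'zoom': '4'}, ['band', 'missing'])
```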
has_workspaces()

Whether this job data contains output workspaces

Returns:Whether this job data contains output workspaces
Return type:bool
retrieve_input_data_files(data_files)

Retrieves the given data input files and writes them to the given local directories. Any given file parameters that do not appear in the data will not be returned in the results.

Parameters:data_files ({string: tuple(bool, string, bool)}) – Dict with each file parameter name mapping to a tuple of three items: a bool indicating if the parameter accepts multiple files (True), an absolute directory path, and a bool indicating if the job supports partial file download (True).
Returns:Dict with each file parameter name mapping to a list of absolute file paths of the written files
Return type:{string: [string]}
save_parse_results(parse_results)

Saves the given parse results

Parameters:parse_results ({string: tuple(string, datetime.datetime, datetime.datetime, [], string, string)}) – Dict with each input file name mapping to a tuple of GeoJSON containing GIS meta-data (optionally None), the start time of the data contained in the file (optionally None), the end time of the data contained in the file (optionally None), the list of data types, and the new workspace path (optionally None)
setup_job_dir(data_files)

Sets up the directory structure for a job execution and downloads the given files

Parameters:data_files ({string: tuple(bool, string)}) – Dict with each file parameter name mapping to a bool indicating if the parameter accepts multiple files (True) and an absolute directory path
Returns:Dict with each file parameter name mapping to a list of absolute file paths of the written files
Return type:{string: [string]}
store_output_data_files(data_files, job_exe)

Stores the given data output files

Parameters:
  • data_files ({string: [ProductFileMetadata]}) – Dict with each file parameter name mapping to a list of ProductFileMetadata objects
  • job_exe (job.models.JobExecution) – The job execution model (with related job and job_type fields) that is storing the output data files
Returns:

The job results

Return type:

job.configuration.results.job_results.JobResults

validate_input_files(files)

Validates the given file parameters to make sure they are valid with respect to the job interface.

Parameters:files ({string: tuple(bool, bool, job.configuration.interface.scale_file.ScaleFileDescription)}) – Dict of file parameter names mapped to a tuple with three items: whether the parameter is required (True), if the parameter is for multiple files (True), and the description of the expected file meta-data
Returns:A list of warnings discovered during validation.
Return type:[job.configuration.data.job_data.ValidationWarning]

Raises:job.configuration.data.exceptions.InvalidData – If there is a configuration problem.

validate_output_files(files)

Validates the given file parameters to make sure they are valid with respect to the job interface.

Parameters:files ([string]) – List of file parameter names
Returns:A list of warnings discovered during validation.
Return type:[job.configuration.data.job_data.ValidationWarning]

Raises:job.configuration.data.exceptions.InvalidData – If there is a configuration problem.

validate_properties(property_names)

Validates the given property names to ensure they are all populated correctly and exist if they are required.

Parameters:property_names ({string: bool}) – Dict of property names mapped to a bool indicating if they are required
Returns:A list of warnings discovered during validation.
Return type:[job.configuration.data.job_data.ValidationWarning]

Raises:job.configuration.data.exceptions.InvalidData – If there is a configuration problem.

class job.configuration.data.job_data.ValidationWarning(key, details)

Bases: object

Tracks job data configuration warnings during validation that may not prevent the job from working.
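
A caller might collect and report such warnings roughly as follows. The local class mirrors only the documented constructor arguments; storing them as `key` and `details` attributes is an assumption, and `summarize` is a hypothetical helper.

```python
class ValidationWarning(object):
    """Local stand-in mirroring the documented (key, details) constructor."""

    def __init__(self, key, details):
        # Attribute names are assumed to match the constructor arguments
        self.key = key
        self.details = details


def summarize(warnings):
    """Turn a list of warnings into printable 'key: details' lines."""
    return ['%s: %s' % (w.key, w.details) for w in warnings]


found = [ValidationWarning('media_type', 'Input "image" has an unknown media type')]
lines = summarize(found)
```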

Module contents