job.configuration.data package
Submodules
job.configuration.data.data_file module
Defines the data file inputs and data file outputs that are contained within job data
-
class job.configuration.data.data_file.AbstractDataFileParseSaver
Bases: object
Abstract base class for a data file parse saver. A data file parse saver provides a way to save parse results for input data files.
-
save_parse_results(parse_results, input_file_ids)
Saves the given parse results
Parameters: - parse_results (dict of str -> tuple(str, datetime.datetime, datetime.datetime, list, str, str)) – Dict with each input file name mapping to a tuple of GeoJSON containing GIS meta-data (optionally None), the start time of the data contained in the file (optionally None), the end time of the data contained in the file (optionally None), the list of data types, and the new workspace path (optionally None)
- input_file_ids (list of long) – List of IDs for all input files
-
-
class job.configuration.data.data_file.AbstractDataFileStore
Bases: object
Abstract base class for a data file store. A data file store provides a way to validate data file output configuration and store output data files.
-
get_workspaces(workspace_ids)
Retrieves the workspaces with the given IDs. If no workspace has a given ID, it will not be retrieved.
Parameters: workspace_ids (set of int) – The set of workspace IDs
Returns: Dict with each workspace ID mapping to a bool indicating if it is active (True)
Return type: dict of int -> bool
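The contract above (unknown IDs are left out of the result) can be sketched with an in-memory registry. This is only an illustration: `InMemoryDataFileStore` and its `_workspaces` dict are hypothetical stand-ins and are not part of this package. A real store would subclass `AbstractDataFileStore` and look the workspaces up wherever they are actually persisted. The sketch is written for a current Python interpreter.

```python
class InMemoryDataFileStore(object):
    """Hypothetical stand-in that mimics the get_workspaces() contract."""

    def __init__(self, workspaces):
        # Assumed shape: dict of int -> bool (workspace ID -> is active)
        self._workspaces = dict(workspaces)

    def get_workspaces(self, workspace_ids):
        """Maps each known workspace ID to its active flag; unknown IDs are omitted."""
        return {ws_id: self._workspaces[ws_id]
                for ws_id in workspace_ids
                if ws_id in self._workspaces}


# Workspace 99 does not exist, so it does not appear in the result
store = InMemoryDataFileStore({1: True, 2: False})
found = store.get_workspaces({1, 2, 99})
```

Callers can treat a missing key as "no such workspace" and a False value as "exists but inactive".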
-
store_files(data_files, input_file_ids, job_exe)
Stores the given data files and writes them to the given workspaces.
Parameters: - data_files (dict[int, product.types.ProductFileMetadata]) – Dict with workspace ID mapping to a list of ProductFileMetadata elements with absolute local file paths and media type (media type is optionally None)
- input_file_ids (set of long) – Set of input file IDs
- job_exe (job.models.JobExecution) – The job execution model (with related job and job_type fields) that is storing the files
Returns: Dict with each local file path mapping to its new file ID
Return type: dict of str -> long
-
job.configuration.data.exceptions module
Defines exceptions that can occur when interacting with job data
-
exception job.configuration.data.exceptions.InvalidConfiguration
Bases: exceptions.Exception
Exception indicating that the provided job configuration was invalid
-
exception job.configuration.data.exceptions.InvalidConnection
Bases: exceptions.Exception
Exception indicating that the provided job connection was invalid
-
exception job.configuration.data.exceptions.InvalidData
Bases: exceptions.Exception
Exception indicating that the provided job data was invalid
-
exception job.configuration.data.exceptions.StatusError
Bases: exceptions.Exception
Exception indicating that an operation cannot be completed due to the current job status.
job.configuration.data.job_connection module
Defines connections that will provide data to execute jobs
-
class job.configuration.data.job_connection.JobConnection
Bases: object
Represents a connection that will provide data to execute jobs. This class contains the description needed to ensure that the data provided by the connection will be sufficient to execute the given job.
-
add_input_file(file_name, multiple, media_types, optional, partial)
Adds a new file parameter to this connection
Parameters: - file_name (str) – The file parameter name
- multiple (bool) – Whether the file parameter provides multiple files (True)
- media_types (list of str) – The possible media types of the file parameter (unknown if None or [])
- optional (bool) – Whether the file parameter is optional and may not be provided (True)
- partial (bool) – Flag indicating if the parameter only requires a small portion of the file
-
add_property(property_name)
Adds a new property parameter to this connection
Parameters: property_name (str) – The property parameter name
-
add_workspace()
Indicates that this connection provides a workspace for storing output files
-
has_workspace()
Indicates whether this connection provides a workspace for storing output files
Returns: True if this connection provides a workspace, False otherwise
Return type: bool
-
validate_input_files(files)
Validates the given file parameters to make sure they are valid with respect to the job interface.
Parameters: files (dict of str -> tuple(bool, bool, job.configuration.interface.scale_file.ScaleFileDescription)) – Dict of file parameter names mapped to a tuple with three items: whether the parameter is required (True), if the parameter is for multiple files (True), and the description of the expected file meta-data
Returns: A list of warnings discovered during validation.
Return type: list[job.configuration.data.job_data.ValidationWarning]
Raises: job.configuration.data.exceptions.InvalidConnection – If there is a configuration problem.
-
validate_properties(property_names)
Validates the given property names to make sure all properties exist if they are required.
Parameters: property_names (dict of str -> bool) – Dict of property names mapped to a bool indicating if they are required
Returns: A list of warnings discovered during validation.
Return type: list[job.configuration.data.job_data.ValidationWarning]
Raises: job.configuration.data.exceptions.InvalidConnection – If there is a configuration problem.
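The required-property check described above can be sketched as follows. This is a rough illustration under stated assumptions, not the library's implementation. The local `InvalidConnection` class stands in for `job.configuration.data.exceptions.InvalidConnection`. The free function and its `provided` argument are hypothetical. The conditions under which the real method emits warnings are not documented here, so the sketch always returns an empty list.

```python
class InvalidConnection(Exception):
    """Local stand-in for job.configuration.data.exceptions.InvalidConnection."""


def validate_properties(provided, property_names):
    """Checks that every required property is offered by the connection.

    provided: set of property names the connection offers (hypothetical input)
    property_names: dict of str -> bool, True when the property is required
    """
    for name in sorted(property_names):
        if property_names[name] and name not in provided:
            raise InvalidConnection("Missing required property: %s" % name)
    # Warning generation is not modeled in this sketch
    return []


# An optional property may be absent without raising
warnings = validate_properties({"threshold"}, {"threshold": True, "label": False})

# A missing required property raises
try:
    validate_properties(set(), {"threshold": True})
    raised = False
except InvalidConnection:
    raised = True
```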
-
job.configuration.data.job_data module
Defines the data needed for executing a job
-
class job.configuration.data.job_data.JobData(data=None)
Bases: object
Represents the data needed for executing a job. Data includes details about the data inputs, links needed to connect shared resources to resource instances in Scale, and details needed to store all resulting output.
-
add_file_input(input_name, file_id)
Adds a new file parameter to this job data. This method does not perform validation on the job data.
Parameters: - input_name (string) – The file parameter name
- file_id (long) – The ID of the file
-
add_file_list_input(input_name, file_ids)
Adds a new files parameter to this job data. This method does not perform validation on the job data.
Parameters: - input_name (string) – The files parameter name
- file_ids ([long]) – The IDs of the files
-
add_file_output(data, add_to_internal=True)
Adds new output files to this job data with a workspace ID.
Parameters: - data (dict) – The output parameter dict
- add_to_internal (bool) – Whether to add to the private data dict. Unneeded when called from __init__
-
add_output(output_name, workspace_id)
Adds a new output parameter to this job data with a workspace ID. This method does not perform validation on the job data.
Parameters: - output_name (string) – The output parameter name
- workspace_id (int) – The ID of the workspace
-
add_property_input(input_name, value)
Adds a new property parameter to this job data. This method does not perform validation on the job data.
Parameters: - input_name (string) – The property parameter name
- value (string) – The value of the property
-
static create_output_workspace_dict(output_params, job_data, job_exe)
Creates the mapping from output to workspace both ways: the old way from job data and the new way from job configuration
Parameters: - output_params (list) – The list of output parameter names
- job_data (job data schema version unspecified) – The job data
- job_exe (job.models.JobExecution) – The job execution model (with related job and job_type fields)
Returns: Dict where output param name maps to workspace ID
Return type: dict
-
get_all_properties()
Retrieves all properties from this job data and returns them in ascending order of their names
Returns: List of strings containing name=value
Return type: [string]
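To make the return format concrete, here is a small sketch of the `name=value` ordering described above. It assumes the properties are available as a plain dict. The helper name `format_properties` and the sample values are made up for illustration and do not come from the library.

```python
def format_properties(properties):
    """Builds 'name=value' strings sorted by ascending property name."""
    return ["%s=%s" % (name, properties[name]) for name in sorted(properties)]


# Insertion order does not matter; the result is ordered by name
formatted = format_properties({"zone": "A", "band": "red"})
```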
-
get_dict()
Returns the internal dictionary that represents this job data
Returns: The internal dictionary
Return type: dict
-
get_injected_env_vars(input_files_dict)
Apply all execution time values to job data
Parameters: input_files_dict ({str: job.execution.configuration.input_file.InputFile}) – Mapping of input names to InputFiles
Returns: Mapping of all input keys to their true file / property values
Return type: {str: str}
-
get_injected_input_values(input_files_dict)
Apply all execution time values to job data
Parameters: input_files_dict ({str: job.execution.configuration.input_file.InputFile}) – Mapping of input names to InputFiles
Returns: Mapping of all input keys to their true file / property values
Return type: {str: str}
-
get_input_file_ids()
Returns a set of Scale file identifiers for each file in the job input data.
Returns: Set of Scale file identifiers
Return type: {int}
-
get_input_file_ids_by_input()
Returns the list of file IDs for each input that holds files
Returns: Dict where each file input name maps to its list of file IDs
Return type: dict
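The per-input dict returned here and the flat set returned by get_input_file_ids() are two views of the same file inputs. The sketch below shows how one shape collapses into the other, assuming both views cover the same inputs. The input names and IDs are invented sample data, and nothing here calls the library.

```python
# Shape of the per-input view: input name -> list of file IDs (sample data)
ids_by_input = {"scene": [101], "tiles": [102, 103, 101]}

# Collapsing the lists into a set yields the flat view, with duplicates removed
all_ids = set()
for file_ids in ids_by_input.values():
    all_ids.update(file_ids)
```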
-
get_input_file_info()
Returns a set of Scale file identifiers and input names for each file in the job input data.
Returns: Set of Scale file identifiers and names
Return type: set[tuple]
-
get_output_workspace_ids()
Returns a list of the IDs for every workspace used to store the output files for this data
Returns: List of workspace IDs
Return type: [int]
-
get_output_workspaces()
Returns a dict of the output parameter names mapped to their output workspace ID
Returns: A dict mapping output parameters to workspace IDs
Return type: dict
-
get_property_values(property_names)
Retrieves the values contained in this job data for the given property names. If no value is available for a property name, it will not be included in the returned dict.
Parameters: property_names ([string]) – List of property names
Returns: Dict with each property name mapping to its value
Return type: {string: string}
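The omission rule above (names without a value are left out rather than mapped to None) can be sketched like this. The helper `select_property_values` and its `stored` argument are hypothetical and only mirror the documented behavior. The actual method reads from the job data's internal dictionary.

```python
def select_property_values(stored, property_names):
    """Returns the known values for the requested names, skipping unknown names."""
    return {name: stored[name] for name in property_names if name in stored}


# "zone" has no stored value, so it is absent from the result
values = select_property_values({"band": "red"}, ["band", "zone"])
```

Because missing names are dropped, callers should test for key membership instead of comparing against None.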
-
has_workspaces()
Whether this job data contains output workspaces
Returns: Whether this job data contains output workspaces
Return type: bool
-
retrieve_input_data_files(data_files)
Retrieves the given data input files and writes them to the given local directories. Any given file parameters that do not appear in the data will not be returned in the results.
Parameters: data_files ({string: tuple(bool, string, bool)}) – Dict with each file parameter name mapping to a bool indicating if the parameter accepts multiple files (True), an absolute directory path, and a bool indicating if the job supports partial file download (True).
Returns: Dict with each file parameter name mapping to a list of absolute file paths of the written files
Return type: {string: [string]}
-
save_parse_results(parse_results)
Saves the given parse results
Parameters: parse_results ({string: tuple(string, datetime.datetime, datetime.datetime, [], string, string)}) – Dict with each input file name mapping to a tuple of GeoJSON containing GIS meta-data (optionally None), the start time of the data contained in the file (optionally None), the end time of the data contained in the file (optionally None), the list of data types, and the new workspace path (optionally None)
-
setup_job_dir(data_files)
Sets up the directory structure for a job execution and downloads the given files
Parameters: data_files ({string: tuple(bool, string)}) – Dict with each file parameter name mapping to a bool indicating if the parameter accepts multiple files (True) and an absolute directory path
Returns: Dict with each file parameter name mapping to a list of absolute file paths of the written files
Return type: {string: [string]}
-
store_output_data_files(data_files, job_exe)
Stores the given data output files
Parameters: - data_files ({string: [ProductFileMetadata]}) – Dict with each file parameter name mapping to a list of ProductFileMetadata objects
- job_exe (job.models.JobExecution) – The job execution model (with related job and job_type fields) that is storing the output data files
Returns: The job results
Return type:
-
validate_input_files(files)
Validates the given file parameters to make sure they are valid with respect to the job interface.
Parameters: files ({string: tuple(bool, bool, job.configuration.interface.scale_file.ScaleFileDescription)}) – Dict of file parameter names mapped to a tuple with three items: whether the parameter is required (True), if the parameter is for multiple files (True), and the description of the expected file meta-data
Returns: A list of warnings discovered during validation.
Return type: [job.configuration.data.job_data.ValidationWarning]
Raises: job.configuration.data.exceptions.InvalidData – If there is a configuration problem.
-
validate_output_files(files)
Validates the given file parameters to make sure they are valid with respect to the job interface.
Parameters: files ([string]) – List of file parameter names
Returns: A list of warnings discovered during validation.
Return type: [job.configuration.data.job_data.ValidationWarning]
Raises: job.configuration.data.exceptions.InvalidData – If there is a configuration problem.
-
validate_properties(property_names)
Validates the given property names to ensure they are all populated correctly and exist if they are required.
Parameters: property_names ({string: bool}) – Dict of property names mapped to a bool indicating if they are required
Returns: A list of warnings discovered during validation.
Return type: [job.configuration.data.job_data.ValidationWarning]
Raises: job.configuration.data.exceptions.InvalidData – If there is a configuration problem.
-
-
class job.configuration.data.job_data.ValidationWarning(key, details)
Bases: object
Tracks job data configuration warnings during validation that may not prevent the job from working.