Job Data¶
The job data is a JSON document that defines the actual data and configuration on which a specific job will run. It will describe all of the data being passed to the job’s inputs, as well as configuration for how to handle the job’s output. The job data is required when placing a specific job on the queue for the first time.
Consider our previous example algorithm, make_geotiff.py, from Job Interface. The job data for queuing and running a make_geotiff.py job could be defined as follows:
Example job data:
{
"version": "1.0",
"input_data": [
{
"name": "image",
"file_id": 1234
},
{
"name": "georeference_data",
"file_id": 1235
}
],
"output_data": [
{
"name": "geo_image",
"workspace_id": 12
}
]
}
The input_data value is a list detailing the data to pass to each input to the job. In this case the input called image that takes a PNG image file is being passed a file from the Scale system that has the unique ID 1234, and the input called georeference_data which takes a CSV file is being passed a Scale file with the ID 1235. The output_data value is a list detailing the configuration for handling the job’s outputs, which in our example is a single GeoTIFF file. The configuration in our example defines that after the GeoTIFF file is produced by the job, it should be stored in the workspace with the unique ID 12. To see all of the options for defining job data, please refer to the Job Data Specification below.
Job Data Specification Version 1.0¶
A valid job data is a JSON document with the following structure:
{
"version": STRING,
"input_data": [
{
"name": STRING,
"value": STRING
},
{
"name": STRING,
"file_id": INTEGER
},
{
"name": STRING,
"file_ids": [
INTEGER,
INTEGER
]
}
],
"output_data": [
{
"name": STRING,
"workspace_id": INTEGER
}
]
}
version: JSON string
The version is an optional string value that defines the version of the data specification used. This allows updates to be made to the specification while maintaining backwards compatibility by allowing Scale to recognize an older version and convert it to the current version. The default value for version if it is not included is the latest version, which is currently 1.0. It is recommended, though not required, that you include the version so that future changes to the specification will still accept your job data.
input_data: JSON array
The input_data is a list of JSON objects that define the actual data the job receives for its inputs. If not provided, input_data defaults to an empty list (no input data). For the job data to be valid, every required input in the matching job interface must have a corresponding entry in this input_data field. The JSON object that represents each input data has the following fields:
name: JSON string
The name is a required string that gives the name of the input that the data is being provided for. It should match the name of an input in the job’s interface. The name of every input and output in the job data must be unique.The other fields that describe the data being passed to the input are based upon the type of the input as it is defined in the job interface, see Job Interface Specification Version 1.4. The valid types from the job interface specification are:
property
A “property” input has the following additional field:
value: JSON string
The value field contains the string value that will be passed to the “property” input.file
A “file” input has the following additional field:
file_id: JSON number
The required file_id field contains the unique ID of a file in the Scale system that will be passed to the input. The file must meet all of the criteria defined in the job interface for the input.files
A “files” input has the following additional field:
file_ids: JSON array
The required file_ids field is a list of unique IDs of the files in the Scale system that will be passed to the input. Each file must meet all of the criteria defined in the job interface for the input. A “files” input will accept a file_id field instead of a file_ids field (the input will be passed a list containing the single file).
output_data: JSON array
The output_data is a list of JSON objects that define the details for how the job should handle its outputs. If not provided, output_data defaults to an empty list (no output data). For the job data to be valid, every output in the matching job interface must have a corresponding entry in this output_data field. The JSON object that represents each output data has the following fields:
name: JSON string
The name is a required string that gives the name of the input that the data is being provided for. It should match the name of an input in the job’s interface. The name of every input and output in the job data must be unique.The other fields that describe the output configuration are based upon the type of the output as it is defined in the job interface, see Job Interface Specification Version 1.4. The valid types from the job interface specification are:
file
A “file” output has the following additional field:
workspace_id: JSON number
The required workspace_id field contains the unique ID of the workspace in the Scale system that this output file should be stored in after it is produced.files
A “files” output has the following additional field:
workspace_id: JSON number
The required workspace_id field contains the unique ID of the workspace in the Scale system that these output files should be stored in after they are produced.