.. _architecture_jobs_job_data:

Job Data
========================================================================================================================

The job data is a JSON document that defines the actual data and configuration on which a specific job will run. It
describes all of the data being passed to the job's inputs, as well as the configuration for how to handle the job's
outputs. The job data is required when placing a specific job on the queue for the first time.

Consider our previous example algorithm, make_geotiff.py, from :ref:`architecture_jobs_interface`. The job data for
queuing and running a make_geotiff.py job could be defined as follows:

**Example job data:**

.. code-block:: javascript

   {
      "version": "1.0",
      "input_data": [
         {
            "name": "image",
            "file_id": 1234
         },
         {
            "name": "georeference_data",
            "file_id": 1235
         }
      ],
      "output_data": [
         {
            "name": "geo_image",
            "workspace_id": 12
         }
      ]
   }

The *input_data* value is a list detailing the data to pass to each input to the job. In this case, the input called
*image*, which takes a PNG image file, is being passed a file from the Scale system that has the unique ID 1234, and the
input called *georeference_data*, which takes a CSV file, is being passed a Scale file with the ID 1235. The
*output_data* value is a list detailing the configuration for handling the job's outputs, which in our example is a
single GeoTIFF file. The configuration in our example specifies that after the GeoTIFF file is produced by the job, it
should be stored in the workspace with the unique ID 12. To see all of the options for defining job data, please refer
to the Job Data Specification below.

.. _architecture_jobs_job_data_spec:

Job Data Specification Version 1.0
------------------------------------------------------------------------------------------------------------------------

Valid job data is a JSON document with the following structure:

.. code-block:: javascript

   {
      "version": STRING,
      "input_data": [
         {
            "name": STRING,
            "value": STRING
         },
         {
            "name": STRING,
            "file_id": INTEGER
         },
         {
            "name": STRING,
            "file_ids": [
               INTEGER,
               INTEGER
            ]
         }
      ],
      "output_data": [
         {
            "name": STRING,
            "workspace_id": INTEGER
         }
      ]
   }

**version**: JSON string

    The *version* is an optional string value that defines the version of the data specification used. This allows
    updates to be made to the specification while maintaining backwards compatibility by allowing Scale to recognize an
    older version and convert it to the current version. If *version* is not included, it defaults to the latest
    version, which is currently 1.0. It is recommended, though not required, that you include the *version* so that
    future changes to the specification will still accept your job data.

**input_data**: JSON array

    The *input_data* is a list of JSON objects that define the actual data the job receives for its inputs. If not
    provided, *input_data* defaults to an empty list (no input data). For the job data to be valid, every required input
    in the matching job interface must have a corresponding entry in this *input_data* field. The JSON object that
    represents each input data has the following fields:

    **name**: JSON string

        The *name* is a required string that gives the name of the input that the data is being provided for. It should
        match the name of an input in the job's interface. The name of every input and output in the job data must be
        unique.

    The other fields that describe the data being passed to the input depend on the *type* of the input as it is
    defined in the job interface (see :ref:`architecture_jobs_interface_spec`). The valid types from the job interface
    specification are:

    **property**

        A "property" input has the following additional field:

        **value**: JSON string

            The *value* field contains the string value that will be passed to the "property" input.

    **file**

        A "file" input has the following additional field:

        **file_id**: JSON number

            The required *file_id* field contains the unique ID of a file in the Scale system that will be passed to
            the input. The file must meet all of the criteria defined in the job interface for the input.

    **files**

        A "files" input has the following additional field:

        **file_ids**: JSON array

            The required *file_ids* field is a list of unique IDs of the files in the Scale system that will be passed
            to the input. Each file must meet all of the criteria defined in the job interface for the input. A "files"
            input also accepts a *file_id* field in place of the *file_ids* field, in which case the input is passed a
            list containing the single file.

**output_data**: JSON array

    The *output_data* is a list of JSON objects that define the details for how the job should handle its outputs. If
    not provided, *output_data* defaults to an empty list (no output data). For the job data to be valid, every output
    in the matching job interface must have a corresponding entry in this *output_data* field. The JSON object that
    represents each output data has the following fields:

    **name**: JSON string

        The *name* is a required string that gives the name of the output that the configuration is being provided for.
        It should match the name of an output in the job's interface. The name of every input and output in the job
        data must be unique.

    The other fields that describe the output configuration depend on the *type* of the output as it is defined in the
    job interface (see :ref:`architecture_jobs_interface_spec`). The valid types from the job interface specification
    are:

    **file**

        A "file" output has the following additional field:

        **workspace_id**: JSON number

            The required *workspace_id* field contains the unique ID of the workspace in the Scale system that this
            output file should be stored in after it is produced.

    **files**

        A "files" output has the following additional field:

        **workspace_id**: JSON number

            The required *workspace_id* field contains the unique ID of the workspace in the Scale system that these
            output files should be stored in after they are produced.
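
The validity rules in this specification can be sketched as a small standalone check. This is only an illustration and
not Scale's own validation code: the function name ``validate_job_data`` is hypothetical, and the job interface is
reduced to an assumed mapping of input name to ``(type, required)`` plus a list of output names rather than the real
interface document. It also assumes that a "property" input must carry a string *value*.

```python
def _is_id(value):
    # IDs are JSON integers; bool is excluded because it subclasses int.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_job_data(job_data, interface_inputs, interface_outputs):
    """Return a list of problems; an empty list means the job data looks valid.

    interface_inputs: dict of input name -> (type, required)  (assumed shape)
    interface_outputs: list of output names                   (assumed shape)
    """
    errors = []
    inputs = job_data.get("input_data", [])    # defaults to an empty list
    outputs = job_data.get("output_data", [])  # defaults to an empty list

    # Every input and output name in the job data must be unique.
    seen = set()
    for entry in inputs + outputs:
        name = entry.get("name")
        if not isinstance(name, str):
            errors.append("entry without a string name")
        elif name in seen:
            errors.append("duplicate name: " + name)
        else:
            seen.add(name)

    # Every required input needs an entry whose fields match the input type.
    given_inputs = {e.get("name"): e for e in inputs}
    for name, (input_type, required) in interface_inputs.items():
        entry = given_inputs.get(name)
        if entry is None:
            if required:
                errors.append("missing required input: " + name)
            continue
        if input_type == "property":
            ok = isinstance(entry.get("value"), str)
        elif input_type == "file":
            ok = _is_id(entry.get("file_id"))
        elif input_type == "files":
            # A single file_id is accepted and treated as a one-item list.
            ids = entry.get("file_ids")
            if ids is None and "file_id" in entry:
                ids = [entry["file_id"]]
            ok = isinstance(ids, list) and all(_is_id(i) for i in ids)
        else:
            ok = False
        if not ok:
            errors.append("bad data for input: " + name)

    # Every output in the interface needs an entry with a workspace_id.
    given_outputs = {e.get("name"): e for e in outputs}
    for name in interface_outputs:
        entry = given_outputs.get(name)
        if entry is None:
            errors.append("missing output: " + name)
        elif not _is_id(entry.get("workspace_id")):
            errors.append("bad workspace_id for output: " + name)
    return errors


# The example job data from the top of this page.
example = {
    "version": "1.0",
    "input_data": [
        {"name": "image", "file_id": 1234},
        {"name": "georeference_data", "file_id": 1235},
    ],
    "output_data": [{"name": "geo_image", "workspace_id": 12}],
}
# Assumed interface for make_geotiff.py: both inputs treated as required files.
make_geotiff_inputs = {
    "image": ("file", True),
    "georeference_data": ("file", True),
}
print(validate_job_data(example, make_geotiff_inputs, ["geo_image"]))  # []
```

Returning a list of problems instead of raising on the first one lets a caller report every issue with a job data
document at once. File criteria from the job interface, such as media types, are not checked here.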