1. Introduction
Seed is a general standard to aid in the discovery and consumption of a discrete unit of work contained within a Docker image. While initially developed to support the Scale data processing system with job discovery, it is designed to be readily applied to other systems as well.
Seed compliant images must be named in a specific fashion due to the lack of label search capability on Docker Hub and Registry services. The suffix -seed must be used when naming images to enable discovery, prior to Hub or Registry push. This requirement will be deprecated as label search support is standardized across Docker registry services. Use of the CLI developed by the Seed team is highly recommended to speed the development and packaging of jobs according to the Seed specification.
1.1. Format
The Docker image created must adhere to a specific naming convention. The standard requires specification of both a job version and a packaging version, so that changes to the job logic and changes to its packaging are tracked independently. The following image naming template maps to members defined under the Job object:
<name>-<jobVersion>-seed:<packageVersion>
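The naming template above can be sketched as a small helper. This is only an illustration: the function name `seed_image_name` is hypothetical and not part of the Seed specification or its CLI.

```python
def seed_image_name(name: str, job_version: str, package_version: str) -> str:
    """Build an image reference of the form <name>-<jobVersion>-seed:<packageVersion>."""
    return f"{name}-{job_version}-seed:{package_version}"


# Values taken from the complete example manifest in this section.
print(seed_image_name("my-job", "1.0.0", "1.0.0"))  # my-job-1.0.0-seed:1.0.0
```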
Dockerfile snippet containing required label for Seed compliance:
FROM alpine
ENTRYPOINT /app/job.sh
LABEL com.ngageoint.seed.manifest="{\"seedVersion\":\"1.0.0-snapshot\",\"job\":{ ... }}"
The com.ngageoint.seed.manifest label contents must be serialized as a string-escaped JSON object. The following is a complete example including required and optional keys:
{ "seedVersion": "1.0.0-snapshot", "job": { "name": "my-job", "jobVersion": "1.0.0", "packageVersion": "1.0.0", "title": "My first job", "description": "Reads an HDF5 file and outputs two png images, a CSV and manifest containing cell_count", "tags": [ "hdf5", "png", "csv", "image processing" ], "maintainer": { "name": "John Doe", "organization": "E-corp", "email": "jdoe@example.com", "url": "http://www.example.com", "phone": "666-555-4321" }, "timeout": 3600, "interface": { "command": "${INPUT_FILE} ${OUTPUT_DIR} ${VERSION}", "inputs": { "files": [ { "name": "INPUT_FILE", "required": true, "mediaTypes": [ "image/x-hdf5-image" ], "partial": true } ], "json": [ { "name": "INPUT_JSON", "type": "string", "required": true } ] }, "outputs": { "files": [ { "name": "output_file_pngs", "mediaType": "image/png", "multiple": true, "pattern": "outfile*.png" }, { "name": "output_file_csv", "mediaType": "text/csv", "pattern": "outfile*.csv", "required": false } ], "json": [ { "name": "cell_count", "key": "cellCount", "type": "integer" }, { "name": "dummy", "type": "integer", "required": false } ] }, "mounts": [ { "name": "MOUNT_PATH", "path": "/the/container/path", "mode": "ro" }, { "name": "WRITE_PATH", "path": "/write", "mode": "rw" } ], "settings": [ { "name": "VERSION", "secret": false }, { "name": "DB_HOST", "secret": false }, { "name": "DB_PASS", "secret": true } ] }, "resources": { "scalar": [ { "name": "cpus", "value": 1 }, { "name": "mem", "value": 1024 }, { "name": "sharedMem", "value": 1024 }, { "name": "disk", "value": 1000, "inputMultiplier": 4 } ] }, "errors": [ { "code": 1, "name": "error-name-one", "title": "Error Name", "description": "Error Description", "category": "data" }, { "code": 2, "name": "error-name-two", "title": "Error Name", "description": "Error Description", "category": "job" } ] } }
1.2. Definitions
- Seed specific terms defined in the Glossary supersede all following definitions. These terms are marked with a dagger (†) throughout the specification.
- GeoJSON, and the terms Geometry and Polygon, are defined in RFC 7946 GeoJSON.
- Internet Assigned Numbers Authority (IANA), and the terms Media Types and MIME Types, are defined in IETF RFC 6838.
- ISO 8601 and the specifics of the date format are defined in IETF RFC 3339.
- JavaScript Object Notation (JSON), and the terms object, name, value, array, integer, and number, are defined in JSON Schema.
- Semantic Versioning (SemVer), and the terms major, minor, and patch version, are defined at http://semver.org/spec/v2.0.0.html
- The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in IETF RFC 2119.
2. Standard
The Seed standard is intended to provide a complete definition of the runtime processing, memory and storage requirements of a discrete unit of work, in addition to the inputs, outputs and potential errors produced. Completeness is fundamental but the standard accommodates both simple and complex jobs by defining a minimal subset of REQUIRED properties. The following sections detail every possible REQUIRED and OPTIONAL manifest property in both root and child objects.
A complete Seed object contained within a com.ngageoint.seed.manifest label is always a string-escaped serialized object. In Seed, an object consists of a collection of name/value pairs, also called members. For each member, the name is always a string. Member values are either a string, number, object, array or one of the literals: true, false, and null. An array consists of elements where each element is a value as described above.
2.1. Seed Object
The Seed object is the root JSON object that MUST be placed within a com.ngageoint.seed.manifest Docker image label. At a minimum this object MUST define the seedVersion and job names.
Name | Requirement | Value |
---|---|---|
seedVersion | Required | MUST be a string that conforms to the SemVer standard. |
job | Required | MUST be a Job object. |
2.1.1. Job Object
The Job object is the core member for describing a single unit of work and the resources it requires.
Name | Requirement | Value |
---|---|---|
name | Required | MUST be a string of only alphanumeric or dash characters (defined by the regex† ^[a-zA-Z0-9-]+$ in the Seed Manifest schema). |
jobVersion | Required | MUST be a string that conforms to the SemVer standard. |
packageVersion | Required | MUST be a string that conforms to the SemVer standard. |
title | Required | MUST be a string and SHOULD contain a short descriptive title of the job. |
description | Required | MUST be a string and SHOULD contain a full job abstract. |
tags | Optional | MUST be an array of strings and MAY contain any number of elements. |
timeout | Required | MUST be an integer indicating a timeout period measured in seconds. Consuming systems MUST honor this value as a hard limit on job execution time. |
maintainer | Required | MUST be an object as defined in Maintainer Object. |
resources | Recommended | MUST be an object as defined in Resources Object. It is highly advised that this member be specified; without it, the resources provided will be the defaults of the implementing framework. |
interface | Optional | MUST be an object as defined in Interface Object. |
errors | Optional | MUST be an array containing elements defined in Errors Object. |
For a quick reference to the use of the Job object, see the job member of the complete example manifest in section 1.1.
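As a rough illustration of the Job object requirements, the following sketch checks a job dictionary for required members. The list of required members is inferred from the minimal manifest in section 3.2.1, the name pattern is equivalent to the one in the Seed Manifest schema in section 6.1, and the function name `check_job` is hypothetical. It is not a substitute for validating against the full JSON Schema.

```python
import re

# Assumption: these are the members present in the minimal example manifest.
REQUIRED_JOB_MEMBERS = (
    "name", "jobVersion", "packageVersion", "title",
    "description", "maintainer", "timeout",
)


def check_job(job: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    problems = [
        f"missing required member: {member}"
        for member in REQUIRED_JOB_MEMBERS
        if member not in job
    ]
    name = job.get("name")
    # Equivalent to the schema pattern ^[a-zA-Z0-9-]+$ for the job name.
    if isinstance(name, str) and not re.fullmatch(r"[a-zA-Z0-9-]+", name):
        problems.append("name must contain only alphanumeric or dash characters")
    timeout = job.get("timeout")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if "timeout" in job and (isinstance(timeout, bool) or not isinstance(timeout, int)):
        problems.append("timeout must be an integer number of seconds")
    return problems


minimal_job = {
    "name": "random-number-gen",
    "jobVersion": "0.1.0",
    "packageVersion": "0.1.0",
    "title": "Random Number Generator",
    "description": "Generates a random number and outputs on stdout",
    "maintainer": {"name": "John Doe", "email": "jdoe@example.com"},
    "timeout": 10,
}
print(check_job(minimal_job))  # []
```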
2.1.1.1. Maintainer Object
The Maintainer object is the member that identifies the individual and organization (optional) acting as a point of contact for a Seed job.
Name | Requirement | Value |
---|---|---|
name | Required | MUST be a string and SHOULD contain the full name of the maintaining individual. |
email | Required | MUST be a string and SHOULD contain the best contact email for the maintaining individual or organization. |
phone | Optional | MUST be a string and SHOULD contain the best contact phone number for the maintaining individual or organization. |
organization | Optional | MUST be a string and SHOULD contain the organization responsible for maintaining or sponsoring the Seed job. |
url | Optional | MUST be a string and SHOULD contain a publicly accessible URL to complete job design or usage documentation. |
For a quick reference to the use of the Maintainer object, see the maintainer member of the complete example manifest in section 1.1.
2.1.1.2. Resources Object
The Resources object is the member that identifies all resource requirements for a job. This is most commonly CPU, memory and disk scalar resources, but MAY in the future accommodate more complex types such as ranges and sets. The final computed resources allocated for all scalar elements MUST be injected as environment variables to the job at run time. See Environment Variables and Resource Defaults for clarification on what the implementing framework MUST provide.
Name | Requirement | Value |
---|---|---|
scalar | Required | MUST be an array of Scalar objects. |
The Scalar objects MAY include any arbitrary custom resource name, but there are reserved resources cpus, disk, mem and sharedMem that have special meaning. The reserved resource names cpus, disk and mem SHOULD be populated by all Seed compliant images, as the defaults provided at runtime will likely be inadequate to run all but the most minimal job. The sharedMem resource applies primarily to high-performance and scientific applications and will rarely be needed.
Name | Requirement | Value |
---|---|---|
name | Required | MUST be a string of only alphanumeric, dash or underscore characters (defined by the regex† ^[a-zA-Z0-9_-]+$ in the Seed Manifest schema). |
value | Required | MUST be a number indicating the quantity of the resource required by the job. When dealing with storage resources such as … |
inputMultiplier | Optional | MUST be a number indicating the factor by which input file size is multiplied and added to the constant value for the resource. |
Use of inputMultiplier for mem or disk resource types is useful when the memory or output disk requirements of a job are a function of input file size. The following basic formula computes the resource requirement when an inputMultiplier is defined.
resourceRequirement = inputVolume * inputMultiplier + constantValue
For example, when the total input file size is 2.0MiB and an inputMultiplier of 4.0 and a value of 0.1 are specified for disk, the following computes the resource requirement:
diskRequirement = 2.0MiB * 4.0 + 0.1MiB = 8.1MiB
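The formula above can be expressed directly in code. The following is a minimal sketch; the function and parameter names are illustrative only, and treating a missing inputMultiplier as 0 (so that only the constant value applies) is an assumption.

```python
def resource_requirement(input_volume: float, constant_value: float,
                         input_multiplier: float = 0.0) -> float:
    """resourceRequirement = inputVolume * inputMultiplier + constantValue."""
    return input_volume * input_multiplier + constant_value


# The disk example from the text: 2.0 MiB of input, inputMultiplier 4.0, value 0.1.
disk_mib = resource_requirement(2.0, 0.1, 4.0)
print(f"{disk_mib:.1f} MiB")  # 8.1 MiB
```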
For a quick reference to the use of Scalar objects, see the resources.scalar member of the complete example manifest in section 1.1.
2.1.1.3. Interface Object
The Interface object is the primary member that describes the command arguments, environment variables, mounts, settings, inputs and outputs defined for a job.
Name | Requirement | Value |
---|---|---|
command | Optional | MUST be a string specifying the complete string passed to the container at run time. Depending on the Linux shell, shell escaping of special characters MAY be required. If a Docker ENTRYPOINT is defined that launches the executable, omission of the executable MAY be necessary in command. |
inputs | Optional | MUST be an object as defined in Inputs Object. |
outputs | Optional | MUST be an object as defined in Outputs Object. |
mounts | Optional | MUST be an array of Mounts objects. |
settings | Optional | MUST be an array of Settings objects. |
2.1.1.3.1. Inputs Object
The Inputs object is the member responsible for indicating immutable input data available to the Seed image at runtime.
Name | Requirement | Value |
---|---|---|
files | Optional | MUST be an array of objects defined in the Files Object sub-section. |
json | Optional | MUST be an array of objects defined in the JSON Object sub-section. |
Critical implementation details related to the multiple member are described under Environment Variables. The following table defines the files object members.
Name | Requirement | Value |
---|---|---|
name | Required | MUST be a string of only alphanumeric, dash or underscore characters (defined by the regex† ^[a-zA-Z0-9_-]+$ in the Seed Manifest schema). |
mediaTypes | Optional | MUST be an array of strings that MUST indicate the IANA Media types that the job accepts. An executor† MAY use any provided media types to report validation warnings to the user in the case of mismatched types. |
multiple | Optional | MUST be a boolean indicating whether multiple physical files are processed by this … |
partial | Optional | MUST be a boolean indicating whether the input file is required in whole or in part. This allows an executor† to make intelligent choices when providing very large files to a job. This should only be set to true if the job is expected to use less than half of very large input files. If omitted, the default value MUST be treated as false. |
required | Optional | MUST be a boolean indicating whether this input value SHOULD always be expected. If omitted, the default value MUST be treated as true. |
The following table defines the json object members.
Name | Requirement | Value |
---|---|---|
name | Required | MUST be a string of only alphanumeric, dash or underscore characters (defined by the corresponding regex† in the Seed Manifest schema). |
type | Required | MUST be a string and indicate a valid JSON schema type. |
required | Optional | MUST be a boolean indicating whether this input value SHOULD always be expected. If omitted, the default value MUST be treated as true. |
For a quick reference to the use of the Inputs object, see the interface.inputs member of the complete example manifest in section 1.1.
2.1.1.3.2. Outputs Object
The Outputs object is the member responsible for indicating all output data and the means to capture that data following the execution of a Seed image. Data can be captured in two different forms: directly as a file, or as JSON extracted from a manifest. File type output is simply matched based on a standard glob pattern. Recursively scanning directories is NOT supported, but a known subdirectory structure will work (e.g. base/sub-*/*.ext). JSON objects are expected to be gathered from a JSON manifest that by Seed standard convention MUST be written at the root of the job output directory as seed.outputs.json. The absolute path to the job output directory is REQUIRED to be passed into the container at job execution time in the OUTPUT_DIR environment variable. Special attention should be given to output file permissions, and support is provided for defining extended metadata.
Name | Requirement | Value |
---|---|---|
files | Optional | MUST be an array of objects defined in the Files Object sub-section. |
json | Optional | MUST be an array of objects defined in the JSON Object sub-section. |
The following table defines the files object members.
Name | Requirement | Value |
---|---|---|
name | Required | MUST be a string of only alphanumeric, dash or underscore characters (defined by the corresponding regex† in the Seed Manifest schema). |
mediaType | Optional | MUST indicate the IANA Media type for the file being captured by Outputs. |
pattern | Required | MUST indicate a standard glob pattern for the capture of files. |
multiple | Optional | MUST be a boolean indicating whether multiple output files may be captured by this … |
required | Optional | MUST be a boolean indicating whether this output value SHOULD always be expected. If omitted, the default value is … |
The following table defines the json object members.
Name | Requirement | Value |
---|---|---|
name | Required | MUST be a string of only alphanumeric, dash or underscore characters (defined by the corresponding regex† in the Seed Manifest schema). |
type | Required | MUST be a string indicating the JSON schema type of the member being captured from the seed.outputs.json file. |
key | Optional | MUST be a string indicating the case-sensitive … |
required | Optional | MUST be a boolean indicating whether this output value SHOULD always be expected. If omitted, the default value MUST be treated as true. |
For a quick reference to the use of the Outputs object, see the interface.outputs member of the complete example manifest in section 1.1.
2.1.1.3.3. Mounts Object
The Mounts object is the member responsible for indicating any additional directories that must be mounted into the container for the Job to run. A mount directory is typically a shared file system directory that contains some set of reference data that the Job requires.
Name | Requirement | Value |
---|---|---|
name | Required | MUST be a string of only alphanumeric, dash or underscore characters (defined by the corresponding regex† in the Seed Manifest schema). |
path | Required | MUST be an absolute file system path specifying where in the container the Job expects the shared directory to be mounted. |
mode | Optional | MUST be a string that either specifies "ro" for read-only access to the directory or "rw" for read-write access. Default value is "ro". |
For a quick reference to the use of the Mounts object, see the interface.mounts member of the complete example manifest in section 1.1.
2.1.1.3.4. Settings Object
The Settings object is the member responsible for indicating all content not related to data that is needed for the Seed job to run. These will be exposed as environment variables at run time. Most commonly, Settings will be used for environment specific configuration or external credentials.
While it is highly advised that Seed jobs SHOULD limit input / output to the provided constructs (inputs / outputs), there are justified use cases for violating this encapsulation. If database ingestion or downstream messaging are necessary, this is a reasonable mechanism to accomplish that.
Name | Requirement | Value |
---|---|---|
name | Required | MUST be a string of only alphanumeric, dash or underscore characters (defined by the corresponding regex† in the Seed Manifest schema). |
secret | Optional | MUST be a boolean that indicates whether the value associated with the named setting is secret and stored as a secure string. |
For a quick reference to the use of the Settings object, see the interface.settings member of the complete example manifest in section 1.1.
2.1.1.4. Errors Object
The Errors object allows for developers† to map arbitrary exit codes to meaningful textual descriptions. This is useful in passing information to the executor† to differentiate between data and job errors.
Name | Requirement | Value |
---|---|---|
code | Required | MUST be an integer indicating the exit code of the executing job process. |
name | Required | MUST be a string of only alphanumeric, dash or underscore characters (defined by the corresponding regex† in the Seed Manifest schema). |
title | Optional | MUST be a string indicating the short descriptive title of the error. |
description | Optional | MUST be a string indicating the complete error description and possible causes. |
category | Optional | MUST be a string containing one of the following values: … |
For a quick reference to the use of the Errors object, see the errors member of the complete example manifest in section 1.1.
3. Usage
3.1. Implementing
A few requirements must be satisfied when an implementer† is building a system capable of executing Seed standardized images. The following sections detail behavior that is expected of the executor†, but these details are also important for developers† to understand what execution context they are provided.
3.1.1. Environment Variables
Environment variable injection MUST be performed as it is the primary means of providing the context required by the defined interface of a Seed job. These environment variables MAY be consumed by a job directly or shell variable expansion MAY be leveraged in the interface.command member. Implementing frameworks MAY perform variable expansion, but it MUST follow Bash expansion conventions.
3.1.1.1. Normalization
All environment variables injected MUST be normalized to remove any special characters. The majority of environment variable injection is dynamic and tied to name member values. These values MUST be sanitized based on the following rules:
- Only upper-case alphabetic, numeric and underscore characters are permitted unaltered.
- All lower-case alphabetic characters MUST be capitalized.
- All dash characters MUST be replaced with underscores.
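The normalization rules above can be sketched as follows. The function name `normalize_name` is hypothetical, and dropping any character that remains outside the permitted set is an assumption based on the requirement to remove special characters.

```python
import re


def normalize_name(name: str) -> str:
    """Normalize a manifest name member into an environment variable name."""
    # Capitalize lower-case characters and replace dashes with underscores.
    candidate = name.upper().replace("-", "_")
    # Assumption: any character still outside A-Z, 0-9 and underscore is removed.
    return re.sub(r"[^A-Z0-9_]", "", candidate)


print(normalize_name("my-demo-resourceNew"))  # MY_DEMO_RESOURCENEW
```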
3.1.1.2. Injection
Collisions between the reserved variables below and interface.inputs.files, interface.inputs.json and interface.settings name member values MUST NOT occur.
The following variables MUST be provided:
- OUTPUT_DIR: Absolute path where all output products MUST be placed by the job for Seed executor† capture.
- All name member values of resources.scalar elements MUST map to environment variables. This SHALL be done by normalizing all name values and prefixing with ALLOCATED_. For a my-demo-resourceNew with a value of 5.0, an environment variable ALLOCATED_MY_DEMO_RESOURCENEW SHALL be set to 5.0 at run time. For a resource with an inputMultiplier value, the environment variable must include the final computed output resource allocation as defined in the scalar element under Resources object. Note that the allocated value computed using an inputMultiplier value MUST NOT include the size of input files, only the additional space needed for output files during execution.
- All name member values of interface.inputs.files elements MUST map to environment variables. Variable names must follow normalization rules. Executors† MUST ensure data is mounted and provide a container resolvable absolute path. Injection behavior is dependent on the boolean value of the multiple member. When multiple is true, the injected environment variable MUST be an absolute path to a directory with all files immediately beneath it. When multiple is false, it MUST be an absolute path to the single file provided. This environment variable MUST be left unset if the required member is false and the input is not present.
- All name member values of interface.inputs.json elements MUST map to environment variables. Variable names must follow normalization rules, and array, object and string JSON types MUST be injected string encoded. This environment variable MUST be left unset if required is false and the input is not present.
- All name member values of interface.settings elements MUST map to environment variables. Variable names must follow normalization rules.
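To make the injection rules above concrete, here is a minimal sketch of how an executor† might assemble the environment for one job run. The function `build_environment` and its parameters are hypothetical and not part of any Seed tooling. Passing string values through unchanged and JSON-encoding every other value is one possible reading of "string encoded", and is an assumption.

```python
import json
import re


def normalize(name: str) -> str:
    """Upper-case, turn dashes into underscores, drop any other character."""
    return re.sub(r"[^A-Z0-9_]", "", name.upper().replace("-", "_"))


def build_environment(job, output_dir, allocated, file_paths, json_values, settings):
    """Return the environment variables for a job as a dict of strings."""
    env = {"OUTPUT_DIR": output_dir}

    def inject(name, value):
        key = normalize(name)
        if key in env:  # collisions MUST NOT occur
            raise ValueError(f"environment variable collision: {key}")
        env[key] = value

    for resource in job.get("resources", {}).get("scalar", []):
        # 'allocated' maps each resource name to its final computed value.
        inject("ALLOCATED_" + resource["name"], str(allocated[resource["name"]]))

    interface = job.get("interface", {})
    inputs = interface.get("inputs", {})
    for spec in inputs.get("files", []):
        if spec["name"] in file_paths:
            inject(spec["name"], file_paths[spec["name"]])
        elif spec.get("required", True):
            raise ValueError(f"missing required input file: {spec['name']}")
        # Optional inputs that are not present are left unset.
    for spec in inputs.get("json", []):
        if spec["name"] in json_values:
            value = json_values[spec["name"]]
            inject(spec["name"], value if isinstance(value, str) else json.dumps(value))
        elif spec.get("required", True):
            raise ValueError(f"missing required JSON input: {spec['name']}")
    for spec in interface.get("settings", []):
        inject(spec["name"], settings[spec["name"]])
    return env


job = {
    "resources": {"scalar": [{"name": "my-demo-resourceNew", "value": 5.0}]},
    "interface": {
        "inputs": {"files": [{"name": "INPUT_FILE"},
                             {"name": "optional-file", "required": False}]},
        "settings": [{"name": "DB_HOST"}],
    },
}
env = build_environment(job, "/out", {"my-demo-resourceNew": 5.0},
                        {"INPUT_FILE": "/data/in.h5"}, {}, {"DB_HOST": "db"})
print(env["ALLOCATED_MY_DEMO_RESOURCENEW"])  # 5.0
```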
Complex variable expansion, where jobs require parameters associated with a switch, can be accomplished with Bash. Taking the example of an optional inputs.files element, the following command value will expand with a preceding switch only when the input is available:
${MY_INPUT/#/-d }
This will expand to -d ${MY_INPUT} only when MY_INPUT is set; otherwise the entire expression will be omitted. It is important to note that this is specifically a Bash shell feature. It is not part of the POSIX shell specification, so it may not be available in other Linux shells.
3.1.2. Output Data Capture
All output data generated by a Seed compliant job MUST be placed within the location identified by the OUTPUT_DIR environment variable. This location MUST be made available by the executor† so that the job is given full write access. Developers† MUST ensure any files captured by interface.outputs.files member element pattern values, as well as the optional seed.outputs.json file, are marked with read permissions at minimum. This MAY be done with the following sample command: chmod +r yourfile.txt
Special attention should also be given to the number of files matched per interface.outputs.files element to ensure it is consistent in plurality with that defined by the multiple member. If multiple files are matched using the pattern member value while the multiple member is set to false, the job SHOULD be failed by the executor†. On the other hand, when multiple is true, the absence of multiple files SHOULD NOT cause the executor† to fail the job.
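The plurality check described above might look like the following on the executor† side. This is a sketch under stated assumptions: the function name `capture_files` is hypothetical, and treating an omitted multiple member as false is an assumption. Python's glob.glob does not recurse into subdirectories unless asked to, which matches the non-recursive matching the standard describes.

```python
import glob
import os


def capture_files(output_dir: str, spec: dict) -> list:
    """Match one interface.outputs.files element against the output directory."""
    matches = sorted(glob.glob(os.path.join(output_dir, spec["pattern"])))
    # Assumption: an omitted 'multiple' member is treated as false.
    if len(matches) > 1 and not spec.get("multiple", False):
        raise RuntimeError(
            f"{spec['name']}: {len(matches)} files matched but multiple is false"
        )
    # Captured files need read permission at minimum.
    unreadable = [path for path in matches if not os.access(path, os.R_OK)]
    if unreadable:
        raise PermissionError(f"{spec['name']}: unreadable output files {unreadable}")
    # A single match (or none) with multiple set to true is not a failure here.
    return matches
```

A call such as `capture_files(os.environ["OUTPUT_DIR"], {"name": "output_file_pngs", "pattern": "outfile*.png", "multiple": True})` would return the sorted list of matching paths.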
3.1.3. Extended File Metadata
There is often a need by the executor† to capture additional job extracted metadata on output files. The Seed standard allows for this through the use of side-car files. The side-car files must be named exactly as the file they describe, with the addition of the .metadata.json extension to the original file name (extension included). The file must be formatted according to the Seed Metadata Schema. This allows for both spatial and any custom associated metadata to be specified. The properties member is of the object type, which allows for any JSON member type that satisfies your specific use-cases.
The following snippet is a notional example specifying a geometry and temporal bounded feature associated with a file:
{
"type": "Feature", (1)
"geometry": { (2)
"type": "Polygon",
"coordinates": [
[ [ 100.0, 0.0 ], [ 101.0, 0.0 ], [ 101.0, 1.0 ], [ 100.0, 1.0 ], [ 100.0, 0.0 ] ]
]
},
"properties": { (3)
"time": {
"start": "2016-08-06T00:00:00.000Z",
"end": "2016-08-06T00:00:00.000Z"
}
}
}
1 | Required string indicating the GeoJSON type being defined. |
2 | Required geometry member defining spatial extent of file. |
3 | Required properties member containing an example definition of the temporal extent of the file. |
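The side-car naming rule can be sketched as follows. The helper `write_sidecar` is hypothetical, and the metadata shown is only the notional example from above; it has not been validated against the Seed Metadata Schema.

```python
import json


def write_sidecar(output_file: str, metadata: dict) -> str:
    """Write metadata next to output_file and return the side-car path."""
    # The original file name is kept in full, extension included.
    sidecar = output_file + ".metadata.json"
    with open(sidecar, "w", encoding="utf-8") as handle:
        json.dump(metadata, handle, indent=2)
    return sidecar


example_metadata = {
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        "coordinates": [
            [[100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0]]
        ],
    },
    "properties": {
        "time": {
            "start": "2016-08-06T00:00:00.000Z",
            "end": "2016-08-06T00:00:00.000Z",
        }
    },
}
```

For an output file named outfile1.png, the side-car written by this helper is outfile1.png.metadata.json in the same directory.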
3.1.4. Resource Defaults
The executor† MUST provide at least the resources indicated by a Seed manifest at run time. If the resource requirement specified by the manifest is below the minimum amount allowed by the executor†, it MAY increase the requirement to that value. Developers† SHOULD specify the cpus, mem and disk requirements of their Seed job, but if these are not set, the executor† is free to allocate minimal defaults.
For the sharedMem reserved resource, the executor† SHOULD make use of the Docker run argument --shm-size to provide the requested resource.
When resources are indicated by a Seed manifest that are not recognized by an executor†, the job SHOULD NOT be run.
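One way an executor† could apply these rules is sketched below. The function `resolve_resources`, the minimum values and the set of recognized resource names are all hypothetical; each implementing framework defines its own.

```python
def resolve_resources(requested: dict, minimums: dict, recognized: set) -> dict:
    """Combine manifest resource requests with executor minimums."""
    unknown = sorted(set(requested) - recognized)
    if unknown:
        # Jobs requesting unrecognized resources SHOULD NOT be run.
        raise ValueError(f"unrecognized resources: {unknown}")
    # Start from the executor defaults for anything the manifest leaves unset.
    resolved = dict(minimums)
    for name, value in requested.items():
        # Never provide less than requested; raise to the minimum when below it.
        resolved[name] = max(value, minimums.get(name, value))
    return resolved


resolved = resolve_resources(
    {"cpus": 0.1, "mem": 64},
    minimums={"cpus": 0.25, "mem": 32, "disk": 64},
    recognized={"cpus", "mem", "disk", "sharedMem"},
)
print(resolved)  # {'cpus': 0.25, 'mem': 64, 'disk': 64}
```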
3.2. Examples
The Seed standard is intended to support both simple and complex job packaging. To that end the standard allows for sensible defaults to take the place of a fully specified manifest. The following examples identify both a minimal Seed use and a more realistic, fully exercised standard.
3.2.1. Random Number Generator Job
Minimal manifest demonstrating the simplest possible Seed definition.
{ "seedVersion": "1.0.0-snapshot", "job": { "name": "random-number-gen", "jobVersion": "0.1.0", "packageVersion": "0.1.0", "title": "Random Number Generator", "description": "Generates a random number and outputs on stdout", "maintainer": { "name": "John Doe", "email": "jdoe@example.com" }, "timeout": 10 } }
Serialized as a label in a Dockerfile snippet:
FROM alpine
ENTRYPOINT /app/job.sh
LABEL com.ngageoint.seed.manifest="{\"seedVersion\":\"1.0.0-snapshot\",\"job\":{\"name\":\"random-number-gen\",\"jobVersion\":\"0.1.0\",\"packageVersion\":\"0.1.0\",\"title\":\"Random Number Generator\",\"description\":\"Generates a random number and outputs on stdout\",\"maintainer\":{\"name\":\"John Doe\",\"email\":\"jdoe@example.com\"},\"timeout\":10}}"
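The string-escaped label shown above can be produced from a manifest programmatically. The following is a minimal sketch and not the Seed CLI's implementation; the function name `manifest_label` is hypothetical. Escaping the dollar sign mirrors the escaped ${...} references in the Image Watermark example in section 3.2.2.

```python
import json


def manifest_label(manifest: dict) -> str:
    """Return a Dockerfile LABEL line holding the string-escaped manifest."""
    # Compact separators keep the serialized manifest on one short line.
    compact = json.dumps(manifest, separators=(",", ":"))
    # Escape backslashes first, then double quotes, then dollar signs so that
    # ${...} references in interface.command survive Dockerfile parsing.
    escaped = compact.replace("\\", "\\\\").replace('"', '\\"').replace("$", "\\$")
    return f'LABEL com.ngageoint.seed.manifest="{escaped}"'


manifest = {
    "seedVersion": "1.0.0-snapshot",
    "job": {
        "name": "random-number-gen",
        "jobVersion": "0.1.0",
        "packageVersion": "0.1.0",
        "title": "Random Number Generator",
        "description": "Generates a random number and outputs on stdout",
        "maintainer": {"name": "John Doe", "email": "jdoe@example.com"},
        "timeout": 10,
    },
}
print(manifest_label(manifest))
```

Validating the manifest against the Seed Manifest schema before serializing it is advisable, since errors are much harder to spot once the JSON is escaped.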
3.2.2. Image Watermark Job
Image watermark job that takes a single image and returns it with a watermark applied, described using a Seed definition.
{ "seedVersion": "1.0.0-snapshot", "job": { "name": "image-watermark", "jobVersion": "0.1.0", "packageVersion": "0.1.0", "title": "Image Watermarker", "description": "Processes an input PNG and outputs watermarked PNG.", "maintainer": { "name": "John Doe", "email": "jdoe@example.com" }, "timeout": 30, "interface": { "command": "${INPUT_IMAGE} ${OUTPUT_DIR}", "inputs": { "files": [ { "name": "INPUT_IMAGE" } ] }, "outputs": { "files": [ { "name": "OUTPUT_IMAGE", "pattern": "*_watermark.png" } ] } }, "resources": { "scalar": [ { "name": "cpus", "value": 1 }, { "name": "mem", "value": 64 } ] }, "errors": [ { "code": 1, "name": "image-Corrupt-1", "description": "Image input is not recognized as a valid PNG.", "category": "data" }, { "code": 2, "name": "algorithm-failure" } ] } }
Serialized as a label in a Dockerfile snippet:
FROM alpine
ENTRYPOINT /app/watermark.py
LABEL com.ngageoint.seed.manifest="{\"seedVersion\":\"1.0.0-snapshot\",\"job\":{\"name\":\"image-watermark\",\"jobVersion\":\"0.1.0\",\"packageVersion\":\"0.1.0\",\"title\":\"Image Watermarker\",\"description\":\"Processes an input PNG and outputs watermarked PNG.\",\"maintainer\":{\"name\":\"John Doe\",\"email\":\"jdoe@example.com\"},\"timeout\":30,\"interface\":{\"command\":\"\${INPUT_IMAGE} \${OUTPUT_DIR}\",\"inputs\":{\"files\":[{\"name\":\"INPUT_IMAGE\"}]},\"outputs\":{\"files\":[{\"name\":\"OUTPUT_IMAGE\",\"pattern\":\"*_watermark.png\"}]}},\"resources\":{\"scalar\":[{\"name\":\"cpus\",\"value\":1},{\"name\":\"mem\",\"value\":64}]},\"errors\":[{\"code\":1,\"name\":\"image-Corrupt-1\",\"description\":\"Image input is not recognized as a valid PNG.\",\"category\":\"data\"},{\"code\":2,\"name\":\"algorithm-failure\"}]}}"
4. Discovery
A primary intention of this standard is for simple job discovery from public images hosted within either Docker Hub, Docker Trusted Registry or Docker Registry. There is significant fragmentation of APIs between the various Docker offerings and the following sections describe the steps that may be taken to access the labels defined by Seed.
None of the Docker registry services support label search in any fashion. This incurs the requirement of applying a secondary means to subset image results. The standard presently requires that all job images are named with the suffix -seed. This allows for quick filtering of results to a manageable set for discovery.
4.1. Docker Hub
Docker Hub stores Docker image manifest information in a readily accessible format only for Automated Builds. This enforces the need for all developers† wishing to support simple discovery from Docker Hub to support Hub builds, as opposed to local image builds followed by a docker push. Given this caveat, a service such as ImageLayers can be used to quickly identify manifest content after discovering available images.
The following two steps may be taken to find and identify labels within Docker Hub:
- Perform an HTTP GET to find Docker images.
- Perform an HTTP POST to get label details for the images found in the previous request, for example with the following body:
  BODY: {"repos":[{"name":"myorg/myjob-seed","tag":"latest"}]}
The ImageLayers service is a 3rd-party service by CenturyLink Labs, but its source code is openly available and can be used as a reference implementation.
4.2. Docker Registry
Docker Registry does not natively support any type of search, but does provide a catalog API that allows for listing the entire registry contents. Using this along with tag and manifest inspection will allow label inspection.
The following steps may be taken to find and identify labels of Seed compliant images within Docker Registry. All references to {registry}, {image-id} and {tag} in the endpoint URLs should be replaced with your environment specific values.
- Perform an HTTP GET against the catalog endpoint to find -seed suffixed images.
- Perform an HTTP GET against the tags endpoint for each image matched.
- Perform an HTTP GET against the manifests endpoint to retrieve labels per tag (extract labels from the history JSON member).
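The three requests above can be sketched as URL builders plus a suffix filter. The endpoint paths used here (/v2/_catalog, /v2/{image-id}/tags/list and /v2/{image-id}/manifests/{tag}) are those of the Docker Registry HTTP API V2; that they are the endpoints this section refers to, and the use of the https scheme, are assumptions. The function names are hypothetical, and no network request is made.

```python
def catalog_url(registry: str) -> str:
    """URL listing every repository held by the registry."""
    return f"https://{registry}/v2/_catalog"


def tags_url(registry: str, image_id: str) -> str:
    """URL listing the tags of one repository."""
    return f"https://{registry}/v2/{image_id}/tags/list"


def manifest_url(registry: str, image_id: str, tag: str) -> str:
    """URL of the image manifest for one tag."""
    return f"https://{registry}/v2/{image_id}/manifests/{tag}"


def seed_repositories(catalog: dict) -> list:
    """Keep only repositories named with the -seed suffix."""
    return [name for name in catalog.get("repositories", []) if name.endswith("-seed")]


# A catalog response has the shape {"repositories": [...]}.
catalog = {"repositories": ["myorg/myjob-1.0.0-seed", "myorg/unrelated"]}
for repository in seed_repositories(catalog):
    print(tags_url("registry.example.com", repository))
```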
4.3. Docker Trusted Registry
A ticket is open with the Docker Trusted Registry team to natively support label search. Presently there is no API support to inspect hosted images for label metadata. Images must be pulled locally for inspection.
4.4. Silo
The Seed team has developed a tool to overcome these limitations in Docker registries called Silo.
Silo offers a REST API, implemented by the Seed team, for discovering Seed images. The API allows users to scan one or more repositories for Seed images and then search the resulting images and their manifests by keywords. An executor† can leverage these endpoints to assist users in discovering Seed images and creating jobs out of them.
5. Glossary
The following terms are specific to Seed and are provided to clarify their meaning. They are marked with a dagger (†) throughout the document.
Term | Description |
---|---|
developer | creator and packager of Seed compliant image |
executor | process responsible for injecting run-time context (inputs, mounts and environment settings) and capturing all resulting output (files, json) |
implementer | developer of framework for discovering or running Seed compliant images - must provide executor |
regex | shorthand for "Regular Expression," a text string that defines a pattern to be applied to other strings |
glob | a string defining a pattern to be applied to files on *nix operating systems |
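To illustrate the difference between the last two terms, the following sketch applies a glob and an equivalent regex to the same list of file names. The glob `outfile*.png` is taken from the example manifest in the introduction; the file names and the regex are made up for illustration.

```python
import fnmatch
import re

names = ["outfile1.png", "outfile2.png", "outfile.csv", "notes.txt"]

# Glob: * stands for any run of characters, everything else is literal.
glob_matches = [n for n in names if fnmatch.fnmatchcase(n, "outfile*.png")]

# Regex: .* plays the role of the glob's *, and the dot must be escaped.
regex_matches = [n for n in names if re.fullmatch(r"outfile.*\.png", n)]

print(glob_matches)
print(regex_matches)
```

Both lists contain the same two PNG file names; the glob is shorter to write, while the regex can express patterns a glob cannot.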
6. Schema
6.1. Seed Manifest
The following JSON Schema should be used to validate Seed manifests before they are serialized into a Dockerfile label for publishing. It may be downloaded here: Seed Manifest Schema
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "additionalProperties": false, "properties": { "seedVersion": { "type": "string", "pattern": "^1\\.0\\.0-snapshot$" }, "job": { "type": "object", "additionalProperties": false, "properties": { "name": { "type": "string", "pattern": "^[a-zA-Z0-9-]+$" }, "jobVersion": { "type": "string", "pattern": "^(0|[1-9][0-9]*)\\.(0|[1-9][0-9]*)\\.(0|[1-9][0-9]*)(-(0|[1-9][0-9]*|[0-9]*[a-zA-Z-][0-9a-zA-Z-]*)(\\.(0|[1-9][0-9]*|[0-9]*[a-zA-Z-][0-9a-zA-Z-]*))*)?(\\+[0-9a-zA-Z-]+(\\.[0-9a-zA-Z-]+)*)?$" }, "packageVersion": { "type": "string", "pattern": "^(0|[1-9][0-9]*)\\.(0|[1-9][0-9]*)\\.(0|[1-9][0-9]*)(-(0|[1-9][0-9]*|[0-9]*[a-zA-Z-][0-9a-zA-Z-]*)(\\.(0|[1-9][0-9]*|[0-9]*[a-zA-Z-][0-9a-zA-Z-]*))*)?(\\+[0-9a-zA-Z-]+(\\.[0-9a-zA-Z-]+)*)?$" }, "title": { "type": "string" }, "description": { "type": "string" }, "tags": { "type": "array", "items": { "type": "string" } }, "maintainer": { "type": "object", "additionalProperties": false, "properties": { "name": { "type": "string" }, "organization": { "type": "string" }, "email": { "type": "string" }, "url": { "type": "string" }, "phone": { "type": "string" } }, "required": [ "name", "email" ] }, "timeout": { "type": "integer" }, "resources": { "type": "object", "additionalProperties": false, "properties": { "scalar": { "type": "array", "items": { "type": "object", "additionalProperties": false, "properties": { "name": { "type": "string", "pattern": "^[a-zA-Z0-9_-]+$" }, "value": { "type": "number" }, "inputMultiplier": { "type": "number" } }, "required": [ "name", "value" ] }, "required": [ "scalar" ] } } }, "interface": { "type": "object", "additionalProperties": false, "properties": { "command": { "type": "string" }, "inputs": { "type": "object", "additionalProperties": false, "properties": { "files": { "type": "array", "items": { "type": "object", "additionalProperties": false, "properties": { "name": { "type": "string", "pattern": "^[a-zA-Z0-9_-]+$" }, 
"required": { "type": "boolean", "default": true }, "mediaTypes": { "type": "array", "items": { "type": "string" } }, "multiple": { "type": "boolean", "default": false }, "partial": { "type": "boolean", "default": false } }, "required": [ "name" ] } }, "json": { "type": "array", "items": { "type": "object", "additionalProperties": false, "properties": { "name": { "type": "string", "pattern": "^[a-zA-Z0-9_-]+$" }, "required": { "type": "boolean", "default": true }, "type": { "type": "string", "enum": [ "array", "boolean", "integer", "number", "object", "string" ] } }, "required": [ "name", "type" ] } } } }, "outputs": { "type": "object", "additionalProperties": false, "properties": { "files": { "type": "array", "items": { "type": "object", "additionalProperties": false, "properties": { "name": { "type": "string", "pattern": "^[a-zA-Z0-9_-]+$" }, "mediaType": { "type": "string" }, "pattern": { "type": "string" }, "multiple": { "type": "boolean", "default": false }, "required": { "type": "boolean", "default": true } }, "required": [ "name", "pattern" ] } }, "json": { "type": "array", "items": { "type": "object", "additionalProperties": false, "properties": { "name": { "type": "string", "pattern": "^[a-zA-Z0-9_-]+$" }, "key": { "type": "string" }, "type": { "type": "string", "enum": [ "array", "boolean", "integer", "number", "object", "string" ] }, "required": { "type": "boolean", "default": true } }, "required": [ "name", "type" ] } } } }, "mounts": { "type": "array", "items": { "type": "object", "additionalProperties": false, "properties": { "name": { "type": "string", "pattern": "^[a-zA-Z0-9_-]+$" }, "path": { "type": "string" }, "mode": { "enum": [ "ro", "rw" ], "default": "ro" } }, "required": [ "name", "path" ] } }, "settings": { "type": "array", "items": { "type": "object", "additionalProperties": false, "properties": { "name": { "type": "string", "pattern": "^[a-zA-Z0-9_-]+$" }, "secret": { "type": "boolean", "default": false } }, "required": [ "name" ] } } } 
}, "errors": { "type": "array", "items": { "type": "object", "additionalProperties": false, "properties": { "code": { "type": "integer" }, "name": { "type": "string", "pattern": "^[a-zA-Z0-9_-]+$" }, "title": { "type": "string" }, "description": { "type": "string" }, "category": { "type": "string", "default": "job", "enum": [ "job", "data" ] } }, "required": [ "code", "name" ] } } }, "required": [ "name", "jobVersion", "packageVersion", "title", "description", "maintainer", "timeout" ] } }, "required": [ "seedVersion", "job" ] }
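As a rough illustration of what the schema above enforces, the sketch below checks only the top-level required keys, the required `job` keys, and the `name`, `seedVersion` and version patterns, all copied from the schema. It is not a substitute for a full JSON Schema validator, and the function name `check_manifest` is hypothetical.

```python
import re

# Patterns copied from the schema (JSON string escapes removed).
SEED_VERSION = re.compile(r"^1\.0\.0-snapshot$")
NAME = re.compile(r"^[a-zA-Z0-9-]+$")
SEMVER = re.compile(
    r"^(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)"
    r"(-(0|[1-9][0-9]*|[0-9]*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(\.(0|[1-9][0-9]*|[0-9]*[a-zA-Z-][0-9a-zA-Z-]*))*)?"
    r"(\+[0-9a-zA-Z-]+(\.[0-9a-zA-Z-]+)*)?$"
)
JOB_REQUIRED = ["name", "jobVersion", "packageVersion", "title",
                "description", "maintainer", "timeout"]


def check_manifest(manifest):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    for key in ("seedVersion", "job"):
        if key not in manifest:
            problems.append(f"missing top-level key: {key}")
    if "seedVersion" in manifest and not SEED_VERSION.fullmatch(str(manifest["seedVersion"])):
        problems.append("seedVersion must be 1.0.0-snapshot")
    job = manifest.get("job", {})
    for key in JOB_REQUIRED:
        if key not in job:
            problems.append(f"missing job key: {key}")
    if "name" in job and not NAME.fullmatch(str(job["name"])):
        problems.append("job name has invalid characters")
    for key in ("jobVersion", "packageVersion"):
        if key in job and not SEMVER.fullmatch(str(job[key])):
            problems.append(f"{key} is not a valid version")
    return problems
```

For example, a manifest whose `jobVersion` is `1.0` is reported as invalid, because the pattern requires three numeric components.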
6.2. Seed Metadata
The following JSON Schema should be used to validate the side-car metadata files generated alongside Seed job output files. It may be downloaded here: Seed Metadata Schema
{ "$schema": "http://json-schema.org/draft-04/schema#", "id": "https://tools.ietf.org/html/rfc7946#", "title": "Geo JSON object", "description": "Schema for a Geo JSON object", "type": "object", "required": [ "type" ], "properties": { "bbox": { "$ref": "http://json-schema.org/geojson/bbox.json#" } }, "oneOf": [ { "$ref": "#/definitions/geometry" }, { "$ref": "#/definitions/geometryCollection" }, { "$ref": "#/definitions/feature" }, { "$ref": "#/definitions/featureCollection" } ], "definitions": { "geometry": { "type": "object", "required": [ "type", "coordinates" ], "oneOf": [ { "title": "Point", "additionalProperties": false, "properties": { "type": { "enum": [ "Point" ] }, "coordinates": { "$ref": "#/definitions/position" } } }, { "title": "MultiPoint", "additionalProperties": false, "properties": { "type": { "enum": [ "MultiPoint" ] }, "coordinates": { "$ref": "#/definitions/positionArray" } } }, { "title": "LineString", "additionalProperties": false, "properties": { "type": { "enum": [ "LineString" ] }, "coordinates": { "$ref": "#/definitions/lineString" } } }, { "title": "MultiLineString", "additionalProperties": false, "properties": { "type": { "enum": [ "MultiLineString" ] }, "coordinates": { "type": "array", "items": { "$ref": "#/definitions/lineString" } } } }, { "title": "Polygon", "additionalProperties": false, "properties": { "type": { "enum": [ "Polygon" ] }, "coordinates": { "$ref": "#/definitions/polygon" } } }, { "title": "MultiPolygon", "additionalProperties": false, "properties": { "type": { "enum": [ "MultiPolygon" ] }, "coordinates": { "type": "array", "items": { "$ref": "#/definitions/polygon" } } } } ] }, "geometryCollection": { "title": "GeometryCollection", "description": "A collection of geometry objects", "required": [ "geometries" ], "properties": { "type": { "enum": [ "GeometryCollection" ] }, "geometries": { "type": "array", "items": { "$ref": "#/definitions/geometry" } } } }, "feature": { "title": "Feature", "description": "A Geo JSON 
feature object", "required": [ "geometry", "properties" ], "properties": { "type": { "enum": [ "Feature" ] }, "geometry": { "oneOf": [ { "type": "null" }, { "$ref": "#/definitions/geometry" } ] }, "properties": { "type": [ "object", "null" ] }, "id": { "type": [ "string", "number" ] } } }, "featureCollection": { "title": "FeatureCollection", "description": "A Geo JSON feature collection", "required": [ "features" ], "properties": { "type": { "enum": [ "FeatureCollection" ] }, "features": { "type": "array", "items": { "$ref": "#/definitions/feature" } } } }, "position": { "description": "A single position", "type": "array", "minItems": 2, "maxItems": 3, "items": [ { "type": "number" }, { "type": "number" }, { "type": "number" } ], "additionalItems": false }, "positionArray": { "description": "An array of positions", "type": "array", "items": { "$ref": "#/definitions/position" } }, "lineString": { "description": "An array of two or more positions", "allOf": [ { "$ref": "#/definitions/positionArray" }, { "minItems": 2 } ] }, "linearRing": { "description": "An array of four positions where the first equals the last", "allOf": [ { "$ref": "#/definitions/positionArray" }, { "minItems": 4 } ] }, "polygon": { "description": "An array of linear rings", "type": "array", "items": { "$ref": "#/definitions/linearRing" } } } }
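The coordinate definitions at the end of the schema (`position`, `linearRing`, `polygon`) can be mirrored in a few lines of Python. This sketch follows the schema as written: a position is an array of two or three numbers, a linear ring is an array of at least four positions, and a polygon is an array of linear rings. Like the schema, it does not check that a ring's first and last positions are equal. The function names are hypothetical.

```python
def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def is_position(value):
    """An array of 2 or 3 numbers."""
    return (isinstance(value, list)
            and 2 <= len(value) <= 3
            and all(is_number(n) for n in value))


def is_linear_ring(value):
    """An array of at least 4 positions."""
    return (isinstance(value, list)
            and len(value) >= 4
            and all(is_position(p) for p in value))


def is_polygon(value):
    """An array of linear rings."""
    return isinstance(value, list) and all(is_linear_ring(r) for r in value)
```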