Triggers

Scale uses triggers to automatically generate jobs and recipes, typically as new source data enters the system. Trigger rules are configured in advance; when a trigger event occurs in Scale that matches an existing trigger rule, the job(s) and/or recipe(s) linked to the rule are created and placed on the queue. A given trigger event can trigger multiple rules. There are three different types of Scale triggers: ingest triggers, parse triggers, and clock triggers.
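The point that one trigger event can trigger multiple rules can be sketched as below. This is only an illustration, not Scale's implementation: the dispatch_event function, the rule dictionaries, the job type names, and the event fields are all hypothetical.

```python
def dispatch_event(event, rules, queue):
    """Check one trigger event against every rule and queue work for each match."""
    for rule in rules:
        # Every rule is checked; matching does not stop at the first hit.
        if rule["matches"](event):
            queue.append((rule["job_type"], event["file_name"]))
    return queue


# Hypothetical rules: each pairs a match predicate with the job type it creates.
rules = [
    {"job_type": "parse-text", "matches": lambda e: e["media_type"] == "text/plain"},
    {"job_type": "archive-file", "matches": lambda e: True},
    {"job_type": "make-thumbnail", "matches": lambda e: e["media_type"] == "image/png"},
]

event = {"type": "INGEST", "file_name": "notes.txt", "media_type": "text/plain"}
queue = dispatch_event(event, rules, [])
print(queue)  # [('parse-text', 'notes.txt'), ('archive-file', 'notes.txt')]
```

Here a single ingest event matches two of the three rules, so two entries are placed on the queue.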

Ingest Triggers

Ingest triggers are triggers that can occur when a source file is ingested into Scale. A trigger event is generated for every file ingest and checked against all ingest trigger rules.

Example ingest trigger configuration:

{
   "version": "1.1",
   "condition": {
      "media_type": "text/plain",
      "data_types": [
         "foo"
      ],
      "any_of_data_types": [
         "bar",
         "qaz"
      ],
      "not_data_types": [
         "baz"
      ]
   },
   "data": {
      "input_data_name": "my_file",
      "workspace_name": "my_workspace"
   }
}

The condition field is used to define the conditions for when the ingest rule is triggered. The media_type field says that an ingested file must have a media type of text/plain (a plain text file) in order to trigger this rule. The three data type filtering fields (data_types, any_of_data_types, not_data_types) stipulate that the ingested file must have the data type “foo” tagged, must have data type “bar” or “qaz” tagged, and cannot have data type “baz” tagged in order to trigger this rule. The data field specifies the information needed to create the applicable job/recipe (whatever the trigger rule is linked to) when the rule is triggered. The input_data_name field defines the input parameter name of the job/recipe that the ingested file should be passed to, and the workspace_name field gives the unique system name of the workspace for storing all of the products generated by the created job/recipe. To see all of the options for an ingest trigger rule’s configuration, please refer to the Ingest Trigger Configuration Specification below.
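The way the condition fields combine can be sketched in a few lines of Python. This is a minimal illustration of the matching logic described above, not Scale's own code; the function name matches_condition and its arguments are assumptions made for the example.

```python
def matches_condition(condition, media_type, data_types):
    """Return True if a file with the given media type and data type tags
    satisfies a trigger rule condition (a dict shaped like the JSON above)."""
    tags = set(data_types)

    # An empty or missing media_type accepts every media type.
    required_media_type = condition.get("media_type", "")
    if required_media_type and required_media_type != media_type:
        return False

    # data_types: every listed tag must be present on the file.
    if not set(condition.get("data_types", [])) <= tags:
        return False

    # any_of_data_types: at least one listed tag must be present, if any are listed.
    any_of = set(condition.get("any_of_data_types", []))
    if any_of and not (any_of & tags):
        return False

    # not_data_types: none of the listed tags may be present.
    if set(condition.get("not_data_types", [])) & tags:
        return False

    return True


condition = {
    "media_type": "text/plain",
    "data_types": ["foo"],
    "any_of_data_types": ["bar", "qaz"],
    "not_data_types": ["baz"],
}

print(matches_condition(condition, "text/plain", ["foo", "bar"]))         # True
print(matches_condition(condition, "text/plain", ["foo"]))                # False: neither bar nor qaz
print(matches_condition(condition, "text/plain", ["foo", "qaz", "baz"]))  # False: baz is excluded
print(matches_condition(condition, "image/png", ["foo", "bar"]))          # False: wrong media type
```

All four checks must pass for the rule to trigger, and an empty condition matches every file.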

Ingest Trigger Configuration Specification Version 1.1

A valid ingest trigger rule configuration is a JSON document with the following structure:

{
   "version": "1.1",
   "condition": {
      "media_type": STRING,
      "data_types": [
         STRING,
         STRING
      ],
      "any_of_data_types": [
         STRING,
         STRING
      ],
      "not_data_types": [
         STRING,
         STRING
      ]
   },
   "data": {
      "input_data_name": STRING,
      "workspace_name": STRING
   }
}

version: JSON string

The version is an optional string value that defines the version of the configuration used. This allows updates to be made to the specification while maintaining backwards compatibility by allowing Scale to recognize an older version and convert it to the current version. The default value for version if it is not included is the latest version, which is currently 1.1. It is recommended, though not required, that you include the version so that future changes to the specification will still accept your ingest trigger rule configuration.

condition: JSON object

The condition field is optional and contains other fields that specify the conditions under which this ingest rule is triggered. If not provided, the rule is triggered by EVERY source file ingest.

media_type: JSON string

The media_type field is an optional string that defines a media type. An ingested file must have the identical media type defined here in order to trigger this rule. If not provided, the field defaults to “” and all file media types are accepted by the rule.

data_types: JSON array

The data_types field is an optional list of data type strings. An ingested file must have all of the data types that are listed here tagged to the file in order to trigger this rule. If not provided, the field defaults to [] and no data types are required.

any_of_data_types: JSON array

The any_of_data_types field is an optional list of data type strings. An ingested file must have at least one of the data types that are listed here tagged to the file in order to trigger this rule. If not provided, the field defaults to [] and no data types are required.

not_data_types: JSON array

The not_data_types field is an optional list of data type strings. If an ingested file has any tagged data type that is listed here, the rule will not trigger. If not provided, the field defaults to [] and no data types are excluded.

data: JSON object

The data field is required and contains other fields that specify the details for creating the job/recipe linked to this trigger rule.

input_data_name: JSON string

The input_data_name field is a required string that specifies the input parameter name of the triggered job/recipe that the ingested file should be passed to when the job/recipe is created and placed on the queue.

workspace_name: JSON string

The workspace_name field is required and contains the unique system name of the workspace that should store the products created by the triggered job/recipe.
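The defaults and required fields in this specification can be summarized with a short sketch. The helper below is hypothetical (normalize_ingest_trigger_config is not a Scale function); it only applies the defaults and requirements stated above to a configuration dictionary.

```python
import copy


def normalize_ingest_trigger_config(config):
    """Return a copy of config with the documented defaults filled in.

    Raises ValueError if a required field is missing or has the wrong type.
    """
    result = copy.deepcopy(config)

    # version is optional and defaults to the latest version, 1.1.
    result.setdefault("version", "1.1")

    # condition and all of its fields are optional.
    condition = result.setdefault("condition", {})
    condition.setdefault("media_type", "")
    for key in ("data_types", "any_of_data_types", "not_data_types"):
        condition.setdefault(key, [])

    # data and both of its fields are required.
    data = result.get("data")
    if not isinstance(data, dict):
        raise ValueError("'data' is required and must be a JSON object")
    for key in ("input_data_name", "workspace_name"):
        if not isinstance(data.get(key), str):
            raise ValueError(f"'data.{key}' is required and must be a string")

    return result


minimal = {"data": {"input_data_name": "my_file", "workspace_name": "my_workspace"}}
normalized = normalize_ingest_trigger_config(minimal)
print(normalized["version"])    # 1.1
print(normalized["condition"])
```

With no condition supplied, the normalized rule has an empty media type and empty data type lists, which corresponds to a rule triggered by every source file ingest.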

Parse Triggers

Parse triggers are triggers that can occur when a source file is parsed. This happens when a job completes with a parse_results section in its generated results manifest file (see Results Manifest). A trigger event is generated for every source file parse and checked against all parse trigger rules.

Example parse trigger configuration:

{
   "version": "1.1",
   "condition": {
      "media_type": "text/plain",
      "data_types": [
         "foo"
      ],
      "any_of_data_types": [
         "bar",
         "qaz"
      ],
      "not_data_types": [
         "baz"
      ]
   },
   "data": {
      "input_data_name": "my_file",
      "workspace_name": "my_workspace"
   }
}

The condition field is used to define the conditions for when the parse rule is triggered. The media_type field says that a parsed file must have a media type of text/plain (a plain text file) in order to trigger this rule. The three data type filtering fields (data_types, any_of_data_types, not_data_types) stipulate that the parsed file must have the data type “foo” tagged, must have data type “bar” or “qaz” tagged, and cannot have data type “baz” tagged in order to trigger this rule. The data field specifies the information needed to create the applicable job/recipe (whatever the trigger rule is linked to) when the rule is triggered. The input_data_name field defines the input parameter name of the job/recipe that the parsed file should be passed to, and the workspace_name field gives the unique system name of the workspace for storing all of the products generated by the created job/recipe. To see all of the options for a parse trigger rule’s configuration, please refer to the Parse Trigger Configuration Specification below.

Parse Trigger Configuration Specification Version 1.1

A valid parse trigger rule configuration is a JSON document with the following structure:

{
   "version": "1.1",
   "condition": {
      "media_type": STRING,
      "data_types": [
         STRING,
         STRING
      ],
      "any_of_data_types": [
         STRING,
         STRING
      ],
      "not_data_types": [
         STRING,
         STRING
      ]
   },
   "data": {
      "input_data_name": STRING,
      "workspace_name": STRING
   }
}

version: JSON string

The version is an optional string value that defines the version of the configuration used. This allows updates to be made to the specification while maintaining backwards compatibility by allowing Scale to recognize an older version and convert it to the current version. The default value for version if it is not included is the latest version, which is currently 1.1. It is recommended, though not required, that you include the version so that future changes to the specification will still accept your parse trigger rule configuration.

condition: JSON object

The condition field is optional and contains other fields that specify the conditions under which this parse rule is triggered. If not provided, the rule is triggered by EVERY source file parse.

media_type: JSON string

The media_type field is an optional string that defines a media type. A parsed file must have the identical media type defined here in order to trigger this rule. If not provided, the field defaults to “” and all file media types are accepted by the rule.

data_types: JSON array

The data_types field is an optional list of data type strings. A parsed file must have all of the data types that are listed here tagged to the file in order to trigger this rule. If not provided, the field defaults to [] and no data types are required.

any_of_data_types: JSON array

The any_of_data_types field is an optional list of data type strings. A parsed file must have at least one of the data types that are listed here tagged to the file in order to trigger this rule. If not provided, the field defaults to [] and no data types are required.

not_data_types: JSON array

The not_data_types field is an optional list of data type strings. If a parsed file has any tagged data type that is listed here, the rule will not trigger. If not provided, the field defaults to [] and no data types are excluded.

data: JSON object

The data field is required and contains other fields that specify the details for creating the job/recipe linked to this trigger rule.

input_data_name: JSON string

The input_data_name field is a required string that specifies the input parameter name of the triggered job/recipe that the parsed file should be passed to when the job/recipe is created and placed on the queue.

workspace_name: JSON string

The workspace_name field is required and contains the unique system name of the workspace that should store the products created by the triggered job/recipe.

Clock Triggers

Clock triggers are triggers that can occur on a pre-defined schedule. The Scale clock process fires every minute and checks which clock trigger rules are due to be executed. A trigger event is generated for every clock tick that exceeds the threshold specified by a clock trigger rule. Each clock rule uses its own custom trigger event that is defined by the specification outlined below. Clock rules are useful for general system maintenance that cannot be associated with a normal event like file parsing. Calculating system metrics/performance or archiving old records are good cases for a clock rule.

Example clock trigger configuration:

{
   "version": "1.0",
   "event_type": "MY_METRICS",
   "schedule": "PT1H0M0S"
}

The event_type field determines the type of event that is triggered, and it is also used to look up the last time an event was triggered for the rule. The schedule field determines how often the event should be triggered. The schedule value uses the ISO-8601 period format and is interpreted as absolute time within each day. Therefore, the example above specifies that the trigger should happen every hour on the hour. If an event is triggered a few minutes after the hour, the next event will still attempt to fire at the top of the next hour, rather than exactly one hour after the previous event in relative time. This makes the system more predictable and prevents events from slowly drifting over time.

Also note that the name field of the trigger rule model must match a corresponding clock event processor registration in the clock module. The processor registration determines what function the Scale clock will execute when the rule is due to trigger a new event.
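The relationship between a rule name and its registered processor can be pictured as a simple lookup table. The sketch below is a standalone illustration under that assumption; the registry, the function names, and the rule name "my-metrics-rule" are hypothetical and do not reproduce the actual clock module API.

```python
# Hypothetical registry mapping a trigger rule name to the function that does the work.
_PROCESSORS = {}


def register_processor(rule_name, processor):
    """Associate a clock trigger rule name with a processor function."""
    if rule_name in _PROCESSORS:
        raise ValueError(f"a processor is already registered for rule {rule_name!r}")
    _PROCESSORS[rule_name] = processor


def fire_rule(rule_name, event_type):
    """Run the processor registered for a rule that is due to trigger."""
    if rule_name not in _PROCESSORS:
        # Without a matching registration there is nothing to execute.
        raise LookupError(f"no clock processor registered for rule {rule_name!r}")
    return _PROCESSORS[rule_name](event_type)


def collect_metrics(event_type):
    """Example processor; a real one would perform the maintenance work."""
    return f"processed {event_type}"


register_processor("my-metrics-rule", collect_metrics)
print(fire_rule("my-metrics-rule", "MY_METRICS"))  # processed MY_METRICS
```

A rule whose name has no registration fails the lookup, which is why the rule name and the registration must match.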

Clock Trigger Configuration Specification Version 1.0

A valid clock trigger rule configuration is a JSON document with the following structure:

{
   "version": "1.0",
   "event_type": STRING,
   "schedule": STRING
}

version: JSON string

The version is an optional string value that defines the version of the configuration used. This allows updates to be made to the specification while maintaining backwards compatibility by allowing Scale to recognize an older version and convert it to the current version. The default value for version if it is not included is the latest version, which is currently 1.0. It is recommended, though not required, that you include the version so that future changes to the specification will still accept your clock trigger rule configuration.

event_type: JSON string

The event_type field is a required string that determines the trigger event associated with the rule. When the clock process checks to see if a rule needs to be triggered it will query for associated events using this type. If the clock determines that the rule does in fact need to trigger, then this type is used to create the new event that is passed to the clock processor function to do the actual work.

schedule: JSON string

The schedule field is a required string that specifies how often the rule should be triggered. The value must follow the ISO-8601 period format, expressed as the hours, minutes, and seconds between events (for example, PT1H0M0S). Note that the current Scale clock implementation does not support the optional days portion of the standard, and the smallest time slice that it can execute is once every minute. It is also important to note that the scheduler interprets the period relative to the start of each day, rather than relative to its last triggered event. That way, if a schedule is defined for every hour and one of the executions falls behind by a few minutes, the next event will still attempt to trigger as close to the hour as possible. For example, if we request execution every hour using PT1H0M0S and the last event actually runs at 11:07 AM, then the next execution will be attempted at 12:00 PM even though that is not a full hour later.
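The day-relative interpretation of the schedule can be worked through with a small sketch. This is one plausible reading of the behavior described above, not the Scale clock's actual code: parse_period and next_trigger_time are hypothetical helpers, and capping the result at the next midnight is an assumption about how the daily grid restarts.

```python
import re
from datetime import datetime, timedelta

# Accepts only the hours/minutes/seconds portion, e.g. PT1H0M0S or PT30M.
_PERIOD_RE = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def parse_period(schedule):
    """Convert a schedule string such as 'PT1H0M0S' into a timedelta."""
    match = _PERIOD_RE.match(schedule)
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported schedule: {schedule!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    period = timedelta(hours=hours, minutes=minutes, seconds=seconds)
    if period < timedelta(minutes=1):
        raise ValueError("the smallest supported period is one minute")
    return period


def next_trigger_time(schedule, last_event):
    """Return the next slot on the daily grid that falls after last_event."""
    period = parse_period(schedule)
    start_of_day = last_event.replace(hour=0, minute=0, second=0, microsecond=0)
    # Count the whole periods elapsed since midnight, then move to the next slot.
    elapsed_periods = (last_event - start_of_day) // period
    candidate = start_of_day + (elapsed_periods + 1) * period
    # Assumption: the grid restarts at midnight, so never schedule past it.
    return min(candidate, start_of_day + timedelta(days=1))


print(next_trigger_time("PT1H0M0S", datetime(2024, 1, 1, 11, 7)))  # 2024-01-01 12:00:00
print(next_trigger_time("PT30M", datetime(2024, 1, 1, 11, 7)))     # 2024-01-01 11:30:00
```

Because the next slot is computed from the start of the day rather than from the last event, a late execution at 11:07 does not push the following one to 12:07.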