# Sigma

## Important Capabilities
Capability | Status | Notes |
---|---|---|
Asset Containers | ✅ | Enabled by default |
Descriptions | ✅ | Enabled by default |
Detect Deleted Entities | ✅ | Optionally enabled via stateful_ingestion.remove_stale_metadata |
Extract Ownership | ✅ | Enabled by default, configured using ingest_owner |
Extract Tags | ✅ | Enabled by default |
Platform Instance | ✅ | Enabled by default |
Schema Metadata | ✅ | Enabled by default |
Table-Level Lineage | ✅ | Enabled by default |
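Stale-entity removal (the "Detect Deleted Entities" capability above) is switched on in the recipe. A minimal fragment follows; note that a `pipeline_name` must be set for stateful ingestion to activate:

```yaml
pipeline_name: sigma_ingestion  # stateful ingestion requires a pipeline name
source:
  type: sigma
  config:
    # ... connection settings ...
    stateful_ingestion:
      enabled: true
      remove_stale_metadata: true
```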
This plugin extracts the following:
- Sigma Workspaces and Workbooks as Containers
- Sigma Datasets
- Pages as Dashboards, and their Elements as Charts
## Integration Details
This source extracts the following:
- Workspaces, and the workbooks within those workspaces, as Containers.
- Sigma Datasets as DataHub Datasets.
- Pages as DataHub Dashboards, and the elements on each page as Charts.
## Configuration Notes
- Refer to the Sigma API documentation to generate API client credentials.
- Provide the generated Client ID and Client Secret in the recipe.

We have observed issues with the Sigma API where certain endpoints do not return the expected results, even when the user is an admin. In those cases, a workaround is to manually add the user associated with the Client ID/Secret to each workspace with missing metadata.

Empty workspaces are listed in the ingestion report in the logs under the key `empty_workspaces`.
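As a quick credentials check, the token exchange and workspace listing can be exercised directly against the REST API. This is an illustrative sketch, not part of the connector; the `/auth/token` form-encoded flow and the `entries` key in list responses are assumptions based on Sigma's public API:

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://aws-api.sigmacomputing.com/v2"

def fetch_token(client_id: str, client_secret: str) -> str:
    """Exchange client credentials for a bearer token (assumed OAuth2 flow)."""
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode()
    req = urllib.request.Request(f"{API_URL}/auth/token", data=body)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["access_token"]

def workspace_names(listing: dict) -> list[str]:
    """Pull workspace names out of one page of a workspaces listing
    (response shape assumed: {"entries": [{"name": ...}, ...]})."""
    return [entry["name"] for entry in listing.get("entries", [])]
```

If a workspace you expect is missing from the listing, that is the symptom the workaround above addresses: add the API client's user to that workspace.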
## Concept Mapping
Sigma | DataHub | Notes |
---|---|---|
Workspace | Container | SubType "Sigma Workspace" |
Workbook | Dashboard | SubType "Sigma Workbook" |
Page | Dashboard | |
Element | Chart | |
Dataset | Dataset | SubType "Sigma Dataset" |
User | User (a.k.a. CorpUser) | Optionally extracted |
## Advanced Configurations
### Chart source platform mapping
If you want to provide platform details (platform name, platform instance, and env) for the external upstream data sources of charts, you can use chart_sources_platform_mapping as shown below.

Example - the external upstream data sources of one specific chart:
```yaml
chart_sources_platform_mapping:
  "workspace_name/workbook_name/chart_name_1":
    data_source_platform: snowflake
    platform_instance: new_instance
    env: PROD
  "workspace_name/folder_name/workbook_name/chart_name_2":
    data_source_platform: postgres
    platform_instance: cloud_instance
    env: DEV
```
Example - all charts within one specific workbook:
```yaml
chart_sources_platform_mapping:
  "workspace_name/workbook_name_1":
    data_source_platform: snowflake
    platform_instance: new_instance
    env: PROD
  "workspace_name/folder_name/workbook_name_2":
    data_source_platform: snowflake
    platform_instance: new_instance
    env: PROD
```
Example - all charts of all workbooks within one specific workspace:
```yaml
chart_sources_platform_mapping:
  "workspace_name":
    data_source_platform: snowflake
    platform_instance: new_instance
    env: PROD
```
Example - all workbooks use the same connection:
```yaml
chart_sources_platform_mapping:
  "*":
    data_source_platform: snowflake
    platform_instance: new_instance
    env: PROD
```
## CLI based Ingestion

### Starter Recipe

Check out the following recipe to get started with ingestion! See below for the full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.
```yaml
source:
  type: sigma
  config:
    # Coordinates
    api_url: "https://aws-api.sigmacomputing.com/v2"

    # Credentials
    client_id: "CLIENTID"
    client_secret: "CLIENT_SECRET"

    # Optional - filter for certain workspace names instead of ingesting everything.
    # workspace_pattern:
    #   allow:
    #     - workspace_name

    ingest_owner: true

    # Optional - mapping of a Sigma workspace/workbook/chart folder path to the
    # platform details of all chart data sources inside that folder path.
    # chart_sources_platform_mapping:
    #   folder_path:
    #     data_source_platform: postgres
    #     platform_instance: cloud_instance
    #     env: DEV

sink:
  # sink configs
```
## Config Details
Note that a `.` is used to denote nested fields in the YAML recipe.
Field | Description |
---|---|
client_id ✅ string | Sigma Client ID |
client_secret ✅ string | Sigma Client Secret |
api_url string | Sigma API hosted URL. Default: https://aws-api.sigmacomputing.com/v2 |
extract_lineage boolean | Whether to extract lineage of workbook's elements and datasets or not. Default: True |
ingest_owner boolean | Ingest Owner from source. This will override Owner info entered from UI. Default: True |
ingest_shared_entities boolean | Whether to ingest the shared entities or not. Default: False |
platform_instance string | The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://datahubproject.io/docs/platform-instances/ for more details. |
env string | The environment that all assets produced by this connector belong to Default: PROD |
chart_sources_platform_mapping map(str,PlatformDetail) | A mapping of a Sigma workspace/workbook/chart folder path to the platform details of all chart data sources inside that folder path. Default: {} |
chart_sources_platform_mapping.key.env string | The environment that all assets produced by this connector belong to. Default: PROD |
chart_sources_platform_mapping.key.data_source_platform ❓ string | The platform name of a chart's data sources. |
chart_sources_platform_mapping.key.platform_instance string | The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://datahubproject.io/docs/platform-instances/ for more details. |
workbook_lineage_pattern AllowDenyPattern | Regex patterns to filter workbook elements and datasets lineage in ingestion. Requires extract_lineage to be enabled. Default: {'allow': ['.*'], 'deny': [], 'ignoreCase': True} |
workbook_lineage_pattern.ignoreCase boolean | Whether to ignore case sensitivity during pattern matching. Default: True |
workbook_lineage_pattern.allow array | List of regex patterns to include in ingestion Default: ['.*'] |
workbook_lineage_pattern.allow.string string | |
workbook_lineage_pattern.deny array | List of regex patterns to exclude from ingestion. Default: [] |
workbook_lineage_pattern.deny.string string | |
workspace_pattern AllowDenyPattern | Regex patterns to filter Sigma workspaces in ingestion. Include 'My documents' if personal entities should also be ingested. Default: {'allow': ['.*'], 'deny': [], 'ignoreCase': True} |
workspace_pattern.ignoreCase boolean | Whether to ignore case sensitivity during pattern matching. Default: True |
workspace_pattern.allow array | List of regex patterns to include in ingestion Default: ['.*'] |
workspace_pattern.allow.string string | |
workspace_pattern.deny array | List of regex patterns to exclude from ingestion. Default: [] |
workspace_pattern.deny.string string | |
stateful_ingestion StatefulStaleMetadataRemovalConfig | Sigma Stateful Ingestion Config. |
stateful_ingestion.enabled boolean | Whether or not to enable stateful ingestion. Defaults to True if a pipeline_name is set and either a datahub-rest sink or datahub_api is specified; otherwise False. Default: False |
stateful_ingestion.fail_safe_threshold number | Prevents a large number of soft deletes and blocks the state from committing when accidental changes to the source configuration make the relative change in entities, compared to the previous state, exceed the fail_safe_threshold. Default: 75.0 |
stateful_ingestion.remove_stale_metadata boolean | Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled. Default: True |
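The allow/deny pattern fields above follow DataHub's usual semantics: a name is ingested when it matches at least one allow regex and no deny regex. A minimal sketch of that behavior (illustrative, not DataHub's actual AllowDenyPattern class):

```python
import re

def allowed(name: str, allow=(".*",), deny=(), ignore_case=True) -> bool:
    """Apply allow/deny regex lists: keep a name iff some allow pattern
    matches it and no deny pattern does (patterns anchored at the start,
    as with re.match)."""
    flags = re.IGNORECASE if ignore_case else 0
    if any(re.match(p, name, flags) for p in deny):
        return False
    return any(re.match(p, name, flags) for p in allow)

# e.g. ingest every workspace except ones starting with "tmp_":
# workspace_pattern: {allow: [".*"], deny: ["tmp_.*"]}
```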
The JSONSchema for this configuration is inlined below.
{
"title": "SigmaSourceConfig",
"description": "Base configuration class for stateful ingestion for source configs to inherit from.",
"type": "object",
"properties": {
"env": {
"title": "Env",
"description": "The environment that all assets produced by this connector belong to",
"default": "PROD",
"type": "string"
},
"platform_instance": {
"title": "Platform Instance",
"description": "The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://datahubproject.io/docs/platform-instances/ for more details.",
"type": "string"
},
"stateful_ingestion": {
"title": "Stateful Ingestion",
"description": "Sigma Stateful Ingestion Config.",
"allOf": [
{
"$ref": "#/definitions/StatefulStaleMetadataRemovalConfig"
}
]
},
"api_url": {
"title": "Api Url",
"description": "Sigma API hosted URL.",
"default": "https://aws-api.sigmacomputing.com/v2",
"type": "string"
},
"client_id": {
"title": "Client Id",
"description": "Sigma Client ID",
"type": "string"
},
"client_secret": {
"title": "Client Secret",
"description": "Sigma Client Secret",
"type": "string"
},
"workspace_pattern": {
"title": "Workspace Pattern",
"description": "Regex patterns to filter Sigma workspaces in ingestion.Mention 'My documents' if personal entities also need to ingest.",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true
},
"allOf": [
{
"$ref": "#/definitions/AllowDenyPattern"
}
]
},
"ingest_owner": {
"title": "Ingest Owner",
"description": "Ingest Owner from source. This will override Owner info entered from UI.",
"default": true,
"type": "boolean"
},
"ingest_shared_entities": {
"title": "Ingest Shared Entities",
"description": "Whether to ingest the shared entities or not.",
"default": false,
"type": "boolean"
},
"extract_lineage": {
"title": "Extract Lineage",
"description": "Whether to extract lineage of workbook's elements and datasets or not.",
"default": true,
"type": "boolean"
},
"workbook_lineage_pattern": {
"title": "Workbook Lineage Pattern",
"description": "Regex patterns to filter workbook's elements and datasets lineage in ingestion.Requires extract_lineage to be enabled.",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true
},
"allOf": [
{
"$ref": "#/definitions/AllowDenyPattern"
}
]
},
"chart_sources_platform_mapping": {
"title": "Chart Sources Platform Mapping",
"description": "A mapping of the sigma workspace/workbook/chart folder path to all chart's data sources platform details present inside that folder path.",
"default": {},
"type": "object",
"additionalProperties": {
"$ref": "#/definitions/PlatformDetail"
}
}
},
"required": [
"client_id",
"client_secret"
],
"additionalProperties": false,
"definitions": {
"DynamicTypedStateProviderConfig": {
"title": "DynamicTypedStateProviderConfig",
"type": "object",
"properties": {
"type": {
"title": "Type",
"description": "The type of the state provider to use. For DataHub use `datahub`",
"type": "string"
},
"config": {
"title": "Config",
"description": "The configuration required for initializing the state provider. Default: The datahub_api config if set at pipeline level. Otherwise, the default DatahubClientConfig. See the defaults (https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/graph/client.py#L19).",
"default": {},
"type": "object"
}
},
"required": [
"type"
],
"additionalProperties": false
},
"StatefulStaleMetadataRemovalConfig": {
"title": "StatefulStaleMetadataRemovalConfig",
"description": "Base specialized config for Stateful Ingestion with stale metadata removal capability.",
"type": "object",
"properties": {
"enabled": {
"title": "Enabled",
"description": "Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or `datahub_api` is specified, otherwise False",
"default": false,
"type": "boolean"
},
"remove_stale_metadata": {
"title": "Remove Stale Metadata",
"description": "Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled.",
"default": true,
"type": "boolean"
},
"fail_safe_threshold": {
"title": "Fail Safe Threshold",
"description": "Prevents large amount of soft deletes & the state from committing from accidental changes to the source configuration if the relative change percent in entities compared to the previous state is above the 'fail_safe_threshold'.",
"default": 75.0,
"minimum": 0.0,
"maximum": 100.0,
"type": "number"
}
},
"additionalProperties": false
},
"AllowDenyPattern": {
"title": "AllowDenyPattern",
"description": "A class to store allow deny regexes",
"type": "object",
"properties": {
"allow": {
"title": "Allow",
"description": "List of regex patterns to include in ingestion",
"default": [
".*"
],
"type": "array",
"items": {
"type": "string"
}
},
"deny": {
"title": "Deny",
"description": "List of regex patterns to exclude from ingestion.",
"default": [],
"type": "array",
"items": {
"type": "string"
}
},
"ignoreCase": {
"title": "Ignorecase",
"description": "Whether to ignore case sensitivity during pattern matching.",
"default": true,
"type": "boolean"
}
},
"additionalProperties": false
},
"PlatformDetail": {
"title": "PlatformDetail",
"description": "Any source that connects to a platform should inherit this class",
"type": "object",
"properties": {
"env": {
"title": "Env",
"description": "The environment that all assets produced by this connector belong to",
"default": "PROD",
"type": "string"
},
"platform_instance": {
"title": "Platform Instance",
"description": "The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://datahubproject.io/docs/platform-instances/ for more details.",
"type": "string"
},
"data_source_platform": {
"title": "Data Source Platform",
"description": "A chart's data sources platform name.",
"type": "string"
}
},
"required": [
"data_source_platform"
],
"additionalProperties": false
}
}
}
## Code Coordinates
- Class Name: datahub.ingestion.source.sigma.sigma.SigmaSource
- Browse on GitHub
## Questions
If you've got any questions on configuring ingestion for Sigma, feel free to ping us on our Slack.