DataJob MetaData
Interface Overview
Two parts of a BladePipe DataJob configuration are hard to understand: the structure that describes data source metadata, such as databases and tables, and the mapping between source and target metadata.
This document explains the first part: how to read the structure that describes data source metadata.
Example
The following structure typically appears in the srcSchema or dstSchema field of a DataJob configuration and describes data source metadata such as databases and tables. It has several characteristics:
- The nesting of the structure mirrors the hierarchy of the data source metadata.
- Different data source types have different hierarchies.
- Some elements are specific to certain data source types.
- Every level carries the common attributes targetAutoCreate and inBlackList. The former indicates whether the counterpart structure on the target is created automatically; the latter indicates whether the item is in the blacklist.
[
  {
    "db": "dingtax",
    "dbPattern": "",
    "tables": [
      {
        "table": "access_table_111112222222333333333333344444444444444",
        "tablePattern": "",
        "columns": [
          {
            "column": "id",
            "targetAutoCreate": true,
            "inBlackList": false
          },
          {
            "column": "guid",
            "targetAutoCreate": true,
            "inBlackList": false
          }
        ],
        "actions": [
          "INSERT",
          "UPDATE",
          "DELETE"
        ],
        "inBlackList": false,
        "targetAutoCreate": true,
        "specifiedPks": []
      },
      {
        "table": "kbs_no_pk_have_uniq",
        "tablePattern": "",
        "columns": [
          {
            "column": "name",
            "targetAutoCreate": false,
            "inBlackList": false
          },
          {
            "column": "uniq_id",
            "targetAutoCreate": false,
            "inBlackList": false
          }
        ],
        "actions": [
          "INSERT"
        ],
        "inBlackList": false,
        "targetAutoCreate": false,
        "specifiedPks": [
          "uniq_id"
        ]
      }
    ],
    "targetAutoCreate": false,
    "inBlackList": false
  }
]
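To make the nesting concrete, here is a minimal Python sketch that walks a MySQL-style structure like the one above and lists every column that is not blacklisted. The function name iter_active_columns and the trimmed sample are illustrative and not part of BladePipe. The sketch also assumes that a blacklisted level hides everything nested below it, which the text above does not state.

```python
import json

# Trimmed copy of the example above (MySQL-style: db -> tables -> columns).
SCHEMA_JSON = """
[
  {
    "db": "dingtax",
    "tables": [
      {
        "table": "kbs_no_pk_have_uniq",
        "columns": [
          {"column": "name", "targetAutoCreate": false, "inBlackList": false},
          {"column": "uniq_id", "targetAutoCreate": false, "inBlackList": false}
        ],
        "actions": ["INSERT"],
        "inBlackList": false,
        "targetAutoCreate": false,
        "specifiedPks": ["uniq_id"]
      }
    ],
    "targetAutoCreate": false,
    "inBlackList": false
  }
]
"""


def iter_active_columns(schema):
    """Yield (db, table, column) for every column not in the blacklist.

    Assumption: a blacklisted level hides everything nested below it.
    """
    for db in schema:
        if db.get("inBlackList", False):
            continue
        for table in db.get("tables", []):
            if table.get("inBlackList", False):
                continue
            for column in table.get("columns", []):
                if not column.get("inBlackList", False):
                    yield db["db"], table["table"], column["column"]


schema = json.loads(SCHEMA_JSON)
active = list(iter_active_columns(schema))
for db_name, table_name, column_name in active:
    print(f"{db_name}.{table_name}.{column_name}")
```

The same loop shape applies to targetAutoCreate: each level carries its own flag, so a reader has to check the flag at the level of interest rather than only at the top.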
Common data source hierarchies and special attributes
Datasource Name | First level | Second level | Third level | Fourth level |
---|---|---|---|---|
MySQL | Database: db (Database name); dbPattern (Expression to match database names, not used); tables (List of table information, see second level description) | Table: table (Table name); tablePattern (Expression to match table names, not used); dataFilter (JSON structure, see dataFilter structure description below); specifiedPks (List of specified primary keys); actions (List of actions, currently including INSERT, UPDATE, and DELETE); columns (List of columns, see third level description) | Column: column (Column name) | None |
PolarDbMySQL | Database: db (Database name); dbPattern (Expression to match database names, not used); tables (List of table information, see second level description) | Table: table (Table name); tablePattern (Expression to match table names, not used); dataFilter (JSON structure, see dataFilter structure description below); specifiedPks (List of specified primary keys); actions (List of actions, currently including INSERT, UPDATE, and DELETE); columns (List of columns, see third level description) | Column: column (Column name) | None |
PostgreSQL | Database: db (Database name); schemas (List of PostgreSQL schemas, see second level description) | Schema: schema (Schema name); schemaPattern (Expression to match schema names, not used); tables (List of table information, see third level description) | Table: table (Table name); tablePattern (Expression to match table names, not used); dataFilter (JSON structure, see dataFilter structure description below); actions (List of actions, currently including INSERT, UPDATE, and DELETE); columns (List of columns, see fourth level description) | Column: column (Column name) |
Greenplum | Database: db (Database name); schemas (List of schemas, see second level description) | Schema: schema (Schema name); schemaPattern (Expression to match schema names, not used); tables (List of table information, see third level description) | Table: table (Table name); tablePattern (Expression to match table names, not used); dataFilter (JSON structure, see dataFilter structure description below); actions (List of actions, currently including INSERT, UPDATE, and DELETE); columns (List of columns, see fourth level description) | Column: column (Column name) |
Oracle | Database: db (Database name); tableSpaces (List of Oracle schemas, see second level description) | Schema: tableSpace (Schema name); tableSpacePattern (Expression to match schema names, not used); tables (List of table information, see third level description) | Table: table (Table name); tablePattern (Expression to match table names, not used); dataFilter (JSON structure, see dataFilter structure description below); actions (List of actions, currently including INSERT, UPDATE, and DELETE); columns (List of columns, see fourth level description) | Column: column (Column name) |
SQLServer | Not available | Not available | Not available | Not available |
Redis | Namespace for cache key: prefix (Prefix for cache key namespace); suffixFields (List of properties for cache key namespace) | None | None | None |
ElasticSearch | Index: indexName (Name of the index); idFieldNames (List of property names used to construct the ES primary key); numberOfShards (Number of shards, type int); numberOfReplicas (Number of replicas, type int); globalTimeZone (Timezone string); fields (List of field information, see second level description) | Field: fieldName (Field name); fieldTypeName (Type of the field in ES); needIndex (Whether the field needs to be indexed, type boolean); timeFormat (Time format); esAnalyzerType (Analyzer type, see esAnalyzerType description below); needAutoCreated (Whether the field should be automatically created, type boolean) | None | None |
AdbForMySQL | Database: db (Database name); dbPattern (Expression to match database names, not used); tables (List of table information, see second level description) | Table: table (Table name); tablePattern (Expression to match table names, not used); dataFilter (JSON structure, see dataFilter structure description below); actions (List of actions, currently including INSERT, UPDATE, and DELETE); columns (List of columns, see third level description) | Column: column (Column name) | None |
TiDB | Database: db (Database name); dbPattern (Expression to match database names, not used); tables (List of table information, see second level description) | Table: table (Table name); tablePattern (Expression to match table names, not used); dataFilter (JSON structure, see dataFilter structure description below); actions (List of actions, currently including INSERT, UPDATE, and DELETE); columns (List of columns, see third level description) | Column: column (Column name) | None |
ClickHouse | Database: db (Database name); dbPattern (Expression to match database names, not used); tables (List of table information, see second level description) | Table: table (Table name); tablePattern (Expression to match table names, not used); dataFilter (JSON structure, see dataFilter structure description below); actions (List of actions, currently including INSERT, UPDATE, and DELETE); columns (List of columns, see third level description) | Column: column (Column name) | None |
Kudu | Table: table (Table name); tablePattern (Expression to match table names, not used); dataFilter (JSON structure, see dataFilter structure description below); actions (List of actions, currently including INSERT, UPDATE, and DELETE); partitions (List of partitions, see second level description); columns (List of columns, see second level description) | Partition: columns (Columns of the partition); partitionType (Partition type, possible values are Range and Hash). Column: column (Column name) | None | None |
MongoDB | Database: db (Database name); collections (List of collections, see second level description) | Collection: collection (Collection name); actions (List of actions, currently including INSERT, UPDATE, and DELETE) | None | None |
Kafka | Topic: topic (Topic name); topicPattern (Expression to match topic names, not used); partitions (Number of partitions, type integer); partitionKeys (List of partition keys) | None | None | None |
RocketMQ | Topic: topic (Topic name); topicPattern (Expression to match topic names, not used); partitions (Number of partitions, type integer); partitionKeys (List of partition keys) | None | None | None |
RabbitMQ | Queue: queue (Queue name); queuePattern (Expression to match queue names, not used) | None | None | None |
Hive | Database: db (Database name); dbPattern (Expression to match database names, not used); tables (List of table information, see second level description) | Table: table (Table name); tablePattern (Expression to match table names, not used); partitionKeys (List of partition keys, see partitionKeys structure description below); columns (List of columns, see third level description) | Column: column (Column name) | None |
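The differing hierarchies can be handled by one generic traversal once the child-list key and the name attribute of each level are known. The following Python sketch does this for four of the data sources in the table. The key names come from the table, while leaf_paths and the PostgreSQL sample structure are hypothetical and made up for illustration.

```python
# Child-list key at each level, taken from the table above (subset of sources).
CHILD_KEYS = {
    "MySQL": ["tables", "columns"],
    "PostgreSQL": ["schemas", "tables", "columns"],
    "Oracle": ["tableSpaces", "tables", "columns"],
    "MongoDB": ["collections"],
}

# Name attribute at each level, in the same order as the hierarchy.
NAME_KEYS = {
    "MySQL": ["db", "table", "column"],
    "PostgreSQL": ["db", "schema", "table", "column"],
    "Oracle": ["db", "tableSpace", "table", "column"],
    "MongoDB": ["db", "collection"],
}


def leaf_paths(source_type, nodes, level=0, prefix=()):
    """Return the full name path of every deepest-level entry, depth first."""
    child_keys = CHILD_KEYS[source_type]
    name_key = NAME_KEYS[source_type][level]
    paths = []
    for node in nodes:
        path = prefix + (node[name_key],)
        if level < len(child_keys):
            # Not at the deepest level yet: descend into the child list.
            children = node.get(child_keys[level], [])
            paths.extend(leaf_paths(source_type, children, level + 1, path))
        else:
            paths.append(path)
    return paths


# Hypothetical PostgreSQL structure, made up for illustration only.
pg_schema = [{
    "db": "appdb",
    "schemas": [{
        "schema": "public",
        "tables": [{
            "table": "orders",
            "columns": [{"column": "id"}, {"column": "amount"}],
        }],
    }],
}]

paths = leaf_paths("PostgreSQL", pg_schema)
for path in paths:
    print(".".join(path))
```

Keeping the per-source keys in lookup tables means supporting another source from the table only requires two new entries, with no change to the traversal itself.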
dataFilter structure description
Property Name | Property Description |
---|---|
Type | Data filter type: SQL_WHERE, JAVA_CODE (not implemented), REGULAR_EXPRESSION (not implemented), AVIATOR_EXPRESSION (not implemented) |
Expression | Data filter expression corresponding to the type |
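As a rough illustration of the table above, the following Python sketch checks a dataFilter entry before it is placed in a configuration. It assumes the JSON keys are spelled "type" and "expression"; the actual key casing is not shown in this document. The helper check_data_filter is hypothetical and not part of BladePipe.

```python
# Filter types listed in the table above; only SQL_WHERE is marked as implemented.
IMPLEMENTED_TYPES = {"SQL_WHERE"}
KNOWN_TYPES = IMPLEMENTED_TYPES | {
    "JAVA_CODE",
    "REGULAR_EXPRESSION",
    "AVIATOR_EXPRESSION",
}


def check_data_filter(data_filter):
    """Return a list of problems found in a dataFilter dict (empty if none).

    Assumption: the JSON keys are spelled "type" and "expression".
    """
    problems = []
    filter_type = data_filter.get("type")
    if filter_type not in KNOWN_TYPES:
        problems.append(f"unknown filter type: {filter_type!r}")
    elif filter_type not in IMPLEMENTED_TYPES:
        problems.append(f"filter type {filter_type} is not implemented")
    if not str(data_filter.get("expression") or "").strip():
        problems.append("expression is empty")
    return problems


ok = check_data_filter({"type": "SQL_WHERE", "expression": "id > 100"})
bad = check_data_filter({"type": "JAVA_CODE", "expression": ""})
print(ok)
print(bad)
```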
partitionKeys structure description
Property Name | Property Description |
---|---|
Key Name | Name of the partition key |
OriginCol | Data source column |
PartitionFunction | Partition method: EQUAL, YEAR_FORMAT, MONTH_FORMAT, DAY_FORMAT, HOUR_FORMAT, MINUTE_FORMAT |
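The partition methods suggest that a partition value is derived from the origin column either unchanged (EQUAL) or truncated to a time unit. The Python sketch below shows one plausible reading. The strftime patterns are assumptions chosen for illustration, the output format BladePipe actually produces is not specified in this document, and partition_value is a hypothetical helper.

```python
from datetime import datetime

# Hypothetical output patterns; the real format used by BladePipe may differ.
PATTERNS = {
    "YEAR_FORMAT": "%Y",
    "MONTH_FORMAT": "%Y-%m",
    "DAY_FORMAT": "%Y-%m-%d",
    "HOUR_FORMAT": "%Y-%m-%d %H",
    "MINUTE_FORMAT": "%Y-%m-%d %H:%M",
}


def partition_value(partition_function, origin_value):
    """Derive a partition value from an origin column value.

    EQUAL passes the value through as a string; the *_FORMAT functions
    truncate a datetime to the named unit (assumed behavior).
    """
    if partition_function == "EQUAL":
        return str(origin_value)
    return origin_value.strftime(PATTERNS[partition_function])


created_at = datetime(2024, 3, 15, 10, 30)
print(partition_value("EQUAL", "cn-north"))
print(partition_value("MONTH_FORMAT", created_at))
print(partition_value("HOUR_FORMAT", created_at))
```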
esAnalyzerType description
The supported values are listed below. CUSTOM_A through CUSTOM_E are user-defined analyzers. In ElasticSearch, each custom analyzer must be named with the corresponding lowercase string, i.e. 'custom_a', 'custom_b', 'custom_c', 'custom_d', or 'custom_e' (this naming is expected to be made more elegant later).
- STANDARD
- SIMPLE
- WHITESPACE
- STOP
- KEYWORD
- PATTERN
- ENGLISH
- FINGERPRINT
- ALIWS
- QQ_SMART
- QQ_MAX
- QQ_SMART_NER
- QQ_MAX_NER
- IK_SMART
- IK_MAX_WORD
- SMARTCN
- CUSTOM_A
- CUSTOM_B
- CUSTOM_C
- CUSTOM_D
- CUSTOM_E
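The lowercase naming rule for the user-defined analyzer types can be expressed in a few lines. This Python sketch is illustrative only: custom_analyzer_name is a hypothetical helper, and it covers just the five CUSTOM_* values, because this document does not describe how the other values map to ElasticSearch analyzer names.

```python
# The five user-defined analyzer types from the list above.
CUSTOM_TYPES = ("CUSTOM_A", "CUSTOM_B", "CUSTOM_C", "CUSTOM_D", "CUSTOM_E")


def custom_analyzer_name(es_analyzer_type):
    """Return the name the custom analyzer must have in ElasticSearch.

    Only CUSTOM_* values are handled; anything else raises ValueError.
    """
    if es_analyzer_type not in CUSTOM_TYPES:
        raise ValueError(f"not a user-defined analyzer type: {es_analyzer_type}")
    return es_analyzer_type.lower()


names = [custom_analyzer_name(t) for t in CUSTOM_TYPES]
print(names)
```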