
DataJob MetaData

Interface Overview

Two parts of a BladePipe DataJob configuration are hard to understand: the structure that describes data source metadata, such as databases and tables, and the mapping between source and target metadata.

This document explains the first part: the structure that describes data source metadata such as databases and tables.

Example

The following structure typically appears in the srcSchema or dstSchema field of a DataJob configuration and describes data source objects such as databases and tables. It has the following characteristics:

  • The nesting of the structure reflects the hierarchy of the data source metadata.
  • Different data source types have different hierarchies.
  • Some elements differ between data source types.
  • Every level carries the common attributes targetAutoCreate and inBlackList. targetAutoCreate indicates whether the corresponding structure on the target side is created automatically, and inBlackList indicates whether the element is in the blacklist.
[
  {
    "db": "dingtax",
    "dbPattern": "",
    "tables": [
      {
        "table": "access_table_111112222222333333333333344444444444444",
        "tablePattern": "",
        "columns": [
          {
            "column": "id",
            "targetAutoCreate": true,
            "inBlackList": false
          },
          {
            "column": "guid",
            "targetAutoCreate": true,
            "inBlackList": false
          }
        ],
        "actions": [
          "INSERT",
          "UPDATE",
          "DELETE"
        ],
        "inBlackList": false,
        "targetAutoCreate": true,
        "specifiedPks": []
      },
      {
        "table": "kbs_no_pk_have_uniq",
        "tablePattern": "",
        "columns": [
          {
            "column": "name",
            "targetAutoCreate": false,
            "inBlackList": false
          },
          {
            "column": "uniq_id",
            "targetAutoCreate": false,
            "inBlackList": false
          }
        ],
        "actions": [
          "INSERT"
        ],
        "inBlackList": false,
        "targetAutoCreate": false,
        "specifiedPks": [
          "uniq_id"
        ]
      }
    ],
    "targetAutoCreate": false,
    "inBlackList": false
  }
]
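
As a rough illustration of how the nesting can be read, the following Python sketch walks a MySQL-style structure (database, table, column) and collects the columns that are not blacklisted. The helper name `list_columns` and the trimmed sample are illustrative and not part of BladePipe. The sketch also assumes that a blacklisted database or table excludes everything nested under it, which this document does not state.

```python
import json

# Trimmed sample in the same shape as the structure above (MySQL-style hierarchy).
# The "name" column is marked as blacklisted here purely for illustration.
SAMPLE = """
[
  {
    "db": "dingtax",
    "tables": [
      {
        "table": "kbs_no_pk_have_uniq",
        "columns": [
          {"column": "name", "targetAutoCreate": false, "inBlackList": true},
          {"column": "uniq_id", "targetAutoCreate": false, "inBlackList": false}
        ],
        "actions": ["INSERT"],
        "inBlackList": false,
        "targetAutoCreate": false,
        "specifiedPks": ["uniq_id"]
      }
    ],
    "targetAutoCreate": false,
    "inBlackList": false
  }
]
"""


def list_columns(schema):
    """Return (db, table, column) triples, skipping blacklisted elements.

    ASSUMPTION: a blacklisted parent hides all of its children.
    """
    result = []
    for db in schema:
        if db.get("inBlackList"):
            continue
        for table in db.get("tables", []):
            if table.get("inBlackList"):
                continue
            for col in table.get("columns", []):
                if not col.get("inBlackList"):
                    result.append((db["db"], table["table"], col["column"]))
    return result


columns = list_columns(json.loads(SAMPLE))
print(columns)
```

Reading the structure level by level like this mirrors the hierarchy: each list (`tables`, `columns`) holds the elements of the next level down.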

Common data source hierarchies and special attributes

Each entry gives the data source name, followed by its first, second, third, and fourth levels.

MySQL
  First level: Database
    • db (Database name)
    • dbPattern (Expression to match database names, not used)
    • tables (List of table information, see second level description)
  Second level: Table
    • table (Table name)
    • tablePattern (Expression to match table names, not used)
    • dataFilter (JSON structure, see dataFilter structure description below)
    • specifiedPks (List of specified primary keys)
    • actions (List of actions, currently including INSERT, UPDATE, and DELETE)
    • columns (List of columns, see third level description)
  Third level: Column
    • column (Column name)
  Fourth level: None

PolarDbMySQL
  First level: Database
    • db (Database name)
    • dbPattern (Expression to match database names, not used)
    • tables (List of table information, see second level description)
  Second level: Table
    • table (Table name)
    • tablePattern (Expression to match table names, not used)
    • dataFilter (JSON structure, see dataFilter structure description below)
    • specifiedPks (List of specified primary keys)
    • actions (List of actions, currently including INSERT, UPDATE, and DELETE)
    • columns (List of columns, see third level description)
  Third level: Column
    • column (Column name)
  Fourth level: None

PostgreSQL
  First level: Database
    • db (Database name)
    • schemas (List of pg schemas, see second level description)
  Second level: Schema
    • schema (Schema name)
    • schemaPattern (Expression to match schema names, not used)
    • tables (List of table information, see third level description)
  Third level: Table
    • table (Table name)
    • tablePattern (Expression to match table names, not used)
    • dataFilter (JSON structure, see dataFilter structure description below)
    • actions (List of actions, currently including INSERT, UPDATE, and DELETE)
    • columns (List of columns, see fourth level description)
  Fourth level: Column
    • column (Column name)

Greenplum
  First level: Database
    • db (Database name)
    • schemas (List of pg schemas, see second level description)
  Second level: Schema
    • schema (Schema name)
    • schemaPattern (Expression to match schema names, not used)
    • tables (List of table information, see third level description)
  Third level: Table
    • table (Table name)
    • tablePattern (Expression to match table names, not used)
    • dataFilter (JSON structure, see dataFilter structure description below)
    • actions (List of actions, currently including INSERT, UPDATE, and DELETE)
    • columns (List of columns, see fourth level description)
  Fourth level: Column
    • column (Column name)

Oracle
  First level: Database
    • db (Database name)
    • tableSpaces (List of oracle schemas, see second level description)
  Second level: Schema
    • tableSpace (Schema name)
    • tableSpacePattern (Expression to match schema names, not used)
    • tables (List of table information, see third level description)
  Third level: Table
    • table (Table name)
    • tablePattern (Expression to match table names, not used)
    • dataFilter (JSON structure, see dataFilter structure description below)
    • actions (List of actions, currently including INSERT, UPDATE, and DELETE)
    • columns (List of columns, see fourth level description)
  Fourth level: Column
    • column (Column name)

SQLServer
  First level: Not available
  Second level: Not available
  Third level: Not available
  Fourth level: Not available

Redis
  First level: Namespace for cache key
    • prefix (Prefix for cache key namespace)
    • suffixFields (List of properties for cache key namespace)
  Second level: None
  Third level: None
  Fourth level: None

ElasticSearch
  First level: Index
    • indexName (Name of the index)
    • idFieldNames (List of property names to construct the primary key of ES)
    • numberOfShards (Number of shards, type int)
    • numberOfReplicas (Number of replicas, type int)
    • globalTimeZone (Timezone string)
    • fields (List of column information, see second level description)
  Second level: Field
    • fieldName (Field name)
    • fieldTypeName (Type of field in ES)
    • needIndex (Whether the field needs to be indexed, type boolean)
    • timeFormat (Time format)
    • esAnalyzerType (Analyzer type, see esAnalyzerType description below)
    • needAutoCreated (Whether the field should be automatically created, type boolean)

AdbForMySQL
  First level: Database
    • db (Database name)
    • dbPattern (Expression to match database names, not used)
    • tables (List of table information, see second level description)
  Second level: Table
    • table (Table name)
    • tablePattern (Expression to match table names, not used)
    • dataFilter (JSON structure, see dataFilter structure description below)
    • actions (List of actions, currently including INSERT, UPDATE, and DELETE)
    • columns (List of columns, see third level description)
  Third level: Column
    • column (Column name)
  Fourth level: None

TiDB
  First level: Database
    • db (Database name)
    • dbPattern (Expression to match database names, not used)
    • tables (List of table information, see second level description)
  Second level: Table
    • table (Table name)
    • tablePattern (Expression to match table names, not used)
    • dataFilter (JSON structure, see dataFilter structure description below)
    • actions (List of actions, currently including INSERT, UPDATE, and DELETE)
    • columns (List of columns, see third level description)
  Third level: Column
    • column (Column name)
  Fourth level: None

ClickHouse
  First level: Database
    • db (Database name)
    • dbPattern (Expression to match database names, not used)
    • tables (List of table information, see second level description)
  Second level: Table
    • table (Table name)
    • tablePattern (Expression to match table names, not used)
    • dataFilter (JSON structure, see dataFilter structure description below)
    • actions (List of actions, currently including INSERT, UPDATE, and DELETE)
    • columns (List of columns, see third level description)
  Third level: Column
    • column (Column name)
  Fourth level: None

Kudu
  First level: Table
    • table (Table name)
    • tablePattern (Expression to match table names, not used)
    • dataFilter (JSON structure, see dataFilter structure description below)
    • actions (List of actions, currently including INSERT, UPDATE, and DELETE)
    • partitions (List of partitions, see second level description)
    • columns (List of columns, see second level description)
  Second level: Partition
    • columns (Columns of the partition)
    • partitionType (Partition type, possible values are Range and Hash)
  Second level: Column
    • column (Column name)
  Third level: None
  Fourth level: None

MongoDB
  First level: Database
    • db (Database name)
    • collections (List of collections, see second level description)
  Second level: Collection
    • collection (Collection name)
    • actions (List of actions, currently including INSERT, UPDATE, and DELETE)
  Third level: None
  Fourth level: None

Kafka
  First level: Topic
    • topic (Topic name)
    • topicPattern (Expression to match topic names, not used)
    • partitions (Number of partitions, type integer)
    • partitionKeys (List of partition keys)
  Second level: None
  Third level: None
  Fourth level: None

RocketMQ
  First level: Topic
    • topic (Topic name)
    • topicPattern (Expression to match topic names, not used)
    • partitions (Number of partitions, type integer)
    • partitionKeys (List of partition keys)
  Second level: None
  Third level: None
  Fourth level: None

RabbitMQ
  First level: Queue
    • queue (Queue name)
    • queuePattern (Expression to match queue names, not used)
  Second level: None
  Third level: None
  Fourth level: None

Hive
  First level: Database
    • db (Database name)
    • dbPattern (Expression to match database names, not used)
    • tables (List of table information, see second level description)
  Second level: Table
    • table (Table name)
    • tablePattern (Expression to match table names, not used)
    • partitionKeys (List of partition keys, see partitionKeys structure description below)
    • columns (List of columns, see third level description)
  Third level: Column
    • column (Column name)
  Fourth level: None
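
To make the difference in depth between data sources concrete, here is a minimal Python sketch that walks a nested structure using a per-data-source list of (name key, child list key) pairs taken from the hierarchy descriptions above. The `HIERARCHIES` table, the `leaf_paths` helper, and the sample values (`demo`, `public`, `orders`, `events`) are hypothetical names used only for illustration.

```python
# (name key, child list key) per level, following the hierarchy descriptions above.
# A child key of None marks the deepest level of that data source.
HIERARCHIES = {
    "MySQL": [("db", "tables"), ("table", "columns"), ("column", None)],
    "PostgreSQL": [("db", "schemas"), ("schema", "tables"),
                   ("table", "columns"), ("column", None)],
    "Oracle": [("db", "tableSpaces"), ("tableSpace", "tables"),
               ("table", "columns"), ("column", None)],
    "MongoDB": [("db", "collections"), ("collection", None)],
}


def leaf_paths(nodes, levels):
    """Yield the dotted path of every deepest element, e.g. db.schema.table.column."""
    name_key, child_key = levels[0]
    for node in nodes:
        name = node[name_key]
        children = node.get(child_key, []) if child_key else []
        if not children or len(levels) == 1:
            # Nothing below this element: it is a leaf.
            yield name
        else:
            for sub in leaf_paths(children, levels[1:]):
                yield f"{name}.{sub}"


# Four-level PostgreSQL-style sample: database, schema, table, column.
pg_schema = [
    {"db": "demo", "schemas": [
        {"schema": "public", "tables": [
            {"table": "orders", "columns": [{"column": "id"}, {"column": "amount"}]}
        ]}
    ]}
]

# Two-level MongoDB-style sample: database, collection.
mongo_schema = [
    {"db": "demo", "collections": [{"collection": "events", "actions": ["INSERT"]}]}
]

pg_paths = list(leaf_paths(pg_schema, HIERARCHIES["PostgreSQL"]))
mongo_paths = list(leaf_paths(mongo_schema, HIERARCHIES["MongoDB"]))
print(pg_paths)
print(mongo_paths)
```

Keeping the level definitions in data rather than in code means one walker can handle both the three-level and four-level data sources.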

dataFilter structure description

Property Name: Property Description

  • Type: Data filter type. Possible values:
      • SQL_WHERE
      • JAVA_CODE (not implemented)
      • REGULAR_EXPRESSION (not implemented)
      • AVIATOR_EXPRESSION (not implemented)
  • Expression: Data filter expression corresponding to the type
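
A minimal Python sketch of building such a filter is shown here. The JSON key names `type` and `expression` are an assumption (the table lists the properties only as Type and Expression), and the helper `make_data_filter`, the table name `orders`, and the condition `id > 100` are illustrative only.

```python
import json

# Filter types listed above; only SQL_WHERE is described as implemented.
FILTER_TYPES = {"SQL_WHERE", "JAVA_CODE", "REGULAR_EXPRESSION", "AVIATOR_EXPRESSION"}
IMPLEMENTED_TYPES = {"SQL_WHERE"}


def make_data_filter(filter_type, expression):
    """Build a dataFilter structure, rejecting unknown or unimplemented types."""
    if filter_type not in FILTER_TYPES:
        raise ValueError(f"unknown data filter type: {filter_type}")
    if filter_type not in IMPLEMENTED_TYPES:
        raise NotImplementedError(f"{filter_type} is listed as not implemented")
    # ASSUMPTION: the JSON keys are "type" and "expression".
    return {"type": filter_type, "expression": expression}


# Hypothetical table entry that keeps only rows matching the WHERE condition.
table_entry = {
    "table": "orders",
    "dataFilter": make_data_filter("SQL_WHERE", "id > 100"),
}
print(json.dumps(table_entry, indent=2))
```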

partitionKeys structure description

Property Name: Property Description

  • Key Name: Name of the partition key
  • OriginCol: Data source column
  • PartitionFunction: Partition method. Possible values:
      • EQUAL
      • YEAR_FORMAT
      • MONTH_FORMAT
      • DAY_FORMAT
      • HOUR_FORMAT
      • MINUTE_FORMAT
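
The following Python sketch shows one way a partition key entry could be assembled and checked against the partition methods listed above. The JSON key names (`keyName`, `originCol`, `partitionFunction`) are assumptions, and the helper `make_partition_key` and the sample column `created_at` are hypothetical.

```python
from enum import Enum


class PartitionFunction(Enum):
    """Partition methods listed in the partitionKeys description."""
    EQUAL = "EQUAL"
    YEAR_FORMAT = "YEAR_FORMAT"
    MONTH_FORMAT = "MONTH_FORMAT"
    DAY_FORMAT = "DAY_FORMAT"
    HOUR_FORMAT = "HOUR_FORMAT"
    MINUTE_FORMAT = "MINUTE_FORMAT"


def make_partition_key(key_name, origin_col, function):
    """Build one partition key entry; unknown functions raise ValueError."""
    fn = PartitionFunction(function)  # Enum lookup by value validates the name
    # ASSUMPTION: lower-camel-case JSON keys; the document lists only property names.
    return {"keyName": key_name, "originCol": origin_col, "partitionFunction": fn.value}


# Hypothetical example: partition key "dt" derived from the column "created_at".
partition_key = make_partition_key("dt", "created_at", "DAY_FORMAT")
print(partition_key)
```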

esAnalyzerType description

The CUSTOM_* values denote user-defined analyzers. In ElasticSearch, the analyzer must be named with the corresponding lowercase string, i.e. 'custom_a', 'custom_b', 'custom_c', 'custom_d', 'custom_e' (this naming scheme may be made more elegant later).

STANDARD
SIMPLE
WHITESPACE
STOP
KEYWORD
PATTERN
ENGLISH
FINGERPRINT

ALIWS

QQ_SMART
QQ_MAX
QQ_SMART_NER
QQ_MAX_NER

IK_SMART
IK_MAX_WORD
SMARTCN

CUSTOM_A
CUSTOM_B
CUSTOM_C
CUSTOM_D
CUSTOM_E
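
The naming rule for user-defined analyzers can be sketched in Python as follows. The helper `custom_analyzer_name` is hypothetical; it only illustrates the lowercase convention described above and does not call any ElasticSearch API.

```python
# User-defined analyzer types from the list above.
CUSTOM_ANALYZER_TYPES = ("CUSTOM_A", "CUSTOM_B", "CUSTOM_C", "CUSTOM_D", "CUSTOM_E")


def custom_analyzer_name(es_analyzer_type):
    """Return the analyzer name expected in ElasticSearch for a CUSTOM_* type."""
    if es_analyzer_type not in CUSTOM_ANALYZER_TYPES:
        raise ValueError(f"not a user-defined analyzer type: {es_analyzer_type}")
    # The analyzer is named with the corresponding lowercase string.
    return es_analyzer_type.lower()


names = [custom_analyzer_name(t) for t in CUSTOM_ANALYZER_TYPES]
print(names)
```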