Consuming kafka-connect-jdbc messages (with schema) in PyFlink

1. Data ingestion

Create and start a connector for MySQL through the Kafka Connect REST API.

{
  "name": "mysql_stream_test",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "timestamp.column.name": "",
    "incrementing.column.name": "ID",
    "connection.password": "",
    "validate.non.null": true,
    "tasks.max": 1,
    "batch.max.rows": 100,
    "table.whitelist": "baseqx.test_demo",
    "mode": "incrementing",
    "topic.prefix": "mysql_",
    "connection.user": "",
    "poll.interval.ms": 5000,
    "numeric.mapping": "best_fit",
    "connection.url": "jdbc:mysql://xxx.xxx.xxx.xxx:3306/baseqx?useUnicode=true&characterEncoding=utf8&allowMultiQueries=true"
  }
}
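The connector definition can be submitted with any HTTP client. The following is a minimal sketch using only the Python standard library. It assumes the Kafka Connect worker is reachable at http://localhost:8083 (8083 is the default REST port; adjust for your deployment). The helper names build_connector_request and create_connector are made up for this illustration, and the configuration shown is abbreviated.

```python
import json
import urllib.request

# Assumed address of the Kafka Connect worker; 8083 is the default REST port.
CONNECT_URL = "http://localhost:8083"


def build_connector_request(name, config, base_url=CONNECT_URL):
    """Build a POST /connectors request carrying the connector definition."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_connector(name, config, base_url=CONNECT_URL):
    """Send the request and return the worker's JSON response as a dict."""
    request = build_connector_request(name, config, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Abbreviated configuration; the remaining keys come from the connector JSON.
example_config = {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "mode": "incrementing",
    "incrementing.column.name": "ID",
    "table.whitelist": "baseqx.test_demo",
    "topic.prefix": "mysql_",
}
```

Calling create_connector("mysql_stream_test", example_config) against a running worker sends the request. Building the request is kept separate from sending it so that the payload can be inspected without a network connection.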

2. The default format of the data in the topic created by kafka-connect is:

{"schema":{"type":"struct","fields":[{"type":"int32","optional":false,"field":"ID"},{"type":"string","optional":false,"field":"NAME"},{"type":"int64","optional":false,"name":"org.apache.kafka.connect.data.Timestamp","version":1,"field":"CREATE_TIME"}],"optional":false,"name":"test_demo"},"payload":{"ID":1,"NAME":"prestoEtl","CREATE_TIME":1606902182000}}
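To make the envelope concrete, here is a small sketch in plain Python (no Flink involved) that splits such a message into its schema and payload parts, derives a Flink SQL ROW type string for the payload from the schema fields, and converts CREATE_TIME to a readable timestamp. The Connect Timestamp logical type is encoded as milliseconds since the Unix epoch. The type mapping and the function names are assumptions made for illustration and cover only the types that appear in this sample.

```python
import json
from datetime import datetime, timezone

# The sample message from above, split across lines for readability.
MESSAGE = (
    '{"schema":{"type":"struct","fields":['
    '{"type":"int32","optional":false,"field":"ID"},'
    '{"type":"string","optional":false,"field":"NAME"},'
    '{"type":"int64","optional":false,'
    '"name":"org.apache.kafka.connect.data.Timestamp","version":1,"field":"CREATE_TIME"}],'
    '"optional":false,"name":"test_demo"},'
    '"payload":{"ID":1,"NAME":"prestoEtl","CREATE_TIME":1606902182000}}'
)

# Illustrative mapping from Connect types to Flink SQL types; only the
# types present in the sample message are covered.
CONNECT_TO_FLINK = {"int32": "INT", "int64": "BIGINT", "string": "STRING"}


def split_envelope(raw):
    """Return the (schema, payload) pair of a schema-bearing message."""
    envelope = json.loads(raw)
    return envelope["schema"], envelope["payload"]


def payload_row_type(schema):
    """Build a ROW<...> type string for the payload from the schema fields."""
    columns = [
        "{} {}".format(field["field"], CONNECT_TO_FLINK[field["type"]])
        for field in schema["fields"]
    ]
    return "ROW<{}>".format(", ".join(columns))


def millis_to_utc(millis):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


schema, payload = split_envelope(MESSAGE)
print(payload_row_type(schema))  # ROW<ID INT, NAME STRING, CREATE_TIME BIGINT>
print(millis_to_utc(payload["CREATE_TIME"]).isoformat())  # 2020-12-02T09:43:02+00:00
```

Note that CREATE_TIME arrives as a raw 64-bit integer in the JSON, which is why the sketch maps it to BIGINT rather than to a timestamp type.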

3. Consuming the schema-bearing messages with PyFlink

#!/usr/bin/python3.7
# -*- coding: UTF-8 -*-
from pyflink.datastream import StreamExecutionEnvironment, CheckpointingMode
from pyflink.table import StreamTableEnvironment, TableConfig, SqlDialect

# Create the streaming environment and run with a single parallel task.
s_env = StreamExecutionEnvironment.get_execution_environment()
s_env.set_parallelism(1)

# Create the table environment on top of the stream environment.
st_env = StreamTableEnvironment.create(s_env, TableConfig())
# Python interpreter used to run Python UDF workers.
st_env.get_config().set_python_executable("python3")

# Work in the default catalog and database.
st_env.use_catalog("default_catalog")
st_env.use_database("default_database")

# In the DML the schema can be fixed as a string; use the ROW function to wrap the payload