1. Download Waterdrop
Download link: https://github.com/InterestingLab/waterdrop/releases
Following the version guidance on the Waterdrop website, my local test environment uses Spark 2.4.8 with Waterdrop 1.5.1.
First unpack the archive (the release is a .zip file, so use unzip rather than tar -zxvf): unzip waterdrop-1.5.1.zip -d /opt/
Then edit the environment file and point SPARK_HOME at your Spark installation: vim /opt/waterdrop-1.5.1/config/waterdrop-env.sh
SPARK_HOME=/opt/spark-2.4.8-bin-hadoop2.7
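A wrong SPARK_HOME usually only shows up when the job is launched, so it can be worth checking the path first. The following is a minimal Python sketch and not part of Waterdrop: the helper name `has_spark_submit` is hypothetical, and the only assumption is that a Spark distribution ships an executable `bin/spark-submit`.

```python
import os


def has_spark_submit(spark_home: str) -> bool:
    """Return True if spark_home contains an executable bin/spark-submit."""
    candidate = os.path.join(spark_home, "bin", "spark-submit")
    return os.path.isfile(candidate) and os.access(candidate, os.X_OK)


# Example call, using the path from waterdrop-env.sh above:
#   has_spark_submit("/opt/spark-2.4.8-bin-hadoop2.7")
```

If the call returns False, fix the path in waterdrop-env.sh before moving on.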
2. Prepare the Kudu data
The Kudu table structure is shown in the figure below:
3. Create the ClickHouse table
CREATE TABLE test.user_info
(
`id` String,
`name` String,
`sex` String,
`city` String,
`occupation` String,
`tel` String,
`fixPhoneNum` String,
`bankName` String,
`address` String,
`marriage` String,
`childNum` String
)
ENGINE = MergeTree
ORDER BY id;
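4. Configuration file
Save the following as /opt/waterdrop-1.5.1/config/kudu2clickhouse.conf (the path passed to --config in step 5):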
spark {
  spark.app.name = "Waterdrop"
  spark.executor.instances = 1
  spark.executor.cores = 1
  spark.executor.memory = "1g"
}
input {
  kudu {
    kudu_master = "node04:7051"
    kudu_table = "user_info"
    result_table_name = "user_info"
  }
}
filter {
}
output {
  clickhouse {
    source_table_name = "user_info"
    host = "node04:8123"
    clickhouse.socket_timeout = 50000
    database = "test"
    table = "user_info"
    fields = ["id","name","sex","city","occupation","tel","fixPhoneNum","bankName","address","marriage","childNum"]
    username = ""
    password = ""
    bulk_size = 20000
  }
}
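The names in `fields` are meant to line up with the columns of the ClickHouse table from step 3, and a typo there would typically surface only once the job runs. The following Python sketch compares the two before launching. It is an illustration, not part of Waterdrop: the helper names are hypothetical, and the regex parsing only handles the simple one-column-per-line DDL and single-line `fields` layout used in this post.

```python
import re

# DDL and fields line copied from steps 3 and 4 of this post.
DDL = """
CREATE TABLE test.user_info
(
`id` String,
`name` String,
`sex` String,
`city` String,
`occupation` String,
`tel` String,
`fixPhoneNum` String,
`bankName` String,
`address` String,
`marriage` String,
`childNum` String
)
ENGINE = MergeTree
ORDER BY id;
"""

FIELDS_LINE = (
    'fields = ["id","name","sex","city","occupation","tel",'
    '"fixPhoneNum","bankName","address","marriage","childNum"]'
)


def ddl_columns(ddl: str) -> list:
    """Extract backtick-quoted column names, one per line, from a simple DDL."""
    return re.findall(r"^[ \t]*`([^`]+)`[ \t]+\w+", ddl, flags=re.MULTILINE)


def config_fields(line: str) -> list:
    """Extract the double-quoted names from a `fields = [...]` config line."""
    inside = line[line.index("[") + 1 : line.rindex("]")]
    return re.findall(r'"([^"]+)"', inside)


def missing_columns(fields: list, columns: list) -> list:
    """Return the configured fields that the table does not define."""
    known = set(columns)
    return [name for name in fields if name not in known]


# An empty list means every configured field exists in the table.
unknown = missing_columns(config_fields(FIELDS_LINE), ddl_columns(DDL))
```

Column names are compared case-sensitively here, which matters for names such as `fixPhoneNum`.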
5. Results
Start the job: /opt/waterdrop-1.5.1/bin/start-waterdrop.sh --master local[1] --deploy-mode client --config /opt/waterdrop-1.5.1/config/kudu2clickhouse.conf
Query the ClickHouse table: select * from test.user_info;
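For a quick check that does not print the whole table, a row count is often enough. The sketch below uses the ClickHouse HTTP interface on the same host and port as the output config (node04:8123). It assumes the default user with no password, matching the empty username and password above. The function names are hypothetical.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def count_query_url(host: str, database: str, table: str) -> str:
    """Build a ClickHouse HTTP URL that counts the rows of database.table."""
    query = f"SELECT count() FROM {database}.{table}"
    # quote_via=quote encodes spaces as %20 instead of '+'.
    return f"http://{host}/?{urlencode({'query': query}, quote_via=quote)}"


def fetch_row_count(host: str, database: str, table: str, timeout: float = 10.0) -> int:
    """Run the count query over HTTP and return the result as an integer."""
    with urlopen(count_query_url(host, database, table), timeout=timeout) as resp:
        return int(resp.read().decode().strip())


# Example call (needs network access to the ClickHouse server):
#   fetch_row_count("node04:8123", "test", "user_info")
```

Comparing this number with the row count of the Kudu source table gives a simple sanity check that the sync copied everything.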