Cluster environment: CDH 5.3.0
Component versions:
Spark: 1.2.0-cdh5.3.0
Hive: 0.13.1-cdh5.3.0
Hadoop: 2.5.0-cdh5.3.0
Starting the JDBC server
cd /etc/spark/conf
ln -s /etc/hive/conf/hive-site.xml hive-site.xml
cd /opt/cloudera/parcels/CDH/lib/spark/
chmod -R 777 logs/
cd /opt/cloudera/parcels/CDH/lib/spark/sbin
./start-thriftserver.sh --master yarn --hiveconf hive.server2.thrift.port=10008
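start-thriftserver.sh passes its options through to spark-submit, so resource settings can be supplied alongside --hiveconf. A sketch (the memory/executor values and bind host below are illustrative, not recommendations):

```shell
# Illustrative only: resource flags are forwarded to spark-submit on YARN.
./start-thriftserver.sh \
  --master yarn \
  --executor-memory 1g \
  --num-executors 2 \
  --hiveconf hive.server2.thrift.port=10008 \
  --hiveconf hive.server2.thrift.bind.host=hadoop04
```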
Connecting to the JDBC server with Beeline
(Connect on the port you passed via hive.server2.thrift.port; the transcript below uses the default port 10000.)
cd /opt/cloudera/parcels/CDH/lib/spark/bin
beeline -u jdbc:hive2://hadoop04:10000
[root@hadoop04 bin]# beeline -u jdbc:hive2://hadoop04:10000
scan complete in 2ms
Connecting to jdbc:hive2://hadoop04:10000
Connected to: Spark SQL (version 1.2.0)
Driver: Hive JDBC (version 0.13.1-cdh5.3.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 0.13.1-cdh5.3.0 by Apache Hive
0: jdbc:hive2://hadoop04:10000>
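Beeline can also run queries non-interactively, which is handy for scripting; -e and -f are standard Beeline options. The host/port and the script path /tmp/queries.sql below are illustrative:

```shell
# Run a single statement and exit (illustrative host/port):
beeline -u jdbc:hive2://hadoop04:10000 -e "SHOW TABLES;"

# Run a file of HiveQL statements (hypothetical path):
beeline -u jdbc:hive2://hadoop04:10000 -f /tmp/queries.sql
```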
Working with Beeline
Within the Beeline client, you can use standard HiveQL commands to create, list, and query tables. You can find the full details of HiveQL in the Hive Language Manual, but here we show a few common operations.
CREATE TABLE IF NOT EXISTS mytable (key INT, value STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

The examples that follow use a '#'-delimited table instead:

CREATE TABLE mytable (name STRING, addr STRING, status STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '#';
Load a local file:

LOAD DATA LOCAL INPATH '/external/tmp/data.txt' INTO TABLE mytable;

Load a file from HDFS:

LOAD DATA INPATH 'hdfs://ju51nn/external/tmp/data.txt' INTO TABLE mytable;
describe mytable;
explain select * from mytable where name = '張三';
select * from mytable where name = '張三';
cache table mytable;
select count(*) total, count(distinct addr) num1, count(distinct status) num2 from mytable where addr = 'gz';
uncache table mytable;
Sample data (data.txt):
張三#廣州#學生
李四#貴州#教師
王五#武漢#講師
趙六#成都#學生
lisa#廣州#學生
lily#gz#studene
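As a quick sanity check outside the cluster, the effect of the '#' field delimiter and the count/count(distinct) query above can be reproduced in plain Python. This is only an illustration; the column names mirror the table definition:

```python
# Sample rows exactly as in data.txt above.
rows = [
    "張三#廣州#學生",
    "李四#貴州#教師",
    "王五#武漢#講師",
    "趙六#成都#學生",
    "lisa#廣州#學生",
    "lily#gz#studene",
]

# ROW FORMAT DELIMITED FIELDS TERMINATED BY '#' splits each line on '#'.
records = [dict(zip(("name", "addr", "status"), line.split("#"))) for line in rows]

# select count(*), count(distinct addr), count(distinct status)
#   from mytable where addr = 'gz';
gz = [r for r in records if r["addr"] == "gz"]
total = len(gz)
num1 = len({r["addr"] for r in gz})
num2 = len({r["status"] for r in gz})
print(total, num1, num2)  # only the 'lily#gz#studene' row matches: 1 1 1
```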
Standalone Spark SQL Shell
Spark SQL also ships with a simple shell that runs as a single process: spark-sql.
It is mainly intended for local development; on a shared cluster, connect through the JDBC server instead.
cd /opt/cloudera/parcels/CDH/lib/spark/bin
./spark-sql
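Like Beeline, spark-sql can run statements non-interactively with -e. A sketch (the query and the shuffle-partition value are illustrative):

```shell
# Run a single query without entering the interactive shell.
./spark-sql -e "SELECT count(*) FROM mytable;"

# SET statements can be combined with the query, e.g. fewer shuffle
# partitions for small local tests (the value 4 is illustrative):
./spark-sql -e "SET spark.sql.shuffle.partitions=4; SELECT count(*) FROM mytable;"
```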