Cluster environment: CDH 5.3.0
Component versions:
Spark: 1.2.0-cdh5.3.0
Hive: 0.13.1-cdh5.3.0
Hadoop: 2.5.0-cdh5.3.0
Starting the JDBC server
cd /etc/spark/conf
ln -s /etc/hive/conf/hive-site.xml hive-site.xml
cd /opt/cloudera/parcels/CDH/lib/spark/
chmod -R 777 logs/
cd /opt/cloudera/parcels/CDH/lib/spark/sbin
./start-thriftserver.sh --master yarn --hiveconf hive.server2.thrift.port=10008
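start-thriftserver.sh passes its options through to spark-submit, so resource settings can be supplied alongside --hiveconf. A sketch (the memory/executor values and bind host below are illustrative, not recommendations):

```shell
# Illustrative only: resource flags are forwarded to spark-submit on YARN.
./start-thriftserver.sh \
  --master yarn \
  --executor-memory 1g \
  --num-executors 2 \
  --hiveconf hive.server2.thrift.port=10008 \
  --hiveconf hive.server2.thrift.bind.host=hadoop04
```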
Connecting to the JDBC server with Beeline
(Connect on the port you passed via hive.server2.thrift.port; the transcript below uses the default port 10000.)
cd /opt/cloudera/parcels/CDH/lib/spark/bin
beeline -u jdbc:hive2://hadoop04:10000
[root@hadoop04 bin]# beeline -u jdbc:hive2://hadoop04:10000
scan complete in 2ms
Connecting to jdbc:hive2://hadoop04:10000
Connected to: Spark SQL (version 1.2.0)
Driver: Hive JDBC (version 0.13.1-cdh5.3.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 0.13.1-cdh5.3.0 by Apache Hive
0: jdbc:hive2://hadoop04:10000>
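Beeline can also run queries non-interactively, which is handy for scripting; -e and -f are standard Beeline options. The host/port and the script path /tmp/queries.sql below are illustrative:

```shell
# Run a single statement and exit (illustrative host/port):
beeline -u jdbc:hive2://hadoop04:10000 -e "SHOW TABLES;"

# Run a file of HiveQL statements (hypothetical path):
beeline -u jdbc:hive2://hadoop04:10000 -f /tmp/queries.sql
```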
Working with Beeline
Within the Beeline client, you can use standard HiveQL commands to create, list, and query tables. You can find the full details of HiveQL in the Hive Language Manual, but here we show a few common operations.
CREATE TABLE IF NOT EXISTS mytable (key INT, value STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

The examples that follow use a '#'-delimited table instead:

CREATE TABLE mytable (name STRING, addr STRING, status STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '#';
Load a local file:

LOAD DATA LOCAL INPATH '/external/tmp/data.txt' INTO TABLE mytable;

Load a file from HDFS:

LOAD DATA INPATH 'hdfs://ju51nn/external/tmp/data.txt' INTO TABLE mytable;
describe mytable;
explain select * from mytable where name = '張三';
select * from mytable where name = '張三';
cache table mytable;
select count(*) total, count(distinct addr) num1, count(distinct status) num2 from mytable where addr = 'gz';
uncache table mytable;
Sample data (data.txt):
張三#廣州#學生
李四#貴州#教師
王五#武漢#講師
趙六#成都#學生
lisa#廣州#學生
lily#gz#studene
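As a quick sanity check outside the cluster, the effect of the '#' field delimiter and the count/count(distinct) query above can be reproduced in plain Python. This is only an illustration; the column names mirror the table definition:

```python
# Sample rows exactly as in data.txt above.
rows = [
    "張三#廣州#學生",
    "李四#貴州#教師",
    "王五#武漢#講師",
    "趙六#成都#學生",
    "lisa#廣州#學生",
    "lily#gz#studene",
]

# ROW FORMAT DELIMITED FIELDS TERMINATED BY '#' splits each line on '#'.
records = [dict(zip(("name", "addr", "status"), line.split("#"))) for line in rows]

# select count(*), count(distinct addr), count(distinct status)
#   from mytable where addr = 'gz';
gz = [r for r in records if r["addr"] == "gz"]
total = len(gz)
num1 = len({r["addr"] for r in gz})
num2 = len({r["status"] for r in gz})
print(total, num1, num2)  # only the 'lily#gz#studene' row matches: 1 1 1
```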
Standalone Spark SQL Shell
Spark SQL also ships with a simple shell that runs as a single process: spark-sql.
It is mainly intended for local development; on a shared cluster, connect through the JDBC server instead.
cd /opt/cloudera/parcels/CDH/lib/spark/bin
./spark-sql
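Like Beeline, spark-sql can run statements non-interactively with -e. A sketch (the query and the shuffle-partition value are illustrative):

```shell
# Run a single query without entering the interactive shell.
./spark-sql -e "SELECT count(*) FROM mytable;"

# SET statements can be combined with the query, e.g. fewer shuffle
# partitions for small local tests (the value 4 is illustrative):
./spark-sql -e "SET spark.sql.shuffle.partitions=4; SELECT count(*) FROM mytable;"
```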