(5). Hive comments use the same syntax as SQL comments: '--'.
(6). To display column headers in query output, set the hive.cli.print.header variable:
hive> set hive.cli.print.header=true;
(7). Viewing variables
hive> set;
…
hive> set -v;
… even more output!…
'set' prints all variables in the hivevar, hiveconf, system, and env namespaces.
'set -v' additionally prints all variables defined by Hadoop.
hive> set hivevar:foo=hello;
hive> set hivevar:foo;
hivevar:foo=hello
(8). Using variables:
hive> create table toss1(i int, ${hivevar:foo} string);
OK
Time taken: 0.652 seconds
hive> desc toss1;
OK
i int None
hello string None
Time taken: 0.055 seconds, Fetched: 2 row(s)
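The same substitution also works when the variable is supplied from the shell rather than with set. The following is a minimal sketch, assuming a hive client that accepts the --hivevar and -f options. The table name toss2 and the script path are hypothetical names chosen for illustration. The quoted heredoc matters: bash would otherwise try to expand ${hivevar:foo} itself before Hive ever sees it.

```shell
#!/usr/bin/env bash
# Hypothetical script location, chosen only for this illustration.
script="${TMPDIR:-/tmp}/toss2.hql"

# The quoted delimiter ('EOF') stops bash from expanding ${hivevar:foo};
# the text reaches Hive unchanged and Hive performs the substitution.
cat > "$script" <<'EOF'
create table toss2(i int, ${hivevar:foo} string);
desc toss2;
EOF

# Run the script only when a hive client is on the PATH.
if command -v hive >/dev/null 2>&1; then
    hive --hivevar foo=hello -f "$script"
else
    echo "hive not found; script written to $script"
fi
```

Keeping the statement in a file avoids a second layer of shell quoting around the ${...} reference.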
(9). Variables belong to different namespaces. The namespaces are:
namespace – access – description
hivevar – Read/Write – User-defined custom variables.
hiveconf – Read/Write – Hive-specific configuration properties.
system – Read/Write – Configuration properties defined by Java.
env – Read only – Environment variables defined by the shell environment (e.g., bash).
(10). Setting a variable in the hiveconf namespace
hiveconf holds Hive's configuration properties. This example uses hiveconf hive.cli.print.current.db=true, which turns on printing of the current working database name in the CLI prompt.
[hadoop@cloud011 ~]$ hive --hiveconf hive.cli.print.current.db=true
hive (default)> set hiveconf:hive.cli.print.current.db=false;
hive>
2. Hive batch mode
Hive can also be used to execute SQL statements directly.
hive> create table test(a string, b string) row format delimited fields terminated by ' ' stored as textfile;
OK
Time taken: 0.565 seconds
Construct some data and load it:
SHELL$ MAX=100   # number of rows to generate (any value)
SHELL$ for (( i = 0; i < MAX; i++ ))
do
    echo "a$i b$i" >> /home/hadoop/a1.txt
done
hive> load data local inpath '/home/hadoop/a1.txt' into table test;
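For a non-interactive run, the same steps can be driven entirely from the shell. This is a rough sketch rather than the original author's script. gen_rows is a hypothetical helper, the row count of 5 is arbitrary, and it assumes the test table already exists and that the hive client supports -S (silent) and -e (inline query).

```shell
#!/usr/bin/env bash
# gen_rows N: print N space-delimited rows "a<i> b<i>" (hypothetical helper).
gen_rows() {
    local n=$1 i
    for (( i = 0; i < n; i++ )); do
        echo "a$i b$i"
    done
}

# Hypothetical data file; 5 rows is an arbitrary example size.
data_file="${TMPDIR:-/tmp}/a1.txt"
gen_rows 5 > "$data_file"

# Load and count the rows only when a hive client is on the PATH.
if command -v hive >/dev/null 2>&1; then
    hive -S -e "load data local inpath '$data_file' into table test;
                select count(*) from test;"
else
    echo "hive not found; data written to $data_file"
fi
```

The rows are separated by a single space so that they match the ' ' field delimiter declared for the table.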
Hiveserver mode is the most commonly used way to run Hive: once hiveserver is started, applications can access Hive through JDBC and other drivers.
(HiveServer is an optional service that allows a remote client to submit requests to Hive, using a variety of programming languages, and retrieve results.)
SHELL$ hive --service hiveserver &
Or specify the port yourself:
SHELL$ hive --service hiveserver -p 50000 &
/usr/local/jdk1.7.0_51/bin/java -Xmx256m
-Djava.library.path=/home/hadoop/hadoop-2.3.0/lib/native
-Djava.net.preferIPv4Stack=true
-Dhadoop.log.dir=/home/hadoop/hadoop-2.3.0/logs
-Dhadoop.log.file=hadoop.log
-Dhadoop.home.dir=/home/hadoop/hadoop-2.3.0
-Dhadoop.id.str=hadoop
-Dhadoop.root.logger=INFO,console
-Dhadoop.policy.file=hadoop-policy.xml
-Djava.net.preferIPv4Stack=true -Xmx512m
-Dhadoop.security.logger=INFO,NullAppender
org.apache.hadoop.util.RunJar /home/hadoop/hive-0.12.0/lib/hive-service-0.12.0.jar org.apache.hadoop.hive.service.HiveServer -p 50000
5. hiveserver2
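HiveServer2 was added in Hive 0.11. It is an upgraded version of hiveserver that resolves problems such as daemon instability, concurrent requests (HiveServer cannot handle concurrent requests from more than one client), and session management. The hiveserver2-related parameters:
SHELL$ hive -S -e "set" | grep server2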
hive.server2.async.exec.shutdown.timeout=10
hive.server2.async.exec.threads=50
hive.server2.authentication=NONE
hive.server2.enable.doAs=true
hive.server2.table.type.mapping=CLASSIC
hive.server2.thrift.bind.host=localhost
hive.server2.thrift.http.max.worker.threads=500
hive.server2.thrift.http.min.worker.threads=5
hive.server2.thrift.http.path=cliservice
hive.server2.thrift.http.port=10001
hive.server2.thrift.max.worker.threads=500
hive.server2.thrift.min.worker.threads=5
hive.server2.thrift.port=10000
hive.server2.thrift.sasl.qop=auth
hive.server2.transport.mode=binary
From this we can see that the default port is 10000, the default bind host is localhost, and the maximum number of worker threads is 500.
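Individual settings can be pulled out of such a listing in a script. The sketch below is illustrative only. get_prop is a hypothetical helper that reads key=value lines on standard input, and the sample lines are copied from the hiveserver2 parameter listing rather than produced by a live hive client.

```shell
#!/usr/bin/env bash
# get_prop KEY: print the value of the first "KEY=value" line on stdin.
# Hypothetical helper; matches the key exactly, not as a prefix.
get_prop() {
    awk -F= -v key="$1" '$1 == key { print substr($0, length(key) + 2); exit }'
}

# Sample lines copied from the hiveserver2 parameter listing.
sample='hive.server2.thrift.bind.host=localhost
hive.server2.thrift.http.port=10001
hive.server2.thrift.max.worker.threads=500
hive.server2.thrift.port=10000'

host=$(printf '%s\n' "$sample" | get_prop hive.server2.thrift.bind.host)
port=$(printf '%s\n' "$sample" | get_prop hive.server2.thrift.port)
echo "hiveserver2 listens on $host:$port"
```

With a live client, the input would come from hive -S -e "set" instead of the sample variable.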
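Now start hiveserver2:
Neither hiveserver nor hiveserver2 runs as a background service, and the command line only offers a way to set hiveserver2 parameters, so the only way to stop a hiveserver2 running in the background is to kill it (Cloudera and HDP have already packaged hiveserver2 as a service).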
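Since stopping the server comes down to kill, one common pattern is to record the process ID at start-up. A minimal sketch follows, assuming a PID file is acceptable. start_bg, stop_bg, and the file paths are hypothetical names, and the hiveserver2 command line appears only in a comment so that the helpers can be tried with any long-running command.

```shell
#!/usr/bin/env bash
# start_bg PIDFILE CMD...: run CMD in the background and record its PID.
start_bg() {
    local pidfile=$1
    shift
    "$@" >/dev/null 2>&1 &
    echo "$!" > "$pidfile"
}

# stop_bg PIDFILE: send SIGTERM to the recorded PID and remove the file.
stop_bg() {
    local pidfile=$1 pid
    pid=$(cat "$pidfile") || return 1
    kill "$pid" 2>/dev/null
    wait "$pid" 2>/dev/null || true   # reap it if it is our own child
    rm -f "$pidfile"
}

# Intended use (not executed here; the PID file path is hypothetical):
#   start_bg /tmp/hiveserver2.pid hive --service hiveserver2
#   stop_bg  /tmp/hiveserver2.pid
```

Recording the PID avoids searching the process list for the Java command line each time the server has to be stopped.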
[hadoop@cloud011 ~]$ hive --service hiveserver2 -H
usage: hiveserver2
 -H,--help                        Print help information
    --hiveconf <property=value>   Use value for given property
hiveserver2 also provides a client-side command-line tool, beeline; the official documentation gives detailed usage instructions.
6. metastore