hive (default)> insert into student values(1,"zhangsan");
4) If no errors are reported, the setup succeeded:
hive (default)> select * from student;
1 zhangsan
2.3.4 Summary
1) Problem: while running on Tez, the NodeManager detects that the process tree is using too much memory and kills the container:
Caused by: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1546781144082_0005 failed 2 times due to AM Container for appattempt_1546781144082_0005_000002 exited with exitCode: -103
For more detailed output, check application tracking page:http://hadoop103:8088/cluster/app/application_1546781144082_0005Then, click on links to logs of each attempt.
Diagnostics: Container [pid=11116,containerID=container_1546781144082_0005_02_000001] is running beyond virtual memory limits. Current usage: 216.3 MB of 1 GB physical memory used; 2.6 GB of 2.1 GB virtual memory used. Killing container.
[Excerpt] The NodeManager is killing your container. It sounds like you are trying to use hadoop streaming which is running as a child process of the map-reduce task. The NodeManager monitors the entire process tree of the task and if it eats up more memory than the maximum set in mapreduce.map.memory.mb or mapreduce.reduce.memory.mb respectively, we would expect the Nodemanager to kill the task, otherwise your task is stealing memory belonging to other containers, which you don't want.
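A common remedy, not shown in the original text, is either to raise the container memory limits (mapreduce.map.memory.mb / mapreduce.reduce.memory.mb, or tez.am.resource.memory.mb for the Tez AM) or to relax YARN's virtual-memory check. Below is a sketch of the yarn-site.xml change; the property names are standard YARN settings, but treat the values as illustrative and tune them for your cluster:

<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
    <!-- Stop the NodeManager from killing containers on virtual-memory usage -->
</property>
<property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>4</value>
    <!-- Or keep the check but allow more virtual memory per unit of physical memory
         (the default ratio of 2.1 is what produced the "2.1 GB virtual memory" limit above) -->
</property>

Restart YARN (at least the NodeManagers) after editing the file so the new limits take effect.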
load data local inpath '/opt/module/data/bigtable.lzo' into table bigtable;
(3) Test (before building the index) and observe the number of map tasks (1):
select id,count(*) from bigtable group by id limit 10;
(4) Build the index:
hadoop jar /opt/module/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20.jar com.hadoop.compression.lzo.DistributedLzoIndexer /user/hive/warehouse/bigtable
(5) Test (after building the index) and observe the number of map tasks (2):
select id,count(*) from bigtable group by id limit 10;
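The map count changes because a bare .lzo file is not splittable, so Hive has to read it with a single mapper; once DistributedLzoIndexer has written a bigtable.lzo.index file next to it, the file can be split and read by multiple mappers in parallel. You can confirm the index exists with a quick listing (assuming the warehouse path used above):

hadoop fs -ls /user/hive/warehouse/bigtable
# expect both bigtable.lzo and bigtable.lzo.index in the listing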
load data inpath '/origin_data/gmall/log/topic_start/2020-10-14' into table gmall.ods_start_log partition(dt='2020-10-14');
Note: all dates are configured in the YYYY-MM-DD format, which is the date format Hive supports by default.
3) Verify the load succeeded:
hive (gmall)> select * from ods_start_log limit 2;
4) Create an index for the LZO-compressed files:
hadoop jar /opt/module/hadoop/share/hadoop/common/hadoop-lzo-0.4.20.jar com.hadoop.compression.lzo.DistributedLzoIndexer /warehouse/gmall/ods/ods_start_log/dt=2020-10-14
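For this load-then-index flow to work, ods_start_log must have been declared with the LZO input format. The original DDL is not shown in this excerpt; a typical definition looks like the sketch below (the single `line` string column is an assumption; the location matches the indexer path above):

CREATE EXTERNAL TABLE ods_start_log (`line` string)
PARTITIONED BY (`dt` string)
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_start_log';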
load data inpath '/origin_data/gmall/log/topic_event/2020-10-14' into table gmall.ods_event_log partition(dt='2020-10-14');
Note: all dates are configured in the YYYY-MM-DD format, which is the date format Hive supports by default.
3) Verify the load succeeded:
hive (gmall)> select * from ods_event_log limit 2;
4) Create an index for the LZO-compressed files:
hadoop jar /opt/module/hadoop/share/hadoop/common/hadoop-lzo-0.4.20.jar com.hadoop.compression.lzo.DistributedLzoIndexer /warehouse/gmall/ods/ods_event_log/dt=2020-10-14
load data inpath '/origin_data/gmall/log/topic_start/$do_date' into table "$APP".ods_start_log partition(dt='$do_date');
load data inpath '/origin_data/gmall/log/topic_event/$do_date' into table "$APP".ods_event_log partition(dt='$do_date');
"
$hive -e "$sql"
$hadoop jar /opt/module/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20.jar com.hadoop.compression.lzo.DistributedLzoIndexer /warehouse/gmall/ods/ods_start_log/dt=$do_date
$hadoop jar /opt/module/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20.jar com.hadoop.compression.lzo.DistributedLzoIndexer /warehouse/gmall/ods/ods_event_log/dt=$do_date
Note 1:
[ -n str ] tests whether a value is empty:
-- returns true if the value is non-empty
-- returns false if the value is empty
Note 2:
For usage of the date command, run [kgg@hadoop102 ~]$ date --help
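The fragment above shows only the tail of ods_log.sh. Here is a minimal sketch of the whole script, reconstructed around the lines shown (the APP, hive, and hadoop variable definitions and the default-to-yesterday logic are assumptions based on Note 1 and Note 2):

#!/bin/bash
APP=gmall
hive=/opt/module/hive/bin/hive
hadoop=/opt/module/hadoop-2.7.2/bin/hadoop

# Use the first argument as the business date if given; otherwise default to yesterday
if [ -n "$1" ]; then
    do_date=$1
else
    do_date=`date -d "-1 day" +%F`
fi

sql="
load data inpath '/origin_data/gmall/log/topic_start/$do_date' into table "$APP".ods_start_log partition(dt='$do_date');
load data inpath '/origin_data/gmall/log/topic_event/$do_date' into table "$APP".ods_event_log partition(dt='$do_date');
"
$hive -e "$sql"

$hadoop jar /opt/module/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20.jar com.hadoop.compression.lzo.DistributedLzoIndexer /warehouse/gmall/ods/ods_start_log/dt=$do_date
$hadoop jar /opt/module/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20.jar com.hadoop.compression.lzo.DistributedLzoIndexer /warehouse/gmall/ods/ods_event_log/dt=$do_date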
2) Grant the script execute permission:
[kgg@hadoop102 bin]$ chmod 777 ods_log.sh
3) Run the script:
[kgg@hadoop102 module]$ ods_log.sh 2019-02-11
4) Check the imported data:
hive (gmall)>
select * from ods_start_log where dt='2019-02-11' limit 2;
select * from ods_event_log where dt='2019-02-11' limit 2;