Create the data (tab-separated text)
Create the file:
touch test.txt
vim test.txt
Enter the tab-separated records:
1 zhangsan 20 Beijing
2 lisi 13 changsha
3 wangwu 31 chengdu
4 liuliu 14 kunming
5 mingming 21 guangzhou
6 huahua 51 xian
11 xiaoxia 26 tianjin
Create a new table
Define the table structure:
hive> CREATE TABLE test_hive (id int, name String, age int, addr String) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 1.629 seconds
Load test.txt into the test_hive table
Load the data:
hive> LOAD DATA LOCAL INPATH '/home/hadoop/test.txt' OVERWRITE INTO TABLE test_hive;
Loading data to table default.test_hive
OK
Time taken: 1.199 seconds
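LOCAL tells Hive to read the file from the local filesystem; without it, the path is resolved on HDFS and the file is moved into the table's directory. OVERWRITE replaces the table's current contents, while INTO appends. For illustration, assuming the file had instead been uploaded to a hypothetical HDFS path:
hive> LOAD DATA INPATH '/user/hadoop/test.txt' INTO TABLE test_hive;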
Working with the table
# List all tables
hive> show tables;
OK
test_hive
Time taken: 0.106 seconds, Fetched: 1 row(s)
# List tables matching a pattern
hive> show tables '*test*';
OK
test_hive
# Query the table
hive> select * from test_hive;
OK
1 zhangsan 20 Beijing
2 lisi 13 changsha
3 wangwu 31 chengdu
4 liuliu 14 kunming
5 mingming 21 guangzhou
6 huahua 51 xian
11 xiaoxia 26 tianjin
Time taken: 0.116 seconds, Fetched: 7 row(s)
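Note that this SELECT * returned in well under a second: it is served by a simple fetch of the underlying files. Queries with aggregations are compiled into MapReduce jobs instead (whether simple filters also run as local fetches depends on hive.fetch.task.conversion). A couple of sketch queries against the same table:
hive> select name, age from test_hive where age > 20;
hive> select addr, count(*) from test_hive group by addr;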
# Show the table structure
hive> desc test_hive;
OK
id int
name string
age int
addr string
Time taken: 0.068 seconds, Fetched: 4 row(s)
# Add a column
hive> ALTER TABLE test_hive ADD COLUMNS (mobile String);
OK
Time taken: 0.134 seconds
hive> select * from test_hive;
OK
1 zhangsan 20 Beijing NULL
2 lisi 13 changsha NULL
3 wangwu 31 chengdu NULL
4 liuliu 14 kunming NULL
5 mingming 21 guangzhou NULL
6 huahua 51 xian NULL
11 xiaoxia 26 tianjin NULL
Time taken: 0.117 seconds, Fetched: 7 row(s)
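Existing rows show NULL for the new column because the underlying files are unchanged. On a plain (non-ACID) table like this one UPDATE is not available, so one common way to fill the column is to rewrite the table with INSERT OVERWRITE; a sketch, using a made-up placeholder value for mobile:
hive> insert overwrite table test_hive
    > select id, name, age, addr, '13800000000' from test_hive;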
Modifying table contents (ACID support)
To support UPDATE and DELETE, some additional configuration is required.
These operations are quite expensive: each one launches a MapReduce job to apply the change.
A simple example follows.
For details, see the official wiki:
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-NewConfigurationParametersforTransactions
Edit hive-site.xml, then restart Hive:
hive.support.concurrency = true
hive.enforce.bucketing = true (no longer needed as of Hive 2.0)
hive.exec.dynamic.partition.mode = nonstrict
hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.compactor.initiator.on = true
hive.compactor.worker.threads = 1
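After the restart, each value can be verified from the Hive CLI: set with a parameter name and no assignment prints the parameter's current value. For example:
hive> set hive.txn.manager;
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager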
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
  <description>
    Whether Hive supports concurrency control or not.
    A ZooKeeper instance must be up and running when using zookeeper Hive lock manager
  </description>
</property>
<property>
  <name>hive.exec.dynamic.partition.mode</name>
  <value>nonstrict</value>
  <description>
    In strict mode, the user must specify at least one static partition
    in case the user accidentally overwrites all partitions.
    In nonstrict mode all partitions are allowed to be dynamic.
  </description>
</property>
<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
  <description>
    Set to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager as part of turning on Hive
    transactions, which also requires appropriate settings for hive.compactor.initiator.on,
    hive.compactor.worker.threads, hive.support.concurrency (true), hive.enforce.bucketing
    (true), and hive.exec.dynamic.partition.mode (nonstrict).
    The default DummyTxnManager replicates pre-Hive-0.13 behavior and provides
    no transactions.
  </description>
</property>
<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value>
  <description>
    Whether to run the initiator and cleaner threads on this metastore instance or not.
    Set this to true on one instance of the Thrift metastore service as part of turning
    on Hive transactions. For a complete list of parameters required for turning on
    transactions, see hive.txn.manager.
  </description>
</property>
<property>
  <name>hive.compactor.worker.threads</name>
  <value>1</value>
  <description>
    How many compactor worker threads to run on this metastore instance. Set this to a
    positive number on one or more instances of the Thrift metastore service as part of
    turning on Hive transactions. For a complete list of parameters required for turning
    on transactions, see hive.txn.manager.
    Worker threads spawn MapReduce jobs to do compactions. They do not do the compactions
    themselves. Increasing the number of worker threads will decrease the time it takes
    tables or partitions to be compacted once they are determined to need compaction.
    It will also increase the background load on the Hadoop cluster as more MapReduce jobs
    will be running in the background.
  </description>
</property>
hive> create table test_acid
    > (id int, name String, age int, addr String)
    > clustered by (id) into 2 buckets
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    > stored as orc TBLPROPERTIES ('transactional'='true');
OK
Time taken: 2.894 seconds
hive> show tables;
OK
test_acid
test_hive
Time taken: 0.22 seconds, Fetched: 2 row(s)
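Note the requirements baked into the DDL above: an ACID table must be bucketed (clustered by), stored as ORC, and marked 'transactional'='true'. Whether a table satisfies them can be checked with describe formatted, which lists the transactional flag under Table Parameters:
hive> describe formatted test_acid;
-- look for transactional = true in the Table Parameters section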
# Load data from test_hive into test_acid
hive> from test_hive
    > insert into table test_acid
    > select id,name,age,addr;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
Query ID = hadoop_20160527165406_6671a29e-3c04-4e41-8b20-8c1662830433
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 2
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1464339368801_0001, Tracking URL = http://localhost:8088/proxy/application_1464339368801_0001/
Kill Command = /opt/hadoop/hadoop-2.7.2/bin/hadoop job -kill job_1464339368801_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 2
2016-05-27 16:56:26,593 Stage-1 map = 0%, reduce = 0%
2016-05-27 16:56:35,189 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.53 sec
2016-05-27 16:56:47,356 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 3.85 sec
2016-05-27 16:56:48,402 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.32 sec
MapReduce Total cumulative CPU time: 6 seconds 320 msec
Ended Job = job_1464339368801_0001
Loading data to table default.test_acid
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 2 Cumulative CPU: 6.32 sec HDFS Read: 12460 HDFS Write: 1892 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 320 msec
OK
Time taken: 164.028 seconds
# Query the data
hive> select * from test_acid;
OK
6 huahua 51 xian
4 liuliu 14 kunming
2 lisi 13 changsha
11 xiaoxia 26 tianjin
5 mingming 21 guangzhou
3 wangwu 31 chengdu
1 zhangsan 20 Beijing
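The rows come back in no fixed order because the table's two buckets are read independently. An explicit ORDER BY restores a total order, at the cost of another MapReduce job:
hive> select * from test_acid order by id;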
# Update a row
hive> update test_acid set age=99 where id = 1;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
Query ID = hadoop_20160527165857_2a47933b-aede-40d4-87cc-281e04b60fe2
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 2
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1464339368801_0002, Tracking URL = http://localhost:8088/proxy/application_1464339368801_0002/
Kill Command = /opt/hadoop/hadoop-2.7.2/bin/hadoop job -kill job_1464339368801_0002
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 2
2016-05-27 16:59:07,223 Stage-1 map = 0%, reduce = 0%
2016-05-27 16:59:17,235 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 2.07 sec
2016-05-27 16:59:19,348 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.24 sec
2016-05-27 16:59:32,275 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 8.33 sec
MapReduce Total cumulative CPU time: 8 seconds 330 msec
Ended Job = job_1464339368801_0002
Loading data to table default.test_acid
MapReduce Jobs Launched:
Stage-Stage-1: Map: 2 Reduce: 2 Cumulative CPU: 8.33 sec HDFS Read: 23537 HDFS Write: 935 SUCCESS
Total MapReduce CPU Time Spent: 8 seconds 330 msec
OK
Time taken: 37.032 seconds
# Query the updated data
hive> select * from test_acid;
OK
6 huahua 51 xian
4 liuliu 14 kunming
2 lisi 13 changsha
11 xiaoxia 26 tianjin
5 mingming 21 guangzhou
3 wangwu 31 chengdu
1 zhangsan 99 Beijing
Time taken: 0.159 seconds, Fetched: 7 row(s)
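DELETE works the same way on an ACID table and is just as heavyweight; a minimal sketch:
hive> delete from test_acid where id = 11;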
As these timings show, Hive on Hadoop is quite inefficient for real-time updates: the single-row UPDATE above took 37 seconds.
Other operations
1) Create a table with the same structure as an existing table (LIKE):
This copies only the table structure, not the table contents.
create table test_like_table like test_bucket;
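(test_bucket is just an example source table name here.) Applied to the table created earlier, a sketch would be (the copy's name is hypothetical):
hive> create table test_hive_copy like test_hive;
hive> select count(*) from test_hive_copy;
-- returns 0: only the structure was copied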
2) Rename a table (RENAME TO):
ALTER TABLE table_name RENAME TO new_table_name
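For example (the new name is hypothetical):
hive> ALTER TABLE test_like_table RENAME TO test_renamed;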
3) Add partitions (ADD PARTITION):
ALTER TABLE table_name ADD partition_spec [LOCATION 'location1'] partition_spec [LOCATION 'location2'] ...
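This applies only to partitioned tables. A sketch against a hypothetical date-partitioned table named logs:
hive> ALTER TABLE logs ADD PARTITION (dt='2016-05-27') LOCATION '/data/logs/2016-05-27';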
4) Modify a column's name, data type, position, or comment:
ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST|AFTER column_name]
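For example, renaming test_hive's addr column and attaching a comment (the new name is made up); only the metadata changes, not the stored data:
hive> ALTER TABLE test_hive CHANGE COLUMN addr address String COMMENT 'home address';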
5) Add or replace columns (ADD/REPLACE COLUMNS):
ALTER TABLE table_name ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)
ADD COLUMNS appends new columns after the existing columns but before the partition columns.
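REPLACE COLUMNS drops all existing columns and installs the new list, again changing only the metadata, not the underlying files. As a sketch, it could undo the mobile column added earlier:
hive> ALTER TABLE test_hive REPLACE COLUMNS (id int, name String, age int, addr String);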
6) Full CREATE TABLE syntax:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]
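As a sketch combining several of the optional clauses, here is a hypothetical external, partitioned log table (all names and the path are made up):
CREATE EXTERNAL TABLE IF NOT EXISTS access_log (
  ip String COMMENT 'client address',
  url String
)
COMMENT 'raw access logs'
PARTITIONED BY (dt String)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/access_log';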