
How to Import Data into Hive

  1. Importing via an external table

  Create an external table in Hive, specifying an HDFS path in the table definition. When the data files are copied to that HDFS path, they become visible in the external table at the same time; no separate insert step is needed.

  For example:

  Create the file test.txt:

  $ cat test.txt

  1 hello

  2 world

  3 test

  4 case

  Fields are separated by '\t' (a tab character).
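
  If you are recreating this file by hand, note that many editors silently convert tabs to spaces, which breaks the delimiter. A minimal sketch of one way to generate the file reliably with printf (an illustration, not part of the original walkthrough):

  $ printf '1\thello\n2\tworld\n3\ttest\n4\tcase\n' > test.txt

  $ cat -A test.txt    # with GNU coreutils, each tab shows up as ^I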

  Start Hive:

  $ hive

  Create the external table:

  hive> CREATE EXTERNAL TABLE MYTEST(num INT, name STRING)

  > COMMENT 'this is a test'

  > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

  > STORED AS TEXTFILE

  > LOCATION '/data/test';

  OK

  Time taken: 0.714 seconds

  hive> show tables;

  OK

  mytest

  partition_test

  partition_test_input

  test

  Time taken: 0.07 seconds

  hive> desc mytest ;

  OK

  num int

  name string

  Time taken: 0.121 seconds
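
  As an optional sanity check (not part of the original transcript), DESCRIBE EXTENDED prints the table metadata, and the detailed table information it returns includes the HDFS location the external table points to (/data/test here):

  hive> DESCRIBE EXTENDED mytest;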

  Copy the data to HDFS:

  $ hadoop fs -put test.txt /data/test

  Query the Hive table:

  hive> select * from mytest;

  OK

  1 hello

  2 world

  3 test

  4 case

  Time taken: 0.375 seconds

  hive> select num from mytest;

  Total MapReduce jobs = 1

  Launching Job 1 out of 1

  ......

  Total MapReduce CPU Time Spent: 510 msec

  OK

  1

  2

  3

  4

  Time taken: 27.157 seconds

  This approach is commonly used when historical data already exists on HDFS and we want to run Hive operations over it. It avoids the overhead of copying the data.
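
  A related property of external tables (not shown above) is that dropping the table removes only the Hive metadata; the files under its LOCATION remain on HDFS, so the historical data is never deleted by Hive. A minimal sketch reusing the table and path from this example:

  hive> DROP TABLE mytest;

  $ hadoop fs -ls /data/test    # test.txt is still there after the drop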

  2. Importing from the local filesystem

  When the data is not on HDFS, it can be loaded into a Hive table directly from the local filesystem.

  The file /home/work/test.txt has the same content as above.

  Create the table:

  hive> CREATE TABLE MYTEST2(num INT, name STRING)

  > COMMENT 'this is a test2'

  > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

  > STORED AS TEXTFILE;

  OK

  Time taken: 0.077 seconds

  Load the data into the table:

  hive> LOAD DATA LOCAL INPATH '/home/work/test.txt' INTO TABLE MYTEST2;

  Copying data from file:/home/work/test.txt

  Copying file: file:/home/work/test.txt

  Loading data to table default.mytest2

  OK

  Time taken: 0.24 seconds

  Query the data:

  hive> select * from MYTEST2;

  OK

  1 hello

  2 world

  3 test

  4 case

  Time taken: 0.11 seconds

  The local data loaded this way can be a single file, a directory, or a glob pattern. Note that if it is a directory, it must not contain subdirectories; likewise, a glob pattern can only match files, not directories.
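
  For example, assuming a hypothetical local directory /home/work/data that contains several tab-delimited .txt files, either of the following would load all of them into MYTEST2:

  hive> LOAD DATA LOCAL INPATH '/home/work/data' INTO TABLE MYTEST2;

  hive> LOAD DATA LOCAL INPATH '/home/work/data/*.txt' INTO TABLE MYTEST2;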

  3. Importing from HDFS

  The test.txt file above has already been uploaded to /data/test, so the following commands load the data directly into a Hive table:

  hive> CREATE TABLE MYTEST3(num INT, name STRING)

  > COMMENT "this is a test3"

  > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

  > STORED AS TEXTFILE;

  OK

  Time taken: 4.735 seconds

  hive> LOAD DATA INPATH '/data/test/test.txt' INTO TABLE MYTEST3;

  Loading data to table default.mytest3

  OK

  Time taken: 0.337 seconds

  hive> select * from MYTEST3 ;

  OK

  1 hello

  2 world

  3 test

  4 case

  Time taken: 0.227 seconds
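
  Note that, unlike the LOCAL variant in step 2 (which copies the file), LOAD DATA INPATH on an HDFS path moves the file into the table's directory under the Hive warehouse (/user/hive/warehouse in this setup, as the step 4 output below also shows). After the load, /data/test no longer contains test.txt, so the external table from step 1 will return no rows. A quick way to confirm:

  $ hadoop fs -ls /data/test

  $ hadoop fs -ls /user/hive/warehouse/mytest3    # test.txt has been moved here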

  4. Importing data from another table:

  hive> CREATE EXTERNAL TABLE MYTEST4(num INT) ;

  OK

  Time taken: 0.091 seconds

  hive> FROM MYTEST3 test3

  > INSERT OVERWRITE TABLE MYTEST4

  > select test3.num where name='world';

  Total MapReduce jobs = 2

  Launching Job 1 out of 2

  Number of reduce tasks is set to 0 since there's no reduce operator

  Starting Job = job_201207230024_0002, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201207230024_0002

  Kill Command = /home/work/hadoop/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=localhost:9001 -kill job_201207230024_0002

  Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0

  2012-07-23 18:59:02,365 Stage-1 map = 0%, reduce = 0%

  2012-07-23 18:59:08,417 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec

  2012-07-23 18:59:09,435 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec

  2012-07-23 18:59:10,445 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec

  2012-07-23 18:59:11,455 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec

  2012-07-23 18:59:12,470 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec

  2012-07-23 18:59:13,489 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec

  2012-07-23 18:59:14,508 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 0.62 sec

  MapReduce Total cumulative CPU time: 620 msec

  Ended Job = job_201207230024_0002

  Ended Job = -174856900, job is filtered out (removed at runtime).

  Moving data to: hdfs://localhost:9000/tmp/hive-work/hive_2012-07-23_18-58-44_166_189728317691010041/-ext-10000

  Loading data to table default.mytest4

  Deleted hdfs://localhost:9000/user/hive/warehouse/mytest4

  Table default.mytest4 stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 2, raw_data_size: 0]

  1 Rows loaded to mytest4

  MapReduce Jobs Launched:

  Job 0: Map: 1 Accumulative CPU: 0.62 sec HDFS Read: 242 HDFS Write: 2 SUCESS

  Total MapReduce CPU Time Spent: 620 msec

  OK

  Time taken: 30.663 seconds

  hive> select * from mytest4;

  OK

  2

  Time taken: 0.103 seconds
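
  The FROM ... INSERT syntax can also write to several tables in a single pass over the source, so MYTEST3 is scanned only once. A minimal sketch, assuming a second, hypothetical target table MYTEST5 with the same single-column schema:

  hive> CREATE TABLE MYTEST5(num INT);

  hive> FROM MYTEST3 test3

  > INSERT OVERWRITE TABLE MYTEST4 SELECT test3.num WHERE test3.name = 'world'

  > INSERT OVERWRITE TABLE MYTEST5 SELECT test3.num WHERE test3.name = 'hello';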
