top of page
Writer's picturezealous-tech

Scoop to Hive with Multilevel Partition table


 

Hey Guys,


I came across issue while running Sqoop import to a partitioned table, and found workaround for same, sharing my two cents..


Let’s begin…..

Create hive partitioned table at same time, import data:

 sqoop import --create-hive-table \
--connect jdbc:oracle:thin:@localhost:1521/hr.com  \
--username xx \
--password yyy  \
--table t_test_emp \
--hive-import \
--columns EMPNO,ENAME,JOB \
--hive-partition-key DEPTNO \
--split-by DEPTNO \
--hive-table hr_hive.emps -m1

No issues, no the big deal right? Let’s try this…


sqoop import --create-hive-table \
--connect jdbc:oracle:thin:@localhost:1521/hr.com \
--username xxx \
--password yyy \
--table t_test_emp2 \
--hive-import \
--columns EMPNO,ENAME,JOB \
--hive-partition-key DEPTNO,JOB \
--split-by DEPTNO \
--hive-table hr_hive.emps -m1

This has failed, what’s the difference? Here if you observe we have two level of partition DEPTNO,JOB

And previous was single level partition.

There isn’t a straight way to achieve this. Here comes the HCATLOG to rescue:

Hcatlog in brief:

HCatalog is a table storage management tool for Hadoop that exposes the tabular data of Hive metastore to other Hadoop applications.In short we can say hive is built over hcat.

With help of hcat we can do it in two steps:



For 4th Step we don’t need to do anything, as table will be automatically reflecting in hive.


Run the scripts:

a) create table (multilevel partitioned table)



sqoop import \
--connect jdbc:oracle:thin:@localhost:1521/hr.com \
--username xxx \
--password yyy \
--query 'select ENAME,JOB,MGR,HIREDATE,SAL,COMM from t_test_emp where $CONDITIONS AND 1=2' \
--create-hcatalog-table \
--hcatalog-database hr_hive \
--hcatalog-table test_part3 \
--hcatalog-partition-keys DEPTNO,EMPNO \
--hcatalog-partition-values 1,1 -m 1

b)import data to previously created table i.e. test_part3


sqoop \
--connect jdbc:oracle:thin:@localhost:1521/hr.com \
--username xxx \
--password yyy \
--table t_test_emp \
--columns EMPNO,DEPTNO,ENAME,JOB \
--hcatalog-database dev_hce_cs \
--hcatalog-table test_part3 \
--split-by DEPTNO

Hope you find this helpful,

Questions/Issues? Please comment below

27 views0 comments

Comentarios


bottom of page