Sqooping tables and views from Netezza
I came across a use case to transfer Netezza tables/views to a Hadoop system. The current flow we are using is:
1. Netezza -> SAN
2. SAN -> S3
3. S3 -> HDFS
with the reverse path used to transfer data back into Netezza. After analyzing the use case, the best option I found was Sqoop. Note that if the Netezza table has no primary key, you will be forced to use the --split-by option or the -m option, and --verbose should be used only when needed. We are using YARN queues, hence the -Dmapreduce.job.queuename=q1 in the commands below; drop this option if no queue is set up.
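For context, the existing flow is a three-hop copy. A rough sketch of what each hop looks like, where every path, bucket name, and delimiter below is an assumption:

# 1. Netezza -> SAN: unload through a Netezza external table
$ nzsql -d <netezzadbname> -c "CREATE EXTERNAL TABLE '/san/export/<tablename>.dat' USING (DELIMITER '|') AS SELECT * FROM <netezza tablename>;"
# 2. SAN -> S3
$ aws s3 cp /san/export/<tablename>.dat s3://<bucket>/export/<tablename>.dat
# 3. S3 -> HDFS
$ hadoop distcp s3a://<bucket>/export/<tablename>.dat hdfs:///apps/staging/<tablename>.dat

Sqoop collapses all three hops into a single job that writes straight into HDFS.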
Transfer a view
Sqoop does not allow you to write into an existing directory, so remove the directory before transferring:
$ hdfs dfs -rm -R /apps/hive/warehouse/<hivedbname>.db/<hivetablename>
$ sqoop import -Dmapreduce.job.queuename=q1 \
    --hive-import \
    --hive-database <hivedbname> \
    --hive-table <hivetablename> \
    --driver org.netezza.Driver \
    --direct \
    --connect jdbc:netezza://<host>:<port>/<netezzadbname> \
    --username <netezzauser> \
    --password <netezzapwd> \
    --table <netezza tablename> \
    --target-dir hdfs:///apps/hive/warehouse/<hivedbname>.db/<hivetablename> \
    --split-by <anycolumn>
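The split column matters: Sqoop computes the column's MIN and MAX and divides that range evenly across mappers, so a badly skewed column leaves most of the work to one mapper. A quick sanity check from nzsql before picking one (a sketch, using the same <anycolumn> placeholder as above):

$ nzsql -d <netezzadbname> -c "SELECT MIN(<anycolumn>), MAX(<anycolumn>), COUNT(*) FROM <netezza tablename>;"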
If we don't use the --driver org.netezza.Driver parameter in the import above, the following error is encountered, because the direct Netezza connector partitions the work on the internal DATASLICEID column, which exists for base tables but not for views:
2017-07-18 09:34:53,079 ERROR [Thread-16] org.apache.sqoop.mapreduce.db.netezza.NetezzaJDBCStatementRunner: Unable to execute external table export
org.netezza.error.NzSQLException: ERROR: Column reference "DATASLICEID" not supported for views
at org.netezza.internal.QueryExecutor.getNextResult(QueryExecutor.java:276)
at org.netezza.internal.QueryExecutor.execute(QueryExecutor.java:73)
at org.netezza.sql.NzConnection.execute(NzConnection.java:2673)
at org.netezza.sql.NzStatement._execute(NzStatement.java:849)
at org.netezza.sql.NzPreparedStatament.execute(NzPreparedStatament.java:152)
at org.apache.sqoop.mapreduce.db.netezza.NetezzaJDBCStatementRunner.run(NetezzaJDBCStatementRunner.java:75)
Instead of the --split-by option we can also use -m 1, which transfers the data with a single mapper and can be a bit slow for large tables.
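Once the import finishes, a quick row-count comparison between Netezza and Hive confirms the transfer was complete; a minimal sketch, assuming beeline and nzsql are both available on the node:

$ beeline -u jdbc:hive2://<hiveserver>:10000 -e "SELECT COUNT(*) FROM <hivedbname>.<hivetablename>;"
$ nzsql -d <netezzadbname> -c "SELECT COUNT(*) FROM <netezza tablename>;"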
Transfer a table
Again, Sqoop does not allow you to write into an existing directory, so remove the directory before transferring:
$ hdfs dfs -rm -R /apps/hive/warehouse/<hivedbname>.db/<hivetablename>
$ sqoop import -Dmapreduce.job.queuename=q1 \
    --verbose \
    --hive-import \
    --hive-database <hivedbname> \
    --direct \
    --connect jdbc:netezza://<host>:<port>/<netezzadbname> \
    --username <netezzauser> \
    --password <netezzapwd> \
    --table <netezza tablename> \
    --target-dir hdfs:///apps/hive/warehouse/<hivedbname>.db/<hivetablename> \
    -m 1
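For the reverse direction mentioned at the start, sqoop export pushes the HDFS files back into Netezza. A minimal sketch, assuming the target table already exists in Netezza and the data uses Hive's default field delimiter (\001, which Sqoop sets for --hive-import when no delimiters are given); adjust --input-fields-terminated-by to match your table's actual format:

$ sqoop export -Dmapreduce.job.queuename=q1 \
    --connect jdbc:netezza://<host>:<port>/<netezzadbname> \
    --username <netezzauser> \
    --password <netezzapwd> \
    --table <netezza tablename> \
    --export-dir hdfs:///apps/hive/warehouse/<hivedbname>.db/<hivetablename> \
    --input-fields-terminated-by '\001' \
    -m 1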
Running

ANALYZE TABLE <hivedbname>.<hivetablename> COMPUTE STATISTICS;

after the import would be ideal for Hive running on the Tez execution engine, since its cost-based optimizer relies on table statistics.
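Column-level statistics help the optimizer further; a sketch from Beeline, where the JDBC URL is an assumption:

$ beeline -u jdbc:hive2://<hiveserver>:10000 \
    -e "ANALYZE TABLE <hivedbname>.<hivetablename> COMPUTE STATISTICS FOR COLUMNS;"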