Sqooping tables and views from Netezza
I came across a use case to transfer Netezza tables/views to a Hadoop system. The current flow we are using is:
1. Netezza -> SAN
2. SAN -> S3
3. S3 -> HDFS
with the reverse path used to transfer data back into Netezza. After analyzing the use case, the best option I found was Sqoop. Note that if the Netezza table has no primary key, you will be forced to use the --split-by option or the -m option, and --verbose should be used only when needed. We are using YARN queues, hence the -Dmapreduce.job.queuename=q1 in the commands below; drop this option if no queue is set up.
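For context, the existing flow is a three-hop copy. A rough sketch of what each hop looks like, where every path, bucket name, and delimiter below is an assumption:

# 1. Netezza -> SAN: unload through a Netezza external table
$ nzsql -d <netezzadbname> -c "CREATE EXTERNAL TABLE '/san/export/<tablename>.dat' USING (DELIMITER '|') AS SELECT * FROM <netezza tablename>;"
# 2. SAN -> S3
$ aws s3 cp /san/export/<tablename>.dat s3://<bucket>/export/<tablename>.dat
# 3. S3 -> HDFS
$ hadoop distcp s3a://<bucket>/export/<tablename>.dat hdfs:///apps/staging/<tablename>.dat

Sqoop collapses all three hops into a single job that writes straight into HDFS.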
Transfer a view
Sqoop does not allow you to write into an existing directory, so remove the directory before transferring:
$ hdfs dfs -rm -R /apps/hive/warehouse/<hivedbname>.db/<hivetablename>
$ sqoop import -Dmapreduce.job.queuename=q1 \
    --hive-import \
    --hive-database <hivedbname> \
    --hive-table <hivetablename> \
    --driver org.netezza.Driver \
    --direct \
    --connect jdbc:netezza://<host>:<port>/<netezzadbname> \
    --username <netezzauser> \
    --password <netezzapwd> \
    --table <netezza tablename> \
    --target-dir hdfs:///apps/hive/warehouse/<hivedbname>.db/<hivetablename> \
    --split-by <anycolumn>
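The split column matters: Sqoop computes the column's MIN and MAX and divides that range evenly across mappers, so a badly skewed column leaves most of the work to one mapper. A quick sanity check from nzsql before picking one (a sketch, using the same <anycolumn> placeholder as above):

$ nzsql -d <netezzadbname> -c "SELECT MIN(<anycolumn>), MAX(<anycolumn>), COUNT(*) FROM <netezza tablename>;"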
If we don't use the --driver org.netezza.Driver parameter in the import above, the following error is encountered, because the direct Netezza connector partitions the work on the internal DATASLICEID column, which exists for base tables but not for views:
2017-07-18 09:34:53,079 ERROR [Thread-16] org.apache.sqoop.mapreduce.db.netezza.NetezzaJDBCStatementRunner: Unable to execute external table export
org.netezza.error.NzSQLException: ERROR: Column reference "DATASLICEID" not supported for views
at org.netezza.internal.QueryExecutor.getNextResult(QueryExecutor.java:276)
at org.netezza.internal.QueryExecutor.execute(QueryExecutor.java:73)
at org.netezza.sql.NzConnection.execute(NzConnection.java:2673)
at org.netezza.sql.NzStatement._execute(NzStatement.java:849)
at org.netezza.sql.NzPreparedStatament.execute(NzPreparedStatament.java:152)
at org.apache.sqoop.mapreduce.db.netezza.NetezzaJDBCStatementRunner.run(NetezzaJDBCStatementRunner.java:75)
Instead of the --split-by option we can also use -m 1, which transfers the data with a single mapper and can be a bit slow for large tables.
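Once the import finishes, a quick row-count comparison between Netezza and Hive confirms the transfer was complete; a minimal sketch, assuming beeline and nzsql are both available on the node:

$ beeline -u jdbc:hive2://<hiveserver>:10000 -e "SELECT COUNT(*) FROM <hivedbname>.<hivetablename>;"
$ nzsql -d <netezzadbname> -c "SELECT COUNT(*) FROM <netezza tablename>;"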
Transfer a table
Again, Sqoop does not allow you to write into an existing directory, so remove the directory before transferring:
$ hdfs dfs -rm -R /apps/hive/warehouse/<hivedbname>.db/<hivetablename>
$ sqoop import -Dmapreduce.job.queuename=q1 \
    --verbose \
    --hive-import \
    --hive-database <hivedbname> \
    --direct \
    --connect jdbc:netezza://<host>:<port>/<netezzadbname> \
    --username <netezzauser> \
    --password <netezzapwd> \
    --table <netezza tablename> \
    --target-dir hdfs:///apps/hive/warehouse/<hivedbname>.db/<hivetablename> \
    -m 1
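For the reverse direction mentioned at the start, sqoop export pushes the HDFS files back into Netezza. A minimal sketch, assuming the target table already exists in Netezza and the data uses Hive's default field delimiter (\001, which Sqoop sets for --hive-import when no delimiters are given); adjust --input-fields-terminated-by to match your table's actual format:

$ sqoop export -Dmapreduce.job.queuename=q1 \
    --connect jdbc:netezza://<host>:<port>/<netezzadbname> \
    --username <netezzauser> \
    --password <netezzapwd> \
    --table <netezza tablename> \
    --export-dir hdfs:///apps/hive/warehouse/<hivedbname>.db/<hivetablename> \
    --input-fields-terminated-by '\001' \
    -m 1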
Running

ANALYZE TABLE <hivedbname>.<hivetablename> COMPUTE STATISTICS;

after the import would be ideal for Hive running on the Tez execution engine, since its cost-based optimizer relies on table statistics.
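Column-level statistics help the optimizer further; a sketch from Beeline, where the JDBC URL is an assumption:

$ beeline -u jdbc:hive2://<hiveserver>:10000 \
    -e "ANALYZE TABLE <hivedbname>.<hivetablename> COMPUTE STATISTICS FOR COLUMNS;"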