In my database there are 100+ tables, and I want to export all of them to Hive. Is there a simple way to do this? I really don't want to create 100+ "import/export" flows by hand.
Thanks.
Honestly, I would use Sqoop: query the database to list its tables, then iterate over the result, feeding each table name into a Sqoop "select * from <table>" import. I've done something similar to execute the same query against several hundred databases that shared a common schema. Write the Sqoop job as a separate child job, then run it from a parent job as:
tMSSQLInput > tFlowToIterate > tRunjob
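The tMSSQLInput query is just a metadata lookup. On SQL Server, something along these lines should do it (the schema filter is only an example; adjust it to your database):

-- List every user table; tweak the WHERE clause to skip staging/system schemas
SELECT TABLE_SCHEMA, TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE';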
That gets you the list of table names. tFlowToIterate can then populate a context variable with the current table name; you pass that variable to the Sqoop job through tRunjob. Keeping the jobs separate makes it easier to handle database connectivity/timeout issues, since it isolates each table load so that a single failure doesn't kill the entire run (you'll need to configure tRunjob not to die on error for this to work).
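For what it's worth, here's a rough sketch of what the child job ends up running; the connection details, password-file path, and the ${TABLE_NAME} variable are placeholders (in Talend you'd typically wire this up through tSqoopImport rather than a raw command line):

# Placeholder connection details -- substitute your own host, database, and credentials.
# ${TABLE_NAME} stands in for the context variable handed down by tRunjob.
sqoop import \
  --connect "jdbc:sqlserver://dbhost:1433;databaseName=mydb" \
  --username myuser \
  --password-file /user/me/sqoop.password \
  --table "${TABLE_NAME}" \
  --hive-import \
  --hive-table "${TABLE_NAME}" \
  --num-mappers 1

If you really need "select * from <table>" semantics (say, to trim columns), Sqoop's --query option covers that too, though it requires a --target-dir and a $CONDITIONS token in the WHERE clause.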
Hope this helps,
David