We have historically implemented Compose HIVE projects, which do not have the provisioning zone functionality. We also have issues with large HDS tables: the current and full views are not usable from our downstream tools.
Some options we are exploring: creating daily snapshots of the large tables in order to bypass the current/full views.
Version 9 has been touted to solve some of the performance issues, so another option is to wait for version 9.
1. Can one create a decoupled SPARK Compose job that only creates a Provisioning Zone, and keep the creation of the Storage Zones with the HIVE project? If this is technically possible, is it a good idea?
2. Is there a similar "exit" in the Compose HIVE step where one could hook in a "snapshot" insert statement into a new schema?
3. Wait for version 9
Any feedback would be appreciated.
Hi @Corne - what cluster/Hadoop platform are you on? And do you have the ability to enable HIVE ACID transactions?
1. You cannot take a Spark project and use a Hive-based storage layer. The projects are built to handle the end-to-end processing.
2. There is a Command Task which can run anything on the Compose server. This could be a script or a program/executable that connects to Hive and runs additional statements. The Command Task can be hooked to a Compose processing task via the Workflow feature.
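To make the Command Task idea a bit more concrete, here is a rough sketch of the kind of script it could call. This is only an illustration under several assumptions, not something Compose provides: the function names, the `snapshot_date` partition column, and the view/table names are all hypothetical, the target table is assumed to already exist (partitioned by `snapshot_date`, with the same columns as the view), and `beeline` is assumed to be available on the Compose server. It uses `INSERT OVERWRITE` into a partition, which does not require ACID tables.

```python
import datetime
import subprocess


def build_snapshot_sql(source_view, target_table, snapshot_date):
    """Build a HiveQL statement that copies the view into one date partition.

    Assumes target_table is partitioned by a string column named
    snapshot_date (a hypothetical name) and otherwise matches the view.
    """
    partition = snapshot_date.isoformat()
    return (
        f"INSERT OVERWRITE TABLE {target_table} "
        f"PARTITION (snapshot_date='{partition}') "
        f"SELECT * FROM {source_view}"
    )


def build_beeline_command(jdbc_url, sql):
    """Build the beeline call: -u is the JDBC URL, -e is the statement to run."""
    return ["beeline", "-u", jdbc_url, "-e", sql]


def run_snapshot(jdbc_url, source_view, target_table, snapshot_date=None):
    """Run one snapshot; raises CalledProcessError if beeline exits non-zero."""
    snapshot_date = snapshot_date or datetime.date.today()
    sql = build_snapshot_sql(source_view, target_table, snapshot_date)
    subprocess.run(build_beeline_command(jdbc_url, sql), check=True)
```

A wrapper script could then call something like `run_snapshot("jdbc:hive2://<host>:10000/default", "<hds_schema>.<current_view>", "<snapshot_schema>.<table>")` once per large table, and the Command Task would be placed after the Compose processing task in the workflow. Because the partition is overwritten, re-running the task on the same day replaces that day's snapshot rather than duplicating it.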
I'm not sure what "Version 9" is. 🙂
Thanks - we are using Cloudera Data Platform. In short - no, we will not be able to use HIVE ACID transactions for the databases/tables in question.
We will investigate your command task option, thanks.
Sorry, got my versions mixed up, I meant version 7. (Although I am sure version 9 will be even better 😉 )