Telaviv
Contributor II

Join between 4 tables

Hello everyone, I have a question. My source is DB2 and my target is Kafka. I do transformations at the Qlik Replicate level for each field from different tables, thereby simulating a join. As I understand it, this does not work efficiently: it generates a separate SELECT for each field every time. If I have 4 tables and want an output combining all 4, there will be about 40 columns in total going to the topic (through Avro). Does Qlik Replicate have a mechanism that lets me optimize the join so that these 40 columns are fetched with a single SELECT? As I understand it, the parallel load option is only available during a full load and for up to 10 fields, so this feature will not help me with optimization and performance (if my table has 5-10k entries added to it in a short time). Is there a pill from Qlik that will help me make an optimized join from several tables and send this entity to Kafka? Thanks.

6 Solutions

Accepted Solutions
Heinvandenheuvel
Specialist III

Replicate has a decent amount of flexibility, but it is NOT a programming language. You can bend it some, but really you should stick to the basic design: one table in, one table out.

Sure, you can use SOURCE_LOOKUP to select data from other tables, and you can cache some results to try to minimize the overhead, but in the end you have to be ready for a synchronous round trip to the source for each column requested, one... at... a... time...

The only cheat I can see is to 'glue' multiple columns together, perhaps as a CSV string, and peel them apart later on the receiving end, but that seems hardly practical.
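To make the cost concrete, here is roughly what a per-column lookup versus the 'glue' cheat looks like in the Transformations tab. The schema, table, and column names are invented for illustration, and the exact argument and placeholder syntax of SOURCE_LOOKUP should be verified against your Replicate version's User Guide:

```sql
-- One lookup expression = one synchronous SELECT against the source,
-- issued per column, per change record:
source_lookup('NO_CACHING', 'MYSCHEMA', 'ORDERS_EXT',
              'ORDER_STATUS', 'ORDER_ID=:1', $ORDER_ID)

-- The 'glue' cheat: fetch several columns in one round trip by
-- concatenating them in the lookup expression, then split the
-- string apart on the consuming side:
source_lookup('NO_CACHING', 'MYSCHEMA', 'ORDERS_EXT',
              'ORDER_STATUS || '','' || SHIP_DATE || '','' || CARRIER',
              'ORDER_ID=:1', $ORDER_ID)
```

The first form is what multiplies into dozens of round trips for a 40-column output; the second trades that for parsing work downstream, which is why it is a cheat rather than a recommendation.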

>> the parallel loading option is only available when there is a full load and up to 10 fields

I suspect you read it wrong. Parallel loading is for tables (partitions), not for fields, and the default is 5, but you can pick more. I just tried 99 and it was happily accepted.

>> if my table has 5-10k entries added to it in a short time

For full-load only you could/should consider a VIEW over the multiple source tables. But the 'short time' remark suggests CDC. The best option for that is to stick to the Replicate design starting point: replicate all base tables and do the join on the target. Of course that suggests a proper relational DB at the target end. I wouldn't know whether that is a viable suggestion with something like Kafka as a target, but then I never understood the desirability of Kafka and the likes. I draw the line at Snowflake 🙂 ;-).

hth,

Hein.

 


deepaksahirwar
Creator II

Dear @Telaviv ,

There are two options you can use to join multiple source tables in Qlik Replicate:

1. Use the SOURCE_LOOKUP data enrichment function. This function lets you look up values from another source table and add them to the current table. You can use it in the Transformations tab of the Table Settings dialog box. You can find more details and examples in the User Guide.
2. Use a VIEW in the source database. This option requires you to create a view in the source database that joins the tables and filters the rows as needed. You can then access the view in Qlik Replicate and replicate its records to the target. For this option, you need to use the generic ODBC driver to access the source database and set the task type to Full Load Only.
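As a sketch of option 2, the view on the DB2 side could look something like this. All schema, table, and column names here are invented for illustration; the point is that the database performs the 4-way join once, and Replicate full-loads the result like any ordinary wide table:

```sql
-- Denormalized view on the source: DB2 resolves the join,
-- so Replicate sees a single ~40-column rowset.
CREATE VIEW MYSCHEMA.V_ORDERS_DENORM AS
SELECT o.ORDER_ID, o.ORDER_DATE,          -- table 1
       c.CUST_NAME, c.CUST_REGION,        -- table 2
       i.ITEM_SKU,  i.ITEM_QTY,           -- table 3
       s.SHIP_DATE, s.CARRIER             -- table 4
FROM   MYSCHEMA.ORDERS     o
JOIN   MYSCHEMA.CUSTOMERS  c ON c.CUST_ID  = o.CUST_ID
JOIN   MYSCHEMA.ORDER_ITEM i ON i.ORDER_ID = o.ORDER_ID
JOIN   MYSCHEMA.SHIPMENTS  s ON s.ORDER_ID = o.ORDER_ID;
```

Unlike SOURCE_LOOKUP, this issues one SELECT per batch of rows rather than one per column, but as noted it is limited to Full Load Only tasks.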

I hope this helps you.😊

If our response has been helpful, please consider clicking "Accept as Solution". This will help other users find the answer easily.


Regards,
Deepak


Telaviv
Contributor II
Author

Thanks Deepak for your answer. Is Qlik Replicate not able to optimize the SELECT based on all the fields used in the output with SOURCE_LOOKUP? I mean, when I add one more transformation with an additional field from another table, the delay increases even though I am using the same table for the join. I guess Qlik cannot generate (optimize) a single SELECT covering 7 fields from the same table; it will issue a separate SELECT for each column involved in the join, even if it is the same table? Another question: please let me know whether I can use parallel load for the case I mentioned above (4 tables, 40 columns in total). Thanks.


john_wang
Support

Hello @Telaviv ,

No, it's hard to build such a complex configuration with Replicate SOURCE_LOOKUP. Neither the task design nor the runtime performance would be good.

If you want to join the 4 tables in the Full Load stage, then a VIEW (or a logical file on DB2i) is the best option: easy to maintain and simple in the task design.

If you intend to join the 4 tables in the CDC stage then, as @Heinvandenheuvel suggested, one-to-one is best. But it looks to me like you are seeking a solution for full load, is that right?

Best Regards,

John.



Telaviv
Contributor II
Author

Thanks for the reply. Currently I will use the full load once, and in the future my table will be replenished with a couple of thousand rows every minute. I had thought to create a view on the source DB2 side that would be refreshed each time from several tables, but Qlik Replicate is only able to do a full load with a view, correct? If you have any thoughts on what to do at the source level, I would be grateful. Does Qlik have a way of adding a custom script when working with joins, or another product designed for this kind of work? I understand that I can replicate the tables to topics and use Kafka Streams to do the join at the target level, but for now I would like to find a solution that implements this strategy at the source level. Thank you.


john_wang
Support

Hello @Telaviv ,

>> but Qlik rep is able to make a full load with View only, correct?

You are right. Qlik Replicate is a log-based CDC product: it gets change data from the database transaction log of each table.

A VIEW (in most relational database systems), however, does not have its data recorded in the transaction log (e.g. the journal on DB2i). So far Replicate has no way to support a VIEW in a CDC task; that is why a VIEW can only be added to a Full Load Only task.

>> I understand that I can replicate tables to topics and use Kafka streams to make a join on the target level, but now I would like to find a solution to implement this strategy on the level of source

Each table's change data is recorded in the transaction log independently. It is hard to 'join' them together within the replication tool during CDC replication, and that is the same for Qlik Replicate. Personally I do not see a solution for doing it on the source side.
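For completeness, the target-side alternative mentioned above (joining after replication, e.g. with ksqlDB on top of Kafka Streams) sits outside Replicate itself, but a sketch could look like this, assuming Replicate lands each base table in its own Avro topic. The topic, stream, and column names are invented, and the exact DDL should be checked against the ksqlDB documentation for your version:

```sql
-- Declare the replicated topics, then join them continuously on the target.
CREATE TABLE customers (cust_id INT PRIMARY KEY)
  WITH (KAFKA_TOPIC='db2.customers', VALUE_FORMAT='AVRO');

CREATE STREAM orders
  WITH (KAFKA_TOPIC='db2.orders', VALUE_FORMAT='AVRO');

-- Stream-table join: each order change is enriched with the
-- current customer row and written to a new denormalized topic.
CREATE STREAM orders_denorm AS
  SELECT o.order_id, o.order_date, c.cust_name, c.cust_region
  FROM orders o
  JOIN customers c ON o.cust_id = c.cust_id;
```

This keeps Replicate in its one-table-in, one-table-out comfort zone while still delivering a joined entity to consumers of the output topic.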

Hope this helps.

John.


