Qlik Replicate: Lookup and filter sourcetables

till_bentz · ‎2023-11-13

Hello,

I have the following problem when replicating tables from an Oracle source:

I now only want to load the rows from table A for which Def A has the value 50 or 60.

As a final outcome, only just under 50 million rows of table A are required.

My attempt to add Def A via source_lookup (source_lookup('NO_CACHING','x','y','b.ID_B FROM Table_A a, Table_B b WHERE a.ID_B = b.ID_B and a.ID_A=:1 --', 'z',$ID_A)) leads to a massive drop in performance (17000 rows/second -> 1000 rows/second).

Is there a better solution?

Thanks a lot!
Till

Dana_Baldwin · ‎2023-11-13

Hi @till_bentz

Source lookup is generally used during change processing, I think you're looking for a pass thru filter which uses a where clause on the select query so only the desired rows are captured during the full load. You can also set a filter to be used during change processing to ensure only the data you want comes through.

Please refer to these knowledge articles:

Filter for last 90 days of data in Qlik Replicate - Qlik Community - 1880761

Filter for last 90 days of data in Qlik Replicate ... - Qlik Community - 1980009

Thanks,

Dana

Heinvandenheuvel · ‎2023-11-13

>>> My attempt to add Def A via source_lookup (source_lookup('NO_CACHING','x','y','b.ID_B FROM Table_A a, Table_B b WHERE a.ID_B = b.ID_B and a.ID_A=:1 --', 'z',$ID_A)

I don't understand that. Why not use source_lookup(10,'<schema>','Table_B','DefA',$ID_B) in ('50','60')

>>> leads to a massive drop in performance (17000 rows/second -> 1000 rows/second).

Considering you asked for no-caching, meaning a synchronous round-trip per row, that 1000 rows per second is very good.

Why would you not ask for caching such that if one of the next rows has the same ID_B, it does not have to make the roundtrip. Of course depending on data distribution this might never happen, but when it does it helps a lot.

You might need two trips anyway , for the same values, if the IN expression does not do exactly what you need. If so, 1 second caching would double performance.

As per @Dana_Baldwin using source_lookup is best (only?) done for CDC work. For fullload you are better of with a passthru filter making the source NOT send 300M rows but just the 50M that satisfy.

Alternatively you could define a VIEW on source representing the desired filter and create a (dedicated) task to load from the view into the actual target table. Next start the real task (with source_lookup) with 'run advanced table already loaded.'

hth, Hein.

Performance