

Qlik Replicate: Lookup and filter source tables
Hello,
I have the following problem when replicating tables from an Oracle source:
Table A: 300 million rows
| ID_A | Value 1 | Value 2 | ID_B |
Table B: 100,000 rows
| ID_B | Value X | Value Y | Def A |
I now only want to load the rows from table A for which Def A has the value 50 or 60.
In the end, just under 50 million rows of table A are needed.
My attempt to add Def A via source_lookup, using source_lookup('NO_CACHING','x','y','b.ID_B FROM Table_A a, Table_B b WHERE a.ID_B = b.ID_B and a.ID_A=:1 --', 'z',$ID_A), leads to a massive drop in performance (17,000 rows/second -> 1,000 rows/second).
Is there a better solution?
Thanks a lot!
Till

Hi @till_bentz
Source lookup is generally used during change processing. I think you're looking for a passthru filter, which adds a WHERE clause to the SELECT query so that only the desired rows are captured during the full load. You can also set a filter to be used during change processing to ensure only the data you want comes through.
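For example, a fullload passthru filter on Table A could push the filtering down to the Oracle source with a subquery. This is only a rough sketch: the names Table_B, ID_B, and "Def A" are taken from your post, and the exact identifier quoting depends on how the columns are actually defined on the source:

    -- condition appended to the full-load SELECT on Table A
    ID_B IN (SELECT b.ID_B
               FROM Table_B b
              WHERE b."Def A" IN (50, 60))

With a condition like that in the table's filter settings, the source only sends the roughly 50 million qualifying rows instead of all 300 million.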
Please refer to these knowledge articles:
Filter for last 90 days of data in Qlik Replicate - Qlik Community - 1880761
Filter for last 90 days of data in Qlik Replicate ... - Qlik Community - 1980009
Thanks,
Dana

>>> My attempt to add Def A via source_lookup (source_lookup('NO_CACHING','x','y','b.ID_B FROM Table_A a, Table_B b WHERE a.ID_B = b.ID_B and a.ID_A=:1 --', 'z',$ID_A)
I don't understand that. Why not use source_lookup(10,'<schema>','Table_B','DefA',$ID_B) IN ('50','60')?
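For completeness, a minimal sketch of that idea as a filter expression, assuming the six-argument source_lookup form (TTL, schema, table, expression, condition, condition parameters), a 10-second cache TTL, and placeholder schema/column names you would adjust to your source:

    -- look up DefA in Table_B by ID_B, keep only rows where it is 50 or 60
    source_lookup(10, '<schema>', 'Table_B', 'DefA', 'ID_B=:1', $ID_B) IN ('50', '60')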
>>> leads to a massive drop in performance (17000 rows/second -> 1000 rows/second).
Considering you asked for no-caching, meaning a synchronous round-trip per row, that 1000 rows per second is very good.
Why not ask for caching, so that if one of the next rows has the same ID_B it does not have to make the round trip? Of course, depending on data distribution this might never happen, but when it does it helps a lot.
You might need two trips anyway, for the same values, if the IN expression does not do exactly what you need. If so, even 1-second caching would double performance.
As per @Dana_Baldwin, using source_lookup is best (only?) done for CDC work. For the full load you are better off with a passthru filter, so the source does NOT send 300M rows but just the 50M that satisfy the filter.
Alternatively, you could define a VIEW on the source representing the desired filter and create a (dedicated) task to load from the view into the actual target table. Next, start the real task (with source_lookup) using the advanced run option for tables that are already loaded.
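A rough sketch of such a view on the Oracle source (the object and column names are assumptions based on the tables in your post; adjust them to your schema):

    -- view exposing only the Table_A rows whose Def A in Table_B is 50 or 60
    CREATE OR REPLACE VIEW table_a_filtered AS
    SELECT a.*
      FROM Table_A a
     WHERE a.ID_B IN (SELECT b.ID_B
                        FROM Table_B b
                       WHERE b."Def A" IN (50, 60));

The dedicated task would then load from table_a_filtered into the real target table.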
hth, Hein.
