Skip to main content
Announcements
Qlik Introduces a New Era of Visualization! READ ALL ABOUT IT
cancel
Showing results for 
Search instead for 
Did you mean: 
till_bentz
Contributor III
Contributor III

Qlik Replicate: Lookup and filter sourcetables

Hello,

I have the following problem when replicating tables from an Oracle source:

Table A: 300 million rows
| ID_A | Value 1| Value 2 | ID_B |

Table B: 100,000 rows
| ID_B | Value X | Value Y | Def A |

I now only want to load the rows from table A for which Def A has the value 50 or 60.

As a final outcome, only just under 50 million rows of table A are required.

My attempt to add Def A via source_lookup (source_lookup('NO_CACHING','x','y','b.ID_B FROM Table_A a, Table_B b WHERE a.ID_B = b.ID_B and a.ID_A=:1 --', 'z',$ID_A)) leads to a massive drop in performance (17000 rows/second -> 1000 rows/second).

Is there a better solution?

Thanks a lot!
Till

Labels (1)
2 Replies
Dana_Baldwin
Support
Support

Hi @till_bentz 

Source lookup is generally used during change processing, I think you're looking for a pass thru filter which uses a where clause on the select query so only the desired rows are captured during the full load. You can also set a filter to be used during change processing to ensure only the data you want comes through.

Please refer to these knowledge articles:

Filter for last 90 days of data in Qlik Replicate - Qlik Community - 1880761

Filter for last 90 days of data in Qlik Replicate ... - Qlik Community - 1980009

Thanks,

Dana

Heinvandenheuvel
Specialist III
Specialist III

>>> My attempt to add Def A via source_lookup (source_lookup('NO_CACHING','x','y','b.ID_B FROM Table_A a, Table_B b WHERE a.ID_B = b.ID_B and a.ID_A=:1 --', 'z',$ID_A)

I don't understand that. Why not use source_lookup(10,'<schema>','Table_B','DefA',$ID_B)   in ('50','60')

>>> leads to a massive drop in performance (17000 rows/second -> 1000 rows/second).

Considering you asked for no-caching, meaning a synchronous round-trip per row,  that 1000 rows per second is very good.

Why would you not ask for caching such that if one of the next rows has the same ID_B, it does not have to make the roundtrip.  Of course depending on data distribution this might never happen, but when it does it helps a lot.

You might  need two trips anyway , for the same values, if the IN expression does not do exactly what you need. If so, 1 second caching would double performance.

As per @Dana_Baldwin  using source_lookup is best (only?) done for CDC work. For fullload you are better of with a passthru filter making the source NOT send 300M rows but just the 50M that satisfy.

Alternatively you could define a VIEW on source representing the desired filter and create a (dedicated) task to load from the view into the actual target table. Next start the real task (with source_lookup) with 'run advanced table already loaded.'

hth,  Hein.