leonardoed
Contributor

Filtering duplicates using Fullload Passthru Filter

Hi All,

 

I'm trying to replicate a table from an SAP HANA source to an Azure target. Currently I am bringing over the last 13 months of data from the table, and I need all 13 months. For this I am using the Fullload Passthru Filter as follows:

leonardoed_0-1644435780616.png
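For readers without the screenshot: the passthru filter is just the text that goes after WHERE, so a 13-month filter on a HANA source might look something like this (DOC_DATE is a hypothetical column name, not from the original post):

```sql
-- Hypothetical passthru filter text (only the part after WHERE).
-- DOC_DATE stands in for whatever date column the real table uses.
DOC_DATE >= ADD_MONTHS(CURRENT_DATE, -13)
```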

Now comes the problem: even filtering to the last 13 months, my dataset still has more than 500 million rows, and I am hitting an error because I have used up all my space on Azure. I found that I have duplicate rows (a SELECT DISTINCT takes it from 500 million to 30 million) that are needed on HANA but that I won't need on Azure. So my question is: can I apply this SELECT DISTINCT using the Fullload Passthru Filter? Or is there another way to filter this out when Qlik brings the data from the source? If not, I will have to handle it directly on HANA before replicating.

 

Thanks

3 Replies
Heinvandenheuvel
Specialist II

So you don't want all the columns from the table, just those listed in the DISTINCT clause, right? Really?

Hmm, the 'passthru' filter is implemented as literal text for a WHERE clause.

The SELECT &lt;column list&gt; FROM table text is already in place. There is no option to inject a DISTINCT there, and no option to supply an alternative &lt;column list&gt; either.

I don't think there is a clever WHERE-clause-only trick you can use as an alternative. If you google 'SQL alternative to distinct' or similar you'll find several suggestions, but the ones I saw needed a JOIN in the FROM clause.

Your best bet is probably to full-load a VIEW rather than the base table.
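A sketch of what such a view could look like on the HANA side; the table name SALES_DOCS and the column list are placeholders, not from the original post:

```sql
-- Hypothetical HANA view that pre-deduplicates the source table.
-- Replace SALES_DOCS and the column list with the real table and
-- the columns actually needed on the target.
CREATE VIEW SALES_DOCS_DEDUP AS
SELECT DISTINCT DOC_ID, DOC_DATE, AMOUNT
FROM SALES_DOCS;
```

The Replicate task would then use the view as its source table, and the 13-month passthru filter still applies against the view's columns.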

Hein

 

shashi_holla
Support

If we don't need duplicates on the target, we would have to create a unique constraint on the target table so that duplicates get filtered out. It also sounds like this scenario will need different unique constraints on the source and the target.
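On an Azure SQL target, such a constraint might be sketched as follows; the table and column names are placeholders, and the columns chosen must be exactly those that define a "duplicate" for this dataset:

```sql
-- Hypothetical unique constraint on the Azure target table.
-- dbo.SALES_DOCS and the column list stand in for the real table
-- and the duplicate-defining columns.
ALTER TABLE dbo.SALES_DOCS
ADD CONSTRAINT UQ_SALES_DOCS_DEDUP UNIQUE (DOC_ID, DOC_DATE);
```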

Heinvandenheuvel
Specialist II

Shashi has an interesting suggestion. 

Choose the columns that control duplication and create a unique constraint on them on the target after table creation.

Do not let Replicate drop and recreate the target table; have it truncate instead.

Set the task error handling for 'Apply Conflicts' to 'Duplicate key when applying INSERT: UPDATE the existing target record'. That way the last row with a given duplicate key simply updates any prior ones.

hth,

Hein