Re: Comparing Two data sources and showing differe... - Qlik Community

Report Inappropriate Content · ‎2016-05-06

I am looking for suggestions comparing two different databases which contain multiple tables, 100's of columns and millions of rows. Our primary concern is the scale-ability of this application. We have an existing application working for our smaller tables, however as we add more fields and compare using our larger tables, the application becomes unusable.

I have seen a similar problem such as below:

https://community.qlik.com/thread/32944

I just figured I would ask again in case anyone has seen anything or developed anything. I know Qlikview doesn't forte in this type of thing, but we don't have a lot of options as far as branching out to other tools. Any help or ideas would be greatly appreciated.

Thank you,

Brandon

sunny_talwar · ‎2016-05-07

You can try binary loading one of those files into the other one. The only issue that you might run into is the fact that you might end up making unwanted connections between similar named fields. The work around that would be to use Qualify * and unqualify only those fields which you want to connect between the two databases.

I guess the bigger concern here is the size, I don't think I have an advice with regards to that. Not saying you might not be able to do anything, but I don't have too many expertise in that area. May be someone else can offer a better advice.

Gysbert_Wassenaar · ‎2016-05-09

You could try something like this (not necessarily using resident loads):

SUB TableDiff(Table1, Table2)

SET vDiffTable = Diff_$(Table1)_$(Table2);

NoDiff:
LOAD HASH as HASH_SAME RESIDENT $(Table1);

INNER JOIN (NoDiff)

LOAD HASH as HASH_SAME RESIDENT $(Table2);

     $(vDiffTable):
     NOCONCATENATE
     LOAD * RESIDENT $(Table1) WHERE Not Exists(HASH_SAME, HASH);

     CONCATENATE ($(vDiffTable))

LOAD * RESIDENT $(Table2) WHERE Not Exists(HASH_SAME, HASH);

STORE $(vDiffTable) INTO $(vDiffTable).qvd;

DROP TABLES $(vDiffTable), NoDiff;

ENDSUB

T1:
NOCONCATENATE
LOAD *, Hash256(A,B,C) as HASH INLINE [
A, B, C
1, a, 10
2, a, 30
3, b, 40
4, c, 10
5, d, 30
];

T2:
NOCONCATENATE
LOAD *, Hash256(A,B,C) as HASH INLINE [
A, B, C
1, a, 20
2, a, 30
3, b, 40
6, c, 10
7, d, 30
];

CALL TableDiff('T1', 'T2');

talk is cheap, supply exceeds demand

Comparing Two data sources and showing differences