Skip to main content
Announcements
Join us at Qlik Connect for 3 magical days of learning, networking,and inspiration! REGISTER TODAY and save!
cancel
Showing results for 
Search instead for 
Did you mean: 
gino2780
Creator
Creator

Outer Join / Generate matching fields /

Hello, 

i'd like to know how to create specific fields for outer joins which give you the info which ID are just in T1, which are in both and which are only in T2. Let me give you a small example:

Source data come from 2 different tables:

ID_T1ID_T2
10-
15-
56-
7878
8080
-82
-89

 

My goal is to generate new fields after an outer join which give me some sort of indication how many id are being matched or not (either only in T1 or only T2) for counts e.g.

The result should look like this (3 more fields within the script to generate for the outer join):

ID_T1m_T1_T2ID_T2
101
101
101
010
010
-10-1
-10-1
Value meaningsValue meanings

Value meanings

1 = only in table10 = not matched T1:T21 = only in table2
0 = matched 1:11 = matched T1:T20 = matched 1:1
-1 = only in table2 -1 = only in table1

 

Any ideas?

Labels (2)
1 Solution

Accepted Solutions
marcus_sommer

I don't know for what do you want to use the information which keys are in which source available but I doubt that I would try to generate it and also to load the data with a join-approach - else I would probably concatenate the sources and apply the flagging with exists(). The following might be therefore a useful starting point for your task:

t1: load ID_T2 inline [
ID_T1, ID_T2
10,
15,
56,
78, 78
80, 80
, 82
, 89
] where len(trim(ID_T2));

t2: load ID_T1 as ID, (-exists(ID_T2, ID_T1) -1) * 1 as Flag, recno() as RecNo, rowno() as RowNo, 1 as Source inline [
ID_T1, ID_T2
10,
15,
56,
78, 78
80, 80
, 82
, 89
] where len(trim(ID_T1));

concatenate(t2)

load ID_T2 as ID, (-exists(ID, ID_T2) -1) * 2 as Flag, recno() as RecNo, rowno() as RowNo, 2 as Source inline [
ID_T1, ID_T2
10,
15,
56,
78, 78
80, 80
, 82
, 89
] where len(trim(ID_T2));

drop tables t1;

KeyCheckFlags.JPG

and if there are not very special requirements I could imagine that you don't need this kind of flagging else that flagging the source (and using it as selection-field and/or within any set analysis) would be enough.

- Marcus

View solution in original post

4 Replies
marcus_sommer

I don't know for what do you want to use the information which keys are in which source available but I doubt that I would try to generate it and also to load the data with a join-approach - else I would probably concatenate the sources and apply the flagging with exists(). The following might be therefore a useful starting point for your task:

t1: load ID_T2 inline [
ID_T1, ID_T2
10,
15,
56,
78, 78
80, 80
, 82
, 89
] where len(trim(ID_T2));

t2: load ID_T1 as ID, (-exists(ID_T2, ID_T1) -1) * 1 as Flag, recno() as RecNo, rowno() as RowNo, 1 as Source inline [
ID_T1, ID_T2
10,
15,
56,
78, 78
80, 80
, 82
, 89
] where len(trim(ID_T1));

concatenate(t2)

load ID_T2 as ID, (-exists(ID, ID_T2) -1) * 2 as Flag, recno() as RecNo, rowno() as RowNo, 2 as Source inline [
ID_T1, ID_T2
10,
15,
56,
78, 78
80, 80
, 82
, 89
] where len(trim(ID_T2));

drop tables t1;

KeyCheckFlags.JPG

and if there are not very special requirements I could imagine that you don't need this kind of flagging else that flagging the source (and using it as selection-field and/or within any set analysis) would be enough.

- Marcus

gino2780
Creator
Creator
Author

Hi Marcus,

 

thanks for the script and also your insights on the topic.

What would your recommended approach look like in the script  and/or set analysis commands?

Can you share this as well? Would be highly appreciated.

marcus_sommer

Multiple fact-tables within a datamodel which have missing keys couldn't be simply joined or associated.

To overcome the problem you could identify the missing keys and adding them with more or less dummy-data. Another quite common approach is to create a link-table with the keys from all relating fact-tables. And the usually easiest way is just to concatenate them into a single table and therefore it's my preferred way and always my starting point - and only if I get any performance issues I would check for further optimizations (which is then usually a join-construct with prepared data which also leads to a single fact-table).

In regard to my suggested concatenate-approach you could use for the most calculations simple expressions like:

sum(value)

and if you really need to differ the results between the origin-sources you could just apply:

sum({< Source = {'table abc'}>} value) respectively sum({< Source = {'table xyz'}>} value)

- Marcus

gino2780
Creator
Creator
Author

Hello Marcus,

thanks for sharing your insights on this topic. I'll try out your approach.

My initial thought on this topic had something to do with experiences i've made with statistic programs that generate system-defined variables for any type of merge (left, right, inner, outer join). These kind of variables provide additional information on the merged datamodell with reference to the originate data.

Greets,

gino2780