Solved: Re: Outer Join / Generate matching fields / - Qlik Community

gino2780 · ‎2019-06-19

Hello,

i'd like to know how to create specific fields for outer joins which give you the info which ID are just in T1, which are in both and which are only in T2. Let me give you a small example:

Source data come from 2 different tables:

ID_T1	ID_T2
10	-
15	-
56	-
78	78
80	80
-	82
-	89

My goal is to generate new fields after an outer join which give me some sort of indication how many id are being matched or not (either only in T1 or only T2) for counts e.g.

The result should look like this (3 more fields within the script to generate for the outer join):

ID_T1	m_T1_T2	ID_T2
1	0	1
1	0	1
1	0	1
0	1	0
0	1	0
-1	0	-1
-1	0	-1
Value meanings	Value meanings	Value meanings
1 = only in table1	0 = not matched T1:T2	1 = only in table2
0 = matched 1:1	1 = matched T1:T2	0 = matched 1:1
-1 = only in table2		-1 = only in table1

Any ideas?

marcus_sommer · ‎2019-06-19

I don't know for what do you want to use the information which keys are in which source available but I doubt that I would try to generate it and also to load the data with a join-approach - else I would probably concatenate the sources and apply the flagging with exists(). The following might be therefore a useful starting point for your task:

t1: load ID_T2 inline [
ID_T1, ID_T2
10,
15,
56,
78, 78
80, 80
, 82
, 89
] where len(trim(ID_T2));

t2: load ID_T1 as ID, (-exists(ID_T2, ID_T1) -1) * 1 as Flag, recno() as RecNo, rowno() as RowNo, 1 as Source inline [
ID_T1, ID_T2
10,
15,
56,
78, 78
80, 80
, 82
, 89
] where len(trim(ID_T1));

concatenate(t2)

load ID_T2 as ID, (-exists(ID, ID_T2) -1) * 2 as Flag, recno() as RecNo, rowno() as RowNo, 2 as Source inline [
ID_T1, ID_T2
10,
15,
56,
78, 78
80, 80
, 82
, 89
] where len(trim(ID_T2));

drop tables t1;

and if there are not very special requirements I could imagine that you don't need this kind of flagging else that flagging the source (and using it as selection-field and/or within any set analysis) would be enough.

- Marcus

View solution in original post

marcus_sommer · ‎2019-06-19

I don't know for what do you want to use the information which keys are in which source available but I doubt that I would try to generate it and also to load the data with a join-approach - else I would probably concatenate the sources and apply the flagging with exists(). The following might be therefore a useful starting point for your task:

t1: load ID_T2 inline [
ID_T1, ID_T2
10,
15,
56,
78, 78
80, 80
, 82
, 89
] where len(trim(ID_T2));

t2: load ID_T1 as ID, (-exists(ID_T2, ID_T1) -1) * 1 as Flag, recno() as RecNo, rowno() as RowNo, 1 as Source inline [
ID_T1, ID_T2
10,
15,
56,
78, 78
80, 80
, 82
, 89
] where len(trim(ID_T1));

concatenate(t2)

load ID_T2 as ID, (-exists(ID, ID_T2) -1) * 2 as Flag, recno() as RecNo, rowno() as RowNo, 2 as Source inline [
ID_T1, ID_T2
10,
15,
56,
78, 78
80, 80
, 82
, 89
] where len(trim(ID_T2));

drop tables t1;

and if there are not very special requirements I could imagine that you don't need this kind of flagging else that flagging the source (and using it as selection-field and/or within any set analysis) would be enough.

- Marcus

gino2780 · ‎2019-06-19

Hi Marcus,

thanks for the script and also your insights on the topic.

What would your recommended approach look like in the script and/or set analysis commands?

Can you share this as well? Would be highly appreciated.

marcus_sommer · ‎2019-06-19

Multiple fact-tables within a datamodel which have missing keys couldn't be simply joined or associated.

To overcome the problem you could identify the missing keys and adding them with more or less dummy-data. Another quite common approach is to create a link-table with the keys from all relating fact-tables. And the usually easiest way is just to concatenate them into a single table and therefore it's my preferred way and always my starting point - and only if I get any performance issues I would check for further optimizations (which is then usually a join-construct with prepared data which also leads to a single fact-table).

In regard to my suggested concatenate-approach you could use for the most calculations simple expressions like:

sum(value)

and if you really need to differ the results between the origin-sources you could just apply:

sum({< Source = {'table abc'}>} value) respectively sum({< Source = {'table xyz'}>} value)

- Marcus

gino2780 · ‎2019-06-24

Hello Marcus,

thanks for sharing your insights on this topic. I'll try out your approach.

My initial thought on this topic had something to do with experiences i've made with statistic programs that generate system-defined variables for any type of merge (left, right, inner, outer join). These kind of variables provide additional information on the merged datamodell with reference to the originate data.

Greets,

gino2780