Skip to main content
Announcements
Join us at Qlik Connect for 3 magical days of learning, networking,and inspiration! REGISTER TODAY and save!
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Concatenate/Join tables from 3 sources without duplication

I have three sources of data (source1, source2, source3) and I would like for them all to be joined into one table. I've been able to get close to the desired result using Concatenate or Outer Join, but not quite what I require as some records are being duplicated. To give an small example of my problem, my load script is as follows:

Table:

LOAD

Team

Division

Source

FROM source1

Concatenate

LOAD

Team

Division

Source

FROM source2

Concatenate

LOAD

Team

Division

Source

FROM source3

My problem arises when I have a Team which has changed Division in a different source. Each Team/Division combination should only show up once, but they're actually showing up three times. I only want the rows shaded green in my data:

TeamDivisionSource
North London GunsFirstsource1
North London GunsFirstsource2
North London GunsFirstsource3
North London GunsSecondsource1
North London GunsSecondsource2
North London GunsSecondsource3
North London GunsThirdsource1
North London GunsThirdsource2
North London GunsThirdsource3

All the other teams which don't have a different division between sources appear perfectly fine when concatenating or outer joining the tables (1 team per source).

Does anyone know how I can modify the script so that it doesn't produce the undesired rows above? Would some form of applymap be helpful?

Thanks

1 Solution

Accepted Solutions
Not applicable
Author

Thanks Marcus - While the suggestion above didn't yield the results I require (it only showed source2 and source3 data that didn't exist in source1; in this case, only the first row), it did inspire my eventual solution/workaround.

I created a field (call it TeamDivision) which is a concatenation of Team & Division, separated by a "|":

Team & '|' & Division AS TeamDivision

Using this (for some reason I simply didn't notice this before I included the "|") I could see that the division matched the source on each row:

TeamDivisionTeamDivisionSource
North London Guns|FirstNorth London GunsFirstsource1
North London Guns|SecondNorth London GunsFirstsource2
North London Guns|ThirdNorth London GunsFirstsource3
North London Guns|FirstNorth London GunsSecondsource1
North London Guns|SecondNorth London GunsSecondsource2
North London Guns|ThirdNorth London GunsSecondsource3
North London Guns|FirstNorth London GunsThirdsource1
North London Guns|SecondNorth London GunsThirdsource2
North London Guns|ThirdNorth London GunsThirdsource3

I then created another field which took a substring of the TeamDivision, based on that magical "|" which displays everything to the right of the "|"

mid([TeamDivision], index([TeamDivision],'|')+1) as TeamDivisionSub

And added it into the table:

TeamDivisionTeamDivisionSubTeamDivisionSource
North London Guns|FirstFirstNorth London GunsFirstsource1
North London Guns|SecondSecondNorth London GunsFirstsource2
North London Guns|ThirdThirdNorth London GunsFirstsource3
North London Guns|FirstFirstNorth London GunsSecondsource1
North London Guns|SecondSecondNorth London GunsSecondsource2
North London Guns|ThirdThirdNorth London GunsSecondsource3
North London Guns|FirstFirstNorth London GunsThirdsource1
North London Guns|SecondSecondNorth London GunsThirdsource2
North London Guns|ThirdThirdNorth London GunsThirdsource3

Finally, I added an expression to the table which only shows the rows where TeamDivisionSub = Division, resulting in only the green rows 🙂

TeamDivisionTeamDivisionSubTeamDivisionSource
North London Guns|FirstFirstNorth London GunsFirstsource1
North London Guns|SecondSecondNorth London GunsSecondsource2
North London Guns|ThirdThirdNorth London GunsThirdsource3

TeamDivisionSource
North London GunsFirstsource1
North London GunsSecondsource2
North London GunsThirdsource3

Thanks for your assistance!

View solution in original post

6 Replies
Anonymous
Not applicable
Author

Try this condition on your last 2 load statements :

     Where Not Exists(Team) And Not Exists(Division)

Also is the [Source] field actually required ?

marcus_sommer

Concatenate should be the best approach - if you then have duplicates put recno() and rowno() in your loads and/or check your source-files. Maybe they needed to be cleaned or you need another logic to identify them.

- Marcus

Not applicable
Author

Thanks - I tried your solution, but it excluded too many values, so only the first green row in the table (source1) was loaded, but I also need the source2 and source3 rows. The Source field is required because I need to compare rows from the three sources (based on the source's date), or more specifically, I need to track any team that's changed division between the different sources (each source is a different season).

Not applicable
Author

Thanks for the reply Marcus. I've placed recNo() and rowNo() in the three load statements but that hasn't changed anything as every row is already unique, but the problem is the rows in the table above that aren't shaded green don't actually exist in the raw data; they're being created at some point in the load, probably due to the concatenation. I'm unable to find a way of loading them all so that those extra rows aren't created, or so that too many rows are excluded from the load.

marcus_sommer

Try something like this:

    Table: 

    LOAD  Team, Division, Source, Team & '|' & Division as [UniqueKey] FROM source1;

                Concatenate (Table)

    LOAD  Team, Division, Source FROM source2 where not exists([UniqueKey], Team & '|' & Division);

                Concatenate (Table)

    LOAD  Team, Division, Source FROM source3 where not exists([UniqueKey], Team & '|' & Division);

- Marcus

Not applicable
Author

Thanks Marcus - While the suggestion above didn't yield the results I require (it only showed source2 and source3 data that didn't exist in source1; in this case, only the first row), it did inspire my eventual solution/workaround.

I created a field (call it TeamDivision) which is a concatenation of Team & Division, separated by a "|":

Team & '|' & Division AS TeamDivision

Using this (for some reason I simply didn't notice this before I included the "|") I could see that the division matched the source on each row:

TeamDivisionTeamDivisionSource
North London Guns|FirstNorth London GunsFirstsource1
North London Guns|SecondNorth London GunsFirstsource2
North London Guns|ThirdNorth London GunsFirstsource3
North London Guns|FirstNorth London GunsSecondsource1
North London Guns|SecondNorth London GunsSecondsource2
North London Guns|ThirdNorth London GunsSecondsource3
North London Guns|FirstNorth London GunsThirdsource1
North London Guns|SecondNorth London GunsThirdsource2
North London Guns|ThirdNorth London GunsThirdsource3

I then created another field which took a substring of the TeamDivision, based on that magical "|" which displays everything to the right of the "|"

mid([TeamDivision], index([TeamDivision],'|')+1) as TeamDivisionSub

And added it into the table:

TeamDivisionTeamDivisionSubTeamDivisionSource
North London Guns|FirstFirstNorth London GunsFirstsource1
North London Guns|SecondSecondNorth London GunsFirstsource2
North London Guns|ThirdThirdNorth London GunsFirstsource3
North London Guns|FirstFirstNorth London GunsSecondsource1
North London Guns|SecondSecondNorth London GunsSecondsource2
North London Guns|ThirdThirdNorth London GunsSecondsource3
North London Guns|FirstFirstNorth London GunsThirdsource1
North London Guns|SecondSecondNorth London GunsThirdsource2
North London Guns|ThirdThirdNorth London GunsThirdsource3

Finally, I added an expression to the table which only shows the rows where TeamDivisionSub = Division, resulting in only the green rows 🙂

TeamDivisionTeamDivisionSubTeamDivisionSource
North London Guns|FirstFirstNorth London GunsFirstsource1
North London Guns|SecondSecondNorth London GunsSecondsource2
North London Guns|ThirdThirdNorth London GunsThirdsource3

TeamDivisionSource
North London GunsFirstsource1
North London GunsSecondsource2
North London GunsThirdsource3

Thanks for your assistance!