Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I have three sources of data (source1, source2, source3) and I would like for them all to be joined into one table. I've been able to get close to the desired result using Concatenate or Outer Join, but not quite what I require as some records are being duplicated. To give an small example of my problem, my load script is as follows:
Table:
LOAD
Team
Division
Source
FROM source1
Concatenate
LOAD
Team
Division
Source
FROM source2
Concatenate
LOAD
Team
Division
Source
FROM source3
My problem arises when I have a Team which has changed Division in a different source. Each Team/Division combination should only show up once, but they're actually showing up three times. I only want the rows shaded green in my data:
Team | Division | Source |
---|---|---|
North London Guns | First | source1 |
North London Guns | First | source2 |
North London Guns | First | source3 |
North London Guns | Second | source1 |
North London Guns | Second | source2 |
North London Guns | Second | source3 |
North London Guns | Third | source1 |
North London Guns | Third | source2 |
North London Guns | Third | source3 |
All the other teams which don't have a different division between sources appear perfectly fine when concatenating or outer joining the tables (1 team per source).
Does anyone know how I can modify the script so that it doesn't produce the undesired rows above? Would some form of applymap be helpful?
Thanks
Thanks Marcus - While the suggestion above didn't yield the results I require (it only showed source2 and source3 data that didn't exist in source1; in this case, only the first row), it did inspire my eventual solution/workaround.
I created a field (call it TeamDivision) which is a concatenation of Team & Division, separated by a "|":
Team & '|' & Division AS TeamDivision
Using this (for some reason I simply didn't notice this before I included the "|") I could see that the division matched the source on each row:
TeamDivision | Team | Division | Source |
---|---|---|---|
North London Guns|First | North London Guns | First | source1 |
North London Guns|Second | North London Guns | First | source2 |
North London Guns|Third | North London Guns | First | source3 |
North London Guns|First | North London Guns | Second | source1 |
North London Guns|Second | North London Guns | Second | source2 |
North London Guns|Third | North London Guns | Second | source3 |
North London Guns|First | North London Guns | Third | source1 |
North London Guns|Second | North London Guns | Third | source2 |
North London Guns|Third | North London Guns | Third | source3 |
I then created another field which took a substring of the TeamDivision, based on that magical "|" which displays everything to the right of the "|"
mid([TeamDivision], index([TeamDivision],'|')+1) as TeamDivisionSub
And added it into the table:
TeamDivision | TeamDivisionSub | Team | Division | Source |
---|---|---|---|---|
North London Guns|First | First | North London Guns | First | source1 |
North London Guns|Second | Second | North London Guns | First | source2 |
North London Guns|Third | Third | North London Guns | First | source3 |
North London Guns|First | First | North London Guns | Second | source1 |
North London Guns|Second | Second | North London Guns | Second | source2 |
North London Guns|Third | Third | North London Guns | Second | source3 |
North London Guns|First | First | North London Guns | Third | source1 |
North London Guns|Second | Second | North London Guns | Third | source2 |
North London Guns|Third | Third | North London Guns | Third | source3 |
Finally, I added an expression to the table which only shows the rows where TeamDivisionSub = Division, resulting in only the green rows 🙂
TeamDivision | TeamDivisionSub | Team | Division | Source |
---|---|---|---|---|
North London Guns|First | First | North London Guns | First | source1 |
North London Guns|Second | Second | North London Guns | Second | source2 |
North London Guns|Third | Third | North London Guns | Third | source3 |
Team | Division | Source |
---|---|---|
North London Guns | First | source1 |
North London Guns | Second | source2 |
North London Guns | Third | source3 |
Thanks for your assistance!
Try this condition on your last 2 load statements :
Where Not Exists(Team) And Not Exists(Division)
Also is the [Source] field actually required ?
Concatenate should be the best approach - if you then have duplicates put recno() and rowno() in your loads and/or check your source-files. Maybe they needed to be cleaned or you need another logic to identify them.
- Marcus
Thanks - I tried your solution, but it excluded too many values, so only the first green row in the table (source1) was loaded, but I also need the source2 and source3 rows. The Source field is required because I need to compare rows from the three sources (based on the source's date), or more specifically, I need to track any team that's changed division between the different sources (each source is a different season).
Thanks for the reply Marcus. I've placed recNo() and rowNo() in the three load statements but that hasn't changed anything as every row is already unique, but the problem is the rows in the table above that aren't shaded green don't actually exist in the raw data; they're being created at some point in the load, probably due to the concatenation. I'm unable to find a way of loading them all so that those extra rows aren't created, or so that too many rows are excluded from the load.
Try something like this:
Table:
LOAD Team, Division, Source, Team & '|' & Division as [UniqueKey] FROM source1;
Concatenate (Table)
LOAD Team, Division, Source FROM source2 where not exists([UniqueKey], Team & '|' & Division);
Concatenate (Table)
LOAD Team, Division, Source FROM source3 where not exists([UniqueKey], Team & '|' & Division);
- Marcus
Thanks Marcus - While the suggestion above didn't yield the results I require (it only showed source2 and source3 data that didn't exist in source1; in this case, only the first row), it did inspire my eventual solution/workaround.
I created a field (call it TeamDivision) which is a concatenation of Team & Division, separated by a "|":
Team & '|' & Division AS TeamDivision
Using this (for some reason I simply didn't notice this before I included the "|") I could see that the division matched the source on each row:
TeamDivision | Team | Division | Source |
---|---|---|---|
North London Guns|First | North London Guns | First | source1 |
North London Guns|Second | North London Guns | First | source2 |
North London Guns|Third | North London Guns | First | source3 |
North London Guns|First | North London Guns | Second | source1 |
North London Guns|Second | North London Guns | Second | source2 |
North London Guns|Third | North London Guns | Second | source3 |
North London Guns|First | North London Guns | Third | source1 |
North London Guns|Second | North London Guns | Third | source2 |
North London Guns|Third | North London Guns | Third | source3 |
I then created another field which took a substring of the TeamDivision, based on that magical "|" which displays everything to the right of the "|"
mid([TeamDivision], index([TeamDivision],'|')+1) as TeamDivisionSub
And added it into the table:
TeamDivision | TeamDivisionSub | Team | Division | Source |
---|---|---|---|---|
North London Guns|First | First | North London Guns | First | source1 |
North London Guns|Second | Second | North London Guns | First | source2 |
North London Guns|Third | Third | North London Guns | First | source3 |
North London Guns|First | First | North London Guns | Second | source1 |
North London Guns|Second | Second | North London Guns | Second | source2 |
North London Guns|Third | Third | North London Guns | Second | source3 |
North London Guns|First | First | North London Guns | Third | source1 |
North London Guns|Second | Second | North London Guns | Third | source2 |
North London Guns|Third | Third | North London Guns | Third | source3 |
Finally, I added an expression to the table which only shows the rows where TeamDivisionSub = Division, resulting in only the green rows 🙂
TeamDivision | TeamDivisionSub | Team | Division | Source |
---|---|---|---|---|
North London Guns|First | First | North London Guns | First | source1 |
North London Guns|Second | Second | North London Guns | Second | source2 |
North London Guns|Third | Third | North London Guns | Third | source3 |
Team | Division | Source |
---|---|---|
North London Guns | First | source1 |
North London Guns | Second | source2 |
North London Guns | Third | source3 |
Thanks for your assistance!