Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello,
I've been trying to alter an ETL script so that it produces less rows in 1 table.
One of the potential changes would be to the following:
LOAD
%Incident_key
-- extra information --
FROM Datasource_A;
CONCATENATE LOAD
%Incident_key
1 AS _Flag_NewIncident
FROM Datasource_B;
Datasource_A has many more rows and keys than Datasource_B.
The end result of this script would be the sum of the rows in Datasource_A and Datasource_B.
EXAMPLE (the reasoning behind why you'd want to do this isn't logical in the example):
Datasource_A:
%Incident_Key | Type |
1 | A |
2 | A |
3 | B |
4 | A |
5 | A |
6 | B |
7 | C |
8 | B |
9 | A |
10 | C |
Datasource_B:
%Incident_key |
6 |
7 |
9 |
10 |
End result:
%Incident_Key | Type | _Flag_NewIncident |
1 | A | . |
2 | A | . |
3 | B | . |
4 | A | . |
5 | A | . |
6 | B | . |
7 | C | . |
8 | B | . |
9 | A | . |
10 | C | . |
6 | . | 1 |
7 | . | 1 |
9 | . | 1 |
10 | . | 1 |
Is it possible to make the script so that the end result would be:
%Incident_Key | Type | _Flag_NewIncident |
1 | A | . |
2 | A | . |
3 | B | . |
4 | A | . |
5 | A | . |
6 | B | 1 |
7 | C | 1 |
8 | B | . |
9 | A | 1 |
10 | C | 1 |
Yours sincerely,
Casper Westelaken
I did not realize it was possible to JOIN LOAD.
Thank you very much!
Yours sincerely,
Casper Westelaken
Okey,
Whilst this was the perfect solution for my example. Sadly it doesn't solve my issue the way it should.
This is my fault for not correctly describing the issue.
The Incident_key is a combined key in the first table, where in the second table it is a sole key field.
As follows:
Datasource_A:
%Incident_key | %Extra_key | Type |
1 | 11 | A |
1 | 12 | A |
1 | 13 | A |
2 | 11 | B |
3 | 11 | A |
3 | 12 | A |
4 | 22 | C |
5 | 31 | C |
5 | 41 | C |
5 | 23 | C |
5 | 12 | C |
6 | 56 | B |
6 | 32 | B |
7 | 12 | A |
8 | 43 | B |
9 | 23 | A |
9 | 54 | A |
10 | 32 | C |
Datasource_B:
%Incident_key |
2 |
4 |
5 |
7 |
9 |
End Result using original Concatenate:
%Incident_key | %Extra_key | Type | _Flag_IncidentNew |
1 | 11 | A | |
1 | 12 | A | |
1 | 13 | A | |
2 | 11 | B | |
3 | 11 | A | |
3 | 12 | A | |
4 | 22 | C | |
5 | 31 | C | |
5 | 41 | C | |
5 | 23 | C | |
5 | 12 | C | |
6 | 56 | B | |
6 | 32 | B | |
7 | 12 | A | |
8 | 43 | B | |
9 | 23 | A | |
9 | 54 | A | |
10 | 32 | C | |
2 | Example error | 1 | |
4 | Example error | 1 | |
5 | Example error | 1 | |
7 | Example error | 1 | |
9 | Example error | 1 |
End result using Join:
%Incident_key | %Extra_key | Type | _Flag_IncidentNew |
1 | 11 | A | |
1 | 12 | A | |
1 | 13 | A | |
2 | 11 | B | 1 |
3 | 11 | A | |
3 | 12 | A | |
4 | 22 | C | 1 |
5 | 31 | C | 1 |
5 | 41 | C | 1 |
5 | 23 | C | 1 |
5 | 12 | C | 1 |
6 | 56 | B | |
6 | 32 | B | |
7 | 12 | A | 1 |
8 | 43 | B | |
9 | 23 | A | 1 |
9 | 54 | A | 1 |
10 | 32 | C |
Desired End Result:
%Incident_key | %Extra_key | Type | _Flag_IncidentNew |
1 | 11 | A | |
1 | 12 | A | |
1 | 13 | A | |
2 | 11 | B | 1 |
3 | 11 | A | |
3 | 12 | A | |
4 | 22 | C | 1 |
5 | 31 | C | 1 |
5 | 41 | C | |
5 | 23 | C | |
5 | 12 | C | |
6 | 56 | B | |
6 | 32 | B | |
7 | 12 | A | 1 |
8 | 43 | B | |
9 | 23 | A | 1 |
9 | 54 | A | |
10 | 32 | C |
Do you know whether or not this is possible?
How do you know which of the rows with e.g. incident_key = 5, that should be flagged?
%Incident_key | %Extra_key | Type | _Flag_IncidentNew |
1 | 11 | A | |
1 | 12 | A | |
1 | 13 | A | |
2 | 11 | B | 1 |
3 | 11 | A | |
3 | 12 | A | |
4 | 22 | C | 1 |
5 | 31 | C | 1 |
5 | 41 | C | |
5 | 23 | C | |
5 | 12 | C |
In this example I only used an %Extra_key, but since we're working with a facts table and dimensions setup this is actually multiple keys.
One of these keys is a %Date_key.
The row that should be flagged would the one that coincides with the earliest instance of the Incident.
Datasource_B (The table with the Incident dimension) includes a SysCreated field which during the load statement would be turned into a %Date_Key for loading into the facts table only (to avoid a synthetic key)
Sadly though... Theoretically it is possible for 2 rows in the Facts table to have the same %Incident_key and %Date_key. So I'd rather create something that would simply load it in once per Incident and date combination. All that matters is that the earliest date per incident gets flagged, so that no double flags exist.
So essentially I'm loading the _Flag_IncidentNew into the Facts table using the Incidents table in which the %Incident_key is distinct, where during this load I'm transforming one of the columns in the Incidents table into a %Date_key.
In the actual situation (non-example) every %Incident_key gets flagged once.
I guess my examples were insufficient.
The end result would be:
%Incident_key | %Extra_key | Type | _Flag_IncidentNew |
1 | 11 | A | 1 |
1 | 12 | A | |
1 | 13 | A | |
2 | 11 | B | 1 |
3 | 11 | A | 1 |
3 | 12 | A | |
4 | 22 | C | 1 |
5 | 31 | C | 1 |
5 | 41 | C | |
5 | 23 | C | |
5 | 12 | C |