Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi guys,
I just want to collect some of your ideas how to solve one simple transformation.
Source Data:
ID | Date | Attribute1 | Attribute2 | Attribute3 |
---|---|---|---|---|
1 | 01-01-2017 | active | NULL | NULL |
1 | 01-01-2017 | NULL | 1 | NULL |
1 | 01-01-2017 | NULL | NULL | 8 |
1 | 02-01-2017 | active | NULL | NULL |
1 | 02-01-2017 | NULL | 7 | NULL |
1 | 02-01-2017 | NULL | NULL | 100 |
2 | 01-01-2017 | active | NULL | NULL |
2 | 01-01-2017 | NULL | 2 | NULL |
2 | 01-01-2017 | NULL | NULL | 9 |
2 | 02-01-2017 | inactive | NULL | NULL |
2 | 02-01-2017 | NULL | 20 | NULL |
2 | 02-01-2017 | NULL | NULL | 50 |
Output should be:
ID | Date | Attribute1 | Attribute2 | Attribute3 |
---|---|---|---|---|
1 | 01-01-2017 | active | 1 | 8 |
1 | 02-01-2017 | active | 7 | 100 |
2 | 01-01-2017 | active | 2 | 9 |
2 | 02-01-2017 | inactive | 20 | 50 |
Transformation is done for millions of rows, so actually I am trying to find the option for shortest reload time.
Thank you very much.
BR
Martin
Hard to say without knowing what is happening - Could you share the load script and the reload log.
Load Null as 0 and Active as 1 and Inactive as 2. The search will become faster
Hi Martin,
If the data doesn't change after the fact, I would suggest an 'incremental' reload. This will speed it up massively.
An example for your case, load the data where the date is greater than a certain date and concatenate old data already loaded into a QVD.
Eg.
LET vDate = Date(Today()-1,'DD-MM-YYYY');
Data:
LOAD * FROM dataSource
WHERE Date >= '$(vDate)'
CONCATENATE (Data) LOAD * FROM C:\DataLocation\Data.qvd (qvd)
WHERE Date < '$(vDate)'; //may or may not need this WHERE statement depending on how your load works, if there is no date crossover from the stored data and the new data then you won't.
STORE Data Into C:\DataLocation\Data.qvd (qvd);
...This way you won't be transforming old data each time you reload, it will simply add to the old set.
Hope this helps.
Sam
Hi Bill,
doing just mapping load for each attribute where not isnull(Attribute)and then applying it into next table on distinct combination of ID and Date.
MAP_attribute1:
mapping load
Autonumber(ID&'│'&Date),
attribute1
Resident XXX
where not isnull(Attribute1);
Load DISTINCT
Autonumber(ID&'│'&Date),
ApplyMap(...) as AttrbuteValue
Resident XXX;
...
Thanks.
Martin
Hello,
Don;t use string for values, it's bad for the memory...
I made it like this:
LOAD ID, Date,
only(if(Attribute1='active',-1,if( Attribute1='inactive',0))) as Attr1,
only(Attribute2) as Attribute2,
only(Attribute3) as Attribute3
FROM
[.\Book1.xlsx]
(ooxml, embedded labels, table is Sheet1)
Group by ID, Date;
Group by also consumes resources but it depends on how much data you have.
Hope it helps,
Hi Andrei,
thanks for advice. Sure, I am using INT instead of CHAR, that was just an example of data set. Group by looks like the best option here.
Thank you.
Martin
Glad i could help.
My pleasure,
Andrei