Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi all,
In the load script I want to populate missing data based on linear trend between 2 existing data points.
So for example my raw data is this:
Date | Value |
---|---|
01/01/12 | 10 |
02/01/12 | 9 |
04/01/12 | 7 |
05/01/12 | 6 |
08/01/12 | 3 |
So I want Qlikview to fill in the missing dates. I have a master calendar as follows and when I join the data above with it, I get this:
Date | Value |
---|---|
01/01/12 | 10 |
02/01/12 | 9 |
03/01/12 | |
04/01/12 | 7 |
05/01/12 | 6 |
06/01/12 | |
07/01/12 | |
08/01/12 | 3 |
Simple linear calculation would allow me to determine that the value for 03/01/12 should be 8, simply by taking the 2 values either side of it.
How do I do this in the load script for a large data set.
Thanks
Dan
Hi Dan,
Without knowing further on your datamodel, you can do another load statement RESIDENT the table joined, ordered by date, so if Value is empty, it adds the value from the previous record minus one:
NOCONCATENATE LOAD *,
If(Len(Value) = 0, Previous(Value) -1, Value) AS Value
RESIDENT Table
ORDER BY Date ASC;
It could be possible to do this in one step, if the JOIN results on an already ordered table.
Hope that helps.
Miguel
Thanks, but...
I used a simple relationship of -1 for the purposes of showing the example, but in reality the linear relationship between the dates will not be that simple. Example of real data:
Date | Value |
---|---|
01/01/12 | 356 |
02/01/12 | |
03/01/12 | 321 |
04/01/12 | 318 |
05/01/12 | |
06/01/12 | |
07/01/12 | 253 |
Useage is not constant, but I want to assume a linear relationship between 2 dates where a value is missing, and calculate what the intermediate value should be based on that linear regression.
I may have a gap of several days between values, not just one or two, and I need to populate several missing values based on a linear line between the known values either side.
Thanks