Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi,
I am developing an App where I have some data that I need to transform in the load script.
Currently I am having the following issue:
I have extracted a small part of the data set and as you can see in the upper screenshot on the 11th of May 2023 I all of a sudden have two prices for that date (onwards it is always two prices available, where before the price for 1200 was always 0).
Now I would like to transfrom this data so that for 10/05/2023 it returns me only the row with ProductNr 1000 and price 1,274449. For 11/05/2023 I want it to return me the price for ProductNr 1200.
I need the specific ProductNr for that day to be 100 for anything before 11/05/2023 and 1200 for evertything after in this case, it might also switch to 1400 and 1600 (so there might be 4 prices for one specific date) further along the years. Basically it should always take the price where the ProductNr is the Max(ProductNr).
In which way can I achieve this? Thanks in advance.
ProductNr | Location | Date | Price |
1000 | King of Prussia | 01/05/2023 | 1,274449 |
1200 | King of Prussia | 01/05/2023 | 0 |
1000 | King of Prussia | 02/05/2023 | 1,274449 |
1200 | King of Prussia | 02/05/2023 | 0 |
1000 | King of Prussia | 03/05/2023 | 1,274449 |
1200 | King of Prussia | 03/05/2023 | 0 |
1000 | King of Prussia | 04/05/2023 | 1,274449 |
1200 | King of Prussia | 04/05/2023 | 0 |
1000 | King of Prussia | 05/05/2023 | 1,274449 |
1200 | King of Prussia | 05/05/2023 | 0 |
1000 | King of Prussia | 06/05/2023 | 1,274449 |
1200 | King of Prussia | 06/05/2023 | 0 |
1000 | King of Prussia | 07/05/2023 | 1,274449 |
1200 | King of Prussia | 07/05/2023 | 0 |
1000 | King of Prussia | 08/05/2023 | 1,274449 |
1200 | King of Prussia | 08/05/2023 | 0 |
1000 | King of Prussia | 09/05/2023 | 1,274449 |
1200 | King of Prussia | 09/05/2023 | 0 |
1000 | King of Prussia | 10/05/2023 | 1,274449 |
1200 | King of Prussia | 10/05/2023 | 0 |
1000 | King of Prussia | 11/05/2023 | 1,274449 |
1200 | King of Prussia | 11/05/2023 | 1,21865 |
1000 | King of Prussia | 12/05/2023 | 1,274449 |
1200 | King of Prussia | 12/05/2023 | 1,21865 |
1000 | King of Prussia | 13/05/2023 | 1,274449 |
1200 | King of Prussia | 13/05/2023 | 1,21865 |
1000 | King of Prussia | 14/05/2023 | 1,274449 |
1200 | King of Prussia | 14/05/2023 | 1,21865 |
1000 | King of Prussia | 15/05/2023 | 1,274449 |
1200 | King of Prussia | 15/05/2023 | 1,21865 |
1000 | King of Prussia | 16/05/2023 | 1,274449 |
1200 | King of Prussia | 16/05/2023 | 1,21865 |
Instead of:
Hi, you can try reducing data using:
Inner Join (DataTableName)
LOAD
Date,
Max(ProductNr) as ProductNr
Resident DataTableName
Where Price <>0
Group By Date;
Or instead of inner join use left join to add a new field like "1 as isLastPrice" and use this flag in set analysis to get the last price.
Thanks for the suggestion Ruben, I managed to get a first result, now tho there is the possibility that actually we should not only look at the highest ProductNr because it is possible that it had already switched before from the higher to a lower one (in this case 1200 was ued in 2022 and not 1000). I suppose we have to include that it always look at which line has the Max(Date).
Table example:
ProductNr | Location | Date | Price |
1000 | King of Prussia | 01/05/2022 | 0 |
1200 | King of Prussia | 01/05/2022 | 1,24 |
1000 | King of Prussia | 02/05/2022 | 0 |
1200 | King of Prussia | 02/05/2022 | 1,24 |
1000 | King of Prussia | 01/05/2022 | 0 |
1200 | King of Prussia | 01/05/2022 | 1,24 |
1000 | King of Prussia | 02/05/2022 | 0 |
1200 | King of Prussia | 02/05/2022 | 1,24 |
1000 | King of Prussia | 01/05/2022 | 0 |
1200 | King of Prussia | 01/05/2022 | 1,24 |
1000 | King of Prussia | 01/05/2023 | 1,274449 |
1200 | King of Prussia | 01/05/2023 | 0 |
1000 | King of Prussia | 02/05/2023 | 1,274449 |
1200 | King of Prussia | 02/05/2023 | 0 |
1000 | King of Prussia | 03/05/2023 | 1,274449 |
1200 | King of Prussia | 03/05/2023 | 0 |
1000 | King of Prussia | 04/05/2023 | 1,274449 |
1200 | King of Prussia | 04/05/2023 | 0 |
1000 | King of Prussia | 05/05/2023 | 1,274449 |
1200 | King of Prussia | 05/05/2023 | 0 |
1000 | King of Prussia | 06/05/2023 | 1,274449 |
1200 | King of Prussia | 06/05/2023 | 0 |
1000 | King of Prussia | 07/05/2023 | 1,274449 |
1200 | King of Prussia | 07/05/2023 | 0 |
1000 | King of Prussia | 08/05/2023 | 1,274449 |
1200 | King of Prussia | 08/05/2023 | 0 |
1000 | King of Prussia | 09/05/2023 | 1,274449 |
1200 | King of Prussia | 09/05/2023 | 0 |
1000 | King of Prussia | 10/05/2023 | 1,274449 |
1200 | King of Prussia | 10/05/2023 | 0 |
1000 | King of Prussia | 11/05/2023 | 1,274449 |
1200 | King of Prussia | 11/05/2023 | 1,21865 |
1000 | King of Prussia | 12/05/2023 | 1,274449 |
1200 | King of Prussia | 12/05/2023 | 1,21865 |
1000 | King of Prussia | 13/05/2023 | 1,274449 |
1200 | King of Prussia | 13/05/2023 | 1,21865 |
1000 | King of Prussia | 14/05/2023 | 1,274449 |
1200 | King of Prussia | 14/05/2023 | 1,21865 |
Hi, the conditions are not clear, maybe you only need to remove the 0's before doing the join
Hi Ruben,
The data looks like this now:
ProductNr | Location | Date | Price |
1200 | King of Prussia | 01/05/2023 | 1,16260000000000000000 |
1000 | King of Prussia | 01/05/2023 | 0 |
1200 | King of Prussia | 02/05/2023 | 1,16260000000000000000 |
1000 | King of Prussia | 02/05/2023 | 0 |
1200 | King of Prussia | 03/05/2023 | 1,16260000000000000000 |
1000 | King of Prussia | 03/05/2023 | 1,34937899999999990000 |
1200 | King of Prussia | 04/05/2023 | 1,16260000000000000000 |
1000 | King of Prussia | 04/05/2023 | 1,34937899999999990000 |
1200 | King of Prussia | 05/05/2023 | 1,16260000000000000000 |
1000 | King of Prussia | 05/05/2023 | 1,34937899999999990000 |
1200 | King of Prussia | 06/05/2023 | 1,274449 |
1000 | King of Prussia | 06/05/2023 | 0 |
1200 | King of Prussia | 07/05/2023 | 1,274449 |
1000 | King of Prussia | 07/05/2023 | 0 |
1200 | King of Prussia | 08/05/2023 | 1,274449 |
1000 | King of Prussia | 08/05/2023 | 0 |
As you can see this product started with product 1200 price (1,16) only being available where it then switched to 1000 (1,34) becoming available for product 1000 as of 03/05/2023). At this point it should only show me the price of 1000 as it is the latest availabe price (even though the ProductNr is lower, we can ignore that actually, it should always take the latest availabe price within the ProdcutNr). Then on 06/005/2023 the 1000 Price is not available anymore and the new 1200 price is available and should be used.
So we end up with only this:
ProductNr | Location | Date | Price |
1200 | King of Prussia | 01/05/2023 | 1,16260000000000000000 |
1200 | King of Prussia | 02/05/2023 | 1,16260000000000000000 |
1000 | King of Prussia | 03/05/2023 | 1,34937899999999990000 |
1000 | King of Prussia | 04/05/2023 | 1,34937899999999990000 |
1000 | King of Prussia | 05/05/2023 | 1,34937899999999990000 |
1200 | King of Prussia | 06/05/2023 | 1,274449 |
1200 | King of Prussia | 07/05/2023 | 1,274449 |
1200 | King of Prussia | 08/05/2023 | 1,274449 |
HI, but if the dates are the same ¿which one is the latest? If you want to retrieve the minimun ProdcutNr when the date is the same use Min instead of Max:
Inner Join (DataTableName)
LOAD
Date,
Min(ProductNr) as ProductNr
Resident DataTableName
Where Price <>0
Group By Date;
Hi Ruben,
That is exactly what my question was haha. Apparently rows were being created for both product in the database. I have gotten the dataset changed now to the following:
ProductNr | Location | Date | Price |
1200 | King of Prussia | 04/08/2012 | 1.1624 |
1200 | King of Prussia | 31/08/2012 | 1.1552 |
1200 | King of Prussia | 19/09/2012 | 1.1654 |
1200 | King of Prussia | 30/09/2012 | 1.1625 |
1200 | King of Prussia | 18/10/2012 | 1.1626 |
1200 | King of Prussia | 31/12/2012 | 1.1626 |
1000 | King of Prussia | 23/01/2013 | 1.1518 |
1000 | King of Prussia | 31/01/2013 | 1.1626 |
1000 | King of Prussia | 01/03/2013 | 1.175 |
1000 | King of Prussia | 03/04/2013 | 1.1593 |
1000 | King of Prussia | 30/04/2013 | 1.1615 |
1000 | King of Prussia | 13/05/2013 | 1.1017 |
1000 | King of Prussia | 30/06/2013 | 1.1181 |
1200 | King of Prussia | 01/08/2013 | 1.1347 |
1200 | King of Prussia | 27/08/2013 | 1.1027 |
As you can see it is not possible for two Products to be available on the same date. Now the desired end table should have a Price for every date possible in the dataset. So I think the solution would be to create a EndDate based on Location/Date (so ignoring ProductNr). For example:
ProductNr | Location | Date | Price | EndDate |
1200 | King of Prussia | 04/08/2012 | 1.1624 | 30/08/2012 |
1200 | King of Prussia | 31/08/2012 | 1.1552 | 18/09/2012 |
1200 | King of Prussia | 19/09/2012 | 1.1654 | 29/09/2012 |
1200 | King of Prussia | 30/09/2012 | 1.1625 | 17/10/2012 |
1200 | King of Prussia | 18/10/2012 | 1.1626 | 30/12/2012 |
1200 | King of Prussia | 31/12/2012 | 1.1626 | 22/01/2013 |
1000 | King of Prussia | 23/01/2013 | 1.1518 | 30/01/2013 |
1000 | King of Prussia | 31/01/2013 | 1.1626 | 28/02/2013 |
1000 | King of Prussia | 01/03/2013 | 1.175 | 02/04/2013 |
1000 | King of Prussia | 03/04/2013 | 1.1593 | 29/04/2013 |
1000 | King of Prussia | 30/04/2013 | 1.1615 | 12/05/2013 |
1000 | King of Prussia | 13/05/2013 | 1.1017 | 29/06/2013 |
1000 | King of Prussia | 30/06/2013 | 1.1181 | 31/07/2013 |
1200 | King of Prussia | 01/08/2013 | 1.1347 | 26/08/2013 |
1200 | King of Prussia | 27/08/2013 | 1.1027 | 27/08/2013 |
With the Range between Date and EndDate we can then create a line per date with that specific price that was valid. You know what I mean? How can I achieve this?
ProductNr | Location | Date | Price | EndDate |
1200 | King of Prussia | 31/12/2012 | 1.1626 | 22/01/2013 |
1200 | King of Prussia | 01/01/2013 | 1.1626 | 22/01/2013 |
1200 | King of Prussia | 02/01/2013 | 1.1626 | 22/01/2013 |
1200 | King of Prussia | 03/01/2013 | 1.1626 | 22/01/2013 |
1200 | King of Prussia | 04/01/2013 | 1.1626 | 22/01/2013 |
. | ||||
. | ||||
1200 | King of Prussia | 22/01/2013 | 1.1626 | 22/01/2013 |
1000 | King of Prussia | 23/01/2013 | 1.1518 | 30/01/2013 |
1000 | King of Prussia | 24/01/2013 | 1.1518 | 30/01/2013 |
1000 | King of Prussia | 25/01/2013 | 1.1518 | 30/01/2013 |
1000 | King of Prussia | 26/01/2013 | 1.1518 | 30/01/2013 |
. | ||||
. |
Hi, to create the middle table you can do a sorted load by date descendant as use peek to retrieve the previous Date to create EndDate
DataWithEndDate:
LOAD
ProductNr,
Location,
Date,
Price
If(Location=Peek('Location')
,Date(Peek('Date')-1
,Date // or YearEnd(Date)
) as EndDate
Resident DataTableName
Order By Location, Date desc;
And from this table you can create all dates with:
DataWithAllDates:
NoConcatenate LOAD
ProductNr,
Location,
Date(Date+IterNo()-1) as Date,
Price,
DateEnd
Resident DataWithEndDate
While Date+IterNo()-1<=DateEnd;
DROP Table DataWithEndDate;
Hi Ruben,
Thanks this works for the small dataset with only one location. But once other Locations get added it does not seems to return correct values (EndDate always is the EndOfYear Date).
This you can just paste in Qlik Sense to see what I mean:
Instead of: