Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I have a source table that records every time a price changes for an item that looks like this:
Product | Zone | Date | Price |
A | X | 5/1/2010 | 3 |
A | X | 6/1/2010 | 5 |
A | X | 8/1/2010 | 4 |
B | X | 4/1/2010 | 1 |
B | X | 6/1/2010 | 3 |
A | Y | 5/1/2010 | 3 |
A | Y | 8/1/2010 | 6 |
I need to end up with a table that only records the dimensions and the amount the price changed by:
Product | Zone | Date | PriceChange |
A | X | 6/1/2010 | 2 |
A | X | 8/1/2010 | -1 |
B | X | 6/1/2010 | 2 |
A | Y | 8/1/2010 | 3 |
Now, this is working fine using FOR - NEXT LOOPS with PEEK and PREVIOUS functions, but it is horribly inefficient. Now that we are scaling to a production database (from 35,000 to 15 million price changes), I do not have 3 weeks to refresh the data. And doing some sort of incremental load would be difficult as well, because each final price change record depends on 2 rows of data.
Does anybody have a more efficient way of accomplishing this?
Hi Aaron, there's no need to use for loops. The following code, is the best solution that came to my mind:
Prices:
load * inline [
Product, Zone, Date, Price
A, X, 5/1/2010, 3
A, X, 6/1/2010, 5
A, X, 8/1/2010, 4
B, X, 4/1/2010, 1
B, X, 6/1/2010, 3
A, Y, 5/1/2010, 3
A, Y, 8/1/2010, 6
];
store Prices into Prices.qvd;
drop table Prices;
Prices:
Load *,
if(rangemaxstring(previous(Product)) <> Product or rangemaxstring(previous(Zone)) <> Zone,1,0) as FlagDelete,
rangesum(-previous(Price),Price) as PriceDelta
from Prices.qvd (qvd);
Where:
right keep(Prices)
load 0 as FlagDelete
autogenerate(1);
drop table Where;
Regards
Well if you just needed the last price minus the first price you could do it with a Group By. But since you care about each price change you need an Order By. Something like the following:
LOAD
if(Product=previous(Product),
if(Price<>Previous(Price),Product)
) as Product,
if(Product=previous(Product),
if(Price<>Previous(Price),Zone)
) as Zone,
if(Product=previous(Product),
if(Price<>Previous(Price),Date)
) as Date,
if(Product=previous(Product),
if(Price<>Previous(Price),Price-peek(Price))
) as PriceChange
RESIDENT data
ORDER BY Product,Date;
Regards,
I forgot to mention that you have to include a "group by Product, Zone, Date" when you load the Prices table for the first time, eventhough I guess you already knew that.
Regards
Hi
Why do you use FOR-NEXT and PEEK to do that?
Woulden't the following be sufficent?
LOAD
BR
Hans
*,Price-Price_prev as PriceChange;Jeez, I was complicating this WAY TOO MUCH. I am sure it is the nested FOR/NEXT loops that are slowing it down. Vlad is correct: I do need each individual price change, not just the first minus the last. Your solution is so simple. Sometimes I think too much.
I will give your code a shot.
Thanks,
Aaron
Vlad's approach worked like a charm. I had to add one more nested IF to handle the change in price zone and I had to add a field for price, since the PEEK looks for the price in the resulting file to calculate the price change. Here was the finished code that did the trick:
PriceChange2:
LOAD IF(ProdOrAggSID=PREVIOUS(ProdOrAggSID),
IF(PriceZoneSID=PREVIOUS(PriceZoneSID),
IF(Price<>PREVIOUS(Price),ProdOrAggSID))) AS ProdOrAggSID,
IF(ProdOrAggSID=PREVIOUS(ProdOrAggSID),
IF(PriceZoneSID=PREVIOUS(PriceZoneSID),
IF(Price<>PREVIOUS(Price),PriceZoneSID))) AS PriceZoneSID,
IF(ProdOrAggSID=PREVIOUS(ProdOrAggSID),
IF(PriceZoneSID=PREVIOUS(PriceZoneSID),
IF(Price<>PREVIOUS(Price),Date))) AS Date,
IF(ProdOrAggSID=PREVIOUS(ProdOrAggSID),
IF(PriceZoneSID=PREVIOUS(PriceZoneSID),
IF(Price<>PREVIOUS(Price),Price-PEEK(Price)))) AS PriceChange,
IF(ProdOrAggSID=PREVIOUS(ProdOrAggSID),
IF(PriceZoneSID=PREVIOUS(PriceZoneSID),
IF(Price<>PREVIOUS(Price),Price))) AS Price
RESIDENT PriceChange1
ORDER BY ProdOrAggSID,PriceZoneSID,Date;
Thanks again for everybody's suggestions.
Aaron