Hi,
I hope someone can help me with a problem I'm facing. I have a lot of price files that I need to turn into a lookup table.
The price files have ~50k rows each, are dumped once per day, and look like this:
ArtNum | Date | Price |
12345 | 20210420 | 1000 |
From these files, I want to generate a lookup table that looks like this:
ArtNum | DateFrom | DateTo | Price |
12345 | 20210101 | 20210211 | 870 |
12345 | 20210212 | 20210310 | 950 |
12345 | 20210311 | 20210401 | 1000 |
12345 | 20210402 | 20210421 | 950 |
Any tips on how I can achieve this?
Then I'm planning to use IntervalMatch to map the correct price to each record (roughly as sketched below).
Or if there is an easier/faster/alternative way to achieve what I'm trying to do, please let me know.
I've previously made a huge file where I created a key out of ArtNum and Date to link up the correct price, but that file runs to tens of millions of records, so I don't see it as a viable solution.
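For reference, the IntervalMatch step I have in mind would look roughly like this (just a sketch; PriceLookup is the finished lookup table from above, and the inline transactions are placeholders for my real data):
// hypothetical fact table with ArtNum and Date
Transactions:
load * inline [
ArtNum, Date
12345, 20210115
12345, 20210320
];
// extended IntervalMatch: link each Date to its DateFrom-DateTo range per ArtNum
Inner Join (PriceLookup)
IntervalMatch (Date, ArtNum)
load DateFrom, DateTo, ArtNum Resident PriceLookup;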
try this:
data:
// tag each row with the row number where its price run started
// (assumes the input is sorted by ArtNum and Date)
load if(peek('Price')=Price, peek('RNUM'), RowNo()) as RNUM, *;
load * inline [
ArtNum, Date, Price
12345, 1/1/2021, 100
12345, 1/2/2021, 100
12345, 1/3/2021, 100
12345, 1/4/2021, 101
12345, 1/5/2021, 101
12345, 1/6/2021, 100
];
// collapse each run into a single row with a from/to date range
lookup:
load date(min(Date)) as DateFrom, date(max(Date)) as DateTo, RNUM, Price, ArtNum
resident data
group by RNUM, Price, ArtNum
;
drop table data;
the first part, where the RNUM field is created, just identifies contiguous runs of the same price. prices can go up and down - an article can come back to a price it had before. since my solution takes min and max dates per group, if you don't separate those non-contiguous runs you will get overlapping start and end dates (see the illustration below).
you can drop RNUM afterwards - it only exists to make the group by work - but keeping it may help in testing.
this assumes prices can go back to an earlier value; if not, you don't need the RNUM field. but if possible, working out which dates belong together should be done outside of QV (in an ETL tool), as the peek function is slow.
Heck, this should really be done in ETL
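to see the overlap with the sample above: grouping only by Price and ArtNum merges the two 100-runs into one range, 1/1/2021-1/6/2021, which overlaps the 101 range:
// without RNUM (don't do this) - both 100-runs collapse into a single row
// (would have to run before 'drop table data')
badLookup:
load date(min(Date)) as DateFrom, date(max(Date)) as DateTo, Price, ArtNum
resident data
group by Price, ArtNum;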
this is the result btw:
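(running the script above on the sample data gives:)
ArtNum | DateFrom | DateTo | Price | RNUM
12345 | 1/1/2021 | 1/3/2021 | 100 | 1
12345 | 1/4/2021 | 1/5/2021 | 101 | 4
12345 | 1/6/2021 | 1/6/2021 | 100 | 6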
Works exactly like I wanted. Thank you so much, really appreciate your time and help. 😊
I was trying out the Previous(Price)=Price thing myself too, but couldn't quite get my head around how to fix the problem I was facing.
You are probably right about the ETL, but we don't have that kind of software (yet) at my company, so you use what you have. Hehe. Now let's see how quickly this runs with a few million rows.
you are welcome.
the solution needs to be tweaked a little if you are talking about millions of rows. i would run this once for the full history, get the maximum date (here you need to profile the data and understand how it is updated), and save the result to a QVD.
then for the next runs, do incremental loads - don't reprocess everything. pick up only the rows that are newer than the date you saved. you may want to play it safe and reprocess the current month, or everything up to last week, depending on how the data changes.
of course this assumes historical prices don't change, which i think is reasonable.
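the skeleton of such an incremental run could look something like this (a rough sketch; file names and paths are placeholders):
// load the ranges produced by earlier runs
lookup:
load ArtNum, DateFrom, DateTo, Price, RNUM
from [PriceLookup.qvd] (qvd);
// find where the last run stopped
tmp:
load date(max(DateTo)) as lastDate resident lookup;
let vLastDate = peek('lastDate', 0, 'tmp');
drop table tmp;
// read only the new daily dumps
newData:
load ArtNum, Date, Price
from [DailyPrices.csv] (txt, embedded labels, delimiter is '|')
where Date > date('$(vLastDate)');
// ...apply the same RNUM/group-by logic to newData, merge the result
// into lookup, then save the combined table for the next run:
store lookup into [PriceLookup.qvd] (qvd);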
this was very interesting, so i did a quick and dirty test of an idea.
the pattern is that you get updated pricing daily. it is tempting to just process the single incoming date (that would be simple), but you need to allow for breakdowns - what if the source wasn't able to run and sends you a catch-up file with a week of new data?
the solution below assumes you have a baseline from prior runs that have already digested the earlier data and transformed it into start/end ranges. you then only need to decide which dates to process today. if the answer is simply to pick up where the prior runs left off, just find the last date.
then you loop through the dates you want to add (the logic is much the same as the peek approach - i can't think of anything cleverer; there's probably a mathematical solution here, but hey, it's free advice after all).
you also need to know the latest date range per article number, since you may want to extend it, or create a new date range if the price is different.
the following code incorporates two sets of logic that should really be split, but they are together here to demo the idea: the first part runs one time only, and the second part is your incremental process.
//one-time baseline build: digest the full history into ranges
data:
load if(peek('Price')=Price, peek('RNUM'), RowNo()) as RNUM, *; //runs of equal prices; input sorted by ArtNum, Date
load * inline [
ArtNum, Date, Price
12345, 1/1/2021, 100
12345, 1/2/2021, 100
12345, 1/3/2021, 100
12345, 1/4/2021, 101
12345, 1/5/2021, 101
12345, 1/6/2021, 100
22345, 1/1/2021, 200
22345, 1/2/2021, 200
22345, 1/3/2021, 200
22345, 1/4/2021, 201
22345, 1/5/2021, 201
22345, 1/6/2021, 200
];
//collapse each run into one row with a date range:
lookup:
load date(min(Date)) as DateFrom, date(max(Date)) as DateTo, RNUM, Price, ArtNum
resident data
group by RNUM, Price, ArtNum
;
//flag the latest price range per article (max RNUM) as Current:
left join (lookup)
load ArtNum, max(RNUM) as RNUM, 1 as Current
Resident lookup
group by ArtNum;
//build the baseline: Historical holds the closed ranges, Current the open ones:
NoConcatenate
Historical:
load * Resident lookup where isnull(Current);
NoConcatenate
Current:
load * Resident lookup where Current=1;
//remember the last RNUM used and the last date processed:
config:
NoConcatenate
load max(RNUM) as lastRNUM, date(max(DateTo)) as lastDate
resident lookup;
let vLastRNUM = peek('lastRNUM', 0, 'config');
let vLastDate = peek('lastDate', 0, 'config');
drop table data, lookup;
//incremental part: a new file arrives, possibly a multi-day catch-up
NoConcatenate
newData:
load ArtNum, Date as NewDate, Price as NewPrice inline [
ArtNum, Date, Price
12345, 1/1/2021, 100
12345, 1/2/2021, 100
12345, 1/3/2021, 100
12345, 1/4/2021, 101
12345, 1/5/2021, 101
12345, 1/6/2021, 100
12345, 1/7/2021, 100
12345, 1/8/2021, 100
12345, 1/9/2021, 100
12345, 1/10/2021, 101
12345, 1/11/2021, 101
12345, 1/12/2021, 100
22345, 1/1/2021, 200
22345, 1/2/2021, 200
22345, 1/3/2021, 200
22345, 1/4/2021, 201
22345, 1/5/2021, 201
22345, 1/6/2021, 200
22345, 1/7/2021, 209
22345, 1/8/2021, 209
22345, 1/9/2021, 200
22345, 1/10/2021, 205
22345, 1/11/2021, 205
22345, 1/12/2021, 209
]
where Date > date('$(vLastDate)'); //keep only dates the baseline has not seen yet
//loop over the incoming dates, oldest first
FOR Each Dt in FieldValueList('NewDate')
//attach that day's price row to each article's current range
//(note: this assumes every article has a row for every incoming date)
left join (Current)
load * Resident newData where NewDate = Date('$(Dt)');
//price changed: the current range is closed, move it to Historical
Concatenate (Historical)
load DateFrom, DateTo, RNUM, Price, ArtNum, 'Added' as Note Resident Current where Price <> NewPrice;
//rebuild Current: extend the range when the price is unchanged...
NoConcatenate
NewCurrent:
load DateFrom as DateFrom, NewDate as DateTo, RNUM as RNUM, Price as Price, ArtNum as ArtNum Resident Current where Price = NewPrice;
//...or start a new one-day range when it changed
Concatenate (NewCurrent)
load NewDate as DateFrom, NewDate as DateTo, $(vLastRNUM) + RowNo() as RNUM, NewPrice as Price, ArtNum as ArtNum Resident Current where Price <> NewPrice;
drop table Current;
rename table NewCurrent to Current;
//refresh the RNUM high-water mark for the next date in the loop
drop table config;
config:
NoConcatenate
load max(RNUM) as lastRNUM
resident Current;
let vLastRNUM = peek('lastRNUM', 0, 'config');
next;
//merge the closed and the still-open ranges into the final lookup table
rename table Historical to lookup;
concatenate (lookup) load *, 1 as Current resident Current;
drop table newData, Current;
i've tested it and it does work. at least this uses table joins and only loops through the incremental dates - which should be very few if the systems are monitored correctly - rather than looping through millions of records. good luck
this could be simplified:
NoConcatenate
NewCurrent:
load DateFrom as DateFrom, NewDate as DateTo, RNUM as RNUM, Price as Price, ArtNum as ArtNum Resident Current where Price = NewPrice;
Concatenate (NewCurrent)
load NewDate as DateFrom, NewDate as DateTo, $(vLastRNUM) + Rowno() as RNUM, NewPrice as Price, ArtNum as ArtNum Resident Current where Price <> NewPrice;
into a single load, with if() expressions in the fields.
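for example (an untested sketch - note RowNo() now numbers every row of the combined load, so the new RNUM values get gaps, but they stay unique, which is all the group key needs):
NoConcatenate
NewCurrent:
load
if(Price = NewPrice, DateFrom, NewDate) as DateFrom,
NewDate as DateTo,
if(Price = NewPrice, RNUM, $(vLastRNUM) + RowNo()) as RNUM,
if(Price = NewPrice, Price, NewPrice) as Price,
ArtNum
Resident Current;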