Anonymous
Not applicable

Incremental load of stock & price data

Hi guys,

Not easy to describe the issue, but I'll do my best to keep it simple:
Our data system drops a .csv file every day with product information such as product reference no., price, stock amount, etc in a folder. I would like to pick it up every day and add it to the previous day's file, building up historic data by doing so. For example:

The 20170601 csv file:

ProductRef   Price     Amount
10010001     €100.00   50
10010002     €49.99    23
10020004     €150.00   11

The 20170602 csv file:

ProductRef   Price     Amount
10010001     €100.00   48
10010002     €49.99    23
10020004     €145.00   11

The 20170603 csv file:

ProductRef   Price     Amount
10010001     €100.00   47
10010002     €49.99    22
10020004     €145.00   11

The easiest solution would be to add a date stamp and to concatenate the files:

Date       ProductRef   Price     Amount
20170601   10010001     €100.00   50
20170601   10010002     €49.99    23
20170601   10020004     €150.00   11
20170602   10010001     €100.00   48
20170602   10010002     €49.99    23
20170602   10020004     €145.00   11
20170603   10010001     €100.00   47
20170603   10010002     €49.99    22
20170603   10020004     €145.00   11

However, the daily file is about 23 columns wide and 130,000 rows long. Appending/concatenating this every single day would lead to a lot of duplicate rows (identical except for the date field) and a huge file size. To prevent this, the ideal end result would look something like:

From       To         ProductRef   Price     Amount
20170601   20170601   10010001     €100.00   50
20170602   20170602   10010001     €100.00   48
20170603   20170603   10010001     €100.00   47
20170601   20170602   10010002     €49.99    23
20170603   20170603   10010002     €49.99    22
20170601   20170601   10020004     €150.00   11
20170602   20170603   10020004     €145.00   11

The extra tricky part: a product could have an initial stock of 50 items on 20170601, slowly be depleted (stock = 0 on 20171231) and be replenished to a stock of 50 on 20180101. In that case a new record needs to be inserted for 20180101, rather than editing the existing 20170601 record's "To" date, even though all fields except the dates would be the same.

I've been looking into using hash functions to add a unique key to accomplish this, and though it has helped me get a bit closer to a solution, I can't seem to wrap my head around this issue. Is this even feasible to begin with? And if so, could you point me in the right direction?
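For reference, the hash experiment so far looks roughly like this (just a sketch with the three example columns; the real file has 23, and the file name and format spec are made up):

// A key hashed over every non-date column, so a row whose values are
// unchanged from one day to the next gets the same key.
Daily:
LOAD
     ProductRef,
     Price,
     Amount,
     Hash128(ProductRef, Price, Amount) as RowKey
FROM [20170603.csv]
(txt, utf8, embedded labels, delimiter is ';');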

All help is very much appreciated!

Cheers!

3 Replies
Gysbert_Wassenaar

Perhaps like this:

MyTable:
LOAD
     min(Date) as From,
     max(Date) as To,
     ProductRef,
     Price,
     Amount
FROM
    (txt, ..etc..)
GROUP BY
     ProductRef,
     Price,
     Amount;
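The Date isn't in the csv itself, so you'd derive it from the file name while loading all the daily files, something like this (just a sketch; the folder connection, yyyymmdd.csv file naming and format spec are guesses):

// Load every daily file in one go and take the date from the file name.
// FileBaseName() returns the current file's name without path or
// extension, e.g. '20170601'.
Data:
LOAD
     Date#(FileBaseName(), 'YYYYMMDD') as Date,
     ProductRef,
     Price,
     Amount
FROM [lib://ProductData/*.csv]     // assumed folder connection
(txt, utf8, embedded labels, delimiter is ';');

The grouped load above could then read RESIDENT Data instead of loading from the files directly.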


talk is cheap, supply exceeds demand
effinty2112
Master

Hi Wouter,

I think you'll need to decide what you want to get out of the data you're going to store. Decide which of your 23 columns hold data you want to keep, and hopefully you can disregard some of them. Then, of the columns you can't discard, maybe there are some you can summarise, that is, aggregate over time periods. Some data you might need to track daily, some you might aggregate over weeks or longer.

So it's possible that instead of building up one big table the best solution would be to create a number of tables, some aggregated over weeks, some aggregated over months and some not aggregated at all.

You may even only add records to your table(s) that show a change in a value from the previous day.
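To make that last idea concrete, here is a rough sketch of a change-only load. The folder connection, the yyyymmdd.csv file naming, the vToday variable and the QVD names are all assumptions, and the very first run would need to create the two QVDs from the first day's file:

// Keys of yesterday's full snapshot, stored on the previous run.
// Loading them under their own field name means Exists() below only
// compares against yesterday's values.
PrevKeys:
LOAD RowKey as PrevKey
FROM [lib://ProductData/Snapshot.qvd] (qvd);

// Today's full snapshot, with a hash over all value columns
// (only 3 of the 23 shown here).
Snapshot:
LOAD
     Date#(FileBaseName(), 'YYYYMMDD') as Date,
     ProductRef,
     Price,
     Amount,
     Hash128(ProductRef, Price, Amount) as RowKey
FROM [lib://ProductData/$(vToday).csv]     // e.g. 20170603.csv
(txt, utf8, embedded labels, delimiter is ';');

// Append only the rows whose values changed since yesterday.
History:
NoConcatenate
LOAD * FROM [lib://ProductData/History.qvd] (qvd);

Concatenate (History)
LOAD * RESIDENT Snapshot
WHERE not Exists(PrevKey, RowKey);

// Persist the history and keep today's snapshot for tomorrow's run.
STORE History INTO [lib://ProductData/History.qvd] (qvd);
STORE Snapshot INTO [lib://ProductData/Snapshot.qvd] (qvd);
DROP TABLES PrevKeys, Snapshot;

Each record in History then marks the start of a period, like the From column in your example; the To date could be derived later from the next change for the same ProductRef.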

Sounds like fun!

Good luck

Andrew

Anonymous
Not applicable
Author

Hi guys,

Thanks for your help. I have decided to go for Andrew's approach; this was the fastest and easiest to accomplish, and the resulting .qvd file is not as big as I'd expected (17 MB for 10 days of data), with the daily growth decreasing (+3.5 MB over the first 3 days, +1.5 MB over the last 2 days).