Hi guys,
It's not easy to describe the issue, but I'll do my best to keep it simple:
Our data system drops a .csv file into a folder every day with product information such as product reference no., price and stock amount. I would like to pick that file up every day and append it to the previous day's data, building up historical data that way. For example:
The 20170601 csv file:
ProductRef | Price | Amount |
---|---|---|
10010001 | €100.00 | 50 |
10010002 | €49.99 | 23 |
10020004 | €150.00 | 11 |
The 20170602 csv file:
ProductRef | Price | Amount |
---|---|---|
10010001 | €100.00 | 48 |
10010002 | €49.99 | 23 |
10020004 | €145.00 | 11 |
The 20170603 csv file:
ProductRef | Price | Amount |
---|---|---|
10010001 | €100.00 | 47 |
10010002 | €49.99 | 22 |
10020004 | €145.00 | 11 |
The easiest solution would be to add a date stamp and to concatenate the files:
Date | ProductRef | Price | Amount |
---|---|---|---|
20170601 | 10010001 | €100.00 | 50 |
20170601 | 10010002 | €49.99 | 23 |
20170601 | 10020004 | €150.00 | 11 |
20170602 | 10010001 | €100.00 | 48 |
20170602 | 10010002 | €49.99 | 23 |
20170602 | 10020004 | €145.00 | 11 |
20170603 | 10010001 | €100.00 | 47 |
20170603 | 10010002 | €49.99 | 22 |
20170603 | 10020004 | €145.00 | 11 |
However, the daily file is about 23 columns wide and 130,000 rows long. Appending/concatenating it every single day would lead to a lot of rows that are duplicates except for the date field, and to a huge file size. To prevent this, the ideal end result would look something like this:
From | To | ProductRef | Price | Amount |
---|---|---|---|---|
20170601 | 20170601 | 10010001 | €100.00 | 50 |
20170602 | 20170602 | 10010001 | €100.00 | 48 |
20170603 | 20170603 | 10010001 | €100.00 | 47 |
20170601 | 20170602 | 10010002 | €49.99 | 23 |
20170603 | 20170603 | 10010002 | €49.99 | 22 |
20170601 | 20170601 | 10020004 | €150.00 | 11 |
20170602 | 20170603 | 10020004 | €145.00 | 11 |
Extra tricky: a product could have an initial stock of 50 items on 20170601, slowly be depleted (stock = 0 at 20171231) and then be replenished to a stock of 50 on 20180101. In that case a new record needs to be inserted for 20180101 (not an edit of the existing 20170601 record's "To" date), even though all fields except the date would be the same.
I've been looking into using hash functions to add a unique key to accomplish this, and although that has helped me get a bit closer to a solution, I can't seem to wrap my head around this issue. Is this at all feasible to begin with? And if so, could you point me in the right direction?
All help is very much appreciated!
Cheers!
Hi Wouter,
I think you'll need to decide what you want to get out of the data that you're going to store. You'll need to decide which of your 23 columns have data you want to keep, and hopefully you can disregard some of them. Then, of the columns you can't discard, maybe there are some that you can summarise, that is, aggregate over time periods. Some data you might need to track daily; some you might aggregate over weeks or longer.
So it's possible that instead of building up one big table the best solution would be to create a number of tables, some aggregated over weeks, some aggregated over months and some not aggregated at all.
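For example, a weekly aggregate of the stock column might be built along these lines (a rough sketch, assuming a concatenated daily table like the AllDays one sketched earlier in the thread; the aggregation functions are just examples):

// Rough sketch: reduce the daily stock figures to one row per product per week.
TmpWeekly:
LOAD
    WeekStart(Date) as Week,
    ProductRef,
    Amount
RESIDENT AllDays;

WeeklyStock:
LOAD
    Week,
    ProductRef,
    Avg(Amount) as AvgAmount,
    Min(Amount) as MinAmount
RESIDENT TmpWeekly
GROUP BY Week, ProductRef;

DROP TABLE TmpWeekly;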
You may even only add records to your table(s) that show a change in a value from the previous day.
Sounds like fun!
Good luck
Andrew
Perhaps like this:
MyTable:
LOAD
min(Date) as From,
max(Date) as To,
ProductRef,
Price,
Amount
FROM
GROUP BY
ProductRef,
Price,
Amount
;
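One caveat with a plain GROUP BY: in the "extra tricky" case from the question, a ProductRef/Price/Amount combination that reappears after a gap (stock back at 50 on 20180101) would be merged with the earlier period, because Min/Max run over all matching rows. A way around that is to number blocks of consecutive unchanged rows first and include that block id in the GROUP BY. A rough sketch, again assuming a concatenated daily table like AllDays above (the Peek-based comparison and field names are illustrative, not a tested implementation):

// Rough sketch: number blocks of consecutive days on which nothing changed,
// then collapse each block into one From/To record. Assumes a concatenated
// daily table AllDays with a Date field, as sketched earlier in the thread.
Blocks:
LOAD
    *,
    If(ProductRef = Peek('ProductRef')
         and Price = Peek('Price')
         and Amount = Peek('Amount'),
       Peek('BlockId'),                          // unchanged from the previous day: same block
       RangeSum(Peek('BlockId'), 1)) as BlockId  // any change (or a new product): start a new block
RESIDENT AllDays
ORDER BY ProductRef, Date;

DROP TABLE AllDays;

MyTable:
LOAD
    Date(Min(Date)) as From,
    Date(Max(Date)) as To,
    ProductRef,
    Price,
    Amount
RESIDENT Blocks
GROUP BY BlockId, ProductRef, Price, Amount;

DROP TABLE Blocks;

Because a block ends as soon as any of the grouped values changes, the replenished stock on 20180101 starts a new record instead of stretching the 20170601 one.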
Hi guys,
Thanks for your help. I have decided to go for Andrew's approach; it was the fastest and easiest to accomplish, and the resulting .qvd file is not as big as I'd expected (17 MB for 10 days of data), with the daily increase shrinking over time (+3.5 MB in the first 3 days, +1.5 MB in the last 2 days).