This issue isn't easy to describe, but I'll do my best to keep it simple:
Our data system drops a .csv file into a folder every day with product information such as product reference number, price, stock amount, etc. I would like to pick the file up every day and append it to the previous day's file, building up historic data as I go. For example:
The 20170601 csv file:
The 20170602 csv file:
The 20170603 csv file:
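For illustration, miniature versions of those files (column names and values are invented stand-ins for the real 23-column data) could look like:

```
# 20170601.csv          # 20170602.csv          # 20170603.csv
ref,price,stock         ref,price,stock         ref,price,stock
A001,9.99,50            A001,9.99,49            A001,9.99,49
B002,4.50,12            B002,4.50,12            B002,4.75,12
```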
The easiest solution would be to add a date stamp to each row and concatenate the files:
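A minimal sketch of that naive approach with pandas (the file contents and column names here are hypothetical stand-ins for the real files):

```python
import pandas as pd
from io import StringIO

# Hypothetical daily snapshots standing in for the real .csv files.
files = {
    "20170601": "ref,price,stock\nA001,9.99,50\nB002,4.50,12\n",
    "20170602": "ref,price,stock\nA001,9.99,49\nB002,4.50,12\n",
}

frames = []
for date, text in files.items():
    df = pd.read_csv(StringIO(text))
    df["date"] = date          # stamp every row with the file's date
    frames.append(df)

# One long table: one row per product per day.
history = pd.concat(frames, ignore_index=True)
print(history)
```

In the real setup the loop body would be `pd.read_csv(path)` over the files in the folder, with the date parsed from each file name.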
However, the daily file is about 23 columns wide and 130,000 rows long. Appending/concatenating it every single day would produce an enormous number of rows that are duplicates except for the date field, and a huge file. To prevent this, the ideal end result would look something like:
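That is, one record per unchanged stretch, with a date range instead of one row per day (hypothetical values continuing the example above):

```
ref,price,stock,date_from,date_to
A001,9.99,50,20170601,20170601
A001,9.99,49,20170602,20170603
B002,4.50,12,20170601,20170603
```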
Extra tricky: a product could have an initial stock of 50 items on 20170601, slowly be depleted (stock = 0 at 20171231), and be replenished to a stock of 50 on 20180101. In that case a new record needs to be inserted on 20180101 (not an edit of the existing 20170601 record's "to" date), even though every field except the dates would be the same.
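One way to handle exactly that requirement is to split each product's day-by-day history into "runs": a new run starts whenever any attribute differs from the previous day's row, so a value that reappears later starts a fresh record rather than merging with the old one. A sketch with pandas, on invented data that reproduces the deplete-then-replenish case:

```python
import pandas as pd

# Hypothetical concatenated history: one row per product per day.
history = pd.DataFrame({
    "date":  ["20170601", "20170602", "20170603", "20170604"],
    "ref":   ["A001"] * 4,
    "price": [9.99] * 4,
    "stock": [50, 0, 0, 50],        # depleted, then replenished to 50
})

attrs = ["price", "stock"]          # every column except ref and date
history = history.sort_values(["ref", "date"]).reset_index(drop=True)

# Compare each row with the previous day's row for the same product;
# a new run starts on any difference (the first row always starts one).
prev = history.groupby("ref")[attrs].shift()
history["run"] = history[attrs].ne(prev).any(axis=1).cumsum()

# Collapse each run into a single record with a from/to date range.
compact = (
    history.groupby(["ref", "run"] + attrs, as_index=False)
           .agg(date_from=("date", "min"), date_to=("date", "max"))
)
print(compact)
```

Because the run id changes at every difference, the replenished stock of 50 on 20170604 lands in a new run and therefore a new record, even though its attribute values match the 20170601 record.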
I've been looking into using hash functions to add a unique key to each row, and although that has gotten me a bit closer to a solution, I can't seem to wrap my head around this issue. Is this even feasible to begin with? And if so, could you point me in the right direction?
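For what it's worth, here is roughly where the hash idea has taken me so far, a sketch under the assumption that the history table keeps "open" records (those whose "to" date equals the previous file's date), so that each new file only needs to be compared against those, not against older closed records (all names and values below are hypothetical):

```python
import hashlib
import pandas as pd

def row_key(row, cols):
    """Hash the attribute columns (not the dates) to fingerprint a row."""
    payload = "|".join(str(row[c]) for c in cols)
    return hashlib.sha256(payload.encode()).hexdigest()

attrs = ["ref", "price", "stock"]   # assumed column names

# Hypothetical state: records still open as of yesterday, plus today's file.
open_records = pd.DataFrame({
    "ref": ["A001"], "price": [9.99], "stock": [0],
    "date_from": ["20170602"], "date_to": ["20170603"],
})
today = pd.DataFrame({"ref": ["A001"], "price": [9.99], "stock": [50]})
today_date = "20170604"

open_records["key"] = open_records.apply(row_key, axis=1, cols=attrs)
today["key"] = today.apply(row_key, axis=1, cols=attrs)

# Matching key: the row is unchanged, so extend the open record's "to" date.
# No match: something changed, so insert a brand-new record starting today.
matches = today["key"].isin(open_records["key"])
open_records.loc[open_records["key"].isin(today["key"]), "date_to"] = today_date
new = today.loc[~matches].assign(date_from=today_date, date_to=today_date)
history = pd.concat([open_records, new], ignore_index=True).drop(columns="key")
print(history)
```

Restricting the comparison to open records is what keeps the replenished-stock case from wrongly extending the old, closed record with the same values, but I'm not sure this is the right overall approach.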
All help is very much appreciated!