
    Incremental Load stock & price data

    Wouter Bulen

      Hi guys,

       

      Not easy to describe the issue, but I'll do my best to keep it simple:
      Our data system drops a .csv file into a folder every day with product information such as product reference no., price, stock amount, etc. I would like to pick up that file every day and append it to the previous days' data, building up historic data that way. For example:

       

      The 20170601 csv file:

      ProductRef    Price      Amount
      10010001      €100.00    50
      10010002      €49.99     23
      10020004      €150.00    11

       

      The 20170602 csv file:

      ProductRef    Price      Amount
      10010001      €100.00    48
      10010002      €49.99     23
      10020004      €145.00    11

       

      The 20170603 csv file:

      ProductRef    Price      Amount
      10010001      €100.00    47
      10010002      €49.99     22
      10020004      €145.00    11

       

      The easiest solution would be to add a date stamp and to concatenate the files:

      Date        ProductRef    Price      Amount
      20170601    10010001      €100.00    50
      20170601    10010002      €49.99     23
      20170601    10020004      €150.00    11
      20170602    10010001      €100.00    48
      20170602    10010002      €49.99     23
      20170602    10020004      €145.00    11
      20170603    10010001      €100.00    47
      20170603    10010002      €49.99     22
      20170603    10020004      €145.00    11
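
      In script terms that would just be something like the sketch below (the folder path and file format spec are placeholders; the date comes from the file name, e.g. 20170601.csv):

      AllDays:
      LOAD
           Date#(FileBaseName(), 'YYYYMMDD') as Date,   // take the date from the file name
           ProductRef,
           Price,
           Amount
      FROM [..\DailyFiles\*.csv] (txt, utf8, embedded labels, delimiter is ',');   // wildcard load concatenates all daily files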

       

      However, the daily file is about 23 columns wide and 130,000 rows long. Appending/concatenating it every single day would lead to a lot of duplicate rows (identical except for the date field) and a huge file size. To prevent this, the ideal situation would be an end result similar to:

       

      From        To          ProductRef    Price      Amount
      20170601    20170601    10010001      €100.00    50
      20170602    20170602    10010001      €100.00    48
      20170603    20170603    10010001      €100.00    47
      20170601    20170602    10010002      €49.99     23
      20170603    20170603    10010002      €49.99     22
      20170601    20170601    10020004      €150.00    11
      20170602    20170603    10020004      €145.00    11

       

      An extra tricky part: a product could have an initial stock of 50 items on 20170601, slowly be depleted (stock = 0 on 20171231) and then be replenished to a stock of 50 on 20180101. In that case a new record needs to be inserted on 20180101 (rather than extending the existing 20170601 record's "To" date, even though all fields except the date would be the same).

       

      I've been looking into using hash functions to add a unique key to accomplish this, and though that has helped me get a bit closer to a solution, I can't seem to wrap my head around this issue. Is this feasible at all to begin with? And if so, could you point me in the right direction?
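
      What I've tried so far looks roughly like this (just a sketch; the path and format spec are placeholders, and only the three example columns are shown), adding a Hash128 key over the non-date fields:

      Keyed:
      LOAD
           Hash128(ProductRef, Price, Amount) as RowKey,   // identical field values give the same key
           Date#(FileBaseName(), 'YYYYMMDD') as Date,
           ProductRef,
           Price,
           Amount
      FROM [..\DailyFiles\*.csv] (txt, utf8, embedded labels, delimiter is ',');

      From there I'd want to collapse consecutive dates that share the same RowKey into one From/To record, and that is the step I can't figure out.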

       

      All help is very much appreciated!

       

      Cheers!

        • Re: Incremental Load stock & price data
          Gysbert Wassenaar

          Perhaps like this:

           

          MyTable:
          LOAD
               min(Date) as From,
               max(Date) as To,
               ProductRef,
               Price,
               Amount
          FROM
              [D:\CSV_Files\*.csv] (txt, ..etc..)
          GROUP BY
               ProductRef,
               Price,
               Amount;
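
          Note that the csv files themselves don't carry a Date column, so you would derive it from the file name inside the aggregations, for example (assuming files named like 20170601.csv):

               min(Date#(FileBaseName(), 'YYYYMMDD')) as From,
               max(Date#(FileBaseName(), 'YYYYMMDD')) as To,

          Also be aware that grouping only by ProductRef, Price and Amount would merge your replenished-stock example back into a single From/To record, so you may still need an extra key to keep those periods apart.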

          • Re: Incremental Load stock & price data
            Andrew Walker

            Hi Wouter,

            I think you'll need to decide what you want to get out of the data you're going to store. You'll need to decide which of your 23 columns hold data you want to keep, and hopefully you can disregard some of them. Then, from the columns you can't discard, maybe there are some that you can summarise, that is, aggregate over time periods. Some data you might need to keep a daily track of; some you might aggregate over weeks or longer.

            So it's possible that, instead of building up one big table, the best solution would be to create a number of tables: some aggregated over weeks, some aggregated over months and some not aggregated at all.

            You may even only add records to your table(s) that show a change in a value from the previous day.
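
            A rough sketch of that last idea (file names, path and format spec are only placeholders; the key is a Hash128 over the non-date fields):

            // Keys that were present in yesterday's file
            YesterdayKeys:
            LOAD Hash128(ProductRef, Price, Amount) as RowKey
            FROM [20170602.csv] (txt, utf8, embedded labels, delimiter is ',');

            // Today's file: keep only rows whose values changed since yesterday
            Changes:
            LOAD
                 Date#(FileBaseName(), 'YYYYMMDD') as Date,
                 ProductRef,
                 Price,
                 Amount
            FROM [20170603.csv] (txt, utf8, embedded labels, delimiter is ',')
            WHERE not Exists(RowKey, Hash128(ProductRef, Price, Amount));

            DROP TABLE YesterdayKeys;

            You would then concatenate Changes onto the history you already have stored in a QVD.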

            Sounds like fun!

            Good luck

             

            Andrew

            • Re: Incremental Load stock & price data
              Wouter Bulen

              Hi guys,

               

              Thanks for your help. I have decided to go with Andrew's approach; it was the fastest and easiest to accomplish, and the resulting .qvd file is not as big as I'd expected (17 MB for 10 days of data), with the daily growth decreasing (+3.5 MB in the first 3 days, +1.5 MB in the last 2 days).