4 Replies Latest reply: Feb 15, 2014 2:07 PM by Massimo Grossi RSS

    Concatenation - De - duplication of Rows

    Richard Cripps

      Hello

       

      I guess this question stems around the behaviour of the concatenate function, but I was wondering if any one had seen anything similar.

       

      I have a piece of script that loads in some data (c. 60 million rows) from a QVD and then takes some data from an existing resident table (c. 85k rows) and concatenates it to the data from the qvd. I would expect the result of this piece of script to be a table with a total number of rows that is the number from the QVD and the number from the resident table i.e. 60 million + 85k. However, the result is a table with around 40 million rows in it.

       

      Below is the script in question.

       

      UserAction:
      LOAD  join.Subscriber,
           join.ResourceCode,
           join.UserActionType,
           Month,
        Year,
        YearMonth,
        Week,
        YearWeek,
        Hour,
        Day,
        join.ResourceCommentID,
        ActionCount
      FROM
      [UserAction.qvd]
      (qvd);

      CONCATENATE(UserAction)  
      LOAD DISTINCT Resource.SITEUSERCODE AS join.Subscriber,
        join.ResourceCode,
        99 AS join.UserActionType,
         MONTH(join.PubDate) AS Month,
        YEAR(join.PubDate) AS Year,
        YEAR(join.PubDate) & NUM(MONTH(join.PubDate),'00') AS YearMonth,
        WEEK(join.PubDate) AS Week,
        YEAR(join.PubDate) & NUM(WEEK(join.PubDate),'00') AS YearWeek,
        HOUR(join.PubDate) AS Hour,
        DAY(join.PubDate) AS Day,
        null() AS join.ResourceCommentID,
         1 AS ActionCount
         Resident Resource;

       

      So what I did was to amend my code so that when loading from the QVD it uses the recno() function to basically make every row unique. And this resulted in a table with the number of rows that I would expect, see the code below.

       

      UserAction:
      LOAD recno() AS REC,
        join.Subscriber,
           join.ResourceCode,
           join.UserActionType,
           Month,
        Year,
        YearMonth,
        Week,
        YearWeek,
        Hour,
        Day,
        join.ResourceCommentID,
        ActionCount
      FROM
      [UserAction.qvd]
      (qvd);

      CONCATENATE(UserAction)  
      LOAD DISTINCT Resource.SITEUSERCODE AS join.Subscriber,
        join.ResourceCode,
        99 AS join.UserActionType,
         MONTH(join.PubDate) AS Month,
        YEAR(join.PubDate) AS Year,
        YEAR(join.PubDate) & NUM(MONTH(join.PubDate),'00') AS YearMonth,
        WEEK(join.PubDate) AS Week,
        YEAR(join.PubDate) & NUM(WEEK(join.PubDate),'00') AS YearWeek,
        HOUR(join.PubDate) AS Hour,
        DAY(join.PubDate) AS Day,
        null() AS join.ResourceCommentID,
         1 AS ActionCount
         Resident Resource;

       

      Has anyone else seen any similar behaviour? Is the concatenation function supposed to behave like this?

       

      Thanks