Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
richardcripps
Partner - Contributor III
Partner - Contributor III

Concatenation - De - duplication of Rows

Hello

I guess this question stems around the behaviour of the concatenate function, but I was wondering if any one had seen anything similar.

I have a piece of script that loads in some data (c. 60 million rows) from a QVD and then takes some data from an existing resident table (c. 85k rows) and concatenates it to the data from the qvd. I would expect the result of this piece of script to be a table with a total number of rows that is the number from the QVD and the number from the resident table i.e. 60 million + 85k. However, the result is a table with around 40 million rows in it.

Below is the script in question.

UserAction:
LOAD  join.Subscriber,
     join.ResourceCode,
     join.UserActionType,
     Month,
  Year,
  YearMonth,
  Week,
  YearWeek,
  Hour,
  Day,
  join.ResourceCommentID,
  ActionCount
FROM
[UserAction.qvd]
(qvd);

CONCATENATE(UserAction)  
LOAD DISTINCT Resource.SITEUSERCODE AS join.Subscriber,
  join.ResourceCode,
  99 AS join.UserActionType,
   MONTH(join.PubDate) AS Month,
  YEAR(join.PubDate) AS Year,
  YEAR(join.PubDate) & NUM(MONTH(join.PubDate),'00') AS YearMonth,
  WEEK(join.PubDate) AS Week,
  YEAR(join.PubDate) & NUM(WEEK(join.PubDate),'00') AS YearWeek,
  HOUR(join.PubDate) AS Hour,
  DAY(join.PubDate) AS Day,
  null() AS join.ResourceCommentID,
   1 AS ActionCount
   Resident Resource;

So what I did was to amend my code so that when loading from the QVD it uses the recno() function to basically make every row unique. And this resulted in a table with the number of rows that I would expect, see the code below.

UserAction:
LOAD recno() AS REC,
  join.Subscriber,
     join.ResourceCode,
     join.UserActionType,
     Month,
  Year,
  YearMonth,
  Week,
  YearWeek,
  Hour,
  Day,
  join.ResourceCommentID,
  ActionCount
FROM
[UserAction.qvd]
(qvd);

CONCATENATE(UserAction)  
LOAD DISTINCT Resource.SITEUSERCODE AS join.Subscriber,
  join.ResourceCode,
  99 AS join.UserActionType,
   MONTH(join.PubDate) AS Month,
  YEAR(join.PubDate) AS Year,
  YEAR(join.PubDate) & NUM(MONTH(join.PubDate),'00') AS YearMonth,
  WEEK(join.PubDate) AS Week,
  YEAR(join.PubDate) & NUM(WEEK(join.PubDate),'00') AS YearWeek,
  HOUR(join.PubDate) AS Hour,
  DAY(join.PubDate) AS Day,
  null() AS join.ResourceCommentID,
   1 AS ActionCount
   Resident Resource;

Has anyone else seen any similar behaviour? Is the concatenation function supposed to behave like this?

Thanks

1 Solution

Accepted Solutions
simon_hallworth
Contributor II
Contributor II

Hi Richard,

This is the way that concatenate works in QlikView. The DISTINCT will be applied to the whole of the UserAction table even though it is only defined in the second statement. When I originally came across this behaviour I used the same approach as you to work around it, i.e. create a unique row ID in the first table.

I have not clarified this behaviour with QlikTech support but it was highlighted as being WAD by one of their UK consultants some time ago.

Kind regards,

Simon

View solution in original post

4 Replies
simon_hallworth
Contributor II
Contributor II

Hi Richard,

This is the way that concatenate works in QlikView. The DISTINCT will be applied to the whole of the UserAction table even though it is only defined in the second statement. When I originally came across this behaviour I used the same approach as you to work around it, i.e. create a unique row ID in the first table.

I have not clarified this behaviour with QlikTech support but it was highlighted as being WAD by one of their UK consultants some time ago.

Kind regards,

Simon

richardcripps
Partner - Contributor III
Partner - Contributor III
Author

Cheers Simon!

I would prefer it if I could avoid having the row id in the document to reduce the size of it.

However it doesn't look like it is possible.

I tried to drop the field after I had done the concatenation and that reduced the number of rows in my table as well.

So I will have to leave it in.

Not applicable

I just noticed this as well after doing multiple concatenations so now I am placing a RecNo or RowNo all the time on concatenated tables to prevent a distinct from happening

maxgro
MVP
MVP

if you need a distinct only on the small table Resource I think you can (without recno / rowno)

TmpResource:

load distinct ......Resident Resource;

UserAction:

noconcatenate load * ........... FROM [UserAction.qvd] (qvd);

concatenate (UserAction) load * resident Resource;

drop table Resource;