Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

Load data without duplicate records - multiple columns

Hi,

i want to load tables, but without duplicate records. The value in column Product + Customer must be unique. It should be load always the record with the highes SaleNumber.

How can i do it?

Example - the red marked records shouldn´t load:

2017-01-20_16h57_13.jpg

Regards,

sam


1 Solution

Accepted Solutions
petter
Partner - Champion III
Partner - Champion III

You can do it in a single LOAD:

LOAD

    Customer,

    Product,

    Date(FirstSortedValue( Date, -SaleNumber)) AS Date,

    Max(SaleNumber) AS SaleNumber

FROM

  [lib://DATA/mehrere-spalten-vergleichen.xlsx]

  (ooxml, embedded labels, table is Tabelle1)

GROUP BY

  Product, Customer

;

It does not depend on a certain sort order from the source data.

View solution in original post

30 Replies
sunny_talwar

May be this:

Table:

LOAD Date,

    Product,

    SaleNumber,

    Customer

FROM

[..\..\Downloads\mehrere-spalten-vergleichen.xlsx]

(ooxml, embedded labels, table is Tabelle1);

Right Join (Table)

LOAD Product,

  Customer,

  Max(SaleNumber) as SaleNumber

Resident Table

Group By Product, Customer;

Anonymous
Not applicable
Author

Hi,

i will try it, thanks.

Is it possible to do that by only 1 load command, directly in the 1st step? Because i have a csv file and its very big (about 500 MB). I want to load less data from this csv. In your solution i would load all data.

Is it correct, that if i load this csv without duplicates like the description in my first thread here. that Qlik Sense need less time to load? Need less memory?

Regards,

sam

sunny_talwar

If the data is sorted in this manner

1st by Product,

2nd by Customer

3rd by SaleNumber descending order

then you can do it using a single load... else I don't think you will be able to do this another way

petter
Partner - Champion III
Partner - Champion III

You can do it in a single LOAD:

LOAD

    Customer,

    Product,

    Date(FirstSortedValue( Date, -SaleNumber)) AS Date,

    Max(SaleNumber) AS SaleNumber

FROM

  [lib://DATA/mehrere-spalten-vergleichen.xlsx]

  (ooxml, embedded labels, table is Tabelle1)

GROUP BY

  Product, Customer

;

It does not depend on a certain sort order from the source data.

sunny_talwar

Totally forgot about this

petter
Partner - Champion III
Partner - Champion III

  Wasn't sure myself so I had to try it out ...

Anonymous
Not applicable
Author

Thanks a lot!

Why do you do this? Is it important?

Date(FirstSortedValue( Date, -SaleNumber)) AS Date,

Because, i try it without this line...

LOAD

    Product,

    Customer,

    //Date(FirstSortedValue( Date, -SaleNumber)) AS Date,

    Max(SaleNumber) AS SaleNumber     

FROM [lib://concatenate/mehrere-spalten-vergleichen.xlsx]

(ooxml, embedded labels, table is Tabelle1)

Group By Product, Customer

;

... and get the correct result. But perhaps it was a coincidence?!

If i try this code, too:

LOAD

    Product,

    Customer,

    //Date(FirstSortedValue( Date, -SaleNumber)) AS Date,

    //Max(SaleNumber) AS SaleNumber     

    SaleNumber

FROM [lib://concatenate/mehrere-spalten-vergleichen.xlsx]

(ooxml, embedded labels, table is Tabelle1)

Group By Product, Customer

;

and get an error:

2017-01-20_20h56_58.jpg

But don´t really understand why.

Regards,

sam

sunny_talwar

You only need Group By clause when you are aggregating....

LOAD 

    Product, 

    Customer, 

    //Date(FirstSortedValue( Date, -SaleNumber)) AS Date,

    //Max(SaleNumber) AS SaleNumber       

    SaleNumber 

FROM [lib://concatenate/mehrere-spalten-vergleichen.xlsx] 

(ooxml, embedded labels, table is Tabelle1) 

Group By Product, Customer;


Since you commented out both your aggregations, you don't need Group By any more


LOAD 

    Product, 

    Customer, 

    //Date(FirstSortedValue( Date, -SaleNumber)) AS Date,

    //Max(SaleNumber) AS SaleNumber       

    SaleNumber 

FROM [lib://concatenate/mehrere-spalten-vergleichen.xlsx] 

(ooxml, embedded labels, table is Tabelle1);

Anonymous
Not applicable
Author

Sorry, but i got the next problem and question.

Now i try to create some fields in the load script and use "Group by" for these fields and get an error.

LOAD

    Date,

    Product,

    Customer,

    Max(SaleNumber) AS SaleNumber,    

    month (Date) as sMonth,

    SubstringCount(Customer, ' ')+1 as countProductWords

FROM [lib://concatenate/mehrere-spalten-vergleichen.xlsx]

(ooxml, embedded labels, table is Tabelle1)

Group By sMonth, countProductWords;


2017-01-20_21h22_05.jpg


Is it possible to fix it?


regards,

sam