Skip to main content
Announcements
See what Drew Clarke has to say about the Qlik Talend Cloud launch! READ THE BLOG
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Data Redundancy

Hi guys,

I have a excel file A which contains 2015 and 2014 data.

Also, I have another excel file B which contains 2016 and 2015 data.

A .qvd file is generated by appending B to A.

As you might have guessed 2015 data is loaded twice.

how do I rectify this issue.

thanks in advance.

1 Solution

Accepted Solutions
swuehl
MVP
MVP

That's correct, if multiple lines for same Year exist.

Maybe it's better not going for the Year field, but for a primary key, something that identifies the record as being redundant (can't tell from the OP description).

LOAD

     Key,

     Year,

     Value

FROM A.xls (...);

LOAD

     Key,

     Year,

     Value

FROM B.xls (...)

WHERE NOT EXISTS(Key);

View solution in original post

8 Replies
swuehl
MVP
MVP

You can use WHERE NOT EXISTS():

LOAD

     Year,

     Value

FROM A.xls (...);

LOAD

     Year,

     Value

FROM B.xls (...)

WHERE NOT EXISTS(Year);

himanshi
Contributor III
Contributor III

you can use inner join which will give all the records once.

Regards,

Himanshi

Anonymous
Not applicable
Author

you may use like:

A_Excel:

Load

Year,

Value

From A.Excel;

let VMaxYear=max(Year);

B_Excel:

Load

*

From B.Excel where Year>$(VMaxYear);

jonathandienst
Partner - Champion III
Partner - Champion III

>>WHERE NOT EXISTS(Year);

Won't this load only the first line with Year, because after that the Year value does exist?

Logic will get you from a to b. Imagination will take you everywhere. - A Einstein
jonathandienst
Partner - Champion III
Partner - Champion III

You might want to do it like this:

LOAD

     Year,

     Year As LoadedYear,

     Value

FROM A.xls (...);

LOAD

     Year,

     Value

FROM B.xls (...)

WHERE NOT EXISTS(LoadedYear, Year);

DROP Field LoadedYear;

Logic will get you from a to b. Imagination will take you everywhere. - A Einstein
swuehl
MVP
MVP

That's correct, if multiple lines for same Year exist.

Maybe it's better not going for the Year field, but for a primary key, something that identifies the record as being redundant (can't tell from the OP description).

LOAD

     Key,

     Year,

     Value

FROM A.xls (...);

LOAD

     Key,

     Year,

     Value

FROM B.xls (...)

WHERE NOT EXISTS(Key);

maxgro
MVP
MVP

replace my load inline with your excel load

A:

load year, value, year as year2 inline [

year, value

2014,1

2014,2

2015,1

2015,2

];

Concatenate (A)

load * inline [

year, value

2016,1

2016,2

2015,1

2015,2

]

Where not Exists(year2, year)

;

DROP Field year2;

Not applicable
Author

thanks a lot for you help!!