Skip to main content
Announcements
SYSTEM MAINTENANCE: Thurs., Sept. 19, 1 AM ET, Platform will be unavailable for approx. 60 minutes.
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Data Redundancy

Hi guys,

I have a excel file A which contains 2015 and 2014 data.

Also, I have another excel file B which contains 2016 and 2015 data.

A .qvd file is generated by appending B to A.

As you might have guessed 2015 data is loaded twice.

how do I rectify this issue.

thanks in advance.

1 Solution

Accepted Solutions
swuehl
MVP
MVP

That's correct, if multiple lines for same Year exist.

Maybe it's better not going for the Year field, but for a primary key, something that identifies the record as being redundant (can't tell from the OP description).

LOAD

     Key,

     Year,

     Value

FROM A.xls (...);

LOAD

     Key,

     Year,

     Value

FROM B.xls (...)

WHERE NOT EXISTS(Key);

View solution in original post

8 Replies
swuehl
MVP
MVP

You can use WHERE NOT EXISTS():

LOAD

     Year,

     Value

FROM A.xls (...);

LOAD

     Year,

     Value

FROM B.xls (...)

WHERE NOT EXISTS(Year);

himanshi
Contributor III
Contributor III

you can use inner join which will give all the records once.

Regards,

Himanshi

Anonymous
Not applicable
Author

you may use like:

A_Excel:

Load

Year,

Value

From A.Excel;

let VMaxYear=max(Year);

B_Excel:

Load

*

From B.Excel where Year>$(VMaxYear);

jonathandienst
Partner - Champion III
Partner - Champion III

>>WHERE NOT EXISTS(Year);

Won't this load only the first line with Year, because after that the Year value does exist?

Logic will get you from a to b. Imagination will take you everywhere. - A Einstein
jonathandienst
Partner - Champion III
Partner - Champion III

You might want to do it like this:

LOAD

     Year,

     Year As LoadedYear,

     Value

FROM A.xls (...);

LOAD

     Year,

     Value

FROM B.xls (...)

WHERE NOT EXISTS(LoadedYear, Year);

DROP Field LoadedYear;

Logic will get you from a to b. Imagination will take you everywhere. - A Einstein
swuehl
MVP
MVP

That's correct, if multiple lines for same Year exist.

Maybe it's better not going for the Year field, but for a primary key, something that identifies the record as being redundant (can't tell from the OP description).

LOAD

     Key,

     Year,

     Value

FROM A.xls (...);

LOAD

     Key,

     Year,

     Value

FROM B.xls (...)

WHERE NOT EXISTS(Key);

maxgro
MVP
MVP

replace my load inline with your excel load

A:

load year, value, year as year2 inline [

year, value

2014,1

2014,2

2015,1

2015,2

];

Concatenate (A)

load * inline [

year, value

2016,1

2016,2

2015,1

2015,2

]

Where not Exists(year2, year)

;

DROP Field year2;

Not applicable
Author

thanks a lot for you help!!