Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

Conditional elimination of duplicates in script

Hello,

I have two columns of concern in a qvw file: VIN and Trim Level.  In some cases, Trim Level is listed as 'Unknown'.  There are two possibilities for these cases: 1. VIN is unique or 2. VIN is duplicated on another row, with the other row showing the correct Trim Level (e.g. AA in example below).

In case 1. I want to keep these unique VIN's (even though Trim is 'Unknown').

In case 2. I want to eliminate the rows with 'Unknown' and keep the rows with the correct Trim Level.

In all cases of duplicates, all other columns (not discussed here) are the same.

Ex.

VIN        TRIM            Want

X1          AA                Keep

X1          Unknown      Eliminate (replace by above)

X2          BB                Keep

X3          Unknown      Keep

Ideally, I would like to accomplish this in the Script so the effect cascades to all other sheet elements.

Thanks!

1 Solution

Accepted Solutions
m_woolf
Master II
Master II

The Want column is what allows you to identify the records that you want to keep.

After creating the Want column, you do a resident load on that table where Want = 'Keep'

Then you can drop the Want Field.

See the attached qvw.

View solution in original post

7 Replies
m_woolf
Master II
Master II

Order your table by VIN and then by TRIM

Load

     *,

     if(VIN)=peek('VIN') and TRIM = 'Unknown','Eliminate','Keep') as Want

resident your table

Then resident load where Want = 'Keep'.

Anonymous
Not applicable
Author

Thank you very much for the response.  Perhaps I need to clarify... the 'Want' column listed above is not an additional desired column, but instead indicates the outcome I would like to see in each scenario.

I want to keep every instance where VIN is not duplicated, and I want to keep only one instance when VIN is duplicated (in this case, I want to keep the instance where the Trim Level is defined (i.e. not 'Unknown')).

Could you please offer another suggestion?

Thank you.

m_woolf
Master II
Master II

The Want column is what allows you to identify the records that you want to keep.

After creating the Want column, you do a resident load on that table where Want = 'Keep'

Then you can drop the Want Field.

See the attached qvw.

rubenmarin

Hi Noelle, another option with Exists:

// Load TRIM<>'Unknown'

Final:

LOAD *,

     VIN as chkVIN

Resident Table

Where TRIM<>'Unknown' and not exists('chkVIN',VIN);


// Add records that only has TRIM='Unknown'

Concatenate (Final)

LOAD *,

     VIN as chkVIN

Resident Table

Where TRIM='Unknown' and not exists('chkVIN', VIN);

DROP Field chkVIN;

Anonymous
Not applicable
Author

Hello m w,

Thank you for the additional response.  I understand what you are suggesting with the Resident tables.  Unfortunately, I am not able to get the script to fully run, with the following errors:

Table not found

VINSort:

...

Table not found

Drop table statement

...

Table not found

EliminateDups:

Table not found

Drop table statement

Here is the script I'm using:

Initial:
LOAD vin,
trim_level,
(all my other variables)

FROM
(
qvx);

VINSort:
NoConcatenate
LOAD
vin,
trim_level,
(all my other variables),
if(vin=peek('vin') and trim_level = 'Unknown','Eliminate','Keep') as Want
resident EliminateDups;
drop table EliminateDups;

EliminateDups:
NoConcatenate
LOAD  vin,
trim_level
Resident VINSort
where Want = 'Keep';
DROP table VINSort;

exit script;

Are you able to spot my error?

Thanks again very much for your time.

sunny_talwar

Shouldn't this be Resident from Initial, rather than from ElminateDups

Initial:
LOAD vin,
     trim_level,
     (all my other variables)
FROM (qvx);

VINSort:
NoConcatenate
LOAD vin,
     trim_level,
     (all my other variables),
     if(vin=peek('vin') and trim_level = 'Unknown','Eliminate','Keep') as Want
Resident EliminateDups Initial;
DROP Table EliminateDups Initial;

EliminateDups:
NoConcatenate
LOAD  vin,
     trim_level
Resident VINSort
Where Want = 'Keep';


DROP Table VINSort;

EXIT Script;

Anonymous
Not applicable
Author

Thank you all for the responses.  Using the base code from m w and the tip from Sunny, I was close.

Since my trim_level could be after the 'Unknown' alphebetically, (e.g. trim_level = XL), I needed to reassign the name of 'Unknown' as 'ZZZZ' in order to work properly with the 'peek' function.  This seemed to do the trick.

Thanks again - appreciate the kind and patient answers to a new user question.

Noelle