Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Exclude file 2 from file 1

File1 has a few million records. timePersonnel is Time +Personnel

Time StampPersonnel IdInfo1Info2timePersonnel
08:001001test1001test200108:00-1001
08:001002test1002test200208:00-1002
08:001003test1003test200308:00-1003
08:001004test1004test200408:00-1004
08:001005test1005test200508:00-1005
09:001001abc1001abc200109:00-1001

File2.csv has the following content

timePersonnel
08:00-1001
08:00-1002
08:00-1003
09:00-1001

I would like to obtain the results similar to the below.

Time StampPersonnel IdInfo1Info2timePersonnel
08:001001test1001test200108:00-1001
08:001002test1002test200208:00-1002
08:001003test1003test200308:00-1003
08:001004test1004test200408:00-1004
08:001005test1005test200508:00-1005
09:001001abc1001abc200109:00-1001

------------------------

SET ThousandSep=',';

SET DecimalSep='.';

SET MoneyThousandSep=',';

SET MoneyDecimalSep='.';

SET MoneyFormat='$#,##0.00;($#,##0.00)';

SET TimeFormat='h:mm:ss TT';

SET DateFormat='D/M/YYYY';

SET TimestampFormat='D/M/YYYY h:mm:ss[.fff] TT';

SET MonthNames='Jan;Feb;Mar;Apr;May;Jun;Jul;Aug;Sep;Oct;Nov;Dec';

SET DayNames='Mon;Tue;Wed;Thu;Fri;Sat;Sun';

Directory;

LOAD [Time Stamp],

     [Personnel Id],

     Info1,

     Info2,

     timePersonnel

FROM

(txt, utf8, embedded labels, delimiter is '\t', msq);

Where not exists

  

LOAD timePersonel

FROM

(txt, utf8, no labels, delimiter is ',', msq);

------------------------

I would like to exclude the items that are found in file2.  I read somewhere about "where not exists". but it does not seem to work.  Can i know how to exclude the IDs found in file2 in the final output?  Previouls i have asked about inner join, to output the items found in file2.

thanks.

5 Replies
Gysbert_Wassenaar

Try it the other way around:

T1:

LOAD timePersonel

FROM

(txt, utf8, no labels, delimiter is ',', msq);

T2:

LOAD [Time Stamp],

     [Personnel Id],

     Info1,

     Info2,

     timePersonnel

FROM

(txt, utf8, embedded labels, delimiter is '\t', msq)

where not exists(timePersonnel);

// optional:

// drop table T1;


talk is cheap, supply exceeds demand
Not applicable
Author

Can someone let me know what i have missed out?

I have encountered the following error, and load fails.  Did i do something wrong?

Thanks a lot.

-----------

Field not found - <timePesonnel>

T2:

LOAD [Time Stamp],

     [Personnel Id],

     Info1,

     Info2,

     timePersonnel

FROM

(txt, codepage is 936, embedded labels, delimiter is ',', msq)

where not exists (timePesonnel)

-----------

My script reads like this:

-----------

Directory;

T1:

LOAD timePersonnel

FROM

(txt, codepage is 936, embedded labels, delimiter is '\t', msq);

T2:

LOAD [Time Stamp],

     [Personnel Id],

     Info1,

     Info2,

     timePersonnel

FROM

(txt, codepage is 936, embedded labels, delimiter is ',', msq)

  where not exists (timePesonnel);

Not applicable
Author

Can someone advise? Apologies that this might be a simple question.

Not applicable
Author

Find attachment It may help you..

If you  need clarification  reply me..

Karthik

Gysbert_Wassenaar

Field not found - <timePesonnel>

Typo: timePesonnel instead of timePersonnel


talk is cheap, supply exceeds demand