LOAD DISTINCT

Report Inappropriate Content · ‎2010-11-09

Hi

I have the following

FINALCAC:

LOAD DISTINCT

CAC_CLIENTNUMBER,

CAC_CURRENCYCODE,

CAC_SEQUENCENO,

CAC_INVOICECODE,

CAC_COVERJOIN,

CAC_INVOICEDATE,

CAC_TOTALDUE,

JNL_SEQUENCENUMBER,

DebtorType,

Source,

RCP_BATCHNUMBER,

TLG_RUNNUMBER

RESIDENT CAC;

Through the wonderful design of 2 of the tables I have previously left joined additional information I have duplicate rows even when I load distinct.

A combination of CAC_INVOICECODE and CAC_SEQUENCENO are 100% unique. The information that I left joined into this table being TLG_RUNNUMBER creates duplicate rows and being blunt I don't care which line(s) disappear.

Thanks in advance for any help you can offer.

Rob

prieper · ‎2010-11-09

Not quite clear to me, what you are intending to do.
Are you able to post an example?

Peter

Report Inappropriate Content · ‎2010-11-09

Hello,

- joining in QV means to merge two table together into one table. As I understood your question, you joined before the distinct load you are showing above. This it is to late for your distinct because the it includes every field of the load statement and as you said: "A combination of CAC_INVOICECODE and CAC_SEQUENCENO are 100% unique".

- so take a look at the left join itself: I assume that the two tables got a keyfield with a common name. Producing duplicates rows with a left join then is a strong sign that your second (right) table has duplicate values in your keyfield. This is the point to look at.

Regards, Roland

Report Inappropriate Content · ‎2010-11-09

Hi Roland,

You are on the right track with your logic, my problem is that they left joined tables "CAC" and "JNL" do not have a unique join nor for want of trying can I make one.

I have duplicates and I just want to get rid of them and from what you are saying LOAD DISTINCT is the wrong path to go down.

Do I have any other options for cleansing my data, I am thinking along the lines of MAX and GROUP BY?

Thanks

Rob

Report Inappropriate Content · ‎2010-11-09

If I understand correctly then then TLG_RUNNUMBER in the load prevents the distinct doing its thing in the load you show. You therefore need to revisit the data before doing the left join and do the distinct there. The left join must be matching on some columns?

To answer your question directly though you could create an autonumber hash on the two columns that define uniqueness and then use it, something like:

FINALCAC:

Load ...

autonumberhash128(CAC_INVOICECODE & '-' & CAC_SEQUENCENO) as hashkey

resident CAC

where not exists(autonumberhash128(CAC_INVOICECODE & '-' & CAC_SEQUENCENO));

drop field hashkey;

which means only the first row with this unique combination would be loaded.

Regards,

Gordon

Regards,

Gordon

Report Inappropriate Content · ‎2010-11-09

Hello Rob,

I understand your approach. But in my experience it is always a better praxis to solve a problem where it is generated. Since I do not know your data I can't tell you exactly the correct load-statements. But I think you should look deeply into your joining. Try to use an inner join or inner keep to load your data into this one table.

RR

Report Inappropriate Content · ‎2010-11-09

Hi Roland,

First of all thanks for your time much appreciated,

I'll rewind a little and try and give you a little more tools and hopefully help you understand our clunky database.

CACJNL:

LOAD

CAC_CLIENTNUMBER,

CAC_CURRENCYCODE,

CAC_SEQUENCENO,

CAC_TRANIDENT&CAC_INVOICENO as CAC_INVOICECODE,

CAC_TRANIDENT&CAC_INVOICENO&CAC_TOTALDUE&CAC_INVOICEDATE as CAC_INVOICECODEAMT,

CAC_COVERNUMBER&'/'&CAC_VERNO as CAC_COVERJOIN,

CAC_INVOICEDATE,

CAC_TOTALDUE,

'CLT' as DebtorType,

'JNL' as Source

FROM

(qvd)

LEFT JOIN (CACJNL)

LOAD

JNL_TRANIDENT&JNL_INVOICENO&JNL_AMOUNT&JNL_DATEDONE as CAC_INVOICECODEAMT,

JNL_SEQUENCENUMBER,

JNL_RUNNUMBER as xJNL_RUNNUMBER

Resident JNL1;

CAC or Client Accounting (CACJNL) contains all our invoices, journals, cash etc. and other tables contain a run number and I have absolutely no 100% unique join between these two tables nor is their an option to include a middle table

CAC has

INVCODE CURR AMOUNT SEQ DATE

M12345, EUR, 5000 1 01/01/2010

M12345, EUR, 5000 2 01/01/2010

JNL has

JNLCODE CURR AMOUNT SEQ DATE RUN

M12345, EUR, 5000 98 01/01/2010 1650

M12345, EUR, 5000 99 01/01/2010 1650

Sequence is CAC and JNL are not related at all.

A run number is unique to a date if that helps,

The nearest I can come up to a unique key is CODE(TRANIDENT and CODE) , DATE, AMOUNT.

Not sure on your inner join or inner keep as I have never used them before.

All I need to pull out is run number?

Thanks again for your help.

Rob

Report Inappropriate Content · ‎2010-11-11

Hi Rob,

so you did some heroisch things to gernerate a appropriate key for both tables. Nevertheless you are not responsible for the design of your source database.

To get rid of your duplicate rows there are now two possibilities: Let's assume there are still duplicates in your resident table JNL1. Then use the distinct within join - load like "left join (CAJNL) load distinct JNL.... resident JNL1". If you have it in your table CACJNL use similar "load distinct .. from ... (qvd);". If you still have after that duplicates the only way is to create a temp. unique number (a rowid like in oracle is not availible in QV). See therefor at the post above from Gordon.

To give you a view behind left/right keep and join I attached a small sample app. Start playing around with different loads while sliding the tabs.

RR

Report Inappropriate Content · ‎2010-11-11

Hi Rob,

sorry now with the correct application.

RR

Related Topics