Skip to main content
Woohoo! Qlik Community has won “Best in Class Community” in the 2024 Khoros Kudos awards!
Announcements
Nov. 20th, Qlik Insider - Lakehouses: Driving the Future of Data & AI - PICK A SESSION
cancel
Showing results for 
Search instead for 
Did you mean: 
larsc
Partner - Contributor III
Partner - Contributor III

Load and distinct

When you run a script containing "load *;" before another load statement, the "load *;" statement's data is normally base of the data from the load statement immediately below.

However, this does not seemt to be the rule if the lower load statement contain the distinct predicate. Study the code below:

[CODE]
t1:
LOAD * INLINE [
F1
1
1
];

qualify *;

t2:
LOAD distinct F1 Resident t1;

t3:
load *;
LOAD Distinct * Resident t1;
[/CODE]

I think most people would assume field F1 has 1 record in table t2, and 1 record in table t3. This is not true - in table t2 the field has 1 record, but in table t3 it has 2 records. Why?

Where does the "load *;" load its data from, and why doesn't it base it on what "LOAD Distinct * Resident t1" returns?

Also, in the code below, if you modify the last statemet to load from table t2 instead, table t3 suddently has 1 record. What is the difference regarding what data "load *;" is based on, compared to the first code sample?

Could someone post a step-by-step explanation of how the code executes from top to bottom in these scenaros, and what rules apply when you "load on load" when distinct is involved?

[CODE]
t1:
LOAD * INLINE [
F1
1
1
];
qualify *;
t2:
LOAD distinct F1 Resident t1;
t3:
load *;
LOAD Distinct * Resident t2;
[/CODE]

9 Replies
gandalfgray
Specialist II
Specialist II

Sorry I can't explain this behaviour, but I verified it,

and I will keep this in mind in the future.

It certainly does not behave as you would expect it to.

Not applicable

Yes, I have also seen this behaviour using v9sr4.

Regards,

Gordon

Miguel_Angel_Baeyens

Hello Lars,

First of all, excuse me if I'm missing something but I see no reason to use a

LOAD *;
withouth being followed by an INLINE, RESIDENT, FROM or SQL clause. So first, I would remove that from the code. Probably what is doing is loading all records from the next load, so that's why you are getting 2 records in t3. Moreover, t3 comes from two tables with the same table configuration (one field named alike), so both tables are being concatenated implicitly.

The code commented as requested:

t1: LOAD * INLINE [ F1 1 1 ]; qualify *; // makes all fields being prefixed with their corresponding table namet2:LOAD distinct F1 Resident t1; // loads field F1 with all distinct values in field F1 from table t1 into table t2 t3:LOAD Distinct * Resident t2; // loads all fields with distinct record values from table t2 into table t3


The following example is a bit different, though,

t1: LOAD * INLINE [ F1, F2 1, A 1, B 1, A]; qualify *; t2:LOAD distinct F1 Resident t1; t3:LOAD Distinct * Resident t1; // this will load two records, as there are two distinct records


Hope that helps

larsc
Partner - Contributor III
Partner - Contributor III
Author

Thank's for your explanation Miguel. I was more interested in how the script engine "thinks" in these situations.

How is the code executed from top to bottom, and why does the "load *;" return 2 vs1 record in these situations.

This is just a sample, and I don't have the business problem in front of me, but what I wanted to do is first make sure I have only distinct records, then perform further data manipulation and join the result to another table.

That "load *;" was actually someting like "left join (tableX) load *, applymap(...) as... etc", but this sample explain the problem I had.

I am sure there are many situations where you want to make sure you only have distinct data before you make further data manipulation.

Miguel_Angel_Baeyens

The usual behavior for "LOAD *" is "load all the fields from all the records pulled from the statement below". That statement may be one of the several ways QlikView loads data (I forgot to mention above the AUTOGENERATE way). So a "LOAD *" followed by another LOAD statement will load all possible records returned by the second statement.

The most simple and easy way to check it is

t1: LOAD * INLINE [ F1, F2 1, A 1, B 1, A]; qualify *; t3:LOAD *;LOAD * Resident t1 WHERE F2 = 'A';


t3 will have 2 records, those returned by the resident load.

In your example, several things seem to converge that result in a not distinct loading several values, but my guess is the main one is the concatenation between the two latter LOAD statements, which seems to be ignoring the distinct. In fact,

t1: LOAD * INLINE [ F1, F2 2, A 2, A 2, A]; t3:LOAD F1 AS F3, F2 AS F4;LOAD DISTINCT * Resident t1;


Will return not only three lines (all records) but the fields renamed as well.

Regards.

larsc
Partner - Contributor III
Partner - Contributor III
Author

The help file says "If no source of data is given by means of a from, inline, resident or autogenerate clause, data will be loaded from the result of the immediately succeeding select or load statement. The succeeding statement should not have a prefix."

OK, but what happens if the succeeding statement has a prefix anyway - where does it load the data from in that case. I believe the proper way of handling this situation would be to bring up an error message - not load "something else instead" 🙂

Miguel_Angel_Baeyens


Lars Christensen wrote:OK, but what happens if the succeeding statement has a prefix anyway - where does it load the data from in that case. I believe the proper way of handling this situation would be to bring up an error message - not load "something else instead" 🙂


I agree with the error message or change that Note in the manual to "The succeeding statement must not have a prefix" or "should not have a prefix, otherwise it will be ignored" instead, because what is doing is ignoring the prefix. Call it "default behavior"?

johnw
Champion III
Champion III

Since the result of this script is one row:

LOAD DISTINCT F1 AS F2 RESIDENT T1;

And since a preceeding load "will be loaded from the result of the immediately succeeding select or load statement", then doing a preceeding load...

LOAD *;
LOAD DISTINCT F1 AS F2 RESIDENT T1;

...should also return one row, at least as I've always understood it, and as I read the help text. So as I'm reading it, this is a bug (or incorrect documentation). The script does not appear to do what the documentation appears to say it should do.

I'm also not understanding "should not have a prefix" as an explanation for the extra row, if it was being offered as an explanation. "Distinct" isn't a prefix; it's a predicate. Prefixes, if I understand correctly, are things like concatenate and left join. I can understand why those wouldn't be allowed, and while I agree it should give an error message instead of continuing, it doesn't seem to explain what we're seeing here. Now, it may well be that it is ignoring predicates, and that's the cause of the problem, but I don't think it should be ignoring predicates.

gussfish
Creator II
Creator II

I've just been discovering this problem in QV10 SR3. 

I note, btw, that this problem doesn't require that '*' be used in the prior LOAD statement e.g. "LOAD F1;" will cause the effect to manifest. 

And just to dot the 'i's and cross the 't's, it doesn't occur merely because all fields are being selected by the prior LOAD - if the LOAD DISTINCT selects a number of fields and the prior LOAD merely selects a subset, then the DISTINCT still seems to be ignored.

Was this ever logged as a bug and resolved, to anyone's knowledge?