Skip to main content
Announcements
Have questions about Qlik Connect? Join us live on April 10th, at 11 AM ET: SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

Load Distinct problem

I have an app with multiple fact tables which i am trying to optimize. For that purpose trying to create a link table (have not included that part)

For doing this i am doing a load distnct

load

   Distinct

   Employee_ID,

   Parent,

FactType

resident

MasterData

where FactType='xyz';

Master table has approx 120 million records. 

the result of a expression count(distinct Employee_ID&Parent) is around 400k records 

But somehow the load distinct doesnt work. the above load results in 120 million records

its weird. i remember reading somewhere about this but couldnt find the article again

but basically i replace above script with

load

   Distinct

   Employee_ID,

   Parent,

MinString(FactType)

resident

MasterData

where FactType='xyz'

group by Employee_ID , Parent;

 

this one worked perfectly.

Can someone explain theory behind this?

@rwunderlich @Oleg_Troyansky

1 Solution

Accepted Solutions
Anonymous
Not applicable
Author

Got the answer some errant data in one of the fields were causing the issue. truncating the field solved the issue.
Knew it must have been something simple
thanks for the assistance

View solution in original post

10 Replies
mikevwang
Contributor III
Contributor III

Are you doing a preceding load before this? If so, the distinct needs to be on the top load and any distinct in the following loads are ignored.

Anil_Babu_Samineni

Second script is not equal to first script. How you are comparing both scripts. DISTINCT Keyword removes only Unique rows. But, Qlik will automate that function.
MinString can help to show only one row as always based on aggregate to Emploee_ID and Parent.
Please add me Anil_Babu_Samineni to interact faster when reply back. Speak low think High.

Before develop something, think If placed (The Right information | To the right people | At the Right time | In the Right place | With the Right context)
Anonymous
Not applicable
Author

No I am not doing a preceding load
Anonymous
Not applicable
Author

I understand it's not technically equivalent. But Look at the where clause. The value on facttype column is just 'xyz'.
Still in the first load it doesn't give unique values. I tried distict on just Employee_ID and Parent with the where clause still doesn't work
Anil_Babu_Samineni

Can you try this?

Spoiler
Sum(Aggr(count(distinct Employee_ID&Parent), Empolyee_ID, Parent))
Please add me Anil_Babu_Samineni to interact faster when reply back. Speak low think High.

Before develop something, think If placed (The Right information | To the right people | At the Right time | In the Right place | With the Right context)
Gysbert_Wassenaar

Why do you think it doesn't do a distinct load? How are you checking that?
Try explicitly creating a new table so the data won't accidentally be appended to an existing table and store that table in a qvd so you can check outside of your qlikview document.

TempLink:
NOCONCATENATE
Load Distinct
    Employee_ID,
    Parent,
    FactType
resident
    MasterData
where 
    FactType='xyz';

STORE TempLink INTO [C:\CheckMe\CheckMe.qvd] (qvd);

 


talk is cheap, supply exceeds demand
Anonymous
Not applicable
Author

Hi
Value for count(distinct Employee_ID&Parent)
with facttype filtered is around 400k.
The result of first load is the full 100 million rows.

Yes. I loaded the data in a separate qvw. Same result.
In fact found the work around with minstring in the separate qvw. I vaguely recall an article mentioning distinct doesn't always mean unique and to use aggregation to make it work. Not sure where I read it
Gysbert_Wassenaar

Ok, count(distinct Employee_ID&Parent) should always return the distinct number of combinations. But only if you really specify the distinct keyword in the expression. You're using that expression in a chart or text object, right?

Does it happen with a smaller data set too? Could you try to create a small qlikview document that reproduces the problem?

talk is cheap, supply exceeds demand
Anonymous
Not applicable
Author

Yes. I am using it as a chart.

Will try and recreate on a smaller data set with hashed values. Not sure I can over the weekend the though