Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Qlik Open Lakehouse is Now Generally Available! Discover the key highlights and partner resources here.
cancel
Showing results for 
Search instead for 
Did you mean: 
EISSA
Contributor III
Contributor III

Enhance the performance of grouping with count distinct in the Qlik Sense Script

Hello,

I am trying to use group by with count distinct inside my script.

below is a part of the script where I will have about 26 millions records to do a group by and get only few records instead or row data so I will have a small dashboard instead of a big size dashboard with low performance.

What I am facing is the slowness of the group by with the count distinct where I tried to use the order by (https://community.qlik.com/t5/QlikView-Documents/Optimize-Group-By-Performance/ta-p/1481470)

to enhance the performance but still it is taking for each section about 2 minutes to finish and I have about 25 sections where I have to do the same way with different dimensions 

QTDGroupingTableACGlobal:
NoConcatenate
Load
Year,
IsQTD,
IsYTD,
[KPI Name],
[Player Days SK],
[Customer SK]
Resident GroupingTableAC_Temp
where IsQTD = 1
and IsYTD = 1
order by Year,
IsQTD,
IsYTD,
[KPI Name];

//Global
GroupingTable:
Load Year, IsQTD,IsYTD,[KPI Name],
'QTD' as [Period],
'Total' as [Grouping],
count(DISTINCT [Player Days SK]) as [Player Days Count],
count(DISTINCT [Customer SK]) as [Customer Count]
resident QTDGroupingTableACGlobal
group by Year,IsQTD,IsYTD,[KPI Name];

Any Idea what could be the best practice to enhance the performance of the load so my task will not take more than 10-20 min max.

Thanks.

Labels (1)
1 Reply
eddie_wagt
Partner - Creator III
Partner - Creator III

Hello @EISSA ,

I tested your case with random data. I have setup 26 mln rows and tried to optimize your loading time. And I did manage to load the data in 44 seconds instead of 2 minutes. To achieve this result you will have to need the Source as a QVD file and you will have to add two fields to the source when writing to disk.

Because I have used some mockup data your results can be different. I am curious if you can accomplish the same results with the following script.

Regards Eddie

// Preparation
// ----------
// Write your source to a QVD
// Add two fieldnames: 
// 1. Year&':'&IsQTD&':'&IsYTD&':'&[KPI Name]&':'&[Player Days SK]&':'&[Customer SK] as Key
// 2. IsQTD&IsYTD as ExistsField


// Loading the 'existsfield' because we want to have an optimized QVD load in next step
Exists_TMP:
LOAD * INLINE [ExistsField
11
];

// Loading from source optimized
QTDGroupingTableACGlobal:
LOAD *
FROM [lib://Temp QVD/Temp.qvd]
(qvd)
Where Exists(ExistsField)
;

// drop the temporary table
drop table Exists_TMP;

// Table with preceding loads, read from bottom to top

GroupingTable:
NoConcatenate
// Again we extract the values and loading all fields necessary
LOAD	subfield(Index,':',1) 					as Year
,		subfield(Index,':',2) 					as IsQTD
,		subfield(Index,':',3) 					as IsYTD
,		subfield(Index,':',4) 					as [KPI Name]
,		'QTD' 									as [Period]
,		'Total' 								as [Grouping]
,		[Player Days Count]
,		[Customer Count]
;
// Count the measures you want (notice, no distinct because we already retrieved the distinct values)
LOAD 	Index
,	 	count([Player Days SK]) 				as [Player Days Count]
,		count([Customer SK]) 					as [Customer Count]
group by Index
;
// Making a new key 
Load    Year&':'&IsQTD&':'&IsYTD&':'&[KPI Name] as Index
,		*
;
// Extract fieldvalues with subfield from Key
LOAD	subfield(Key,':',1) 					as Year
,		subfield(Key,':',2) 					as IsQTD
,		subfield(Key,':',3) 					as IsYTD
,		subfield(Key,':',4) 					as [KPI Name]
,		'QTD' 									as [Period]
,		'Total' 								as [Grouping]
,		subfield(Key,':',5) 					as [Player Days SK]
,		subfield(Key,':',6) 					as [Customer SK]
;
;
// Loading all unique values from Key
LOAD FieldValue('Key', recno()) 				as Key
AUTOGENERATE FieldValueCount('Key'); 
DROP TABLE QTDGroupingTableACGlobal
;