Enhance the performance of grouping with count dis... - Qlik Community

EISSA · ‎2022-03-31

Hello,

I am trying to use group by with count distinct inside my script.

below is a part of the script where I will have about 26 millions records to do a group by and get only few records instead or row data so I will have a small dashboard instead of a big size dashboard with low performance.

What I am facing is the slowness of the group by with the count distinct where I tried to use the order by (https://community.qlik.com/t5/QlikView-Documents/Optimize-Group-By-Performance/ta-p/1481470)

to enhance the performance but still it is taking for each section about 2 minutes to finish and I have about 25 sections where I have to do the same way with different dimensions

QTDGroupingTableACGlobal:
NoConcatenate
Load
Year,
IsQTD,
IsYTD,
[KPI Name],
[Player Days SK],
[Customer SK]
Resident GroupingTableAC_Temp
where IsQTD = 1
and IsYTD = 1
order by Year,
IsQTD,
IsYTD,
[KPI Name];

//Global
GroupingTable:
Load Year, IsQTD,IsYTD,[KPI Name],
'QTD' as [Period],
'Total' as [Grouping],
count(DISTINCT [Player Days SK]) as [Player Days Count],
count(DISTINCT [Customer SK]) as [Customer Count]
resident QTDGroupingTableACGlobal
group by Year,IsQTD,IsYTD,[KPI Name];

Any Idea what could be the best practice to enhance the performance of the load so my task will not take more than 10-20 min max.

Thanks.

eddie_wagt · ‎2022-04-04

Hello @EISSA ,

I tested your case with random data. I have setup 26 mln rows and tried to optimize your loading time. And I did manage to load the data in 44 seconds instead of 2 minutes. To achieve this result you will have to need the Source as a QVD file and you will have to add two fields to the source when writing to disk.

Because I have used some mockup data your results can be different. I am curious if you can accomplish the same results with the following script.

Regards Eddie

// Preparation
// ----------
// Write your source to a QVD
// Add two fieldnames: 
// 1. Year&':'&IsQTD&':'&IsYTD&':'&[KPI Name]&':'&[Player Days SK]&':'&[Customer SK] as Key
// 2. IsQTD&IsYTD as ExistsField


// Loading the 'existsfield' because we want to have an optimized QVD load in next step
Exists_TMP:
LOAD * INLINE [ExistsField
11
];

// Loading from source optimized
QTDGroupingTableACGlobal:
LOAD *
FROM [lib://Temp QVD/Temp.qvd]
(qvd)
Where Exists(ExistsField)
;

// drop the temporary table
drop table Exists_TMP;

// Table with preceding loads, read from bottom to top

GroupingTable:
NoConcatenate
// Again we extract the values and loading all fields necessary
LOAD	subfield(Index,':',1) 					as Year
,		subfield(Index,':',2) 					as IsQTD
,		subfield(Index,':',3) 					as IsYTD
,		subfield(Index,':',4) 					as [KPI Name]
,		'QTD' 									as [Period]
,		'Total' 								as [Grouping]
,		[Player Days Count]
,		[Customer Count]
;
// Count the measures you want (notice, no distinct because we already retrieved the distinct values)
LOAD 	Index
,	 	count([Player Days SK]) 				as [Player Days Count]
,		count([Customer SK]) 					as [Customer Count]
group by Index
;
// Making a new key 
Load    Year&':'&IsQTD&':'&IsYTD&':'&[KPI Name] as Index
,		*
;
// Extract fieldvalues with subfield from Key
LOAD	subfield(Key,':',1) 					as Year
,		subfield(Key,':',2) 					as IsQTD
,		subfield(Key,':',3) 					as IsYTD
,		subfield(Key,':',4) 					as [KPI Name]
,		'QTD' 									as [Period]
,		'Total' 								as [Grouping]
,		subfield(Key,':',5) 					as [Player Days SK]
,		subfield(Key,':',6) 					as [Customer SK]
;
;
// Loading all unique values from Key
LOAD FieldValue('Key', recno()) 				as Key
AUTOGENERATE FieldValueCount('Key'); 
DROP TABLE QTDGroupingTableACGlobal
;

Enhance the performance of grouping with count distinct in the Qlik Sense Script

General Question