Skip to main content
Announcements
Have questions about Qlik Connect? Join us live on April 10th, at 11 AM ET: SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

Count Distinct is painfully slow, please help

Experts,

I have a table that has 62 million records.  I created a concatenated field, Member Months Counter.  See column A below so that I could total the number of members per each enrollment period in a chart.  Each Member can have as many as 48 enrollment periods going all the way back to Jan 2012.  So, I might have a total of 200 distinct members in one of my groups but there would be as many as 9600 enrollment records.  I have to use Count(distinct) because of column C.  Each member may have up to 5 different Claims_BenTypeCd.  Previously, I had a filter to only show the Claims_BenTypeCd=M and I only needed to use the count function..  However, there are some members that do not have M, but only have 9.  So, I am now using Count(Distinct) to get an accurate total of the members per month.  It is taking 45 secs to 2 minutes to render the chart now that I have made the change to Count(Distinct)

My goal is just to get an expression total so that it can be used to calculate an average.

Here is a shortened version of my expression and a sample of the top 25 rows of my chart.

=(count(distinct(if([Enrollment Period] = 'Dec-2014' ,[Member Months Counter])))+
count(distinct(if([Enrollment Period] = 'Nov-2014' ,[Member Months Counter])))+
count(distinct(if([Enrollment Period] = 'Oct-2014' ,[Member Months Counter])))+
count(distinct(if([Enrollment Period] = 'Sep-2014' ,[Member Months Counter])))+
count(distinct(if([Enrollment Period] = 'Aug-2014' ,[Member Months Counter])))+
count(distinct(if([Enrollment Period] = 'Jul-2014' ,[Member Months Counter])))+
count(distinct(if([Enrollment Period] = 'Jun-2014' ,[Member Months Counter])))+
count(distinct(if([Enrollment Period] = 'May-2014' ,[Member Months Counter])))+
count(distinct(if([Enrollment Period] = 'Apr-2014' ,[Member Months Counter])))+
count(distinct(if([Enrollment Period] = 'Mar-2014' ,[Member Months Counter])))+
count(distinct(if([Enrollment Period] = 'Feb-2014' ,[Member Months Counter])))+
count(distinct(if([Enrollment Period] = 'Jan-2014' ,[Member Months Counter]))))

Sample Enrollment 2-26.jpg

My data is health care data and I unfortunately can not post an app.  Hope someone has a solution.

Thanks is advance for any help

1 Solution

Accepted Solutions
Gysbert_Wassenaar

If Enrollment Period is your dimension then you only need count(distinct [Member Months Counter]) as expression. You can set the Total mode of the expression to Sum of Rows.


talk is cheap, supply exceeds demand

View solution in original post

8 Replies
Gysbert_Wassenaar

Perhaps like this: sum(aggr(count(distinct([Member Months Counter]),[Enrollment Period]))

Or perhaps you can create a summary table in the script:

MemberCountSummary:

LOAD [Enrollment Period], count(distinct [Member Months Counter]) as MemberCount

FROM ...source

GROUP BY [Enrollment Period]

;


talk is cheap, supply exceeds demand
Anonymous
Not applicable
Author

Thanks for your reply. The second solution would be static and unfortunately not updated as my users made selections in the data model.  So, we do need a solution that counts or sums.  I could see making a master table with a counter field...

I have tried your aggr command and can not get it to work.  It returns no result.

Gysbert_Wassenaar

The aggr function needs real field names as the second argument. Make sure to use the correct case sensitive field names in the expression.


talk is cheap, supply exceeds demand
Anonymous
Not applicable
Author

Okay, so now I get the total for MemberMonthCounts and it renders much faster.  However, in my straight table, I only get the first month in my enrollment period.  Is there a quick way to get an aggregate by Enrollment period which will add up to the total on the top row?  Say Enrollment Period is now the dimension in the straight table.

Thanks again for your help!

Gysbert_Wassenaar

If Enrollment Period is your dimension then you only need count(distinct [Member Months Counter]) as expression. You can set the Total mode of the expression to Sum of Rows.


talk is cheap, supply exceeds demand
Anonymous
Not applicable
Author

Thank you.  You helped me think outside the box and I was able to implement a form of your solution.  My rendering is down to 5 seconds on average, which is tolerable.

Anonymous
Not applicable
Author

you may also want to think about using an autonumber function on the concatenated key field.

Anonymous
Not applicable
Author

That sure is a thoughtful idea.  I could create a link table and then I would have a unique value assigned to all of the  records with my key.  I will give it a try.