Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
stevelord
Specialist
Specialist

How to factor our duplicate records in Sums on Straight Chart?

Hi QlikCommunity, I know that QlikView table boxes filter duplicated records in such a way that if two records are identical on all points listed in the table box, the table box will only show one line for that record like so:

Records/Values in database might be like this:

User ID          Date               Minutes     Uploaded

123               12/13/2013      50               12/14/2013

123               12/13/2013      50               12/15/2013

(Uploaded field isn't in my qvds, but I am guessing something like it is hiding somewhere or snuck through somehow.)

QlikView TableBox shows this

User ID          Date               Minutes

123               12/13/2013       50

QlikView Straight Chart shows this

User ID          Date               Minutes

123               12/13/2013     100

Now, how do I make a QlikView Straight Chart that sums in such a way as to only count duplicated records once?  Mine is doing this currently for one group that had some kind of technical issue and might have been reloaded or might have something about two processes feeding the same data table or something.  The issue seems to be persistent and consistent in causing my sums to show double the minutes actually logged.  I'm hoping for some simple way to make my formula spin its sums of whatever QlikView is doing itself to handle the duplicate records.  Thanks!

1 Solution

Accepted Solutions
Anonymous
Not applicable

As Rachel says, it is better to fix in the script.  Still on the front end, this may work:

sum(aggr(sum(distinct minutes), user, date))

View solution in original post

11 Replies
Gysbert_Wassenaar

Use sum(distinct Minutes)

Alternatively use Minutes as a dimension and add an expression: =1. Then hide the expression.


talk is cheap, supply exceeds demand
stevelord
Specialist
Specialist
Author

Sorry, forgot to mention the Minutes field in the straight chart is a Sum(Minutes) expression, and if I had more than one day of activity I'd expect it to sum those.  I do not want it to be inflated by the duplicate records though.  The minutes field in the Table box is just Minutes, and would show whatever values were logged for a particular day by a particular user.

stevelord
Specialist
Specialist
Author

Not quite.  I should have given a little more robust example.  If I sum distinct minutes, but logged 50 minutes on two different days, it will come out to 50 minutes.  (I just tried that on the expression in my script just now.)

I would want to sum the minutes for each unique user and date combination, since there should only be one distinct value within that combination.  Formula for that?   (We only see one value in the field on our log, but something is quietly creating duplicated records in background somehow.)

rachael_selman
Partner - Contributor II
Partner - Contributor II

If I were you I'd try to get to the bottom of the duplicate fields rather than trying to fix in the charts. You might end up missing something and have spurious results go undetected!

You could add a count field to the table in your load script ('1' AS Count) and then create a straight table which sums the Count field to identify the duplicates. To fix it you will need to add a unique primary key to your table or load distinct values only.

stevelord
Specialist
Specialist
Author

I do have minutes as a dimension on a table box with the breakouts by day, but on the chart where I want to just total up the minutes.. actually, the straight chart doesn't have the date field in it since I am summing from all dates selected.

Straight table is like this:

User Id           Minutes

123               100

When I have only the one day selected and only 50 minutes logged on that day.

If I have two days selected and 50 minutes on both of those, then I'd want the straight table to show 100 Minutes total.

stevelord
Specialist
Specialist
Author

Yeah, you're right Rachel.  It's on a dashboard and I only see it happening for one group after checking through all the others, so I'm confident the dashboard is actually okay and something is wrong in the processing of the one group.  I was just looking for a patch to use in the interim because I don't know how long it will take over-extended engineering team to get to my issue and resolve it.  (Right now my workaround would just be export the daily counts to excel and do the sums there since the group is small enough for that.)

Anonymous
Not applicable

As Rachel says, it is better to fix in the script.  Still on the front end, this may work:

sum(aggr(sum(distinct minutes), user, date))

stevelord
Specialist
Specialist
Author

Thanks, you nailed it and I've worked enough with Aggr() to understand that part is doing the thing I wanted about separating the minutes by user and date, and the distinct is only counting them once within each date.  (Even so I stared at the chart to be sure the minutes all halved themselves when I pushed apply, and I'll audit some more to be sure I didn't just cause ten other problems. )

stevelord
Specialist
Specialist
Author

added helpful here since this expression went in to the correct answer guy's expression, and might've been correct if my initial report had been complete.