Skip to main content
Announcements
See what Drew Clarke has to say about the Qlik Talend Cloud launch! READ THE BLOG
cancel
Showing results for 
Search instead for 
Did you mean: 
cgT
Creator
Creator

Data Aggregation Problem

Hi experts,

I'm having a problem with some data based around capacities in an auditorium.

I have a fixed capacity figure (pulled through from a spreadsheet, I'm using QS Enterprise) which is broken down into each level of the auditorium. Here is an example:

Performance Date & Time Capacity Part of House
29/04/24 19:30 713 Stalls
29/04/24 19:30 442 Dress Circle
29/04/24 19:30 372 Grand Circle

 

These are pulling through correctly. However, when I take out the 'Part of House' dimension, it appears to aggregate this numbers, like so:

Performance Date & Time Capacity
29/04/24 19:30 550.4589178

 

I was expecting this numbers to sum together, creating a total of 1,527. It appears that some sort of an aggregation is being applied, as though the 1,527 total is being divided by the number of Parts of House, however this doesn't add up either, as multiplying the 550.4589178 by 3 gives 1,651.38. 

Is anyone able to help me decipher what's going on please?

Many thanks,

Labels (1)
17 Replies
Fabiano_Martino_Intelco
Partner - Creator II
Partner - Creator II

HI,

checking your results you have a capacity of 549,358 and a date count of 998

The division gives exactly 550.4589178 which is your result.

I suggest you check why you have 998 dates and not 3 as expected.

The advice of @TcnCunha_M is correct, start from the data model and see if you have "strange" links between tables or duplicate values.

Regards

cgT
Creator
Creator
Author

Thank you both for your suggestions.

Here is an example of a line of my data (anonymised):

Booking Reference ShowName Venue Capacity Tickets Gross PoH Profile Profile Group Discount Price Band TransactionDate PerformanceDateTime TransactionHour
2340560 Mickey Mouse The Mouse House 442 12 £882 Dress Circle GROUPS (DMGS) Groups Sales GRPS Mon-Thu 10+ A 11/04/2024 29/04/2024 19:30:00

17

 

Each sales record consists of this data. 'Capacity' refers to the total number of seats in the Part of House ( PoH - in this case the Dress Circle). This capacity figure loads in for each record which is why I divide it by the number of performances (determined by counting the number of individual Performance Date Time entires).

I can't find a way of building in a capacity value for the venue itself, broken down by level because then I wouldn't need to fudge it using a calculation in-app.

I'm hoping you'll be able to spot what may be wrong...

Many thanks again

TcnCunha_M
Creator III
Creator III

Does the capacity change when new buy happen ?

I.e., les assume someone by a 10 new tickets, so the new record will have capacity 430 and tickets 10 ?

As you think, so shall you become.
cgT
Creator
Creator
Author

The capacity of the venue is fixed, so, in using your example, the record will always show the capacity as the same number as well as listing the number of tickets sold as 10.

I hope that makes sense?

Fabiano_Martino_Intelco
Partner - Creator II
Partner - Creator II

Hi, this could be the cause of your problem.

If you have multiple rows with the same data (PerformanceDateTime and Capacity) the aggregation functions (Sum and Count) will consider them as multiple values.

In this scenario you can use the DSITINCT clause in the function or process in the script the table and extract with DISTINCT the data you need to create a separate table with each value included in a single row.

Regards

cgT
Creator
Creator
Author

So my load script looks like this:

Sales:
LOAD
"Booking Reference",
Show,
Venue,
Capacity,
Tickets,
Gross,
PoHTrim AS PoH,
Profile,
"Profile Group",
Discount,
"Price Band",
PerformanceDateTime AS [Performance Date & Time],
TransactionDate AS [Transaction Date & Time],
TransactionHour

 Would you be able to give me an example of how to load the Capacity value using the Distinct function? Would it look something like this:

sum(DISTINCT Capacity) AS Capacity,

 Thanks again

Fabiano_Martino_Intelco
Partner - Creator II
Partner - Creator II

Hi,

I would create a separate table:

CapacityTable:
LOAD Distinct
Capacity as PoHCapacity,
PoH,
[Performance Date & Time] as PoHPerformanceDataAndTime
Resident Sales;

Note that I renamed the fields except for the PoH, that is the key between the tables.

Since I don't know your data model the key might be different, but this table should be a good starting point.

Regards

cgT
Creator
Creator
Author

Thanks so much for this @Fabiano_Martino_Intelco ,

I've tweaked your script suggestion and used a Composite Key across the PoH, Venue and Show fields to get it to work correctly.

Your suggestion very much inspired me though, so thank you for your help.