therealdees
Creator III

Sum values of every product aggregating by sale_id

Hi,

I have the following tables:

[screenshot: data model with the three tables]

 

1. f_MovimentoMicrovix is the sales fact table

2. f_Reshop is a promotional campaign fact table

3. d_ReshopPromocoes is the promotional campaign dimension table

 

The reason I have table 2 (f_Reshop) is that more than one campaign might be applied to a product or sale_id. The data that comes from the API has a field that is an ID for the transaction, which repeats for the whole sale, and another field called 'order' that refers to the position of the product on the receipt. The key between table 1 and table 2 is therefore transaction_id | product_receipt_order.

If a product didn't have any discount applied, it won't have a key in the f_Reshop table, but the sale itself could still have other products that did (and therefore keys). What I need to do is sum the whole sale, even if some products don't have a key, as long as at least one does.
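
A simplified sketch of how that composite key is built in the load script (paths and exact field names here are illustrative, not my actual script):

// Build the shared key on both fact tables from the API fields.
f_MovimentoMicrovix:
LOAD
    sale_id,
    product_sku,
    product_price,
    transaction_id & '|' & product_receipt_order AS reshop_key
FROM [lib://Data/f_MovimentoMicrovix.qvd] (qvd);

f_Reshop:
LOAD
    transaction_id & '|' & product_receipt_order AS reshop_key,
    campaign_id
FROM [lib://Data/f_Reshop.qvd] (qvd);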

If I simply Sum(product_price) with campaign_name as a dimension in a table, for example, it will sum only the products that had a discount applied and ignore the other products in the same sale.

I got close to the expected result using:

Sum(Aggr({1} Sum(product_price), sale_id))

If I filter a specific campaign it shows the correct value:

[screenshot: table filtered to a specific campaign, showing the correct value]

 

But if campaign_name is not filtered it shows an incorrect value (practically half the correct value):

[screenshot: table without a campaign filter, showing roughly half the correct value]

 

How can I make this work even if I don't filter campaigns (but simply have it as a dimension)?

 

6 Replies
therealdees
Creator III
Author

Anyone? 😩

marcus_sommer

Before reaching for heavy weapons like {1}, you could try a more targeted condition such as:

Sum({< sale_id = p({< campaign = p(campaign) >} sale_id), campaign= >} product_price)

Further, be aware that with multi-level aggregations like sum(aggr(sum())) you may need to specify the set analysis at each level, and by enforcing the return of the entire data set with {1} you might need extra parallel conditions, like: sum(aggr(sum())) * MyBooleanCheck. And yes, it may not be trivial and might slow down performance (if the app is already rather slow you could drop the aggr() ideas right away).

Beside the above, it might be more sensible to move the essential parts of the logic into the data model. This could mean adding to the fact records whether any campaign exists, and/or how many and which ones. You could also create extra campaign keys and records that cover all possible overlaps of campaigns with single products or orders, and/or implement such logic per: The As-Of Table - Qlik Community - 1466130.

Another way to handle 1:n relations is to duplicate the relevant fact records and make the first and following records distinguishable by assigning them a 1/0 (or 'origin'/'populated') flag.
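
As a rough sketch of the first suggestion (flagging each fact record with whether a campaign exists; table, field and file names are assumptions based on the thread):

// Mapping of all keys that appear in the campaign fact table.
Campaign_Keys:
MAPPING LOAD
    transaction_id & '|' & product_receipt_order AS reshop_key,
    1 AS has_campaign
FROM [lib://Data/f_Reshop.qvd] (qvd);

// Sales fact with an explicit, selectable campaign flag per record.
f_MovimentoMicrovix:
LOAD
    sale_id,
    product_price,
    transaction_id & '|' & product_receipt_order AS reshop_key,
    If(ApplyMap('Campaign_Keys',
                transaction_id & '|' & product_receipt_order, 0) = 1,
       'campaign', 'no campaign') AS campaign_flag
FROM [lib://Data/f_MovimentoMicrovix.qvd] (qvd);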

therealdees
Creator III
Author

Hi, Marcus

Thanks for the reply and explanations!!

I tried the expression provided, but it gave me the same result as Sum(product_price), which is the sum of only the products that had a discount applied. I get the idea of the logic, but I'm a bit confused. In this expression, I'm saying: get every sale_id that had any campaign involved (a nested p()). Would that be the same as saying p({<campaign = {"*"}>}) (just curious)? Besides that, why ignore the filtering of campaign by adding ", campaign=" in the set analysis?

 

I managed to achieve the expected result by calculating the total sum of each sale_id in the load script and then joining the result onto the sales fact using sale_id as the key. Since each sale might have more than one row (one per product_sku), the total is repeated on every row (the total sum is replicated throughout the sale_id records), so I used the following expression to sum only distinct values per sale_id:

Sum(Aggr(sale_total_value, sale_id, campaign_id))

 

I'm not sure if this approach is ideal, but at least it worked. Do you think that, in terms of processing and efficiency, it's better than calculating it at chart/front-end level? It feels kind of messy to add a repeated value for each product of a sale. I could add a single value per sale, but that seems even more improvised to me.
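
For reference, the load-script part of that approach looks roughly like this (a simplified sketch, not the exact statements):

// Total value per sale, calculated once in the script...
SaleTotals:
LOAD
    sale_id,
    Sum(product_price) AS sale_total_value
RESIDENT f_MovimentoMicrovix
GROUP BY sale_id;

// ...then joined back onto the sales fact, which repeats the total
// on every product row of that sale.
Left Join (f_MovimentoMicrovix)
LOAD sale_id, sale_total_value
RESIDENT SaleTotals;

DROP TABLE SaleTotals;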

 

About the "sum(aggr(sum())) * MyBooleanCheck". Can you explain me better the logic behind this? What hypothetically would be the boolean check? Will MyBooleanCheck return a boolean indeed, true or false? If so, do you actually multiply the expression by a boolean?

therealdees
Creator III
Author

About the link provided for the As-Of Table: it's actually really interesting and I'll try to implement it in some other stuff I'm working on. Thanks for sharing.

marcus_sommer

The aim of the conditions in:

Sum({< sale_id = p({< campaign = p(campaign) >} sale_id), campaign= >} product_price)

is to get all sale_id values which have a possible campaign while ignoring any selections within the campaigns, so that the entire order is summed regardless of whether all order positions belong to a campaign or not. And yes, it was meant more to give ideas about which further approaches might be workable than to be a final solution.

Beside this, I think it would be useful to have real information within each record on whether there is a campaign or not - meaning the records with a campaign would have the appropriate key value and the others a value like 'no campaign', which could be shown directly in dimensions, selected, or used within conditions, none of which is possible if it's only NULL.
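
As a sketch of how such a flag could then be used in a condition (assuming a field like the campaign_flag from the sketch above), the whole sale can be summed for every sale that contains at least one campaign row:

Sum({< sale_id = p({< campaign_flag = {'campaign'} >} sale_id) >} product_price)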

Approaches like:

sum(aggr(sum())) * MyBooleanCheck

are necessary if there are contradictory conditions which can't be resolved within a single calculation. In such cases it's a workaround to enforce the return of a bigger data set via aggr() and/or {1} than is really needed, and in further steps it's filtered back down to the final data. A quite common use case is accumulation in a chart, where the first row/column would otherwise return NULL because there is no previous one available. With the method hinted at above, the aggr data set can be bigger than the dimension values of the current selection state, and the boolean check ensures that these extra rows are synchronized back to the selection state.
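
A rough sketch of that pattern with the fields from this thread (the exact check is an assumption, not necessarily the final expression for this chart):

// {1} inside the aggr() forces the full total per sale_id; the factor
// Sign(Count(product_price)) is 1 when the sale still has at least one
// row inside the current selection and 0 otherwise, which re-synchronizes
// the enlarged aggr() result with the selection state.
Sum(
  Aggr(
    Sum({1} product_price) * Sign(Count(product_price)),
    sale_id
  )
)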

Which way is most suitable depends on the overall requirements of the application, and the effort needs to be balanced against the wanted results. In this regard you may also consider distributing the views across several applications, for example one showing sales views with extra information on whether a campaign exists or not, and maybe another one which provides detailed views of the campaigns but skips most of the standard sales requirements. Beside reducing the complexity (which may not grow linearly but rather exponentially when requirements are combined), it may also help to focus on the different topics and simplify the usability.

therealdees
Creator III
Author

Oh, I see it now. The boolean is actually a conditional factor, defined by the result of the check, not a literal "true" or "false" output.

Well, I'll keep the calculation using the new field I added for now, but I'll definitely dig into other approaches following your suggestions.

Thanks for the knowledge.