Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
See why IDC MarketScape names Qlik a 2025 Leader! Read more
cancel
Showing results for 
Search instead for 
Did you mean: 
sushantV
Creator
Creator

How do I aggregate data, conditionally?

I've a raw file which has data which looks like this:

 

account_code|payment_period|payment_date|transaction_type_code|fund_type_code|amount|next_date|payment_status

7H0068T1|201901|2019-01-15|RFP|C|2521.40000000|2019-06-14|DONE
7H0068T1|201901|2019-01-15|RFX|C|678.60000000|2019-06-14|DONE
7H0068T1|201901|2019-01-15|WHT|C|800.00000000|2019-06-14|DONE

7J0004T1|201912|2019-12-13|RFP|C|985.31000000|2019-06-14|PENDING
7J0004T1|201912|2019-12-13|RFX|C|614.69000000|2019-06-14|PENDING
7J0004T1|201912|2019-12-13|WHT|C|400.00000000|2019-06-14|PENDING

 

I want to sum up the values from the amount column, for each account (account_code), for each day (payment_date) -> only for the records where transaction_type is RFP or RFX. So in the above data, for 7H0068T1, I need to sum up 2521.4 and 678.6 (and exclude WHT/ 800.0)

 

The iAggregate node doesn't seem to work for me or I am missing something.

 

Technically it's a combo of 'group by' and 'where' and the tAggregate only provides me the option to group by 

 

Labels (2)
12 Replies
Anonymous
Not applicable

Hi,

 

    If you are using a DB, why you need to combine the dataset? You can add a tDBOutput to each stream so that each dataset will be loaded independently.

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂

sushantV
Creator
Creator
Author

The reason is that I wish to hit the DB once. As in, combine everything into one stream and then update the DB using tDBOutput

Anonymous
Not applicable

Hi,

 

    I don't think DB will have any serious impact if we hit 3 times instead of 1 especially in batch mode.

 

    If you want to use a single session, you can use a tDBConnection at the beginning and use the same connection for all 3 DB objects. End of day, you will have to use DB resources to push based on the number of records.

 

    But if you really want to collect the data to one dataset and then push to DB (definitely not my preferred method), you can first collect the data to three tHashOutput components (where the second and third hash will be in append mode to first one). Then read this Hash using a tHashInput and then push to DB (this means memory overhead due to usage of Hash).

 

    You can use files instead of Hash to do the same operation but this means you are doing additional IO and CPU operations. So my view is that the optimum way is to push through three DB components.

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂