Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
hi community,
i'm stuck now, i need to transpose and mapping my sum result, i already build my job like this, one for sum column TOTAL_AMOUNT, and another one for count column TOTAL_AMOUNT
and this my job result for sum_process
ID | Month | categories | total_amount |
130011 | 01 | ELECTRONICS | 4313000 |
130011 | 02 | ELECTRONICS | 2514000 |
130011 | 01 | SUPERMARKET | 230640 |
130011 | 01 | RESTAURANT | 84000 |
130012 | 01 | ELECTRONICS | 76000 |
130012 | 01 | SUPERMARKET | 321000 |
130012 | 02 | SUPERMARKET | 211000 |
130012 | 02 | RESTAURANT | 16029200 |
then i want to transform it to something like :
ID | ELECTRONICS_MONTH_01 | ELECTRONICS_MONTH_02 | SUPERMARKET_MONTH_01 | SUPERMARKET_MONTH_02 | RESTAURANT_MONTH_01 | RESTAURANT_MONTH_02 |
130011 | 4313000 | 2514000 | 230640 | 84000 | ||
130012 | 76000 | 321000 | 211000 | 16029200 |
Please let me know if anyone has resolved this on.
Thanks, Jackson
There are two assumptions that could be made about this. The first is the you know the number of months and categories per month you will receive before you receive the data, and can manually configure columns for them. In this case, it is simply a case of adding a tSortRow (sorting by ID) and then a tMap, using the tMap variables to store your values between rows. Then (and this is the last bit) using a tAggregateRow to group by ID and setting all of the functions for the rows to "Last". Because the tMap variables hold their values between rows, the last row per ID will be fully populated with data. You might need a bit of logic to clear the data down at the end of each group, but that is easy. You can see a loosely related example here (https://community.talend.com/t5/How-Tos-and-Best-Practices/Compare-row-value-against-a-value-from-th...).
If you need it in the way of my second assumption (that you do not know before the data arrives how many months and how many categories there will be), then this makes it a little more complicated. Essentially you will need to first parse the data set to identify how many months and how many categories you have. You will need a distinct list of these permutations calculated (maybe added to a tHash component). Then, multiply each row in this distinct list by the number of IDs you are working with. If you have 10 IDs and 10 month/category permutations, you will have 100 rows in you tHash.
Next you will need to use a tMap driven by the list you have just created (sorted by ID and month/category combinations), with your original data as a lookup. Join your original data to your newly created list using the ID columns and your new month/category combined column with those columns concatenated from your lookup. Then use a tMap variable to simply concatenate each value that comes back with a separator, ensuring that when the ID changes, you clear the value of the variable. Pass this data to a single output column ensuring that an ID column is also passed out. Then, after the tMap, add a tAggregateRow and group by ID. Set the function on your other column to be "last". This should result in two output columns; an ID column and a column with all of your concatenated month/category data in the same order for every row. If you output this to a flat file, and open it in Excel it will appear as you want it.
There are two assumptions that could be made about this. The first is the you know the number of months and categories per month you will receive before you receive the data, and can manually configure columns for them. In this case, it is simply a case of adding a tSortRow (sorting by ID) and then a tMap, using the tMap variables to store your values between rows. Then (and this is the last bit) using a tAggregateRow to group by ID and setting all of the functions for the rows to "Last". Because the tMap variables hold their values between rows, the last row per ID will be fully populated with data. You might need a bit of logic to clear the data down at the end of each group, but that is easy. You can see a loosely related example here (https://community.talend.com/t5/How-Tos-and-Best-Practices/Compare-row-value-against-a-value-from-th...).
If you need it in the way of my second assumption (that you do not know before the data arrives how many months and how many categories there will be), then this makes it a little more complicated. Essentially you will need to first parse the data set to identify how many months and how many categories you have. You will need a distinct list of these permutations calculated (maybe added to a tHash component). Then, multiply each row in this distinct list by the number of IDs you are working with. If you have 10 IDs and 10 month/category permutations, you will have 100 rows in you tHash.
Next you will need to use a tMap driven by the list you have just created (sorted by ID and month/category combinations), with your original data as a lookup. Join your original data to your newly created list using the ID columns and your new month/category combined column with those columns concatenated from your lookup. Then use a tMap variable to simply concatenate each value that comes back with a separator, ensuring that when the ID changes, you clear the value of the variable. Pass this data to a single output column ensuring that an ID column is also passed out. Then, after the tMap, add a tAggregateRow and group by ID. Set the function on your other column to be "last". This should result in two output columns; an ID column and a column with all of your concatenated month/category data in the same order for every row. If you output this to a flat file, and open it in Excel it will appear as you want it.
Did this resolve your issue?