<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Splitting a dataset into multiple datasets based on column values in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/Splitting-a-dataset-into-multiple-datasets-based-on-column/m-p/2368789#M132013</link>
    <description>Splitting a dataset into multiple datasets based on column values in Talend Studio</description>
    <pubDate>Sat, 16 Nov 2024 01:52:08 GMT</pubDate>
    <dc:creator>ecurren-lmi</dc:creator>
    <dc:date>2024-11-16T01:52:08Z</dc:date>
    <item>
      <title>Splitting a dataset into multiple datasets based on column values</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Splitting-a-dataset-into-multiple-datasets-based-on-column/m-p/2368789#M132013</link>
      <description>&lt;P&gt;I’ve been racking my brain for a few days over how to accomplish this task, and I really need some help.&lt;/P&gt;&lt;P&gt;I have a dataset with a few thousand rows. The schema is:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Id&lt;/LI&gt;&lt;LI&gt;SheetColumn&lt;/LI&gt;&lt;LI&gt;SubjectId&lt;/LI&gt;&lt;LI&gt;RowId&lt;/LI&gt;&lt;LI&gt;Value&lt;/LI&gt;&lt;LI&gt;WorksheetName&lt;/LI&gt;&lt;LI&gt;WorksheetIndex&lt;/LI&gt;&lt;LI&gt;WorksheetRow&lt;/LI&gt;&lt;LI&gt;SortRowId&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;I sort the data first by SortRowId, then by WorksheetIndex, then by WorksheetRow, and finally by SheetColumn.&lt;/P&gt;&lt;P&gt;Now I need to break the data set into multiple data sets: first by SortRowId, then each SortRowId data set by WorksheetIndex, and then by WorksheetRow.&lt;/P&gt;&lt;P&gt;Within each data set I then need to identify all of the RowId values that occur more than once. For each RowId that occurs more than once, I need to append an incrementing integer to the end of it.&lt;/P&gt;&lt;P&gt;An example (I have left out the columns that are not part of the calculation so that the data fits better):&lt;/P&gt;&lt;P&gt;Let’s say that I have already removed all of the rows whose RowId does not occur more than once within the same WorksheetIndex, and we have the data set shown below.&lt;/P&gt;&lt;TABLE&gt;&lt;TR&gt;&lt;TH&gt;SheetColumn&lt;/TH&gt;&lt;TH&gt;RowId&lt;/TH&gt;&lt;TH&gt;WorksheetIndex&lt;/TH&gt;&lt;TH&gt;WorksheetRow&lt;/TH&gt;&lt;TH&gt;SortRowId&lt;/TH&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;A&lt;/TD&gt;&lt;TD&gt;40&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;9&lt;/TD&gt;&lt;TD&gt;123456789&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;E&lt;/TD&gt;&lt;TD&gt;40&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;9&lt;/TD&gt;&lt;TD&gt;123456789&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;R&lt;/TD&gt;&lt;TD&gt;40&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;9&lt;/TD&gt;&lt;TD&gt;123456789&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;R&lt;/TD&gt;&lt;TD&gt;94&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;14&lt;/TD&gt;&lt;TD&gt;987654321&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;AC&lt;/TD&gt;&lt;TD&gt;94&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;14&lt;/TD&gt;&lt;TD&gt;987654321&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;BE&lt;/TD&gt;&lt;TD&gt;94&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;14&lt;/TD&gt;&lt;TD&gt;987654321&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;BY&lt;/TD&gt;&lt;TD&gt;94&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;14&lt;/TD&gt;&lt;TD&gt;987654321&lt;/TD&gt;&lt;/TR&gt;&lt;/TABLE&gt;&lt;P&gt;I need to append values to the RowId as below.&lt;/P&gt;&lt;TABLE&gt;&lt;TR&gt;&lt;TH&gt;SheetColumn&lt;/TH&gt;&lt;TH&gt;RowId&lt;/TH&gt;&lt;TH&gt;WorksheetIndex&lt;/TH&gt;&lt;TH&gt;WorksheetRow&lt;/TH&gt;&lt;TH&gt;SortRowId&lt;/TH&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;D&lt;/TD&gt;&lt;TD&gt;40_1&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;9&lt;/TD&gt;&lt;TD&gt;123456789&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;U&lt;/TD&gt;&lt;TD&gt;40_2&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;9&lt;/TD&gt;&lt;TD&gt;123456789&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;AR&lt;/TD&gt;&lt;TD&gt;40_3&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;9&lt;/TD&gt;&lt;TD&gt;123456789&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;G&lt;/TD&gt;&lt;TD&gt;94_1&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;14&lt;/TD&gt;&lt;TD&gt;987654321&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;N&lt;/TD&gt;&lt;TD&gt;94_2&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;14&lt;/TD&gt;&lt;TD&gt;987654321&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;BE&lt;/TD&gt;&lt;TD&gt;94_3&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;14&lt;/TD&gt;&lt;TD&gt;987654321&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;BY&lt;/TD&gt;&lt;TD&gt;94_4&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;14&lt;/TD&gt;&lt;TD&gt;987654321&lt;/TD&gt;&lt;/TR&gt;&lt;/TABLE&gt;&lt;P&gt;Please note that in the real data there will be tens to hundreds of rows between each instance of a repeating RowId value.&lt;/P&gt;&lt;P&gt;Any and all help is appreciated.&lt;/P&gt;</description>
      <pubDate>Sat, 16 Nov 2024 01:52:08 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Splitting-a-dataset-into-multiple-datasets-based-on-column/m-p/2368789#M132013</guid>
      <dc:creator>ecurren-lmi</dc:creator>
      <dc:date>2024-11-16T01:52:08Z</dc:date>
    </item>
    <item>
      <title>Re: Splitting a dataset into multiple datasets based on column values</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Splitting-a-dataset-into-multiple-datasets-based-on-column/m-p/2368790#M132014</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Call the built-in function Numeric.sequence(row4.RowId,1,1) in a tMap expression to generate a sequence ID for each RowId, as in this screenshot:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0693p000008vQKtAAM.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/151560iE119485B46B64435/image-size/large?v=v2&amp;amp;px=999" role="button" title="0693p000008vQKtAAM.png" alt="0693p000008vQKtAAM.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Please try it and let me know if you have any questions.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Shong&lt;/P&gt;</description>
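<![CDATA[
For readers without Talend at hand, here is a minimal plain-Java sketch of the idea behind the reply above: keep one counter per RowId and append its current value to the RowId. It assumes that Numeric.sequence keeps a separate counter for each sequence name; the class RowIdSequencer and its methods are hypothetical names used only for illustration and are not part of Talend.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Talend: imitates a per-name sequence counter.
final class RowIdSequencer {
    // One running counter per sequence name (here: per RowId).
    private final Map<String, Integer> counters = new HashMap<>();

    // Returns start on the first call for a name, then start + step, and so on.
    int next(String name, int start, int step) {
        int value = counters.containsKey(name) ? counters.get(name) + step : start;
        counters.put(name, value);
        return value;
    }

    // Appends "_n" to every RowId, counting separately for each distinct RowId.
    static List<String> suffix(List<String> rowIds) {
        RowIdSequencer seq = new RowIdSequencer();
        List<String> out = new ArrayList<>();
        for (String rowId : rowIds) {
            out.add(rowId + "_" + seq.next(rowId, 1, 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // RowId column taken from the example in the question.
        List<String> rowIds = Arrays.asList("40", "40", "40", "94", "94", "94", "94");
        System.out.println(suffix(rowIds)); // [40_1, 40_2, 40_3, 94_1, 94_2, 94_3, 94_4]
    }
}
```

Because the counters live in a map, it does not matter how many rows sit between two occurrences of the same RowId. This sketch numbers every RowId, including ones that occur only once, so it matches the question's example only after the single-occurrence rows have been removed. To restart the numbering for each SortRowId, WorksheetIndex and WorksheetRow group, one option would be to build the key from those columns together with RowId.
]]>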
      <pubDate>Sat, 18 Jul 2020 01:12:38 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Splitting-a-dataset-into-multiple-datasets-based-on-column/m-p/2368790#M132014</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2020-07-18T01:12:38Z</dc:date>
    </item>
    <item>
      <title>Re: Splitting a dataset into multiple datasets based on column values</title>
      <link>https://community.qlik.com/t5/Talend-Studio/Splitting-a-dataset-into-multiple-datasets-based-on-column/m-p/2368791#M132015</link>
      <description>&lt;P&gt;Fantastic!! Thanks, Shong.&lt;/P&gt;</description>
      <pubDate>Mon, 20 Jul 2020 19:08:27 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/Splitting-a-dataset-into-multiple-datasets-based-on-column/m-p/2368791#M132015</guid>
      <dc:creator>ecurren-lmi</dc:creator>
      <dc:date>2020-07-20T19:08:27Z</dc:date>
    </item>
  </channel>
</rss>

