<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Splitting a Column in Data Quality</title>
    <link>https://community.qlik.com/t5/Data-Quality/Splitting-a-Column/m-p/2207816#M992</link>
    <description>&lt;P&gt;I am looking to split a column which looks like this&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt; &amp;nbsp; &amp;nbsp;1 can cream of mushroom soup&lt;/P&gt;&lt;P&gt; &amp;nbsp; &amp;nbsp;1 carton sour cream&lt;/P&gt;&lt;P&gt; &amp;nbsp; &amp;nbsp;2 (16 oz.) pkg. frozen corn&lt;/P&gt;&lt;P&gt; &amp;nbsp; &amp;nbsp;1 (8 oz.) pkg. cream cheese&lt;/P&gt;&lt;P&gt; &amp;nbsp; &amp;nbsp;cubed&lt;/P&gt;&lt;P&gt; &amp;nbsp; &amp;nbsp;1/3 c. butter&lt;/P&gt;&lt;P&gt; &amp;nbsp; &amp;nbsp;cubed&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I would like to split it into: &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt; &amp;nbsp; &amp;nbsp;Quantity, Measurement, Item&amp;nbsp;&lt;/P&gt;&lt;P&gt; &amp;nbsp; &amp;nbsp;For ex: 1 | can | Cream of Mushroom Soup&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Since all the rows in this particular column have a different format, I am finding it difficult to use any specific delimiter to split it.  And also I am not familiar with regex. I am new to Talend.  Any help/ pointers is appreciated. Thank You!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Fri, 15 Nov 2024 23:09:50 GMT</pubDate>
    <dc:creator>flor1</dc:creator>
    <dc:date>2024-11-15T23:09:50Z</dc:date>
    <item>
      <title>Splitting a Column</title>
      <link>https://community.qlik.com/t5/Data-Quality/Splitting-a-Column/m-p/2207816#M992</link>
      <description>&lt;P&gt;I am looking to split a column which looks like this&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt; &amp;nbsp; &amp;nbsp;1 can cream of mushroom soup&lt;/P&gt;&lt;P&gt; &amp;nbsp; &amp;nbsp;1 carton sour cream&lt;/P&gt;&lt;P&gt; &amp;nbsp; &amp;nbsp;2 (16 oz.) pkg. frozen corn&lt;/P&gt;&lt;P&gt; &amp;nbsp; &amp;nbsp;1 (8 oz.) pkg. cream cheese&lt;/P&gt;&lt;P&gt; &amp;nbsp; &amp;nbsp;cubed&lt;/P&gt;&lt;P&gt; &amp;nbsp; &amp;nbsp;1/3 c. butter&lt;/P&gt;&lt;P&gt; &amp;nbsp; &amp;nbsp;cubed&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I would like to split it into: &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt; &amp;nbsp; &amp;nbsp;Quantity, Measurement, Item&amp;nbsp;&lt;/P&gt;&lt;P&gt; &amp;nbsp; &amp;nbsp;For ex: 1 | can | Cream of Mushroom Soup&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Since all the rows in this particular column have a different format, I am finding it difficult to use any specific delimiter to split it.  And also I am not familiar with regex. I am new to Talend.  Any help/ pointers is appreciated. Thank You!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 15 Nov 2024 23:09:50 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/Splitting-a-Column/m-p/2207816#M992</guid>
      <dc:creator>flor1</dc:creator>
      <dc:date>2024-11-15T23:09:50Z</dc:date>
    </item>
    <item>
      <title>Re: Splitting a Column</title>
      <link>https://community.qlik.com/t5/Data-Quality/Splitting-a-Column/m-p/2207817#M993</link>
      <description>&lt;P&gt;Do you have any other information to help with this? For example, a list of all of the products/items that you will be looking for? Is this list produced by a human (ie is it typed) or by another system? If it is typed by a human, you will have trouble with spelling mistakes and differing measurements (kg, KG, kilo, kilogram, kilograms, etc).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If you know of all measurements and/or all items, then it will be a lot easier to do this.&lt;/P&gt;</description>
      <pubDate>Wed, 02 Mar 2022 18:01:00 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/Splitting-a-Column/m-p/2207817#M993</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2022-03-02T18:01:00Z</dc:date>
    </item>
    <item>
      <title>Re: Splitting a Column</title>
      <link>https://community.qlik.com/t5/Data-Quality/Splitting-a-Column/m-p/2207818#M994</link>
      <description>&lt;P&gt;This is extracted from recipe websites. There is no lookup data to support this.  So basically right now , I have to clean this column and create lookups from this data to create a normalized database. &lt;/P&gt;</description>
      <pubDate>Wed, 02 Mar 2022 18:05:37 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/Splitting-a-Column/m-p/2207818#M994</guid>
      <dc:creator>flor1</dc:creator>
      <dc:date>2022-03-02T18:05:37Z</dc:date>
    </item>
    <item>
      <title>Re: Splitting a Column</title>
      <link>https://community.qlik.com/t5/Data-Quality/Splitting-a-Column/m-p/2207819#M995</link>
      <description>&lt;P&gt;OK, I just knocked this up quickly. It won't be perfect, will need some fine tuning and probably isn't the most efficient way of doing this......but it works for your examples &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This all takes place in a tJavaFlex Main code section.....&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;// Split the sentence by spaces&lt;/P&gt;&lt;P&gt;String[] words = row2.sentence.split(" ");&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;B&gt;boolean&lt;/B&gt; quantityFound = &lt;B&gt;false&lt;/B&gt;;&lt;/P&gt;&lt;P&gt;&lt;B&gt;boolean&lt;/B&gt; measurementFound = &lt;B&gt;false&lt;/B&gt;;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;String quantity = "";&lt;/P&gt;&lt;P&gt;String measurement = "";&lt;/P&gt;&lt;P&gt;String item = "";&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;//Loop over each word&lt;/P&gt;&lt;P&gt;&lt;B&gt;for&lt;/B&gt;(&lt;B&gt;int&lt;/B&gt; i = 0; i&amp;lt;words.length; i++){&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;	//Check for words that match quantity values&lt;/P&gt;&lt;P&gt;	&lt;B&gt;if&lt;/B&gt;(words[i].matches("\\d*|(\\d*\\/\\d*)")&amp;amp;&amp;amp;!quantityFound){&lt;/P&gt;&lt;P&gt;		quantity = quantity+words[i];&lt;/P&gt;&lt;P&gt;		quantityFound = &lt;B&gt;true&lt;/B&gt;;&lt;/P&gt;&lt;P&gt;		i++;&lt;/P&gt;&lt;P&gt;	}&lt;/P&gt;&lt;P&gt;		&lt;/P&gt;&lt;P&gt;	//Check for measurement words&lt;/P&gt;&lt;P&gt;	&lt;B&gt;if&lt;/B&gt;(quantityFound&amp;amp;&amp;amp;!measurementFound){&lt;/P&gt;&lt;P&gt;		&lt;B&gt;if&lt;/B&gt;(words[i].matches("\\(\\d*|(.*?)\\)|\\d*\\/\\d*|(pkg.)|(can)|(carton)|(c.)")){&lt;/P&gt;&lt;P&gt;			measurement = measurement+" " +words[i];&lt;/P&gt;&lt;P&gt;		}&lt;B&gt;else&lt;/B&gt;{&lt;/P&gt;&lt;P&gt;			measurementFound = &lt;B&gt;true&lt;/B&gt;;&lt;/P&gt;&lt;P&gt;		}&lt;/P&gt;&lt;P&gt;		&lt;/P&gt;&lt;P&gt;	}&lt;/P&gt;&lt;P&gt;	&lt;/P&gt;&lt;P&gt;	//The rest is the item&amp;nbsp;&lt;/P&gt;&lt;P&gt;	&lt;B&gt;if&lt;/B&gt;(quantityFound&amp;amp;&amp;amp;measurementFound){&lt;/P&gt;&lt;P&gt;		item = item + " " +words[i];&lt;/P&gt;&lt;P&gt;	}&lt;/P&gt;&lt;P&gt;	&lt;/P&gt;&lt;P&gt;	row3.quantity = quantity;&lt;/P&gt;&lt;P&gt;	row3.measurement = measurement.trim();&lt;/P&gt;&lt;P&gt;	row3.item = item.trim();&lt;/P&gt;&lt;P&gt;}&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;When I run this against the examples you gave, it returns this.....&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;1|can|cream of mushroom soup&lt;/P&gt;&lt;P&gt;1|carton|sour cream&lt;/P&gt;&lt;P&gt;2|(16 oz.) pkg.|frozen corn&lt;/P&gt;&lt;P&gt;1|(8 oz.) pkg.|cream cheese cubed&lt;/P&gt;&lt;P&gt;1/3|c.|butter cubed&lt;/P&gt;</description>
      <pubDate>Wed, 02 Mar 2022 19:57:15 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/Splitting-a-Column/m-p/2207819#M995</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2022-03-02T19:57:15Z</dc:date>
    </item>
    <item>
      <title>Re: Splitting a Column</title>
      <link>https://community.qlik.com/t5/Data-Quality/Splitting-a-Column/m-p/2207820#M996</link>
      <description>&lt;P&gt;Wow thank you so much, with the sample data it worked like magic. But with the dataset I am using it gives me below exception.  &lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Exception in component tJavaFlex_1 (CodeCleaning)&lt;/P&gt;&lt;P&gt;java.lang.ArrayIndexOutOfBoundsException: Index 5 out of bounds for length 5&lt;/P&gt;&lt;P&gt;	at local_project.codecleaning_0_1.CodeCleaning.tFileInputDelimited_1Process(CodeCleaning.java:1211)&lt;/P&gt;&lt;P&gt;	at local_project.codecleaning_0_1.CodeCleaning.runJobInTOS(CodeCleaning.java:1696)&lt;/P&gt;&lt;P&gt;	at local_project.codecleaning_0_1.CodeCleaning.main(CodeCleaning.java:1534)&lt;/P&gt;&lt;P&gt;[statistics] disconnected&lt;/P&gt;</description>
      <pubDate>Wed, 02 Mar 2022 23:39:35 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/Splitting-a-Column/m-p/2207820#M996</guid>
      <dc:creator>flor1</dc:creator>
      <dc:date>2022-03-02T23:39:35Z</dc:date>
    </item>
    <item>
      <title>Re: Splitting a Column</title>
      <link>https://community.qlik.com/t5/Data-Quality/Splitting-a-Column/m-p/2207821#M997</link>
      <description>&lt;P&gt;Add a System.out.println() to the code to reveal the data that caused it to fail. As I said, it was just a quick piece of code, so I  wasn't expecting it to be perfect. Also, I suspect there are other examples that don't meet the rules. &lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This error will be caused by something like this.....&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;One kg baked beans Heinz 57&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;By the way, I noticed a slight error in the code I gave you. The output fields should be outside of the for loop. Like this....&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;// Split the sentence by spaces&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;String[] words = row3.sentence.split(" ");&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;B&gt;boolean&lt;/B&gt; quantityFound = &lt;B&gt;false&lt;/B&gt;;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;B&gt;boolean&lt;/B&gt; measurementFound = &lt;B&gt;false&lt;/B&gt;;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;String quantity = "";&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;String measurement = "";&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;String item = "";&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;//Loop over each word&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;B&gt;for&lt;/B&gt;(&lt;B&gt;int&lt;/B&gt; i = 0; i&amp;lt;words.length; i++){&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;//Check for words that match quantity values&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;B&gt;if&lt;/B&gt;(words[i].matches("\\d*|(\\d*\\/\\d*)")&amp;amp;&amp;amp;!quantityFound){&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;quantity = quantity+words[i];&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;quantityFound = &lt;B&gt;true&lt;/B&gt;;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;i++;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;}&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;//Check for measurement words&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;B&gt;if&lt;/B&gt;(quantityFound&amp;amp;&amp;amp;!measurementFound){&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;B&gt;if&lt;/B&gt;(words[i].matches("\\(\\d*|(.*?)\\)|\\d*\\/\\d*|(pkg.)|(can)|(carton)|(c.)")){&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;measurement = measurement+" " +words[i];&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;}&lt;B&gt;else&lt;/B&gt;{&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;measurementFound = &lt;B&gt;true&lt;/B&gt;;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;}&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;}&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;//The rest is the item&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;B&gt;if&lt;/B&gt;(quantityFound&amp;amp;&amp;amp;measurementFound){&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;item = item + " " +words[i];&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;}&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;}&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;row4.quantity = quantity;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;row4.measurement = measurement.trim();&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;row4.item = item.trim();&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This code won't fix the problem you have here. To get a better idea of how to fix it, we'll nee more variations of possibilities.&lt;/P&gt;</description>
      <pubDate>Thu, 03 Mar 2022 00:24:08 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/Splitting-a-Column/m-p/2207821#M997</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2022-03-03T00:24:08Z</dc:date>
    </item>
  </channel>
</rss>

