<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Problem with Data Mapping and Matching in huge dataset in Connectivity &amp; Data Prep</title>
    <link>https://community.qlik.com/t5/Connectivity-Data-Prep/Problem-with-Data-Mapping-and-Matching-in-huge-dataset/m-p/2413829#M13311</link>
    <description>&lt;P&gt;Hello everyone,&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have the following problem.&lt;/P&gt;
&lt;P&gt;I have a table A, shown below, which is my data source. It has one column for the text (Text A) and one column for its classification.&lt;/P&gt;
&lt;TABLE width="246"&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD width="151"&gt;Table A&lt;/TD&gt;
&lt;TD width="95"&gt;&amp;nbsp;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;Text A&lt;/TD&gt;
&lt;TD&gt;Classification A&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;The mouse eats a carrot&lt;/TD&gt;
&lt;TD&gt;Nature&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;Prague is a nice city&lt;/TD&gt;
&lt;TD&gt;Geography&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;I am a human&lt;/TD&gt;
&lt;TD&gt;Human&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;The cat eats the mouse&lt;/TD&gt;
&lt;TD&gt;Nature&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;Sun is shinning&lt;/TD&gt;
&lt;TD&gt;Weather&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;She is a professional&lt;/TD&gt;
&lt;TD&gt;Social&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Then I have table B, which has text but no classification. Its text contains words that also appear in Text A, such as Cats, Sun, Human, etc. I want an algorithm or formula that, based on these words, looks up table A and returns the matching classification. These two tables are just an example; in reality I have two huge datasets.&lt;/P&gt;
&lt;P&gt;So for example, for "That cat is my pet", Classification B should be "Nature".&lt;/P&gt;
&lt;P&gt;What could I do to solve this? Can it be solved in Qlik?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks&lt;/P&gt;
&lt;TABLE width="239"&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD width="136"&gt;Table B&lt;/TD&gt;
&lt;TD width="103"&gt;&amp;nbsp;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;Text B&lt;/TD&gt;
&lt;TD&gt;Classification B&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;The human is complex&lt;/TD&gt;
&lt;TD&gt;?&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;That cat is my pet&lt;/TD&gt;
&lt;TD&gt;?&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;I do not like the mouse&lt;/TD&gt;
&lt;TD&gt;?&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;the sun is yellow&lt;/TD&gt;
&lt;TD&gt;?&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;he is a professional&lt;/TD&gt;
&lt;TD&gt;?&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;Prage is in europe&lt;/TD&gt;
&lt;TD&gt;?&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;</description>
    <pubDate>Fri, 02 Feb 2024 07:45:33 GMT</pubDate>
    <dc:creator>Nemo1</dc:creator>
    <dc:date>2024-02-02T07:45:33Z</dc:date>
    <item>
      <title>Problem with Data Mapping and Matching in huge dataset</title>
      <link>https://community.qlik.com/t5/Connectivity-Data-Prep/Problem-with-Data-Mapping-and-Matching-in-huge-dataset/m-p/2413829#M13311</link>
      <description>&lt;P&gt;Hello everyone,&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have the following problem.&lt;/P&gt;
&lt;P&gt;I have a table A, shown below, which is my data source. It has one column for the text (Text A) and one column for its classification.&lt;/P&gt;
&lt;TABLE width="246"&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD width="151"&gt;Table A&lt;/TD&gt;
&lt;TD width="95"&gt;&amp;nbsp;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;Text A&lt;/TD&gt;
&lt;TD&gt;Classification A&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;The mouse eats a carrot&lt;/TD&gt;
&lt;TD&gt;Nature&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;Prague is a nice city&lt;/TD&gt;
&lt;TD&gt;Geography&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;I am a human&lt;/TD&gt;
&lt;TD&gt;Human&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;The cat eats the mouse&lt;/TD&gt;
&lt;TD&gt;Nature&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;Sun is shinning&lt;/TD&gt;
&lt;TD&gt;Weather&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;She is a professional&lt;/TD&gt;
&lt;TD&gt;Social&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Then I have table B, which has text but no classification. Its text contains words that also appear in Text A, such as Cats, Sun, Human, etc. I want an algorithm or formula that, based on these words, looks up table A and returns the matching classification. These two tables are just an example; in reality I have two huge datasets.&lt;/P&gt;
&lt;P&gt;So for example, for "That cat is my pet", Classification B should be "Nature".&lt;/P&gt;
&lt;P&gt;What could I do to solve this? Can it be solved in Qlik?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks&lt;/P&gt;
&lt;TABLE width="239"&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD width="136"&gt;Table B&lt;/TD&gt;
&lt;TD width="103"&gt;&amp;nbsp;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;Text B&lt;/TD&gt;
&lt;TD&gt;Classification B&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;The human is complex&lt;/TD&gt;
&lt;TD&gt;?&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;That cat is my pet&lt;/TD&gt;
&lt;TD&gt;?&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;I do not like the mouse&lt;/TD&gt;
&lt;TD&gt;?&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;the sun is yellow&lt;/TD&gt;
&lt;TD&gt;?&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;he is a professional&lt;/TD&gt;
&lt;TD&gt;?&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;Prage is in europe&lt;/TD&gt;
&lt;TD&gt;?&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;</description>
      <pubDate>Fri, 02 Feb 2024 07:45:33 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Connectivity-Data-Prep/Problem-with-Data-Mapping-and-Matching-in-huge-dataset/m-p/2413829#M13311</guid>
      <dc:creator>Nemo1</dc:creator>
      <dc:date>2024-02-02T07:45:33Z</dc:date>
    </item>
    <item>
      <title>Re: Problem with Data Mapping and Matching in huge dataset</title>
      <link>https://community.qlik.com/t5/Connectivity-Data-Prep/Problem-with-Data-Mapping-and-Matching-in-huge-dataset/m-p/2414067#M13315</link>
      <description>&lt;P&gt;Qlik has very powerful string functions and mapping features, so it is possible to build an appropriate categorization. But the largest and hardest part of the work is not the technical implementation; it is developing a sensible and valid set of rules for the categorization, especially cleaning and preparing the data beforehand and deciding the order of execution and the priority of the matches.&lt;/P&gt;</description>
      <pubDate>Fri, 02 Feb 2024 14:38:32 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Connectivity-Data-Prep/Problem-with-Data-Mapping-and-Matching-in-huge-dataset/m-p/2414067#M13315</guid>
      <dc:creator>marcus_sommer</dc:creator>
      <dc:date>2024-02-02T14:38:32Z</dc:date>
    </item>
    <item>
      <title>Re: Problem with Data Mapping and Matching in huge dataset</title>
      <link>https://community.qlik.com/t5/Connectivity-Data-Prep/Problem-with-Data-Mapping-and-Matching-in-huge-dataset/m-p/2414080#M13316</link>
      <description>&lt;P&gt;Hey, thanks for your answer. I have already prepared the data in two datasets, but I do not know how to proceed from here. What would you do? Any suggestions are welcome, thanks.&lt;/P&gt;</description>
      <pubDate>Fri, 02 Feb 2024 14:52:31 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Connectivity-Data-Prep/Problem-with-Data-Mapping-and-Matching-in-huge-dataset/m-p/2414080#M13316</guid>
      <dc:creator>Nemo1</dc:creator>
      <dc:date>2024-02-02T14:52:31Z</dc:date>
    </item>
    <item>
      <title>Re: Problem with Data Mapping and Matching in huge dataset</title>
      <link>https://community.qlik.com/t5/Connectivity-Data-Prep/Problem-with-Data-Mapping-and-Matching-in-huge-dataset/m-p/2414090#M13317</link>
      <description>&lt;P&gt;How many combinations do you have?&lt;/P&gt;
&lt;P&gt;Such as&amp;nbsp;&lt;SPAN&gt;Cats, Sun, Human, professional&lt;/SPAN&gt;&lt;/P&gt;
      <pubDate>Fri, 02 Feb 2024 15:10:26 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Connectivity-Data-Prep/Problem-with-Data-Mapping-and-Matching-in-huge-dataset/m-p/2414090#M13317</guid>
      <dc:creator>Nagaraju_KCS</dc:creator>
      <dc:date>2024-02-02T15:10:26Z</dc:date>
    </item>
    <item>
      <title>Re: Problem with Data Mapping and Matching in huge dataset</title>
      <link>https://community.qlik.com/t5/Connectivity-Data-Prep/Problem-with-Data-Mapping-and-Matching-in-huge-dataset/m-p/2414096#M13318</link>
      <description>&lt;P&gt;Do you really have a set of rules that differentiates between nouns, verbs and adjectives, and that also covers expletives and all kinds of punctuation marks? Is the context within a sentence important or not? How should typos be handled? In which order should the text be searched and matched?&lt;/P&gt;
&lt;P&gt;... the human looked like the mouse to the shining sun ... // which one should win ?&lt;/P&gt;
&lt;P&gt;Besides this, take a look at mapsubstring(), which can insert multiple match results into a string that can be evaluated later.&lt;/P&gt;
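&lt;P&gt;A minimal sketch of the mapsubstring() idea, assuming hypothetical keyword rules and field names (KeywordMap, TextB and ClassificationB are made up for illustration): each keyword is replaced by its category wrapped in '|' markers, and subfield() then pulls out the first category that was inserted.&lt;/P&gt;

```qlik
// Hypothetical keyword rules; in practice these would be derived from Table A.
// Each keyword maps to its category wrapped in '|' so it can be located later.
KeywordMap:
Mapping LOAD * Inline [
Lookup,Return
human,|Human|
cat,|Nature|
mouse,|Nature|
sun,|Weather|
];

TableB:
LOAD
    TextB,
    // MapSubString() replaces every keyword it finds in the lower-cased text;
    // SubField(..., '|', 2) returns the first inserted category (null if none).
    SubField(MapSubString('KeywordMap', Lower(TextB)), '|', 2) as ClassificationB
Inline [
TextB
The human is complex
That cat is my pet
the sun is yellow
];
```

&lt;P&gt;Note that mapsubstring() is case-sensitive and matches anywhere in the string, so 'cat' would also hit 'category', and a text with several keywords gets several markers. Which of them should win is still a question of the rules.&lt;/P&gt;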
&lt;P&gt;Another common way would be to load the strings with subfield() to split them into n records, to which you can then apply a normal mapping, maybe something like:&lt;/P&gt;
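&lt;P&gt;// mapping table: keyword to category&lt;BR /&gt;m: mapping load Lookup, Return from MyRules;&lt;/P&gt;
&lt;P&gt;t: &lt;BR /&gt;// preceding load: look up each word in the mapping, '#NV' if there is no match&lt;BR /&gt;load *, applymap('m', SubString, '#NV') as Category, rowno() as RowNo;&lt;BR /&gt;// one record per word: subfield() returns the n-th word for each iterno()&lt;BR /&gt;load Key, subfield(String, ' ', iterno()) as SubString, recno() as RecNo, iterno() as IterNo&lt;BR /&gt;from MyDataset while iterno() &amp;lt;= substringcount(String, ' ') +1;&lt;/P&gt;</description>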
      <pubDate>Fri, 02 Feb 2024 15:18:13 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Connectivity-Data-Prep/Problem-with-Data-Mapping-and-Matching-in-huge-dataset/m-p/2414096#M13318</guid>
      <dc:creator>marcus_sommer</dc:creator>
      <dc:date>2024-02-02T15:18:13Z</dc:date>
    </item>
  </channel>
</rss>

