topic Problem with Data Mapping and Matching in huge dataset in Connectivity & Data Prep

Problem with Data Mapping and Matching in huge dataset

Nemo1 — Fri, 02 Feb 2024 07:45:33 GMT

Hello everyone,

I have the following problem.

I have a table A, shown below, which is my data source. It has a column for Text A and a column for a classification.

Table A

Text A	Classification A

The mouse eats a carrot	Nature
Prague is a nice city	Geography
I am a human	Human
The cat eats the mouse	Nature
Sun is shinning	Weather
She is a professional	Social

Then I have table B, which has text but no classification. The text contains words that also appear in Text A, such as cat, sun, human, etc. I want an algorithm or formula that takes these words, looks them up in table A and returns the classification. These two tables are just an example; in reality I have two huge datasets.

Table B

Text B	Classification B

The human is complex	?
That cat is my pet	?
I do not like the mouse	?
the sun is yellow	?
he is a professional	?
Prage is in europe	?

So for example, for "That cat is my pet", Classification B should be "Nature".

What could I do to solve this? Could I solve it in Qlik?

Thanks
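A minimal sketch of that lookup in Python, purely to illustrate the logic outside Qlik. Assumptions (not from the thread): every word of Text A except a small hypothetical stop-word list becomes a keyword, the first table A row containing a word decides its classification, and a text in table B takes the classification of its first keyword.

```python
import re

# Hypothetical stop-word list: words too common to say anything about a category.
STOP_WORDS = {"the", "a", "is", "i", "am", "he", "she", "that", "my", "do", "not", "in"}

table_a = [
    ("The mouse eats a carrot", "Nature"),
    ("Prague is a nice city", "Geography"),
    ("I am a human", "Human"),
    ("The cat eats the mouse", "Nature"),
    ("Sun is shinning", "Weather"),
    ("She is a professional", "Social"),
]

def tokenize(text):
    """Lower-case the text and return its words without punctuation."""
    return re.findall(r"[a-z]+", text.lower())

def build_keyword_map(rows):
    """Map every non-stop word of table A to the classification of its row.
    If a word occurs in several rows, the first row wins."""
    keywords = {}
    for text, classification in rows:
        for word in tokenize(text):
            if word not in STOP_WORDS:
                keywords.setdefault(word, classification)
    return keywords

def classify(text, keywords, default="?"):
    """Return the classification of the first word of text found in keywords."""
    for word in tokenize(text):
        if word in keywords:
            return keywords[word]
    return default

keywords = build_keyword_map(table_a)
for text in ["The human is complex", "That cat is my pet", "I do not like the mouse",
             "the sun is yellow", "he is a professional", "Prage is in europe"]:
    print(f"{text!r} -> {classify(text, keywords)}")
```

With the sample rows, "Prage is in europe" stays "?" because the typo "Prage" does not match "prague" exactly; that is one of the rule questions that still has to be decided.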

Re: Problem with Data Mapping and Matching in huge dataset

marcus_sommer — Fri, 02 Feb 2024 14:38:32 GMT

Qlik has very powerful string functions and mapping features, so it is possible to build an appropriate categorization. But the biggest and hardest part of the work is not the technical implementation; it is developing a sensible and valid set of categorization rules, especially for cleaning and preparing the data beforehand and for deciding the execution order and the priority of the matches.

Re: Problem with Data Mapping and Matching in huge dataset

Nemo1 — Fri, 02 Feb 2024 14:52:31 GMT

Hey, thanks for your answer. I have already prepared the data in two datasets, but I do not know how to continue from here. What would you do? Any suggestions are welcome, thanks.

Re: Problem with Data Mapping and Matching in huge dataset

Nagaraju_KCS — Fri, 02 Feb 2024 15:10:26 GMT

How many combinations do you have, like Cats, Sun, Human, professional?

Re: Problem with Data Mapping and Matching in huge dataset

marcus_sommer — Fri, 02 Feb 2024 15:18:13 GMT

Do you really have a set of rules that differentiates between nouns, verbs and adjectives, and that further handles expletives (filler words) and all kinds of punctuation marks? Is the context within a sentence important or not? How should typos be handled? In which order should the search and matching happen?

... the human looked like the mouse to the shining sun ... // which one should win?
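One possible set of answers to these questions, sketched in Python (not Qlik): a hypothetical keyword table, tolerance for small typos via the standard library's difflib.get_close_matches() with an assumed cutoff of 0.8, and a rule where the category with the most matched words wins, ties going to a hypothetical priority list. Every one of these rules is an assumption to be replaced by your own.

```python
import difflib
import re
from collections import Counter

# Hypothetical rules table: keyword -> category (lower-case, already cleaned).
KEYWORDS = {
    "human": "Human",
    "mouse": "Nature",
    "cat": "Nature",
    "sun": "Weather",
    "shining": "Weather",
    "prague": "Geography",
}

# Hypothetical priority: earlier categories win ties.
PRIORITY = ["Geography", "Human", "Weather", "Nature"]

def match_word(word, cutoff=0.8):
    """Return the category for word, tolerating small typos via difflib."""
    if word in KEYWORDS:
        return KEYWORDS[word]
    close = difflib.get_close_matches(word, KEYWORDS.keys(), n=1, cutoff=cutoff)
    return KEYWORDS[close[0]] if close else None

def categorize(text):
    """Count matched categories; most hits wins, ties go to the higher priority."""
    hits = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        category = match_word(word)
        if category is not None:
            hits[category] += 1
    if not hits:
        return None
    return max(hits, key=lambda c: (hits[c], -PRIORITY.index(c)))

print(categorize("... the human looked like the mouse to the shining sun ..."))
print(categorize("Prage is in europe"))
```

For the example sentence, Weather wins with two hits (shining, sun) against one each for Human and Nature, and "Prage is in europe" becomes Geography because "prage" is close enough to "prague". Fuzzy matching can also produce false positives, so the cutoff needs testing on real data.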

Besides this, take a look at mapsubstring(), which can put multiple match results into one string that can be evaluated later.
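The mapsubstring() idea, returning all matches inside one string and evaluating that string later, can be illustrated in Python roughly as follows. This is an analogy, not a reimplementation of Qlik's exact MapSubString() semantics; the mapping table and the <Category> marker format are hypothetical.

```python
import re

# Hypothetical mapping table: substring -> marker returned for it.
MAPPING = {
    "mouse": "<Nature>",
    "cat": "<Nature>",
    "human": "<Human>",
    "sun": "<Weather>",
    "prague": "<Geography>",
}

# Longest keys first so a longer keyword is preferred over a shorter one it contains.
PATTERN = re.compile("|".join(re.escape(k) for k in sorted(MAPPING, key=len, reverse=True)))

def map_substrings(text):
    """Replace every mapped substring (left to right, non-overlapping) by its marker."""
    return PATTERN.sub(lambda m: MAPPING[m.group(0)], text.lower())

def extract_categories(mapped):
    """Evaluate the mapped string afterwards: list the categories in order of appearance."""
    return re.findall(r"<([^>]+)>", mapped)

mapped = map_substrings("The human looked like the mouse to the shining sun")
print(mapped)
print(extract_categories(mapped))
```

Because this matches substrings rather than whole words, map_substrings("Sunday") returns '<Weather>day'; this is one reason cleaning the data and deciding on whole-word rules matter.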

Another common way is to load the strings with subfield(), splitting each one into n records, on which you can then apply a normal mapping, maybe something like:

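// MyRules and MyDataset stand for previously loaded tables (for files use FROM [path] instead of RESIDENT)
// Mapping table: Lookup = keyword, Return = category
m: mapping load Lookup, Return resident MyRules;

// Inner load: split String at each space into one record per word (IterNo() = word position);
// outer (preceding) load: map each word to a category, '#NV' = no match. Note that ApplyMap() is case-sensitive.
t:
load *, applymap('m', SubString, '#NV') as Category, rowno() as RowNo;
load Key, subfield(String, ' ', iterno()) as SubString, recno() as RecNo, iterno() as IterNo
resident MyDataset while iterno() <= substringcount(String, ' ') + 1;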
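For readers outside Qlik, the split-then-map script can be mirrored in Python like this: one record per space-separated word, each looked up in a mapping with '#NV' for misses. The field names mimic the script, and my_rules / my_dataset are hypothetical stand-ins for MyRules and MyDataset.

```python
from typing import NamedTuple

class WordRecord(NamedTuple):
    key: str
    rec_no: int      # like RecNo(): 1-based number of the source record
    iter_no: int     # like IterNo(): 1-based position of the word in the string
    sub_string: str
    category: str
    row_no: int      # like RowNo(): 1-based number of the output record

def split_and_map(dataset, mapping, default="#NV"):
    """Split each string at spaces into one record per word and map every word,
    mirroring subfield() + applymap(); unmatched words get default."""
    records = []
    for rec_no, (key, string) in enumerate(dataset, start=1):
        for iter_no, word in enumerate(string.split(" "), start=1):
            category = mapping.get(word, default)  # case-sensitive, like applymap()
            records.append(WordRecord(key, rec_no, iter_no, word, category, len(records) + 1))
    return records

# Hypothetical inputs standing in for MyRules and MyDataset.
my_rules = {"cat": "Nature", "mouse": "Nature", "sun": "Weather"}
my_dataset = [("B2", "That cat is my pet"), ("B4", "the sun is yellow")]

for r in split_and_map(my_dataset, my_rules):
    print(r)
```

The resulting word-level records still have to be reduced to one category per Key, e.g. by dropping the '#NV' rows and applying whatever priority rule you decide on.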