Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Qlik Open Lakehouse is Now Generally Available! Discover the key highlights and partner resources here.
cancel
Showing results for 
Search instead for 
Did you mean: 
Crichter141
Creator
Creator

Why I hate Free Text - Text Analysis help please.

QLIKSENSE only please.

So our input platform has a lot of free text and not very many drop downs to standardize the data.  This has become a nightmare for certain dimensions that we need to report on.  

Example:

One Field:  Color  Seems simple enough.  We have over 1 million product color names.  Partly because the users put in various versions of the same name, but there's still a lot of actual individual names.

Example:  MOCHA-IVORY can be MOCHA-IVORY or IVORY-MOCHA or Fabric MOCHA-IVORY, etc.  You name it they can put it in the Color field.  And we have a lot of these types of fields (more than I want to write IF statements for).

Is there a way to analyze the text to look at the color field for all variations of a color (MOCHA-IVORY) and then replace it with a standard value for each one?

Can't be value by value...hoping for a code solution, not an extension.

More Valid Colors:

color1:  MOCHA-IVORY - could be reversed, could be with or without hyphen, could have other text

color2:  IVORY-IVORY

color3:  JAVA

color4:  WHITE

color 5:  ALMOND OVER GOLD

color 6:  ALMOND LACE

color 7:  AMETHYST

Thank you!

Labels (5)
1 Solution

Accepted Solutions
BrunPierre
Partner - Master II
Partner - Master II

Hello, you can replace the values in the source table by creating a table mapping with the standard values.

//----- Load mapping table of Valid Colors -----\\
MapValidColor:
mapping LOAD *
Inline [
ColorCode,Valid Color
color1,MOCHA-IVORY
color2,IVORY-IVORY
color3,JAVA
color4,WHITE
color5,ALMOND OVER GOLD
color6,ALMOND LACE
color7,AMETHYST
];

//----- Load Source table containing various versions of the same name.If the color code isn't listed in the mapping table then display no match-----\\
Data:
LOAD ColorCode,
Color,
ApplyMap('MapValidColor',ColorCode,'No match') as "Valid Color"

FROM SourceTable;

//----- Optional----- \\
DROP Field Color From Data;
RENAME Field "Valid Color" to Color;

View solution in original post

5 Replies
BrunPierre
Partner - Master II
Partner - Master II

Hello, you can replace the values in the source table by creating a table mapping with the standard values.

//----- Load mapping table of Valid Colors -----\\
MapValidColor:
mapping LOAD *
Inline [
ColorCode,Valid Color
color1,MOCHA-IVORY
color2,IVORY-IVORY
color3,JAVA
color4,WHITE
color5,ALMOND OVER GOLD
color6,ALMOND LACE
color7,AMETHYST
];

//----- Load Source table containing various versions of the same name.If the color code isn't listed in the mapping table then display no match-----\\
Data:
LOAD ColorCode,
Color,
ApplyMap('MapValidColor',ColorCode,'No match') as "Valid Color"

FROM SourceTable;

//----- Optional----- \\
DROP Field Color From Data;
RENAME Field "Valid Color" to Color;

hic
Former Employee
Former Employee

A more elaborate explanation of data cleansing using the above method can be found on https://community.qlik.com/t5/Design/Data-Cleansing/ba-p/1464362 

Crichter141
Creator
Creator
Author

This will help immensely, thank you both! 

If I may follow up with a bit more complexity.  What if (in this case) the color is in a field with a string - can mapping loads be used with a parsed out string or wildmatch?  example text.  'The color wanted is off-white, but our closest match is IVORY-IVORY'  Can I pull out just the color to put into a mapping load to standardize it?

Again, thank you!

hic
Former Employee
Former Employee

One way could be the following:
 
1) Define the colors in a mapping table:
 
Colors:
Mapping Load Color, '>>>' & Color & '<<<' as TaggedColor
Inline
[Color
MOCHA-IVORY
IVORY-MOCHA
IVORY-IVORY
JAVA
WHITE
ALMOND OVER GOLD
ALMOND LACE
AMETHYST];
 
2) Then you can use 
   TextBetween(MapSubstring('Colors',Field),'>>>','<<<') as Color
 
The MapSubstring() will insert the delimiters in the right place, and the TextBetween() will extract the color.
 
Alternatively, you can define the RGB colors already in the mapping table:
 
Mapping Load Color, '>>>' & RGB(R,G,B) & '<<<' as TaggedColor
Inline
[Color, R, G, B
MOCHA-IVORY, 100, 50, 50
...

 

Or
MVP
MVP

Careful with this sort of analysis - if you text match (case insensitive), it'll match white before it matches ivory-ivory, since white is also a legal color and is ahead of ivory-ivory in the text...