Skip to main content
Announcements
Live today at 11 AM ET. Get your questions about Qlik Connect answered, or just listen in. SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
vchuprina
Specialist
Specialist

Inconsistent structure of data in column

Good Morning All,

We receive files and one of columns contains style, color and color code information. The problem that data in this column can be placed by different ways, see examples below:

ExamplesDescription
C99ELS007 BLACK
STYLE COLOR
C99SE7006-ELKMU-BLACK/MUSLIN
STYLE TRASH COLOR
LP61980MQ-BDH-BLACK-WHITE
STYLE TRASH COLOR
PR6B700DFG-BC-G4432-J-CDB-CHAMBR
STYLE TRASH COLOR__CODE TRASH COLOR
WOMEN-CHAMBR.PR6B700DFG
TRASH COLOR STYLE
PO64301PL.IVORY
STYLE COLOR
ED6SD05NL/BLK/70
STYLE COLOR COLOR__CODE
R263C113E
STYLE
MEN R263C343E.OLIVE
TRASH STYLE COLOR

Do you have any ideas how correctly determine Style, Color code and color description.

Thanks,

Vitaliy

Press LIKE if the given solution helps to solve the problem.
If it's possible please mark correct answers as "solutions" (you can mark up to 3 "solutions").
9 Replies
Gysbert_Wassenaar

Not without one or more tables for each element with the valid values for those elements.

LP61980MQ-BDH-BLACK-WHITE

And that's not STYLE TRASH COLOR, but STYLE TRASH COLOR COLOR.


talk is cheap, supply exceeds demand
vchuprina
Specialist
Specialist
Author

Hi Gysbert,

As I understand you mean create tables with correct values and then compare it with source files?

BTW, BLACK-WHITE is one color

Press LIKE if the given solution helps to solve the problem.
If it's possible please mark correct answers as "solutions" (you can mark up to 3 "solutions").
Gysbert_Wassenaar

Yes. That seems to be the only way. Check each value from those tables with the weird strings to see if any value matches. Or do you have an algorithm that can parse such a string and spit out the correct parts?


talk is cheap, supply exceeds demand
vchuprina
Specialist
Specialist
Author

At the first time when I had a few variants for example ED6SD05NL/BLK/70 and PO64301PL.IVORY

I tried use something like this:

IF (substringcount([Style], '.') >0, Subfield([Style], '.', 1),

    IF (substringcount([Style], '/') >0, Subfield([Style], '/', 1) ))  AS STYLE

But it became difficult to use this way when we I started getting a lot of variants.

So I can't say that I have an algorithm

Press LIKE if the given solution helps to solve the problem.
If it's possible please mark correct answers as "solutions" (you can mark up to 3 "solutions").
Gysbert_Wassenaar

Totally randomly put together then. Is there always a non alphanumeric character between two elements? I see dots, spaces and slashes used. Or are elements sometimes concatenated together directly too?

edit: I think I already see the next problem: J-CDB itself contains a character that's also used as a separator. Looks like bloody useless data tbh. Spank the fool that thought up this idiocy.


talk is cheap, supply exceeds demand
vchuprina
Specialist
Specialist
Author

Yes sometimes two elements can be combined.

For example C99ELS007BLACK

Press LIKE if the given solution helps to solve the problem.
If it's possible please mark correct answers as "solutions" (you can mark up to 3 "solutions").
Gysbert_Wassenaar

Bah. Then only brute force matching of substrings remains and you'd better hope that no substring value ever matches or is part of another substring value.


talk is cheap, supply exceeds demand
jpenuliar
Partner - Specialist III
Partner - Specialist III

Hi Vitaliy,

In your other Columns, do you have any associative information like Manufacture, Model, product, supplier etc...

Based on my experience, i noticed different Manufacturer or supplier tend to use their own naming conventions.

I had a similar case as you did, yours is more complex. I ended up building a lookup table (by script and manually).

you must have other data sources which you can pull these info from

vchuprina
Specialist
Specialist
Author

Hi Jonathan,

Yes, I have such information.

Press LIKE if the given solution helps to solve the problem.
If it's possible please mark correct answers as "solutions" (you can mark up to 3 "solutions").