Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Good Morning All,
We receive files and one of columns contains style, color and color code information. The problem that data in this column can be placed by different ways, see examples below:
Examples | Description | ||
---|---|---|---|
|
| ||
|
| ||
|
| ||
|
| ||
|
| ||
|
| ||
|
| ||
|
| ||
|
|
Do you have any ideas how correctly determine Style, Color code and color description.
Thanks,
Vitaliy
Not without one or more tables for each element with the valid values for those elements.
|
And that's not STYLE TRASH COLOR, but STYLE TRASH COLOR COLOR.
Hi Gysbert,
As I understand you mean create tables with correct values and then compare it with source files?
BTW, BLACK-WHITE is one color
Yes. That seems to be the only way. Check each value from those tables with the weird strings to see if any value matches. Or do you have an algorithm that can parse such a string and spit out the correct parts?
At the first time when I had a few variants for example ED6SD05NL/BLK/70 and PO64301PL.IVORY
I tried use something like this:
IF (substringcount([Style], '.') >0, Subfield([Style], '.', 1),
IF (substringcount([Style], '/') >0, Subfield([Style], '/', 1) )) AS STYLE
But it became difficult to use this way when we I started getting a lot of variants.
So I can't say that I have an algorithm
Totally randomly put together then. Is there always a non alphanumeric character between two elements? I see dots, spaces and slashes used. Or are elements sometimes concatenated together directly too?
edit: I think I already see the next problem: J-CDB itself contains a character that's also used as a separator. Looks like bloody useless data tbh. Spank the fool that thought up this idiocy.
Yes sometimes two elements can be combined.
For example C99ELS007BLACK
Bah. Then only brute force matching of substrings remains and you'd better hope that no substring value ever matches or is part of another substring value.
Hi Vitaliy,
In your other Columns, do you have any associative information like Manufacture, Model, product, supplier etc...
Based on my experience, i noticed different Manufacturer or supplier tend to use their own naming conventions.
I had a similar case as you did, yours is more complex. I ended up building a lookup table (by script and manually).
you must have other data sources which you can pull these info from
Hi Jonathan,
Yes, I have such information.