Discussion board where members can get started with QlikView.
I've been scouring the QV discussions and a common theme came up: probably shouldn't use QV for data scrubbing purposes.
I get this, but unfortunately I'm stuck in a situation where QV is the only means (for right now) of accomplishing this task. That being said, I have one field called CodeName that needs to be scrubbed and cleaned on a weekly basis as it's being loaded into QV. I'd really rather not create a matchtable to do this because the CodeNames are constantly being created (hundreds on a daily basis), so a MatchTable isn't scalable.
Here are the common factors that need to be scrubbed:
(front)code_123 --> changed to: code_123
(mainstore) --> changed to: null
code_345(station) --> changed to: code_345
Also... trim functions... I'd like to cover any instance of a leading or trailing space? Do I need to do both (e.g. trim( ' abc') and trim('abc ')? or can I just encompasses everyting under one: trim(' abc ')?
For the second question, the answer is as follows:
Ltrim() will remove leading spaces
Rtrim() will remove trailing spaces
Trim() will remove both leading and trailing spaces
The first requirement is not clear. Attach a sample of raw data to be scrubbed along with the desired result after scrubbing for the sample.
Sorry about the lack of clarify. I've attached a qvw with sample data.
Essentially what I meant was:
Any value that leads with (front) needs to be removed.
So... in my above example it would have been from (front)code_123 to code_123
Any value that is just (mainstore) to be changed to a null value
Any value that ends with (station) to be removed. So, from code_234(station) to code_234
Hope that makes sense!
One way of doing this for this type of scrubbing is by using 'Replace' function as shown below:
Replace(Replace(Replace(code_codename,'(station)',''),'(front)',''),'(mainstore)','') as scrubbed_codename
See the attached qvw file.
You may use the MapSubstring() function also.
Hope this helps.
Assuming you will have more cases, I'd use a mapping table with the actual values and converted values:
MAPPING LOAD * INLINE [
MapSubString('OldandNewValuesMap', Field) AS NewField
Yes, in this case all values are converted to null, so the Replace() function would be fine as well, depending on the number of possible values.
Hope that helps.
As the late & great Peter Falk used to say... "Just one more thing..."
In the four (4) examples above, the last underscore '_' and everything after the last underscore needs to be removed. The characters after the last underscore can be varying in length and a mixture of numbers and/or letters.