For the second question, the answer is as follows:
Ltrim() will remove leading spaces
Rtrim() will remove trailing spaces
Trim() will remove both leading and trailing spaces
The first requirement is not clear. Attach a sample of raw data to be scrubbed along with the desired result after scrubbing for the sample.
Sorry about the lack of clarify. I've attached a qvw with sample data.
Essentially what I meant was:
Any value that leads with (front) needs to be removed.
So... in my above example it would have been from (front)code_123 to code_123
Any value that is just (mainstore) to be changed to a null value
Any value that ends with (station) to be removed. So, from code_234(station) to code_234
Hope that makes sense!
ScrubSample.qvw 130.8 K
One way of doing this for this type of scrubbing is by using 'Replace' function as shown below:
Replace(Replace(Replace(code_codename,'(station)',''),'(front)',''),'(mainstore)','') as scrubbed_codename
See the attached qvw file.
You may use the MapSubstring() function also.
Hope this helps.
ScrubSampleDone.qvw 133.8 K
Assuming you will have more cases, I'd use a mapping table with the actual values and converted values:
OldandNewValuesMap: MAPPING LOAD * INLINE [ Old, New (front) (station) (mainstore) ]; Table: LOAD *, MapSubString('OldandNewValuesMap', Field) AS NewField FROM ...
Yes, in this case all values are converted to null, so the Replace() function would be fine as well, depending on the number of possible values.
Hope that helps.
As the late & great Peter Falk used to say... "Just one more thing..."
In the four (4) examples above, the last underscore '_' and everything after the last underscore needs to be removed. The characters after the last underscore can be varying in length and a mixture of numbers and/or letters.
I have modified the load script to include scrubbing this pattern also. The added line in the script is
If(scrubbed_codename like 'code_???_*',Left(scrubbed_codename,8),scrubbed_codename) as scrubbed_codename
The modified file is attached.
Rob's suggestion also will work.
You will have to keep updating your scrubbing script as and when you find the unclean pattern in the data.
ScrubSample.qvw 133.5 K