Skip to main content
Woohoo! Qlik Community has won “Best in Class Community” in the 2024 Khoros Kudos awards!
Announcements
Nov. 20th, Qlik Insider - Lakehouses: Driving the Future of Data & AI - PICK A SESSION
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Scrubbing/Cleaning Data when Load

Hey All-

I've been scouring the QV discussions and a common theme came up: probably shouldn't use QV for data scrubbing purposes.

I get this, but unfortunately I'm stuck in a situation where QV is the only means (for right now) of accomplishing this task.  That being said, I have one field called CodeName that needs to be scrubbed and cleaned on a weekly basis as it's being loaded into QV.  I'd really rather not create a matchtable to do this because the CodeNames are constantly being created (hundreds on a daily basis), so a MatchTable isn't scalable.

Here are the common factors that need to be scrubbed:

CodeName

(front)code_123 --> changed to: code_123

(mainstore)  --> changed to: null

code_345(station) --> changed to: code_345

Also...  trim functions...  I'd like to cover any instance of a leading or trailing space?  Do I need to do both (e.g. trim( ' abc') and trim('abc ')?  or can I just encompasses everyting under one: trim(' abc ')? 

Thanks!

10 Replies
nagaiank
Specialist III
Specialist III

For the second question, the answer is as follows:

      Ltrim() will remove leading spaces

     Rtrim() will remove trailing spaces

     Trim() will remove both leading and trailing spaces

The first requirement is not clear. Attach a sample of raw data to be scrubbed along with the desired result after scrubbing for the sample.

Not applicable
Author

Sorry about the lack of clarify.  I've attached a qvw with sample data.

Essentially what I meant was:

Any value that leads with (front) needs to be removed.

So... in my above example it would have been from (front)code_123 to code_123

Any value that is just (mainstore) to be changed to a null value

Any value that ends with (station) to be removed.  So, from code_234(station) to code_234

Hope that makes sense!

Not applicable
Author

Hi,

Try:

replace(code_codename, '(' & TextBetween(code_codename,'(',')') & ')' ,'') as new_code,

Hope this helps

Regards!

nagaiank
Specialist III
Specialist III

One way of doing this for this type of scrubbing is by using 'Replace' function as shown below:

Replace(Replace(Replace(code_codename,'(station)',''),'(front)',''),'(mainstore)','') as scrubbed_codename

See the attached qvw file.

You may use the MapSubstring() function also.

Hope this helps.

Miguel_Angel_Baeyens

Hi,

Assuming you will have more cases, I'd use a mapping table with the actual values and converted values:

OldandNewValuesMap:

MAPPING LOAD * INLINE [

Old, New

(front)

(station)

(mainstore)

];

Table:

LOAD *,

     MapSubString('OldandNewValuesMap', Field) AS NewField

FROM ...

Yes, in this case all values are converted to null, so the Replace() function would be fine as well, depending on the number of possible values.

Hope that helps.

Miguel Angel Baeyens

BI Consultant

Comex Grupo Ibérica

Not applicable
Author

Thanks everyone! All of these were really helpful.

Not applicable
Author

As the late & great Peter Falk used to say... "Just one more thing..."

code_123_4567

code_234_ab35623

code_xxx_0785cat

code_14a_td

In the four (4) examples above, the last underscore '_' and everything after the last underscore needs to be removed. The characters after the last underscore can be varying in length and a mixture of numbers and/or letters.

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

Something like:

if(substringcount(raw, '_') > 1

          ,mid(raw, 1, index(raw, '_', 2)-1)

          , raw

) as edited

-Rob

http://robwunderlich.com

Miguel_Angel_Baeyens

Hi,

Use the following

=Left('code_14a_td', Index('code_14a_td', '_', 2) -1)

Change the string by the name of your field to get only the expected information.

Hope that helps.

Miguel Angel Baeyens

BI Consultant

Comex Grupo Ibérica