Re: Clean contents of a field in an Excel table - Qlik Community

datanibbler · ‎2014-10-22

Hi,

I have to process a field in an Excel_table and decide whether this is a "clean" record or not.

That is really hard because there are so many possible variations - the people populating those lists are very creative 😉

A "clean" record, according to my logic, would look like this

8638294 / 4 Stück

(there is some flexibility, the text could also be 'St' or 'Stk' or 'Stk.', that doesn't matter)

An "unclean" record is one where there are actually several lines in one field. I can check that with the Substringcount() function - well, I can usually do that - but now I have a field that looks like this

SN 8638294 / 4 Stück / S80140915

That actually constitutes a record that I cannot process automatically - though that number beginning with "S" is one that I could use. But how to even classify this as an unclean record? The letter S of course is part of any variation of 'Stk", so I cannot just use Purgechar() and exclude the big and small S;

=> When I do >> LEN(PurgeChar(Mat_pre, ' /-0123456789STUECKstueckÜü.')) <<, the letter S will be excluded and only one (the "N") will be counted as an unwanted letter.

Also, that "SN" need not be there in every case - so I cannot even say that the 'S' may be there twice at the max.

Can anybody suggest me what to do in that case?

Thanks a lot!

Best regards,

DataNibbler

datanibbler · ‎2014-10-22

I'm thinking - that "SN" at the beginning may or may not be there.

Likewise, that "Stück" may or may not be there.

A "clean" record might also look like this

7589121/1000

Could I somehow identify those two individual numbers (don't mind the slash, they use anything as a separator, but there is always something inbetween) and then go like, following the second number, there must not be anything more to come?