Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi,
I'm doing some extense transformation in customer data and at some point I compare address strings looking for typing errors. Things were working fine before my last step that involves using a big nested Replace() expression, along with some SubStringCount() and SubField().
Basically, instead of repeating this big expression I added it to a SET variable and surprisingly it works, but only if I calculate the variable without single quotes -> $(vExample) as MyField instead of '$(vExample)' as MyField.
The problem is that loading is now taking too long, and the weird thing about it is that I only use this in a first table load that prepares the data to a nested loop sequence, but it affects directly the rest of the script.
Why is this happening? The table with the variable as a field is already loaded and it loaded fast. Is it because I'm using the values from the table and it somehow recalculates everytime?
In general the approach should work. Noticeable is that there are no spaces between the concat-parts which prevents in some scenarios the intended parsing. Therefore you could try it like:
... ' ' & Upper(cliente_rua_temp) ...
or alternatively:
... Upper(' ' & cliente_rua_temp) ...
Beside this is each replace-approach difficult to (practically) impossible if there is any overlapping between the replace-substrings and/or their position and frequency is not unique definable. Some tasks are solvable by applying the replace in multiple steps which are hierarchically ordered and are only applied on certain substrings.
But in more complex scenarios I would tend not to try to clean the entire string at ones else separating each sub-string per subfield(), maybe like:
load *, len/isnum/istext(Substring) as ...;
load *, subfield(String, ' ', iterno()) as Substring, recno() as RecNo, iterno() as IterNo
from Source while iterno() <= substringcount(String, ' ');
With the len/isnum/istext() + similar checks and IterNo the content of each substring could be identified and on it afterwards specialized replace/cleaning-logic applied. The final step would be to concatenate the substrings again to the complete string with the help of RecNo and IterNo.
Hi, marcus. Thanks for the reply
It turns out to be something else that I was doing wrong. Using ' '&Upper(cliente_rua_temp) did the job after fixing it.
Still what you said is interesting, about separating the substrings. I actually do this at some part of the script, but this approach seems to be more efficient, I'll definitely try it and compare the results.
Thanks again!!