Hi,
I have now written code to cleanse an Excel file that we receive from a customer, where one line often contains several item_numbers (there is supposed to be only one per line, of course).
The issue is that I have so far run it on only one list out of five. They all have the same format, but in the second one the records look even more chaotic.
I will attach a sample. In short, the customer has packed the complete description into the item_nr_field, and I cannot think of any way to identify which of these X lines (within one line) is/are relevant. In the sample, only the 1st is relevant; all the others are more like a description.
I can't see that I have anything to go on here.
I have already thought of just outputting those as "post_processing" so that someone can have a look at them and process them manually - but even for that, I would need to find a way to identify that these are records my code cannot handle.
Thanks for any ideas - my own are spent for the time being.
Best regards,
DataNibbler
Hi DataNibbler,
Generally you need an approach like Separating records, but in this case I would first split the records per subfield without any cleansing, then in a second step check the content for "LFS-Nr." and/or further characters to identify whether a record is garbage or not, and in a third step cleanse the valid records.
- Marcus
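The three steps described above could be sketched roughly as follows. This is only an illustration in Python, not the code used in the thread: the separator between subfields (a line break inside the cell), the sample value, and the function names `split_subfields`, `is_garbage`, `cleanse` and `process_cell` are all assumptions. Only the "LFS-Nr." marker comes from the post itself.

```python
import re

# Marker mentioned in the thread; further markers would have to be added by hand.
GARBAGE_MARKERS = ("LFS-Nr.",)


def split_subfields(raw, sep="\n"):
    """Step 1: split one cell into subfields without any cleansing.

    The separator is an assumption (a line break inside the cell).
    """
    return [part.strip() for part in raw.split(sep) if part.strip()]


def is_garbage(subfield, markers=GARBAGE_MARKERS):
    """Step 2: a subfield counts as garbage if it contains a known marker."""
    return any(marker in subfield for marker in markers)


def cleanse(subfield):
    """Step 3: placeholder cleansing that only collapses repeated whitespace."""
    return re.sub(r"\s+", " ", subfield).strip()


def process_cell(raw, sep="\n"):
    """Run all three steps and return the cleansed, valid subfields."""
    return [cleanse(p) for p in split_subfields(raw, sep) if not is_garbage(p)]


# Hypothetical cell content, not taken from the customer file.
sample = "4711  10 Stueck\nLFS-Nr. 12345\n"
print(process_cell(sample))
```

Keeping the split separate from the garbage check means the list of markers can grow later without touching the other two steps.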
Hi!
Well, for now I can manage: I go by the number of letters. The maximum that can usually be expected in a "clean" record is 6 ("Stueck" - pcs.), so whenever there are more than 6, I treat that record as garbage that needs post_processing.
However - you know the one about the race between developers and nature ... There's no limit to the creativity of users ...
Best regards,
DataNibbler
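The letter-count rule from this last post might look like the sketch below. It is written in Python purely for illustration and is not the code actually used. The threshold of 6 comes from the post; the function names `count_letters`, `needs_post_processing` and `route`, and the sample records, are hypothetical.

```python
def count_letters(text):
    """Count alphabetic characters only; digits, spaces and punctuation are ignored."""
    return sum(1 for ch in text if ch.isalpha())


def needs_post_processing(text, max_letters=6):
    """Flag a record when it has more letters than a clean record should have.

    The default of 6 is the length of "Stueck", as stated in the post.
    """
    return count_letters(text) > max_letters


def route(records, max_letters=6):
    """Split records into (clean, post_processing) lists, preserving order."""
    clean, post_processing = [], []
    for record in records:
        if needs_post_processing(record, max_letters):
            post_processing.append(record)
        else:
            clean.append(record)
    return clean, post_processing


# Hypothetical records, not taken from the customer file.
records = ["4711 10 Stueck", "4711 Schraube verzinkt M8"]
clean, post_processing = route(records)
print(clean)
print(post_processing)
```

A rule like this is deliberately crude: a description containing six letters or fewer would still pass as clean, so the flagged list is a lower bound on what needs manual review.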