datanibbler
Champion

Split out transactions


Hi,

I have now written code to cleanse an Excel file that we get from a customer, in which there are often several item_numbers in one line (there is supposed to be only one per line, of course).

The issue is that so far I have run it on only one list out of five. They all have the same format, but in the second one the records look even more chaotic.

I will attach a sample. In short, the customer has packed the complete description into the item_nr_field, and I cannot think of any way to identify which of these X lines (within one line) is or are relevant. In the sample, only the first is relevant; all the others are more like a description.

I can't see that I have anything to go on here.

I have already thought of just outputting those as "post_processing" so that someone can have a look at them and process them manually - but even for that, I would need to find a way to identify that these are records my code cannot handle.

Thanks for any ideas - my own are spent for the time being.

Best regards,

DataNibbler

1 Solution

Accepted Solutions
marcus_sommer

Hi DataNibbler,

Generally you need an approach like Separating records, but in this case you would first split the records per subfield without any cleansing. In a second step, check the content for "LFS-Nr." and/or further characters to identify whether a record is garbage or not. In a third step, the cleansing of the valid records could follow.
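
A minimal sketch of these three steps, written in Python purely for illustration rather than as Qlik script. It assumes the sub-records inside a cell are separated by line breaks; the function names, the sample value and the whitespace-only cleansing are hypothetical placeholders, and only the "LFS-Nr." marker comes from this thread.

```python
# Markers that identify a garbage sub-record. "LFS-Nr." is from the thread;
# any further markers would have to be added after inspecting the real data.
GARBAGE_MARKERS = ("LFS-Nr.",)


def split_raw(cell, sep="\n"):
    """Step 1: split one cell into sub-records without any cleansing."""
    return cell.split(sep)


def is_garbage(sub_record):
    """Step 2: a sub-record is garbage if it contains a known marker."""
    return any(marker in sub_record for marker in GARBAGE_MARKERS)


def cleanse(sub_record):
    """Step 3: placeholder cleansing (assumed): collapse runs of whitespace."""
    return " ".join(sub_record.split())


def process_cell(cell, sep="\n"):
    """Return (cleansed valid sub-records, untouched garbage sub-records)."""
    valid, garbage = [], []
    for part in split_raw(cell, sep):
        if not part.strip():
            continue  # ignore empty fragments left over from the split
        (garbage if is_garbage(part) else valid).append(part)
    return [cleanse(p) for p in valid], garbage


# Invented sample value, not taken from the attached file.
valid, garbage = process_cell("4711  10 Stueck\nLFS-Nr. 12345")
print(valid)    # ['4711 10 Stueck']
print(garbage)  # ['LFS-Nr. 12345']
```

Keeping the garbage sub-records uncleansed means they can be handed over for manual post-processing exactly as the customer delivered them.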

- Marcus


2 Replies

datanibbler
Champion
Author


Hi!

Well, for now I can manage: I go by the number of letters. The maximum that can usually be expected in a "clean" record is 6 ("Stueck", i.e. pcs.), so whenever there are more than 6, I treat that record as garbage that needs post_processing.
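
The letter-count rule could be sketched roughly as follows, in Python for illustration rather than Qlik script. The threshold of 6 comes from the rule above ("Stueck"); the function names and sample values are hypothetical, and "letters" is assumed to mean alphabetic characters, with digits, spaces and punctuation ignored.

```python
# Longest text expected in a clean record: "Stueck" has 6 letters.
MAX_LETTERS = 6


def count_letters(record):
    """Count alphabetic characters only; digits and punctuation are ignored."""
    return sum(ch.isalpha() for ch in record)


def needs_post_processing(record, max_letters=MAX_LETTERS):
    """Flag a record as garbage when it holds more letters than expected."""
    return count_letters(record) > max_letters


# Invented sample values, not taken from the attached file.
print(needs_post_processing("4711 10 Stueck"))                # False (6 letters)
print(needs_post_processing("4711 LFS-Nr. 12345 Lieferung"))  # True (14 letters)
```

The threshold is a parameter so that it can be adjusted if a longer unit than "Stueck" turns up in one of the other lists.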

However - you know the one about the race between developers and nature ... There's no limit to the creativity of users ...

Best regards,

DataNibbler