2 Replies. Latest reply: Oct 9, 2014 7:37 AM by Friedrich Hofmann

    Split out transactions

    Friedrich Hofmann


      Hi,

       

      I have now developed code to cleanse an Excel file that we get from a customer, in which there are often several item_numbers in one line (there is supposed to be only one per line, of course).

      The issue is that I have so far run it on only one list out of five. They all have the same format, but in the second one the records look even more chaotic.

      I will attach a sample. In short, the customer has packed the complete description into the item_nr_field, and I cannot think of any way to identify which of these X lines (all within one line) is or are relevant. In the sample, only the first is relevant; all the others are more like a description.

      I can't see that I have anything to go on here.

      I have already thought of just outputting those as "post_processing" so that someone can have a look at them and process them manually - but even for that, I would need to find a way to identify that these are records my code cannot handle.

       

      Thanks for any ideas - my own are spent for the time being.

      Best regards,

       

      DataNibbler

        • Re: Split out transactions
          Marcus Sommer

          Hi DataNibbler,

           

          Generally you need an approach like the one in "Separating records", but in this case you would first split the records per subfield without any cleansing. In a second step you would check the content for "LFS-Nr." and/or further characters to identify whether a record is garbage or not, and in a third step the cleansing of the valid records could follow.
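
          The three steps above could be sketched roughly as follows. This is only an illustration in Python, not the script from the thread: it assumes the sub-entries inside one cell are separated by line breaks, and the function names (split_cell, is_garbage, cleanse, process_cell) are hypothetical. Only the "LFS-Nr." marker comes from the discussion.

```python
import re
from typing import List, Tuple

# Marker taken from the thread; further markers would have to be added by hand.
GARBAGE_MARKERS = ("LFS-Nr.",)


def split_cell(cell: str) -> List[str]:
    """Step 1: split a raw cell into sub-entries without any cleansing.

    Assumption: sub-entries are separated by line breaks inside the cell.
    """
    return [part for part in cell.splitlines() if part.strip()]


def is_garbage(entry: str) -> bool:
    """Step 2: flag an entry that contains one of the known markers."""
    return any(marker in entry for marker in GARBAGE_MARKERS)


def cleanse(entry: str) -> str:
    """Step 3: collapse runs of whitespace in an entry judged to be valid."""
    return re.sub(r"\s+", " ", entry).strip()


def process_cell(cell: str) -> Tuple[List[str], List[str]]:
    """Return (cleansed valid entries, rejected entries) for one raw cell."""
    valid: List[str] = []
    rejected: List[str] = []
    for entry in split_cell(cell):
        if is_garbage(entry):
            rejected.append(entry)  # kept untouched for manual review
        else:
            valid.append(cleanse(entry))
    return valid, rejected
```

          Keeping the rejected entries in their raw form means nothing is lost if the marker list later turns out to be too aggressive.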

           

          - Marcus

            • Re: Split out transactions
              Friedrich Hofmann


              Hi!

               

              Well, for now I can manage: I go by the number of letters. The most that can usually be expected in a "clean" record is 6 ("Stueck", i.e. pcs.), so whenever there are more than 6, I flag that record as garbage that needs post_processing.
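
              As a rough illustration of that rule (in Python rather than the script actually used here), the check might look like the sketch below. The limit of 6 comes from the post; the names MAX_LETTERS, count_letters and needs_post_processing are made up for this example.

```python
MAX_LETTERS = 6  # from the post: "Stueck" is the longest text expected in a clean record


def count_letters(record: str) -> int:
    """Count alphabetic characters; digits, spaces and punctuation are ignored."""
    return sum(1 for ch in record if ch.isalpha())


def needs_post_processing(record: str) -> bool:
    """Flag a record for manual review when it holds more letters than expected."""
    return count_letters(record) > MAX_LETTERS
```

              Note that str.isalpha() also counts umlauts and other non-ASCII letters, so "Stück" would count as 5 letters and "Stueck" as 6.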

              However - you know the one about the race between developers and nature ... There's no limit to the creativity of users ...

              Best regards,

               

              DataNibbler