6 Replies Latest reply: Apr 21, 2016 2:53 PM by Russ Golden RSS

    Parse delimited column based on dynamic format, among non-delimited columns

    Russ Golden

      Hello again,

       

      My first question here was answered in short order, I'm hoping for the same here! This one is (I think) a little more complex.

       

      I have a tool-generated .xls file that contains many columns. Two of these columns are multipurpose (which sucks). Essentially I need to parse these two columns on load and associate them with one-another in some fashion. I've attached a sample, but bear with me here:

       

      1. The columns are pipe-delimited, with a space on each end of the pipe. Ex: "xyz | abc | 123"
      2. The number of delimited attributes is dynamic. Ex: "xyz | 123" vs. the next row "123 | abc | jkl"
      3. The order of the delimited attributes is dynamic, with respect to other rows, but will always match the same row's other column. That is to say the order of delimited column 1 on row 23 will always match the order of delimited column 2 on row 23, but not necessarily on row 24 (with one exception, see number 6).
      4. The first delimited column contains the names of the attributes delimited in the second column. Ex: column 1 = "type | product | amount" and column 2 = "type abc | product xyz | 123"
      5. Sometimes there are no delimiters, as there is only one value in each column.
      6. The columns may not have the same number of delimited attributes. Ex: column 1 = "serial | product | amount" and column 2 = "product xyz | 7"

      The ideal solution is one that parses the two columns into a number of new columns, let's say a maximum of 6 for each delimited column, named Custom Attribute 1 through 6, then Custom Value 1 through 6 with the corresponding values populating each new column on the row. Ex:

       

      Delimited 1Delimited 2Custom Attribute 1Custom Attribute 2Custom Value 1Custom Value 2
      product | serialabc | 1234-asdfproductserialabc1234-asdf

       

      Now if there is a better way to associate these parsed values to the rest of the table, please do pose some ideas. The real want is to be able to report on this in such a way that I can pick a row ID and say what the values are for the parsed columns, and run various metrics on the information therein, via counts, sums, etc. In a delimited column, it's somewhat useless.

       

      I would also be ok using these delimited columns to create new tables that correspond to an associated transaction ID (in the attached, consider Random Column A or B as a unique key).