Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello again,
My first question here was answered in short order, I'm hoping for the same here! This one is (I think) a little more complex.
I have a tool-generated .xls file that contains many columns. Two of these columns are multipurpose (which sucks). Essentially I need to parse these two columns on load and associate them with one-another in some fashion. I've attached a sample, but bear with me here:
The ideal solution is one that parses the two columns into a number of new columns, let's say a maximum of 6 for each delimited column, named Custom Attribute 1 through 6, then Custom Value 1 through 6 with the corresponding values populating each new column on the row. Ex:
Delimited 1 | Delimited 2 | Custom Attribute 1 | Custom Attribute 2 | Custom Value 1 | Custom Value 2 |
---|---|---|---|---|---|
product | serial | abc | 1234-asdf | product | serial | abc | 1234-asdf |
Now if there is a better way to associate these parsed values to the rest of the table, please do pose some ideas. The real want is to be able to report on this in such a way that I can pick a row ID and say what the values are for the parsed columns, and run various metrics on the information therein, via counts, sums, etc. In a delimited column, it's somewhat useless.
I would also be ok using these delimited columns to create new tables that correspond to an associated transaction ID (in the attached, consider Random Column A or B as a unique key).
This code is not checking for the explicite values, just aligning the available column2 values starting from the end.
For your sample data, this seems to be sufficient:
INPUT:
LOAD [Random Column A],
[Delimited Field 1],
[Delimited Field 2],
[Random Column B],
recno() as RecID
FROM
[Exceptions Transactions - Example.xls]
(biff, embedded labels, table is Data$);
LOAD RecID,
Subfield( [Delimited Field 1],'|',IterNo() ) as DelField1,
Subfield( [Delimited Field 2],'|',IterNo()-(SubStringCount([Delimited Field 1],'|')-SubStringCount([Delimited Field 2],'|') )) as DelField2,
IterNo() as DelIndex
Resident INPUT
While iterno() <= RangeMax(SubStringCount([Delimited Field 1],'|'),SubStringCount([Delimited Field 2],'|'))+1;
You can look into SubField() to separate the values and create new records, but one thing may be problematic:
The columns may not have the same number of delimited attributes. Ex: column 1 = "serial | product | amount" and column 2 = "product xyz | 7"
This example seems to show a missing serial in column2. Is this correct? Wouldn't this break the other rule, that the order is preserved?
Maybe try this for a start:
INPUT:
LOAD [Random Column A],
[Delimited Field 1],
[Delimited Field 2],
[Random Column B],
recno() as RecID
FROM
[Exceptions Transactions - Example.xls]
(biff, embedded labels, table is Data$);
LOAD RecID,
Subfield( [Delimited Field 1],'|',IterNo() ) as DelField1,
Subfield( [Delimited Field 2],'|',IterNo() ) as DelField2,
IterNo() as DelIndex
Resident INPUT
While iterno() <= RangeMax(SubStringCount([Delimited Field 1],'|'),SubStringCount([Delimited Field 2],'|'))+1;
It does break that rule, and it is generally only the case with one exception type. With the exception of the missing field, the order is preserved. I'd have to dig into the thousands of transactions to verify, but I believe I may be able to amend the rule to something like:
The order is preserved except in the instance a serial number is missing. The first delimited attribute in column 1 will be "serial number" or similar, and the first delimited attribute in column 2 SHOULD be the serial number, however it may be missing. In this case, something along the lines of the below may work:
IF(number or pipes in column 1 > number of pipes in column 2, first attribute in column 1 = null) then continue to match the other attributes together
Does that help? I will look into SubField().
Thanks!
This looks like a good start. I will try this out and report back.
Thanks again,
Russ
This code is not checking for the explicite values, just aligning the available column2 values starting from the end.
For your sample data, this seems to be sufficient:
INPUT:
LOAD [Random Column A],
[Delimited Field 1],
[Delimited Field 2],
[Random Column B],
recno() as RecID
FROM
[Exceptions Transactions - Example.xls]
(biff, embedded labels, table is Data$);
LOAD RecID,
Subfield( [Delimited Field 1],'|',IterNo() ) as DelField1,
Subfield( [Delimited Field 2],'|',IterNo()-(SubStringCount([Delimited Field 1],'|')-SubStringCount([Delimited Field 2],'|') )) as DelField2,
IterNo() as DelIndex
Resident INPUT
While iterno() <= RangeMax(SubStringCount([Delimited Field 1],'|'),SubStringCount([Delimited Field 2],'|'))+1;
Thanks again for your help - you're seriously good at this stuff.
This should work quite well.
Best,
Russ