Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello, I have a large data set where the field name and it's value are separated by a ':' and ',' .
This is an example of the raw data:
Field 1 | Field 2 |
Name: Amy, Date of Birth: 13/04/86, Country of Birth: USA, Age: 32, Shoe Size: 11 , ect | 60 |
Name: Kevin, Date of Birth: 04/07/91, Country of Birth: India, Age: 32, Shoe Size: 5 , ect | 70 |
Name: John, Date of Birth: 06/07/95, Country of Birth: UK, Age: 28, Shoe Size: 11 , ect | 80 |
Name: David, Date of Birth: 17/07/96, Country of Birth: Australia, Age: 25, Shoe Size: 13 , ect | 100 |
This is my desired output:
Name | Date of birth | Country | Age | Shoe size | Field 2 |
Amy | 13/04/86 | USA | 37 | 5 | 60 |
Kevin | 04/07/91 | India | 32 | 12 | 70 |
John | 06/07/95 | U.K. | 28 | 12 | 80 |
David | 07/12/96 | Australia | 25 | 13 | 100 |
Ect | Ect | Ect | Ect | Ect | Ect |
Any help would be appreciated! Thank you
I would do this dynamically using the SubField() function and Generic Load keywords.
// Step 1
[Original]:
Load
RowNo() as [ID]
, [Field 1]
, [Field 2]
;
Load * Inline [
Field 1 | Field 2
Name: Amy, Date of Birth: 13/04/86, Country of Birth: USA, Age: 32, Shoe Size: 11 | 60
Name: Kevin, Date of Birth: 04/07/91, Country of Birth: India, Age: 32, Shoe Size: 5 | 70
Name: John, Date of Birth: 06/07/95, Country of Birth: UK, Age: 28, Shoe Size: 11 | 80
Name: David, Date of Birth: 17/07/96, Country of Birth: Australia, Age: 25, Shoe Size: 13 | 100
](Delimiter is '|');
// Step 2
[Split Fields]:
Load
[ID]
, Trim(SubField([Field 1], ',')) as [Split]
, [Field 2]
Resident [Original];
// Step 3
[Generic Table]:
Generic Load
[ID]
, Trim(SubField([Split], ':', 1))
, Trim(SubField([Split], ':', 2))
Resident [Split Fields];
Drop Table [Split Fields];
// Step 4
[New Table]: Load * Inline [ID];
// Step 5
For tbl = NoOfTables() to 0 Step -1
Let vCurrentTable = TableName($(tbl));
// Step 6
If '$(vCurrentTable)' Like 'Generic Table.*' Then
Join ([New Table]) Load * Resident [$(vCurrentTable)];
Drop Table [$(vCurrentTable)];
End If
Next tbl
Here's each step in that script:
ID | Attribute | Value |
1 | Name | Austin |
1 | Region | Northeast |
1 | Favorite Color | Red |
2 | Name | JC |
2 | Region | Southeast |
2 | Favorite Color | Blue |
...and turn it into a table like this:
ID | Name | Region | Favorite Color |
1 | Austin | Northeast | Red |
2 | JC | Southeast | Blue |
We get our [Attribute] and [Value] fields by using the SubField() function again to split those columns on the colon character. We use the Trim() function just to make sure there's no extra whitespace.
The Generic keyword only gets us part of the way there, however. It will put each of our new columns into new, separate tables, all linked together with the [ID] field:
In order to rejoin them all, we can use a simple For Loop, like in step 5.
After all of that, we'll have our desired output:
If you want to have those new split-out fields in the same table as [Field 2], all you need to do is join those two tables.
until first ","
SubField(Field1,',',1) as Name
for second
TextBetween(Field1,',',',',1) as dateof birth
3rd
TextBetween(Field1,',',',',2) as dateof birth
after this you try to use resident load and remove "Name:" just do subfield and length
Hi Channa,
Thanks for taking a look. This solution doesn't seem to be working for me... the subfield function doesn't seem to extract all the values before the ':'.
In my raw data, there are 57-62 entries before the ':'. These then need to be converted to field names.
The TextBetween(Field1,',',',',1) is also extracting the field values correctly.
Any ideas?
I would do this dynamically using the SubField() function and Generic Load keywords.
// Step 1
[Original]:
Load
RowNo() as [ID]
, [Field 1]
, [Field 2]
;
Load * Inline [
Field 1 | Field 2
Name: Amy, Date of Birth: 13/04/86, Country of Birth: USA, Age: 32, Shoe Size: 11 | 60
Name: Kevin, Date of Birth: 04/07/91, Country of Birth: India, Age: 32, Shoe Size: 5 | 70
Name: John, Date of Birth: 06/07/95, Country of Birth: UK, Age: 28, Shoe Size: 11 | 80
Name: David, Date of Birth: 17/07/96, Country of Birth: Australia, Age: 25, Shoe Size: 13 | 100
](Delimiter is '|');
// Step 2
[Split Fields]:
Load
[ID]
, Trim(SubField([Field 1], ',')) as [Split]
, [Field 2]
Resident [Original];
// Step 3
[Generic Table]:
Generic Load
[ID]
, Trim(SubField([Split], ':', 1))
, Trim(SubField([Split], ':', 2))
Resident [Split Fields];
Drop Table [Split Fields];
// Step 4
[New Table]: Load * Inline [ID];
// Step 5
For tbl = NoOfTables() to 0 Step -1
Let vCurrentTable = TableName($(tbl));
// Step 6
If '$(vCurrentTable)' Like 'Generic Table.*' Then
Join ([New Table]) Load * Resident [$(vCurrentTable)];
Drop Table [$(vCurrentTable)];
End If
Next tbl
Here's each step in that script:
ID | Attribute | Value |
1 | Name | Austin |
1 | Region | Northeast |
1 | Favorite Color | Red |
2 | Name | JC |
2 | Region | Southeast |
2 | Favorite Color | Blue |
...and turn it into a table like this:
ID | Name | Region | Favorite Color |
1 | Austin | Northeast | Red |
2 | JC | Southeast | Blue |
We get our [Attribute] and [Value] fields by using the SubField() function again to split those columns on the colon character. We use the Trim() function just to make sure there's no extra whitespace.
The Generic keyword only gets us part of the way there, however. It will put each of our new columns into new, separate tables, all linked together with the [ID] field:
In order to rejoin them all, we can use a simple For Loop, like in step 5.
After all of that, we'll have our desired output:
If you want to have those new split-out fields in the same table as [Field 2], all you need to do is join those two tables.
Thanks very much for the thorough explanation Austin!! This worked really well.
the issue is resolved and u can use the below script
LOAD
subfield("Field 1", ',' , 1) ,
subfield("Field 1", ',' , 2),
subfield("Field 1", ',' , 3),
subfield("Field 1", ',' , 4),
subfield("Field 1", ',' , 5),
"Field 2" as "Field 6",
replace(subfield("Field 1", ',' , 1), ':', ',')
FROM [lib://DataFiles/Unstructured Data.xlsx]
(ooxml, embedded labels, table is Sheet1);
sorry the above code is the wrong one and the below one is the right one
CustomerData:
LOAD
Subfield(subfield("Field 1", ',', 1), ':', 2) as "Name",
subfield(subfield("Field 1", ',', 2), ':',2) as "Date of Birth",
subfield(subfield("Field 1", ',', 3), ':',2) as "Country of Birth",
subfield(subfield("Field 1", ',', 4), ':',2) as "Age",
subfield(subfield("Field 1", ',', 5), ':',2) as "Shoe Size",
"Field 2"
FROM [lib://DataFiles/Unstructured Data.xlsx]
(ooxml, embedded labels, table is Sheet1);