Skip to main content
Announcements
Get Ready. A New Qlik Learning Experience is Coming February 17! LEARN MORE
cancel
Showing results for 
Search instead for 
Did you mean: 
tr5939AT4
Contributor II
Contributor II

Split column into fields

Hello, I have a large data set where the field name and it's value are separated by  a ':'  and ',' . 

This is an example of the raw data:

Field 1 Field 2
Name: Amy, Date of Birth: 13/04/86, Country of Birth: USA, Age: 32, Shoe Size: 11 , ect 60
Name: Kevin, Date of Birth: 04/07/91, Country of Birth: India, Age: 32, Shoe Size: 5 , ect 70
Name: John, Date of Birth: 06/07/95, Country of Birth: UK, Age: 28, Shoe Size: 11 , ect 80
Name: David, Date of Birth: 17/07/96, Country of Birth: Australia, Age: 25, Shoe Size: 13 , ect 100

 

This is my desired output:

Name Date of birth Country Age Shoe size Field 2
Amy 13/04/86 USA 37 5 60
Kevin 04/07/91 India 32 12 70
John 06/07/95 U.K. 28 12 80
David 07/12/96 Australia 25 13 100
Ect Ect Ect Ect Ect Ect

 

Any help would be appreciated! Thank you 

Labels (2)
1 Solution

Accepted Solutions
AustinSpivey
Partner - Creator
Partner - Creator

I would do this dynamically using the SubField() function and Generic Load keywords.

// Step 1

[Original]:
Load
    RowNo() as [ID]
  , [Field 1]
  , [Field 2]
;
Load * Inline [
Field 1 | Field 2
Name: Amy, Date of Birth: 13/04/86, Country of Birth: USA, Age: 32, Shoe Size: 11 | 60
Name: Kevin, Date of Birth: 04/07/91, Country of Birth: India, Age: 32, Shoe Size: 5 | 70
Name: John, Date of Birth: 06/07/95, Country of Birth: UK, Age: 28, Shoe Size: 11 | 80
Name: David, Date of Birth: 17/07/96, Country of Birth: Australia, Age: 25, Shoe Size: 13 | 100
](Delimiter is '|');

// Step 2

[Split Fields]:
Load
    [ID]
  , Trim(SubField([Field 1], ',')) as [Split]
  , [Field 2]
Resident [Original];

// Step 3

[Generic Table]:
Generic Load
    [ID]
  , Trim(SubField([Split], ':', 1))
  , Trim(SubField([Split], ':', 2))
Resident [Split Fields];

Drop Table [Split Fields];

// Step 4

[New Table]: Load * Inline [ID];

// Step 5

For tbl = NoOfTables() to 0 Step -1
    Let vCurrentTable = TableName($(tbl));
    
    // Step 6
    
    If '$(vCurrentTable)' Like 'Generic Table.*' Then
        Join ([New Table]) Load * Resident [$(vCurrentTable)];
        Drop Table [$(vCurrentTable)];
    End If
Next tbl

Here's each step in that script:

  • Step 1 is to make sure each record has a unique ID. I use the RowNo() function to do that here.
  • Step 2 is to split each field element on the comma character. We use the SubField() function to achieve this. Note that, normally, you would need to provide a third parameter to that function to indicate which split element we want to return. In the Data Load Editor, though, that parameter is optional and will result in all split-out items being returned to separate rows automatically.
  • Step 3 is to do a Generic Load, which will allow us to "unpivot" our table. This means we take something like this...
ID Attribute Value
1 Name Austin
1 Region Northeast
1 Favorite Color Red
2 Name JC
2 Region Southeast
2 Favorite Color Blue

 

...and turn it into a table like this:

ID Name Region Favorite Color
1 Austin Northeast Red
2 JC Southeast Blue

 

We get our [Attribute] and [Value] fields by using the SubField() function again to split those columns on the colon character. We use the Trim() function just to make sure there's no extra whitespace.

The Generic keyword only gets us part of the way there, however. It will put each of our new columns into new, separate tables, all linked together with the [ID] field:

AustinSpivey_0-1677961066755.png

 

In order to rejoin them all, we can use a simple For Loop, like in step 5.

  • Step 4 is to create an empty table that we'll use to combine all of our Generic tables.
  • Step 5 is where we loop through all tables currently in the data model. For each table index (the number 1, 2, 3, 4, etc. that distinguishes each table by when they were loaded or modified), we get the get the table name.
  • Step 6 is where we take the current table name and check to see if it is prepended with our Generic table name, "Generic Table" in this case. If the current table name is, indeed, prepended with the name of the Generic table, then we Join it into our new main table, [New Table] from step 4 in this case.

After all of that, we'll have our desired output:

AustinSpivey_1-1677961463731.png

If you want to have those new split-out fields in the same table as [Field 2], all you need to do is join those two tables.

Austin Spivey | Principal Consultant @ Arc Analytics
ArcAnalytics.us | Add me on LinkedIn

View solution in original post

6 Replies
Channa
Specialist III
Specialist III

until first "," 

SubField(Field1,',',1) as Name 

for second

TextBetween(Field1,',',',',1) as dateof birth

 

3rd

TextBetween(Field1,',',',',2) as dateof birth

after this you try to use resident load and remove "Name:" just do subfield and length

 

Channa
tr5939AT4
Contributor II
Contributor II
Author

Hi Channa, 

Thanks for taking a look. This solution doesn't seem to be working for me... the subfield function doesn't seem to extract all the values before the ':'.

In my raw data, there are 57-62 entries before the ':'. These then need to be converted to field names. 

The TextBetween(Field1,',',',',1) is also extracting the field values correctly. 

 

Any ideas?

AustinSpivey
Partner - Creator
Partner - Creator

I would do this dynamically using the SubField() function and Generic Load keywords.

// Step 1

[Original]:
Load
    RowNo() as [ID]
  , [Field 1]
  , [Field 2]
;
Load * Inline [
Field 1 | Field 2
Name: Amy, Date of Birth: 13/04/86, Country of Birth: USA, Age: 32, Shoe Size: 11 | 60
Name: Kevin, Date of Birth: 04/07/91, Country of Birth: India, Age: 32, Shoe Size: 5 | 70
Name: John, Date of Birth: 06/07/95, Country of Birth: UK, Age: 28, Shoe Size: 11 | 80
Name: David, Date of Birth: 17/07/96, Country of Birth: Australia, Age: 25, Shoe Size: 13 | 100
](Delimiter is '|');

// Step 2

[Split Fields]:
Load
    [ID]
  , Trim(SubField([Field 1], ',')) as [Split]
  , [Field 2]
Resident [Original];

// Step 3

[Generic Table]:
Generic Load
    [ID]
  , Trim(SubField([Split], ':', 1))
  , Trim(SubField([Split], ':', 2))
Resident [Split Fields];

Drop Table [Split Fields];

// Step 4

[New Table]: Load * Inline [ID];

// Step 5

For tbl = NoOfTables() to 0 Step -1
    Let vCurrentTable = TableName($(tbl));
    
    // Step 6
    
    If '$(vCurrentTable)' Like 'Generic Table.*' Then
        Join ([New Table]) Load * Resident [$(vCurrentTable)];
        Drop Table [$(vCurrentTable)];
    End If
Next tbl

Here's each step in that script:

  • Step 1 is to make sure each record has a unique ID. I use the RowNo() function to do that here.
  • Step 2 is to split each field element on the comma character. We use the SubField() function to achieve this. Note that, normally, you would need to provide a third parameter to that function to indicate which split element we want to return. In the Data Load Editor, though, that parameter is optional and will result in all split-out items being returned to separate rows automatically.
  • Step 3 is to do a Generic Load, which will allow us to "unpivot" our table. This means we take something like this...
ID Attribute Value
1 Name Austin
1 Region Northeast
1 Favorite Color Red
2 Name JC
2 Region Southeast
2 Favorite Color Blue

 

...and turn it into a table like this:

ID Name Region Favorite Color
1 Austin Northeast Red
2 JC Southeast Blue

 

We get our [Attribute] and [Value] fields by using the SubField() function again to split those columns on the colon character. We use the Trim() function just to make sure there's no extra whitespace.

The Generic keyword only gets us part of the way there, however. It will put each of our new columns into new, separate tables, all linked together with the [ID] field:

AustinSpivey_0-1677961066755.png

 

In order to rejoin them all, we can use a simple For Loop, like in step 5.

  • Step 4 is to create an empty table that we'll use to combine all of our Generic tables.
  • Step 5 is where we loop through all tables currently in the data model. For each table index (the number 1, 2, 3, 4, etc. that distinguishes each table by when they were loaded or modified), we get the get the table name.
  • Step 6 is where we take the current table name and check to see if it is prepended with our Generic table name, "Generic Table" in this case. If the current table name is, indeed, prepended with the name of the Generic table, then we Join it into our new main table, [New Table] from step 4 in this case.

After all of that, we'll have our desired output:

AustinSpivey_1-1677961463731.png

If you want to have those new split-out fields in the same table as [Field 2], all you need to do is join those two tables.

Austin Spivey | Principal Consultant @ Arc Analytics
ArcAnalytics.us | Add me on LinkedIn
tr5939AT4
Contributor II
Contributor II
Author

Thanks very much for the thorough explanation Austin!! This worked really well.

Prem0212
Creator
Creator

the issue is resolved and u can use the below script

 

LOAD
subfield("Field 1", ',' , 1) ,
subfield("Field 1", ',' , 2),
subfield("Field 1", ',' , 3),
subfield("Field 1", ',' , 4),
subfield("Field 1", ',' , 5),
"Field 2" as "Field 6",
replace(subfield("Field 1", ',' , 1), ':', ',')
FROM [lib://DataFiles/Unstructured Data.xlsx]
(ooxml, embedded labels, table is Sheet1);

 

 

Solution 1.png

Prem0212
Creator
Creator

sorry the above code is the wrong one and the below one is the right one

CustomerData:


LOAD
Subfield(subfield("Field 1", ',', 1), ':', 2) as "Name",
subfield(subfield("Field 1", ',', 2), ':',2) as "Date of Birth",
subfield(subfield("Field 1", ',', 3), ':',2) as "Country of Birth",
subfield(subfield("Field 1", ',', 4), ':',2) as "Age",
subfield(subfield("Field 1", ',', 5), ':',2) as "Shoe Size",
"Field 2"
FROM [lib://DataFiles/Unstructured Data.xlsx]
(ooxml, embedded labels, table is Sheet1);