Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi,
I am facing the following challenge. I need to read an Excel column into a QlikView and to split it into subfields somehow.
The problem is that records in this column are of variable length.
There are records in a format of:
400-20-3 (if the row starts with 400 there is only 3 items there)
If a record starts with any other number, the length of the record is unknown - all we know that '-' is the separator, ie. it could be:
932-5-2-50-20-7
or
506-3-90-20-7-2-1-1
I need to be able to read the records starting with 400 into
[region] [subregion] [city] table
and all the others into
[region] [param1] [param2] [param3] ... [param n]
sort of structure, that will allow me to sort the records based on any of the subfields.
Any idea how to do it?
Gogle
Well, you have two options:
1. If your data is not huge, you can load the whole strings into the table and then process the table "row by row" in a loop, using function peek(). Within the loop, you can organize another loop to parse all the '-' delimiters, using function index().
2. The other approach is to use function subfield() for all of those rows where the number of attributes is unknown. However, instead of getting the attributes in different fields, you'll end up with the attributes in a single field across multiple rows. Function subfield() will cause creating multiple rows with all the other fields repeated and the concatenated field parsed based on a specified delimiter.
cheers,
You should read the whole row into QlikView as a single string and then parse it in QlikView using string functions. If you search the Help Section for "String Functions", you'll find all the collection ...
Thanks Oleg,
however, my problem is that the length of the string is unknown. How can I loop through all occurences of the '-' character to catch all the subfields?
Regards
Gogle
Well, you have two options:
1. If your data is not huge, you can load the whole strings into the table and then process the table "row by row" in a loop, using function peek(). Within the loop, you can organize another loop to parse all the '-' delimiters, using function index().
2. The other approach is to use function subfield() for all of those rows where the number of attributes is unknown. However, instead of getting the attributes in different fields, you'll end up with the attributes in a single field across multiple rows. Function subfield() will cause creating multiple rows with all the other fields repeated and the concatenated field parsed based on a specified delimiter.
cheers,
Here is a .qvw that demonstrates one possible solution for you.
The solution processes the records in groups based on how many parameters are in that field.
For example the record with '932-5-2-50-20-7' has 6 subfields, '506-3-90-20-7-2-1-1' has 8 subfields, etc. The 'subfield()' function does the actual splitting, but a loop and some variable trickery is used to generate the proper load statement.
I generate a dynamic load statement based on the number of subfields in the data, so it should work no matter how many subfields you have.
// Sample Original Data
OriginalData:
LOAD * INLINE [
RowNumber, FieldToSplit
1, 400-20-3
2, 932-5-2-50-20-7
3, 506-3-90-20-7-2-1-1
4, 507-3-90-20-7-2-1-1
5, 632-6-55-20-1
6, 345-7-456-3-245
7, 1-2-3-4-5-6-7-8-9
8, 123-234-345-456-567
9, 300-1
10, 722-5-5-98
11, 42
];
// Here is a simple Subfield() split.
// In this case all of the different 'Params' are going to be stored in one single field.
// This will make it hard to answer the question "for region 123, what is param4"
BasicSubfieldTable:
Load
RowNumber as SubfieldRowNumber,
SubField( FieldToSplit, '-' ) as FieldData
Resident OriginalData;
// Preserving the subfields
//
// So what if it is important that we can select values from a particular subfield?
// A more complicated solution is required. Are there easier ways to do this? Probably.
// This is one solution.
// The overall steps are outlined here:
//
// 1. Find and sort rows based on the number of subfields (FieldCount)
// 2. Find the number of distinct FieldCounts
// 3. For every distinct FieldCount, execute a load statement to process all rows with that number of fields
//
// Load the city data first...
CityTable:
Load
Subfield( [FieldToSplit], '-', 1) as CityRegion, // If you name this Region it would be a key - maybe that's what you want?
Subfield( [FieldToSplit], '-', 2) as SubRegion,
Subfield( [FieldToSplit], '-', 3) as City
Resident OriginalData where Subfield( [FieldToSplit], '-', 1) = '400';
// Find the number of subfields in each row...
OriginalDataStaging:
Load
FieldToSplit,
(substringcount( FieldToSplit, '-' ) + 1 ) as FieldCount
Resident OriginalData where Subfield( [FieldToSplit], '-', 1) <> '400'; // exclude the city data we loaded into the other table
// Find all the distinct Field Counts in the data.....
SubfieldCountIsland2:
Load Distinct
FieldCount as DistinctfieldCount
Resident OriginalDataStaging;
// Order highest to lowest
// - Note: when you use Order By in the above load, it does not order as I expected.
// That is why I have split the work into two separate load statements here.
SubfieldCountIsland:
Load
DistinctfieldCount as SubfieldCount
Resident SubfieldCountIsland2 order by DistinctfieldCount Desc;
// Now we have a range for our loop
// Let LargestFieldCount = peek( 'SubfieldCount', 0, 'SubfieldCountIsland' );
Let DistinctFieldCounts = NoOfRows( 'SubfieldCountIsland' );
// Reload and split based on the number of subfield counts in the remaining data
//
// For every SubfieldCount i in SubfieldCountIsland
// Let CurrentFieldCount = i
// Load all rows from OriginalData where FieldCount = CurrentFieldCount
// Split into appropriate number of rows.
//
// This code generates a variable that contains a load statement, then runs that statement.
// An example of a statement it creates:
// Load Subfield( [FieldToSplit], '-', 1) as Region, Subfield( [FieldToSplit], '-', 2) as Param1, Subfield( [FieldToSplit], '-', 3) as Param2, Subfield( [FieldToSplit], '-', 4) as Param3, Subfield( [FieldToSplit], '-', 5) as Param4, Subfield( [FieldToSplit], '-', 6) as Param5, Subfield( [FieldToSplit], '-', 7) as Param6, Subfield( [FieldToSplit], '-', 😎 as Param7, Subfield( [FieldToSplit], '-', 9) as Param8 Resident SubfieldCount where FieldCount = 9
//
// This solution builds load statements 'on the fly'
for i = 0 to ($(DistinctFieldCounts) -1)
Let CurrentFieldCount = peek( 'SubfieldCount', $(i), 'SubfieldCountIsland' );
// Construct the load statement
Let VariableFieldLoad = 'Load Subfield( [FieldToSplit], '&chr(39)&'-'&chr(39)&', 1) as Region';
for j = 2 to CurrentFieldCount
VariableFieldLoad = VariableFieldLoad & ', Subfield( [FieldToSplit], '&chr(39)&'-'&chr(39)&', $(j)) as Param' & evaluate($(j)-1);
next
// If this is not the first load statement, we will have to concatenate load to get it all in the same table
if( $(i) <> 0 ) then
VariableFieldLoad = 'Concatenate ' & VariableFieldLoad;
end if
// Final Version of the load statement
VariableFieldLoad = VariableFieldLoad & ' Resident OriginalDataStaging where FieldCount = $(CurrentFieldCount);';
// Run the load statement
RegionData:
$(VariableFieldLoad);
next
// Clean up
Drop Table OriginalDataStaging;
Drop Table SubfieldCountIsland;
Drop Table SubfieldCountIsland2;
HI Oleg, for option 1, would you be more specific on how we use PEEK and Index? Thanks!
Great clear explanation. Pefect for what i needed.
Thanks