Qlik Community

QlikView Documents

Documents for QlikView related information.

The Secret Life of SubField

Not applicable

The Secret Life of SubField

NOTE: I just realized I made an amateur error on posting this..  When experimenting with SubField, I was looking at the QVD for the first output (without subfield), and looking at a Table Box for the second output (with subfield).  The tablebox automatically eliminates redundant rows, while the QVD, of course does not.  DISTINCT is necessary to eliminate redundant rows, even when using SubField. Oops

The QlikView documentation describes SubField() function as the following:

In its three-parameter version, this script function returns a given substring from a larger string s with delimiter 'delimiter'. index is an optional integer denoting which of the substrings should be returned. If index is omitted when subfield is used in a field expression in a load statement, the subfield function will cause the load statement to automatically generate one full record of input data for each substring that can be found in s.

In its two-parameter version, the subfield function generates one record for each substring that can be taken from a larger string s with the delimiter 'delimiter'. If several subfield functions are used in the same load statement, the Cartesian product of all combinations will be generated.

The really awesome feature of SubField is that it automatically eliminates duplicate records generated by the function (note: with used of DISTINCT in LOAD statement).  This is particularly handy when working with large data sets.

In my current project, I have a dataset that describes what combination of projects and zips that a specific customer wants business. The dataset has three fields: Customer, Zip(List), Project(list).    Zip and Project contain comma-separated lists in their fields, which is not in a friendly format.  To do analysis in QlikView, I need to get each project and zip combination for each customer onto separate records.  Fortunately, SubField does all the heavy lifting for me.

In the following example, the fields are separated by a semicolon.  Within the zip and project fileds, you can see the comma separated lists.

customer;zip;project

338;87124,87125;1,2

338;87124;1,3

339;87124,87129;1,2,3

339;94903;1

338;87124;1,3

Loading this into QlikView without SubField:

LOAD
customer,
zip,
project
FROM
test.txt
(
txt, codepage is 1252, embedded labels, delimiter is ';', msq);

I get the following unusable table.:

customerzipproject
33887124,871251,2
338871241,3
33987124,871291,2,3
339949031
33887124,871251,3

However, incorporating the SubField() function to break out the zip and project fields:

LOAD DISTINCT
customer,
SubField(zip,',')         as zip,
Subfield(project,',')     as project
FROM
test.txt
(
txt, codepage is 1252, embedded labels, delimiter is ';', msq);

I get a Cartesian join containing every DISTINCT combination of zip and project for each customer:

customerzipproject
338871241
338871242
338871243
338871251
338871252
339871241
339871242
339871243
339871291
339871292
339871293
339949031

Even though I have repeated data in my input, my output contains only unique combinations.  I found this exceptionally useful. My production input dataset contained 78,000 lines.  After applying SubField to the LOAD statement on Project and Zip, my table resulted in 20 million unique records ready for analysis.

This is my first post of any type in the QlikView Community. I have read and learned so much from many other folks on this forum, and I felt it was time to give back .  Next up (when I get the chance), I will post on an end-to-end system to split out current QVD's by a specified time period (aka "sharding"), then incrementally load data to these sharded QVD's and create new ones as the specified time period changes (e.g. 2013-April.qvd to 2013-May.qvd), and finally have the Tier 2 loader load these QVD's based on a specified number of time periods back.  It'll be interesting to see if I can get that cleaned up for a post on here.  Cheers!

Labels (1)
Comments
matzenbacher
Contributor II

thanks, nice job!

0 Likes
cabhijit
Valued Contributor

Worth reading. Thank you for sharing.

Cheers!

0 Likes
Not applicable

Thanks for the post.  However, I have the same exact scenario but I don't want the CARTESIAN join to happen.  I am using SubField but I am getting every combination of rows. I want to transform the data  based on the placement of the comma between Zip and Project into seperate rows.

For example,

Unusable Data:

Snapshot_Test1.PNG

I want to Transform the data to look like this:

Snapshot_Test2.PNG

Any suggestions?

Melinda

0 Likes
MVP & Luminary
MVP & Luminary

melindabruck

See Further variations for DOC-4657: The Secret Life of SubField

LOAD

  customer,

     subfield(zip,',',Iterno()) as zip,

     subfield(project,',',Iterno()) as project

    

INLINE [

customer| zip| project

338| 87124,87125| 1,2

339| 87124,87129,94903| 4,5,6

400| 08902,11742,98246| 7,8,9

] (delimiter is '|')

WHILE iterno() <= SubStringCount(zip,',')+1

;

-Rob

http://masterssummit.com

http://qlikviewcookbook.com

Not applicable

Thanks Rob!  Works perfectly.

Melinda

0 Likes
Not applicable

nice message for beginners

0 Likes
meet_purushu
New Contributor III

This is very useful..

0 Likes
Version history
Revision #:
1 of 1
Last update:
‎2013-08-29 12:26 PM
Updated by: