Skip to main content
Announcements
Have questions about Qlik Connect? Join us live on April 10th, at 11 AM ET: SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Splitting multicoded fields

I am looking for a way to split multicoded fields. See an example in the table below. For example, for ID2, we need to enter a value of 1 and avalue of 2.

uniqueid

 

value

 

ID1

 

1

 

ID2

 

1;2

 

ID3

 

1;2;3

 

ID4

 

2;3

 

ID5

 

3

 

I have found two possible solutions, but neither of theseare right for our purposes.

Solution 1: Subfield

Load uniqueid,
    
subfield(value,';') as ValueSubfield

This indeed splits the value field correctly. However, italso adds rows. For example, there is now a row of data for ID2 value=1 and arow for ID2 value=2. This then causes problems for our expressions, because ofduplicate rows for uniqueids. These duplicate rows cause problems for averages,counts etc. It is possible to compensate for this in the expressions, by usingaggr functions, but this is not ideal and is likely to lead to errors at somepoint.

Solution 2: Excel manipulation and Cross table

For the second solution, we manipulated the raw data inExcel, to create separate fields for each value (value1, value2 and value3 intable below). We then create a crosstable in the QlikView script. This solutiongives us the correct data structure. However, it is very time consuming, Ourdata files contain many multicoded fields, and we need a quick and easy way todeal with these, without having to manipulate data outside of QlikView. Perhapsthere is a way of doing this more easily within the QlikView script?

uniqueid

 

value

 

value1

 

value2

 

value3

 

ID1

 

1

 

1

 

 

 

ID2

 

1;2

 

1

 

2

 

 

ID3

 

1;2;3

 

1

 

2

 

3

 

ID4

 

2;3

 

 

2

 

3

 

ID5

 

3

 

 

 

3

 

crosstable (row, ValueCross)

Load uniqueid,

value1,

value2,

value3

How can I split a multicoded field? Is there perhaps a way of using subfield but creating a crosstable (rather than duplicate rows)?

1 Solution

Accepted Solutions
johnw
Champion III
Champion III

Logically speaking, you can either have one row per subvalue or one column per subvalue.  Those are your only options.  The extra subvalues have to go SOMEWHERE. 

However, duplicate rows shouldn't cause problems as long as you put the ID vs. subvalue connection in a SECOND table.  Don't duplicate OTHER data.  So if you have this:

ID  value otherfield
ID1 1     hello
ID2 1;2   goodbye

You want this:

ID  value
ID1 1
ID2 1
ID2 2

ID  otherfield
ID1 hello
ID2 goodbye

You WILL need to do count(distinct ID) instead of count(ID), but most everything else should work fine without changes.  Let me know if you need help transforming the first data structure to the second, but all the clues are already in your initial attempt and in the thread.

View solution in original post

9 Replies
matt_crowther
Luminary Alumni
Luminary Alumni

If your situation is as simplistic as your example (which I doubt) then the following works for me:

Data:

LOAD uniqueid,

     value,

     SubField(value,';',1) as Sub_Value_1,

     SubField(value,';',2) as Sub_Value_2,

     SubField(value,';',3) as Sub_Value_3,

     SubField(value,';',4) as Sub_Value_4

FROM

C:\SubField.xls

(biff, embedded labels, table is Sheet1$);

Obviously this wouldn't work too well if you have hundreds of values - in which case I'd use the same principal in conjunction with a loop based on a count of the max number of ';' in the value field.

Hope that helps,

Matt - Visual Analytics Ltd

Not applicable
Author

Hi Matt,

Thanks for the suggestions, but your solution does not give us what we need.

It produces the following:

uniqueid

 

value

 

Sub_Value_1

 

Sub_Value_2

 

Sub_Value_3

 

Sub_Value_4

 

ID1

 

1

 

1

 

 

 

 

ID2

 

1;2

 

1

 

2

 

 

 

ID3

 

1;2;3

 

1

 

2

 

3

 

 

ID4

 

2;3

 

2

 

3

 

 

 

ID5

 

3

 

3

 

 

 

 

Hpwever, what we are looking for is this. For this example, I split the value field in Excel, and then created a cross table in QlikView. This avoids the creation of duplicate rows (tghe duplicate rows you see in this table do not actually exist in the data). However, this olution is too labour intensive, because we have many projects with many multicoded fields.

In this case, you can select the value '2', and all IDs with that value become selected. You don;t have that option in the solution you suggested.

uniqueid

 

value

 

ValueCross

 

ID1

 

1

 

1

 

ID2

 

1;2

 

1

 

ID2

 

1;2

 

2

 

ID3

 

1;2;3

 

1

 

ID3

 

1;2;3

 

2

 

ID3

 

1;2;3

 

3

 

ID4

 

2;3

 

2

 

ID4

 

2;3

 

3

 

ID5

 

3

 

3

 

To clarify what we're trying to do, I'll give you some example context. The multicoded field is a question in a market research survey. Respondents could tick all that apply. For example, "which of the following websites did you visit today? 1) Google 2) Amazon 3) Ebay". In our QlikView tool we then need to be able to select all respondents who, for example, visited Amazon.

dominicmander
Partner - Creator
Partner - Creator

If your data is as simple as 1;2;3 or 1;3 etc etc then this will do what you need I think ... it will load each entry into two fields, once as is, and once with 'Field_' appended as the field name. You can then use a Generic Load (which is effectively the opposite of a crosstable) to re-assemble the data into the format you need.

TestData:

LOAD * INLINE [

    UniqueID, Value

    ID1, 1

    ID2, 1;2

    ID3, 1;2;3

    ID4, 2;3

    ID5, 1;3

    ID6, 1;2;3;4

    ID7, 3;4

    ID8, 4

];

Data:

LOAD

    UniqueID

RESIDENT TestData;

Generic:

GENERIC

LOAD

    UniqueID,

    'Field_' & Value AS Field,

    Value

;

LOAD

    UniqueID,

    Subfield(Value, ';') AS Value

RESIDENT TestData;

FOR i = 0 to NoOfTables()

  TableList:

  LOAD '[' & TableName($(i)) & ']' as Tablename AUTOGENERATE 1

  WHERE WildMatch(TableName($(i)), 'Generic.*');

NEXT i

FOR i = 1 to FieldValueCount('Tablename')

  LET vTable = FieldValue('Tablename', $(i));

  LEFT JOIN (Data) LOAD * RESIDENT $(vTable);

  DROP TABLE $(vTable);

NEXT i

DROP TABLE TableList;

DROP TABLE TestData;

... I can attach a sample QVW shortly if you need

... I've updated the example above to include the sample data I was testing with in an inline load so you should be able to copy it directly into a new document and see what it's doing.

Message was edited by: DominicMander

dominicmander
Partner - Creator
Partner - Creator

Incidently ... you could replace the piece of script that loads the table I named "Generic" with the following script ...

Generic:

GENERIC

LOAD

    UniqueID,

    'Field_' & Subfield(Value, ';') AS Field,

    1 AS Value

RESIDENT TestData;

... to give yourself a 1 or null flag in each field rather than the actual value. You might find this easier in your front end, depending on what you're wanting to do with the data.

Not applicable
Author

Dominic, it looks like your solution brings us closer to the solution, but I am still not sure how to get all the values into a single field.

This is what I get: I will still need to combine the 'Field' fields into a single field. When I tried to crosstable these fields, I got the problem of duplicate rows again.

uniqueid

 

Field_1

 

Field_2

 

Field_3

 

ID1

 

1

 

 

 

ID2

 

1

 

2

 

 

ID3

 

1

 

2

 

3

 

ID4

 

 

2

 

3

 

ID5

 

 

 

3

 
Anonymous
Not applicable
Author

could do something like this (if the subfield amount is fixed max?)

TEST:

LOAD Unique ID

          , Field_1

RESIDENT Generic;

CONCATENATE (TEST)

LOAD UniqueID

         , Field_2

RESIDENT Generic;

... etc

if the subfields should be flexible ... try doing this with some kind of loop?

dominicmander
Partner - Creator
Partner - Creator

Sorry, I'm not sure I've understood what you want the data to look like when it's finished ... the table you've posted as an example of what you get is what I thought you wanted, but I've obviously missunderstood. Doing a Generic load and then a Crosstable is a little bit like going round in a circle back to where you started.

Do you just want to be able to select any record that has 1 in it, or any record that has 2 in it or any record that has 3 in it?

In which case you don't need any of the above, Generic or Crosstable ... you just do the following.

TestData:

LOAD * INLINE [

    UniqueID, Value

    ID1, 1

    ID2, 1;2

    ID3, 1;2;3

    ID4, 2;3

    ID5, 3

];

Select:

LOAD * INLINE [

    Value, Select

    1, 1

    1;2, 1

    1;3, 1

    1;2;3, 1

    2, 2

    1;2, 2

    2;3, 2

    1;2;3, 2

    3, 3

    1;3, 3

    2;3, 3

    1;2;3, 3

];

Selecting 1, 2 or 3 in the Select filed will filter the UniqueID field to any entries that have what you selected in the Value field ... is this what you need?

johnw
Champion III
Champion III

Logically speaking, you can either have one row per subvalue or one column per subvalue.  Those are your only options.  The extra subvalues have to go SOMEWHERE. 

However, duplicate rows shouldn't cause problems as long as you put the ID vs. subvalue connection in a SECOND table.  Don't duplicate OTHER data.  So if you have this:

ID  value otherfield
ID1 1     hello
ID2 1;2   goodbye

You want this:

ID  value
ID1 1
ID2 1
ID2 2

ID  otherfield
ID1 hello
ID2 goodbye

You WILL need to do count(distinct ID) instead of count(ID), but most everything else should work fine without changes.  Let me know if you need help transforming the first data structure to the second, but all the clues are already in your initial attempt and in the thread.

Not applicable
Author

Thanks John, I am astonished the answer was that simple all along!

I am now using Subfield to split the multicode fields, and put these values in a separate table.

The averages across other fields (in a separate table) continue to work properly. And we can live with having to use Distinct for the uniqueid field.