Not applicable

How to combine several columns' information into one column

hi, the subject hardly explains itself, so here's an example:

this is the raw data:
LOAD * INLINE [
Date, Market, Media, Source, Male$, Female$
a, b, c, d, 12, 25
a, b, c, e, 54, 24
a, b, f, d, 78, 19
];

as you can see, I have two numeric values here, Male$ and Female$.

Now I want to create another column called "Gender" and combine the $ values into one column, so the table would look like:

Date, Market, Media, Source, Gender, $
a, b, c, d, M, 12
a, b, c, d, F, 25
a, b, c, e, M, 54
a, b, c, e, F, 24
a, b, f, d, M, 78
a, b, f, d, F, 19

Could anyone please tell me how to write the script to produce the data like that?

Thank you!!

1 Solution

Accepted Solutions
Anonymous
Not applicable
Author

Use crosstable, like this:


CROSSTABLE (Gender, $, 4)
LOAD * INLINE [
Date, Market, Media, Source, Male$, Female$
a, b, c, d, 12, 25
a, b, c, e, 54, 24
a, b, f, d, 78, 19];
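
For anyone new to the CROSSTABLE prefix, the third argument is the number of leading qualifier fields that stay as ordinary columns. A commented sketch of the same load:

```
// CROSSTABLE (attribute field, data field, number of qualifier fields)
// The 4 keeps Date, Market, Media, Source as plain columns.
// Each remaining column name (Male$, Female$) becomes a value of
// Gender, and its cell value lands in $.
CROSSTABLE (Gender, $, 4)
LOAD * INLINE [
Date, Market, Media, Source, Male$, Female$
a, b, c, d, 12, 25
a, b, c, e, 54, 24
a, b, f, d, 78, 19];
```

Note that Gender will then contain the literal field names Male$ and Female$ rather than M and F; you can either rename the inline columns to M and F or map the values afterwards.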



8 Replies
johnw
Champion III

Edit: Doh! Beaten to the punch.

I think this would work:

CROSSTABLE (Gender,$,4)
LOAD * INLINE [
Date, Market, Media, Source, M, F
a, b, c, d, 12, 25
a, b, c, e, 54, 24
a, b, f, d, 78, 19
];

Not applicable
Author

Thank you!!

What if I have another kind of value besides $?

For example

[
Date, Market, Media, Source, Male$, Female$, MaleScore, FemaleScore
a, b, c, d, 12, 25, 45, 87
a, b, c, e, 54, 24, 91, 54
a, b, f, d, 78, 19, 42, 28
]
;

Can we make it as this?

Date, Market, Media, Source,Gender, $, Score

a, b, c, d, M, 12, 45
a, b, c, d, F, 25, 87
a, b, c, e, M, 54 ,91
a, b, c, e, F, 24, 54
a, b, f, d, M, 78, 42
a, b, f, d, F, 19, 28

Anonymous
Not applicable
Author

Sorry John 😛

In this case probably this way is easier:


table1:
LOAD * INLINE [
Date, Market, Media, Source, Male$, Female$, MaleScore, FemaleScore
a, b, c, d, 12, 25, 45, 87
a, b, c, e, 54, 24, 91, 54
a, b, f, d, 78, 19, 42, 28];
//
table:
LOAD
Date,
Market,
Media,
Source,
'M' as Gender,
Male$ as $,
MaleScore as Score
RESIDENT table1;
LOAD
Date,
Market,
Media,
Source,
'F' as Gender,
Female$ as $,
FemaleScore as Score
RESIDENT table1;
//
drop table table1;


Not applicable
Author

Thank you both!! 😄

johnw
Champion III

A couple of problems with a temporary table and three loads are high memory usage and slow performance. I can easily exceed the memory on my machine while testing, and I have loads that take over an hour, so that's something I need to be conscious of.

If neither is a problem, I'd definitely use Michael's approach. It's the most straightforward.

But if you do have a memory or processing-time problem, here are a couple of alternatives. The first might (only might) be faster and might (only might) use less memory. It has no temporary table and makes do with two loads, though both are more complicated than Michael's, which is why I can't guarantee they'll be faster or use less memory. Might be worth a shot if you run into problems, though.

The second approach loads twice from your original data source to avoid a temporary table. Depending on the speed of pulling data from the original data source, that might be faster than the temp table approach, and it will certainly use less memory.

I like options. 🙂

// First alternative

Table:
CROSSTABLE (Gender,Quantities,5)
LOAD
recno() as ID
,Date
,Market
,Media
,Source
,Male$&';'&MaleScore as M
,Female$&';'&FemaleScore as F
INLINE [
Date, Market, Media, Source, Male$, Female$, MaleScore, FemaleScore
a, b, c, d, 12, 25, 45, 87
a, b, c, e, 54, 24, 91, 54
a, b, f, d, 78, 19, 42, 28
];
LEFT JOIN (Table)
LOAD
ID
,subfield(Quantities,';',1) as $
,subfield(Quantities,';',2) as Score
RESIDENT Table
;
DROP FIELDS
ID
,Quantities
;

// Second alternative

LOAD
Date
,Market
,Media
,Source
,'M' as Gender
,Male$ as $
,MaleScore as Score
INLINE [
Date, Market, Media, Source, Male$, Female$, MaleScore, FemaleScore
a, b, c, d, 12, 25, 45, 87
a, b, c, e, 54, 24, 91, 54
a, b, f, d, 78, 19, 42, 28
];
CONCATENATE (Table) LOAD
Date
,Market
,Media
,Source
,'F' as Gender
,Female$ as $
,FemaleScore as Score
INLINE [
Date, Market, Media, Source, Male$, Female$, MaleScore, FemaleScore
a, b, c, d, 12, 25, 45, 87
a, b, c, e, 54, 24, 91, 54
a, b, f, d, 78, 19, 42, 28
];

Not applicable
Author

Thanks John,

you are right, memory is truly a huge concern for me because I am processing an enormous dataset.

I tried both of the methods you mentioned. The second one worked; however, the first one, using crosstable, produces duplicate records.

In the attached file, you can see that the table on the right-hand side includes the record that is supposed to be gender M.

There should only be one unique record, as in the other table.

I prefer crosstable because it saves me from writing several WHERE statements to segment different groups out of a huge dataset, for which a reload would be a problem.

Thank you for your help!!

johnw
Champion III

Ah, I see the problem. My recno() as ID was happening BEFORE the crosstable, and therefore used the same ID for both genders, leading to duplicated records when we did the left join. It can easily be fixed by adding Gender to the list of fields in the left join.

It probably won't be the approach you want if memory is the main problem, though. I think you'll want to use the second option. I had some script using the second option with a loop through 40 different values, loading from a resident table each time. I converted it to a crosstable load. That improved the execution time significantly (my main problem), but also significantly increased the memory requirements (not a big problem in my case). Testing various approaches with your real data is probably a good idea.