Not applicable

How to combine several columns' information into one column

hi, the subject hardly explains itself, so here's an example:

this is the raw data:
LOAD * INLINE [
Date, Market, Media, Source, Male$, Female$
a, b, c, d, 12, 25
a, b, c, e, 54, 24
a, b, f, d, 78, 19
];

as you can see, I have two numeric values here, Male$ and Female$.

Now I want to create another column called "Gender" and combine the $ values into one column, so the table would look like:

Date, Market, Media, Source, Gender, $
a, b, c, d, M, 12
a, b, c, d, F, 25
a, b, c, e, M, 54
a, b, c, e, F, 24
a, b, f, d, M, 78
a, b, f, d, F, 19

Could anyone please tell me how to write the script to produce the data like that?

Thank you!!

1 Solution

Accepted Solutions
Anonymous
Not applicable
Author

Use crosstable, like this:


CROSSTABLE (Gender, $, 4)
LOAD * INLINE [
Date, Market, Media, Source, Male$, Female$
a, b, c, d, 12, 25
a, b, c, e, 54, 24
a, b, f, d, 78, 19];
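
For anyone new to the CROSSTABLE prefix, the third argument is the number of leading qualifier fields that stay as ordinary columns. A commented sketch of the same load:

```
// CROSSTABLE (attribute field, data field, number of qualifier fields)
// The 4 keeps Date, Market, Media, Source as plain columns.
// Each remaining column name (Male$, Female$) becomes a value of
// Gender, and its cell value lands in $.
CROSSTABLE (Gender, $, 4)
LOAD * INLINE [
Date, Market, Media, Source, Male$, Female$
a, b, c, d, 12, 25
a, b, c, e, 54, 24
a, b, f, d, 78, 19];
```

Note that Gender will then contain the literal field names Male$ and Female$ rather than M and F; you can either rename the inline columns to M and F or map the values afterwards.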



8 Replies
johnw
Champion III

Edit: Doh! Beaten to the punch.

I think this would work:

CROSSTABLE (Gender,$,4)
LOAD * INLINE [
Date, Market, Media, Source, M, F
a, b, c, d, 12, 25
a, b, c, e, 54, 24
a, b, f, d, 78, 19
];

Not applicable
Author

Thank you!!

What if I have another kind of value besides $?

For example

[
Date, Market, Media, Source, Male$, Female$, MaleScore, FemaleScore
a, b, c, d, 12, 25, 45, 87
a, b, c, e, 54, 24, 91, 54
a, b, f, d, 78, 19, 42, 28
]
;

Can we make it as this?

Date, Market, Media, Source,Gender, $, Score

a, b, c, d, M, 12, 45
a, b, c, d, F, 25, 87
a, b, c, e, M, 54 ,91
a, b, c, e, F, 24, 54
a, b, f, d, M, 78, 42
a, b, f, d, F, 19, 28

Anonymous
Not applicable
Author

Sorry John 😛

In this case probably this way is easier:


table1:
LOAD * INLINE [
Date, Market, Media, Source, Male$, Female$, MaleScore, FemaleScore
a, b, c, d, 12, 25, 45, 87
a, b, c, e, 54, 24, 91, 54
a, b, f, d, 78, 19, 42, 28];
//
table:
LOAD
Date,
Market,
Media,
Source,
'M' as Gender,
Male$ as $,
MaleScore as Score
RESIDENT table1;
LOAD
Date,
Market,
Media,
Source,
'F' as Gender,
Female$ as $,
FemaleScore as Score
RESIDENT table1;
//
drop table table1;


Not applicable
Author

Thank you both!! 😄

johnw
Champion III

A couple of problems with a temporary table and three loads are high memory usage and slow performance. I can easily exceed the memory on my machine while testing, and I have loads that take over an hour, so that's something I need to be conscious of.

If neither is a problem, I'd definitely use Michael's approach. It's the most straightforward.

But if you do have a memory or processing-time problem, here are a couple of alternatives. The first might (only might) be faster and might (only might) use less memory. It has no temporary table and makes do with two loads, though both are more complicated than Michael's, which is why I can't guarantee they'll be faster or use less memory. Might be worth a shot if you run into problems, though.

The second approach loads twice from your original data source to avoid a temporary table. Depending on the speed of pulling data from the original data source, that might be faster than the temp table approach, and it will certainly use less memory.

I like options. 🙂

// First alternative

Table:
CROSSTABLE (Gender,Quantities,5)
LOAD
recno() as ID
,Date
,Market
,Media
,Source
,Male$&';'&MaleScore as M
,Female$&';'&FemaleScore as F
INLINE [
Date, Market, Media, Source, Male$, Female$, MaleScore, FemaleScore
a, b, c, d, 12, 25, 45, 87
a, b, c, e, 54, 24, 91, 54
a, b, f, d, 78, 19, 42, 28
];
LEFT JOIN (Table)
LOAD
ID
,subfield(Quantities,';',1) as $
,subfield(Quantities,';',2) as Score
RESIDENT Table
;
DROP FIELDS
ID
,Quantities
;

// Second alternative

LOAD
Date
,Market
,Media
,Source
,'M' as Gender
,Male$ as $
,MaleScore as Score
INLINE [
Date, Market, Media, Source, Male$, Female$, MaleScore, FemaleScore
a, b, c, d, 12, 25, 45, 87
a, b, c, e, 54, 24, 91, 54
a, b, f, d, 78, 19, 42, 28
];
CONCATENATE (Table) LOAD
Date
,Market
,Media
,Source
,'F' as Gender
,Female$ as $
,FemaleScore as Score
INLINE [
Date, Market, Media, Source, Male$, Female$, MaleScore, FemaleScore
a, b, c, d, 12, 25, 45, 87
a, b, c, e, 54, 24, 91, 54
a, b, f, d, 78, 19, 42, 28
];

Not applicable
Author

Thanks John,

you are right, memory is truly a huge concern for me because I am processing an enormous dataset.

I tried both of the methods you mentioned. The second one worked; however, the first one, using crosstable, produces duplicate records.

In the attached file, you can see that the table on the right-hand side includes the record that is supposed to be gender M.

There should only be one unique record, as in the other table.

I prefer crosstable because it saves me from writing several WHERE statements to segment different groups out of a huge dataset, for which a reload would be a problem.

Thank you for your help!!

johnw
Champion III

Ah, I see the problem. My recno() as ID was happening BEFORE the crosstable, and therefore used the same ID for both genders, leading to duplicated records when we did the left join. It can easily be fixed by adding Gender to the list of fields in the left join.

It probably won't be the approach you want if memory is the main problem, though. I think you'll want to use the second option. I had some script using the second option with a loop through 40 different values, loading from a resident table each time. I converted it to a crosstable load. That improved the execution time significantly (my main problem), but also significantly increased the memory requirements (not a big problem in my case). Testing various approaches with your real data is probably a good idea.