Skip to main content
Announcements
Have questions about Qlik Connect? Join us live on April 10th, at 11 AM ET: SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Count the duplicate values in a column and assign the count to a new column

Hi Guys I have a very simple question, I don't manage to find an easy tutorial, maybe because I am a very beginner


Context: I have a website where people can answer a simple question,
The correct answers are

1) 28
2) Javascript
3) Java


Every user that wants to answer has to type in his email address as well, and every user can reply as many time as he wants.
Therefore, I have a table where many users gave 1 or several answers, until they get the correct one.

This is my file, uploaded with the data load editor

LOAD

    date_submitted,

    time_submitted,

    name,

    email as "Email Address", //renaming

    your_answer,

    if(match(your_answer, 28, '#javascript', '#java'), 'CORRECT', 'WRONG') as "answers recoded", 

    utm_source,

    utm_medium,

    page_name,

    page_url,


As you can see I have recoded the answers and put them in a dedicated column.

What I have now is a file whit many (duplicated) IDs and all the answers (and other columns)

What I want to do:

1) Count the Email Addresses, create a new column called ''Answers Given'' where
- If the Emaiil Address is unique --> Put 1
- if the Email Address is repeated --> Put 2 or put the count of the number that that specific person answered


2) Scan the column ''Answers Given'' and create a new column called ''Customers' Level''
- if in answers given there is 1 --> Put ''Very Good''
- if in answers given there is 2 (or the count) --> Put ''Not so good''

Any help would be much appreciated, it looks like a simple question, but I have been researching all day long on Google and didn't find anything similar, I tried to code it on my own but I keep on getting syntax error or similar.

Many thanks!

Paolo

9 Replies
Oleg_Troyansky
Partner Ambassador/MVP
Partner Ambassador/MVP

Hi Paolo,

it looks like you need to aggregate your data in subsequent RESIDENT load. Then, you can test the number of answers given to determine if it's good or not so good. That can be done in a preceding load. Something like this:

load

     *,

     if(AnswersCount = 1, 'Good', 'Not good') as result

;

load

     name,

     count(answers) as AnswersCount

resident

     Table1

group by

     name

;

You may also need to enhance this logic to only assign "good" if the single answer was correct, as opposed to wrong?

cheers,

Oleg Troyansky

QlikView Your Business: An expert guide to Business Discovery with QlikView and Qlik Sense

jagan
Luminary Alumni
Luminary Alumni

Hi,

Try like this

Data_Temp:

LOAD

    date_submitted,

    time_submitted,

    name,

    email as "Email Address", //renaming

    your_answer,

    if(match(your_answer, 28, '#javascript', '#java'), 'CORRECT', 'WRONG') as "answers recoded",

    utm_source,

    utm_medium,

    page_name,

    page_url

FROM DAtaSource;

LEFT JOIN(Data_Temp)

LOAD

     [Email Address],

     Count([Email Address]) AS [Answers Given]

RESIDENT Data_Temp

GROUP BY [Email Address];


Data:

LOAD

*,

If([Answers Given] = 1, 'Very Good', 'Not so good') AS Customers' Level

RESIDENT Data_Temp;

DROP TABLE Data_Temp;

Hope this helps you.

Regards,

Jagan.

Not applicable
Author

Hi Oleg,
Thanks for your reply,

I have a few questions though:
1) what do you mean with

load

     *,
Do I have to put something instead of * ? I see that many people use it but I guess that its' a convention for something else, right?

2) What is a RESIDENT load?

3) when you write Table1 you mean the name of my table? I have uploaded the table in the data load editor therefore I don't know how to name it (I did not upload the whole table in the data script)

Many thanks

Paolo

Not applicable
Author

Hi Jagan,

Thanks for your reply, I have a few questions:

1) When you write ''FROM DAtaSource'', what do you mean?
I have uploaded my table with the data load editor without scripting, so I don't know how to refer to my datasource file (the script is automatically generated)

2) when you say RESIDENT TABLE and DROP TABLE you mean that you are building temporary tables to add the cells to my original sourcfile?

3)

LEFT JOIN(Data_Temp)

LOAD

     [Email Address],

     Count([Email Address]) AS [Answers Given]

RESIDENT Data_Temp

GROUP BY [Email Address];


Data:  // should I write Data like this or it is something that you wrote as naming convention?

LOAD

*,  // does the * means * or something else?

If([Answers Given] = 1, 'Very Good', 'Not so good') AS Customers' Level

RESIDENT Data_Temp;

DROP TABLE Data_Temp;

As you can see, I have started using Qlick a few days ago, I have programming knowledge but I haven't found a good tutorial source yet

Many Thanks,

Paolo

simenkg
Specialist
Specialist

* means that you read all the fields in the table;.

Resident load means that you read from a table that is already in memory.

when you load your table you can name it with the convention:

Table1:

Load Field1,

Field2

etc...

From File....;

then when you load Resident Table1 you will read from the previously loaded table.

If you want the answercount on all rows with the same e-mailaddress I would write it like this:

Answers:

LOAD

    date_submitted,

    time_submitted,

    name,

    email as "Email Address", //renaming

    your_answer,

    if(match(your_answer, 28, '#javascript', '#java'), 'CORRECT', 'WRONG') as "answers recoded",

    utm_source,

    utm_medium,

    page_name,

    page_url,

from File..... // your file here.

left join(Answers) //Join adds the result to all matching [Email Address]'es

load [Email Address], count(your_answer) as [Answers given], if(count(your_answer) =1, 'Very Good', 'Not So Good') as [Customers Level]

resident Answers

group by [Email Address];

Not applicable
Author

Hi Simen,
Thanks for your explainations, they gave me a better understanding of the functions.
I had to play around quite a lot and I just managed to do the first part (count the repeated values and create a new column with the count associated to the number of answers that each client has given)

Here is my code

// All Javascript Leads

   

    LOAD

    date_submitted,

    time_submitted,

    email as "Email Address", //renaming

    your_answer,

    if(match(your_answer, 28, '#javascript', '#java'), 'CORRECT', 'WRONG') as "answers recoded", 

    utm_source,

    page_variant_name,

    untitled_menu,

    utm_content

    FROM [lib://Documents/Geektastic\All Javascript Leads (p).csv]

(txt, codepage is 1252, embedded labels, delimiter is ',', msq);

LOAD

[Email Address],

Count([Email Address]) as [Answers Given]

RESIDENT Data

Group by [Email Address];

1st Question)

The code works for almost 90% of my dataset
- There is an email address that gave 6 answers and I can see 63 in the column ''answers given''
- There is an email address that gave 3 answers and I see 5 in the column ''answers given''

- some other uncorrect values like 49, 16, 43..

Any Idea of the possible reason behind this?

2nd Question)

Why should I LOAD the email address column, countm put resident data and group
Instead of writing simply in the code something like

LOAD

    date_submitted,

    time_submitted,

    email as "Email Address", //renaming

.....

...

Count([Email Address]) as [Answers Given]

Group by [Email Address];

....

    utm_content

    FROM [lib://Documents/Geektastic\All Javascript Leads (p).csv]

(txt, codepage is 1252, embedded labels, delimiter is ',', msq);

What's the difference?

Thank you in advance,

Best,

Paolo

jagan
Luminary Alumni
Luminary Alumni

Refer Qlikview help file or community for help on this keywords.

Regards,

Jagan.

Oleg_Troyansky
Partner Ambassador/MVP
Partner Ambassador/MVP

Hi Paolo,

Simen kindly answered your questions to you. I just wanted to add that, based on your questions, you need to cover some basic QlikView education. Ideally, it would be a beginners' class, but if you can't get access to it, then at least a good book. There is a number of good QlikView books on Amazon, including QlikView 11 for Developers and my new book QlikView Your Business. It will be hard for you to develop QlikView applications without covering the basics first.

cheers,

Oleg Troyansky

Not applicable
Author

Hi Oleg,

Yes I agree with you. I have previous experiences with Java, C++, Mathlab and others, but I find very difficult to get good resources online

I will keep on searching and if I don't find anything, I will buy one of these books.

Many Thanks,

Paolo