9 Replies Latest reply: Nov 3, 2015 8:14 AM by Paolo Ferri RSS

    Count the duplicate values in a column and assign the count to a new column

      Hi Guys I have a very simple question, I don't manage to find an easy tutorial, maybe because I am a very beginner


      Context: I have a website where people can answer a simple question,
      The correct answers are

       

      1) 28
      2) Javascript
      3) Java


      Every user that wants to answer has to type in his email address as well, and every user can reply as many time as he wants.
      Therefore, I have a table where many users gave 1 or several answers, until they get the correct one.

       

      This is my file, uploaded with the data load editor

       

      LOAD

          date_submitted,

          time_submitted,

          name,

          email as "Email Address", //renaming

          your_answer,

          if(match(your_answer, 28, '#javascript', '#java'), 'CORRECT', 'WRONG') as "answers recoded", 

          utm_source,

          utm_medium,

          page_name,

          page_url,


      As you can see I have recoded the answers and put them in a dedicated column.

       

      What I have now is a file whit many (duplicated) IDs and all the answers (and other columns)

       

      What I want to do:

       

      1) Count the Email Addresses, create a new column called ''Answers Given'' where
      - If the Emaiil Address is unique --> Put 1
      - if the Email Address is repeated --> Put 2 or put the count of the number that that specific person answered


      2) Scan the column ''Answers Given'' and create a new column called ''Customers' Level''
      - if in answers given there is 1 --> Put ''Very Good''
      - if in answers given there is 2 (or the count) --> Put ''Not so good''

       

      Any help would be much appreciated, it looks like a simple question, but I have been researching all day long on Google and didn't find anything similar, I tried to code it on my own but I keep on getting syntax error or similar.

       

      Many thanks!

       

      Paolo

        • Re: Count the duplicate values in a column and assign the count to a new column
          Oleg Troyansky

          Hi Paolo,

           

          it looks like you need to aggregate your data in subsequent RESIDENT load. Then, you can test the number of answers given to determine if it's good or not so good. That can be done in a preceding load. Something like this:

           

          load

               *,

               if(AnswersCount = 1, 'Good', 'Not good') as result

          ;

           

          load

               name,

               count(answers) as AnswersCount

          resident

               Table1

          group by

               name

          ;

           

          You may also need to enhance this logic to only assign "good" if the single answer was correct, as opposed to wrong?

           

          cheers,

          Oleg Troyansky

          QlikView Your Business: An expert guide to Business Discovery with QlikView and Qlik Sense

            • Re: Count the duplicate values in a column and assign the count to a new column

              Hi Oleg,
              Thanks for your reply,

               

              I have a few questions though:
              1) what do you mean with

              load

                   *,
              Do I have to put something instead of * ? I see that many people use it but I guess that its' a convention for something else, right?

               

              2) What is a RESIDENT load?

               

              3) when you write Table1 you mean the name of my table? I have uploaded the table in the data load editor therefore I don't know how to name it (I did not upload the whole table in the data script)

               

              Many thanks

               

              Paolo

                • Re: Count the duplicate values in a column and assign the count to a new column
                  Simen Kind Gulbrandsen

                  * means that you read all the fields in the table;.

                   

                   

                  Resident load means that you read from a table that is already in memory.

                   

                  when you load your table you can name it with the convention:

                   

                   

                  Table1:

                  Load Field1,

                  Field2

                  etc...

                  From File....;

                   

                   

                  then when you load Resident Table1 you will read from the previously loaded table.

                   

                   

                  If you want the answercount on all rows with the same e-mailaddress I would write it like this:

                   

                  Answers:

                  LOAD

                      date_submitted,

                      time_submitted,

                      name,

                      email as "Email Address", //renaming

                      your_answer,

                      if(match(your_answer, 28, '#javascript', '#java'), 'CORRECT', 'WRONG') as "answers recoded",

                      utm_source,

                      utm_medium,

                      page_name,

                      page_url,

                  from File..... // your file here.

                   

                   

                  left join(Answers) //Join adds the result to all matching [Email Address]'es

                  load [Email Address], count(your_answer) as [Answers given], if(count(your_answer) =1, 'Very Good', 'Not So Good') as [Customers Level]

                  resident Answers

                  group by [Email Address];

                    • Re: Count the duplicate values in a column and assign the count to a new column

                      Hi Simen,
                      Thanks for your explainations, they gave me a better understanding of the functions.
                      I had to play around quite a lot and I just managed to do the first part (count the repeated values and create a new column with the count associated to the number of answers that each client has given)

                       

                      Here is my code

                       

                      // All Javascript Leads

                         

                          LOAD

                          date_submitted,

                          time_submitted,

                          email as "Email Address", //renaming

                          your_answer,

                          if(match(your_answer, 28, '#javascript', '#java'), 'CORRECT', 'WRONG') as "answers recoded", 

                          utm_source,

                          page_variant_name,

                          untitled_menu,

                          utm_content

                          FROM [lib://Documents/Geektastic\All Javascript Leads (p).csv]

                      (txt, codepage is 1252, embedded labels, delimiter is ',', msq);

                       

                      LOAD

                      [Email Address],

                      Count([Email Address]) as [Answers Given]

                      RESIDENT Data

                      Group by [Email Address];

                       

                      1st Question)

                       

                      The code works for almost 90% of my dataset
                      - There is an email address that gave 6 answers and I can see 63 in the column ''answers given''
                      - There is an email address that gave 3 answers and I see 5 in the column ''answers given''

                      - some other uncorrect values like 49, 16, 43..

                       

                      Any Idea of the possible reason behind this?

                       

                      2nd Question)

                       

                      Why should I LOAD the email address column, countm put resident data and group
                      Instead of writing simply in the code something like

                       

                       

                      LOAD

                          date_submitted,

                          time_submitted,

                          email as "Email Address", //renaming

                      .....

                      ...

                      Count([Email Address]) as [Answers Given]

                      Group by [Email Address];

                      ....

                          utm_content

                          FROM [lib://Documents/Geektastic\All Javascript Leads (p).csv]

                      (txt, codepage is 1252, embedded labels, delimiter is ',', msq);

                       

                      What's the difference?

                       

                      Thank you in advance,

                       

                      Best,

                       

                      Paolo

                    • Re: Count the duplicate values in a column and assign the count to a new column
                      Oleg Troyansky

                      Hi Paolo,

                       

                      Simen kindly answered your questions to you. I just wanted to add that, based on your questions, you need to cover some basic QlikView education. Ideally, it would be a beginners' class, but if you can't get access to it, then at least a good book. There is a number of good QlikView books on Amazon, including QlikView 11 for Developers and my new book QlikView Your Business. It will be hard for you to develop QlikView applications without covering the basics first.

                       

                      cheers,

                      Oleg Troyansky

                  • Re: Count the duplicate values in a column and assign the count to a new column
                    jagan mohan rao appala

                    Hi,

                     

                    Try like this

                     

                    Data_Temp:

                    LOAD

                        date_submitted,

                        time_submitted,

                        name,

                        email as "Email Address", //renaming

                        your_answer,

                        if(match(your_answer, 28, '#javascript', '#java'), 'CORRECT', 'WRONG') as "answers recoded",

                        utm_source,

                        utm_medium,

                        page_name,

                        page_url

                    FROM DAtaSource;

                     

                    LEFT JOIN(Data_Temp)

                    LOAD

                         [Email Address],

                         Count([Email Address]) AS [Answers Given]

                    RESIDENT Data_Temp

                    GROUP BY [Email Address];


                    Data:

                    LOAD

                    *,

                    If([Answers Given] = 1, 'Very Good', 'Not so good') AS Customers' Level

                    RESIDENT Data_Temp;

                     

                    DROP TABLE Data_Temp;

                     

                    Hope this helps you.

                     

                    Regards,

                    Jagan.

                      • Re: Count the duplicate values in a column and assign the count to a new column

                        Hi Jagan,

                        Thanks for your reply, I have a few questions:

                         

                         

                        1) When you write ''FROM DAtaSource'', what do you mean?
                        I have uploaded my table with the data load editor without scripting, so I don't know how to refer to my datasource file (the script is automatically generated)

                         

                        2) when you say RESIDENT TABLE and DROP TABLE you mean that you are building temporary tables to add the cells to my original sourcfile?

                         

                        3)

                        LEFT JOIN(Data_Temp)

                        LOAD

                             [Email Address],

                             Count([Email Address]) AS [Answers Given]

                        RESIDENT Data_Temp

                        GROUP BY [Email Address];


                        Data:  // should I write Data like this or it is something that you wrote as naming convention?

                        LOAD

                        *,  // does the * means * or something else?

                        If([Answers Given] = 1, 'Very Good', 'Not so good') AS Customers' Level

                        RESIDENT Data_Temp;

                         

                        DROP TABLE Data_Temp;

                         

                        As you can see, I have started using Qlick a few days ago, I have programming knowledge but I haven't found a good tutorial source yet

                         

                        Many Thanks,

                         

                        Paolo