8 Replies Latest reply: Jun 11, 2012 4:39 PM by Sergio Alonso Bueno RSS

    two for..next and one while..having with firstsortedvalue for a basket analysis

      Hi everyone,

       

      I need some help to solve the next requirement. I have the Fact aggregate table with fields:

      : yearmonth / market / customer / product / sumSales

       

      For a defined set of the products I need to categorize and classify each and all of the customers sumsales , as what their choice was.

      Need to create for each customer adicional dimension FirstChoice, SecondChoice,Nochoice but at the SCRIPT LEVEL.

      MONTH_YEAR

      MARKET

      custome

      PRODUCT

      MTH_QTY

      201201

      vegtables

      108385

      onion

      5

      201201

      vegtables

      108385

      carrot

      2

      201201

      vegtables

      108393

      tomato

      4

      201201

      vegtables

      108393

      carrot

      3

      201201

      vegtables

      108393

      iogurt

      1

      201201

      vegtables

      108365

      onion

      2

      201201

      vegtables

      108365

      carrot

      6

      201201

      vegtables

      108364

      iogurt

      1

       

      For onion and carrots I need to obtain the next data

      Customer / yearmonth / market/ FirstChoice/ SecondChoice/Nochoice, for the example data

      108385  /201201/ vegetables /onion/carrot/null

      108393 /201201/ vegetables / tomato/ carrot/ null

      108365 / 201201/ vegetables / carrot/ onion/ null

      108364 / 201201 / vegetables / iogurt/null/ 1

       

      Quite complex situation. Any suggestion…?.

      I´ve load on script rank for the consumer but need to run through all the table customer by customer to calculate the new dimensions. Ideas are welcome.

      Thank you very much.

       

      El mensaje fue editado por: SergioABR  Thanks Stephan with your ideas... i´ve arrived to something like..

      //MAPING LOAD  MAP_PROD: MAPPING LOAD * INLINE [ PRODUCT,FLAG CARROT,Y ONIONS,Y ORANGE,N PATATOES,Y TOMATO,N ];

      PRODUCT: LOAD  subfield(K_CUST_PROD,'-',2) AS PRODUCT FROM [..\vEGETABLES bASKET.csv] (ansi, txt, delimiter is ';', embedded labels, msq);  CUSTOMER:  MAPPING LOAD  subfield(K_CUST_PROD,'-',1) as CUSTOMER  FROM [..\vEGETABLES bASKET.csv] (ansi, txt, delimiter is ';', embedded labels, msq);

      let TotCustomer = noofrows(CUSTOMER)-1;

      FOR i= 0 TO TotCustomer

      LET vCUSTOMER = PEEK(CUSTOMER,$(i),'CUSTOMER') ;     

            let TotPROD = noofrows(PRODUCT)-1;       

              FOR j= 0 TO TotPROD           

               LET VPROD = PEEK(PRODUCT,$(i),'PRODUCT') ;                                                                                                                                                     CUST_POS: 

                                                                           LOAD     PROD&vCUSTOMER&ITERNO()) AS KEY      

                                                                           ,FIRSTSORTEDVALUE(PRODUCT,RANK,1) AS TOP1PROD                                                                 

                                                                            ,FIRSTSORTEDVALUE(RX_MTH_QTY,RANK,1 AS TOP1VALUE  

                                                              ,IF((FIRSTSORTEDVALUE(PRODUCT,RANK,1))=  $(VPROD),'1',  

                                                                                       IF((FIRSTSORTEDVALUE(PRODUCT,RANK,2))=  $(VPROD),'2','NA')) AS PRODPOS                                                                             , IF (SUM(RX_MTH_QTY) > 0,'DEAL','NODEAL') AS STATUS                                                                             GROUP BY CUSTOMER,PROD

      HAVING WILDMATCH (PRODUCT,APPLYMAP('MAP_PROD',PRODUCT),'Y'>0 ;  

             NEXT  NEXT WHILE ITERNO() >=10 AND ITERNO()<=12;

       

      BUT IS NOT WORKING............ I HAVE POSTED TWO FILES. ONE IS THE QVW FILE AND A *.CSV FILE WITH DATA.

       

      Thank you

        • Re: Categorize and classify data at the script level, against dimmension
          Stefan Wühl

          I haven't understood your logic for Nochoice, is this indicating that neither onion nor carrots have been in the basket?

           

          Maybe you could work out a solution using a group by load and FirstSortedValue:

           

          LOAD

          CUSTOMER, MONTH_YEAR, MARKET

          FirstSortedValue( PRODUCT,-MTH_QTY,1) as FirstChoice,

          FirstSortedValue( PRODUCT,-MTH_QTY,2) as SecondChoice,

          ???? as NoChoice

          FROM ..... group by CUSTOMER, MONTH_YEAR, MARKET;

           

          If you could have identical monthly quantities for a given combination of customer, monthyear and market, FirstSortedValue() will return null by default, so you might need to decide how to handle that (there are some methods to get a result even for identical sort order values in FirstSortedValue, e.g. by using DISTINCT qualifier or adding a small number to create a sort preference, like using

          FirstSortedValue(PRODUCT,-(MTH_QTY+AutoNumber(PRODUCT)/10000),1) as FirstChoice,

          when your number of PRODUCTs are much smaller than 10000 and your MTH_QTY are integer).

           

          Hope this helps,

          Stefan

           

           

          • Re: two for..next and one while..having with firstsortedvalue for a basket analysis

            Hello,

             

            I have been working and arrived to  what i am posting. Receiving a Generic tables must contain al lesat 3 fields message ERROR.

             

            //MAPING LOAD
            MAP_PROD:
            MAPPING LOAD * INLINE [
            PRODUCT,FLAG
            CARROT,Y
            ONIONS,Y
            ORANGE,N
            PATATOES,Y
            TOMATO,N
            ];

             

            PRODUCT:
            LOAD
            subfield(K_CUST_PROD,'-',2) AS PRODUCT
            FROM [..\vEGETABLES bASKET.csv] (ansi, txt, delimiter is ';', embedded labels, msq);

             

            CUSTOMER:
            MAPPING LOAD
            subfield(K_CUST_PROD,'-',1) as CUSTOMER
            FROM [..\vEGETABLES bASKET.csv] (ansi, txt, delimiter is ';', embedded labels, msq);

             

            let TotCustomer = noofrows(CUSTOMER)-1;
            FOR i= 0 TO TotCustomer
            LET vCUSTOMER = PEEK(CUSTOMER,$(i),'CUSTOMER') ;

            let TotPROD = noofrows(PRODUCT)-1;
            FOR j= 0 TO TotPROD
            LET VPROD = PEEK(PRODUCT,$(j),'PRODUCT') ;

            CUST_POS:
            LOAD     
            ($(VPROD)&$(vCUSTOMER)&TIME_ID) AS KEY
            ,FIRSTSORTEDVALUE(PRODUCT,RANK,1) AS TOP1PROD
            ,FIRSTSORTEDVALUE(RX_MTH_QTY,RANK,1 AS TOP1VALUE
            ,IF((FIRSTSORTEDVALUE(PRODUCT,RANK,1))=  $(VPROD),'1',
            IF((FIRSTSORTEDVALUE(PRODUCT,RANK,2))=  $(VPROD),'2','NA')) AS PRODPOS
            , IF (SUM(RX_MTH_QTY) > 0,'DEAL','NODEAL') AS STATUS,
            resident VEG GROUP BY CUSTOMER,PROD,TIME_ID;                                                 
            NEXT
            NEXT.

             

            Need help to adjust the two for..next and while extraction...

            Thank you

              • Re: two for..next and one while..having with firstsortedvalue for a basket analysis
                Stefan Wühl

                The error message was due to your mapping load using only one field. I don't think you want to use a mapping load here?

                 

                I corrected some small syntax errors and the script is now running ... very slowly, due to your loops and embedded load. You might want to start in debug mode and limit the rows returned to some smaller number, maybe some hundred.

                 

                Hope this helps,

                Stefan

                  • Re: two for..next and one while..having with firstsortedvalue for a basket analysis

                    Hi Stefan, thank you very much.

                     

                    Great it is really working but the numbers are atonishing, fot 2835 records, looped throught 2050 customer and 2 products is producing  a 11.623.500 record CUST_POS auxiliar table. Is this how it should be?? 34 min. load ??

                    Am I doing something wrong??

                    As well  having some dificulties with the repeated values on the rank and going to im`plement your advise, it is producing null value for the Top2Prod,top3prod..  IS THE MINUS SIGN required on the RANK FIELD??

                    FirstSortedValue(PRODUCT,-(MTH_QTY+AutoNumber(PRODUCT)/10000),1) as FirstChoice,

                     

                    Should I left  join the CUST_POS table with VEG to clean up the data, and save just one value for the combination of keys ??, because now each record is producing +20 lines on the CUST_POS table, and makes it dificult to understand the output of the CUST_POS table

                     

                    Stefan  is the while interaction required ? As I´ve loaded each combination of customer&product&time???

                     

                    Thank you. Sergio

                     

                    /[code]

                    //just load the products I am interested in..

                     

                    PRODUCT_1:
                    LOAD * INLINE [
                    PRODUCT
                    PATATOES
                    TOMATO
                    ];

                     

                    PRODUCT:
                    NOCONCATENATE LOAD
                    PRODUCT
                    RESIDENT PRODUCT_1;

                     

                    CUSTOMER:
                    LOAD distinct
                    subfield(K_CUST_PROD,'-',1) as CUSTOMER
                    FROM [..\vEGETABLES bASKET.csv] (ansi, txt, delimiter is ';', embedded labels, msq);

                     

                    let TotCustomer = noofrows('CUSTOMER')-1;
                    FOR i= 0 TO $(TotCustomer)
                    LET vCUSTOMER = PEEK('CUSTOMER',$(i),'CUSTOMER') ;

                    let TotPROD = noofrows('PRODUCT_1')-1;
                    FOR j= 0 TO $(TotPROD)
                    LET VPROD = PEEK('PRODUCT',$(j),'PRODUCT') ;

                    CUST_POS:
                    LOAD     
                    ('$(VPROD)'&'$(vCUSTOMER)'&TIME_ID) AS KEY
                    ,FIRSTSORTEDVALUE( PRODUCT,RANK,1) AS TOP1PROD
                    ,FIRSTSORTEDVALUE( RX_MTH_QTY,RANK,1) AS TOP1VALUE
                    ,FIRSTSORTEDVALUE( PRODUCT,RANK,2) AS TOP2PROD
                    ,FIRSTSORTEDVALUE( RX_MTH_QTY,RANK,2) AS TOP2VALUE
                    ,IF((FIRSTSORTEDVALUE( PRODUCT,RANK,1))=  '$(VPROD)','1',
                    IF((FIRSTSORTEDVALUE( PRODUCT,RANK,2))=  '$(VPROD)','2','NA')) AS PRODPOS
                    , IF (SUM(RX_MTH_QTY) > 0,'DEAL','NODEAL') AS STATUS
                    resident VEG GROUP BY CUSTOMER,PRODUCT,TIME_ID;                                                 
                    NEXT
                    NEXT
                    DROP TABLE PRODUCT_1;

                    /[code]

                      • two for..next and one while..having with firstsortedvalue for a basket analysis
                        Stefan Wühl

                        Honestly,

                         

                        I don't really know what you want to achieve with your for loops.

                        Since you are already grouping by PRODUCT, something like firstsortedvalue(PRODUCT,RANK) within this grouped load doesn't really make sense.

                         

                        If you could describe your requirements a bit closer, maybe someone can come up with an appropriate solution.

                         

                        Regards,

                        Stefan

                          • Re: two for..next and one while..having with firstsortedvalue for a basket analysis

                            Hi Stephan,

                            Fact original  table contains all the shopping  made by customers and products for each time..

                             

                             

                            The aggregate table postead as vEGETABLE_bASKET.CSV  is the comsumers profile  resume.

                            fOR EXAMPLE ..On ID_TIME 10 FOR THE CUSTOMER 183656 , THE CUSTOMER BOUGHT 2 patatoes and 1 onion.

                            As we want to analyze the behaviour of the consumer for  PATATOES and TOMATO

                             

                             

                            What was his firstchoice (TOP1Prod) and how much (top1value), top2prod,top2value,top3prod,top3value

                            In what position did he bought  the patatoes and tomatos, 1 or 2  (POSPROD)

                            Did he bought PATATOES or TOAMTO on that month ( status = DEAL)

                             

                            At the end I will have for combination of keys the profile of the consumer, classify by the choices he made, what was the first and second.. choice, and if he bought patatoes and tomatos that month...

                            .

                            As i need this auxiliar table to link to the fqact table to allowed the user to interact with the data.. How do I acomplish it...?

                             

                              The consumer can buy up to 5 products but the analysis has to be done  just for the  patatoes and tomato.

                             

                            Sugestions are welcome !!

                            Thank you

                              • Re: two for..next and one while..having with firstsortedvalue for a basket analysis
                                Stefan Wühl

                                Sorry, still no 100% sure how you want to analyze your data i.e. link your VEG to your CUSTPOS table.

                                 

                                Attached is a sample where I link it via CUSTOMER and TIME_ID.

                                 

                                I don't have much time too look into that anymore, hope you get it working from here.

                                 

                                Regards,

                                Stefan

                                  • Re: two for..next and one while..having with firstsortedvalue for a basket analysis

                                    Hi Stefan.

                                     

                                    Perfect...

                                     

                                    Thank you very much.

                                     

                                    I had arrived to a similar solution making it simple and using the AQL. More association and less for..while...

                                     

                                    Glad to had received your time and attention. Really helped !!

                                     

                                    Sergio. Posting YOUR script for the PE users..

                                     

                                    PRODUCT:

                                    LOAD Distinct

                                    subfield(K_CUST_PROD,'-',2) AS PRODUCT

                                    FROM [..\vEGETABLES bASKET.csv] (ansi, txt, delimiter is ';', embedded labels, msq);

                                     

                                    CUSTOMER:

                                    LOAD distinct

                                    subfield(K_CUST_PROD,'-',1) as CUSTOMER

                                    FROM [..\vEGETABLES bASKET.csv] (ansi, txt, delimiter is ';', embedded labels, msq);

                                     

                                    For each VPROD in 'PATATOES','TOMATO'

                                     

                                    CUST_POS:

                                    LOAD

                                    '$(VPROD)'&CUSTOMER&TIME_ID as KEY

                                    //('$(VPROD)'&'$(vCUSTOMER)'&TIME_ID) AS KEY

                                    ,FIRSTSORTEDVALUE(PRODUCT,RANK,1) AS TOP1PROD

                                    ,FIRSTSORTEDVALUE(RX_MTH_QTY,RANK,1) AS TOP1VALUE

                                    ,IF((FIRSTSORTEDVALUE(PRODUCT,RANK,1))=  '$(VPROD)','1',

                                    IF((FIRSTSORTEDVALUE(PRODUCT,RANK,2))=  '$(VPROD)','2','NA')) AS PRODPOS

                                    , IF (SUM(RX_MTH_QTY) > 0,'DEAL','NODEAL') AS STATUS

                                    resident VEG GROUP BY CUSTOMER,TIME_ID;

                                    NEXT