5 Replies Latest reply: Jan 13, 2017 8:09 AM by Sunny Talwar RSS

    Summing variable on 2 fields

    Zayd Derweesh

      Hello,

       

      I am trying to sum a variable on 2 fields. It seems to work for one, but for two it does not execute correctly. This works:

       

      Sum:

      LOAD ch_id Sum(trans_amt) as tr_sum Resident flag_13_table Group By ch_id  Order By ch_id;

       

      This also works:

       

      Sum:

      LOAD temp_trans_date Sum(trans_amt) as tr_sum Resident flag_13_table Group By temp_trans_date Order By temp_trans_date ;

       

       

      This does  not:

       

      Sum:

      LOAD temp_trans_date,ch_id, Sum(trans_amt) as tr_sum Resident flag_13_table Group By temp_trans_date,ch_id  Order By temp_trans_date,ch_id;

       

      I get a column of ch_id values and temp_trans_date values, but no sum of trans_amt. How can I fix this?

       

       

      I also want to left join it to my original table on both ch_id and temp_trans_date. What would the code be for this? I've not done it on multiple fields.

       

       

      Thanks!

        • Re: Summing variable on 2 fields
          Sunny Talwar

          That us strange that it doesn't give you a sum, but for left join, you can do this:

           

          Left Join (flag_13_table)

          LOAD temp_trans_date,

               ch_id,

               Sum(trans_amt) as tr_sum

          Resident flag_13_table

          Group By temp_trans_date,ch_id

          Order By temp_trans_date,ch_id;

           

          I think you probably don't need Order By here...

            • Re: Summing variable on 2 fields
              Zayd Derweesh

              Hello,

               

              The reason I need an Order By is because I want it to be ordered first by CH_ID, then by date. It will look like this:

               

              CH_ID      TEMP_TRANS_DATE

              3443           Nov-16

              3443           Dec-16

              3443           Jan-17

              4553           Nov-16

              ....etc.

               

              The reason is that the next step for me is to compare the value of trans_amt for ch_id 3443 in Dec-16 against the value of trans_amt for ch_id 3443 in Nov-16. If it is increased by 25% of more, I need to create a new field that has a '1'. If it is not increased, the new field will have a value of '0'. That is my end goal with this exercise.

               

              Currently when I do the code as it is, it orders by CH_ID correctly, but the temp_trans_date is not ordering correctly - it gives me March, then May, then jumps back and forth in no real order.

               

              Temp_trans_date is a created variable with this code:

               

              (Date(monthstart(trans_date), 'MMM-YY')) as temp_trans_date, /*gives date in MMM-YY format */

               

              The variable trans_date is the original date variable, but it is in the format 12/31/2016 - I need the date to be monthly.

               

               

               

              Do you have any suggestions on how to code this full process? To recap, steps required:

              - Sort data by CH_ID, then date (which is month-year) - with the oldest date at the top

              - Create a field which has a value of 1 if trans_amt for a particular CH_ID on a particular date is 25% greater than the trans_amt for a particular CH_ID the month before it, and '0' if not.

              - There are additional parameters which I have not mentioned, because I think I can figure them out. But if you are able to help that would be fantastic. It is all in the SAS code below.

               

              An additional bonus is to ignore the first month of data in all cases (since there is no previous month to compare it to)... I have no idea how to even begin with that.

               

              For reference, my SAS code that I did this previously is:

               


              proc sql;
                     create table tr_sum_flag1 as select
                     CH_ID, temp_trans_date, flag1, trans_ID,
                     sum(trans_amt) as tr_sum,
                  count(trans_id) as tr_count
                     from combined_pcard_data
                     group by CH_ID, temp_trans_date
                     order by CH_ID, temp_trans_date ;
              quit;

              proc sql noprint;
                 select intnx('day',min(temp_trans_date),0,'E') into: MinDate
                 from tr_sum_flag1;
              quit; /*this code identifies the first month of data, so that flag 1 can be performed on all data AFTER the first month. */


              data tr_flag1;
                set tr_sum_flag1;
                      by CH_ID temp_trans_date ;
                      if first.temp_trans_date then do;
                        if temp_trans_date > &MinDate. AND tr_sum ge 100 AND tr_count ge 10 AND tr_sum ge 1.25*(lag(tr_sum)) then flag1 = 1;
                  else flag1=0 ;
              output;
                     end;
              run;

               

               

               

              Thanks so much for your help - any insight at all that you can provide is much appreciated!

                • Re: Summing variable on 2 fields
                  Sunny Talwar

                  May be something along these lines

                   

                  TempTable:

                  LOAD temp_trans_date,

                      ch_id,

                      Sum(trans_amt) as tr_sum

                  Resident flag_13_table

                  Group By temp_trans_date,ch_id;

                   

                  Left Join (flag_13_table)

                  LOAD *,

                            If(tr_sum > 1.25 * Previous(tr_sum), 1, 0) as Flag1

                  Resident TempTable

                  Order By ch_id, temp_trans_date;

                    • Re: Summing variable on 2 fields
                      Zayd Derweesh

                      Thanks Sunny, that works very well!

                       

                      I tried to replicate this method for another problem I am working on, but without success. I think I may be missing something small. Can you help? This goes directly after the code you provided:

                       

                      tr_sum_flag3:
                      LOAD ch_id,
                      temp_trans_date,
                          merchant,
                          trans_amt,
                          Sum(trans_amt) as tr_sum_merchant

                      Resident flag_13_table
                      Group By ch_id, temp_trans_date;


                      tr_sum_:
                      LOAD temp_trans_date,
                          ch_id,
                          Sum(trans_amt) as tr_sum_,
                          count(trans_amt) as tr_count_

                      Resident tr_sum_flag3
                      Group By ch_id, temp_trans_date;



                      Left Join (flag_13_table)
                      LOAD *,
                                If(tr_sum_ >= 100 AND tr_count_ >= 10 AND (tr_sum_merchant/tr_sum_) >= 0.4, 1, 0) as flag_3_test
                      Resident tr_sum_
                      Order By ch_id, temp_trans_date;

                       

                       

                       

                       

                      I believe the logic is similar, but for some reason it is giving me an error.

                       

                      Thanks!