2 Replies Latest reply: Jun 4, 2013 3:46 PM by Steve Lord RSS

    Flattening a table too large for transformation, generic loads, and pivot tables

    Steve Lord

      Hi, below is a sample bit of a massive columnar data file I am working with.  Qlikview crashes when I try full generic loads, and doesn't get desired result when I cut generic load down to two of the ten testnames I need.  (Other tables are cut to load only the few fields needed for each.)  Pivot Table objects also crash, and file is one or two million rows so too big for transformation step.

       

      I have tried the resident load/group by technique and get through it with a synthetic key but no other errors.  It doesn't flatten the table as desired though.  I have tried multiple small generic loads of a few fields at a time and crashed QlikView in fantastic ways.  I've tried loading each separately and left joining to no avail.  I am trying to get Systolic and Diastolic on the same row with each other, grouped by ClientAccountName, UserId, TestDate, and Source.  I've used if statements to create separate Systolic and Diastolic fields, and added TestName to the group by fields, and used sum(TestValue) for my aggregation function.  I've tried in-script concatenates, in-object aggr statements, something about changing axes on a straight table...

       

      Syntax has come to be correct in all of these methods, but still I get a choice between crashing Qlikview, Diastolic and Systolic on separate rows with a value in one column and null in the other column, or all of the sys/dia combinations next to all of the dates, or various other weird combinations.

       

      Data

      UserIdTestNameSystolicDiastolicBMITestDateSource
      180Diastolic08004/10/2013User
      180Systolic120004/10/2013User
      180BMI0022.34/10/2013User
      738Diastolic092011/4/2011Verified
      738Systolic1400011/4/2011Verified
      738BMI0033.211/4/2011Verified
      739Diastolic078011/11/2011Verified
      739Systolic1440011/11/2011Verified
      739BMI0024.511/11/2011Verified
      739Diastolic08008/2/2012Verified
      739Systolic128008/2/2012Verified
      739BMI0024.78/2/2012Verified
      740Diastolic078011/4/2011Verified
      740Systolic1500011/4/2011Verified
      740BMI0023.311/4/2011Verified

      Nicest looking most recent failed attempt:

      Biometrics:

      LOAD UserId,

           Source,

           TestName,

           TestDate,

           SUM(if(TestName='Systolic',TestValue)) as Systolic,

           SUM(if(TestName='Diastolic',TestValue)) as Diastolic,

           SUM(if(TestName='BMI',TestValue)) as BMI

      FROM

      [R:\Data Analytics\Sample QVDs\05.28.2013\Biometrics.qvd]

      (qvd)

      WHERE TestName='Systolic' OR TestName='Diastolic' OR TestName='BMI'

      Group By UserId, Source, TestName, TestDate;

       

      Now, could I put something into the return part of my if statements to make it return the testvalue where testname=X?

       

      e.g. if(TestName='Systolic',if(TestName='Diastolic',TestValue)) as Diastolic //to get the Diastolic next to the Systolic?  That formula doesn't actually work because it looks in systolic row to find a diastolic and there is nothing there, but a peek into the diastolic row from systolic or something like that maybe?  (I'd need help on peek function syntax.)  Help?

        • Re: Flattening a table too large for transformation, generic loads, and pivot tables
          Steve Lord

          I DID IT!  I re-posted this every week for the longest and I did it.

           

          First, in the script, I have if(TestName='Diastolic',TestValue,Null()) as Diastolic, and repeated that for Systolic and the other biometric measures to create their fields with if statements.  This makes the TestName values into their own field headers without generic loading.

           

          I actually had SUM(TestValue) in the if statements, and a Group By UserId, Source, TestName, TestDate; but the sum/group by combination in script alone did not get the metrics values onto the same row with each other in my table box.

           

          I then created a straight chart and used Aggr(Sum(Diastolic),UserId,TestDate) and labeled it BPDiastolic which put the Diastolic value on the same row with the Systolic.  When I did the same for BMI and Systolic to test, they both went to the same row (maybe it is a firstsorted within the userid/testdate or something) while leaving two 0s and one value in the other if-created field rows.  I threw out the if-created fields off the straight table after verifying accuracy and left the aggr-created fields, suppressed the null testdates, and everything was nice!

           

          Biometrics:

          LOAD UserId,

               Source,

               TestName,

               TestValue,

               Date(Floor(TestDate)) as TestDate,

               ImportedDate,

               Biometric_Season,

               SUM(if(TestName='Systolic',TestValue,Null())) as Systolic,

               SUM(if(TestName='Diastolic',TestValue,Null())) as Diastolic,

               SUM(if(TestName='BMI',TestValue,Null())) as BMI

          FROM

          [R:\filepath\filename.qvd]

          (qvd)

          WHERE TestName='Systolic' OR TestName='Diastolic' or TestName='BMI'

          Group By UserId, Source, TestName, TestValue, Date(Floor(TestDate)), ImportedDate, Biometric_Season;

           

          1> not sure if SUM and Group By contributed to success of aggr functions in straight table.  Removed them to test, but the network drive is offline temporarily.)  They did not collapse any tables on their own though.  if-created fields definately went into the straight chart aggr expressions and might do to create properly aligned if-created fields back in the script.

           

          EXBMI=aggr(sum(BMI),UserId, TestDate)

          BPSystolic=aggr(sum(Systolic),UserId, TestDate)

          BPDiastolic=aggr(sum(Diastolic),UserId, TestDate)

          BloodPressure=aggr(sum(Systolic),UserId, TestDate)&'/'&aggr(sum(Diastolic),UserId, TestDate)

           

          I may have to do divides and counts and such to get good relative or average values, but the above aggrs get me the raw values all on one row:

           

          ClientAccountNameUserIdTestDateEXBMIBPSystolicBPDiastolicBloodPressureSource
          Test1804/10/2013012080120/80User
          Test73811/4/201145.714092140/92Verified
          Test7398/2/201225.312880128/80Verified
          Test73911/11/201125.714478144/78Verified
          Test74011/4/201134.615078150/78Verified
          Test74311/7/201126.112070120/70Verified

           

          Other awesome implication is that I can collapse some other 20 pie charts of if-created banded metrics down to one spiffy table.  No generic loads, no transformations, no synthetic keys or link tables, no joins, no crashes, no error messages, no big red X's on my tables...  Only suppressed null testdates for folks on users table who didn't have testdates on the biometrics table.

           

          BIFF!