4 Replies Latest reply: Jun 10, 2011 3:49 PM by einarjba RSS

    10 random records for each month

      I have loaded a table

       

      Month        Records  

          

      jan  2011     200

      feb  2011     215

      mar 2011    194

      apr 2011     180

       

       

      I want to load only 10 random records from each month.. Can this be done in the script?

       

      thanks in advance

       

      einar

        • 10 random records for each month
          Matthew Crowther

          I can't instantly think of a simple solution but here is a general suggestion:

           

          As you're wanting 10 records per month within the data you're probably going to have to create a loop in the script to go through each month seperately.

           

          1. Load the entire dataset and add a rand() function to each record, sort the table by Month then your new random column.

          2. Generate a Temp table containg all the possible months to control your loop.

          3. Loop through each Month and perform a Resident > concatenate load for each month limiting the load each time with a Where clause (recno() <=10 ?).

          4. You should now have a table containing the data you need, you can drop the previous tables.

           

          The above is unchecked so you may need to adjust slightly, hopefully it will provide you with a starting point.

           

          All the best,

           

          Matt - Visual Analytics Ltd

            • Re: 10 random records for each month
              Stephen Redmond

              Something like this (using Sample though instead of Rand):

               

               

              Orders_Temp:

              LOAD CustomerID,

                  EmployeeID,

                  Freight,

                  OrderDate,

                  Floor(MonthStart(OrderDate)) As MonthKey,

                  OrderID,

                  ShipperID;

              SQL SELECT *

              FROM Orders;

               

               

              Orders:

              NoConcatenate

              Load *

              Resident Orders_Temp

              Where 1=0;

               

               

              Month_Temp:

              Load

                        Distinct MonthKey as MonthList

              Resident Orders_Temp Order by MonthKey;

               

               

              For i = 0 to FieldValueCount('MonthKey')-1

               

               

                        vMonth = Peek('MonthList', $(i), 'Month_Temp');

               

               

                        TRACE $(i)  $(vMonth);

               

               

                        Orders_Temp2:

                        Sample 0.2

                        Load CustomerID,

                            EmployeeID,

                            Freight,

                            OrderDate,

                            OrderID,

                            ShipperID

                        Resident Orders_Temp

                        Where  MonthKey = $(vMonth)

                        AND RowNo() < 10

                        ;

               

               

                        Concatenate (Orders)

                        Load * Resident Orders_Temp2;

               

                        Drop Table Orders_Temp2;

               

              Next

               

               

              Drop Table Orders_Temp;

                • 10 random records for each month

                  Check out the sample statement from the QV reference manual.  You'll probably need to use variables or something to get the first 100 or you might be able to use the FIRST statement if it spits the table out randomly but off the top of my head I don't know if it does.

                   

                   

                  Sample

                  The sample prefix is used to load a random sample of records from the input

                  table.

                  The syntax is:

                  SCRIPT SYNTAX

                  sample p ( loadstatement | selectstatement )

                  where

                  p is an arbitrary expression which evaluates to a number larger than

                  0 and lower or equal to 1. The number indicates the probability for a

                  given record to be read.

                  Examples:

                  Sample 0.15 Select * from Longtable;

                  Sample 0.15 Load * from Longtab.csv;

                  The load statement syntax is described on page 309. The select statement

                  syntax is described on page 329.