3 Replies Latest reply: Jun 29, 2011 1:21 PM by Erich Shiino RSS

    Sample (p) Load - doesn't work

      Hi all,


      I want to use Sample (p) Load to load in a random sample of records from a table.


      According to the help file this should work but it doesn't appear to do anything at all.


      I've tried a p value of 0.1 which should load in approx 10% of total records. Other p values ive tried are 0.05, 0.01. None of them appear to make any difference.



      Is there another way, or alternatively is there a way to load in say every nth row from the source data?




        • Sample (p) Load - doesn't work
          Daniel Rozental

          Hi Daniel, I was just trying this yesterday an apparently Sample doesn't work when loading QVD files, at least not when being loaded optimized. It will work when loading from a resident table.


          Another thing I found is that the parameter is inverted, if you want a 20% sample you need to write Sampl 0.8 load.

          • Re: Sample (p) Load - doesn't work
            Leonard Short

            Couple of things I have found, if your source data is in a .qvd file then sample wont work. If it is an inline or autogenerated type table it will work. Also if your source is in an .xls file it will work... Finally, it doesn't return an 'exact' 10% sample... The 1st 3 examples below work, the 4th (ZIPtable) returns all records.





            SAMPLE 0.50
            LOAD Rand() AS Random1
            AUTOGENERATE 1000000;

            SAMPLE 0.90
            LOAD Rand() AS Random2
            AUTOGENERATE 1000000;

            sample .1 LOAD
            Resno as sample_res
            FROM Feed2.xls
            (biff, embedded labels, table is Sheet1$);



            sample .1 LOAD ZIP as sample_zip,
            TYPE as sample_type
            FROM ZIP.qvd (qvd);





            Hope that helps!

            • Sample (p) Load - doesn't work
              Erich Shiino


              If your table is not big enough there is a change you will not get any record.

              I wrote this piece of code and I get different number of lines in the result when I reload it.

              It's ok since p is a probability to get a line.





              LOAD * INLINE [

                  id, value

                  1, 213

                  2, 4

                  3, 3

                  4, 64

                  5, 67

                  6, 3

                  7, 23

                  8, 3

                  9, 6

                  10, 4




              Sample 0.5 load * resident Test;


              drop table Test;




              If you want to control how many lines you get you can include rand() as a field in the load. Then, in a second step, you use `where rowno() < x' to get x lines from your table.


              Hope this helps,