Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Sample (p) Load - doesn't work

Hi all,

I want to use Sample (p) Load to load in a random sample of records from a table.

According to the help file this should work but it doesn't appear to do anything at all.

I've tried a p value of 0.1 which should load in approx 10% of total records. Other p values ive tried are 0.05, 0.01. None of them appear to make any difference.

Is there another way, or alternatively is there a way to load in say every nth row from the source data?

Thanks

3 Replies
danielrozental
Master II
Master II

Hi Daniel, I was just trying this yesterday an apparently Sample doesn't work when loading QVD files, at least not when being loaded optimized. It will work when loading from a resident table.

Another thing I found is that the parameter is inverted, if you want a 20% sample you need to write Sampl 0.8 load.

Not applicable
Author

Couple of things I have found, if your source data is in a .qvd file then sample wont work. If it is an inline or autogenerated type table it will work. Also if your source is in an .xls file it will work... Finally, it doesn't return an 'exact' 10% sample... The 1st 3 examples below work, the 4th (ZIPtable) returns all records.

Table1:
SAMPLE 0.50
LOAD Rand() AS Random1
AUTOGENERATE 1000000;

Table2:
SAMPLE 0.90
LOAD Rand() AS Random2
AUTOGENERATE 1000000;

sample_residents:
sample .1 LOAD
Resno as sample_res
FROM Feed2.xls
(biff, embedded labels, table is Sheet1$);

ZIPtable:
sample .1 LOAD ZIP as sample_zip,
TYPE as sample_type
FROM ZIP.qvd (qvd);

Hope that helps!

erichshiino
Partner - Master
Partner - Master

Hi,

If your table is not big enough there is a change you will not get any record.

I wrote this piece of code and I get different number of lines in the result when I reload it.

It's ok since p is a probability to get a line.

Test:

LOAD * INLINE [

    id, value

    1, 213

    2, 4

    3, 3

    4, 64

    5, 67

    6, 3

    7, 23

    8, 3

    9, 6

    10, 4

];

NoConcatenate

Sample 0.5 load * resident Test;

drop table Test;

If you want to control how many lines you get you can include rand() as a field in the load. Then, in a second step, you use `where rowno() < x' to get x lines from your table.

Hope this helps,

Erich