Solved: Re: Randomly populate category values in data base... - Qlik Community

rob_vander · ‎2024-11-28

Hi All,

I have dataset like below

ID	Date	Value
1	01/01/2024	449
2	02/01/2024	76
3	03/01/2024	211
4	04/01/2024	402
5	05/01/2024	197
6	06/01/2024	485
7	07/01/2024	168
8	08/01/2024	55
9	09/01/2024	41
10	10/01/2024	154
11	11/01/2024	391

Within this dataset I want to create a Category column in which random values should be populated. There will be fixed three values High,Medium & Low. High values in data should 50%, Medium values should be 30% and Low values should be 20%.

In the above dataset, I have 11 records, so High values should be 50% of 11 which is 6 (rounding), Medium values should be 30% of 11 which is 3 and Low values should be 20% of 11 which is 2

So output should be below. Note that values are not required to be populated as shown as it could be random(anywhere). Only important point is that High values should be 6, Medium=3 & Low=2 based on records and required % of values

ID	Date	Value	Category
1	01/01/2024	449	High
2	02/01/2024	76	High
3	03/01/2024	211	Low
4	04/01/2024	402	High
5	05/01/2024	197	Medium
6	06/01/2024	485	Low
7	07/01/2024	168	High
8	08/01/2024	55	High
9	09/01/2024	41	High
10	10/01/2024	154	Medium
11	11/01/2024	391	Medium

Note: Although Number of Category values remain same, I want them on different position randomly on every reload.

maxgro · ‎2024-11-28

T:

Load * inline [

ID, Date, Value,

1, 01/01/2024, 449

2, 02/01/2024, 76

3, 03/01/2024, 211

4, 04/01/2024, 402

5, 05/01/2024, 197

6, 06/01/2024, 485

7, 07/01/2024, 168

8, 08/01/2024, 55

9, 09/01/2024 ,41

10, 10/01/2024 ,154

11, 11/01/2024 ,391

];

// first solution

LEFT JOIN (T) LOAD

ID,

if(rand()<= .5, 'High', if(rand()<= (0.5 + 0.3), 'Medium', 'Low')) as Category

Resident T;

// second solution, more accurate

LET vNum = NoOfRows('T');

LET vNumHigh = round(vNum * 0.5);

LET vNumMedium = round(vNum * 0.3);

LET vNumLow = vNum - vNumHigh - vNumMedium;

LEFT JOIN (T) LOAD

ID,

rand() as Sort

Resident T;

U:

NOCONCATENATE LOAD

*,

rowno(),

if(rowno()<= $(vNumHigh), 'High', if(rowno()<= ($(vNumHigh) + $(vNumMedium)), 'Medium', 'Low')) as Category2

Resident T

order by Sort;

DROP TABLE T;

View solution in original post

Kushal_Chawda · ‎2024-11-28

@rob_vander In that case you can try below

Data:
LOAD *,
     Rand() as RandomKey
INLINE [
ID	Date	Value
1	01/01/2024	449
2	02/01/2024	76
3	03/01/2024	211
4	04/01/2024	402
5	05/01/2024	197
6	06/01/2024	485
7	07/01/2024	168
8	08/01/2024	55
9	09/01/2024	41
10	10/01/2024	154
11	11/01/2024	100
12	09/01/2024	41
13	10/01/2024	154
14	11/01/2024	100
15	10/01/2024	154

](delimiter is '\t');

let vRecords =NoOfRows('Data');
Let vHighRecords = round(vRecords*0.5);
Let vMediumRecords = round(vRecords*0.3);
Let vLowRecords = round(vRecords*0.2);

Final:
NoConcatenate
Load *,
     if(RowNo()<=$(vHighRecords),'High',
     if(RowNo()<=$(vHighRecords)+$(vMediumRecords),'Medium',
     if(RowNo()<=$(vRecords),'Low'))) as Category
Resident Data
Order by RandomKey;

Drop Table Data;

View solution in original post

Kushal_Chawda · ‎2024-11-28

@rob_vander try below

Data:
LOAD * INLINE [
ID	Date	Value
1	01/01/2024	449
2	02/01/2024	76
3	03/01/2024	211
4	04/01/2024	402
5	05/01/2024	197
6	06/01/2024	485
7	07/01/2024	168
8	08/01/2024	55
9	09/01/2024	41
10	10/01/2024	154
11	11/01/2024	100
12	09/01/2024	41
13	10/01/2024	154
14	11/01/2024	100
15	10/01/2024	154

](delimiter is '\t');

let vRecords =NoOfRows('Data');
Let vHighRecords = round(vRecords*0.5);
Let vMediumRecords = round(vRecords*0.3);
Let vLowRecords = round(vRecords*0.2);

Final:
NoConcatenate
Load *,
     if(RowNo()<=$(vHighRecords),'High',
     if(RowNo()<=$(vHighRecords)+$(vMediumRecords),'Medium',
     if(RowNo()<=$(vRecords),'Low'))) as Category
Resident Data;

Drop Table Data;

rob_vander · ‎2024-11-28

@Kushal_Chawda Thanks for the response. But here category values will be at fixed position although number of values populated are correct. What I want is values should be at different position on each reload. Could you help me on that part? I have updated original post

maxgro · ‎2024-11-28

T:

Load * inline [

ID, Date, Value,

1, 01/01/2024, 449

2, 02/01/2024, 76

3, 03/01/2024, 211

4, 04/01/2024, 402

5, 05/01/2024, 197

6, 06/01/2024, 485

7, 07/01/2024, 168

8, 08/01/2024, 55

9, 09/01/2024 ,41

10, 10/01/2024 ,154

11, 11/01/2024 ,391

];

// first solution

LEFT JOIN (T) LOAD

ID,

if(rand()<= .5, 'High', if(rand()<= (0.5 + 0.3), 'Medium', 'Low')) as Category

Resident T;

// second solution, more accurate

LET vNum = NoOfRows('T');

LET vNumHigh = round(vNum * 0.5);

LET vNumMedium = round(vNum * 0.3);

LET vNumLow = vNum - vNumHigh - vNumMedium;

LEFT JOIN (T) LOAD

ID,

rand() as Sort

Resident T;

U:

NOCONCATENATE LOAD

*,

rowno(),

if(rowno()<= $(vNumHigh), 'High', if(rowno()<= ($(vNumHigh) + $(vNumMedium)), 'Medium', 'Low')) as Category2

Resident T

order by Sort;

DROP TABLE T;

Kushal_Chawda · ‎2024-11-28

@rob_vander In that case you can try below

Data:
LOAD *,
     Rand() as RandomKey
INLINE [
ID	Date	Value
1	01/01/2024	449
2	02/01/2024	76
3	03/01/2024	211
4	04/01/2024	402
5	05/01/2024	197
6	06/01/2024	485
7	07/01/2024	168
8	08/01/2024	55
9	09/01/2024	41
10	10/01/2024	154
11	11/01/2024	100
12	09/01/2024	41
13	10/01/2024	154
14	11/01/2024	100
15	10/01/2024	154

](delimiter is '\t');

let vRecords =NoOfRows('Data');
Let vHighRecords = round(vRecords*0.5);
Let vMediumRecords = round(vRecords*0.3);
Let vLowRecords = round(vRecords*0.2);

Final:
NoConcatenate
Load *,
     if(RowNo()<=$(vHighRecords),'High',
     if(RowNo()<=$(vHighRecords)+$(vMediumRecords),'Medium',
     if(RowNo()<=$(vRecords),'Low'))) as Category
Resident Data
Order by RandomKey;

Drop Table Data;

marcus_sommer · ‎2024-11-28

To get a more random assigning of the categories against the record-position you may apply an order by against n fields and/or a specialized created one in the origin load, for example per: rand() as SortKey.

Thinkable are also solutions without a record-counting and ordering, maybe with something like:

load *, pick(mod(rowno(), 10) + 1, 'High', 'Medium', 'Low', 'High', 'High', 'Low', ...) as X
from Y;

rob_vander · ‎2024-11-28

@maxgro thanks. second solution works

rob_vander · ‎2024-11-28

@Kushal_Chawda This works

rob_vander · ‎2024-11-28

@marcus_sommer Thanks. But this is just example. In real scenario I could have million records. So it will be difficult with pick match statements.

marcus_sommer · ‎2024-11-28

It's no match() only a pick() which means the rowno() evaluation happens only once and this result picked the referenced value from the list. It should be not slower as the if-loop approaches.

Especially because you have millions of records and all of the approaches are quite simple it wouldn't cost much efforts to test the different suggestions in regard to the run-time within your environment.

Randomly populate category values in data based on percenatge

Data Load Editor

Qlik Sense

Script