Skip to content Skip to sidebar Skip to footer

Sampling Without Replacement With A Different Sample Size Per Group In Sql

Using the provided table I would like to randomly sample users per day. The number of users to be sampled is specified in the to_sample column and it is filled by another query. In

Solution 1:

If I got you correctly, so try next: (actually its a improved solution of @BHouse)

SELECT
    T.user_id,
    T.day_of_week
FROM (
    SELECT
        user_id,
        day_of_week,
        to_sample,
        row_number() OVER (PARTITION BY to_sample ORDERBY randomint(max(user_id) + 1)) AS RN
    FROM
        test
    GROUPBY
        user_id,
        day_of_week,
        to_sample
    ORDERBY
        to_sample
    ) AS T
WHERE
    T.RN <= T.to_sample;

Output example for provided data:

1st execution:

 user_id | day_of_week
---------+-------------
       1 |           1
       3 |           2
       2 |           2

2nd execution:

 user_id | day_of_week
---------+-------------
       1 |           1
       1 |           2
       4 |           2

3rd execution:

 user_id | day_of_week
---------+-------------
       5 |           1
       4 |           2
       2 |           2

So, some randomness is guaranteed.

UPDATE

Or try this:

SELECT
    T.user_id,
    T.day_of_week
FROM (
    SELECT
        user_id,
        day_of_week,
        to_sample,
        row_number() OVER (PARTITION BY to_sample) AS RN,
        randomint(42) AS RANDOM_ORDER /* <<-- here is main problem, number should be >= max(user_id) + 1 */
    FROM
        test
    ORDERBY
        to_sample,
        RANDOM_ORDER
    ) AS T
WHERE
    T.RN <= T.to_sample;

A second option is more faster, but I didn't testes it for critical cases.

Solution 2:

Using random row number , you will get this required sample output

select USER_ID,day_of_week 
from 
(
select user_id,day_of_week, ROW_NUMBER() over (  orderby
user_id) rn from #test where day_of_week =1 
) x where rn =1unionallselect USER_ID,day_of_week 
from 
(
select user_id,day_of_week, ROW_NUMBER() over (  orderby
user_id) rn from #test where day_of_week =2
) x where rn in (3,6)

Post a Comment for "Sampling Without Replacement With A Different Sample Size Per Group In Sql"