Sampling Without Replacement With A Different Sample Size Per Group In Sql
Using the provided table I would like to randomly sample users per day. The number of users to be sampled is specified in the to_sample column and it is filled by another query. In
Solution 1:
If I got you correctly, so try next: (actually its a improved solution of @BHouse)
SELECT
T.user_id,
T.day_of_week
FROM (
SELECT
user_id,
day_of_week,
to_sample,
row_number() OVER (PARTITION BY to_sample ORDERBY randomint(max(user_id) + 1)) AS RN
FROM
test
GROUPBY
user_id,
day_of_week,
to_sample
ORDERBY
to_sample
) AS T
WHERE
T.RN <= T.to_sample;
Output example for provided data:
1st execution:
user_id | day_of_week
---------+-------------
1 | 1
3 | 2
2 | 2
2nd execution:
user_id | day_of_week
---------+-------------
1 | 1
1 | 2
4 | 2
3rd execution:
user_id | day_of_week
---------+-------------
5 | 1
4 | 2
2 | 2
So, some randomness is guaranteed.
UPDATE
Or try this:
SELECT
T.user_id,
T.day_of_week
FROM (
SELECT
user_id,
day_of_week,
to_sample,
row_number() OVER (PARTITION BY to_sample) AS RN,
randomint(42) AS RANDOM_ORDER /* <<-- here is main problem, number should be >= max(user_id) + 1 */
FROM
test
ORDERBY
to_sample,
RANDOM_ORDER
) AS T
WHERE
T.RN <= T.to_sample;
A second option is more faster, but I didn't testes it for critical cases.
Solution 2:
Using random row number
, you will get this required sample output
select USER_ID,day_of_week
from
(
select user_id,day_of_week, ROW_NUMBER() over ( orderby
user_id) rn from #test where day_of_week =1
) x where rn =1unionallselect USER_ID,day_of_week
from
(
select user_id,day_of_week, ROW_NUMBER() over ( orderby
user_id) rn from #test where day_of_week =2
) x where rn in (3,6)
Post a Comment for "Sampling Without Replacement With A Different Sample Size Per Group In Sql"