Google Bigquery: Rolling Count Distinct
I have a table with is simply a list of dates and user IDs (not aggregated). We define a metric called active users for a given date by counting the distinct number of IDs that ap
Solution 1:
BigQuery documentation claims that count(distinct)
works as a window function. However, that doesn't help you, because you are not looking for a traditional window frame.
One method would adds a record for each date after a visit:
select theday, count(distinct visid)
from (select date_add(u.day, n.n, "day") as theday, u.visid
from daily_users u crossjoin
(select1as n unionallselect2unionall . . .
select45
) n
) u
groupby theday;
Note: there may be simpler ways to generate a series of 45 integers in BigQuery.
Solution 2:
Below should work with BigQuery
#legacySQL
SELECTday, active_users FROM (
SELECTday,
COUNT(DISTINCT id)
OVER (ORDERBY ts RANGEBETWEEN45*24*3600 PRECEDING ANDCURRENTROW) AS active_users
FROM (
SELECTday, id, TIMESTAMP_TO_SEC(TIMESTAMP(day)) AS ts
FROM daily_users
)
) GROUPBY1, 2ORDERBY1
Above assumes that day
field is represented as '2016-01-10' format.
If it is not a case , you should adjust TIMESTAMP_TO_SEC(TIMESTAMP(day))
in most inner select
Also please take a look at COUNT(DISTINC) specifics in BigQuery
Update for BigQuery Standard SQL
#standardSQL
SELECTday,
(SELECTCOUNT(DISTINCT id) FROMUNNEST(active_users) id) AS active_users
FROM (
SELECTday,
ARRAY_AGG(id)
OVER (ORDERBY ts RANGEBETWEEN3888000 PRECEDING ANDCURRENTROW) AS active_users
FROM (
SELECTday, id, UNIX_DATE(PARSE_DATE('%Y-%m-%d', day)) *24*3600AS ts
FROM daily_users
)
)
GROUPBY1, 2ORDERBY1
You can test / play with it using below dummy sample
#standardSQL
WITH daily_users AS (
SELECT1AS id, '2016-01-10'ASdayUNIONALLSELECT2AS id, '2016-01-10'ASdayUNIONALLSELECT1AS id, '2016-01-11'ASdayUNIONALLSELECT3AS id, '2016-01-11'ASdayUNIONALLSELECT1AS id, '2016-01-12'ASdayUNIONALLSELECT1AS id, '2016-01-12'ASdayUNIONALLSELECT1AS id, '2016-01-12'ASdayUNIONALLSELECT1AS id, '2016-01-13'ASday
)
SELECTday,
(SELECTCOUNT(DISTINCT id) FROMUNNEST(active_users) id) AS active_users
FROM (
SELECTday,
ARRAY_AGG(id)
OVER (ORDERBY ts RANGEBETWEEN86400 PRECEDING ANDCURRENTROW) AS active_users
FROM (
SELECTday, id, UNIX_DATE(PARSE_DATE('%Y-%m-%d', day)) *24*3600AS ts
FROM daily_users
)
)
GROUPBY1, 2ORDERBY1
Post a Comment for "Google Bigquery: Rolling Count Distinct"