Skip to content Skip to sidebar Skip to footer

Google Bigquery: Rolling Count Distinct

I have a table with is simply a list of dates and user IDs (not aggregated). We define a metric called active users for a given date by counting the distinct number of IDs that ap

Solution 1:

BigQuery documentation claims that count(distinct) works as a window function. However, that doesn't help you, because you are not looking for a traditional window frame.

One method would adds a record for each date after a visit:

select theday, count(distinct visid)
from (select date_add(u.day, n.n, "day") as theday, u.visid
      from daily_users u crossjoin
           (select1as n unionallselect2unionall . . .
            select45
           ) n
     ) u
groupby theday;

Note: there may be simpler ways to generate a series of 45 integers in BigQuery.

Solution 2:

Below should work with BigQuery

#legacySQL
SELECTday, active_users FROM (
  SELECTday, 
    COUNT(DISTINCT id) 
      OVER (ORDERBY ts RANGEBETWEEN45*24*3600 PRECEDING ANDCURRENTROW) AS active_users
  FROM (
    SELECTday, id, TIMESTAMP_TO_SEC(TIMESTAMP(day)) AS ts 
    FROM daily_users
  )
) GROUPBY1, 2ORDERBY1

Above assumes that day field is represented as '2016-01-10' format. If it is not a case , you should adjust TIMESTAMP_TO_SEC(TIMESTAMP(day)) in most inner select

Also please take a look at COUNT(DISTINC) specifics in BigQuery

Update for BigQuery Standard SQL

#standardSQL
SELECTday, 
  (SELECTCOUNT(DISTINCT id) FROMUNNEST(active_users) id) AS active_users
FROM (
  SELECTday, 
    ARRAY_AGG(id) 
      OVER (ORDERBY ts RANGEBETWEEN3888000 PRECEDING ANDCURRENTROW) AS active_users
  FROM (
    SELECTday, id,  UNIX_DATE(PARSE_DATE('%Y-%m-%d', day)) *24*3600AS ts 
    FROM daily_users
  )
) 
GROUPBY1, 2ORDERBY1

You can test / play with it using below dummy sample

#standardSQL
WITH daily_users AS (
  SELECT1AS id, '2016-01-10'ASdayUNIONALLSELECT2AS id, '2016-01-10'ASdayUNIONALLSELECT1AS id, '2016-01-11'ASdayUNIONALLSELECT3AS id, '2016-01-11'ASdayUNIONALLSELECT1AS id, '2016-01-12'ASdayUNIONALLSELECT1AS id, '2016-01-12'ASdayUNIONALLSELECT1AS id, '2016-01-12'ASdayUNIONALLSELECT1AS id, '2016-01-13'ASday
)
SELECTday, 
  (SELECTCOUNT(DISTINCT id) FROMUNNEST(active_users) id) AS active_users
FROM (
  SELECTday, 
    ARRAY_AGG(id) 
      OVER (ORDERBY ts RANGEBETWEEN86400 PRECEDING ANDCURRENTROW) AS active_users
  FROM (
    SELECTday, id,  UNIX_DATE(PARSE_DATE('%Y-%m-%d', day)) *24*3600AS ts 
    FROM daily_users
  )
) 
GROUPBY1, 2ORDERBY1

Post a Comment for "Google Bigquery: Rolling Count Distinct"