
Filtering MySQL Query Result Set To Yield Multiple Occurrences Within A Specific Period Of Time

Apologies in advance if this is a common question. I tried researching it but can't seem to find something that fits. I have a query that pulls data the way I like, but would like to…

Solution 1:

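If we want to filter out the rows that don't have at least four preceding rows for the same originalCalledPartyNumber within the past 60 seconds, and assuming that dateTimeOrigination is an integer column holding a 32-bit Unix-style timestamp, we can do something like this (the subquery's count includes the row itself, which is why the threshold is > 4):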

SELECT FROM_UNIXTIME(r.dateTimeOrigination) AS dateTimeOrigination
     , r.callingPartyNumber
     , r.originalCalledPartyNumber
     , r.finalCalledPartyNumber
     , r.duration
     , r.origDeviceName
     , r.destDeviceName
  FROM cdr_records r
 WHERE r.dateTimeOrigination >= UNIX_TIMESTAMP('2016-05-20')
   AND r.dateTimeOrigination  < UNIX_TIMESTAMP('2016-05-21')
   AND r.callingPartyNumber        NOT LIKE 'b00%'
   AND r.originalCalledPartyNumber NOT LIKE 'b00%'
   AND r.finalCalledPartyNumber    NOT LIKE 'b00%'
   AND ( SELECT COUNT(1)
           FROM cdr_records c
          WHERE c.originalCalledPartyNumber = r.originalCalledPartyNumber
            AND c.dateTimeOrigination       > r.dateTimeOrigination - 60
            AND c.dateTimeOrigination      <= r.dateTimeOrigination
       ) > 4
 ORDER BY r.originalCalledPartyNumber
     , r.dateTimeOrigination
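A quick way to check what the `> 4` threshold does is to run the same query shape against a handful of rows. The sketch below uses Python's built-in `sqlite3` module with an in-memory database, so it is an approximation rather than MySQL: SQLite has no `UNIX_TIMESTAMP()` or `FROM_UNIXTIME()`, so the day boundaries are computed in Python as UTC epoch seconds (MySQL's `UNIX_TIMESTAMP()` uses the session time zone instead), and only two columns are selected. The table layout mirrors the columns in the query; the timestamps, phone numbers, and device names are made up.

```python
import sqlite3
from datetime import datetime, timezone

# Same filtering logic as the answer's query; SQLite stand-in, two output columns.
QUERY = """
SELECT r.dateTimeOrigination, r.originalCalledPartyNumber
  FROM cdr_records r
 WHERE r.dateTimeOrigination >= ?
   AND r.dateTimeOrigination  < ?
   AND r.callingPartyNumber        NOT LIKE 'b00%'
   AND r.originalCalledPartyNumber NOT LIKE 'b00%'
   AND r.finalCalledPartyNumber    NOT LIKE 'b00%'
   AND ( SELECT COUNT(1)
           FROM cdr_records c
          WHERE c.originalCalledPartyNumber = r.originalCalledPartyNumber
            AND c.dateTimeOrigination       > r.dateTimeOrigination - 60
            AND c.dateTimeOrigination      <= r.dateTimeOrigination
       ) > 4
 ORDER BY r.originalCalledPartyNumber, r.dateTimeOrigination
"""

def day_bounds(day: str) -> tuple[int, int]:
    """[start, end) of a UTC day in epoch seconds; stands in for UNIX_TIMESTAMP()."""
    start = int(datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc).timestamp())
    return start, start + 86400

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE cdr_records (
    dateTimeOrigination INTEGER, callingPartyNumber TEXT,
    originalCalledPartyNumber TEXT, finalCalledPartyNumber TEXT,
    duration INTEGER, origDeviceName TEXT, destDeviceName TEXT)""")

start, end = day_bounds("2016-05-20")
base = start + 9 * 3600  # 09:00 UTC on 2016-05-20 (made-up sample time)
rows = []
# Hypothetical calls to 5551000 at +0, +10, +20, +30, +40 and +75 seconds.
for i, offset in enumerate([0, 10, 20, 30, 40, 75]):
    rows.append((base + offset, f"55520{i:02d}", "5551000", "5551000", 30, "SEP_A", "SEP_B"))
# Hypothetical calls to 5553000: only three, so the threshold is never reached.
for i, offset in enumerate([0, 5, 10]):
    rows.append((base + offset, f"55540{i:02d}", "5553000", "5553000", 30, "SEP_C", "SEP_D"))
conn.executemany("INSERT INTO cdr_records VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

result = conn.execute(QUERY, (start, end)).fetchall()
for row in result:
    print(row)
```

Only the fifth call to 5551000 comes back, because it is the only row with four earlier calls inside its 60-second window. The call at +75 seconds drops out since its window has slid past the first two calls, and 5553000 never reaches five calls at all.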

NOTE: For performance, we prefer to have predicates on bare columns.

With a form like this, with the column wrapped in an expression:

WHERE FROM_UNIXTIME(r.dateTimeOrigination) LIKE '2016-05-20%'

MySQL will evaluate the function for every row in the table and then compare the function's return value to the literal, so it can't use a range scan on an index on dateTimeOrigination to narrow down the rows.

With a form like this:

WHERE r.dateTimeOrigination >= UNIX_TIMESTAMP('2016-05-20')
   AND r.dateTimeOrigination  < UNIX_TIMESTAMP('2016-05-21')

MySQL will evaluate the expressions on the right side once, treating the results as literals. That allows MySQL to make effective use of a range scan operation on a suitable index.
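The difference between the two forms shows up in the query plan. MySQL's `EXPLAIN` is the real tool here; as a runnable stand-in, this sketch asks SQLite's planner (`EXPLAIN QUERY PLAN`, through Python's `sqlite3`) about both forms. It assumes a single-column index with the hypothetical name `idx_cdr_time`, and uses SQLite's `datetime(..., 'unixepoch')` in place of `FROM_UNIXTIME()`. SQLite's optimizer is not MySQL's, so treat this as an illustration of the principle rather than a prediction of MySQL's plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE cdr_records (
    dateTimeOrigination INTEGER, callingPartyNumber TEXT,
    originalCalledPartyNumber TEXT, finalCalledPartyNumber TEXT,
    duration INTEGER, origDeviceName TEXT, destDeviceName TEXT)""")
# Hypothetical index name; any index led by dateTimeOrigination would do.
conn.execute("CREATE INDEX idx_cdr_time ON cdr_records (dateTimeOrigination)")

def plan(sql: str, params: tuple = ()) -> str:
    """Return the 'detail' column of EXPLAIN QUERY PLAN joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

# Column wrapped in a function: the expression has to be computed for every row.
wrapped = plan(
    "SELECT * FROM cdr_records "
    "WHERE datetime(dateTimeOrigination, 'unixepoch') LIKE '2016-05-20%'"
)

# Bare column compared against constants that are computed once.
start = 1463702400  # 2016-05-20 00:00:00 UTC in epoch seconds
bare = plan(
    "SELECT * FROM cdr_records "
    "WHERE dateTimeOrigination >= ? AND dateTimeOrigination < ?",
    (start, start + 86400),
)

print("wrapped:", wrapped)
print("bare:   ", bare)
```

With the column wrapped in `datetime()`, the planner reports a full `SCAN` of the table; with bare-column range predicates, it reports a `SEARCH` using `idx_cdr_time`.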

FOLLOWUP

For best performance of the outer query, the best index would likely be one with dateTimeOrigination as the leading column, preferably also containing the other columns referenced in the WHERE clause:

...ON cdr_records (dateTimeOrigination
    ,callingPartyNumber,originalCalledPartyNumber,finalCalledPartyNumber)

For best performance, use a covering index (one that contains every column the query references) to avoid lookups to the pages in the underlying table. For example:

...ON cdr_records (dateTimeOrigination
    ,callingPartyNumber,originalCalledPartyNumber,finalCalledPartyNumber
    ,duration,origDeviceName,destDeviceName)

With that, we'd expect EXPLAIN to show "Using index".
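SQLite's rough counterpart to MySQL's "Using index" is `USING COVERING INDEX` in `EXPLAIN QUERY PLAN` output. This sketch, again using Python's `sqlite3` as a stand-in for MySQL, creates the covering index described above under the hypothetical name `idx_cdr_covering` and checks the plan for the outer query's columns and predicates, leaving out the correlated subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE cdr_records (
    dateTimeOrigination INTEGER, callingPartyNumber TEXT,
    originalCalledPartyNumber TEXT, finalCalledPartyNumber TEXT,
    duration INTEGER, origDeviceName TEXT, destDeviceName TEXT)""")
# Hypothetical index name; same column order as the covering index above.
conn.execute("""CREATE INDEX idx_cdr_covering ON cdr_records (
    dateTimeOrigination, callingPartyNumber, originalCalledPartyNumber,
    finalCalledPartyNumber, duration, origDeviceName, destDeviceName)""")

# The outer query's columns and predicates, minus the correlated subquery.
OUTER_QUERY = """
SELECT dateTimeOrigination, callingPartyNumber, originalCalledPartyNumber,
       finalCalledPartyNumber, duration, origDeviceName, destDeviceName
  FROM cdr_records
 WHERE dateTimeOrigination >= ? AND dateTimeOrigination < ?
   AND callingPartyNumber        NOT LIKE 'b00%'
   AND originalCalledPartyNumber NOT LIKE 'b00%'
   AND finalCalledPartyNumber    NOT LIKE 'b00%'
"""
start = 1463702400  # 2016-05-20 00:00:00 UTC
details = [row[-1] for row in
           conn.execute("EXPLAIN QUERY PLAN " + OUTER_QUERY, (start, start + 86400))]
print(details)
```

Because every selected and filtered column lives in the index, the planner can answer this part of the query from the index alone, without visiting the table rows.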

For the correlated subquery, we'd want an index with leading columns like this:

...ON cdr_records (originalCalledPartyNumber,dateTimeOrigination)
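With `originalCalledPartyNumber` first and `dateTimeOrigination` second, each execution of the correlated subquery can be an equality match on the called number followed by a range read on the timestamp. A sketch of that check in SQLite through Python's `sqlite3` (hypothetical index name `idx_cdr_called_time`; the subquery is run on its own, with placeholders standing in for the outer row's values):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE cdr_records (
    dateTimeOrigination INTEGER, callingPartyNumber TEXT,
    originalCalledPartyNumber TEXT, finalCalledPartyNumber TEXT,
    duration INTEGER, origDeviceName TEXT, destDeviceName TEXT)""")
# Hypothetical index name; equality column first, range column second.
conn.execute("CREATE INDEX idx_cdr_called_time "
             "ON cdr_records (originalCalledPartyNumber, dateTimeOrigination)")

# The correlated subquery on its own; the ? placeholders stand in for the
# outer row's originalCalledPartyNumber and dateTimeOrigination.
SUBQUERY = """
SELECT COUNT(1)
  FROM cdr_records c
 WHERE c.originalCalledPartyNumber = ?
   AND c.dateTimeOrigination       > ? - 60
   AND c.dateTimeOrigination      <= ?
"""
t = 1463734800  # 2016-05-20 09:00:00 UTC, a made-up call time
details = [row[-1] for row in
           conn.execute("EXPLAIN QUERY PLAN " + SUBQUERY, ("5551000", t, t))]
print(details)
```

Since the subquery only touches the two indexed columns, the index also covers it, so each probe reads just the 60-second slice of index entries for one called number.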

I strongly recommend you look at the output from EXPLAIN to see which indexes MySQL is using for the query.
