Filtering Mysql Query Result Set To Yield Multiple Occurences Within A Specific Period Of Time
Solution 1:
If we want to filter out the rows where there aren't at least four preceding rows within the past 60 seconds, assuming that dateTimeOrigination is integer type, a 32-bit unix-style timestamp, we can do something like this:
SELECT FROM_UNIXTIME(r.dateTimeOrigination) AS dateTimeOrigination
, r.callingPartyNumber
, r.originalCalledPartyNumber
, r.finalCalledPartyNumber
, r.duration
, r.origDeviceName
, r.destDeviceName
FROM cdr_records r
WHERE r.dateTimeOrigination >= UNIX_TIMESTAMP('2016-05-20')AND r.dateTimeOrigination < UNIX_TIMESTAMP('2016-05-21')AND r.callingPartyNumber NOTLIKE'b00%'AND r.originalCalledPartyNumber NOTLIKE'b00%'AND r.finalCalledPartyNumber NOTLIKE'b00%'AND ( SELECT COUNT(1)
FROM cdr_records c
WHERE c.originalCalledPartyNumber = r.originalCalledPartyNumber
AND c.dateTimeOrigination > r.dateTimeOrigination - 60AND c.dateTimeOrigination <= r.dateTimeOrigination
) > 4ORDERBY r.originalCalledPartyNumber
, r.dateTimeOrigination
NOTE: For performance, we prefer to have predicates on bare columns.
With a form like this, with the column wrapped in an expression:
WHERE FROM_UNIXTIME(r.dateTimeOrigination) LIKE'2016-05-20%'
MySQL will evaluate the function for every row in the table, and then compare the return from the function to the literal.
With a form like this:
WHERE r.dateTimeOrigination >= UNIX_TIMESTAMP('2016-05-20')
AND r.dateTimeOrigination < UNIX_TIMESTAMP('2016-05-21')
MySQL will evaluate the expressions on the right side one time, as literals. Which allows MySQL to make effective use of a range scan operation on a suitable index.
FOLLOWUP
For best performance of the outer query, the best index would likely be an index with leading column of dateTimeOrigination, preferably containing
...ON cdr_records (dateTimeOrigination
,callingPartyNumber,originalCalledPartyNumber,finalCalledPartyNumber)
For best performance, a covering index, to avoid lookups to the pages in the underlying table. For example:
...ON cdr_records (dateTimeOrigination
,callingPartyNumber,originalCalledPartyNumber,finalCalledPartyNumber
,duration,origDeviceName,destDeviceName)
With that, we'd expect EXPLAIN to show "Using index".
For the correlated subquery, we'd want an index with leading columns like this:
...ON cdr_records (originalCalledPartyNumber,dateTimeOrigination)
I strongly recommend you look at the output from EXPLAIN to see which indexes MySQL is using for the query.
Post a Comment for "Filtering Mysql Query Result Set To Yield Multiple Occurences Within A Specific Period Of Time"