Select Query With Three Where Conditions Is Slow, But The Same Query With Any Combination Of Two Of The Three Where Conditions Is Fast
Solution 1:
Try splitting the existing SQL into two parts and compare the execution time of each. This should show which part is responsible for the slowness:
Part 1:
SELECT table_1.id
FROM table_1
LEFT JOIN table_2
ON (table_1.id = table_2.id)
WHERE table_1.col_condition_1 = 0
AND table_1.col_condition_2 NOT IN (3, 4)
AND table_2.id IS NULL
and part 2 (note the inner join here):
SELECT table_1.id
FROM table_1
JOIN table_2
ON (table_1.id = table_2.id)
WHERE table_1.col_condition_1 = 0
AND table_1.col_condition_2 NOT IN (3, 4)
AND table_1.date_col > table_2.date_col
I expect part 2 to be the one that takes longer. In that case, I think an index on date_col in both table_1 and table_2 would help.
I don't think the composite index would help at all in your select.
That said, it is hard to diagnose why the three conditions together hurt performance so badly. It seems to be related to your data distribution. I am not sure about MySQL, but in Oracle, collecting statistics on those tables would make a difference.
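A rough way to run the comparison described above is to time each part on its own. The following is only a sketch under stated assumptions: it uses Python's sqlite3 module with an in-memory database and made-up data instead of MySQL, the column types are guesses (date_col is an integer stand-in), and the helper names build_demo_db and time_query are hypothetical. Absolute timings from it say nothing about the real tables; the point is the measuring pattern.

```python
import sqlite3
import time

PART_1 = """
SELECT table_1.id
FROM table_1
LEFT JOIN table_2 ON table_1.id = table_2.id
WHERE table_1.col_condition_1 = 0
  AND table_1.col_condition_2 NOT IN (3, 4)
  AND table_2.id IS NULL
"""

PART_2 = """
SELECT table_1.id
FROM table_1
JOIN table_2 ON table_1.id = table_2.id
WHERE table_1.col_condition_1 = 0
  AND table_1.col_condition_2 NOT IN (3, 4)
  AND table_1.date_col > table_2.date_col
"""


def build_demo_db(rows=20000):
    """Create an in-memory database with synthetic data (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table_1 (id INTEGER PRIMARY KEY, col_condition_1 INTEGER,"
        " col_condition_2 INTEGER, date_col INTEGER)"
    )
    conn.execute("CREATE TABLE table_2 (id INTEGER PRIMARY KEY, date_col INTEGER)")
    conn.executemany(
        "INSERT INTO table_1 VALUES (?, ?, ?, ?)",
        [(i, i % 2, i % 5, i % 7) for i in range(1, rows + 1)],
    )
    # Only every third table_1 row gets a matching table_2 row.
    conn.executemany(
        "INSERT INTO table_2 VALUES (?, ?)",
        [(i, i % 11) for i in range(3, rows + 1, 3)],
    )
    conn.commit()
    return conn


def time_query(conn, sql):
    """Run sql once and return (row_count, elapsed_seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return len(rows), time.perf_counter() - start


conn = build_demo_db()
timings = {
    name: time_query(conn, sql)
    for name, sql in (("part 1", PART_1), ("part 2", PART_2))
}
for name, (count, seconds) in timings.items():
    print(f"{name}: {count} rows in {seconds:.4f}s")
```

Against the real MySQL server, the same two statements can be run from any client; running each several times helps separate cache effects from the query cost itself.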
Hope it helps.
Solution 2:
- OR is a performance killer.
- Sometimes using UNION instead of OR can speed up the query.
- Perhaps in one case the 5000 were "near the beginning" of the combined tables, but not in the other case.
- Using LIMIT without ORDER BY is dubious.
- Since a PK is a Unique key, it is redundant to also declare id_UNIQUE.
- INDEX(a) is unnecessary when you also have INDEX(a,b).
- If there are only 4 values, IN (1, 2) might be faster than NOT IN (3, 4).
- It is unusual to have two tables sharing the same PK. Why do you have a 1:1 relationship?
- We might have further insight if we could see the real column names.
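The UNION suggestion from the list above can be sketched against the query from the question. This is a minimal sketch, assuming the table and column names used in the other answers and a reconstructed version of the original query. It runs on an in-memory SQLite database through Python's sqlite3 module rather than MySQL, so it only shows that the rewrite returns the same rows, not that it is faster. UNION ALL is swapped in for UNION because the two branches cannot overlap (a row either has a table_2 match or it does not), and an ORDER BY is added so that LIMIT picks a deterministic set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_1 (id INTEGER PRIMARY KEY, col_condition_1 INTEGER,
                      col_condition_2 INTEGER, date_col TEXT);
CREATE TABLE table_2 (id INTEGER PRIMARY KEY, date_col TEXT);
INSERT INTO table_1 VALUES
  (1, 0, 1, '2020-01-05'),  -- no table_2 row: kept
  (2, 0, 2, '2020-01-05'),  -- table_2 row is earlier: kept
  (3, 0, 1, '2020-01-05'),  -- table_2 row is later: dropped
  (4, 0, 3, '2020-01-05'),  -- col_condition_2 in (3, 4): dropped
  (5, 1, 1, '2020-01-05');  -- col_condition_1 <> 0: dropped
INSERT INTO table_2 VALUES
  (2, '2020-01-01'),
  (3, '2020-01-09'),
  (4, '2020-01-01');
""")

# Reconstruction of the question's query, with ORDER BY added before LIMIT.
OR_QUERY = """
SELECT table_1.id AS id
FROM table_1
LEFT JOIN table_2 ON table_1.id = table_2.id
WHERE table_1.col_condition_1 = 0
  AND table_1.col_condition_2 NOT IN (3, 4)
  AND (table_2.id IS NULL OR table_1.date_col > table_2.date_col)
ORDER BY id
LIMIT 5000
"""

# Same filter, with each side of the OR moved into its own branch.
UNION_QUERY = """
SELECT id FROM (
  SELECT table_1.id AS id
  FROM table_1
  LEFT JOIN table_2 ON table_1.id = table_2.id
  WHERE table_1.col_condition_1 = 0
    AND table_1.col_condition_2 NOT IN (3, 4)
    AND table_2.id IS NULL
  UNION ALL
  SELECT table_1.id AS id
  FROM table_1
  JOIN table_2 ON table_1.id = table_2.id
  WHERE table_1.col_condition_1 = 0
    AND table_1.col_condition_2 NOT IN (3, 4)
    AND table_1.date_col > table_2.date_col
) AS combined
ORDER BY id
LIMIT 5000
"""

or_ids = [row[0] for row in conn.execute(OR_QUERY)]
union_ids = [row[0] for row in conn.execute(UNION_QUERY)]
print(or_ids, union_ids)
```

Whether the UNION form is actually faster depends on the optimizer and the available indexes, so it is worth comparing the EXPLAIN output of both forms on the real data.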
Solution 3:
Problems like this tend to require trying things and testing to see how well they work.
As such, start with this:
SELECT
table_1.id
FROM
table_1
LEFT JOIN table_2
ON table_1.id = table_2.id
AND table_1.date_col <= table_2.date_col
WHERE
table_1.col_condition_1 = 0
AND table_1.col_condition_2 NOT IN (3, 4)
AND table_2.id IS NULL
LIMIT 5000;
Logical reasoning on why this is equivalent to your query:
Your original query's WHERE condition (table_2.id IS NULL OR table_1.date_col > table_2.date_col)
can be summarized as: "Only include table_1 records that either do NOT have a table_2 record, or where the table_2 record is earlier than the table_1 record."
My version of the query uses an anti-join to exclude all table_1 records for which there exists a table_2 record that is later than (or equal to) the table_1 record.
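The equivalence argument can be checked on a handful of boundary rows. The sketch below is an illustration under assumptions, not a proof: it uses SQLite via Python's sqlite3 module instead of MySQL, the original query is reconstructed from the WHERE condition quoted above, and the sample rows and the helper name fetch_ids are made up. It also shows one case where the two forms diverge: when date_col is NULL on a matching table_2 row, the original condition evaluates to NULL and drops the row, while the anti-join finds no match and keeps it. The argument likewise assumes table_2.id is unique.

```python
import sqlite3

ORIGINAL = """
SELECT table_1.id
FROM table_1
LEFT JOIN table_2 ON table_1.id = table_2.id
WHERE table_1.col_condition_1 = 0
  AND table_1.col_condition_2 NOT IN (3, 4)
  AND (table_2.id IS NULL OR table_1.date_col > table_2.date_col)
LIMIT 5000
"""

ANTI_JOIN = """
SELECT table_1.id
FROM table_1
LEFT JOIN table_2
  ON table_1.id = table_2.id
 AND table_1.date_col <= table_2.date_col
WHERE table_1.col_condition_1 = 0
  AND table_1.col_condition_2 NOT IN (3, 4)
  AND table_2.id IS NULL
LIMIT 5000
"""


def fetch_ids(conn, sql):
    """Return the selected ids, sorted so the two result sets can be compared."""
    return sorted(row[0] for row in conn.execute(sql))


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_1 (id INTEGER PRIMARY KEY, col_condition_1 INTEGER,
                      col_condition_2 INTEGER, date_col TEXT);
CREATE TABLE table_2 (id INTEGER PRIMARY KEY, date_col TEXT);
INSERT INTO table_1 VALUES
  (1, 0, 1, '2020-01-05'),  -- no table_2 row
  (2, 0, 1, '2020-01-05'),  -- table_2 row is earlier
  (3, 0, 1, '2020-01-05'),  -- table_2 row has the same date
  (4, 0, 1, '2020-01-05');  -- table_2 row is later
INSERT INTO table_2 VALUES
  (2, '2020-01-01'),
  (3, '2020-01-05'),
  (4, '2020-01-09');
""")

original_ids = fetch_ids(conn, ORIGINAL)
anti_join_ids = fetch_ids(conn, ANTI_JOIN)
print(original_ids, anti_join_ids)

# Caveat: a matching table_2 row whose date_col is NULL.
conn.execute("INSERT INTO table_1 VALUES (5, 0, 1, '2020-01-05')")
conn.execute("INSERT INTO table_2 VALUES (5, NULL)")
original_with_null = fetch_ids(conn, ORIGINAL)
anti_join_with_null = fetch_ids(conn, ANTI_JOIN)
print(original_with_null, anti_join_with_null)
```

If date_col is declared NOT NULL in both tables, the NULL case cannot occur and the two queries select the same rows.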
Indexes
There are a number of possible composite indexes that may help this query. Here are a couple to start with:
For table_2: (id,date_col)
For table_1: (col_condition_1,id,date_col,col_condition_2)
Please try my query and indexes, and report the results (including EXPLAIN plan).
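For reference, creating the two suggested indexes and inspecting the plan could look roughly like this. It is a sketch under assumptions: the index names are hypothetical, the column types are guesses, and it runs on SQLite through Python's sqlite3 module, where the plan is requested with EXPLAIN QUERY PLAN. On MySQL the CREATE INDEX statements have the same shape, and the plan is obtained by prefixing the query with EXPLAIN; the output format differs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_1 (id INTEGER PRIMARY KEY, col_condition_1 INTEGER,
                      col_condition_2 INTEGER, date_col TEXT);
CREATE TABLE table_2 (id INTEGER PRIMARY KEY, date_col TEXT);

-- Index names are hypothetical; the column lists come from the answer.
CREATE INDEX idx_t2_id_date ON table_2 (id, date_col);
CREATE INDEX idx_t1_cond1_id_date_cond2
    ON table_1 (col_condition_1, id, date_col, col_condition_2);
""")

QUERY = """
SELECT table_1.id
FROM table_1
LEFT JOIN table_2
  ON table_1.id = table_2.id
 AND table_1.date_col <= table_2.date_col
WHERE table_1.col_condition_1 = 0
  AND table_1.col_condition_2 NOT IN (3, 4)
  AND table_2.id IS NULL
LIMIT 5000
"""

# The last column of each plan row is the human-readable detail text.
plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
for row in plan:
    print(row[-1])

index_names = {
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
}
```

Which index the optimizer picks, if any, depends on the engine and on the data distribution, so the plan from the real MySQL tables is the one that matters.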