Help with Advanced MySQL Optimization
I've run into a problem with an SQL query that is "failing" (taking too long) when the tables have more than 100k records. It should not be a problem, and I thought I had it covered, since it works pretty well with 50k records.
I'll try to be brief and clear, so I'll start with the query:
    SELECT v.id
    FROM videos v
    LEFT JOIN videos_categories vc ON v.id = vc.video_id
    LEFT JOIN categories c ON vc.category_id = c.id
    LEFT JOIN users u ON v.user_id = u.id  -- irrelevant table, don't pay attention
    WHERE v.status = 1
      AND (c.status = 1 OR c.id IS NULL)
      AND (u.status = 1 OR u.id IS NULL)   -- irrelevant
    GROUP BY v.id
    ORDER BY v.id DESC
    LIMIT 0, 12
    ---------------------------------------------

**Query took 10.8771 sec** (very bad! It should take 0.1 sec at most.)

I'm using LEFT JOINs because I don't want to restrict the results if the category doesn't exist. That means videos without assigned categories are returned as well.
The idea of the table structure is as follows:
- 'videos' (id PK, + irrelevant fields): this table holds 100k+ records.
- 'videos_categories' (video_id index, category_id index): 600k+ records, multiple rows per video.
- 'categories' (id PK, + irrelevant fields)
- 'users' (id PK, + irrelevant fields): not the problem.
---- Update July 3 ----
Structure of the tables:

    CREATE TABLE `videos` (  -- holding +100k records
      `id` int(10) unsigned NOT NULL auto_increment,
      `user_id` int(10) unsigned NOT NULL default '0',  -- irrelevant for this example
      `status` tinyint(1) NOT NULL default '0',
      PRIMARY KEY (`id`),
      KEY `status` (`status`)
      -- ...
      -- irrelevant keys
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC AUTO_INCREMENT=113339;

    CREATE TABLE `videos_categories` (  -- holding +600k records (several categories per video)
      `video_id` int(10) unsigned NOT NULL default '0',
      `category_id` int(10) unsigned NOT NULL default '0',
      KEY `video_id` (`video_id`),
      KEY `category_id` (`category_id`)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

The categories table has a PK id and irrelevant fields. It holds 80 records. The users table is irrelevant and may be ignored. Sorry for not adding this in the first instance.
---- End of update July 3 ----
This is the EXPLAIN result for the query:

    id  select_type  table  type    possible_keys  key       key_len  ref             rows    Extra
    1   SIMPLE       v      range   status         status    1        NULL            112895  Using where; Using temporary; Using filesort
    1   SIMPLE       vc     ref     video_id       video_id  4        v.id            2
    1   SIMPLE       c      eq_ref  PRIMARY        PRIMARY   4        vc.category_id  1       Using where
    1   SIMPLE       u      eq_ref  PRIMARY        PRIMARY   4        v.user_id       1       Using where

I think the problem is that the SQL engine is "Using filesort" because it's using the 'status' index instead of v.id. Also, it's "Using temporary" because of the number of records the engine has to write, and the in-memory table is not enough.
Update (July 3): after some tests I reached the conclusion that the problem with this particular query is the usage of the v.status index, which doesn't help at all (98% of the videos have status=1).
- Question 1: Why isn't the optimizer using the v.id index to sort and filter? I'm using ORDER BY and LIMIT for that.
Important note: if I remove the 'v.status=1' filter clause, the query takes 0.01 s and uses the v.id (PRIMARY) index, solving it all.
- Question 2: Is there a way to force index usage on MySQL < 5.0?
---- End of update note July 3 ----
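Regarding Question 2, one thing that could be tried is a MySQL index hint. The following is only a sketch under assumptions: it presumes the server version supports the IGNORE INDEX hint (index hints exist in the 4.x series, to my knowledge), and it relies on the key name `status` from the posted schema. Whether the optimizer then walks the primary key in descending order and stops early is something to confirm with EXPLAIN, not something this guarantees.

```sql
-- Sketch: ask the optimizer not to use the low-selectivity `status` index
-- on videos (98% of rows match), so that it is free to consider PRIMARY.
SELECT v.id
FROM videos v IGNORE INDEX (status)
LEFT JOIN videos_categories vc ON v.id = vc.video_id
LEFT JOIN categories c ON vc.category_id = c.id
LEFT JOIN users u ON v.user_id = u.id
WHERE v.status = 1
  AND (c.status = 1 OR c.id IS NULL)
  AND (u.status = 1 OR u.id IS NULL)
GROUP BY v.id
ORDER BY v.id DESC
LIMIT 0, 12;

-- Alternative to try: replace IGNORE INDEX (status) with FORCE INDEX (PRIMARY).
```

Running EXPLAIN on both variants shows whether "Using temporary; Using filesort" disappears for table v.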
To sum up:
Assuming I have the relevant indexes covered: how can I optimize this query so that it takes 0.1 seconds?
I'm pretty sure this is quite a challenge for advanced SQL administrators and programmers.
Given the query (somewhat reformatted):

    SELECT v.id
    FROM videos v
    LEFT JOIN videos_categories vc ON v.id = vc.video_id
    LEFT JOIN categories c ON vc.category_id = c.id
    LEFT JOIN users u ON v.user_id = u.id
    WHERE v.status = 1
      AND v.reported < 10
      AND (c.status = 1 OR c.id IS NULL)
      AND (u.status = 1 OR u.id IS NULL)
    GROUP BY v.id
    ORDER BY v.id DESC
    LIMIT 0, 12

You have incorrectly characterized the tables. You said:
- 'videos' (id PK, + irrelevant fields): this table holds 100k+ records.
- 'videos_categories' (video_id index, category_id index): 600k+ records, multiple rows per video.
- 'categories' (id PK, + irrelevant fields)
- 'users' (id PK, + irrelevant fields): not the problem.
The cardinalities (row counts) of categories and users would be informative. More seriously, though, the query references:
- videos.status
- videos.reported
- videos.user_id
- categories.status
- users.status
These fields should not be lumped in with the irrelevant fields; they should be mentioned separately, and any indexes on these columns should be identified. It would be better to provide the table schemas used to answer the query, with a comment '-- and other irrelevant columns' at the end of each table.
Does the videos_categories table have a unique constraint on the combined (video_id, category_id) columns? If not, why not?
It isn't clear why the videos table has a user_id column; it looks more like there should be a video_users table with (video_id, user_id) columns. However, that's a separate discussion. Also, it is not clear why you would have videos without a user ID value, so the LEFT OUTER JOIN with users is puzzling too. However, you bravely assert that this is not part of the problem, so I'll take you at your word.
A LEFT OUTER JOIN can be a serious performance inhibitor. You might get better results with a UNION (or you might not - UNION can be a performance inhibitor too!):
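As a sketch of what such a constraint could look like, assuming the posted videos_categories schema: the index name `video_category` is hypothetical, and the ALTER TABLE will fail if duplicate (video_id, category_id) pairs already exist, so the first statement checks for them.

```sql
-- Step 1: look for duplicate pairs that would violate the constraint.
SELECT video_id, category_id, COUNT(*) AS copies
FROM videos_categories
GROUP BY video_id, category_id
HAVING COUNT(*) > 1;

-- Step 2: once no duplicates remain, add the composite unique key.
-- `video_category` is an illustrative name, not taken from the question.
ALTER TABLE videos_categories
    ADD UNIQUE KEY video_category (video_id, category_id);
```

Because video_id is the leftmost column of the composite key, the existing single-column `video_id` key would probably become redundant afterwards.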
    SELECT l.id
    FROM (SELECT v.id, v.user_id, v.status, v.reported
          FROM videos v
          JOIN videos_categories vc ON v.id = vc.video_id
          JOIN categories c ON vc.category_id = c.id
          WHERE c.status = 1
          UNION
          SELECT v.id, v.user_id, v.status, v.reported
          FROM videos v
          WHERE v.id NOT IN (SELECT video_id FROM videos_categories)
         ) l
    LEFT JOIN users u ON l.user_id = u.id
    WHERE l.status = 1
      AND l.reported < 10
      AND (u.status = 1 OR u.id IS NULL)
    GROUP BY l.id
    ORDER BY l.id DESC
    LIMIT 0, 12

(The alias 'l' stands for 'list of videos'.) The thinking here is that the first half of the UNION deals with the inner joins, and the second half deals with the videos that aren't categorized. However, the NOT IN condition could be a performance problem, if there is one. Come to think of it, the two lists of videos in the UNION should be disjoint, so you can use UNION ALL in place of UNION; that can be beneficial to performance (because it avoids the duplicate elimination phase).
You could usefully push the 'l.reported < 10' condition down into each half of the UNION (where it becomes v.reported < 10) if the optimizer does not do it automatically for you.
I'm by no means confident that this will perform better than the original, but at least it gives you some ideas to mull over.
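Putting the UNION ALL idea and the pushed-down conditions together might look like the sketch below. It assumes a server version that supports derived tables and subqueries, that videos has the `reported` column used in the query above (it is not shown in the posted schema), and that videos_categories.video_id is NOT NULL, as in the posted schema, so NOT IN behaves as expected. It has not been timed against the original.

```sql
-- Sketch: both halves filter videos early; UNION ALL skips duplicate elimination.
SELECT l.id
FROM (
    -- Videos with at least one active category.
    SELECT v.id, v.user_id
    FROM videos v
    JOIN videos_categories vc ON v.id = vc.video_id
    JOIN categories c ON vc.category_id = c.id
    WHERE v.status = 1
      AND v.reported < 10
      AND c.status = 1
    UNION ALL
    -- Videos with no category rows at all.
    SELECT v.id, v.user_id
    FROM videos v
    WHERE v.status = 1
      AND v.reported < 10
      AND v.id NOT IN (SELECT video_id FROM videos_categories)
) l
LEFT JOIN users u ON l.user_id = u.id
WHERE (u.status = 1 OR u.id IS NULL)
GROUP BY l.id          -- still needed: a video in several categories appears several times
ORDER BY l.id DESC
LIMIT 0, 12;
```

The GROUP BY stays because the first half can return one row per matching category, even though the two halves never overlap.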