Help with Advanced MySQL Optimization


I've run into a problem with an SQL query that is "failing" (taking too long) when the tables have more than 100k records. This should not be a problem, and I thought I had it covered, since it works pretty fine with 50k records.

I'll try to be brief and clear; I'll start with the query:

    SELECT v.id
    FROM videos v
    LEFT JOIN videos_categories vc ON v.id = vc.video_id
    LEFT JOIN categories c ON vc.category_id = c.id
    LEFT JOIN users u ON v.user_id = u.id -- irrelevant table, don't pay attention
    WHERE v.status = 1
      AND (c.status = 1 OR c.id IS NULL)
      AND (u.status = 1 OR u.id IS NULL) -- irrelevant
    GROUP BY v.id
    ORDER BY v.id DESC
    LIMIT 0, 12

Query took 10.8771 sec (very bad! It should take 0.1 max)

I'm using LEFT JOINs because I don't want to restrict the results if a category doesn't exist: videos without assigned categories should be returned too.

The idea of the table structure is as follows:

  • 'videos' (id PK, + irrelevant fields): this table holds 100k+ records.
  • 'videos_categories' (video_id index, category_id index): 600k+ records, multiple rows per video.
  • 'categories' (id PK, + irrelevant fields)
  • 'users' (id PK, + irrelevant fields): not a problem.

---- Update July 3 ----

Structure of the tables:

    CREATE TABLE `videos` ( -- holding 100k+ records
      `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `user_id` int(10) unsigned NOT NULL DEFAULT '0', -- irrelevant for this example
      `status` tinyint(1) NOT NULL DEFAULT '0',
      PRIMARY KEY (`id`),
      KEY `status` (`status`)
      -- ... irrelevant keys
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC AUTO_INCREMENT=113339;

    CREATE TABLE `videos_categories` ( -- holding 600k+ records (several categories per video)
      `video_id` int(10) unsigned NOT NULL DEFAULT '0',
      `category_id` int(10) unsigned NOT NULL DEFAULT '0',
      KEY `video_id` (`video_id`),
      KEY `category_id` (`category_id`)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

The categories table has PK id and irrelevant fields; it holds 80 records. The users table is irrelevant and may be ignored; sorry for adding it in the first instance.

---- end of update july 3 ----

This is the EXPLAIN result for the query:

    id  select_type  table  type    possible_keys  key       key_len  ref             rows    Extra
    1   SIMPLE       v      range   status         status    1        NULL            112895  Using where; Using temporary; Using filesort
    1   SIMPLE       vc     ref     video_id       video_id  4        v.id            2
    1   SIMPLE       c      eq_ref  PRIMARY        PRIMARY   4        vc.category_id  1       Using where
    1   SIMPLE       u      eq_ref  PRIMARY        PRIMARY   4        v.user_id       1       Using where

I think the problem is that the SQL engine is "using filesort" because it's using the 'status' index instead of v.id. Also, it's "using temporary" because of the number of records the engine has to write, and the in-memory table is not big enough.

Update (July 3): after some tests I reached the conclusion that the problem with this particular query is the use of the v.status index, which doesn't help at all (98% of videos have status=1).

  • Question 1: why isn't the optimizer using the v.id index to sort and filter? I'm using ORDER BY and LIMIT for that.

Important note: if I remove the 'v.status=1' filter from the WHERE clause, the query takes 0.01 s and uses the v.id (PRIMARY) index, solving it all.
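A rough, back-of-the-envelope sketch (plain Python arithmetic) of why the primary-key path wins for a 12-row page. It assumes the `status` index plan reads every matching row (the EXPLAIN estimate of 112,895) and then sorts them, while a descending walk of PRIMARY can stop as soon as 12 rows with status = 1 are found; with 98% of rows matching, and matches assumed to be spread evenly by id, that takes about 12 / 0.98 reads. The category and user filters are ignored, so real figures will differ.

```python
# Back-of-the-envelope comparison of the two access paths for one 12-row page.
RANGE_SCAN_ROWS = 112_895  # EXPLAIN "rows" estimate for the `status` range on v
MATCH_FRACTION = 0.98      # share of videos with status = 1 (July 3 update)
PAGE_SIZE = 12             # LIMIT 0, 12


def expected_rows_pk_scan(limit, match_fraction):
    """Expected rows read when walking the primary key in descending id order
    and stopping once `limit` rows satisfy v.status = 1.

    Assumes each row read is a match with probability `match_fraction`,
    independently of its id; the mean number of reads needed to collect
    `limit` matches is then limit / match_fraction (negative binomial mean).
    The category and user filters are ignored.
    """
    return limit / match_fraction


pk_rows = expected_rows_pk_scan(PAGE_SIZE, MATCH_FRACTION)
print(f"status-index plan: ~{RANGE_SCAN_ROWS} rows read, then sorted")
print(f"primary-key plan:  ~{pk_rows:.2f} rows read, already in id order")
print(f"ratio: ~{round(RANGE_SCAN_ROWS / pk_rows)}x")
```

Under these assumptions the status-index plan reads roughly 9,000 times as many rows before LIMIT can discard all but 12, which is consistent with the 10.9 s versus 0.01 s timings reported above.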

  • Question 2: is there a way to force index usage on MySQL < 5.0?
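For reference, MySQL's index hints go right after the table reference, e.g. `FROM videos v IGNORE INDEX (status)` or `FROM videos v FORCE INDEX (PRIMARY)`; these hints predate 5.0, but check the manual for the exact release you run. The sketch below cannot exercise MySQL itself, so it uses Python's built-in sqlite3 and SQLite's analogous clauses, `INDEXED BY` and `NOT INDEXED`, on a hypothetical cut-down `videos` table (only `id` and `status`, with a made-up index name `videos_status`) to show a hint changing the plan while the rows returned stay the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE videos (id INTEGER PRIMARY KEY, status INTEGER NOT NULL)")
conn.execute("CREATE INDEX videos_status ON videos (status)")  # hypothetical index name
# 5,000 rows; every 50th has status = 0, so 98% have status = 1 as in the question.
conn.executemany(
    "INSERT INTO videos (id, status) VALUES (?, ?)",
    [(i, 0 if i % 50 == 0 else 1) for i in range(1, 5001)],
)


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable plan step.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# SQLite analogue of FORCE INDEX: the statement must use videos_status.
forced = ("SELECT id FROM videos INDEXED BY videos_status "
          "WHERE status = 1 ORDER BY id DESC LIMIT 12")
# SQLite analogue of IGNORE INDEX: no index may be used (rowid lookups are
# still allowed), so the table is read in rowid (primary key) order instead.
ignored = ("SELECT id FROM videos NOT INDEXED "
           "WHERE status = 1 ORDER BY id DESC LIMIT 12")

for name, sql in (("INDEXED BY", forced), ("NOT INDEXED", ignored)):
    print(name, plan(sql))

print([row[0] for row in conn.execute(ignored)])
```

Both statements return the same 12 newest ids with status = 1; only the access path reported by EXPLAIN QUERY PLAN differs.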

---- End of update note July 3 ----

To sum up

Assuming I have all the relevant indexes covered: how can I optimize this query so that it takes 0.1 seconds?

I'm pretty sure this is quite a challenge, even for advanced SQL administrators and programmers.

Given the query (somewhat reformatted):

    SELECT v.id
      FROM videos v
      LEFT JOIN videos_categories vc ON v.id = vc.video_id
      LEFT JOIN categories c ON vc.category_id = c.id
      LEFT JOIN users u ON v.user_id = u.id
     WHERE v.status = 1
       AND v.reported < 10
       AND (c.status = 1 OR c.id IS NULL)
       AND (u.status = 1 OR u.id IS NULL)
     GROUP BY v.id
     ORDER BY v.id DESC
     LIMIT 0, 12

You have incorrectly characterized the tables. You said:

  • 'videos' (id PK, + irrelevant fields): this table holds 100k+ records.
  • 'videos_categories' (video_id index, category_id index): 600k+ records, multiple rows per video.
  • 'categories' (id PK, + irrelevant fields)
  • 'users' (id PK, + irrelevant fields): not a problem.

The cardinalities (row counts) of the categories and users tables would be informative. More seriously, though, the query references:

  • videos.status
  • videos.reported
  • videos.user_id
  • categories.status
  • users.status

These fields should be mentioned separately from the irrelevant fields, and the indexes on these columns should be identified. Better still, provide the table schemas used to answer the query, with a comment '-- and other irrelevant columns' at the end of each table.

Does the videos_categories table have a unique constraint on the combined (video_id, category_id) columns? Why not?
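A minimal sketch of the constraint being asked about, run against SQLite through Python's sqlite3 module rather than MySQL: a composite primary key on (video_id, category_id) lets a video have many categories but rejects the same pair twice. In MySQL the equivalent would be something like `ALTER TABLE videos_categories ADD PRIMARY KEY (video_id, category_id)`, which fails if duplicate pairs already exist; its leading column can also serve the lookups that the separate `video_id` index handles today.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE videos_categories (
           video_id    INTEGER NOT NULL,
           category_id INTEGER NOT NULL,
           PRIMARY KEY (video_id, category_id)
       )"""
)
conn.execute("INSERT INTO videos_categories VALUES (1, 10)")
conn.execute("INSERT INTO videos_categories VALUES (1, 11)")  # same video, second category: allowed

try:
    conn.execute("INSERT INTO videos_categories VALUES (1, 10)")  # exact duplicate pair
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM videos_categories").fetchone()[0]
print(duplicate_rejected, row_count)  # True 2
```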

It isn't clear why the videos table has a user_id column; it looks more as though there should be a video_users table with (video_id, user_id) columns. However, that's a separate discussion. Also, it is not clear why you would have videos without a user ID value, so the LEFT OUTER JOIN to users is puzzling too. However, you bravely assert that it is not part of the problem, so I'll take you at your word.

A LEFT OUTER JOIN can be a serious performance inhibitor. You might get better results with a UNION (or you might not; UNION can be a performance inhibitor too!):

    SELECT l.id
      FROM (SELECT v.id, v.user_id, v.status, v.reported
              FROM videos v
              JOIN videos_categories vc ON v.id = vc.video_id
              JOIN categories c ON vc.category_id = c.id
             WHERE c.status = 1
            UNION
            SELECT v.id, v.user_id, v.status, v.reported
              FROM videos v
             WHERE v.id NOT IN (SELECT video_id FROM videos_categories)
           ) l
      LEFT JOIN users u ON l.user_id = u.id
     WHERE l.status = 1
       AND l.reported < 10
       AND (u.status = 1 OR u.id IS NULL)
     GROUP BY l.id
     ORDER BY l.id DESC
     LIMIT 0, 12

(The alias 'l' is for 'list of videos'.) The thinking here is that the first half of the UNION deals with the inner joins, and the second half deals with the videos that aren't categorized. However, the NOT IN condition may be the performance problem, if there is one. Come to think of it, the two lists of videos in the UNION should be disjoint, so you can use UNION ALL in place of UNION; that can be beneficial for performance (because it avoids the duplicate-elimination phase).

You could usefully push the 'l.reported < 10' condition down into each half of the UNION (where it becomes v.reported < 10) if the optimizer does not do that for you automatically.

I'm by no means confident that this will perform better than the original, but at least it gives you some ideas to mull over.
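A minimal sketch, using Python's built-in sqlite3 as a stand-in for MySQL (so it says nothing about MySQL's plans or timings), that checks on a handful of made-up rows that the UNION ALL form, with v.status = 1 and v.reported < 10 pushed into each half, returns the same ids as the original LEFT JOIN query. One behavioural difference the sample deliberately avoids: a videos_categories row whose category_id has no matching categories row is kept by the original (c.id IS NULL) but dropped by the rewrite. Note also that the rewrite relies on subqueries, which MySQL supports only from 4.1 onwards.

```python
import sqlite3

# Hypothetical sample data; columns follow the question, values are made up.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE videos (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL,
                         status INTEGER NOT NULL, reported INTEGER NOT NULL);
    CREATE TABLE videos_categories (video_id INTEGER NOT NULL,
                                    category_id INTEGER NOT NULL);
    CREATE TABLE categories (id INTEGER PRIMARY KEY, status INTEGER NOT NULL);
    CREATE TABLE users (id INTEGER PRIMARY KEY, status INTEGER NOT NULL);

    INSERT INTO categories VALUES (1, 1), (2, 0);
    INSERT INTO users VALUES (1, 1), (2, 0);
    INSERT INTO videos VALUES   -- (id, user_id, status, reported)
        (1, 1, 1, 0),           -- active category: expected
        (2, 1, 1, 0),           -- no categories at all: expected
        (3, 1, 0, 0),           -- status = 0: excluded
        (4, 1, 1, 20),          -- reported too often: excluded
        (5, 1, 1, 0),           -- only an inactive category: excluded
        (6, 2, 1, 0),           -- inactive user: excluded
        (7, 1, 1, 0),           -- one active and one inactive category: expected
        (8, 99, 1, 0);          -- no matching user row: expected
    INSERT INTO videos_categories VALUES
        (1, 1), (3, 1), (4, 1), (5, 2), (6, 1), (7, 1), (7, 2), (8, 1);
    """
)

# The original LEFT JOIN query (the answer's version, with v.reported < 10).
ORIGINAL = """
    SELECT v.id
      FROM videos v
      LEFT JOIN videos_categories vc ON v.id = vc.video_id
      LEFT JOIN categories c ON vc.category_id = c.id
      LEFT JOIN users u ON v.user_id = u.id
     WHERE v.status = 1
       AND v.reported < 10
       AND (c.status = 1 OR c.id IS NULL)
       AND (u.status = 1 OR u.id IS NULL)
     GROUP BY v.id
     ORDER BY v.id DESC
     LIMIT 12
"""

# UNION ALL rewrite with the videos filters pushed down into each half.
REWRITE = """
    SELECT l.id
      FROM (SELECT v.id, v.user_id
              FROM videos v
              JOIN videos_categories vc ON v.id = vc.video_id
              JOIN categories c ON vc.category_id = c.id
             WHERE c.status = 1 AND v.status = 1 AND v.reported < 10
            UNION ALL
            SELECT v.id, v.user_id
              FROM videos v
             WHERE v.id NOT IN (SELECT video_id FROM videos_categories)
               AND v.status = 1 AND v.reported < 10
           ) l
      LEFT JOIN users u ON l.user_id = u.id
     WHERE u.status = 1 OR u.id IS NULL
     GROUP BY l.id
     ORDER BY l.id DESC
     LIMIT 12
"""

original_ids = [row[0] for row in conn.execute(ORIGINAL)]
rewritten_ids = [row[0] for row in conn.execute(REWRITE)]
print(original_ids)   # [8, 7, 2, 1]
print(rewritten_ids)  # [8, 7, 2, 1]
```

The outer GROUP BY still matters with UNION ALL: video 7 has two categories but only one active one, and a video with several active categories would appear once per category in the first half.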
