python - How to retrieve the first X objects with unique attribute value


In one of my Django applications, I am looking for an elegant and performant solution to the problem described in the following example.

Given these objects:

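    from django.db import models

    class Author(models.Model):
        name = models.CharField(max_length=50)

    # Collection is declared before Book so the ForeignKey below can refer to it.
    class Collection(models.Model):
        name = models.CharField(max_length=50)
        # on_delete is required since Django 2.0
        author = models.ForeignKey(Author, on_delete=models.CASCADE)

    class Book(models.Model):
        collection = models.ForeignKey(Collection, on_delete=models.CASCADE)
        publication = models.DateField()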

I would like to retrieve the 4 (or some other small number) latest published books, but I want them to have 4 different authors. That is, if the 2 latest published books are by the same author, I want only one of them in the top 4, leaving 3 spots for other authors.

I have thought of doing it in multiple steps: retrieving the latest publications, testing them one by one while storing the author value, and, if an author is present multiple times, retrieving more of the latest publications... Since this is done on the home page, I need the code to be as efficient as possible.
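The multi-step idea described above could be sketched roughly as follows. This is only an illustration in plain Python: the `Book` namedtuple and the `latest_with_distinct_authors` helper are hypothetical stand-ins, not part of the application, and the input is assumed to be already sorted newest first.

```python
from collections import namedtuple
from itertools import islice

# Hypothetical stand-in for a book row; in Django this would be a model instance.
Book = namedtuple("Book", ["title", "author", "publication"])


def latest_with_distinct_authors(books_newest_first, limit=4):
    """Return up to `limit` books, keeping only the first book seen per author."""
    seen = set()

    def unique():
        for book in books_newest_first:
            if book.author not in seen:
                seen.add(book.author)
                yield book

    # islice stops consuming the input as soon as `limit` books are found.
    return list(islice(unique(), limit))


books = [
    Book("A2", "alice", "2013-05-05"),
    Book("A1", "alice", "2013-05-04"),
    Book("B1", "bob", "2013-05-03"),
    Book("C1", "carol", "2013-05-02"),
    Book("B0", "bob", "2013-05-01"),
    Book("D1", "dave", "2013-04-30"),
]
top = latest_with_distinct_authors(books)
print([b.title for b in top])
```

Because the helper only pulls as many items as it needs, it could in principle be fed a lazily evaluated, date-ordered queryset, but how many rows that reads depends on how often authors repeat near the top.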

Any help would be highly appreciated.

You could use annotate, extra or raw. Here's how you'd use annotate:

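    from django.db.models import Max

    # One query for the 5 authors with the most recent book,
    # then one latest() query per author.
    books = [a.book_set.latest('pub_date')
             for a in Author.objects
                            .annotate(latest=Max('book__pub_date'))
                            .order_by('-latest')[:5]]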

Assuming authors don't have multiple books with the same pub_date, you could use extra like this:

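    # Correlated subquery: the latest pub_date for each author.
    sql = '''SELECT MAX(app_book.pub_date)
             FROM app_book
             WHERE app_book.author_id=app_author.id'''
    latest = Author.objects.extra(
                    select={'latest': sql},
                    order_by=['-latest'])[:5].values_list('latest')
    # Fetch the books whose pub_date matches one of those 5 dates.
    books = Book.objects.filter(pub_date__in=[x[0] for x in latest]).order_by('-pub_date')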
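To see why the uniqueness assumption matters, here is a small plain-Python imitation of the two steps (latest date per author, then a lookup by date only). The `Book` namedtuple and the sample rows are made up for illustration; this is not the ORM code itself.

```python
from collections import namedtuple

# Hypothetical rows; the second book shares its date with author 2's latest book.
Book = namedtuple("Book", ["title", "author_id", "pub_date"])
books = [
    Book("a-new", 1, "2013-01-05"),
    Book("a-old", 1, "2013-01-04"),
    Book("b-new", 2, "2013-01-04"),
]

# Step 1: latest pub_date per author (what the subquery computes).
latest = {}
for b in books:
    latest[b.author_id] = max(latest.get(b.author_id, b.pub_date), b.pub_date)

# Step 2: match books on pub_date alone (what pub_date__in does).
wanted = set(latest.values())
matched = [b.title for b in books if b.pub_date in wanted]
print(matched)
```

The lookup in step 2 no longer knows which author each date belonged to, so "a-old" is returned alongside the two books that were actually wanted.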

If you use raw you could grab the books in a single query:

    sql = '''SELECT * FROM app_book
             WHERE app_book.pub_date IN
               (SELECT MAX(app_book.pub_date)
                FROM app_book
                GROUP BY app_book.author_id)
             ORDER BY app_book.pub_date DESC'''
    books = list(Book.objects.raw(sql)[:5])
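The shape of that query can be tried outside Django with the standard-library sqlite3 module. This is a minimal sketch under assumptions: the table layout mirrors the app_book table used above, the rows are invented, and a LIMIT clause (not in the query above) is used to cap the result in SQL.

```python
import sqlite3

# In-memory database with a table shaped like app_book (sample data is made up).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_book (
        id INTEGER PRIMARY KEY,
        title TEXT,
        author_id INTEGER,
        pub_date TEXT
    );
    INSERT INTO app_book (title, author_id, pub_date) VALUES
        ('a-old', 1, '2013-01-01'),
        ('a-new', 1, '2013-01-05'),
        ('b-new', 2, '2013-01-04'),
        ('c-old', 3, '2013-01-02'),
        ('c-new', 3, '2013-01-03');
""")

# Same structure as the raw query: keep books whose pub_date is some
# author's maximum, newest first, capped with LIMIT.
sql = """
    SELECT title FROM app_book
    WHERE pub_date IN
        (SELECT MAX(pub_date) FROM app_book GROUP BY author_id)
    ORDER BY pub_date DESC
    LIMIT 2
"""
titles = [row[0] for row in conn.execute(sql)]
print(titles)
```

With these rows each author's newest book has a distinct date, so the query returns one book per author; the ISO-formatted date strings sort correctly as text.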

I'm assuming your models look like the following:

    from django.db import models

    class Author(models.Model):
        name = models.CharField(max_length=50)

    class Book(models.Model):
        title = models.CharField(max_length=50)
        # on_delete is required since Django 2.0
        author = models.ForeignKey(Author, on_delete=models.CASCADE)
        pub_date = models.DateTimeField()

        class Meta:
            get_latest_by = 'pub_date'

For fun I thought I'd benchmark the 3 approaches (using a db filled with 100k dummy books):

    >>> %time annotate()
    (0.274) SELECT "app_author"."id", "app_author"."name", MAX("app_book"."pub_date") AS "latest" FROM "app_author" LEFT OUTER JOIN "app_book" ON ("app_author"."id" = "app_book"."author_id") GROUP BY "app_author"."id", "app_author"."name", "app_author"."id", "app_author"."name" ORDER BY "latest" DESC LIMIT 5; args=()
    (0.035) SELECT "app_book"."id", "app_book"."title", "app_book"."author_id", "app_book"."pub_date" FROM "app_book" WHERE "app_book"."author_id" = 10  ORDER BY "app_book"."pub_date" DESC LIMIT 1; args=(10,)
    (0.036) SELECT "app_book"."id", "app_book"."title", "app_book"."author_id", "app_book"."pub_date" FROM "app_book" WHERE "app_book"."author_id" = 9  ORDER BY "app_book"."pub_date" DESC LIMIT 1; args=(9,)
    (0.036) SELECT "app_book"."id", "app_book"."title", "app_book"."author_id", "app_book"."pub_date" FROM "app_book" WHERE "app_book"."author_id" = 8  ORDER BY "app_book"."pub_date" DESC LIMIT 1; args=(8,)
    (0.036) SELECT "app_book"."id", "app_book"."title", "app_book"."author_id", "app_book"."pub_date" FROM "app_book" WHERE "app_book"."author_id" = 7  ORDER BY "app_book"."pub_date" DESC LIMIT 1; args=(7,)
    (0.040) SELECT "app_book"."id", "app_book"."title", "app_book"."author_id", "app_book"."pub_date" FROM "app_book" WHERE "app_book"."author_id" = 6  ORDER BY "app_book"."pub_date" DESC LIMIT 1; args=(6,)
    CPU times: user 0.32 s, sys: 0.15 s, total: 0.47 s
    Wall time: 0.47 s
    <<< [<Book: susan>, <Book: yasmin>, <Book: carl>, <Book: benny>, <Book: george>]

    >>> %time extra()
    (0.445) SELECT (SELECT MAX(app_book.pub_date)
                 FROM app_book
                 WHERE app_book.author_id=app_author.id) AS "latest" FROM "app_author" ORDER BY "latest" DESC LIMIT 5; args=()
    (0.045) SELECT "app_book"."id", "app_book"."title", "app_book"."author_id", "app_book"."pub_date" FROM "app_book" WHERE "app_book"."pub_date" IN (2038-11-25 11:33:30.425836, 2038-11-24 11:33:30.424598, 2038-11-23 11:33:30.423435, 2038-11-22 11:33:30.422227, 2038-11-21 11:33:30.421045) ORDER BY "app_book"."pub_date" DESC; args=(u'2038-11-25 11:33:30.425836', u'2038-11-24 11:33:30.424598', u'2038-11-23 11:33:30.423435', u'2038-11-22 11:33:30.422227', u'2038-11-21 11:33:30.421045')
    CPU times: user 0.32 s, sys: 0.18 s, total: 0.50 s
    Wall time: 0.50 s
    <<< [<Book: susan>, <Book: yasmin>, <Book: carl>, <Book: benny>, <Book: george>]

    >>> %time raw()
    (0.279) SELECT * FROM app_book
                WHERE app_book.pub_date IN
                  (SELECT MAX(app_book.pub_date)
                   FROM app_book
                   GROUP BY app_book.author_id)
                ORDER BY app_book.pub_date DESC; args=()
    CPU times: user 0.19 s, sys: 0.09 s, total: 0.28 s
    Wall time: 0.28 s
    <<< [<Book: susan>, <Book: yasmin>, <Book: carl>, <Book: benny>, <Book: george>]
