python - How to retrieve the first X objects with unique attribute value
In one of my Django applications, I am looking for an elegant and performant solution to the problem described by the following example.
Given these objects:
    class Author(models.Model):
        name = models.CharField()

    class Book(models.Model):
        # Referenced by name because Collection is defined further down.
        collection = models.ForeignKey('Collection')
        publication = models.DateField()

    class Collection(models.Model):
        name = models.CharField()
        author = models.ForeignKey(Author)

I would like to retrieve the 4 (or any other small number) latest published books, but I want them to have 4 different authors. Meaning that if the 2 latest published books are by the same author, I want only 1 of them in the top 4 and leave the 3 other spots to other authors.
I have thought of doing it in multiple steps: retrieving the latest publications, testing them one by one while storing the author value, and, if an author is present multiple times, retrieving more of the latest publications... This is being done on the home page, so I need the code to be as efficient as possible.
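For reference, the multi-step idea can be sketched in plain Python, with no Django involved. The `first_with_unique` helper, the `Book` tuple and the sample data are all made up for illustration; the only real assumption is that the input is already sorted newest first.

```python
from collections import namedtuple
from itertools import islice

# Hypothetical stand-in for a book row; in Django this would be a model instance.
Book = namedtuple("Book", ["title", "author", "publication"])


def first_with_unique(items, key, limit):
    """Return up to `limit` items, keeping only the first item seen per key."""
    seen = set()

    def unique():
        for item in items:
            k = key(item)
            if k not in seen:
                seen.add(k)
                yield item

    # islice stops pulling from `items` as soon as `limit` items were found.
    return list(islice(unique(), limit))


# Sample data, already sorted by publication date, newest first.
books = [
    Book("A2", "alice", "2012-05-03"),
    Book("A1", "alice", "2012-05-02"),
    Book("B1", "bob", "2012-05-01"),
    Book("C1", "carol", "2012-04-30"),
    Book("B0", "bob", "2012-04-29"),
    Book("D1", "dave", "2012-04-28"),
    Book("E1", "erin", "2012-04-27"),
]

latest = first_with_unique(books, key=lambda b: b.author, limit=4)
print([b.title for b in latest])  # ['A2', 'B1', 'C1', 'D1']
```

With the models above, the same helper could presumably be fed `Book.objects.select_related('collection').order_by('-publication').iterator()` with `key=lambda b: b.collection.author_id`, though that is untested here, and it may still read many rows when one author dominates the most recent publications.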
Any help would be highly appreciated.
You could use annotate, extra or raw. Here's how you'd use annotate:
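    from django.db.models import Max

    # Rank authors by their newest book, then fetch that book for each of the top 5.
    books = [a.book_set.latest('pub_date')
             for a in Author.objects
                 .annotate(latest=Max('book__pub_date'))
                 .order_by('-latest')[:5]]

Assuming authors don't have multiple books with the same pub_date, you could use extra like this: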
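    # Correlated subquery: the newest pub_date for each author row.
    sql = '''SELECT MAX(app_book.pub_date) FROM app_book WHERE app_book.author_id=app_author.id'''
    latest = Author.objects.extra(
        select={'latest': sql},
        order_by=['-latest'])[:5].values_list('latest')
    # Second query: fetch the books that carry those dates.
    books = Book.objects.filter(
        pub_date__in=[x[0] for x in latest]).order_by('-pub_date')

If you use raw, you could grab the books with a single query: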
    sql = '''SELECT * FROM app_book
             WHERE app_book.pub_date IN (
                 SELECT MAX(app_book.pub_date)
                 FROM app_book
                 GROUP BY app_book.author_id)
             ORDER BY app_book.pub_date DESC'''
    books = list(Book.objects.raw(sql)[:5])

I'm assuming the models are as follows:
    class Author(models.Model):
        name = models.CharField(max_length=50)

    class Book(models.Model):
        title = models.CharField(max_length=50)
        # On Django 2.0+, ForeignKey also requires an on_delete argument.
        author = models.ForeignKey(Author)
        pub_date = models.DateTimeField()

        class Meta:
            get_latest_by = 'pub_date'

Just for fun, I thought I'd benchmark the 3 approaches (using a DB filled with 100k dummy books):
    >>> %time annotate()
    (0.274) SELECT "app_author"."id", "app_author"."name", MAX("app_book"."pub_date") AS "latest" FROM "app_author" LEFT OUTER JOIN "app_book" ON ("app_author"."id" = "app_book"."author_id") GROUP BY "app_author"."id", "app_author"."name", "app_author"."id", "app_author"."name" ORDER BY "latest" DESC LIMIT 5; args=()
    (0.035) SELECT "app_book"."id", "app_book"."title", "app_book"."author_id", "app_book"."pub_date" FROM "app_book" WHERE "app_book"."author_id" = 10 ORDER BY "app_book"."pub_date" DESC LIMIT 1; args=(10,)
    (0.036) SELECT "app_book"."id", "app_book"."title", "app_book"."author_id", "app_book"."pub_date" FROM "app_book" WHERE "app_book"."author_id" = 9 ORDER BY "app_book"."pub_date" DESC LIMIT 1; args=(9,)
    (0.036) SELECT "app_book"."id", "app_book"."title", "app_book"."author_id", "app_book"."pub_date" FROM "app_book" WHERE "app_book"."author_id" = 8 ORDER BY "app_book"."pub_date" DESC LIMIT 1; args=(8,)
    (0.036) SELECT "app_book"."id", "app_book"."title", "app_book"."author_id", "app_book"."pub_date" FROM "app_book" WHERE "app_book"."author_id" = 7 ORDER BY "app_book"."pub_date" DESC LIMIT 1; args=(7,)
    (0.040) SELECT "app_book"."id", "app_book"."title", "app_book"."author_id", "app_book"."pub_date" FROM "app_book" WHERE "app_book"."author_id" = 6 ORDER BY "app_book"."pub_date" DESC LIMIT 1; args=(6,)
    CPU times: user 0.32 s, sys: 0.15 s, total: 0.47 s
    Wall time: 0.47 s
    <<< [<Book: susan>, <Book: yasmin>, <Book: carl>, <Book: benny>, <Book: george>]

    >>> %time extra()
    (0.445) SELECT (SELECT MAX(app_book.pub_date) FROM app_book WHERE app_book.author_id=app_author.id) AS "latest" FROM "app_author" ORDER BY "latest" DESC LIMIT 5; args=()
    (0.045) SELECT "app_book"."id", "app_book"."title", "app_book"."author_id", "app_book"."pub_date" FROM "app_book" WHERE "app_book"."pub_date" IN (2038-11-25 11:33:30.425836, 2038-11-24 11:33:30.424598, 2038-11-23 11:33:30.423435, 2038-11-22 11:33:30.422227, 2038-11-21 11:33:30.421045) ORDER BY "app_book"."pub_date" DESC; args=(u'2038-11-25 11:33:30.425836', u'2038-11-24 11:33:30.424598', u'2038-11-23 11:33:30.423435', u'2038-11-22 11:33:30.422227', u'2038-11-21 11:33:30.421045')
    CPU times: user 0.32 s, sys: 0.18 s, total: 0.50 s
    Wall time: 0.50 s
    <<< [<Book: susan>, <Book: yasmin>, <Book: carl>, <Book: benny>, <Book: george>]

    >>> %time raw()
    (0.279) SELECT * FROM app_book WHERE app_book.pub_date IN (SELECT MAX(app_book.pub_date) FROM app_book GROUP BY app_book.author_id) ORDER BY app_book.pub_date DESC; args=()
    CPU times: user 0.19 s, sys: 0.09 s, total: 0.28 s
    Wall time: 0.28 s
    <<< [<Book: susan>, <Book: yasmin>, <Book: carl>, <Book: benny>, <Book: george>]
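To try the single-query idea without a Django project, here is a small sketch using Python's built-in sqlite3 module. The table names follow the app_author / app_book naming from the queries above; the sample rows and the `latest_per_author` helper are invented for illustration. It also shows a caveat of the uncorrelated IN subquery: an older book whose pub_date happens to equal another author's latest date is matched too, so the unique-date assumption really has to hold across authors.

```python
import sqlite3

# Same shape as the raw query above, with a LIMIT instead of Python-side slicing.
QUERY = """
    SELECT title FROM app_book
    WHERE pub_date IN (SELECT MAX(pub_date) FROM app_book GROUP BY author_id)
    ORDER BY pub_date DESC
    LIMIT ?
"""


def make_db():
    """Build an in-memory database with a few made-up authors and books."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE app_author (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE app_book (
            id INTEGER PRIMARY KEY,
            title TEXT,
            author_id INTEGER REFERENCES app_author(id),
            pub_date TEXT
        );
    """)
    conn.executemany(
        "INSERT INTO app_author (id, name) VALUES (?, ?)",
        [(1, "alice"), (2, "bob"), (3, "carol")],
    )
    conn.executemany(
        "INSERT INTO app_book (title, author_id, pub_date) VALUES (?, ?, ?)",
        [
            ("a-old", 1, "2012-01-01"),
            ("a-new", 1, "2012-03-01"),
            ("b-old", 2, "2012-01-15"),
            ("b-new", 2, "2012-02-01"),
            ("c-only", 3, "2012-02-15"),
        ],
    )
    return conn


def latest_per_author(conn, limit):
    """Return titles of the newest book per author, newest first."""
    return [row[0] for row in conn.execute(QUERY, (limit,))]


conn = make_db()
top_two = latest_per_author(conn, 2)
print(top_two)  # ['a-new', 'c-only']

# Caveat: an older book by alice sharing carol's latest date is matched as well.
conn.execute(
    "INSERT INTO app_book (title, author_id, pub_date) VALUES (?, ?, ?)",
    ("a-mid", 1, "2012-02-15"),
)
with_collision = latest_per_author(conn, 10)
print("a-mid" in with_collision)  # True
```

If date collisions are possible in real data, a correlated subquery or a post-filter on the author would be needed; that is outside what the benchmark above measured.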