php - Atomicity in Map/Reducing over new records (MongoDB)


Here's the situation: I've got a MongoDB cluster, a web app, and a pretty intensive map/reduce query. The query runs periodically (every 5 minutes) from a cron job, and the results are stored in a collection (using the merge output option).

What works: currently, the query runs over every record in the collection. Said collection is growing to millions of rows, and each time the query runs, it takes a little longer.

The obvious solution is to run the map/reduce only on the new records and use the reduce function against the old stored values to calculate the correct result. MongoDB is great here: it lets you specify the reduce output option instead of merge for exactly that.
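For reference, the difference looks roughly like this in the mongo shell. The collection names (events, totals) and fields (userId, amount) are made up for illustration; the real map and reduce functions depend on the actual schema.

    // Hypothetical job: sum the "amount" field per user.
    var mapFn = function () {
        emit(this.userId, this.amount);
    };
    var reduceFn = function (key, values) {
        return Array.sum(values);
    };

    // Current approach: process the whole collection, overwrite matching keys.
    db.events.mapReduce(mapFn, reduceFn, {
        out: { merge: "totals" }
    });

    // Incremental approach: process only a subset and fold it into the
    // existing documents in "totals" by re-running reduce against them.
    db.events.mapReduce(mapFn, reduceFn, {
        query: { /* only the new records */ },
        out: { reduce: "totals" }
    });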

What I can't figure out: how to correctly perform the m/r only on the new records in the initial collection. I see two potential solutions, neither of them good. Ideas?

  1. I flag the records that have been processed. The problem: how do I flag exactly the same records that were just m/r'd over?
  2. I query the matching items, pass the list of ids as an $in: [id1, id2, ...] query to the map/reduce, and then send an update to set the flag using the same $in. That's inelegant, and I don't know how it will perform when the list of records gets huge. A sketch of this option follows below.
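A rough sketch of option 2 in the mongo shell, using the same hypothetical collections as above plus a made-up processed flag (mapFn and reduceFn as defined in the earlier sketch):

    // Collect the ids of the unprocessed records first.
    var ids = db.events.find({ processed: { $ne: true } }, { _id: 1 })
                       .toArray()
                       .map(function (doc) { return doc._id; });

    // Map/reduce only those records, folding into the result collection.
    db.events.mapReduce(mapFn, reduceFn, {
        query: { _id: { $in: ids } },
        out: { reduce: "totals" }
    });

    // Flag the same records using the same id list.
    db.events.update(
        { _id: { $in: ids } },
        { $set: { processed: true } },
        { multi: true }
    );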

tl;dr: how do I select only the new records in a map/reduce query that reduces into a result collection?

A kind soul on the #mongodb IRC channel helped me figure this one out. The simple solution is to have a state machine field and do the following (in pseudo-code):

    set {state:'processing'} {state:{$exists:false}}
    mapreduce {...} {state:'processing'}
    set {state:'done'} {state:'processing'}
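In concrete mongo shell terms (again with the placeholder names from the sketches above), that is roughly:

    // 1. Claim every record that has not been touched yet.
    db.events.update(
        { state: { $exists: false } },
        { $set: { state: "processing" } },
        { multi: true }
    );

    // 2. Map/reduce only the claimed records, folding the output into "totals".
    db.events.mapReduce(mapFn, reduceFn, {
        query: { state: "processing" },
        out: { reduce: "totals" }
    });

    // 3. Mark the claimed records as done so the next run skips them.
    db.events.update(
        { state: "processing" },
        { $set: { state: "done" } },
        { multi: true }
    );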

Now, this is suboptimal because it wastes a lot of disk space on a collection with millions of records. The real question is: why didn't I think of this sooner?

