php - Atomicity in Map/Reducing over new records (MongoDB)
Here's the situation: I've got a MongoDB cluster, a web app, and a pretty intensive map/reduce query. The query runs periodically (every 5 min) in a cron job, and the results are stored (using $merge) in a collection.
What works: currently, the query runs over every record in the collection. Said collection is growing to millions of rows, so each time the query runs, it takes a little longer.
The obvious solution is to run the map/reduce only on new records, and use the reduce function against the old stored values to calculate the correct result. MongoDB is great here: it lets you specify a reduce output option instead of merge for exactly that.
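For reference, a minimal mongo shell sketch of that output mode. The collection names (events as the source, stats as the output) and the per-key sum are assumptions for illustration, not the actual query:

    // out: {reduce: "stats"} re-runs the reduce function against any existing
    // document with the same key in "stats" instead of overwriting it the way
    // {merge: ...} would.
    db.events.mapReduce(
      function () { emit(this.userId, this.amount); },        // map (assumed fields)
      function (key, values) { return Array.sum(values); },   // reduce
      {
        query: { /* restrict to new records here -- that's the hard part below */ },
        out: { reduce: "stats" }
      }
    );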
What I can't figure out: how to correctly perform the m/r on only the new records in the initial collection. I see two potential solutions, neither of which is good. Ideas?
- I could flag records that have been processed. The problem is: how do I flag exactly the same records that were m/r'd over?
- I could query the matching items, pass the list of ids as $in: [id1, id2, ...] in the map/reduce query, and then send an update that sets the flag using the same $in (roughly as sketched below). That's inelegant, and I don't know how it's going to perform when the list of records is huge.
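A rough sketch of that second option, assuming the same collection names as above, a hypothetical processed flag field, and mapFn/reduceFn standing in for the map and reduce functions:

    // 1) Collect the ids of records that haven't been processed yet
    var ids = db.events.find({ processed: { $ne: true } }, { _id: 1 })
                       .toArray()
                       .map(function (doc) { return doc._id; });

    // 2) Map/reduce only over those ids, folding results into "stats"
    db.events.mapReduce(mapFn, reduceFn, {
      query: { _id: { $in: ids } },
      out: { reduce: "stats" }
    });

    // 3) Flag the very same records using the same id list
    db.events.update(
      { _id: { $in: ids } },
      { $set: { processed: true } },
      { multi: true }
    );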
tl;dr: how do I select only new records in a map/reduce query that reduces into a result collection?
A kind soul on the #mongodb IRC channel helped me figure this one out. The simple solution is to have a state-machine field and do the following (in pseudo-code):
    set {state: 'processing'} where {state: {$exists: false}}
    mapreduce {...} where {state: 'processing'}
    set {state: 'done'} where {state: 'processing'}
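In concrete mongo shell terms (again just a sketch, with the same assumed collection names and mapFn/reduceFn placeholders), that becomes:

    // 1) Claim every record that hasn't been touched yet
    db.events.update(
      { state: { $exists: false } },
      { $set: { state: 'processing' } },
      { multi: true }
    );

    // 2) Map/reduce only the claimed records, reducing into the existing output
    db.events.mapReduce(mapFn, reduceFn, {
      query: { state: 'processing' },
      out: { reduce: 'stats' }
    });

    // 3) Mark the claimed records as done
    db.events.update(
      { state: 'processing' },
      { $set: { state: 'done' } },
      { multi: true }
    );

Records inserted while the map/reduce is running have no state field yet, so steps 2 and 3 never touch them; they simply get picked up by the next cron run.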
Now, this is suboptimal because it wastes a lot of disk space on a collection with millions of records. The real question is, why did I not think of it sooner?