How can I count the current people in a group from a list of start and end dates in R?
Or rather, how can I do it better than the way I have fudged it?
I have a data frame of names with the start and end dates of each person's time in a group. I want to produce a data frame giving the number of people in the group over time. Note that some people haven't left yet (their end date is NA).
Here's an example dataset:
foo <- data.frame(
  name  = c("bob", "sue", "richard", "jane"),
  start = as.POSIXct(c("2006-03-23", "2007-01-20", "2007-01-20", "2006-03-23"), tz = "GMT"),
  # a real NA (not the string "NA") marks people who have not left yet
  end   = as.POSIXct(c("2009-01-20", NA, "2006-03-23", NA), tz = "GMT")
)
Here I create a data frame of dates covering the range I want. This feels dirty.
# first day of every month from 2006 to 2009
daterange <- data.frame(
  date = as.POSIXct(
    paste(rep(2006:2009, each = 12), "-", rep(1:12, times = 4), "-", 1, sep = ""),
    tz = "GMT"
  )
)
# cheat by setting the NAs to something far away (one year from now)
foo$end[is.na(foo$end)] <- Sys.time() + (365 * 24 * 60 * 60)
Now I use ddply to produce the result.
library(plyr)
# for each date, count the rows of foo whose interval contains that date
ddply(.data = daterange, .variables = "date", function(df) {
  result <- nrow(subset(foo, start < df$date & end > df$date))
  return(result)
})
There must be an easier way?
Here is an alternate approach using plyr. It works directly on the original data frame foo, and does not require converting NA to a date. The code is self-explanatory and readable. Comments are welcome.
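library(plyr)
dates = seq(as.POSIXct('2006-01-01'), as.POSIXct('2009-12-01'), by = "month")
# a person counts on date d if they started before d and have not left yet
# (end is after d, or end is NA)
count = ldply(dates, function(d)
  with(foo, sum((start < d) + (d < end | is.na(end)) == 2)))
data.frame(dates, count)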
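For comparison, the same membership test can probably be written in base R without plyr. This is only a sketch under stated assumptions: it rebuilds the example data from the question with real NA end dates and a GMT time zone, and `count_members` is a hypothetical helper name, not something from the original posts.

```r
# Example data from the question; NA in `end` means the person has not left yet.
foo <- data.frame(
  name  = c("bob", "sue", "richard", "jane"),
  start = as.POSIXct(c("2006-03-23", "2007-01-20", "2007-01-20", "2006-03-23"), tz = "GMT"),
  end   = as.POSIXct(c("2009-01-20", NA, "2006-03-23", NA), tz = "GMT")
)

# First day of every month in the range of interest.
dates <- seq(as.POSIXct("2006-01-01", tz = "GMT"),
             as.POSIXct("2009-12-01", tz = "GMT"),
             by = "month")

# Hypothetical helper: number of people whose interval contains date d.
# A missing end date is treated as "still in the group".
count_members <- function(d, start, end) {
  sum(start < d & (is.na(end) | d < end))
}

# Index into `dates` rather than iterating over it directly, so each
# element keeps its POSIXct class inside the function.
counts <- vapply(seq_along(dates),
                 function(i) count_members(dates[i], foo$start, foo$end),
                 integer(1))

result <- data.frame(date = dates, count = counts)
```

Because `TRUE | NA` evaluates to `TRUE` in R, the `is.na(end) | d < end` term handles open-ended memberships without replacing NA by a far-away date.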