How can I count the current people in a group from a list of start and end dates in R?


Or rather, how can I do it better than the way I have fudged it?

I have a data frame of names, start dates, and end dates for people in a group. I want to produce a data frame of the number of people in the group over time. Note that some people haven't left yet (their end date is NA).

Here's an example dataset:

foo <- data.frame(name = c("bob", "sue", "richard", "jane"),
                  start = as.POSIXct(c("2006-03-23", "2007-01-20", "2007-01-20", "2006-03-23"), tz = "GMT"),
                  end = as.POSIXct(c("2009-01-20", NA, "2006-03-23", NA), tz = "GMT"))

Here I create a data frame of dates covering the range I want. This feels dirty.

# first day of every month from 2006 through 2009
daterange <- data.frame(date = as.POSIXct(
    paste(rep(2006:2009, each = 12), rep(1:12, times = 4), 1, sep = "-"),
    tz = "GMT"))

# cheat by setting NAs to something far away (one year from now)
foo$end[is.na(foo$end)] <- Sys.time() + (365 * 24 * 60 * 60)

Now I use ddply to produce the result.

library(plyr)

# for each date, count the rows of foo whose interval contains that date
ddply(.data = daterange, .variables = "date", function(df) {
    result <- nrow(subset(foo, start < df$date & end > df$date))
    return(result)
})

There must be an easier way?

Here is an alternate approach using plyr. It works directly on the original data frame foo and does not require converting the NA end dates. The code is self-explanatory and readable. Comments are welcome.

library(plyr)

dates = seq(as.POSIXct('2006-01-01'), as.POSIXct('2009-12-01'), by = "month")
# a person counts on date d if they started before d and have not yet left
count = ldply(dates, function(d)
    with(foo, sum((start < d) + (d < end | is.na(end)) == 2)))
data.frame(dates, count)
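For comparison, the same count can probably be done in base R without plyr. This is only a sketch under a few assumptions: it rebuilds the example data with real NA end dates and an explicit GMT time zone, and count_members is a hypothetical helper name that does not appear in the original post.

```r
# Example data from the question; end is NA for people still in the group.
foo <- data.frame(
  name  = c("bob", "sue", "richard", "jane"),
  start = as.POSIXct(c("2006-03-23", "2007-01-20", "2007-01-20", "2006-03-23"),
                     tz = "GMT"),
  end   = as.POSIXct(c("2009-01-20", NA, "2006-03-23", NA), tz = "GMT")
)

# Monthly grid of dates at which to count members.
dates <- seq(as.POSIXct("2006-01-01", tz = "GMT"),
             as.POSIXct("2009-12-01", tz = "GMT"),
             by = "month")

# Hypothetical helper: number of people who started before d and
# either have no end date or leave after d.
count_members <- function(d, start, end) {
  sum(start < d & (is.na(end) | d < end))
}

# Loop over indices so each date keeps its POSIXct class.
counts <- vapply(seq_along(dates),
                 function(i) count_members(dates[i], foo$start, foo$end),
                 integer(1))

result <- data.frame(date = dates, count = counts)
head(result)
```

Putting is.na(end) inside the condition treats a missing end date as "still a member", so the far-future placeholder date is not needed. The comparisons are strict, as in the code above, so someone who starts or leaves exactly on a grid date is not counted on that date.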
