Cassandra datastore size
I'm using Cassandra to store parsed site logs. I have 2 column families with multiple secondary indices. The log data is around 30 GB in size; however, the size of the Cassandra data dir is ~91 GB. Is there a way I can reduce the size of this store? Also, does having multiple secondary indices have a big impact on datastore size?
Potentially, secondary indices can have a big impact, but it depends on what you put in them! If most of your data entries appear in one or more indexes, the indexes will form a significant proportion of storage.
You can see how much space each column family is using with jconsole and/or 'nodetool cfstats'.
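If you want to rank column families by disk usage rather than read the cfstats output by eye, a small parser helps. This is a minimal sketch that assumes the plain-text layout of `nodetool cfstats` (run without a human-readable flag, so sizes are raw byte counts), where a `Column Family: <name>` line (`Table: <name>` in later releases) is followed by a `Space used (total): <bytes>` line. The exact layout varies between versions, so check it against your own output.

```python
import re


def space_by_column_family(cfstats_output: str) -> dict[str, int]:
    """Map each column family to its 'Space used (total)' in bytes.

    Assumption: each 'Column Family: <name>' (or 'Table: <name>') header is
    followed by a 'Space used (total): <bytes>' line with a plain integer.
    """
    sizes: dict[str, int] = {}
    current = None
    for raw in cfstats_output.splitlines():
        line = raw.strip()
        header = re.match(r"(?:Column Family|Table):\s*(\S+)", line)
        if header:
            current = header.group(1)
            continue
        used = re.match(r"Space used \(total\):\s*(\d+)$", line)
        if used and current is not None:
            sizes[current] = int(used.group(1))
    return sizes


# Hypothetical sample output, trimmed to the relevant lines.
sample = """Keyspace: logs
\tColumn Family: access
\tSpace used (live): 100
\tSpace used (total): 1200
\tColumn Family: errors
\tSpace used (total): 300
"""

for cf, size in sorted(space_by_column_family(sample).items(), key=lambda kv: -kv[1]):
    print(cf, size)
```

Feeding it the real output (for example piped in from `nodetool cfstats`) lists the largest consumers first, which makes it easy to see whether the index column families dominate.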
You can also look at the sizes of the data files on disk to get an idea of usage.
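The same check can be done directly on the data directory. This is a sketch under a loud assumption: older Cassandra releases name SSTable components like `<ColumnFamily>-<generation>-Data.db`, so grouping files by the part of the name before the first `-` approximates per-column-family usage (index column families may show up under their own prefix). Newer releases use a different file naming scheme, in which case grouping by directory is the better choice. The data directory path is whatever your configuration uses.

```python
import os
from collections import defaultdict


def bytes_by_file_prefix(data_dir: str) -> dict[str, int]:
    """Sum file sizes under data_dir, grouped by the file-name prefix
    before the first '-'.

    Assumption: SSTable files are named '<ColumnFamily>-<...>', as in older
    Cassandra releases; adjust the grouping key for other layouts.
    """
    totals: dict[str, int] = defaultdict(int)
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            prefix = name.split("-", 1)[0]
            totals[prefix] += os.path.getsize(os.path.join(root, name))
    return dict(totals)
```

Comparing these totals with the size of your raw log data shows how much of the overhead comes from indexes versus the main column families.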
It's possible that data isn't being flushed to disk often enough, which can result in lots of commitlog files being left on disk for a long time, occupying space. This can happen if some of your column families are lightly loaded. See http://wiki.apache.org/cassandra/memtablethresholds for the parameters to tune this.
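To check whether commitlog segments are piling up, you can count them and look at how old the oldest one is. This is a minimal sketch; the directory path is an assumption on your part (in releases that use `cassandra.yaml`, it is the `commitlog_directory` setting). A large total together with a very old oldest segment suggests some memtables are not being flushed.

```python
import os
import time


def commitlog_usage(commitlog_dir: str) -> tuple[int, int, float]:
    """Return (segment count, total bytes, age in hours of the oldest
    segment) for the regular files in a commitlog directory."""
    paths = [os.path.join(commitlog_dir, n) for n in os.listdir(commitlog_dir)]
    paths = [p for p in paths if os.path.isfile(p)]
    if not paths:
        return 0, 0, 0.0
    total = sum(os.path.getsize(p) for p in paths)
    oldest_mtime = min(os.path.getmtime(p) for p in paths)
    age_hours = (time.time() - oldest_mtime) / 3600.0
    return len(paths), total, age_hours
```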
If you have large numbers of small columns, the column names may use a significant proportion of storage, so it may be worth shortening them if that makes sense (not if they are timestamps or other meaningful data!).
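A quick back-of-the-envelope calculation shows why name length matters for small columns. This sketch assumes roughly 15 bytes of fixed overhead per regular column, the figure DataStax's sizing guidance gives for the pre-3.0 storage format; it ignores row overhead, indexes, compression and compaction state, so treat the result as a rough estimate only. The column names and counts below are hypothetical.

```python
# Assumption: ~15 bytes of fixed per-column overhead (pre-3.0 storage
# format sizing guidance). Row overhead, indexes and compression ignored.
COLUMN_OVERHEAD_BYTES = 15


def estimated_column_bytes(n_columns: int, name_len: int, value_len: int) -> int:
    """Rough storage for n_columns columns whose names are name_len bytes
    and whose values are value_len bytes."""
    return n_columns * (name_len + value_len + COLUMN_OVERHEAD_BYTES)


# Hypothetical: 100 million log fields, each holding a 4-byte value.
long_names = estimated_column_bytes(100_000_000, len("response_status_code"), 4)
short_names = estimated_column_bytes(100_000_000, len("rsc"), 4)
print(f"{long_names / 1e9:.1f} GB vs {short_names / 1e9:.1f} GB")
```

With these made-up numbers the 20-byte name accounts for about half of each column's footprint, and shortening it to 3 bytes cuts the estimate from 3.9 GB to 2.2 GB.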