synchronization - What is a sensible data-structure for allowing efficient synchronisation between two root paths? -


i working on application involves maintaining consistency between 2 local directories. specifically, directories should identical, exception files in 1 of directories modified in particular way (this part not important question).

while running, application runs 2 processes listen changes occurring under each of paths, , performs relevant operations bring them in sync when necessary.

in terms of specific question: i'm looking advice on tricker situation of when 1 starts application. @ point, each process needs check files/folders under both path looking after, see if has changed in anyway whilst application not running. (let assume application cannot notified os of happened while shutdown, , need directly check every file/folder.)

each process have access (and maintain) persistent data-structure of files/folder under designated path. thinking following should held within data-structure each of files , folders:

  • file/folder name;
  • file hash (crc32);
  • file/folder last mod data; and
  • file/folder size.

these pieces of information check changes files/folder, best way store them?

it seems me 1 sensible way approach situation of application start each process recursively scan through files/folders under designated path, , compare metadata each file scanned metadata stored in data-structure. processes should iterate through data-structures things have been removed paths. cases may encountered during process are:

  • file modified (file name found in data-structure, hash differs);
  • file added (no identical filename or hash found in data-structure);
  • file renamed (file same hash exists in data-structure, not same filename);
  • folder added (no folder name in data-structure);
  • folder removed (folder name in data-structure, not under path);
  • folder renamed (tricky one).

so, what's best data-structure use task? in head i'm thinking form of sorted associative array, e.g., red-black tree, store file , folder objects. each file object contains name, hash , mod-date attributes , while each folder object contains name , children attributes, children stores associative array underneath. given path arbitrary file, e.g., /foo/bar/file.txt, begin @ root (foo), check bar , on until file.txt's parent object.

another alternative can think of merely store flatly, such there 1 red-black tree each key full path each file/folder, , value file / folder object. quicker retrieval, won't possible detect renamed files/folders without iterating through values anyway, sounds expensive. in first approach, may case identifying rename involves checking portion of data-structure rather of it.

sorry above ideas aren't terribly thought out. what's state of art in area, , there well-trodden approaches these types of problems?

you're modelling filesystem, it's quite natural use hierarchical data structure. after all, don't need compare file @ dir1\dir2\foo.txt dir3\bar.txt, right? didn't mention file moves between directories you're tracking.

so, data structure be:

interface ifsentry {   string name   datetime creationdate   pure virtual bool compare(ifsentry other)   pure virtual void updatefrom(ifsentry other)   pure virtual bool wasrenamed(dictionary<string,ifsentry> possibleoriginals, out string oldname)   ... }   class file : ifsentry {   ... }   class directory : ifsentry {   private dictionary<string,ifsentry> children;   ... } 

the directory implementations of updatefrom , compare recurse down children.

file renames relatively easy comparing crc's. you'd miss files changed in both places , renamed. add crc dictionary directory class if time run comparisons proves performance problem.

for directory moves, if child files changed, you've got fuzzy logic situation. best have merge tool user operate situation.

if file changes in both places, need user-facing merge strategy if conflicting changes occur. i'd argue idea, let user eyeball document didn't lose coherence.


Comments

Popular posts from this blog

c# - SharpSVN - How to get the previous revision? -

c++ - Is it possible to compile a VST on linux? -

url - Querystring manipulation of email Address in PHP -