Left/Right Only Diffing using recursive node compression

While we’re talking about diffing … now I needed to be able to diff two local directories with a few requirements …

 

1) determine left/right only items

2) determine the optimal recursive compression of those items

3) print some status one in a while

4) distinguish between compressed and non-compressed items in the result set

 

What is optimal recursive compression?  Basically it says this:

 

If I have:

 

left-only: \path\foo\bar\file1.txt

left-only: \path\foo\bar\baz\file2.txt

left-only: \path\foo\bar

 

All I need to track is:

 

left-only: \path\foo\bar

 

Since it is left-only anything under it is implicitly left-only.

 

Make sense?

 

This is pretty easy with a simple tree structure.  There are two types of nodes in this tree – terminal and non-terminal.

 

In the example above the following is the tree structure before pruning:

 

                    <root>

                    +

                    Path

                    +

                    Foo

                    +

                    Bar [t]

                    +

                    / \

                    / \

                   / \

        File1.txt [t] Baz

  +

  File2.txt [t]

 

It should be clear that the top-most terminal node is Bar so everything beneath it can be cut from the tree.  This pruning can be done while the tree is being built or afterwards.  The implementation I choose is to do it during.  Obviously input order can’t be an issue (it should not matter if the terminal node comes in before or after non-terminal children) so you prune on the fly [which can mean deleting children when a terminal parent is found or discarding children of a terminal parent).

 

Ok – you can see the code.  It’s the PruningTreeNode<T> and PruningTree<T> class in the example below.

 

Also I decided not to reinvent the wheel and used Eric Gunnerson’s DirectoryWalker class from his book “A Programmer’s Introduction to C#”

 

Oh – there is also the question of “Why do you want recursive node compression?”

 

Simple … TFS is designed to handle recursive cases.  It’s cheaper to operate on a directory with 1000 children then to operate on all 1000 children and the parent directory individually.

 

On to the code (normal caveats about this being untested and how it works for me but I make no promises about what it might do on your machine)…

 

using System;

using System.Collections.Generic;

using System.Text;

using System.IO;

namespace locallocaldiff

{

    class Program

    {

        static void Main(string[] args)

        {

            if (args.Length != 2)

            {

                Console.Error.WriteLine("Usage: locallocaldiff <path> <path>");

                Environment.Exit(1);

            }

            Console.WriteLine("comment: start time: {0}", DateTime.Now);

            System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();

            sw.Start();

            try

            {

                ProcessDiffs(

                    args[0].TrimEnd(new char[] { '\\', '/' }),

                    args[1].TrimEnd(new char[] { '\\', '/' }));

           }

            catch (LocalLocalDiffException e)

            {

                Console.Error.WriteLine(e.Message);

                Environment.Exit(1);

            }

            Console.WriteLine("comment: Elapsed time: {0}", sw.Elapsed);

        }

     private static void ProcessDiffs(string lhsPath, string rhsPath)

        {

            if (!Directory.Exists(lhsPath))

            {

                throw new LocalLocalDiffException(string.Format("directory not found: {0}", rhsPath));

            }

            if (!Directory.Exists(rhsPath))

            {

                throw new LocalLocalDiffException(string.Format("directory not found: {0}", rhsPath));

            }

            Console.WriteLine("comment: Diffing item sets");

            DiffItemSets(lhsPath, rhsPath);

        }

        private static Dictionary<string, bool> CreateLocalHash(string localPath)

        {

            Dictionary<string, bool> resultHash = new Dictionary<string, bool>();

            DirectoryWalker dw = new DirectoryWalker(

                    new DirectoryWalker.ProcessDirCallback(delegate(DirectoryInfo d, int level, object obj)

                    {

                        printStatus(d.FullName);

                        Dictionary<string, bool> results = (Dictionary<string, bool>)obj;

                        results[getNormalizedLocalPath(localPath, d.FullName)] = true;

                    }),

                    new DirectoryWalker.ProcessFileCallback(delegate(FileInfo f, int level, object obj)

                    {

                        printStatus(f.FullName);

                        Dictionary<string, bool> results = (Dictionary<string, bool>)obj;

                        results[getNormalizedLocalPath(localPath, f.FullName)] = false;

                    })

                );

            dw.Walk(localPath, resultHash);

            return resultHash;

        }

        private static string getNormalizedLocalPath(string rootPath, string filePath)

        {

            return filePath.Substring(rootPath.Length).Replace('\\', '/');

        }

        static DateTime lastMessage = DateTime.Now;

        private static void printStatus(string msg)

        {

            if (DateTime.Now.Subtract(lastMessage).Seconds > 30)

            {

                Console.WriteLine("comment: {0}", msg);

                lastMessage = DateTime.Now;

            }

        }

        private static void DiffItemSets(string lhsRootPath, string rhsRootPath)

        {

            int same = 0;

            int lonly = 0;

            int ronly = 0;

            Console.WriteLine("comment: loading {0}", lhsRootPath);

            Dictionary<string, bool> lhsHash = CreateLocalHash(lhsRootPath);

            Console.WriteLine("comment: loading {0}", rhsRootPath);

            Dictionary<string, bool> rhsHash = CreateLocalHash(rhsRootPath);

            PruningTree<Pair<string, bool>> lhsOnlyTree = new PruningTree<Pair<string, bool>>();

            Console.WriteLine("comment: running diff algorithm");

            foreach (string lPath in lhsHash.Keys)

            {

                printStatus(lPath);

                bool lhsIsDir = lhsHash[lPath];

                bool rhsIsDir;

                if (rhsHash.TryGetValue(lPath, out rhsIsDir))

                {

       if (lhsIsDir == rhsIsDir)

                    {

                        same++;

                    }

                    rhsHash.Remove(lPath);

                }

                else

                {

                    lhsOnlyTree.Add(

   lPath.Split(new char[] { '\\' }, StringSplitOptions.RemoveEmptyEntries),

                        new Pair<string,bool>(lPath, lhsIsDir));

                    lonly++;

                }

            }

            foreach (Pair<string, bool> lhOnlyitem in lhsOnlyTree)

            {

                string verb = (lhOnlyitem.Second) ? "left-only-r" : "left-only";

                Console.WriteLine("{0}: {1}", verb, lhOnlyitem.First);

            }

            PruningTree<Pair<string, bool>> rhsOnlyTree = new PruningTree<Pair<string, bool>>();

            foreach (string rPath in rhsHash.Keys)

            {

                rhsOnlyTree.Add(

                    rPath.Split(new char[] { '\\', '/' }, StringSplitOptions.RemoveEmptyEntries),

                    new Pair<string, bool>(rPath, rhsHash[rPath]));

                ronly++;

            }

            foreach (Pair<string, bool> rhOnlyitem in rhsOnlyTree)

            {

                string verb = (rhOnlyitem.Second) ? "right-only-r" : "right-only";

                Console.WriteLine("{0}: {1}", verb, rhOnlyitem.First);

            }

            Console.WriteLine("summary: Compared {0} items", ronly + lonly + same);

            Console.WriteLine("summary: Same: {0}", same);

            Console.WriteLine("summary: left only: {0}", lonly);

            Console.WriteLine("summary: right only: {0}", ronly);

        }

    }

    class PruningTreeNode<T>

    {

        bool terminal;

        string nodeValue;

        T tagValue;

    List<PruningTreeNode<T>> children = null;

        public PruningTreeNode(string value, bool isTerminal, T tag)

        {

            nodeValue = value;

            terminal = isTerminal;

            tagValue = tag;

        }

        public T Tag

    {

            get

            {

                return tagValue;

            }

            set

            {

                tagValue = value;

            }

        }

        public string Value

        {

            get

            {

                return nodeValue;

            }

        }

        public bool Terminal

        {

            get

            {

                return terminal;

            }

            set

            {

                terminal = value;

            }

        }

        public List<PruningTreeNode<T>> Children

        {

            get

            {

                if (children == null)

                {

                    children = new List<PruningTreeNode<T>>();

                }

                return children;

            }

        }

        public bool HasChildren

        {

            get

            {

                return (!terminal && children != null && children.Count > 0);

            }

        }

        public bool TryGetChild(string childValue, out PruningTreeNode<T> item)

        {

            bool has = false;

            item = null;

            if (!terminal && children != null)

            {

                foreach (PruningTreeNode<T> node in children)

                {

                    if (node.Value == childValue)

                    {

                        item = node;

                        has = true;

                        break;

                    }

                }

            }

            return has;

        }

        public override string ToString()

        {

            if (terminal)

            {

                return tagValue.ToString();

            }

            else

            {

                return string.Format("Non-terminal: {0}", nodeValue);

            }

        }

    }

    class PruningTree<T> : IEnumerable<T>

    {

        PruningTreeNode<T> root;

        public PruningTree()

        {

            root = new PruningTreeNode<T>(null, false, default(T));

        }

        public void Add(string[] paths, T tag)

        {

            int depth = paths.Length;

            PruningTreeNode<T> current = root;

            if (depth > 0)

            {

                int i = 0;

                for (; i < depth; i++)

                {

                    if (current.Terminal)

                    {

                        // we hit a terminal node so we don't have to keep looking

                        break;

                    }

                    PruningTreeNode<T> tempNode;

                    if (current.TryGetChild(paths[i], out tempNode))

                    {

                        current = tempNode;

                        if (i == depth - 1)

                        {

                            // PRUNING children ...

                            // we are at the terminal node of the list and the node is

                            // in the tree - so either there are children and this is a

                            // terminal parent (prune) or there is a duplicate in the list

                            // (favor the last one seen)

                            current.Children.Clear();

                            current.Terminal = true;

                            current.Tag = tag;

                        }

                    }

                    else

                    {

                        // Adding a new node - may or may not be terminal.

                        bool terminal = (i == depth - 1);

                        T tagTemp = (terminal) ? tag : default(T);

                        tempNode = new PruningTreeNode<T>(paths[i], terminal, tagTemp);

                        current.Children.Add(tempNode);

                        current = tempNode;

                    }

                }

            }

        }

        private void flattenTree_internal(PruningTreeNode<T> current, List<PruningTreeNode<T>> nodes)

        {

            if (current.Terminal)

            {

                nodes.Add(current);

            }

            else if (current.HasChildren)

            {

                foreach (PruningTreeNode<T> child in current.Children)

                {

                    flattenTree_internal(child, nodes);

                }

            }

        }

        private List<PruningTreeNode<T>> flattenTree()

        {

            List<PruningTreeNode<T>> nodes = new List<PruningTreeNode<T>>();

            flattenTree_internal(root, nodes);

            return nodes;

        }

        #region IEnumerable<T> Members

        public IEnumerator<T> GetEnumerator()

        {

            List<PruningTreeNode<T>> nodes = flattenTree();

            foreach (PruningTreeNode<T> node in nodes)

            {

                yield return node.Tag;

            }

        }

        #endregion

  #region IEnumerable Members

        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()

        {

            return GetEnumerator();

        }

        #endregion

    }

    internal class LocalLocalDiffException : Exception

    {

        public LocalLocalDiffException(string msg)

            : base(msg)

        { }

    }

    internal class Pair<T1, T2>

    {

        public T1 First;

        public T2 Second;

        public Pair(T1 first, T2 second)

        {

     First = first;

            Second = second;

        }

    }

    /*

        A Programmer's Introduction to C# (Second Edition)

        by Eric Gunnerson

        Publisher: Apress L.P.

        ISBN: 1-893115-62-3

    */

    // 32 - .NET Frameworks Overview\InputOutput\Traversing Directories

    // copyright 2000 Eric Gunnerson

    internal class DirectoryWalker

    {

        public delegate void ProcessDirCallback(DirectoryInfo dir, int level, object obj);

        public delegate void ProcessFileCallback(FileInfo file, int level, object obj);

        public DirectoryWalker(ProcessDirCallback dirCallback,

        ProcessFileCallback fileCallback)

        {

            this.dirCallback = dirCallback;

            this.fileCallback = fileCallback;

        }

        public void Walk(string rootDir, object obj)

        {

            DoWalk(new DirectoryInfo(rootDir), 0, obj);

        }

        void DoWalk(DirectoryInfo dir, int level, object obj)

        {

            foreach (FileInfo f in dir.GetFiles())

            {

                if (fileCallback != null)

                    fileCallback(f, level, obj);

            }

            foreach (DirectoryInfo d in dir.GetDirectories())

            {

                if (dirCallback != null)

                    dirCallback(d, level, obj);

                DoWalk(d, level + 1, obj);

            }

        }

        ProcessDirCallback dirCallback;

        ProcessFileCallback fileCallback;

    }

}