[Openmcl-devel] ccl 1.11 release and GitHub

Ron Garret ron at flownet.com
Tue Feb 14 04:18:26 PST 2017


On Feb 14, 2017, at 12:09 AM, Ralf Mattes <rm at seid-online.de> wrote:

> On Mon, Feb 13, 2017 at 08:24:15PM -0600, mikel evins wrote:
>> 
>> 
>> The main problem with storing binaries in git repositories is that git's diff algorithm is designed for diffing text files. It doesn't get particularly good results in the general case on binary objects. The practical result is that, in effect, any time you change a binary object, no matter how small the change, git will store a whole new copy of the entire binary object.
> 
> 
> ??? Are you sure this is correct? AFAIK git _always_ stores files and
> not diffs. That's one of the biggest differences with older/traditional
> version control systems. That's the reason checkouts/branch switching is
> so blazingly fast. 
> 
>> That means that repos containing binaries that change often will tend to grow in size very much faster than repos that contain only text.
> 
> I think this is wrong. Change one whitespace in a text file and a new
> object is created (named by its SHA-1 hash. Commits pretty much only
> point to a collection of SHA-1 hashes).

You are correct, Ralf.  Git never stores diffs.  The reason binaries cause repo bloat is that they tend to be much bigger than source files.
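
To make the point concrete, here is a minimal sketch of how git names a file's contents (a "blob"): the object ID is the SHA-1 hash of a short header plus the full contents. The function name git_blob_id and the sample file contents are made up for illustration, and the sketch only covers object naming, not how git compresses or packs objects on disk.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the ID git assigns to a blob with these contents.

    git hashes the header "blob <size>\\0" followed by the raw bytes.
    (git_blob_id is an illustrative name, not part of git itself.)
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical file contents, used only for illustration.
original = b'(defun hello () (print "hi"))\n'
edited = original + b" "  # append a single space

# The two versions get unrelated IDs, so each is a separate whole object,
# whether the file is a small source file or a large binary image.
print(git_blob_id(original))
print(git_blob_id(edited))
```

Running `git hash-object` on a file with the same bytes should print the same ID, which is one way to check the sketch against a real git installation.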

rg



