[Openmcl-devel] ccl 1.11 release and GitHub

mikel evins mevins at me.com
Mon Feb 13 18:24:15 PST 2017


> On Feb 13, 2017, at 7:57 PM, Bill St. Clair <wws at clozure.com> wrote:
> 
> On Mon, Feb 13, 2017 at 7:26 PM, R. Matthew Emerson <rme at acm.org> wrote:
> 
> In the Subversion world, sure, we could tag an old release, and that would include suitable bootstrapping binaries and interface databases.
> 
> In the git world, a tag (as I understand it) is essentially a pointer to a particular commit.  It is still necessary to have suitable bootstrapping binaries that will work to compile the sources at that point in time.  As a concrete example of what I mean by this, note that a current development ccl (i.e., 1.12-dev) can't build a ccl from the 1.11 sources.
> 
> This suggests a solution to me.
> 
> CCL would have two GIT repositories:
> 
> 1) Source code, tagged with releases, branched with patches for particular releases. Move its tag when you add a bug-fix patch to a release.
> 
> 2) Binaries. Branched from the beginning with one branch per platform. Tagged to match the source tags, but with unique names for each branch:
> 
> 1) Tags: 1.9, 1.10, 1.11
> 
> 2) Branches: darwinx86, linuxarm, linuxppc, linuxx86, solarisx86, windows
> 
> <branch>: <tag>…
> darwinx86: 1.9-dx86, 1.10-dx86, 1.11-dx86
> linuxarm: 1.9-armcl, 1.10-armcl, 1.11-armcl
> linuxppc: 1.9-pp, 1.10-pp, 1.11-pp
> linuxx86: 1.9-lx86, 1.10-lx86, 1.11-lx86
> solarisx86: 1.9-sx86, 1.10-sx86, 1.11-sx86
> windows: 1.9-wx86, 1.10-wx86, 1.11-wx86
> 
> This would allow you to put the two directories side-by-side, and, on the linux-like platforms, have symbolic links from the source directory reference the binaries in the binary directory:
> 
> ccl/
>   trunk/    # Source for master branch
>     lx86cl -> ../trunk-bin/lx86cl
>     lx86cl.image -> ../trunk-bin/lx86cl.image
>     lx86cl64 -> ../trunk-bin/lx86cl64
>     lx86cl64.image -> ../trunk-bin/lx86cl64.image
>   trunk-bin/    # Binaries for master branch
>     lx86cl
>     lx86cl.image
>     lx86cl64
>     lx86cl64.image
> 
> We could have a script to run in the source directory to create the symbolic links from <dir>/foo to ../<dir>-bin/foo
> 
> One remaining issue is what to do with the header directories. We could either just have everybody always download all of them with the source, or put them with the binaries and use symbolic links from the source directory to get to them in the binary directory.
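
The link script described above could be sketched roughly like this. It is only an illustration in Python, assuming the side-by-side <dir> and <dir>-bin layout from the quoted message; the function name link_binaries and the list of file names passed to it are hypothetical, not part of any existing CCL tooling.

```python
from pathlib import Path


def link_binaries(source_dir, names):
    """Create relative symlinks <dir>/<name> -> ../<dir>-bin/<name>.

    Hypothetical helper: assumes the binary checkout sits next to the
    source checkout and is named after it with a "-bin" suffix.
    Returns the list of links that were created.
    """
    source = Path(source_dir).resolve()
    created = []
    for name in names:
        link = source / name
        # Relative target, so the pair of directories can be moved together.
        target = Path("..") / (source.name + "-bin") / name
        # Skip anything already present, including dangling symlinks.
        if link.is_symlink() or link.exists():
            continue
        link.symlink_to(target)
        created.append(link)
    return created
```

Run from the parent directory, link_binaries("trunk", ["lx86cl", "lx86cl.image"]) would produce links of the form shown in the quoted directory listing. Existing files are left alone rather than overwritten.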

The main problem with storing binaries in git repositories is that git's diff algorithm is designed for diffing text files. It doesn't get particularly good results in the general case on binary objects. The practical result is that, in effect, any time you change a binary object, no matter how small the change, git will store a whole new copy of the entire binary object.

That means that repos containing binaries that change often will tend to grow much faster than repos that contain only text.

That isn't necessarily a big problem, as long as the stored binaries don't change very often. In fact, you should be able to do a little back-of-the-envelope arithmetic to determine how big a problem it would be to store CCL binaries in a git repo. Go over the release history for the past couple of years and count up the number of times each binary object has changed. For each binary object, multiply its size by the number of times it changed. Add all those together for a rough estimate of the total size of the repo.
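
As a rough sketch of that arithmetic, something like the following Python would do. The function name and the sizes and change counts in the example are placeholders made up for illustration, not measured CCL figures; it also assumes the worst case described above, where every change stores a full new copy.

```python
def estimate_repo_growth(binaries):
    """Sum size * number-of-changes over all binary objects.

    `binaries` is an iterable of (name, size_in_bytes, change_count)
    tuples. The result is a rough upper bound in bytes, assuming each
    change stores a complete new copy of the object.
    """
    return sum(size * changes for _name, size, changes in binaries)


# Placeholder numbers for illustration only; substitute real sizes and
# change counts taken from the release history.
example = [
    ("lx86cl64", 1_000_000, 10),
    ("lx86cl64.image", 20_000_000, 10),
]

total = estimate_repo_growth(example)
print(f"estimated growth: {total / 1e6:.0f} MB")
```

With real numbers filled in per platform, the total gives a quick sense of whether keeping the binaries directly in git is tolerable.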

The common strategy for storing binaries for git repos is to store them in something other than git, and use a git extension that stores pointer files in the repo. The pointer files refer to the non-git storage for the binary files, and enable git to fetch and store the appropriate binary objects during push and pull operations.

Here's a recent discussion of the problem and the current crop of solutions:

  https://github.com/openframeworks/openFrameworks/wiki/Moving-binaries-out-of-the-repo

GitHub provides support for git-LFS.




