TIL About BFG Repo-Cleaner

If you ever migrate code from Bitbucket to GitHub, you will discover the hard way that GitHub by default rejects files larger than 100MB (unless you pay extra for Git Large File Storage). At that point, you will probably realize that GitHub isn't really the right place to store such large files, and that you are better off moving the data to S3 or somewhere else.

However, you quickly realize that even if you remove the large file with git rm, you still can't push the repo to GitHub, because the file exists not only in the current commit but also in the earlier commits that make up the history.

So you are left with two alternatives: the classic rm -rf .git, which wipes the entire repo history and is not always an option, or removing the large files completely from the history.

That is exactly what BFG Repo-Cleaner does: it is a tool that rewrites a repository's commit history to remove large files from every commit.

The BFG protects the contents of your latest commit by default, so make sure the large files are already deleted there. You can read the usage on the project's site, but here are the steps to follow (with some-big-repo.git being the repository's name on Bitbucket and 100MB the minimum size of the files you wish to remove):

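# Clone a bare mirror of the repository, including all of its refs
$ git clone --mirror git://example.com/some-big-repo.git
# Strip blobs bigger than 100MB from the history (the latest commit is protected by default)
$ java -jar bfg.jar --strip-blobs-bigger-than 100M some-big-repo.git
$ cd some-big-repo.git
# Expire the reflog and garbage-collect so the stripped data is actually deleted
$ git reflog expire --expire=now --all && git gc --prune=now --aggressive
# Push the rewritten history back to the remote
$ git push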
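
If you want to see which files in the history are over the limit before running the BFG, or check that they are really gone afterwards, something like the following can help. This is only a minimal sketch: list_large_blobs is a made-up helper name (it is not part of git or the BFG), and it relies solely on standard git plumbing commands. It takes the size limit in bytes and has to be run from inside the repository.

```shell
# Hypothetical helper: print "size<TAB>path" for every blob reachable from
# any ref whose size is at least LIMIT_BYTES, largest first.
list_large_blobs() {
  limit="$1"
  # Walk every object reachable from any ref, emitting "<sha> <path>" lines
  git rev-list --objects --all |
    # Look up size and type of each object; %(rest) carries the path through
    git cat-file --batch-check='%(objectsize) %(objecttype) %(rest)' |
    # Keep only blobs at or above the limit, and tidy the output
    awk -v limit="$limit" '$2 == "blob" && $1 + 0 >= limit + 0 { sub(/ blob /, "\t"); print }' |
    sort -rn
}
```

For example, list_large_blobs 104857600 lists blobs of 100 MiB or more. A file that was deleted with git rm still shows up here as long as an older commit references it, and after the cleanup steps above it should no longer be listed.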