Removing Sensitive Data from Git Repos


If you’ve ever worked with git for a WordPress project, you may at some point have accidentally added private keys and other sensitive data to the repo. It’s easy to do, especially when you’re working on a project that you don’t intend to share. Going back and modifying a past commit isn’t a simple one-liner. That’s sorta the point: once code is committed, the history is meant to stay put.

Good news – it is possible to change past commits.

I recently cleaned up a few git repos of mine and removed a bunch of sensitive data in prep to release the code publicly. Here’s a short walkthrough of what that process looks like.

Step 1: Remove all sensitive data from the current working directory.

For WordPress, the easiest thing to do is move all sensitive data into the wp-config.php file as constants like so.

define( 'CONSTELLIX_API_KEY', "123abc456efg" );
define( 'CONSTELLIX_SECRET_KEY', "xx123xx456xx" );

Then, wherever your code uses the keys directly, simply replace them with the equivalent PHP constants.

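function constellix_api_get( $command ) {

	// Millisecond timestamp, signed with the secret key defined in wp-config.php.
	$timestamp = round( microtime( true ) * 1000 );
	$hmac      = base64_encode( hash_hmac( 'sha1', $timestamp, CONSTELLIX_SECRET_KEY, true ) );
	$args      = array(
		'timeout' => 30,
		'headers' => array(
			'Content-type'         => 'application/json',
			// The API key now comes from a constant instead of a hard-coded string.
			'x-cnsdns-apiKey'      => CONSTELLIX_API_KEY,
			'x-cnsdns-hmac'        => $hmac,
			'x-cnsdns-requestDate' => $timestamp,
		),
	);

	// ... the rest of the function (the request itself) is unchanged.
}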

Step 2: Make a list of all sensitive data

Manually searching through every past git commit would be nearly impossible for any project. Luckily, we don’t need to search through and remove things by hand. To do an automatic search and replace, start by putting all of the sensitive data into a text file. It can be called anything; I’ll use passwords.txt in my example. Put one secret per line.
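As a rough sketch of what that file could look like, here it is built from the two placeholder values used in Step 1. Swap in your real secrets, and keep the file next to the repo rather than inside it so it never gets committed.

```shell
# One secret per line; these are the placeholder values from Step 1.
printf '%s\n' '123abc456efg' 'xx123xx456xx' > passwords.txt

# Sanity check: count how many entries the list has.
wc -l < passwords.txt
```

Avoid blank lines in the list, since tools that read patterns from a file may treat an empty line as something to match.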

Step 3: Run the removal tool 

This is handled through an amazing little tool named BFG Repo-Cleaner. You can see instructions on how to use it on their page. This is a condensed version:

  • Make a fresh copy of your git repo somewhere else on your computer using --mirror. This creates a new folder containing only the bare git data, with no working files.
    git clone --mirror git://example.com/my-repo.git
  • Run the removal. 
    bfg --replace-text passwords.txt my-repo.git
  • Change into the git repo and run some git cleanup commands.
    cd my-repo.git
    git reflog expire --expire=now --all && git gc --prune=now --aggressive
  • Lastly, push the changes to the repo. If this gives you trouble, make sure your repo has “Non Fast-Forward pushes” enabled. With Beanstalk, which I use, that’s a simple checkbox.
    git push
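Before that final push, it may be worth confirming that the rewritten history no longer mentions any of the listed strings. Here is a minimal sketch, assuming git is on your PATH and passwords.txt sits next to my-repo.git. The function name commits_touching_secrets is made up for this example and is not part of git or BFG.

```shell
# Hypothetical helper: list commits whose diffs add or remove any listed secret.
# $1 = path to the repo (a bare mirror works too), $2 = file with one secret per line.
commits_touching_secrets() {
    repo="$1"
    secrets_file="$2"
    while IFS= read -r secret || [ -n "$secret" ]; do
        # Skip blank lines in the list.
        [ -z "$secret" ] && continue
        # -S (the "pickaxe") reports commits that change how often the string occurs.
        git -C "$repo" log --all --oneline -S"$secret"
    done < "$secrets_file"
}

# Usage, from the folder that holds the mirror:
#   commits_touching_secrets my-repo.git passwords.txt
```

No output means no commit reachable from any branch or tag still adds or removes one of those strings. Keep in mind that BFG leaves the contents of your latest commit alone by default, which is why Step 1 has to happen first.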

Step 4: Delete all copies and make a fresh clone

Now that the repo has been cleaned, everyone working on the project should remove their copy and do a fresh clone. That will ensure no sensitive data is stored on other computers. At this point we are also done with the mirror copy. Delete that as well.

Step 5: Review and Repeat if needed

Again, you’ll want to review your current working files to make sure no sensitive data was missed. If you do find another password, you need to restart the entire process. Trust me, it’s best to be thorough from the beginning. Some of my repos took 3 or 4 tries before I found everything that needed to be removed. All of your past commits with sensitive info should now show BFG’s default placeholder, ***REMOVED***, in place of each secret.
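For the working-file review, something like the following could save a few of those repeat passes. It is only a sketch: it assumes a grep that supports --exclude-dir (GNU and BSD grep both do), and the function name files_with_secrets is invented for this example.

```shell
# Hypothetical helper: list working-tree files that still contain a listed secret.
# $1 = project directory, $2 = file with one secret per line.
files_with_secrets() {
    project_dir="$1"
    secrets_file="$2"
    # -r recurse, -l print file names only, -F treat patterns as fixed strings,
    # -f read the patterns from a file. --exclude-dir skips git's own data.
    grep -rlF --exclude-dir=.git -f "$secrets_file" "$project_dir"
}

# Usage, from the folder that holds your fresh clone:
#   files_with_secrets my-repo passwords.txt
```

A hit in wp-config.php is expected if that file lives inside the project folder; anything else it lists deserves a second look.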