Regex with WordPress Database Backups

For too long, regular expressions or regex have been mysterious to me. With the help of the fantastic video series at Laracasts the quirky pattern matching syntax has started to make sense. The following are some experiments with pattern matching on WordPress database backups.

Extracting data directly from database files

Ok so this use case might seem a little silly, but on my backup server I have a database backups for each of my WordPress sites. Currently the only way for me to read them is to create a local MySQL database, import the backup, then talk to it with SQL statements.

This might make sense for complex queries however it’s overkill considering I just want to read some basic info from the wp_options tables. Specifically I want to see the active WordPress plugins from the active_plugins row. This regular expression will return the active_plugins value (see group #2 results): http://regexr.com/3gf9p.

From the command line you can use perl to run the pattern match and store the results in a bash variable.

active_plugins=`perl -n -e '/(\047active_plugins\047,\047)(.+?)(\047,\047yes\047\),)/&& print $2' ~/Backup/websitename.com/wp-content/mysql.sql`

The results will be serialized and slashes added. You can pass that through a PHP file to parse.

php unserialize.php $active_plugins

Here is what’s inside the unserialize.php file

<?php
print_r( unserialize( stripslashes( $argv[1] ) ) );

That will output the following.

Array (
[0] => worker/init.php
[1] => gravityforms/gravityforms.php
[2] => advanced-custom-fields-pro/acf.php
[3] => akismet/akismet.php
[4] => anchorhost-client/anchorhost-client.php
[5] => jetpack/jetpack.php
[6] => mainwp-child/mainwp-child.php
[7] => wordpress-importer/wordpress-importer.php
[8] => wp-cron-cleaner/wp-cron-cleaner.php
[9] => wp-force-lowercase-urls/wp-force-lowercase-urls.php
)

One thing you may have noticed is I’m using \047 instead of a single quote ' in the regex. That is due to issues with escaping on the command line.