Manually Merging Posts Between WordPress Sites

My recommendation has always been to avoid workflows that involve deploying staging to/from production. That’s because merging WordPress websites is really hard, if not nearly impossible, to pull off. There are just too many moving pieces to track. 👀

Mergebot was a brilliant attempt at solving the merging problem. Unfortunately, that project has been abandoned and I haven’t seen an elegant replacement. That said, given an ideal scenario, let’s see how merging might be accomplished using just WP-CLI and PHP scripts.

A single custom post type, a few custom taxonomies and 5 custom fields… now let’s merge the content!

The scenario I’ll be describing will be painfully familiar to some of you. Nine months ago I created a staging copy of my church’s WordPress website. On the staging site the following changes were made:

  • A new WordPress theme was installed and activated.
  • Elementor page builder was added, and most of the content moved to the Elementor editor.
  • Various content updates to navigation and widgets.

Now fast forward to today and everything on the staging site is ready to be pushed to the production site. But before that can be done… changes made to the production site need to be pulled back into the staging site.

I say this is an ideal website to merge with production because, during the last 9 months, the only changes made there were weekly audio posts handled by a single custom post type. This site has no e-commerce, user account changes, or contact forms to worry about. This should be a fairly straightforward task. Right? Let’s dig in.

Custom post type compared between staging and production.

Step #1: Run custom export script on the production website.

On the production site I created a folder called /migrations/ and placed a file in it called export-audio.php. From that folder I ran wp eval-file export-audio.php with WP-CLI, which generates a posts.json file containing all of the data for the custom post type named audio. The three related taxonomies (series, speakers, years) and the five related meta fields are named explicitly in the script.

<?php
$posts_json = "posts.json";

if ( ! file_exists( $posts_json ) ) {
    file_put_contents( $posts_json, '[]' );
}

$posts_json_data = json_decode( file_get_contents( $posts_json ) );
$post_ids        = array_column( $posts_json_data, 'post_id' );

$args  = [
    "post_type"      => "audio",
    "post_status"    => "publish",
    "posts_per_page" => "-1",
];

$posts = get_posts( $args );

foreach ( $posts as $post ) {

    $series   = get_the_terms( $post->ID, 'series' );
    $speakers = get_the_terms( $post->ID, 'speakers' );
    $years    = get_the_terms( $post->ID, 'years' );

    // Update current exported post
    if ( in_array( $post->ID, $post_ids ) ) {
        $key                                = array_search( $post->ID, array_column( $posts_json_data, 'post_id' ) );
        $posts_json_data[$key]->title       = $post->post_title;
        $posts_json_data[$key]->description = $post->post_content;
        $posts_json_data[$key]->series      = is_array( $series ) ? array_column( $series, 'term_taxonomy_id' ) : [];
        $posts_json_data[$key]->speakers    = is_array( $speakers ) ? array_column( $speakers, 'term_taxonomy_id' ) : [];
        $posts_json_data[$key]->years       = is_array( $years ) ? array_column( $years, 'term_taxonomy_id' ) : [];
        $posts_json_data[$key]->data        = [
            "audio_soundcloud"        => get_post_meta( $post->ID, "audio_soundcloud", true ),
            "audio_soundcloud_length" => get_post_meta( $post->ID, "audio_soundcloud_length", true ),
            "audio_soundcloud_url"    => get_post_meta( $post->ID, "audio_soundcloud_url", true ),
            "audio_soundcloud_size"   => get_post_meta( $post->ID, "audio_soundcloud_size", true ),
            "audio_file_english"      => get_post_meta( $post->ID, "audio_file_english", true ),
        ];
        continue;
    }

    // Export post
    $posts_json_data[] = [
        "post_id"     => $post->ID,
        "created_at"  => $post->post_date,
        "title"       => $post->post_title,
        "description" => $post->post_content,
        "series"      => is_array( $series ) ? array_column( $series, 'term_taxonomy_id' ) : [],
        "speakers"    => is_array( $speakers ) ? array_column( $speakers, 'term_taxonomy_id' ) : [],
        "years"       => is_array( $years ) ? array_column( $years, 'term_taxonomy_id' ) : [],
        "data"        => [
            "audio_soundcloud"        => get_post_meta( $post->ID, "audio_soundcloud", true ),
            "audio_soundcloud_length" => get_post_meta( $post->ID, "audio_soundcloud_length", true ),
            "audio_soundcloud_url"    => get_post_meta( $post->ID, "audio_soundcloud_url", true ),
            "audio_soundcloud_size"   => get_post_meta( $post->ID, "audio_soundcloud_size", true ),
            "audio_file_english"      => get_post_meta( $post->ID, "audio_file_english", true ),
        ]
    ];

}

file_put_contents( $posts_json, json_encode( $posts_json_data, JSON_PRETTY_PRINT ) );

Example generated posts.json
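
The export script is safe to re-run because it updates a record when its post ID is already in posts.json and appends a new record otherwise. Here is a minimal sketch of that upsert pattern on its own, without WordPress. The function name upsert_record and the sample records are hypothetical, and the records are plain arrays here rather than the objects json_decode() returns in the script above.

```php
<?php
/**
 * Insert or update one exported record, keyed by 'post_id'.
 * Hypothetical helper for illustration; not part of the original script.
 */
function upsert_record( array $records, array $record ): array {
    foreach ( $records as $index => $existing ) {
        if ( $existing['post_id'] === $record['post_id'] ) {
            // Merge so fields the export does not own (such as a
            // new_post_id written by the import) are kept.
            $records[ $index ] = array_merge( $existing, $record );
            return $records;
        }
    }

    // No record with this post_id yet, so append it.
    $records[] = $record;
    return $records;
}

// Sample data is made up for the example.
$records = [
    [ 'post_id' => 10, 'title' => 'Old title', 'new_post_id' => 99 ],
];
$records = upsert_record( $records, [ 'post_id' => 10, 'title' => 'New title' ] );
$records = upsert_record( $records, [ 'post_id' => 11, 'title' => 'Another post' ] );

echo json_encode( $records, JSON_PRETTY_PRINT ), "\n";
```

Keying on the production post ID is what lets the same file be exported repeatedly without piling up duplicates.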

Step #2: Pull in only new posts into staging site

On the staging site, I created a /migrations/ folder, placed a file called import-audio.php in it, and copied in the posts.json generated by the export script. Before running the import, take a quick database backup with wp db export in case the import goes off the rails. Next, run the import with wp eval-file import-audio.php. The script loops through the exported records and creates a new post for each one that is missing on staging.

<?php
$posts_json = "posts.json";

$tax_map_ids = [
    "180" => "188",
    "181" => "193",
    "182" => "194",
    "183" => "189",
    "184" => "190",
    "185" => "191",
    "186" => "192",
];

if ( ! file_exists( $posts_json ) ) {
    file_put_contents( $posts_json, '[]' );
}

$posts_json_data = json_decode( file_get_contents( $posts_json ) );

$args  = [
    "post_type"      => "audio",
    "post_status"    => "publish",
    "posts_per_page" => "-1",
];

$posts    = get_posts( $args );
$post_ids = array_column( $posts, 'ID' );

foreach ( $posts_json_data as $key => $post ) {

    $compare_post = get_post( $post->post_id );

    // Verify a new post hasn't been generated
    if ( ! empty( $post->new_post_id ) ) {
        continue;
    }

    // Verify destination post matches created at time
    if ( in_array( $post->post_id, $post_ids ) && $post->created_at == $compare_post->post_date ) {
        if ( $compare_post->post_title != $post->title ) {
            echo "Updating title\n";
            wp_update_post( [ 'ID' => $post->post_id, "post_title" => $post->title ] );
        }
        if ( $compare_post->post_content != $post->description ) {
            echo "Updating description\n";
            wp_update_post( [ 'ID' => $post->post_id, "post_content" => $post->description ] );
        }
        continue;
    }

    $speakers = $posts_json_data[$key]->speakers;
    $series   = $posts_json_data[$key]->series;
    $years    = $posts_json_data[$key]->years;

    // Translate production term IDs to staging term IDs. Cast to int so
    // wp_set_post_terms() treats each value as a term ID, not a term name.
    foreach ( $speakers as $speaker_key => $speaker ) {
        if ( array_key_exists( $speaker, $tax_map_ids ) ) {
            $speakers[ $speaker_key ] = (int) $tax_map_ids[ $speaker ];
        }
    }

    foreach ( $series as $serie_key => $serie ) {
        if ( array_key_exists( $serie, $tax_map_ids ) ) {
            $series[ $serie_key ] = (int) $tax_map_ids[ $serie ];
        }
    }

    // Made it this far so generate a new post as it's missing.
    echo "Generating new post ID for #{$post->post_id}\n";
    $args = [
        'post_date'    => $post->created_at,
        'post_title'   => $post->title,
        'post_content' => $post->description,
        'post_type'    => 'audio',
        'post_status'  => 'publish',
        'meta_input'   => (array) $post->data, // json_decode() returns an object here
    ];
    $new_post_id                        = wp_insert_post( $args );
    $posts_json_data[$key]->new_post_id = $new_post_id;
    wp_set_post_terms( $new_post_id, $speakers, 'speakers' );
    wp_set_post_terms( $new_post_id, $series, 'series' );
    wp_set_post_terms( $new_post_id, $years, 'years' );
    continue;

}

file_put_contents( $posts_json, json_encode( $posts_json_data, JSON_PRETTY_PRINT ) );
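
For each exported record, the import script takes one of three paths: skip it, update it in place, or create a new post. Comparing the creation time as well as the ID matters because both sites kept handing out post IDs independently after the staging copy was made, so the same ID can point at two unrelated posts. Here is a rough sketch of that decision as a standalone function. The name classify_record, the $staging_dates lookup and the sample dates are all hypothetical.

```php
<?php
/**
 * Decide what the import should do with one exported record.
 * Hypothetical helper; $staging_dates maps staging post ID => post_date.
 */
function classify_record( object $record, array $staging_dates ): string {
    if ( ! empty( $record->new_post_id ) ) {
        return 'skip';   // an earlier run already created this post
    }

    $same_id = isset( $staging_dates[ $record->post_id ] );
    if ( $same_id && $staging_dates[ $record->post_id ] === $record->created_at ) {
        return 'update'; // same ID and same creation time: treat as the same post
    }

    return 'create';     // missing on staging, or the ID belongs to a different post
}

// Made-up sample data for the example.
$staging_dates = [ 10 => '2020-01-05 09:00:00', 11 => '2020-03-01 12:00:00' ];
$records       = json_decode( '[
    {"post_id":10,"created_at":"2020-01-05 09:00:00"},
    {"post_id":11,"created_at":"2020-02-02 09:00:00"},
    {"post_id":12,"created_at":"2020-02-09 09:00:00","new_post_id":300}
]' );

$decisions = [];
foreach ( $records as $record ) {
    $decisions[ $record->post_id ] = classify_record( $record, $staging_dates );
    echo $record->post_id, ': ', $decisions[ $record->post_id ], "\n";
}
```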

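Verify that the imported data looks correct. If it doesn’t, roll back the database with wp db import <backup-file-name.sql>, tweak the code and try the import again. Copy in a fresh posts.json before retrying, because the import writes a new_post_id into the file and skips those records on the next run.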
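
One quick sanity check is to see which records the import stamped with a new_post_id. Below is a minimal sketch of that check, assuming the record shape used in posts.json. The function name summarize_import and the sample JSON are hypothetical.

```php
<?php
/**
 * Collect the records that were given a new_post_id by the import.
 * Hypothetical helper; assumes the record shape used in posts.json.
 */
function summarize_import( array $records ): array {
    $created = [];
    foreach ( $records as $record ) {
        if ( ! empty( $record->new_post_id ) ) {
            $created[ $record->post_id ] = $record->new_post_id;
        }
    }

    return [
        'total'   => count( $records ),
        'created' => $created, // production post ID => new staging post ID
    ];
}

// Made-up sample; in practice: json_decode( file_get_contents( 'posts.json' ) ).
$json    = '[{"post_id":10,"title":"A"},{"post_id":11,"title":"B","new_post_id":205}]';
$summary = summarize_import( json_decode( $json ) );

printf(
    "%d of %d records created new posts\n",
    count( $summary['created'] ),
    $summary['total']
);
```

Records that matched an existing staging post never receive a new_post_id, so they count toward the total only.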

Step #3: Final backup then deploy staging to production

After a bit of trial and error, I did manage to successfully merge over 56 missing posts and launch the new website! 🎉 This was significantly faster than the copy/paste alternative. Not to mention much more accurate.

Kinsta’s staging to production deploy dialog

Merging is a complex process.

Doing it manually requires a deep understanding of where all the data is stored within WordPress. You may have noticed my import script included an array named $tax_map_ids, which had hard-coded taxonomy term IDs. That was because I manually recreated the missing taxonomy terms on my staging site. These few terms ended up with different term IDs, so each one needed to be translated to the proper staging term ID when importing.
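
I built the ID map by hand, but it could plausibly be generated by matching term slugs between the two sites. The sketch below assumes each site can list the terms of one taxonomy as ID => slug, which the export above does not currently record. The helper names build_term_map and map_term_ids and the slugs are hypothetical; the two ID pairs are taken from $tax_map_ids.

```php
<?php
/**
 * Build a production-ID => staging-ID map by matching term slugs.
 * Hypothetical helper; both inputs are [ term ID => slug ] for ONE taxonomy.
 */
function build_term_map( array $production_terms, array $staging_terms ): array {
    $staging_by_slug = array_flip( $staging_terms ); // slug => staging ID
    $map             = [];

    foreach ( $production_terms as $id => $slug ) {
        // Only record terms that exist on staging under a different ID.
        if ( isset( $staging_by_slug[ $slug ] ) && $staging_by_slug[ $slug ] !== $id ) {
            $map[ $id ] = $staging_by_slug[ $slug ];
        }
    }

    return $map;
}

/** Translate a list of term IDs, leaving unmapped IDs unchanged. */
function map_term_ids( array $ids, array $map ): array {
    return array_map(
        static fn( $id ) => (int) ( $map[ $id ] ?? $id ),
        $ids
    );
}

// Made-up slugs for the example.
$production = [ 180 => 'guest-speaker', 181 => 'advent', 42 => '2019' ];
$staging    = [ 188 => 'guest-speaker', 193 => 'advent', 42 => '2019' ];

$map        = build_term_map( $production, $staging );
$translated = map_term_ids( [ 180, 42 ], $map );
print_r( $translated );
```

Slugs are only unique within a taxonomy, so a real version would need to build one map per taxonomy rather than a single shared array.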

Avoid merging when possible.

Everything about merging gets complex really quickly. Yes, it’s possible to write a custom script to merge data between separate sites. In fact, I’d consider the above tutorial a solid workflow for doing just that. However, I would strongly caution against merging data manually because of how much tedious, site-specific code it requires.

Really, what should exist is a product or service which handles all this complex merging for you. If that sounds like something you wish to build, I encourage you to use this post as a starting point. 🤖