Importing a Big Honkin’ BlogML.xml Into WordPress

I started blogging in 2003. The blog application I started with was called .Text. After a couple of years I migrated to Community Server, then to BlogEngine.NET. The blog counts for those 10 years were 2557 posts and 3238 comments. So when I decided to migrate to WordPress I ended up with a HUGE BlogML.xml file, well over 10 megabytes.

The WordPress BlogML import plugin has an upload size limit of 2 megabytes. The limit actually comes from PHP and is set in the php.ini file (upload_max_filesize defaults to 2M). You can bump it up as I did, but I discovered the default maximum is 2 megabytes for a reason. Go much larger and you’ll end up dropping posts, so I decided to break up my BlogML.xml into manageable chunks, 8 BlogML.xml files in fact. It was not as big a deal as you might suspect, as we’ll see.

Here is my WordPress BlogML initial Import screen with the ramped-up maximum size enabled. The Maximum size display is based on the server’s PHP configuration and changes automatically when the configuration is changed.

Before we get started, here’s a tip to reduce our workload and the size of the BlogML.xml export file from BlogEngine.NET or another blogging application. Check for spam comments that your spam filter may have handled but that are still sitting in your database. The Spam comment count below may display “0”, but it was over 6000 before I purged it. That’s a lot of megabytes, and a lot of spam that would otherwise end up on your sweet WordPress blog.

Breaking up is not hard to do

Here are tips on breaking up the BlogML.xml. We first do the usual BlogEngine.NET-to-WordPress modifications on the single BlogML.xml file. I blogged about that in .NET to WordPress: Migrating BlogEngine.NET.  Now we can start creating BlogML.xml babies.

The general instructions for breaking up a big honkin’ BlogML.xml are:

  1. We import the categories in the initial BlogML.xml file ONLY. In all others we clear the <categories /> tag. The WordPress BlogML Importer will match up the correct categories in the subsequent BlogML.xml files.
  2. We include the BlogML.xml head in each file, and the concluding </posts> and </blog> tags (which I’ll call the BlogML.xml tail.)
  3. When we cut from the master BlogML.xml file we can start at any <post> and grab a bunch of them to paste between the head and tail of a child BlogML.xml.
  4. After importing the initial BlogML.xml containing the categories, subsequent BlogML.xml files can be imported in any order since their display in WordPress is based on post publication date.
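
If cutting and pasting by hand sounds tedious, the steps above can be sketched in Python. Treat this as a rough sketch rather than a tested migration tool: the function names (load_blogml, split_blogml, write_chunks) and the byte budget are made up for illustration, and it assumes a well-formed file with a single <posts> element directly under <blog>. One caveat: ElementTree writes CDATA sections back out as escaped text. That is equivalent XML, but I have not checked how the importer treats it.

```python
import copy
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix ElementTree puts on a tag name."""
    return tag.rsplit("}", 1)[-1]


def child(parent, name):
    """Return the first direct child with the given local name, or None."""
    for el in parent:
        if local_name(el.tag) == name:
            return el
    return None


def empty(el):
    """Remove all child elements but keep the element itself."""
    for sub in list(el):
        el.remove(sub)


def load_blogml(path):
    """Parse the master file and keep its namespace as the default on output."""
    root = ET.parse(path).getroot()
    if root.tag.startswith("{"):
        ET.register_namespace("", root.tag[1:].split("}", 1)[0])
    return root


def split_blogml(root, max_bytes):
    """Split a parsed BlogML document into several, each roughly max_bytes."""
    # Head and tail: a copy of the whole document with every post removed.
    shell = copy.deepcopy(root)
    empty(child(shell, "posts"))

    chunks, size = [], 0
    for post in child(root, "posts"):
        # Serialized size of one post (a slight overestimate, which is safe).
        post_size = len(ET.tostring(post, encoding="utf-8"))
        if not chunks or size + post_size > max_bytes:
            chunk = copy.deepcopy(shell)
            if chunks:
                # Only the first file keeps the top-level category definitions.
                cats = child(chunk, "categories")
                if cats is not None:
                    empty(cats)
            chunks.append(chunk)
            size = len(ET.tostring(chunk, encoding="utf-8"))
        child(chunks[-1], "posts").append(copy.deepcopy(post))
        size += post_size
    return chunks


def write_chunks(chunks, prefix):
    """Write each chunk to prefix1.xml, prefix2.xml, ... and return the paths."""
    paths = []
    for i, chunk in enumerate(chunks, start=1):
        path = f"{prefix}{i}.xml"
        ET.ElementTree(chunk).write(path, encoding="utf-8", xml_declaration=True)
        paths.append(path)
    return paths
```

Something like write_chunks(split_blogml(load_blogml("BlogML.xml"), 2_000_000), "BlogML-part") would then produce BlogML-part1.xml, BlogML-part2.xml and so on, with the category definitions only in the first file. The category references inside each post are left alone, since only the top-level <categories /> tag is cleared.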

Here are a few screenshots covering our general instructions, starting with what I meant by a cleared <categories /> tag in all secondary BlogML.xml files.

Here is where we could start pulling posts from a big honkin’ BlogML.xml to create a smaller one. We can start pulling posts at any <post>. It does not have to be the first one after the BlogML.xml header area.

Here is where we would stop, along with the BlogML.xml tail tags we need to include in every child BlogML.xml.

Here are the BlogML.xml files required to import my 10 years of blogging. Notice the file sizes are around 2 megabytes. If you didn’t increase the php.ini limits as I did, you would need to keep every file under 2 MB.
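
Before importing, it may be worth confirming that the child files together hold every post from the master, exactly once. Here is a small sketch of that check. The helper names are hypothetical, and it assumes each <post> carries an id attribute and sits directly under <posts>.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def post_ids(root):
    """Return the id of every <post> directly under <posts>, ignoring namespaces."""
    for section in root:
        if section.tag.rsplit("}", 1)[-1] == "posts":
            return [p.get("id", "") for p in section
                    if p.tag.rsplit("}", 1)[-1] == "post"]
    return []


def compare(master_root, chunk_roots):
    """Return (missing, duplicated) post ids of the chunks versus the master."""
    master = Counter(post_ids(master_root))
    seen = Counter()
    for root in chunk_roots:
        seen.update(post_ids(root))
    # Counter subtraction keeps only ids the chunks are short on.
    missing = sorted(master - seen)
    duplicated = sorted(i for i, n in seen.items() if n > master[i])
    return missing, duplicated
```

Each root would come from ET.parse(path).getroot(). Two empty lists mean nothing was lost or doubled while splitting, so any posts that go missing later went missing during the import itself.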

The Results

The original BlogEngine.NET blog shows our 2557 posts.

My spanking brand new WordPress blog shows 2553.

Say what???  Hey, I’m not going to worry about losing 4 posts. I’m just happy knowing that my first 10 years of blog posts are now on a WordPress blog running on a Linux server.