Migrating the Drupal way. Part I: creating a node.

By Kevin Hankens

My position with Acquia will find me helping out with a lot of migrations and upgrades. I'm going to embark on a multi-part blog series to discuss some of the common techniques that I use when moving clients to Drupal.

Migrating to Drupal can seem intimidating if you already maintain a database-driven website. However, populating a Drupal site with your current content might be easier than you think. Whether you are migrating from a popular CMS or a fully custom application, you can easily use Drupal modules to mimic your current data structures and migrate your data using a simple custom PHP script. I should note that while there are several different methods to accomplish this task, this happens to be my favorite.

When interacting with Drupal, it's a good idea to do things the Drupal way. Fortunately, Drupal core allows you to bootstrap Drupal and use all of its API functionality outside of a normal Drupal instance. For yours truly, learning about this has been a godsend because it provides a fast, simple way to migrate data.

Creating a basic node

When writing an import script, you will need to bootstrap Drupal to use the API functions. Using drupal_bootstrap($phase), you can load Drupal up to a certain loading phase by designating a $phase argument. The value of $phase allows you to specifically load the site configuration, database layer, modules and other requisite functionality. For our purposes, we will use drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL) to make sure that we have access to the whole API.

Note: Make sure that you create this script in the root of your Drupal installation.

<?php
// Bootstrap Drupal
require 'includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);
?>
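Because the script only works from the Drupal root, it can help to check for bootstrap.inc before requiring it. The following is a minimal sketch, not part of Drupal: the helper name import_find_bootstrap() is made up for illustration, and only includes/bootstrap.inc, drupal_bootstrap() and DRUPAL_BOOTSTRAP_FULL come from the snippet above.

```php
<?php
/**
 * Hypothetical helper (not a Drupal function): returns the path to
 * bootstrap.inc under $root, or FALSE if $root does not look like
 * the root of a Drupal installation.
 */
function import_find_bootstrap($root) {
  $path = rtrim($root, '/') . '/includes/bootstrap.inc';
  return is_file($path) ? $path : FALSE;
}

$bootstrap = import_find_bootstrap(getcwd());
if ($bootstrap !== FALSE) {
  // We are in a Drupal root: load the full API as shown above.
  require_once $bootstrap;
  drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);
}
else {
  // Not in a Drupal root: say so instead of failing on the require.
  print "Run this script from the root of your Drupal installation.\n";
}
```

If the message appears, change to the directory that contains index.php and the includes folder, then run the script again.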

For a simple example, we will create a basic node object and save it in our Drupal database using node_save().

<?php
// Construct the new node object.
$node = new stdClass();

// Your script will probably pull this information from a database.
$node->title = "My imported node";
$node->body = "The body of my imported node.\n\nAdditional Information";
$node->type = 'story';   // Your specified content type
$node->created = time();
$node->changed = $node->created;
$node->status = 1;
$node->promote = 1;
$node->sticky = 0;
$node->format = 1;       // Filtered HTML
$node->uid = 1;          // UID of content owner
$node->language = 'en';
// If known, the taxonomy TID values can be added as an array.
$node->taxonomy = array(2, 3, 1);

node_save($node);
?>
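In a real import, the hard-coded values above would come from a row of your old database. As a rough sketch of that mapping step, assuming made-up column names (headline, content, posted, live) and a hypothetical helper called import_row_to_node(), it might look like this:

```php
<?php
/**
 * Hypothetical mapper: turns one row from the old site into a node object.
 * The $row keys (headline, content, posted, live) are invented column names;
 * substitute the columns of your own legacy table.
 */
function import_row_to_node(array $row, $type = 'story') {
  $node = new stdClass();
  $node->type = $type;
  $node->title = trim($row['headline']);
  $node->body = $row['content'];
  // Keep the original publication date; fall back to now if it is
  // missing or cannot be parsed.
  $created = isset($row['posted']) ? strtotime($row['posted']) : FALSE;
  $node->created = ($created !== FALSE) ? $created : time();
  $node->changed = $node->created;
  $node->status = !empty($row['live']) ? 1 : 0;
  $node->promote = 0;
  $node->sticky = 0;
  $node->format = 1;   // Filtered HTML, as in the example above
  $node->uid = 1;
  $node->language = 'en';
  return $node;
}

$node = import_row_to_node(array(
  'headline' => ' My imported node ',
  'content' => 'The body of my imported node.',
  'posted' => '2008-09-15 12:00:00',
  'live' => 1,
));

// Inside a bootstrapped script this saves the node; outside Drupal
// the call is simply skipped.
if (function_exists('node_save')) {
  node_save($node);
}
```

Keeping the mapping in its own function means you can inspect the resulting objects before anything is written to the Drupal database.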

Creating more complex nodes

The script above will create a new node with a title and body that is published and promoted to the homepage. However, the process becomes slightly more complicated if you have more data than simple title and body fields. The CCK module is a popular way to extend your nodes by adding any number of custom fields. When Drupal displays your content, CCK adds your custom fields to the node object using hook_nodeapi(). Luckily, you can replicate this by adding your own fields in the import script. So, how can you find out the structure of these fields? One really easy method is to use the Devel module.

[Figure: CCK Custom Fields. The Devel module can be used to show how Drupal sees your node object.]

Using the Devel module

The Devel module is a great way to see, among other things, the structure of the node object which is invaluable in this case. After installing the module and viewing a node you will see new tabs: Dev load and Dev render. Click the Dev load tab, then click the "... (Object) stdClass" header to expand the node object definition. Here you will find some familiar data like nid, type, etc. Near the bottom, you will see some other definitions that begin with "field_". These should resemble the CCK fields that you created for your node type.

Depending on your CCK definitions, the assignments in your import script might look like one of the following:

[Figure: Devel Load Module Tab. Here you can see some examples of how CCK has added fields to the node object.]

<?php
$node->field_text_field[0]['value'] = "value 1";
$node->field_text_field[1]['value'] = "2nd value";
$node->field_nodereference[0]['nid'] = 58;
?>
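When a field holds several values, writing each delta by hand gets tedious. A small helper can build the same structure from a flat list. This is only a sketch: import_cck_items() is a hypothetical name, and the column keys ('value', 'nid') are the ones shown in the assignments above, so check the Dev load tab for the keys your own fields use.

```php
<?php
/**
 * Hypothetical helper: wraps a flat list of values in the per-delta
 * structure shown above, e.g. array(0 => array('value' => ...), ...).
 * $column is 'value' for text fields and 'nid' for node references.
 */
function import_cck_items(array $values, $column = 'value') {
  $items = array();
  // array_values() renumbers the list so deltas always start at 0.
  foreach (array_values($values) as $delta => $value) {
    $items[$delta] = array($column => $value);
  }
  return $items;
}

$node = new stdClass();
$node->field_text_field = import_cck_items(array('value 1', '2nd value'));
$node->field_nodereference = import_cck_items(array(58), 'nid');
```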

Add these assignments to your import script and you will start to see the power of the Drupal API. Let's say you are migrating from another CMS with a number of related fields, categories, images, etc. You could expand this script to iterate through your old database and map all of the related elements to a corresponding node object. Execute your script, and all of your old data will now become Drupal data! The best part about using the API is that it takes care of everything from search indexing to path aliases, along with all of the other little things we might overlook.
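The iteration described above could be sketched roughly as follows. Everything here is illustrative: import_run() and the row keys (title, body) are hypothetical, and the save step is passed in as a callback so the loop can be tried outside Drupal before pointing it at node_save().

```php
<?php
/**
 * Hypothetical import loop: maps each legacy row to a node object and
 * hands it to $save. The callback is a parameter so the loop can be
 * exercised without a bootstrapped Drupal.
 */
function import_run(array $rows, $save) {
  $report = array('saved' => 0, 'skipped' => 0);
  foreach ($rows as $row) {
    // Skip rows that cannot become a node (no title) and keep count.
    if (!isset($row['title']) || trim($row['title']) === '') {
      $report['skipped']++;
      continue;
    }
    $node = new stdClass();
    $node->type = 'story';
    $node->title = trim($row['title']);
    $node->body = isset($row['body']) ? $row['body'] : '';
    $node->uid = 1;
    $node->status = 1;
    call_user_func($save, $node);
    $report['saved']++;
  }
  return $report;
}

// Stand-in for node_save() that just collects the nodes.
$collected = array();
$collector = function ($node) use (&$collected) {
  $collected[] = $node;
};

$report = import_run(array(
  array('title' => 'First post', 'body' => 'Hello'),
  array('title' => '   ', 'body' => 'No title, skipped'),
  array('title' => 'Second post'),
), $collector);

print "Saved {$report['saved']}, skipped {$report['skipped']}\n";
```

In a bootstrapped script you would pass a small wrapper such as function ($node) { node_save($node); } instead of the collector, because node_save() takes its argument by reference. Counting skipped rows makes it easier to reconcile the import against the old database afterwards.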

Migrating to Drupal can seem like a daunting task, but when doing things the Drupal way it's quite straightforward. Whether you are planning a migration of 100 nodes or 100,000 nodes, proper scripting can make it seem like a breeze!

node_save is one way to create the node, but my preference is drupal_execute which has the benefit of creating the node in a more Drupalish way (i.e. executing the validation from modules that care about the node prior to it being saved). node_save is probably faster, but I'd rather have valid data than fast data.

There is a really good guide about this on http://drupal.org/node/178506#comment-895418
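For readers who want to try this approach, here is a minimal sketch assuming Drupal 6 and the 'story' content type used in the post. The helper import_node_form_values() is hypothetical, and the 'admin' username is a placeholder; the Drupal calls are guarded so the snippet does nothing harmful when run outside a bootstrapped site.

```php
<?php
/**
 * Hypothetical helper: builds the submitted values for the node form.
 * 'name' is the author's username and 'op' is the label of the Save button.
 */
function import_node_form_values($title, $body, $author) {
  return array(
    'title' => $title,
    'body' => $body,
    'name' => $author,
    'op' => function_exists('t') ? t('Save') : 'Save',
  );
}

$form_state = array();
$form_state['values'] = import_node_form_values(
  'My imported node',
  'The body of my imported node.',
  'admin'   // Placeholder username
);

// Only runs inside a fully bootstrapped Drupal 6 site.
if (function_exists('drupal_execute')) {
  // The node form lives in node.pages.inc, which is not loaded by default.
  module_load_include('inc', 'node', 'node.pages');
  $node = (object) array('type' => 'story');
  drupal_execute('story_node_form', $form_state, $node);
  // Validation failures, if any, are reported through form_get_errors().
  $errors = form_get_errors();
}
```

Checking $errors after each call lets the import log rows that failed validation instead of saving them silently.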

Also I'm sad we never got to meet while you were in Boulder (you were out here, right? or did you just work for velonews from somewhere else?). Hopefully we'll get to meet up in D.C.

One trick I've picked up after a myriad of different imports is node_object_prepare.

Take for example your code above. If I just want to fill in status, promote, and sticky, and set the date to the current time:

<?php
// Construct the new node object.
$node = new stdClass();
$node->type = 'story';   // Your specified content type

node_object_prepare($node); // Fills in default values for the uid, status, promote, sticky, date, created, and revision properties.

// Your script will probably pull this information from a database.
$node->title = "My imported node";
$node->body = "The body of my imported node.\n\nAdditional Information";

// SNIP
?>

This usually sets defaults for most items, and you can always override the uid, date or other items later. The biggest benefit is that it invokes hook_prepare and hook_nodeapi (with op 'prepare'). If you have other modules that take advantage of these hooks and want them to work on your imported nodes, you will need to call node_object_prepare with your node at some point in your import script.

Note that you could just as easily call this at the end of your import script, but then you wouldn't have the chance to override its values.

For information, I just add this line at the beginning of the script:

<?php
require 'modules/node/node.pages.inc';
?>

to be able to use node_object_prepare

Thx guys for your tricks, you saved my day!

Awesome tips, thanks guys!

@Greg, I was only in Boulder for a year - hardly enough time to even settle in :) Definitely look me up in DC!

Thanks for the tips, everyone!

I'm having trouble using node_save with a cck nodereference field (on Drupal 5.x). Anyone know of further documentation on that?

A number of entries at drupal.org mention the problem (e.g. http://drupal.org/node/275754 ) but haven't helped. It seems that using a select list on a custom form to set the value will work, but programmatically setting the value using the syntax suggested above, $node->field_nodereference[0]['nid'] = 58;, will fail. Advice?

Did you stumble upon the CCK import doc? Specifically check out the comment on clearing the cache.

The code above works fine in 6.x, but I didn't test it in 5. In the past, I've done nodereference imports for 5.x by populating the database field manually after creating the node. You could use something like the above script and then add:

<?php
// create your node without populating the nodereference fields

// check your schema for the proper table and column
$result = db_query("UPDATE content_field_noderef SET field_noderef_nid = %d WHERE nid = %d AND delta = %d LIMIT 1", $referenced_nid, $nid, $delta);
?>

The only problem I ran into is addressed in the above links.

Good luck!

Thanks for the summary, Kevin: the little fiddly bits like status can cause node_save to fail silently, and it's really hard with just the Devel module alone to work out precisely what's the bare minimum needed to save a node.

If anyone's interested, Node factory is really good at handling the bare bones of node creation (basically the second PHP block in your post). It's still considered bleeding-edge by its maintainer, because I think he wants to nail CCK support, but for one-time imports I'd happily use it to set up basic nodes without reservation.

Another big point in the drupal_execute() vs. node_save() decision is that node_save() bypasses most node validation: required fields, allowed values for CCK fields, field lengths, and any validation you've added in a custom module.

This can be a big advantage or a huge shock depending on how you look at it. I tend to prefer using node_save() over drupal_execute() for this reason. I don't particularly care for sanitizing my clients' data for them, unless we agree not to import anything that doesn't validate, but many times in the course of an import there is an occasional missing field, which can throw errors with drupal_execute().

This is really great. Seeing the thinking here on migration is really useful. I hope that you write some more blog entries.

Thanks!

It's great! Thanks bet hints