Build An Aggregation Site With Drupal (Part 1)
This tutorial will be split into three parts. Part 1 (this part) explains how to set up the aggregation and import feeds. Part 2 (to be published in the next post) explains setting up cron to handle auto-updating the feeds, and also covers using views to create some different site sections. Part 3 (to be published in the post after that) explains how to theme everything. In the tutorial I will be building a Drupal-based sports news aggregation site, but you can obviously tailor this to whatever type of news items you'd like.
The goals:
- Create an aggregation site which aggregates RSS feeds and outputs them in river-of-news-style pages, with the most recent news items first.
- Create some different site sections (football and baseball) which only show news items related to that topic.
- Allow users to filter news items by source (e.g. ESPN, BBC etc.).
- Create RSS feeds of our aggregated pages which are available for our users.
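To pin down what "river of news" and "filter by source" mean in practice, here is a minimal Python sketch of the ordering and filtering logic. It is only an illustration: in the finished site Views does this work for us, and the `FeedItem` structure and `river_of_news` function are hypothetical names, not part of Drupal, SimpleFeed or Views.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Optional


@dataclass
class FeedItem:
    # Hypothetical stand-in for an imported feed item node.
    title: str
    source: str          # term from the 'Source' vocabulary, e.g. "BBC"
    published: datetime  # publication time of the item


def river_of_news(items: Iterable[FeedItem], source: Optional[str] = None) -> List[FeedItem]:
    """Return items newest first, optionally keeping only one source term."""
    selected = [item for item in items if source is None or item.source == source]
    return sorted(selected, key=lambda item: item.published, reverse=True)


if __name__ == "__main__":
    # Made-up sample items, purely for demonstration.
    items = [
        FeedItem("Match report", "BBC", datetime(2008, 9, 1, 10, tzinfo=timezone.utc)),
        FeedItem("Trade rumours", "ESPN", datetime(2008, 9, 1, 12, tzinfo=timezone.utc)),
        FeedItem("Transfer news", "BBC", datetime(2008, 9, 1, 14, tzinfo=timezone.utc)),
    ]
    for item in river_of_news(items):
        print(item.published.isoformat(), item.source, item.title)
    print([item.title for item in river_of_news(items, source="BBC")])
```

Filtering by source before sorting keeps the "per source" pages and the main river consistent: both are simply the same newest-first ordering over a different subset of items.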
You can check out the finished aggregation site (part 1 + part 2) here.
The set up:
For this tutorial I'll be using the following:
- A clean install of Drupal 5.10 (using Garland)
- SimpleFeed 5.x-2.2
- Views 5.x-1.6
A quick word on SimpleFeed vs other aggregation modules:
There are a number of other aggregation modules available for Drupal. In my experience, the two best are SimpleFeed and FeedAPI. FeedAPI has excellent functionality and can do some very cool stuff (for example, check out this video on drupaltherapy.com which shows how to use FeedAPI and feed element mapping). However, in this case I've chosen to use SimpleFeed because I don't require any of this extra functionality and SimpleFeed is, well, the simplest to use.
Step 1: Set up the site and modules
Set up your Drupal site and then download and install the SimpleFeed module and the Views module. Select the following options for each module:

Step 2: Install the missing simplepie.inc file
In order to work correctly, SimpleFeed requires that we place the simplepie.inc file from the SimplePie library into our SimpleFeed module directory. If you go to your status report (admin/logs/status) now, you'll see the following error alerting you to this fact:

So, to sort this out do the following:
- Go to simplepie.org and download the latest version of SimplePie by clicking on the big 'Download' button (at the time of writing this is SimplePie version 1.1.1).
- Extract the contents of the download, which will create a folder named 'SimplePie 1.1.1'.
- Open this folder, locate the simplepie.inc file, and copy it into your SimpleFeed module folder. So you should have 'sites/all/modules/simplefeed/simplepie.inc'.
Now when you check your status report page you should not see any errors.
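If you prefer to script the copy (for example when setting up several sites), here is a minimal Python sketch of the steps above. It assumes you run it with access to both the extracted 'SimplePie 1.1.1' folder and your Drupal root; the `install_simplepie` name and the example Drupal root path are hypothetical.

```python
import shutil
from pathlib import Path


def install_simplepie(extracted_dir: Path, drupal_root: Path) -> Path:
    """Copy simplepie.inc from the extracted SimplePie download into the
    SimpleFeed module folder and return the destination path."""
    # Search the extracted folder (e.g. 'SimplePie 1.1.1') for simplepie.inc.
    matches = sorted(extracted_dir.rglob("simplepie.inc"))
    if not matches:
        raise FileNotFoundError(f"simplepie.inc not found under {extracted_dir}")
    # The module must already be installed at sites/all/modules/simplefeed.
    module_dir = drupal_root / "sites" / "all" / "modules" / "simplefeed"
    if not module_dir.is_dir():
        raise FileNotFoundError(f"SimpleFeed module folder missing: {module_dir}")
    destination = module_dir / "simplepie.inc"
    shutil.copy2(matches[0], destination)  # copy2 also preserves timestamps
    return destination


# Example usage (hypothetical Drupal root):
# install_simplepie(Path("SimplePie 1.1.1"), Path("/var/www/drupal"))
```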
Step 3: Set up a vocabulary
In order to theme our news items more effectively, and to help us with filtering and sorting news items, we're going to assign taxonomy terms to them. SimpleFeed includes auto-assign functionality for taxonomy terms which will be very helpful here.
First, go to the 'Add vocabulary' subsection of the 'Categories' admin section (admin/content/taxonomy/add/vocabulary). Then create a vocabulary with the following settings:
- Vocabulary name: Source
- Types: check 'Feed' and 'Feed Item'
- Check 'Free tagging'

Step 4: Configure your SimpleFeed settings
There are a few places where we can configure settings for SimpleFeed:
- SimpleFeed settings page (admin/settings/simplefeed)
- Access control 'simplefeed module' settings (admin/user/access)
- 'Feed' content type settings page (admin/content/types/feed)
- 'Feed Item' content type settings page (admin/content/types/feed-item)
SimpleFeed settings page
The SimpleFeed settings (admin/settings/simplefeed) are fairly self-explanatory. In this case we will use the following settings:

'Discard feed items older than:'
We always want our feed items to be available to users, so we set this to 'Never'.
'Check feeds every:'
1 hour is good here, as sports news is published frequently and we want to check for updates often.
'Default input format:'
We'll start by leaving this as the default 'Filtered HTML' option, which filters out any HTML tags that are not specified in the input format settings (admin/settings/filters). However, unclosed tags in a feed item can have an adverse effect on the rest of your page, so after some testing you may need to remove further tags from the list of allowed tags.
'Vocabulary'
Set this to 'Source', the vocabulary we set up in the previous step. This allows feed items to automatically inherit their parent feed's taxonomy terms.
'Automatically add categories set by external feeds to the vocabulary above'
We will leave this unchecked, as we want to tightly control the taxonomy.
'Cron throttle'
The default of 50 will be plenty for now!
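To make the 'Check feeds every:' and 'Cron throttle' settings a little more concrete, here is a hedged Python model of how a scheduler with those two knobs could pick feeds on each cron run. This is an assumption about the general behaviour, not SimpleFeed's actual PHP code; the `Feed` record and `feeds_to_refresh` function are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Feed:
    # Hypothetical record for a feed node.
    url: str
    last_checked: Optional[datetime]  # None means the feed was never refreshed


def feeds_to_refresh(feeds: List[Feed], now: datetime,
                     interval: timedelta = timedelta(hours=1),
                     throttle: int = 50) -> List[Feed]:
    """Pick feeds whose last check is at least `interval` ago, oldest first,
    capped at `throttle` feeds per cron run."""
    due = [f for f in feeds
           if f.last_checked is None or now - f.last_checked >= interval]
    # Never-checked feeds sort first, then the ones that have waited longest.
    due.sort(key=lambda f: f.last_checked or datetime.min)
    return due[:throttle]
```

With only five feeds and a throttle of 50, every due feed is refreshed on each run; the cap only starts to matter once a site has more feeds than a single cron run can comfortably handle.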
Access control 'simplefeed module' settings
For this tutorial we're not going to change anything here, but you may want to, depending on how the site is used.
'Feed' content type settings page
By default both the 'Feed' content type and the 'Feed Item' content type have their 'Default comment setting:' option set to 'Read/Write'. In this case we don't want users commenting on either so we'll change them to 'Disabled'.
To do so, first go to the 'Feed' content type settings page (admin/content/types/feed) and scroll down to the 'Workflow' section. Then just change the 'Default comment setting:' option to 'Disabled'.

'Feed Item' content type settings page
Do the same as for the 'Feed' content type and set the 'Default comment setting:' option to 'Disabled'.
Step 5: Find some RSS feeds
As this is going to be a sports news aggregation site, I've gathered together the following RSS feed URLs and chosen a taxonomy term for each:
- BBC Sport Front Page
  - http://newsrss.bbc.co.uk/rss/sportonline_world_edition/front_page/rss.xm...
  - taxonomy term = BBC
- BBC Sport Football
  - http://newsrss.bbc.co.uk/rss/sportonline_world_edition/football/rss.xml
  - taxonomy term = BBC Football
- ESPN Front Page
  - http://sports-ak.espn.go.com/espn/rss/news
  - taxonomy term = ESPN
- ESPN Baseball
  - http://sports-ak.espn.go.com/espn/rss/mlb/news
  - taxonomy term = ESPN Baseball
- MLB.com Headlines
  - http://mlb.mlb.com/partnerxml/gen/news/rss/mlb.xml
  - taxonomy term = MLB
Just remember to check the terms of use sections on each of the websites regarding the republishing of feed content.
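Before adding a feed in Step 6, you might want to confirm that a URL really returns RSS items. The following Python sketch uses only the standard library to fetch a feed and pull out each item's title, link, guid and date. It assumes a plain RSS 2.0 document (Atom feeds use different element names), and `parse_rss` and `fetch_rss` are hypothetical helper names.

```python
import urllib.request
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime
from typing import Dict, List, Union


def parse_rss(xml_data: Union[str, bytes]) -> List[Dict[str, object]]:
    """Extract title, link, guid and publication date from RSS 2.0 <item>s."""
    root = ET.fromstring(xml_data)
    items = []
    for item in root.iter("item"):
        pub_date = item.findtext("pubDate")
        items.append({
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            # Fall back to the link when a feed omits <guid>.
            "guid": item.findtext("guid") or item.findtext("link"),
            # RSS 2.0 dates use RFC 822 format, e.g. "Mon, 01 Sep 2008 10:00:00 GMT".
            "published": parsedate_to_datetime(pub_date) if pub_date else None,
        })
    return items


def fetch_rss(url: str, timeout: float = 10.0) -> List[Dict[str, object]]:
    """Download a feed URL and parse it (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Pass raw bytes so the parser honours the feed's declared encoding.
        return parse_rss(response.read())


# Example usage:
# for entry in fetch_rss("http://sports-ak.espn.go.com/espn/rss/news"):
#     print(entry["published"], entry["title"])
```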
Step 6: Add your RSS feeds
Now that everything is set up, let's add some feeds!
Go to 'Create content > Feed'. We'll start by adding the BBC sport front page feed with the following settings (the feed URL and taxonomy term are taken from Step 5 above):

We've already taken care of all of the other settings (like input format, comment settings), so hit 'Submit'. The feed will be created and you should get the following confirmation screen:

Now we need to actually import the feed items.
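Conceptually, importing means turning each new item in the feed into its own node, skipping items that were already imported on an earlier refresh, and (thanks to the 'Vocabulary' setting from Step 4) giving each node its parent feed's 'Source' term. The Python model below is a hypothetical illustration of that idea only; SimpleFeed's real import logic lives in its PHP code and may differ, and `Node` and `import_items` are invented names.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class Node:
    # Hypothetical stand-in for a 'Feed Item' node.
    nid: int
    title: str
    guid: str
    terms: List[str] = field(default_factory=list)


def import_items(items: Iterable[Dict[str, str]], feed_terms: List[str],
                 existing_guids: Set[str], next_nid: int) -> List[Node]:
    """Create one node per unseen item; each node copies the feed's terms."""
    created = []
    for item in items:
        guid = item["guid"]
        if guid in existing_guids:
            continue  # already imported on an earlier refresh
        created.append(Node(next_nid, item["title"], guid, list(feed_terms)))
        existing_guids.add(guid)  # also guards against duplicates within one feed
        next_nid += 1
    return created
```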
We could wait for our cron job (which we'll set up in part 2 of the tutorial) to fire and trigger the auto update of the feed for us, but for now we'll do it manually.
So, click on the 'Refresh this feed' link and SimpleFeed will look for, and import, the new feed items. SimpleFeed imports each feed item as a node. After a second or two it should have found the new items, created the nodes, and output the following success message:

One quick thing to note here is the first line, 'The directory files/cache_simplefeed has been created'. If the file permissions for this folder are not set correctly on your server, they can cause cron errors, but we'll deal with this later on if it turns out to be a problem.
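If cron errors do appear later, a quick first check is whether that directory exists and is writable. A small Python sketch, assuming you run it as the same user as your web server and point it at your Drupal root (the `check_cache_dir` name is hypothetical):

```python
import os
from pathlib import Path


def check_cache_dir(drupal_root: Path, relative: str = "files/cache_simplefeed") -> str:
    """Report whether the SimpleFeed cache directory exists and is writable
    by the user running this script."""
    cache_dir = drupal_root / relative
    if not cache_dir.is_dir():
        return f"missing: {cache_dir}"
    # Creating files in a directory needs both write and execute permission.
    if not os.access(cache_dir, os.W_OK | os.X_OK):
        return f"not writable: {cache_dir}"
    return f"ok: {cache_dir}"


# Example usage, run from the Drupal root:
# print(check_cache_dir(Path(".")))
```

Running it as a different user from the web server can give a misleading answer, since permissions are checked for whoever runs the script.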
Finished! (Part 1)
If you now check the front page of your site, you should see all of the feed items from the BBC front page RSS feed. Go ahead and add the other feeds the same way as the BBC feed (using the feed URLs and taxonomy terms from Step 5 above). The site won't do a lot yet, though, so we'll sort that out in part 2.
Also in this series
Build An Aggregation Site With Drupal (Part 2), where we set up cron to handle auto updating the feeds and also use views to create some different sports sections (football and baseball) and RSS feeds.
Coming soon...
In part 3 we'll get to theming everything.
Hi, I'm Laurence and this is my Drupal blog.
Looks great! I can't wait for you to finish with the whole series! :-)
Is there a reason you did not use the FeedAPI and chose SimpleFeed instead?
This is very timely - I want to learn how to do exactly this, without a lot of trial-and-error time cost. Thanks so much! Looking forward to Part 2 and Part 3 with bated breath...
Any chance you could write a bit about workflow options, node queue, etc.? I would like to do something like this, but rather than automatically publishing the nodes, put them into moderation.
Looking forward to reading more, thanks for taking the time to write this up.
I must admit that I was surprised to see a new guide/tutorial that is based on Drupal 5 and not on the latest Drupal 6. Is there a specific reason for that? Are there some obstacles with any of the modules to deliver the same functionality with Drupal 6?
I've also been trying to aggregate feeds, with an emphasis on feeds with images and media files (videos). I’m working with Drupal 6. Last night I tried a combination of the FeedAPI and Embedded Media Field modules, but it seems it’s not possible without the Feedapi_Mapper module, which has not been ported to Drupal 6 yet.
Are you planning to cover how to display images and/or videos referenced by the feed items in your guide?
Anyway, I’m looking forward to parts 2 and 3.
Thank you.
Hi everyone, thanks for the comments so far.
@ Scifiguy - yes it was due to the requirements for the site. I talked about the decision to use SimpleFeed, as opposed to FeedAPI, near the beginning of the tutorial - check that out for more detail.
@ Guest (comment #3) - I hadn't planned on this but it sounds like a good idea. I think what I will do is gather a list of extra suggestions from people whilst I publish the first 3 parts and then cover them all in a follow up 4th part post.
@ Leonid - regarding using Drupal v5 instead of v6, you kind of answered the question already. SimpleFeed does not have a site-ready Drupal 6 version available yet, so I went with Drupal 5.
As for displaying images and videos - I won't be covering this in the tutorial. You really need to use FeedAPI (as you already are) to do that kind of thing more easily.
Thank you for doing this series! Looking forward to seeing how you will do it.
@ Leonid As far as I know, neither FeedAPI nor SimpleFeed has support for Views 2 yet in Drupal 6. That adds a big hiccup to this whole process of displaying things back. I'm sure it's coming, but it is not there yet.
@Leonid Quick correction: the code for Views 2 support has just been committed to FeedAPI's Drupal 6 branch, and it will be in future releases soon. http://drupal.org/node/238851
Looking forward to making this work for a site I have, but I ran into a little problem. I don't see the URL option anywhere on my feed page. I went to 'Create content > Feed' as you show in step 6, but it's not there. Funny, the URL field is there when I go to 'Create content > Feed Item'. Any ideas?
I'm having the same issue as Kate. Perhaps it's because I'm using the Drupal 6 version? This page seems to indicate that it doesn't work... but there is a dev release on the SimpleFeed page.
Yes, I also have the same problem as Kate. I'm using Drupal 5.11 with SimpleFeed 5.x-3.1 and there is no URL field. There's no way to create a feed without a URL. Any ideas?