Community content infrastructure: consulting, training, and building websites and applications in Drupal

Using FeedAPI's API

FeedAPI is a great module, and it is easy to use if you want it to parse your RSS/RDF/Atom feeds into nodes or lightweight items. Quite a few posts cover that, but the developer documentation for FeedAPI's hooks could still be better. This post is a walk-through of the code.

To leverage its capabilities, FeedAPI gives us two hooks: hook_feedapi_feed for creating a parser, and hook_feedapi_item for creating a processor. If you have used FeedAPI before, these concepts are familiar to you: parser_common_syndication and simplepie are examples of parsers implementing these hooks, and feedapi_node and aggregator are examples of processors.

When you extend FeedAPI through these hooks, you will most likely want to implement a parser, say because your client wants to pull data from a custom XML feed that FeedAPI does not support yet.
So first, let's look at a simplified implementation of hook_feedapi_feed (caching bits removed), courtesy of parser_common_syndication (PCS for brevity from here on). The parts you would change are named right after the listing:

<?php
function parser_common_syndication_feedapi_feed($op) {
  $args = func_get_args();
  switch ($op) {
    case 'type':
      return array("XML feed");
    case 'compatible':
      if (!function_exists('simplexml_load_string')) {
        return FALSE;
      }
      $url = $args[1]->url;
      $downloaded_string = _parser_common_syndication_download($url, $op);
      if (is_object($downloaded_string)) {
        return $downloaded_string->type;
      }
      if (!defined('LIBXML_VERSION') || (version_compare(phpversion(), '5.1.0', '<'))) {
        @$xml = simplexml_load_string($downloaded_string, NULL);
      }
      else {
        @$xml = simplexml_load_string($downloaded_string, NULL, LIBXML_NOERROR | LIBXML_NOWARNING);
      }
      if (_parser_common_syndication_feed_format_detect($xml) != FALSE) {
        return array_shift(parser_common_syndication_feedapi_feed('type'));
      }
      return FALSE;
    case 'parse':
      $feed = is_object($args[1]) ? $args[1] : FALSE;
      $parsed_feed = _parser_common_syndication_feedapi_parse($feed);
      return $parsed_feed;
  }
}
?>

The three things you will want to change here are the call to _parser_common_syndication_feed_format_detect, the call to _parser_common_syndication_feedapi_parse, and the name of the hook function itself.

FeedAPI calls this hook after it has instantiated a $feed object. The hook is called with a variable number of arguments, which we collect through func_get_args(). In the 'compatible' $op, we download the feed and inspect it to make sure that our parser recognizes it. PCS uses PHP's SimpleXML extension to parse the feed, hence the simplexml_load_string() calls. If all is well, the hook calls itself again with the 'type' $op, which returns "XML feed". The 'parse' op is self-explanatory: this is where you plug in your own version of feedapi_parse. Let's call it _myparser_parse().

So, to start implementing your version, name the hook after your module (module_name_feedapi_feed) and change the call to _parser_common_syndication_feedapi_parse into a call to _myparser_parse(). You can decide for yourself whether you want to implement your own feed_format_detect function.
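As a starting point, the renamed hook could look roughly like the sketch below. It is not PCS code: the module name myparser, the "Catalog feed" label, the <catalog> root element, and the _myparser_load() helper are all made up for illustration, and the downloader is a plain file_get_contents() with none of PCS's caching. The _myparser_parse() here is only a placeholder so that the sketch runs on its own.

```php
<?php
/**
 * Sketch of hook_feedapi_feed() for a hypothetical module named "myparser".
 * Assumption: our custom feeds have a <catalog> root element.
 */
function myparser_feedapi_feed($op) {
  $args = func_get_args();
  switch ($op) {
    case 'type':
      // Hypothetical label for this parser.
      return array('Catalog feed');

    case 'compatible':
      if (!function_exists('simplexml_load_string')) {
        return FALSE;
      }
      $xml = _myparser_load($args[1]->url);
      if ($xml !== FALSE && $xml->getName() == 'catalog') {
        // array_shift() wants a variable, so store the result first.
        $types = myparser_feedapi_feed('type');
        return array_shift($types);
      }
      return FALSE;

    case 'parse':
      $feed = is_object($args[1]) ? $args[1] : FALSE;
      return $feed ? _myparser_parse($feed) : FALSE;
  }
}

/**
 * Hypothetical loader: uncached download plus SimpleXML parsing.
 * Returns a SimpleXMLElement, or FALSE on any failure.
 */
function _myparser_load($url) {
  $data = @file_get_contents($url);
  if ($data === FALSE || $data === '') {
    return FALSE;
  }
  return @simplexml_load_string($data, 'SimpleXMLElement', LIBXML_NOERROR | LIBXML_NOWARNING);
}

/**
 * Placeholder parser: returns an empty feed object, or FALSE if loading fails.
 */
function _myparser_parse($feed) {
  $xml = _myparser_load($feed->url);
  if ($xml === FALSE) {
    return FALSE;
  }
  $parsed = new stdClass();
  $parsed->items = array();
  return $parsed;
}
```

Storing the 'type' result in a variable before array_shift() is a small deviation from the PCS listing; it avoids PHP's "only variables should be passed by reference" notice.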

Several functions are involved in fetching the feed in PCS. Here is a rundown: _parser_common_syndication_feedapi_parse (our main parser) calls _parser_common_syndication_download (which runs some preliminary tests on the data), which in turn calls _parser_common_syndication_feedapi_get (which checks whether the feed exists in the DB and has changed from its cached copy and, if not, fetches the feed and checks it). This is all done to avoid wasting time parsing items or feeds that already exist. You don't have to use these functions, or write your own versions of them, in order to parse the data, but they make the process more efficient. If PCS is enabled, you can simply call _parser_common_syndication_download() instead of implementing it yourself; it is a helper function and is not tied to any particular format. In essence, you can build on PCS and only replace the calls to "feedapi_parse" and "feed_format_detect", which we will get to in a minute.
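If you want to lean on the PCS downloader but not break when PCS is missing, a thin wrapper is one option. This is only a sketch: _myparser_download() is a hypothetical name, the fallback does no caching at all, and I use function_exists() so the snippet also runs outside Drupal (inside a module you would more likely check module_exists('parser_common_syndication')).

```php
<?php
/**
 * Hypothetical wrapper around the PCS download helper.
 *
 * With PCS loaded, the result is whatever PCS returns; judging by the
 * listings in this post, that is the raw feed string, FALSE, or an object.
 * Without PCS, it is the raw string or FALSE.
 */
function _myparser_download($url, $op) {
  if (function_exists('_parser_common_syndication_download')) {
    return _parser_common_syndication_download($url, $op);
  }
  // Assumption: a plain, uncached fetch is good enough as a fallback.
  $data = @file_get_contents($url);
  return ($data === FALSE || $data === '') ? FALSE : $data;
}
```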
Now, take a look at:

<?php
function _myparser_parse($feed) {
  if (is_a($feed, 'SimpleXMLElement')) {
    $xml = $feed;
  }
  else {
    $downloaded_string = _parser_common_syndication_download($feed->url, 'parse');
    if ($downloaded_string === FALSE || is_object($downloaded_string)) {
      return $downloaded_string;
    }

    if (!defined('LIBXML_VERSION') || (version_compare(phpversion(), '5.1.0', '<'))) {
      @$xml = simplexml_load_string($downloaded_string, NULL);
    }
    else {
      @$xml = simplexml_load_string($downloaded_string, NULL, LIBXML_NOERROR | LIBXML_NOWARNING);
    }

    // We got a malformed XML
    if ($xml === FALSE || $xml == NULL) {
      return FALSE;
    }
  }
  $feed_type = _parser_common_syndication_feed_format_detect($xml);
  if ($feed_type == "atom1.0") {
    return _parser_common_syndication_atom10_parse($xml);
  }
  if ($feed_type == "RSS2.0" || $feed_type == "RSS0.91" || $feed_type == "RSS0.92") {
    return _parser_common_syndication_RSS20_parse($xml);
  }
  if ($feed_type == "RDF") {
    return _parser_common_syndication_RDF10_parse($xml);
  }
  return FALSE;
}
?>

In your implementation of the preliminary parser function (_myparser_parse in this example; note that this is not a hook!), make sure you load the XML. That can be done with simplexml_load_string() (if you already have the contents, through _parser_common_syndication_download() or a similar method) or with simplexml_load_file(). SimpleXML will turn the XML into an object. This is where your own feed recognition comes in: you might use SimpleXML's getName() method to recognize the root tag.
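A recognition function based on getName() can be very small. The sketch below assumes two made-up root elements, <catalog> and <newsfeed>, and made-up format labels; swap in whatever your custom feeds actually use.

```php
<?php
/**
 * Hypothetical format detection based on the root element's name.
 * Returns a format label of our own choosing, or FALSE if unknown.
 */
function _myparser_feed_format_detect($xml) {
  if (!is_object($xml)) {
    // simplexml_load_string() gave us FALSE: malformed XML.
    return FALSE;
  }
  // Assumption: these root element names identify our custom formats.
  $known = array(
    'catalog' => 'catalog',
    'newsfeed' => 'newsfeed',
  );
  $root = $xml->getName();
  return isset($known[$root]) ? $known[$root] : FALSE;
}
```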

Once the feed is recognized, you can direct it to the "real" parser function if you have several of them, or do the actual organization of the data right here. Most of the time you will probably leave the function much like the original, with only minor changes. If you look at _parser_common_syndication_feedapi_parse, you'll see that the main part you need to change is the set of calls to the format-specific parsers. If you only support a single format, you can do the processing right here as well.

The question is how to parse the data. Remember that SimpleXML created an object, so you can traverse it with foreach() or query it with SimpleXMLElement's xpath() method.
Note that some type casting might be in order here, since child elements and attributes come back as SimpleXMLElement objects rather than plain strings. You can take one of PCS's own parsers as a reference.
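To make the traversal and the casting concrete, here is a sketch of a parser for an invented <catalog> format. The element names (entry, headline, body, link, published, tags/tag) are hypothetical, and the layout of the returned object (title, description, items, and per-item options) follows my reading of PCS's parsers, so treat it as an assumption and compare it with the PCS version you have installed.

```php
<?php
/**
 * Hypothetical parser for a made-up <catalog> feed.
 */
function _myparser_catalog_parse($xml) {
  $parsed = new stdClass();
  // Cast to string: $xml->title is a SimpleXMLElement, not a string.
  $parsed->title = (string) $xml->title;
  $parsed->description = (string) $xml->summary;
  $parsed->items = array();

  // foreach() walks over every <entry> child of the root element.
  foreach ($xml->entry as $entry) {
    $item = new stdClass();
    $item->title = (string) $entry->headline;
    $item->description = (string) $entry->body;
    $item->options = new stdClass();
    $item->options->original_url = (string) $entry->link;
    // Attributes are read with array syntax and need a cast as well.
    $item->options->guid = (string) $entry['id'];

    // Fall back to "now" when the date is missing or unparsable.
    $published = (string) $entry->published;
    $timestamp = ($published !== '') ? strtotime($published) : FALSE;
    $item->options->timestamp = ($timestamp !== FALSE) ? $timestamp : time();

    // xpath() is evaluated relative to the current <entry>.
    $item->options->tags = array();
    $tags = $entry->xpath('tags/tag');
    if (is_array($tags)) {
      foreach ($tags as $tag) {
        $item->options->tags[] = (string) $tag;
      }
    }
    $parsed->items[] = $item;
  }
  return $parsed;
}
```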

Another note: if you pass the data to the node processor and it finds neither a 'guid' nor an 'original_url' field on an item, it will ignore the data. So fill those in.
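One way to guard against silently dropped items is to check them before returning the parsed feed. The helper below is hypothetical: it assumes the two fields live under $item->options (which is how I read PCS's parsers), and reusing the URL as the guid when no guid is present is my own choice, not something FeedAPI prescribes.

```php
<?php
/**
 * Hypothetical check: make sure an item carries a guid or an original_url.
 * Fills in a missing guid from the URL and returns TRUE if the item is
 * usable, or FALSE if it has neither field.
 */
function _myparser_ensure_identity($item) {
  if (!isset($item->options) || !is_object($item->options)) {
    $item->options = new stdClass();
  }
  $guid = isset($item->options->guid) ? trim($item->options->guid) : '';
  $url = isset($item->options->original_url) ? trim($item->options->original_url) : '';

  if ($guid === '' && $url === '') {
    // Nothing to identify the item by.
    return FALSE;
  }
  if ($guid === '') {
    // Assumption: the item URL is stable enough to double as a guid.
    $item->options->guid = $url;
  }
  return TRUE;
}
```

In a parser you could call it on each item and skip, or log, the ones for which it returns FALSE.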

Well, that's about it! Enjoy...
Oh, and of course, thanks Aron.

Thanks for this! Very informative...
One question, though... Since we can't just choose a parser on a feed-by-feed basis, we basically have to enable any applicable parsers to the Feed content type, correct?

Let's say, for example, I write custom parsers for CNN and for MSNBC. Then I want to default to using the SimplePie parser for any other feeds.
I would just enable my CNN and MSNBC parsers, and make their weight less than SimplePie's, so they execute before it?

Then in my CNN parser (for example), I would do a check based on URL, or whatever, in the 'compatible' op. If it matches my conditions, then I can return TRUE, FeedAPI will call my parser back with the 'type' $op, then again with the actual 'parse' op, right?
So if it's not a match for CNN, then it will move on to the next parser by weight, and on through the Enabled parsers?

Thanks so much for your help. Please email me at cmceldowney at databasepublish dot com so I can see your response.

If I understood what you described, then yes, that would be the way to go. You can also use separate content types for regular RSS items and for the specific ones.
