Using FeedAPI's API

FeedAPI is a great module, and easy to use if you want it to parse your RSS/RDF/Atom feeds into nodes or lightweight items. Quite a few posts cover that, but the developer documentation for FeedAPI's hooks could still be better. This post is a walk-through of the code.

To leverage its capabilities, FeedAPI gives us two hooks: hook_feedapi_feed for creating a parser, and hook_feedapi_item for creating a processor. If you have used FeedAPI before, these concepts will be familiar: parser_common_syndication and SimplePie are examples of parsers implementing these hooks, and feedapi_node and aggregator are examples of processors.

When you wish to extend FeedAPI using the hooks, it is more likely that you'll want to implement a parser, say if your client wants to get data from a custom XML feed that FeedAPI does not support yet.
So first, let's look at a simplified implementation of hook_feedapi_feed (with the caching bits removed), courtesy of parser_common_syndication (PCS for brevity from here on). The parts you would change are named right after the code:

<?php
function parser_common_syndication_feedapi_feed($op) {
  $args = func_get_args();
  switch ($op) {
    case 'type':
      return array("XML feed");
    case 'compatible':
      if (!function_exists('simplexml_load_string')) {
        return FALSE;
      }
      $url = $args[1]->url;
      $downloaded_string = _parser_common_syndication_download($url, $op);
      if (is_object($downloaded_string)) {
        return $downloaded_string->type;
      }
      if (!defined('LIBXML_VERSION') || (version_compare(phpversion(), '5.1.0', '<'))) {
        @$xml = simplexml_load_string($downloaded_string, NULL);
      }
      else {
        @$xml = simplexml_load_string($downloaded_string, NULL, LIBXML_NOERROR | LIBXML_NOWARNING);
      }
      if (_parser_common_syndication_feed_format_detect($xml) != FALSE) {
        return array_shift(parser_common_syndication_feedapi_feed('type'));
      }
      return FALSE;
    case 'parse':
      $feed = is_object($args[1]) ? $args[1] : FALSE;
      $parsed_feed = _parser_common_syndication_feedapi_parse($feed);
      return $parsed_feed;
  }
}
?>

The three things you will want to change here are the call to _parser_common_syndication_feed_format_detect, the call to _parser_common_syndication_feedapi_parse, and the name of the hook function itself.

FeedAPI calls this hook after it has instantiated a $feed object. The hook is called with a variable number of arguments, which we collect through func_get_args(). In the 'compatible' $op, we download the feed and inspect it to make sure that our parser recognizes it. PCS uses PHP's SimpleXML extension to parse the feed, hence the simplexml_load_string() calls. If all is well, the hook calls itself with the 'type' $op, which returns "XML feed". I guess the 'parse' op is self-explanatory: this is where you plug in your own version of feedapi_parse. Let's call it _myparser_parse().

So to start implementing your version, name the hook after your module (module_name_feedapi_feed) and change the call to _parser_common_syndication_feedapi_parse into a call to _myparser_parse(). You can decide for yourself whether to implement your own feed_format_detect function.
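As a rough illustration of the renaming described above, here is a minimal sketch of the hook for a hypothetical module called "myparser". The module name, the "Catalog XML feed" label, the <catalog> root element, and _myparser_feed_format_detect() are all assumptions made up for this example. The plain file_get_contents() fetch stands in for PCS's download helper and does no caching.

```php
<?php
/**
 * Implementation of hook_feedapi_feed() for a hypothetical "myparser" module.
 */
function myparser_feedapi_feed($op) {
  $args = func_get_args();
  switch ($op) {
    case 'type':
      // Hypothetical label for the feeds this parser understands.
      return array('Catalog XML feed');

    case 'compatible':
      if (!function_exists('simplexml_load_string') || !isset($args[1]->url)) {
        return FALSE;
      }
      // Plain fetch for illustration only; PCS's download helper also caches.
      $data = @file_get_contents($args[1]->url);
      $xml = ($data === FALSE) ? FALSE : @simplexml_load_string($data);
      if ($xml !== FALSE && _myparser_feed_format_detect($xml) !== FALSE) {
        $types = myparser_feedapi_feed('type');
        return array_shift($types);
      }
      return FALSE;

    case 'parse':
      $feed = (isset($args[1]) && is_object($args[1])) ? $args[1] : FALSE;
      // _myparser_parse() is your own parser function; it is not defined here.
      return _myparser_parse($feed);
  }
}

/**
 * Hypothetical detector: accept documents whose root element is <catalog>.
 */
function _myparser_feed_format_detect($xml) {
  return ($xml->getName() == 'catalog') ? 'catalog' : FALSE;
}
```

Only the 'type' and 'compatible' branches run as written; the 'parse' branch needs your own _myparser_parse() to exist.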

Several functions are involved in fetching the feed in PCS. Here's a rundown: _parser_common_syndication_feedapi_parse (our main parser) calls _parser_common_syndication_download (which makes some preliminary tests on the data), which in turn calls _parser_common_syndication_feedapi_get (which checks whether the feed already exists in the DB and whether it has changed from its cached copy, and fetches and checks the feed when needed). This is all done to avoid wasting time on parsing items or feeds we already have. You don't have to use these functions, or implement your own versions of them, to parse the data, but they make the process more efficient. If PCS is enabled, you can simply call _parser_common_syndication_download() instead of implementing it yourself; it is a helper function and is not tied to any particular format. In essence, you can lean on PCS and change only the calls to "feedapi_parse" and "feed_format_detect", which we will get to in a minute.
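One way to lean on PCS without depending on it outright is a small wrapper like the sketch below. The name _myparser_download() is hypothetical, and the fallback is an assumption of mine, not something PCS or FeedAPI provides: it simply fetches the URL with no caching.

```php
<?php
/**
 * Hypothetical helper: fetch the raw feed, preferring PCS's download helper.
 */
function _myparser_download($url, $op = 'parse') {
  if (function_exists('_parser_common_syndication_download')) {
    // PCS is enabled: reuse its helper. Judging by the calling code in this
    // post, it may return a string, FALSE, or an object.
    return _parser_common_syndication_download($url, $op);
  }
  // Fallback without caching. Inside Drupal you would more likely use
  // drupal_http_request(); file_get_contents() keeps the sketch standalone.
  return @file_get_contents($url);
}
```

The caller should still handle the FALSE and object return values the same way the PCS code in this post does.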
Now, take a look at:

<?php
function _myparser_parse($feed) {
  if (is_a($feed, 'SimpleXMLElement')) {
    $xml = $feed;
  }
  else {
    $downloaded_string = _parser_common_syndication_download($feed->url, 'parse');
    if ($downloaded_string === FALSE || is_object($downloaded_string)) {
      return $downloaded_string;
    }

    if (!defined('LIBXML_VERSION') || (version_compare(phpversion(), '5.1.0', '<'))) {
      @$xml = simplexml_load_string($downloaded_string, NULL);
    }
    else {
      @$xml = simplexml_load_string($downloaded_string, NULL, LIBXML_NOERROR | LIBXML_NOWARNING);
    }

    // We got a malformed XML
    if ($xml === FALSE || $xml == NULL) {
      return FALSE;
    }
  }
  $feed_type = _parser_common_syndication_feed_format_detect($xml);
  if ($feed_type == "atom1.0") {
    return _parser_common_syndication_atom10_parse($xml);
  }
  if ($feed_type == "RSS2.0" || $feed_type == "RSS0.91" || $feed_type == "RSS0.92") {
    return _parser_common_syndication_RSS20_parse($xml);
  }
  if ($feed_type == "RDF") {
    return _parser_common_syndication_RDF10_parse($xml);
  }
  return FALSE;
}
?>

In your implementation of the preliminary parser function (_myparser_parse in this example; note that this is not a hook!), make sure you load the XML. That can be done with simplexml_load_string() (if you already have the data as a string, through _parser_common_syndication_download() or a similar method) or with simplexml_load_file(). SimpleXML will create an object from the XML. Next comes your implementation of feed recognition. You might use SimpleXMLElement's getName() method to recognize the root tag.

Once recognized, you can direct the feed to the "real" parser function if you have several of them, or do the actual organization of the data right here. Most of the time, I guess, you'll leave it like the original with only minor changes. If you look at _parser_common_syndication_feedapi_parse, you'll see that the main parts you'll need to change are the calls to the appropriate parsers. If you only have one format to handle, you can do the processing here as well.

The question is how to parse the data. Remember that SimpleXML created an object, so you can traverse it with foreach(), or query it with SimpleXMLElement's xpath() method.
Note that some type casting might be in order here, since SimpleXML returns element objects rather than plain strings. You can take one of PCS's own parsers as a reference.
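To make the traversal and casting concrete, here is a small sketch against a made-up <catalog> format. The element names, attribute names, and function names are all hypothetical. It shows the same document walked once with foreach() and once with xpath(), with explicit casts wherever a plain value is needed.

```php
<?php
/**
 * Hypothetical extraction step for a made-up <catalog> format.
 */
function _myparser_extract_products($xml) {
  $rows = array();
  // Direct children can be walked with foreach.
  foreach ($xml->product as $product) {
    $rows[] = array(
      // Cast explicitly: SimpleXML hands back objects, not strings.
      'title' => trim((string) $product->name),
      'link' => (string) $product['href'],
      'price' => (float) $product->price,
    );
  }
  return $rows;
}

/**
 * The same data reached with XPath; xpath() returns an array of elements.
 */
function _myparser_product_names($xml) {
  $names = array();
  $found = $xml->xpath('/catalog/product/name');
  foreach (($found ? $found : array()) as $name) {
    $names[] = (string) $name;
  }
  return $names;
}

$xml = simplexml_load_string(
  '<catalog>'
  . '<product href="http://example.com/p/1"><name> Lamp </name><price>19.90</price></product>'
  . '<product href="http://example.com/p/2"><name>Desk</name><price>120</price></product>'
  . '</catalog>'
);
$rows = _myparser_extract_products($xml);
$names = _myparser_product_names($xml);
```

Without the casts, the array would hold SimpleXMLElement objects, which tend to cause surprises later when they are compared or serialized.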

Another note: if you pass the data to the node processor (feedapi_node) and it finds neither a 'guid' nor an 'original_url' field, it will ignore the data. So fill those in.
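A sketch of how you might guard against that when building items. The item layout is an assumption on my part: a title, a description, and an options object carrying guid and original_url, which is how I understand PCS's own parsers to build items. Check it against PCS before relying on it. The helper name and the input array keys are hypothetical.

```php
<?php
/**
 * Hypothetical helper: turn one extracted row into a feed item object.
 *
 * Assumes an item with title, description and an options object that
 * carries guid and original_url; verify this layout against PCS.
 */
function _myparser_build_item($row) {
  $url = isset($row['link']) ? trim($row['link']) : '';
  $guid = isset($row['id']) ? trim($row['id']) : '';
  if ($url === '' && $guid === '') {
    // Nothing to identify the item by, so the node processor would skip it.
    return FALSE;
  }
  $item = new stdClass();
  $item->title = isset($row['title']) ? $row['title'] : '';
  $item->description = isset($row['body']) ? $row['body'] : '';
  $item->options = new stdClass();
  $item->options->original_url = $url;
  // Fall back to the URL when the source has no identifier of its own.
  $item->options->guid = ($guid !== '') ? $guid : $url;
  return $item;
}
```

Falling back to the URL as the guid is a design choice for this sketch; if your feed has a stable identifier of its own, prefer that.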

Well, that's about it! Enjoy.
Oh, and of course, thanks Aron.

If I understood what you described, then yes, that would be the way to go. You can also use a separate content type for regular RSS items and for the specific ones.
