Using FeedAPI's API

03.09.2008
submitted by: oren

FeedAPI is a great module, and easy to use if you want it to parse your RSS/RDF/Atom feeds into nodes or lightweight items. Quite a few posts already review that side of it, but the developer documentation for FeedAPI's hooks is still thin. This post is a walk-through of the code.

To let us leverage its capabilities, FeedAPI gives us two hooks: hook_feedapi_feed for creating a parser, and hook_feedapi_item for creating a processor. If you have used FeedAPI before, these concepts are familiar to you: parser_common_syndication and simplepie are examples of parsers, and feedapi_node and aggregator are examples of processors.

When you extend FeedAPI through the hooks, you will most likely want to implement a parser, say because your client wants data from a custom XML feed that FeedAPI does not support yet. So first, let's look at a simplified implementation (caching bits removed) of hook_feedapi_feed, courtesy of parser_common_syndication (PCS for brevity from here on). The parts you would change are listed after the code:

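    <?php
    // NOTE: the first lines of this listing (everything before "url;") are
    // missing from this copy of the post. The lines up to and including the
    // $url assignment are reconstructed from the description that follows
    // and may differ from PCS's actual code.
    function parser_common_syndication_feedapi_feed($op) {
      $args = func_get_args();
      switch ($op) {
        case 'type':
          return array("XML feed");
        case 'compatible':
          $url = $args[1]->url;
          $downloaded_string = _parser_common_syndication_download($url, $op);
          if (is_object($downloaded_string)) {
            return $downloaded_string->type;
          }
          // Older PHP/libxml builds do not accept the libxml option flags.
          if (!defined('LIBXML_VERSION') || (version_compare(phpversion(), '5.1.0', '<'))) {
            @ $xml = simplexml_load_string($downloaded_string, NULL);
          }
          else {
            @ $xml = simplexml_load_string($downloaded_string, NULL, LIBXML_NOERROR | LIBXML_NOWARNING);
          }
          if (_parser_common_syndication_feed_format_detect($xml) != FALSE) {
            return array_shift(parser_common_syndication_feedapi_feed('type'));
          }
          return FALSE;
        case 'parse':
          $feed = is_object($args[1]) ? $args[1] : FALSE;
          $parsed_feed = _parser_common_syndication_feedapi_parse($feed);
          return $parsed_feed;
      }
    }
    ?>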

The three things you will want to change here are the hook itself and the calls to _parser_common_syndication_feed_format_detect and _parser_common_syndication_feedapi_parse.

FeedAPI calls this hook after it has instantiated a $feed object. The hook receives a variable number of arguments, which we collect through func_get_args(). In the 'compatible' $op, we download the feed and inspect it to make sure that our parser recognizes it. PCS uses PHP's SimpleXML functions to parse the feed, hence the simplexml_load_string() calls. If all is well, we call the hook again with the 'type' $op, which returns "XML feed". The 'parse' op is self-explanatory: this is where you plug in your own version of feedapi_parse; let's call it myparser_parse().

So to start implementing your version, implement the hook (module_name_feedapi_feed) and change the call to _parser_common_syndication_feedapi_parse into a call to myparser_parse(). You can decide for yourself whether to implement the feed_detect function or not.
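Here is a minimal sketch of what that skeleton might look like, assuming a hypothetical module named myparser. The helpers myparser_download() and myparser_parse() are placeholder names of mine, not FeedAPI functions, the "catalog" root element is made up, and the download helper is stubbed with a fixed string so the snippet runs outside Drupal. The shape of the returned object is an assumption modeled on PCS's parsers, so check one of them for the exact fields:

```php
<?php
// Hypothetical stand-in for a real download helper. On a Drupal site with
// PCS enabled you might call _parser_common_syndication_download() instead.
function myparser_download($url) {
  return '<catalog><entry><id>42</id><title>Hello</title></entry></catalog>';
}

// Hypothetical preliminary parser (not a hook). Returns FALSE on bad XML.
function myparser_parse($feed) {
  $xml = @simplexml_load_string(myparser_download($feed->url));
  if ($xml === FALSE) {
    return FALSE;
  }
  // Assumed result shape: an object with a title and an array of items.
  $parsed = new stdClass();
  $parsed->title = 'Catalog';
  $parsed->items = array();
  return $parsed;
}

// Implementation of hook_feedapi_feed() for the hypothetical myparser module.
function myparser_feedapi_feed($op) {
  $args = func_get_args();
  switch ($op) {
    case 'type':
      return array('XML feed');

    case 'compatible':
      $xml = @simplexml_load_string(myparser_download($args[1]->url));
      // Accept only documents whose root element is the one we understand.
      if ($xml !== FALSE && $xml->getName() == 'catalog') {
        $types = myparser_feedapi_feed('type');
        return array_shift($types);
      }
      return FALSE;

    case 'parse':
      $feed = is_object($args[1]) ? $args[1] : FALSE;
      return $feed ? myparser_parse($feed) : FALSE;
  }
}
```

Storing the 'type' result in a variable before array_shift() avoids passing a function result by reference, which newer PHP versions complain about.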

A few functions are involved in fetching the feed in PCS. Here's a rundown: _parser_common_syndication_feedapi_parse (our main parser) calls _parser_common_syndication_download (which runs some preliminary tests on the data), which in turn calls _parser_common_syndication_feedapi_get (which checks whether the feed exists in the DB and has changed from its cached copy, and if not, fetches the feed and checks it). This is all done to avoid wasting time parsing items or feeds we already have. You don't have to use these functions or write your own versions of them to parse the data, but they make the process more efficient. If PCS is enabled, you can call _parser_common_syndication_download() directly instead of implementing it yourself (it is a helper function, not tied to a particular format). In essence, you can reuse PCS and change only the calls to "feedapi_parse" and "feed_format_detect", which we will get to in a minute. Now, take a look at:

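    <?php
    // NOTE: the start of this listing (everything before "url, 'parse');")
    // is missing from this copy of the post. The function signature and the
    // download call are reconstructed from the surrounding text. The listing
    // also carried one unmatched closing brace after the malformed-XML
    // check, presumably left over from the removed caching block; it is
    // dropped here.
    function _parser_common_syndication_feedapi_parse($feed) {
      $downloaded_string = _parser_common_syndication_download($feed->url, 'parse');
      if ($downloaded_string === FALSE || is_object($downloaded_string)) {
        return $downloaded_string;
      }
      if (!defined('LIBXML_VERSION') || (version_compare(phpversion(), '5.1.0', '<'))) {
        @ $xml = simplexml_load_string($downloaded_string, NULL);
      }
      else {
        @ $xml = simplexml_load_string($downloaded_string, NULL, LIBXML_NOERROR | LIBXML_NOWARNING);
      }
      // We got a malformed XML
      if ($xml === FALSE || $xml == NULL) {
        return FALSE;
      }
      // Dispatch to the parser that matches the detected format.
      $feed_type = _parser_common_syndication_feed_format_detect($xml);
      if ($feed_type == "atom1.0") {
        return _parser_common_syndication_atom10_parse($xml);
      }
      if ($feed_type == "RSS2.0" || $feed_type == "RSS0.91" || $feed_type == "RSS0.92") {
        return _parser_common_syndication_RSS20_parse($xml);
      }
      if ($feed_type == "RDF") {
        return _parser_common_syndication_RDF10_parse($xml);
      }
      return FALSE;
    }
    ?>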

In your implementation of the preliminary parser function (myparser_parse in this example; note that this is not a hook!), make sure you load the XML. You can do that with simplexml_load_string() (if you already have the data as a string, through _parser_common_syndication_download() or a similar method) or with simplexml_load_file(). SimpleXML builds an object out of the XML. Next comes your implementation of feed recognition. You might use SimpleXMLElement's getName() method to identify the root element.
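As a rough sketch of that recognition step, something like the following could work. The function name myparser_format_detect() and the labels it returns are invented for this example (including the made-up "catalog" format); only simplexml_load_string() and getName() are real PHP:

```php
<?php
// Hypothetical detector: maps the root element name to a format label.
// Returns FALSE for malformed XML or an unknown root element.
function myparser_format_detect($xml_string) {
  $xml = @simplexml_load_string($xml_string);
  if ($xml === FALSE) {
    return FALSE;
  }
  // getName() returns the local name of the element, without any prefix.
  switch (strtolower($xml->getName())) {
    case 'rss':
      return 'rss';
    case 'feed':
      return 'atom';
    case 'catalog':
      return 'catalog';
  }
  return FALSE;
}
```

If you only ever handle one format, the switch collapses into a single comparison and the check can live directly in the 'compatible' op.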

Once the feed is recognized, you can route it to the "real" parser function if you have several of them, or organize the data right here. Most of the time you will probably leave this function like the original, with only minor changes. If you look at _parser_common_syndication_feedapi_parse, you'll see that the main part you need to change is the calls to the appropriate parsers. If you only ever call one parser, you can do the processing here as well.

The question is how to parse the data. Remember that SimpleXML created an object, so you can traverse it with foreach() or with SimpleXMLElement's xpath() method. Note that some type casting might be in order here. You can take one of PCS's own parsers as a reference.
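To make the traversal and the casting concrete, here is a small sketch against a made-up XML document (the catalog/entry element names are mine, not a real feed format). It walks the same data once with foreach() and once with xpath():

```php
<?php
$xml = simplexml_load_string(
  '<catalog>'
  . '<entry><id>42</id><title>First</title></entry>'
  . '<entry><id>43</id><title>Second</title></entry>'
  . '</catalog>'
);

// foreach over child elements. Cast each SimpleXMLElement to a string,
// otherwise you end up storing objects instead of plain values.
$titles = array();
foreach ($xml->entry as $entry) {
  $titles[] = (string) $entry->title;
}

// The same document via XPath. xpath() returns an array of
// SimpleXMLElement objects, so the cast is needed here as well.
$ids = array();
foreach ($xml->xpath('/catalog/entry/id') as $id) {
  $ids[] = (int) $id;
}
```

Forgetting the casts is an easy mistake to make: the values look fine when printed, but comparisons and database writes then receive SimpleXMLElement objects instead of strings or integers.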

Another note: if you pass the data to the node processor and it finds neither a 'guid' nor an 'original_url' field, it will ignore the data. So fill those in.
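One way to make sure both fields are always set is to fall back to the URL when the feed has no identifier of its own. This is only a sketch: myparser_build_item() is a hypothetical helper, the entry element names are made up, and placing the two fields under $item->options is my assumption based on how PCS's parsers lay out their items, so verify it against the processor you use:

```php
<?php
// Hypothetical helper: builds one item object from an <entry> element.
function myparser_build_item(SimpleXMLElement $entry) {
  $item = new stdClass();
  $item->title = (string) $entry->title;
  // A missing child casts to an empty string rather than raising an error.
  $item->description = (string) $entry->summary;

  // Assumed layout: identifying fields live under ->options.
  $item->options = new stdClass();
  $item->options->original_url = (string) $entry->link;

  // Fall back to the URL when the feed carries no explicit identifier.
  $guid = (string) $entry->id;
  $item->options->guid = ($guid !== '') ? $guid : $item->options->original_url;
  return $item;
}
```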

Well, that's about it! Enjoy... oh, and of course, thanks Aron.

Comments

If I understood what you described, then yes, that would be the way to go. You can also use a separate content type for regular RSS items and specific ones.

Thanks, this writeup is really helpful. I've been messing around with parsing XML query results for the past few days. I managed to add a new type and parsing function to PCS, but I think I'd like to add an entirely new parser module (based on PCS) for the sake of cleanliness. You gave me a nice start in doing that!
