Using Goodreads Data
So far, we have created a basic module that uses hook_block() to add block content and installed this basic module. As it stands, however, this module does no more than display a few lines of static text.
In this article, we are going to extend the module's functionality. We will add a few new functions that retrieve and format data from Goodreads.
Goodreads makes data available in an XML format based on RSS 2.0. The XML content is retrieved over HTTP (Hypertext Transfer Protocol), the protocol that web browsers use to retrieve web pages. To enable this module to get Goodreads content, we will have to write some code to retrieve data over HTTP and then parse the retrieved XML.
Our first change will be to make a few modifications to goodreads_block().
Modifying the Block Hook
We could cram all of our new code into the existing goodreads_block() hook; however, this would make the function cumbersome to read and difficult to maintain. Rather than adding significant code here, we will just call another function that will perform another part of the work.
/**
 * Implementation of hook_block
 */
function goodreads_block($op='list' , $delta=0, $edit=array()) {
  switch ($op) {
    case 'list':
      $blocks[0]['info'] = t('Goodreads Bookshelf');
      return $blocks;
    case 'view':
      $url = 'http://www.goodreads.com/review/list_rss/'
        .'398385'
        .'?shelf='
        .'history-of-philosophy';
      $blocks['subject'] = t('On the Bookshelf');
      $blocks['content'] = _goodreads_fetch_bookshelf($url);
      return $blocks;
  }
}
The preceding code should look familiar. This is our hook implementation as seen in the previous article, with two modifications, both in the 'view' case.
First, we have added a variable, $url, whose value is the URL of the Goodreads XML feed we will be using (http://www.goodreads.com/review/list_rss/398385?shelf=history-of-philosophy). In a completely finished module, we would want this to be a configurable parameter, but for now we will leave it hard-coded.
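As a rough sketch of what a configurable version might look like, the URL could be assembled from stored settings with Drupal's variable_get(). The helper name _goodreads_feed_url() and the setting names goodreads_user_id and goodreads_shelf are assumptions made for this illustration and are not part of the module as written; the stand-in variable_get() exists only so that the snippet can run outside Drupal.

```php
<?php
// Stand-in so the sketch also runs outside Drupal. Inside Drupal 6 the
// real variable_get() returns the stored setting or the given default.
if (!function_exists('variable_get')) {
  function variable_get($name, $default) {
    return $default;
  }
}

/**
 * Build the Goodreads feed URL from (hypothetical) module settings.
 *
 * @return
 *   String containing the feed URL.
 */
function _goodreads_feed_url() {
  // 'goodreads_user_id' and 'goodreads_shelf' are assumed setting names.
  $user_id = variable_get('goodreads_user_id', '398385');
  $shelf = variable_get('goodreads_shelf', 'history-of-philosophy');
  return 'http://www.goodreads.com/review/list_rss/'
    . rawurlencode($user_id)
    . '?shelf=' . rawurlencode($shelf);
}

print _goodreads_feed_url() . "\n";
```

With no settings stored, the defaults reproduce the hard-coded URL used in the hook.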
The second change has to do with where the module is getting its content. Previously, the function was setting the content to t('Temporary content'). Now it is calling another function: _goodreads_fetch_bookshelf($url).
The leading underscore here indicates that this function is a private function of our module—it is a function not intended to be called by any piece of code outside of the module. Demarcating a function as private by using the initial underscore is another Drupal convention that you should employ in your own code.
Let's take a look at the _goodreads_fetch_bookshelf() function.
Retrieving XML Content over HTTP
The job of the _goodreads_fetch_bookshelf() function is to retrieve the XML content using an HTTP connection to the Goodreads site. Once it has done that, it will hand over the job of formatting to another function.
Here's a first look at the function in its entirety:
/**
 * Retrieve information from the Goodreads bookshelf XML API.
 *
 * This makes an HTTP connection to the given URL, and
 * retrieves XML data, which it then attempts to format
 * for display.
 *
 * @param $url
 *   URL to the goodreads bookshelf.
 * @param $num_items
 *   Number of items to include in results.
 * @return
 *   String containing the bookshelf.
 */
function _goodreads_fetch_bookshelf($url, $num_items=3) {
  $http_result = drupal_http_request($url);
  if ($http_result->code == 200) {
    $doc = simplexml_load_string($http_result->data);
    if ($doc === false) {
      // simplexml_load_string() signals failure by returning false;
      // no exception is thrown, so only the URL is logged.
      $msg = "Error parsing bookshelf XML for %url.";
      $vars = array('%url'=>$url);
      watchdog('goodreads', $msg, $vars, WATCHDOG_WARNING);
      return t("Getting the bookshelf resulted in an error.");
    }
    return _goodreads_block_content($doc, $num_items);
  // Otherwise we don't have any data
  }
  else {
    $msg = 'No content from %url.';
    $vars = array('%url' => $url);
    watchdog('goodreads', $msg, $vars, WATCHDOG_WARNING);
    return t("The bookshelf is not accessible.");
  }
}
Let's take a closer look.
Following the Drupal coding conventions, the first thing in the above code is an API description:
/**
 * Retrieve information from the Goodreads bookshelf XML API.
 *
 * This makes an HTTP connection to the given URL, and retrieves
 * XML data, which it then attempts to format for display.
 *
 * @param $url
 *   URL to the goodreads bookshelf.
 * @param $num_items
 *   Number of items to include in results.
 * @return
 *   String containing the bookshelf.
 */
This represents the typical function documentation block. It begins with a one-sentence overview of the function. This first sentence is usually followed by a few more sentences clarifying what the function does.
Near the end of the docblock, special keywords (preceded by the @ sign) are used to document the parameters and possible return values for this function.
@param: This keyword documents a parameter and takes the form @param <variable name> <description>. The description should indicate what data type is expected in this parameter.
@return: This keyword documents what type of return value one can expect from this function. It follows the format: @return <description>.
This sort of documentation should be used for any module function that is not an implementation of a hook.
Now we will look at the function itself, starting with the first few lines.
function _goodreads_fetch_bookshelf($url, $num_items=3) {
  $http_result = drupal_http_request($url);
This function expects as many as two parameters. The required $url parameter should contain the URL of the remote site, and the optional $num_items parameter should indicate the maximum number of items to be returned from the feed.
We do not pass $num_items when we call _goodreads_fetch_bookshelf(), so the default of 3 applies. This value would also be a good candidate for the module's configurable parameters.
The first thing the function does is call Drupal's built-in drupal_http_request() function, defined in includes/common.inc. This function makes an HTTP connection to a remote site using the supplied URL and then performs an HTTP GET request.
The drupal_http_request() function returns an object that contains the response code (from the server or the socket library), the HTTP headers, and the data returned by the remote server.
Drupal is occasionally criticized for not using the object-oriented features of PHP. In fact, it does—but less overtly than many other projects. Constructors are rarely used, but objects are employed throughout the framework. Here, for example, an object is returned by a core Drupal function.
When the drupal_http_request() function has executed, the $http_result object will contain the returned information. The first thing we need to find out is whether the HTTP request was successful—whether it connected and retrieved the data we expect it to get.
We can get this information from the response code, which will be set to a negative number if there was a networking error, and set to one of the HTTP response codes if the connection was successful.
We know that if the server responds with the 200 (OK) code, it means that we have received some data.
In a more robust application, we might also check for redirect messages (301, 302, 303, and 307) and other similar conditions. With a little more code, we could configure the module to follow redirects.
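To make the cases described above concrete, here is a minimal sketch that sorts a response code into one of four groups. The helper _goodreads_classify_response() is hypothetical, written only for this illustration, and the stdClass object merely stands in for what drupal_http_request() would return.

```php
<?php
/**
 * Classify a response code of the kind described above.
 *
 * Hypothetical helper; it is not part of the module or of Drupal.
 *
 * @param $code
 *   Integer response code: negative for a network error, otherwise an
 *   HTTP status code.
 * @return
 *   String: 'network_error', 'ok', 'redirect', or 'error'.
 */
function _goodreads_classify_response($code) {
  if ($code < 0) {
    return 'network_error';
  }
  if ($code == 200) {
    return 'ok';
  }
  if (in_array($code, array(301, 302, 303, 307))) {
    return 'redirect';
  }
  return 'error';
}

// A stand-in for the object returned by drupal_http_request().
$http_result = new stdClass();
$http_result->code = 302;
print _goodreads_classify_response($http_result->code) . "\n";
```

A module that wanted to follow redirects could branch on the 'redirect' case; the module in this article only distinguishes 200 from everything else.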
Our simple module will simply treat any other response code as indicating an error:
if ($http_result->code == 200) {
  // ...Process response code goes here...
// Otherwise we don't have any data
}
else {
  $msg = 'No content from %url.';
  $vars = array( '%url' => $url );
  watchdog('goodreads', $msg, $vars, WATCHDOG_WARNING);
  return t("The bookshelf is not accessible.");
}
First let's look at what happens if the response code is something other than 200:
}
else {
  $msg = 'No content from %url.';
  $vars = array( '%url' => $url );
  watchdog('goodreads', $msg, $vars, WATCHDOG_WARNING);
  return t("The bookshelf is not accessible.");
}
We want to do two things when a request fails: we want to log an error, and then notify the user (in a friendly way) that we could not get the content. Let's take a glance at Drupal's logging mechanism.
The watchdog() Function
Another important core Drupal function is the watchdog() function. It provides a logging mechanism for Drupal.
Customize your logging
Drupal provides a hook (hook_watchdog()) that can be implemented to customize what logging actions are taken when a message is logged using watchdog(). By default, Drupal logs to a designated database table. You can view this log in the administration section by going to Administer | Logs.
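As an illustration of such a hook, the following sketch forwards errors and anything more severe to PHP's error log. The module name mylogger and the helper _mylogger_format() are hypothetical; the array keys used here ('type', 'message', 'variables', 'severity') are the ones Drupal 6 passes to hook_watchdog() implementations, and the constants are declared only so that the snippet can run outside Drupal.

```php
<?php
// Severity values as defined by Drupal 6, declared here only so that the
// sketch can run outside Drupal.
if (!defined('WATCHDOG_ERROR')) {
  define('WATCHDOG_ERROR', 3);
  define('WATCHDOG_WARNING', 4);
}

/**
 * Format a log entry as a single line of plain text.
 *
 * Hypothetical helper for the hook below.
 */
function _mylogger_format($log) {
  // The message arrives untranslated; substitute placeholders if any.
  $text = is_array($log['variables'])
    ? strtr($log['message'], $log['variables'])
    : $log['message'];
  return '[' . $log['type'] . '] ' . strip_tags($text);
}

/**
 * Implementation of hook_watchdog() for a hypothetical 'mylogger' module.
 */
function mylogger_watchdog($log) {
  // Lower numbers are more severe, so this passes errors and worse.
  if ($log['severity'] <= WATCHDOG_ERROR) {
    error_log(_mylogger_format($log));
  }
}
```

With this filter in place, the WATCHDOG_WARNING messages logged by our module would not be forwarded, because a warning is less severe than an error.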
The watchdog() function gathers all the necessary logging information and fires off the appropriate logging event.
The first parameter of the watchdog() function is the logging category. Typically, modules should use the module name (goodreads in this case) as the logging category. In this way, finding module-specific errors will be easier.
The second and third watchdog parameters are the text of the message ($msg above) and an associative array of data ($vars) that should be substituted into the $msg. These substitutions are done following the same translation rules used by the t() function. Just like with the t() function's substitution array, placeholders should begin with !, @, or %, depending on the level of escaping you need.
So in the preceding example, the contents of the $url variable will be substituted into $msg in place of the %url marker.
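The substitution rules can be approximated in a few lines of plain PHP. The function _sketch_substitute() below is a hypothetical stand-in written for illustration only, and real module code should call t() or watchdog() rather than reimplementing them. It assumes the Drupal 6 behavior in which ! inserts the value unchanged, @ escapes it as plain text, and % escapes it and wraps it in emphasis tags.

```php
<?php
/**
 * Approximate the placeholder rules used by t() in Drupal 6.
 *
 * Hypothetical stand-in for illustration; real code should call t().
 */
function _sketch_substitute($message, $vars) {
  foreach ($vars as $key => $value) {
    switch ($key[0]) {
      case '@':
        // Escaped as plain text.
        $vars[$key] = htmlspecialchars($value, ENT_QUOTES);
        break;

      case '%':
        // Escaped and emphasized.
        $vars[$key] = '<em>' . htmlspecialchars($value, ENT_QUOTES) . '</em>';
        break;

      case '!':
        // Inserted as-is, with no escaping.
        break;
    }
  }
  return strtr($message, $vars);
}

$vars = array('%url' => 'http://example.com/?a=1&b=2');
print _sketch_substitute('No content from %url.', $vars) . "\n";
// Prints: No content from <em>http://example.com/?a=1&amp;b=2</em>.
```

Because the URL in our log message comes from outside the module, an escaping placeholder such as % is the safer choice.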
Finally, the fourth parameter of the watchdog() function is a constant that indicates the log message's priority, that is, how important it is.
There are eight different constants that can be passed to this function:
WATCHDOG_EMERG: The system is now in an unusable state.
WATCHDOG_ALERT: Something must be done immediately.
WATCHDOG_CRITICAL: The application is in a critical state.
WATCHDOG_ERROR: An error occurred.
WATCHDOG_WARNING: Something unexpected (and negative) happened, but didn't cause any serious problems.
WATCHDOG_NOTICE: Something significant (but not bad) happened.
WATCHDOG_INFO: Information can be logged.
WATCHDOG_DEBUG: Debugging information can be logged.
Depending on the logging configuration, not all these messages will show up in the log.
The WATCHDOG_ERROR and WATCHDOG_WARNING levels are usually the most useful for module developers to record errors. Most modules do not contain code significant enough to cause general problems with Drupal, and the upper three log levels (alert, critical, and emergency) should probably not be used unless Drupal itself is in a bad state.
There is an optional fifth parameter to watchdog(), usually called $link, which allows you to pass in an associated URL. Logging back ends may use that to generate links embedded within logging messages.
The last thing we want to do in the case of an error is return an error message that can be displayed on the site. This is simply done by returning a (possibly translated) string:
return t("The bookshelf is not accessible.");
We've handled the case where retrieving the data failed. Now let's turn our attention to the case where the HTTP request was successful.
Processing the HTTP Results
When the result code of our request is 200, we know the web transaction was successful. The content may or may not be what we expect, but we have good reason to believe that no error occurred while retrieving the XML document.
So, in this case, we continue processing the information:
if ($http_result->code == 200) {
  // ... Processing response here...
  $doc = simplexml_load_string($http_result->data);
  if ($doc === false) {
    // simplexml_load_string() signals failure by returning false;
    // no exception is thrown, so only the URL is logged.
    $msg = "Error parsing bookshelf XML for %url.";
    $vars = array('%url'=>$url);
    watchdog('goodreads', $msg, $vars, WATCHDOG_WARNING);
    return t("Getting the bookshelf resulted in an error.");
  }
  return _goodreads_block_content($doc, $num_items);
// Otherwise we don't have any data
}
else {
  // ... Error handling that we just looked at.
In the above example, we use the PHP 5 SimpleXML library. SimpleXML provides a set of convenient and easy-to-use tools for handling XML content. This library is not present in the now-deprecated PHP 4 language version.
For compatibility with outdated versions of PHP, Drupal code often uses the Expat parser, a venerable old event-based XML parser supported since PHP 4 was introduced. Drupal even includes a wrapper function for creating an Expat parser instance. However, writing the event handlers is time consuming and repetitive. SimpleXML gives us an easier interface and requires much less coding.
For an example of using the Expat event-based method for handling XML documents, see the built-in Aggregator module. For detailed documentation on using Expat, see the official PHP documentation: http://php.net/manual/en/ref.xml.php.
We will parse the XML using simplexml_load_string(). If parsing is successful, the function returns a SimpleXML object. However, if parsing fails, it will return false.
In our code, we check for a false. If one is found, we log an error and return a friendly error message. But if the Goodreads XML document was parsed properly, this function will call another function in our module, _goodreads_block_content(). This function will build some content from the XML data.
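The two outcomes of simplexml_load_string() can be tried outside Drupal with a small standalone script. The XML below is a made-up RSS 2.0 fragment, not actual Goodreads output, and the call to libxml_use_internal_errors() is an optional addition that keeps parse warnings from being printed.

```php
<?php
// A tiny, made-up RSS 2.0 document standing in for the Goodreads feed.
$xml = '<rss version="2.0">
  <channel>
    <title>Sample bookshelf</title>
    <item><title>First Book</title></item>
    <item><title>Second Book</title></item>
  </channel>
</rss>';

// Collect parse errors quietly instead of emitting PHP warnings.
libxml_use_internal_errors(TRUE);

// Well-formed input: we get a SimpleXML object we can walk.
$doc = simplexml_load_string($xml);
$titles = array();
if ($doc !== FALSE) {
  foreach ($doc->channel->item as $item) {
    $titles[] = (string) $item->title;
  }
}
print implode(', ', $titles) . "\n";
// Prints: First Book, Second Book

// Malformed input: the function returns FALSE rather than an object.
$bad = simplexml_load_string('<rss><channel>');
var_dump($bad === FALSE);
// Prints: bool(true)
```

The strict comparison with FALSE matters here, because a successfully parsed but empty element can also evaluate as false in a loose comparison.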