At the Forge: Bloglines Web Services
Last month, we looked at ways in which we can gather, or aggregate, content from a number of different Web sites and put together a single summary of the day's news. Although it was amazing to see how much we could accomplish with a little bit of code, the application I presented is merely a toy when compared with actual aggregators. My example application supports only one user, is controlled by a primitive configuration file, doesn't categorize Weblogs into groups, checks for Weblog updates only when we explicitly ask it to do so and doesn't check for or handle errors.
Creating a robust, user-friendly aggregator is beyond the scope of this column, given the attention to technical and design details that would be necessary. But several days before I sat down to write this column, something amazing happened. The free, Web-based Bloglines.com aggregation service, which many people use to keep track of their favorite Weblogs, announced the availability of a Web service API that allows independent developers to create and deploy applications that use the data and applications developed by Bloglines. The publication and availability of the Bloglines API marks the growing popularity of Web services among well-known sites and opens the door to new applications built on the underlying Bloglines infrastructure.
This month, we take a look at the Bloglines API, including the creation of a simple application based on it. The API is brand new as of this writing (early October 2004) and undoubtedly will evolve as more people use it. If Weblogs interest you, and if you still are waiting to see practical uses for Web services, this combination of events might have come just in time.
The basic idea behind Web services is quite simple: the Web's success is due in no small part to the fact that the client and server operating systems are irrelevant. So long as the client and server adhere to the HTTP and HTML specifications, they can communicate seamlessly. Linux has made inroads into the server space precisely for this reason.
Web services take this one step further, saying that computers and not people should be the biggest users of the Web. Although computers exchange information over HTTP, they send and receive data in XML, the markup language or meta-language, that has caught on like wildfire in recent years. If my computer can send XML in the HTTP request it sends to your computer, and your computer then returns XML in its HTTP response, we can exchange information regardless of what languages and operating systems we're using.
The original form of this service, known as XML-RPC, still exists and is great for fast, easy communication. But this idea was extended further, and a variety of data types, error-checking mechanisms and object serialization techniques were introduced that XML-RPC lacked. This extension became known as SOAP (Simple Object Access Protocol). SOAP theoretically can run on top of a variety of protocols, but it most often is sent on top of HTTP.
SOAP is a great solution to many problems, except that it is terribly complex, can be slow and is difficult to implement. And, both XML-RPC and SOAP require that the HTTP request include a well-formed XML request containing the query. One response to this growing complexity is REST (representational state transfer), in which all transactions are initiated by a simple HTTP GET request and all parameters are specified in the URL itself. The response then is an XML document containing the records and fields appropriate to the request. All of the Bloglines API calls are done with REST, although it's hard to say if this reflects the relatively simple queries now provided or if it's a design preference of the developers.
Although Web services probably are taking off behind corporate doors, only a few of the larger Web sites have made their plans and APIs public. The best-known examples are some of the largest and most profitable sites on the Web, including Amazon, eBay and Google. eBay charges for access to its Web services, with annual fees as well as per-transaction costs. By contrast, Amazon and Google have made their APIs freely available to the public, subject to usage restrictions and without making any promises regarding future availability.
In making its API public, Bloglines is indicating its interest in creating the same sort of developer community that Amazon, Google and eBay have created. This move also demonstrates its interest in remaining a leader in the world of Weblog aggregation and applications. Given Google's purchase of Blogger several years ago and the extensive search features that Bloglines is making available with its API, we might be witnessing the beginning of a new type of application or platform battle, with the Google and Bloglines APIs competing for attention.
Bloglines aggregates content from a large number of Weblogs and frequently updated news sources. Bloglines is happy to accept feeds in a variety of formats, including Atom and several versions of RSS. Indeed, Bloglines offers subscribers the choice of which feed to use, if more than one is available. The Bloglines software then archives that content, providing a search interface for interested users. Bloglines provides some relevance features, telling subscribers which additional Weblogs might interest them. Finally, Bloglines lets you look at other users' subscriptions; if you are interested in seeing which Weblogs interest me, you can review my profile and see my subscriptions.
For now at least, much of this functionality remains under wraps, available only through the Bloglines Web site. But three particular pieces of functionality now are available from the Bloglines Web services API:
Notifier: if you are a Bloglines subscriber and want to know when new content has arrived from one or more of the Weblogs to which you subscribe, now you can do it. This is the most established of the Bloglines Web services, and a number of tools for a number of operating systems and windowing toolkits rely on this interface to provide updates.
Sync API: allows you to retrieve information about a particular user's subscriptions, as well as the latest entries from each of those subscriptions. You can think of this as the data underlying the HTML that Bloglines generates for the main Weblog listing it provides.
Blogroll API: presents a way to retrieve and display a particular user's subscription list.
As I wrote above, Bloglines has decided to use REST for all of its Web services APIs. This means every request consists of a single URL, with all of the parameters and their values in the URL. Information is returned in whatever format the server deems appropriate. This stands in sharp contrast to SOAP, which specifies the name and type of each parameter and return value. A minor exception to this rule is that APIs requiring authentication expect the user name and password to arrive in HTTP Basic rather than in the URL itself. In the Bloglines universe, subscribers are identified by their e-mail addresses and user-selected passwords.
The easiest of the APIs to understand and use is the Notifier. To invoke the Notifier, simply go to the URL rpc.bloglines.com/update?user=reuven@lerner.co.il&ver=1. The response, while (incorrectly) tagged by the server as having a MIME type of text/html, contains a plain-text response of the format:
|A|B|
Notifiers can interpret the response as follows:
Normally, A indicates the number of unread Weblog entries in the user's subscription.
If the provided e-mail address is not registered with the system, then A contains -1.
If B isn't empty, it then contains a URL pointing to an upgrade page. The documentation doesn't say much about what it means to have an upgrade page. I assume that such a page is meant for people rather than programs, because it would be impossible or at least quite difficult to identify all of the programs that use the Notifier API and that are in need of an upgrade.
We easily could implement the client side of the Notifier API in any modern high-level language. But at the time of this writing, versions of Bloglines client libraries exist in Perl, Python and Ruby. I use the Perl version (on CPAN as WebService::Bloglines), but you may feel more comfortable rolling your own version, using a different version or both.
Here is a simple command-line program that prints “You have new blogs!” if Bloglines reports that new messages are waiting and “No new blogs” if I already have read everything:
#!/usr/bin/perl use WebService::Bloglines; my $username = 'reuven@lerner.co.il'; my $password = 'MYPASS'; my $bloglines = WebService::Bloglines->new( username => $username, password => $password); my $unread_blogs = $bloglines->notify(); if ($unread_blogs) { print "You have '$unread_blogs' new blogs!\n "; } else { print "No new blogs.\n" }
The number returned by $bloglines->notify() is the number of unread postings, not of unread Weblogs. If there are 15 unread messages from five Weblogs, $bloglines->notify() returns 15, not 5. Moreover, the number reflects the state of the internal Bloglines database. That is, if you click on the Keep New check box at the bottom of a Weblog entry, it is included in the count of new messages returned by $bloglines->notify().
If we enter an incorrect user name, our program exits with a fatal error and indicates that we gave it a bad user name. Giving a bad password has no consequences for the Notifier API, because that information is available publicly.
Another offering from Bloglines, as we mentioned earlier, is the Blogroll API. A blogroll is a list of Weblogs that a particular author finds interesting and often reads. It's likely that if you enjoy reading someone's Weblog, you also would enjoy perusing that person's reading list. In the case of Bloglines, a blogroll simply is a list of subscriptions associated with a particular user.
So far, we have mentioned that someone's Bloglines user name is the same as his or her e-mail address. But this is not completely true—if you choose to use Bloglines for your own private purposes, never sharing information about your subscriptions with other people, you need nothing more than your e-mail address. But if you do want to expose your subscriptions, you must choose a user name with which they can be associated. In my case, my registration e-mail address is reuven@lerner.co.il and my user name is reuven. This distinction wasn't clear to me for the first few months that I used Bloglines, although it seems to be more obviously advertised now.
If a user has established a user name for public consumption and if that user has chosen to share his or her subscriptions, you can get a version of that user's Blogroll that uses HTML and JavaScript as follows: https://www.bloglines.com/public/reuven. If we want to retrieve the blogroll results in HTML, we can do so with the following style of URL: https://rpc.bloglines.com/blogroll?id=reuven&html=1.
But the whole idea of Web services is to make data machine-readable, such that it can be stored and processed by computers. OPML, the Outline Processor Markup Language, specified by Dave Winer in 2000, is the format used by Bloglines when it exports a list of subscriptions. It is not an official part of the Bloglines Web services specification, but you can retrieve it by going to the following type of URL: https://www.bloglines.com/export?id=reuven.
In all of the above examples, you can and should replace my Bloglines user name with that of the user whose blogroll you want to read. Not every user makes his or her subscription list public, so you may encounter error messages when trying to retrieve them. And once you retrieve the OPML, you need to process it, perhaps using a tool such as the publicly available XML::OPML module from CPAN.
As you can see, the Bloglines API for Web services opens the door to a host of third-party applications. It increasingly is possible to create useful applications that use HTML, XML and HTTP but that are not tied to a Web browser. The Notifier and Blogroll APIs are only the beginning. As we saw earlier, there is also a Sync API that effectively allows developers to create alternative GUIs and applications with the actual content Bloglines retrieves and stores. In my next column, we will look at the Sync API, building some basic applications on top of the Bloglines infrastructure.
Reuven M. Lerner, a longtime Web/database consultant and developer, now is a graduate student in the Learning Sciences program at Northwestern University. His Weblog is at altneuland.lerner.co.il, and you can reach him at reuven@lerner.co.il.