Amazon Web Services
Back when I was in college, there weren't many options for buying technical books. I could buy them new at the high-priced campus bookstore, I could buy them from a high-priced competitor around the corner, or I could buy used copies from other students, who advertised their wares at the end of every semester. Regardless, my ability to buy books was dictated by my location, coupled with my ability to learn what was available.
So, it probably won't surprise you to learn that I was an early customer of on-line bookstores, patronizing both Bookpool and Amazon before the summer of 1995. The combination of excellent prices and wide selection, along with convenience, was a dream come true. Much as I might hate to admit it, I probably spent just as much on books from on-line stores as I would have at their brick-and-mortar counterparts. However, although my book-buying budget was unchanged, the number of books I could buy, as well as the variety that was available, was unparalleled in the physical world.
Things got even better when Amazon opened its doors to third-party booksellers. Now I could not only compare new book prices from the comfort of my living room, but I could browse and buy used books as well. The number of interesting books available for less than $1 US (plus shipping) has turned me into something of a book-buying monster; the shelves of my graduate-school office are filled with books that I hope will be useful in my research, but that I bought largely because the opportunity existed. When I hear about an interesting book, my first instinct now is to check at Amazon—or even better, at isbn.nu, which compares prices across multiple sites.
Over the years, Amazon has assembled a huge database of information about books. I'm sure that this database of books, buyers and sellers continues to be an important source for Amazon's decision-makers. But a few years ago, Amazon decided to do something surprising—they opened part of their internal database to third-party developers, in a program known as Amazon Web Services (AWS). Using AWS, developers can perform nearly every task they would normally be able to do on the Amazon site, using a client-side program rather than a Web browser. AWS also includes a number of features aimed at booksellers, for pricing and inventory management.
In the latter half of 2005, Amazon unveiled a number of new initiatives that fit under its “Web services” umbrella, only some of which are related directly to selling and buying books. At about the same time, eBay announced that it would no longer be charging developers to use its Web services, making it possible to query two of the largest databases of sales data. And, of course, Google has long offered Web services of its own; although data is currently limited to the main index, it is probably safe to assume that it is a great resource.
This month, we begin to explore the world of commercial Web services, looking especially at ways in which we can integrate data from external Web services into our own applications. Along the way, we'll see some of the different ways in which we can invoke Web services, some of the different offerings that are available to us and how we might be able to build on existing Web services to create new and interesting applications.
During its first decade or so, the Web was designed mostly for user interaction. That is, most HTTP clients were Web browsers, and most of the content downloaded by those browsers was HTML-formatted text intended for people to read.
At a certain point, developers began to consider the possibility that they could use HTTP for more than just transmitting human-readable documents. They began using HTTP to transmit data between programs. The combination of HTTP as a transmission protocol and XML as a data format led to XML-RPC. Because XML and HTTP are platform-neutral, one did not have to write both the client and server programs in the same language, or even use the same operating system. XML-RPC thus provides a means for cross-platform RPC (remote procedure calls), with far less overhead than other approaches to the same problem (such as CORBA middleware) might require.
XML-RPC was and is a good, clean and lightweight protocol, but it lacked some of the sophistication, error handling and data types that many developers wanted. Thus, SOAP (originally short for the Simple Object Access Protocol) introduced a number of extensions to make it more formal, including a separation between the message envelope and body.
XML-RPC and SOAP both assume that the server will be listening for method calls at a particular URL. Thus, a server might have an XML-RPC or SOAP server listening at /server, or /queries, or some such URL. The client is then responsible for indicating which method it needs in the request. In XML-RPC, we use the methodName tag. Parameters and metadata are all passed in the XML envelope, which is sent as part of an HTTP POST submission.
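To make the role of the methodName tag concrete, here is a minimal sketch in Python that builds the body of an XML-RPC request using the standard library's xmlrpc.client module. The method name catalog.lookupPrice and its single argument are hypothetical, chosen only for illustration; no real server is assumed, and nothing is sent over the network.

```python
import xmlrpc.client

# Hypothetical method name and argument, used only to show the envelope.
# dumps() returns the XML that a client would send in an HTTP POST.
request_body = xmlrpc.client.dumps(
    ("097669400X",),                  # positional parameters, as a tuple
    methodname="catalog.lookupPrice"  # ends up inside <methodName>
)

print(request_body)

# loads() reverses the process, recovering the parameters and method name.
params, method = xmlrpc.client.loads(request_body)
```

The printed document wraps the method name in a methodName element and each argument in a param element, which is the XML envelope that would travel in the POST body.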
A different technique, known as REST, identifies the method calls in the URL itself. It passes parameters like a standard GET request. REST has a number of nice features, especially its simplicity of implementation and use. And, debugging REST is easy, because you can enter the URLs into a Web browser instead of a specialized program. However, a large number of people are still using SOAP and XML-RPC, especially when working with complex data structures.
Web services form the core of what is increasingly known as service-oriented architecture, or SOA, in the high-tech world. A Web service brings together all of the advantages of the Web—platform independence, language independence and the ability to upgrade and change the service without having to distribute a new version.
SOA makes it possible to create new services, or even to unveil new versions of existing services, either by replacing an existing implementation or by unveiling a new implementation in parallel with the old one. Those who use Web services can benefit from improved speed and efficiency, or from completely new APIs, without having to worry about incompatibilities or installation problems. In addition, as long as developers follow the service's published specification, they can use whatever language and platform they want, creating anything from an interactive desktop application to an automated batch job that crunches through gigabytes of data.
Amazon was one of the first companies to begin working with Web services. AWS is now a suite of different APIs, some of which have to do with Amazon's catalogs, and others of which (for example, the Mechanical Turk and Amazon's Simple Queue Service) are more general services. The most popular service is known as the E-Commerce Service (ECS). ECS makes it possible to retrieve product data from several of Amazon's stores, get detailed information about particular items and vendors, and also perform basic operations having to do with e-commerce, including the creation and manipulation of shopping carts.
ECS has two basic modes of operation, known as search and lookup. Searches return a list of products matching a set of criteria: for example, all of the books written by Larry Wall, books with the word Python in the title, or movies directed by Woody Allen. Lookups are meant for when you know the specific ID code associated with a product, known as an ASIN (Amazon Standard ID Number). The ASIN for a book is the same as its International Standard Book Number (ISBN); other types of products have ASINs defined by Amazon.
So, let's say I'm interested in finding out whether Amazon stocks the Pragmatic Programmers' book about Ruby on Rails, and how much it costs. Because I'm looking for a particular item, I should use the ItemLookup operation. But this means that I need to know the ISBN, which I find is 097669400X. (ECS expects the ISBN without any hyphens or other punctuation.) Finally, I have to get a value for AccessKeyId, an ID number that tells Amazon which developer is accessing the system. (Getting an AccessKeyId is free and easy; see the on-line Resources for details.)
The base URL for ECS REST requests is https://webservices.amazon.com/onca/xml?Service=AWSECommerceService.
To indicate the operation, AccessKeyId and ItemId, we add name-value pairs onto the URL, using the name=value format and separating the pairs with ampersands (&). Our combined URL thus looks like this: https://webservices.amazon.com/onca/xml?Service=AWSECommerceService&Operation=ItemLookup&AWSAccessKeyId=XXX&ItemId=097669400X.
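Rather than gluing the name-value pairs together by hand, a program can let a library do the joining and escaping. The following is a minimal sketch in Python, assuming only the base URL and parameter names described above; the function name build_item_lookup_url is my own invention, and the code builds the URL without sending any request.

```python
from urllib.parse import urlencode

# Base URL for ECS REST requests, as given in the text.
BASE_URL = "https://webservices.amazon.com/onca/xml"


def build_item_lookup_url(access_key_id, item_id):
    """Return an ItemLookup URL for the given developer key and ASIN/ISBN."""
    params = {
        "Service": "AWSECommerceService",
        "Operation": "ItemLookup",
        "AWSAccessKeyId": access_key_id,
        "ItemId": item_id,
    }
    # urlencode joins the pairs as name=value, separated by ampersands.
    return BASE_URL + "?" + urlencode(params)


# "XXX" stands in for a real AccessKeyId, as it does in the text.
url = build_item_lookup_url("XXX", "097669400X")
print(url)
```

Using urlencode also means that any value containing spaces or punctuation is escaped correctly, which matters once we move from lookups to free-text searches.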
If you put the above into a Web browser (replacing the XXX with an actual AccessKeyId value), you should see the XML document (with a content-type of text/xml) returned from Amazon's server. That document begins with an ItemLookupResponse tag and is then divided into two sections, OperationRequest (which describes the request that you made, including your browser's UserAgent header and all of the arguments you passed to the service) and Items (which contains the responses from Amazon).
For example, here is the response that I received from my request to Amazon:
<ItemLookupResponse>
  <OperationRequest>
    <HTTPHeaders>
      <Header Name="UserAgent" Value="Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8) Gecko/20051111 Firefox/1.5"/>
    </HTTPHeaders>
    <RequestId>1NBTWT1FHDEHJK2G16CT</RequestId>
    <Arguments>
      <Argument Name="Operation" Value="ItemLookup"/>
      <Argument Name="Service" Value="AWSECommerceService"/>
      <Argument Name="AWSAccessKeyId" Value="XXX"/>
      <Argument Name="ItemId" Value="097669400X"/>
    </Arguments>
    <RequestProcessingTime>0.00745105743408203</RequestProcessingTime>
  </OperationRequest>
  <Items>
    <Request>
      <IsValid>True</IsValid>
      <ItemLookupRequest>
        <ItemId>097669400X</ItemId>
      </ItemLookupRequest>
    </Request>
    <Item>
      <ASIN>097669400X</ASIN>
      <DetailPageURL>
        https://www.amazon.com/exec/obidos/redirect?tag=ws%26link_code=xm2%26camp=2025%26creative=165953%26path=https://www.amazon.com/gp/redirect.html%253fASIN=097669400X%2526tag=ws%2526lcode=xm2%2526cID=2025%2526ccmID=165953%2526location=/o/ASIN/097669400X%25253FSubscriptionId=XXX
      </DetailPageURL>
      <ItemAttributes>
        <Author>Dave Thomas</Author>
        <Author>David Hansson</Author>
        <Author>Leon Breedt</Author>
        <Author>Mike Clark</Author>
        <Author>Thomas Fuchs</Author>
        <Author>Andrea Schwarz</Author>
        <ProductGroup>Book</ProductGroup>
        <Title>
          Agile Web Development with Rails (The Facets of Ruby Series)
        </Title>
      </ItemAttributes>
    </Item>
  </Items>
</ItemLookupResponse>
There are several particularly useful fields in the previous XML. You can see how much time it took for Amazon to process our request (about 0.007 seconds, in this case), which might be useful if we need to debug and/or benchmark our application. The DetailPageURL contains the URL to which we can refer users who want to see information about this product on the Amazon site. And, we get information such as the title and author(s), which might be useful when displaying book information.
And indeed, it should be easy to see how we can parse this XML, displaying parts or all of it in a Web, GUI or console application. Or, we can add some part of this data to a larger database application that we are creating, making sure not to violate Amazon's restrictions on the use of retrieved data.
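As a rough illustration of such parsing, here is a sketch in Python that uses the standard library's ElementTree to pull the title, authors and processing time out of a response. The sample document is an abbreviated copy of the response shown earlier, and the function name summarize_item is hypothetical. I assume the elements carry no XML namespace, as in the listing; if a live response declares one, the tag paths would need to be namespace-qualified.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the ItemLookup response from the article.
SAMPLE_RESPONSE = """<ItemLookupResponse>
  <OperationRequest>
    <RequestProcessingTime>0.00745105743408203</RequestProcessingTime>
  </OperationRequest>
  <Items>
    <Item>
      <ASIN>097669400X</ASIN>
      <ItemAttributes>
        <Author>Dave Thomas</Author>
        <Author>David Hansson</Author>
        <ProductGroup>Book</ProductGroup>
        <Title>
          Agile Web Development with Rails (The Facets of Ruby Series)
        </Title>
      </ItemAttributes>
    </Item>
  </Items>
</ItemLookupResponse>"""


def summarize_item(xml_text):
    """Extract a few fields from an ItemLookupResponse document."""
    root = ET.fromstring(xml_text)
    item = root.find("Items/Item")
    return {
        "asin": item.findtext("ASIN"),
        # The title is surrounded by whitespace in the listing, so strip it.
        "title": item.findtext("ItemAttributes/Title", default="").strip(),
        "authors": [a.text for a in item.findall("ItemAttributes/Author")],
        "seconds": float(
            root.findtext("OperationRequest/RequestProcessingTime")
        ),
    }


info = summarize_item(SAMPLE_RESPONSE)
print(info["title"])
print(", ".join(info["authors"]))
```

The resulting dictionary could just as easily be handed to a Web template, a GUI widget or a database insert.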
As useful as the above information is, it still doesn't answer all of my original question, which is whether Amazon stocks the Pragmatic Programmers' book about Ruby on Rails, and how much it costs. I know that the Rails book is available from Amazon, but I don't know how much it costs. This is because ECS returns a small amount of data by default, corresponding to what we saw above. We can tailor the information that Amazon returns to us by specifying one or more response groups. Each response group corresponds to one or more types of data that ECS will return in its response.
To get basic pricing information about a book, we thus can ask to see the OfferSummary response group: https://webservices.amazon.com/onca/xml?Service=AWSECommerceService&Operation=ItemLookup&AWSAccessKeyId=XXX&ItemId=097669400X&ResponseGroup=OfferSummary.
Instead of the previous listing, which described the book itself, we now get a list of the lowest new and used prices for a particular book. Here is the XML response from the above query:
<ItemLookupResponse>
  <OperationRequest>
    <HTTPHeaders>
      <Header Name="UserAgent" Value="Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8) Gecko/20051111 Firefox/1.5"/>
    </HTTPHeaders>
    <RequestId>0SNXJ8T5V2JA18M8AJQC</RequestId>
    <Arguments>
      <Argument Name="ResponseGroup" Value="OfferSummary"/>
      <Argument Name="Operation" Value="ItemLookup"/>
      <Argument Name="Service" Value="AWSECommerceService"/>
      <Argument Name="AWSAccessKeyId" Value="XXX"/>
      <Argument Name="ItemId" Value="097669400X"/>
    </Arguments>
    <RequestProcessingTime>0.0331768989562988</RequestProcessingTime>
  </OperationRequest>
  <Items>
    <Request>
      <IsValid>True</IsValid>
      <ItemLookupRequest>
        <ItemId>097669400X</ItemId>
        <ResponseGroup>OfferSummary</ResponseGroup>
      </ItemLookupRequest>
    </Request>
    <Item>
      <ASIN>097669400X</ASIN>
      <OfferSummary>
        <LowestNewPrice>
          <Amount>2295</Amount>
          <CurrencyCode>USD</CurrencyCode>
          <FormattedPrice>$22.95</FormattedPrice>
        </LowestNewPrice>
        <LowestUsedPrice>
          <Amount>2341</Amount>
          <CurrencyCode>USD</CurrencyCode>
          <FormattedPrice>$23.41</FormattedPrice>
        </LowestUsedPrice>
        <LowestCollectiblePrice>
          <Amount>3495</Amount>
          <CurrencyCode>USD</CurrencyCode>
          <FormattedPrice>$34.95</FormattedPrice>
        </LowestCollectiblePrice>
        <TotalNew>41</TotalNew>
        <TotalUsed>12</TotalUsed>
        <TotalCollectible>2</TotalCollectible>
        <TotalRefurbished>0</TotalRefurbished>
      </OfferSummary>
    </Item>
  </Items>
</ItemLookupResponse>
As you can see, the initial portion of the response is the same. But the second half of the response, inside of the <Items> tag, is different, with LowestNewPrice, LowestUsedPrice and LowestCollectiblePrice tags showing us how much we can buy this book for.
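To work with these prices in a program, we probably want numbers rather than formatted strings. The sketch below, in Python, reads the lowest price for each condition from an OfferSummary element. It assumes that Amount is expressed in hundredths of the currency unit, which is consistent with the listing (2295 alongside $22.95) but is my inference rather than something the response states. The function name lowest_prices is hypothetical.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# The price portion of the OfferSummary element from the response above.
SAMPLE_OFFER_SUMMARY = """<OfferSummary>
  <LowestNewPrice>
    <Amount>2295</Amount>
    <CurrencyCode>USD</CurrencyCode>
    <FormattedPrice>$22.95</FormattedPrice>
  </LowestNewPrice>
  <LowestUsedPrice>
    <Amount>2341</Amount>
    <CurrencyCode>USD</CurrencyCode>
    <FormattedPrice>$23.41</FormattedPrice>
  </LowestUsedPrice>
  <LowestCollectiblePrice>
    <Amount>3495</Amount>
    <CurrencyCode>USD</CurrencyCode>
    <FormattedPrice>$34.95</FormattedPrice>
  </LowestCollectiblePrice>
</OfferSummary>"""


def lowest_prices(xml_text):
    """Map each condition to a (price, currency) pair, skipping absent ones."""
    summary = ET.fromstring(xml_text)
    prices = {}
    for condition in ("New", "Used", "Collectible"):
        node = summary.find(f"Lowest{condition}Price")
        if node is None:
            continue  # no offers in this condition
        # Assumption: Amount is in hundredths of the currency unit.
        amount = Decimal(node.findtext("Amount")) / 100
        prices[condition] = (amount, node.findtext("CurrencyCode"))
    return prices


prices = lowest_prices(SAMPLE_OFFER_SUMMARY)
for condition, (amount, currency) in prices.items():
    print(condition, amount, currency)
```

Decimal is used instead of float so that prices compare and add exactly, which matters if we later aggregate prices from several sources.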
We also can ask for other response groups, mixing and matching their names as necessary. For example, we can request the Medium response group, giving us not only information about the request and the book, but also the images (in a number of sizes) associated with the book, the book's size and weight, and editorial reviews. If we want to go beyond that, getting reviews of the book that have been left by Amazon customers and lists of similar products, we can request the Large response group.
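Mixing response groups from a program might look like the following Python sketch. It assumes that several group names are passed in a single ResponseGroup parameter, separated by commas; that separator is my assumption and should be checked against Amazon's documentation. The function name item_lookup_url is hypothetical, and the code only builds the URL.

```python
from urllib.parse import urlencode


def item_lookup_url(access_key_id, item_id, response_groups=()):
    """Build an ItemLookup URL, optionally naming one or more response groups."""
    params = {
        "Service": "AWSECommerceService",
        "Operation": "ItemLookup",
        "AWSAccessKeyId": access_key_id,
        "ItemId": item_id,
    }
    if response_groups:
        # Assumption: multiple groups are joined with commas in one parameter.
        params["ResponseGroup"] = ",".join(response_groups)
    # safe="," keeps the commas readable instead of escaping them as %2C.
    return "https://webservices.amazon.com/onca/xml?" + urlencode(params, safe=",")


combined_url = item_lookup_url("XXX", "097669400X", ["Medium", "OfferSummary"])
print(combined_url)
```

Leaving response_groups empty omits the parameter entirely, so we get the small default response described earlier.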
Amazon's Web services provide us with a tool to look through a huge database of product information, for both personal and commercial use. In addition, ECS gives us a taste of what it is like to create REST-style queries, and how we might parse the results. Finally, just as Web developers often learn from the HTML and JavaScript on existing sites, we can learn how to create good Web services for our own use by studying the way in which Amazon has done theirs. In particular, I like Amazon's concept of response groups, which allows us to mix and match the types of responses we might get—something that I may well emulate in my own Web services.
Next month, we'll build on what we saw here, creating a Web service of our own that aggregates data from Amazon and my local public library to give me a personalized book lookup system.
Resources for this article: /article/8748.
Reuven M. Lerner, a longtime Web/database consultant, is a PhD student in Learning Sciences at Northwestern University. He lives outside of Chicago with his wife and three children, including newborn son, Amotz David.