Introducing SOAP
In the January and February installments of “At the Forge”, I demonstrated a simple three-tier web application using a database, web server and the Mason templating system for mod_perl. We were able to see some of the advantages and disadvantages of a three-tier web application, particularly when compared with its two-tier counterpart.
But as I pointed out last month, our three-tier architecture was incomplete and wasn't necessarily a fair demonstration. That's because our Perl middleware object layer had to reside on the same computer as the components we wrote for HTML::Mason, a templating system built on mod_perl. Depending on how you count things, this might be considered a two-tier application, albeit one with an object-oriented abstraction layer between the tiers.
In order to put the Mason components and Perl objects on separate computers, we somehow need the ability to call an object method across a network. That is, the following line of Perl would work, regardless of whether $object resides on the same computer as our Apache server or somewhere else on the Internet:
$object->method($arg1, $arg2);
Distributed-object technology and remote-procedure calls have existed for many years on a variety of platforms. In almost every case, this technology was restricted to a particular language or platform. DCOM (Distributed Component Object Model) allows objects of any language to communicate but only under Windows. Java's RMI (Remote Method Invocation) can only communicate with other Java objects. CORBA is an exception to this, allowing objects to communicate across platforms and languages, but CORBA is complex, has taken awhile to get off the ground and isn't yet a part of most programmers' knowledge base.
In response to these proprietary and complex protocols, a number of people in the Internet community have created SOAP, the Simple Object Access Protocol, that makes it extremely easy to create distributed applications. Two of the biggest proponents of SOAP have been Dave Winer (famous for his Scripting News “weblog”) and Microsoft, which is not usually associated with open standards and cross-platform protocols. Regardless of what we in the Linux community might think, Microsoft has publicly embraced SOAP, making it a cornerstone of its .NET effort.
SOAP depends on the idea that any two computers on the Internet can communicate using HTTP, the protocol that powers the Web. (Actually, SOAP can be transmitted over nearly any high-level protocol, including SMTP and POP3, but HTTP is by far the most common.) It then transmits information using XML, the markup language that allows us to create tags and document standards. The server turns the incoming XML into an object method call, and then turns the object's response into an XML document that is returned as the HTTP response. Since both HTTP and XML are open standards, published by the World Wide Web Consortium, they can be (and are) implemented on a variety of platforms and, thus, interact without any trouble.
The predecessor to SOAP, known simply as XML-RPC, provided a simple mechanism for remote procedure calls (RPC) using data formatted in XML and transmitted over HTTP. For a variety of reasons, including the fact that XML-RPC could not handle advanced data structures, the W3C adopted SOAP.
A number of languages and platforms continue to support XML-RPC, and it's possible that some situations might call for its use because it has a smaller overhead. Practically speaking, however, the fact that SOAP has gotten so much attention has led to the development, use and debugging of its libraries to a much greater extent than those for XML-RPC. As of this writing, however, there are more implementations of SOAP than XML-RPC, meaning that your choice of platform or language might force your hand toward one protocol or the other.
SOAP, as its name implies, expects to work with objects rather than simple procedure calls. Thus, SOAP client invokes a method on a particular object on the server. The method is specified in the body of the XML document itself, while the object with which it is associated is named in an HTTP “SOAPAction” header. Of course, we also need to specify a computer name and port to which the SOAP request can be directed.
The server itself, including its name and the port number on which the SOAP request is transmitted, is known as the SOAP proxy. This makes sense when you consider that the HTTP server is simply relaying an object method invocation and isn't doing any of this work by itself. Do not confuse the SOAP proxy with an HTTP proxy. An HTTP proxy relays requests from an HTTP client to an HTTP server and often performs security checks and caching. A SOAP proxy, by contrast, relays messages between a SOAP client and an object on the proxy's computer.
The object for which the SOAP server acts as a proxy is sometimes known as the endpoint and is specified in a “SOAPAction” HTTP header. The name of the endpoint can be virtually any text string, including hierarchy separators such as :: and /. In practice, the endpoint has a direct connection to the object hierarchy associated with the language in which the SOAP proxy is written. In Perl, the endpoint might be something like “Foo/Bar”, which refers to the Foo::Bar object located in the file Foo/Bar.pm.
Let's look at a simple SOAP conversation. Our examples will demonstrate SOAP on top of HTTP, which is the most common configuration. There may be slight differences when working with other protocols.
HTTP is stateless, meaning that every connection between two computers consists of one request (from the client to the server) and one response (from the server back to the client). The request and response are each divided into two parts, known as the headers and the body. Of course, the client and server can add any other headers they want, opening the door to all sorts of specialized communications protocols.
The body of a SOAP request or response will be in XML. (If you have never worked with XML before, don't worry; while it can be a deep and intriguing topic, you don't need to know much XML to work with SOAP.) Each SOAP message—a request or response—consists of an optional SOAP header and a mandatory SOAP body wrapped inside of a SOAP envelope. The envelope identifies the contents as belonging to SOAP and sets out the namespaces that will be used for the rest of the message. The headers describe the data in the body, and the body contains the method call or its results.
In order to invoke an object on a remote server via SOAP, we will have to open an HTTP connection to the appropriate URL, identifying the object via the SOAPAction header. We send an XML document containing a SOAP envelope, inside of which our SOAP headers and body identify the method to be invoked on this object, as well as any parameters that the method might require. The client must additionally be prepared to parse the response returned by the SOAP server, extracting data structures contained in that response and using them as necessary.
A SOAP server performs complementary actions, receiving the SOAP request, parsing its contents and invoking the appropriate method on the local computer with the passed parameters. The server also returns the SOAP response to the client, containing one or more values as necessary.
Now that you understand the terminology associated with SOAP, you can forget nearly all of it. SOAP implementations provide us with an abstraction layer that allows us to ignore the fact that it communicates via HTTP and that the request and response use XML. When your program invokes a remote object method using SOAP, it cares about receiving a response; the way in which the request and response are packaged is of little concern.
I'm going to write some demonstration programs in Perl using the excellent SOAP::Lite module written by Paul Kulchenko. This should give you some idea regarding how to write SOAP clients and servers, as well as how to integrate them into your web applications. Despite its name, SOAP::Lite offers a rich array of functionality and can be a great way to add SOAP functionality to your Perl programs. Similar SOAP libraries and objects are available for most major programming languages, so don't think that SOAP is available only for Perl.
Because SOAP acts as a proxy for an object, we first have to create an object whose methods will be available over the network. Listing 1 contains the simple “Text::Caps” Perl module, which handles two fairly useless methods:
Listing 1. Caps.pm, the Perl Module
capitalize, which takes a single string as a parameter and returns the capitalized version of that string
capitalize_array, which does the same thing as capitalize, but to each element in a list of strings rather than to a single string
Notice that while SOAP describes everything in terms of objects and methods, this sample module uses standard Perl modules and subroutines rather than object-oriented syntax. So, when I mention the capitalize method for the Text::Caps object, I really mean that we'll be invoking the Text::Caps::Capitalize subroutine.
SOAP is normally carried over HTTP, which sits on top of TCP/IP. This means that we can create a simple SOAP server by taking advantage of the TCP/IP socket code that comes with Perl. However, SOAP::Lite does much of the dirty work for us; we don't have to create the socket or wait on it. Rather, we create an object of type SOAP::Transport::HTTP::Dæmon, which knows how to act as the appropriate kind of SOAP server. You can see the source code for such a simple server in Listing 2.
Listing 2. A Simple Standalone SOAP Server
The code is relatively simple yet will look odd to even the most experienced Perl programmers. That's because objects associated with SOAP::Lite usually return themselves to indicate success. This allows us to invoke more than one method in a single call. In other words, we can say
$object->method1()->method2();
rather than the traditional
$object->method1(); $object->method2();You may choose to work with SOAP::Lite using either syntax, but the first version is common in documentation.
When we invoke the “new” constructor for SOAP::Transport::HTTP::Dæmon, we pass it two arguments: the name of the computer and the port on which the server should listen for connections.
Once we have created the server object, we must tell it where objects are located. This is a security feature, albeit one that can take some time to understand. Normally, Perl looks for modules in @INC, an array of directory names. When we import a module, Perl searches sequentially through each element of @INC until it finds our module. If it fails to find our module, Perl returns an error message.
However, since SOAP exposes our modules to the entire world, we must be careful before making them available. Perhaps some of our modules return confidential data or manipulate information in a relational database. In order to ensure that only those modules we wish to expose are actually available via SOAP, SOAP::Lite completely ignores @INC when handling incoming SOAP requests. Only those modules explicitly mentioned in a call to dispatch_to(), or in a directory named in dispatch_to(), will be available via SOAP.
In a sense, dispatch_to() effectively defines the equivalent of @INC for incoming SOAP requests. If a module resides in a directory not mentioned in dispatch_to(), it will be invisible to SOAP requests. That this is not the same as modifying @INC.
Note that while I use /tmp for the examples in this article, it is a poor idea to use /tmp in this way in a real development or production system. If you want to put SOAP-related Perl modules in a separate directory from /usr/lib/perl, I strongly suggest that you keep it on the main file system, such as in /usr/lib/soaplite.
Now that our standalone SOAP server is running, we should test it to see if it works. In order to do that, we must create a SOAP request, send it to the server and then parse through the XML-encoded SOAP response that it returns. Luckily, SOAP::Lite includes such a utility, SOAPsh.pl. This small program allows us to create and send SOAP requests interactively, displaying the results. SOAPsh.pl alone justifies the download of SOAP::Lite, even if you are planning to work with another SOAP library for Perl.
If we are running our standalone SOAP server on localhost (i.e., on the same computer as we run SOAPsh.pl), and if we are running it on port 8080, we can invoke it as follows:
perl SOAPsh.pl https://localhost:8080/ Text/Caps
Notice how the first argument to SOAPsh.pl is the URL of the SOAP server, and the second argument is the object that we want to invoke. You can avoid a lot of grief by remembering that the second argument must be passed using a URL-style object hierarchy divider, namely a slash (/). Typing “Text::Caps” rather than “Text/Caps” will confuse the SOAP server and result in hard-to-debug errors.
If your invocation of SOAPsh.pl succeeds, you will see the following prompt:
Usage: method[(parameters)] >
The “>” sign indicates that it's your turn to type and you can invoke any method for the object to which you've connected. You may now call any method that the object supports, including any parameters. So to capitalize a word, I simply type:
> capitalize('abc')Because my SOAP client and server are both on the same computer, the response is nearly instantaneous. SOAPsh.pl prints out:
--- SOAP RESULT --- $VAR1 = 'ABC';Hey, that's pretty great! I just invoke an object method across the network. That wasn't so hard, was it?
SOAP would be nice if we could send simple scalars back and forth. But we can send and receive a variety of data types. For example, we can invoke capitalize_array, sending a list of arguments:
> capitalize_array('abc', 'def', >'GHi')
The return value is an array reference:
--- SOAP RESULT --- $VAR1 = bless( [ 'ABC', 'DEF', 'GHI' ], 'Array' );The returned array reference looks a bit funny because it has been turned into a format that SOAP::Lite can send and retrieve. We will soon see how our programs can ignore this intermediate format, seamlessly exchanging complex data structures over the Internet.
As you can see, it's possible to work with SOAP without understanding the underlying XML-encoded data. However, debugging SOAP problems often requires that you look at the XML as well as the HTTP headers that are sent in the request and the response.
SOAP::Lite objects support the on_debug( ) method, which takes a subroutine reference as an argument. This subroutine is invoked for each SOAP transaction, meaning that we can log information to the disk or screen. The simplest use of on_debug( ) is as follows:
on_debug(sub{print STDERR @_})
In other words, we ask SOAP::Lite to send a copy of everything to STDERR. This provides us with a marvelous opportunity to see what happens behind the scenes. After we invoke this method, SOAPsh.pl reminds us that we invoked a local method rather than a SOAP method:
--- METHOD RESULT --- SOAP::Lite=HASH(0x82e1174)With debugging turned on, our invocation of capitalize(abc) from before gets translated into a SOAP request (see Listing 3)
As you can see, the request is divided into a header and a body, as with all HTTP requests. And as with a normal HTTP request, we indicate an action (“POST”) along with a URL, as a Content-Length (indicating the number of bytes in the request) and the Content-Type (which is always going to be “text/xml”).
Then the fun begins: the final header is SOAPAction, which names the object and method that are being invoked. The SOAPAction header is designed to allow corporate firewalls to filter out dangerous objects and methods from being invoked. Currently, however, it would seem that support for SOAPAction is relatively hard to find. Besides, information about both the object and its method are buried inside of the XML request and response themselves, making the header unnecessary for parsing purposes.
The XML itself begins with an XML declaration and then a SOAP envelope. Inside the envelope is an optional header (not shown in this particular invocation) and a mandatory body. The body names the object and method that we wish to invoke, as well as any arguments that we might have passed.
This XML is parsed into the native operating system and language format and is then passed along to the target object. The object returns a response value to the SOAP server which then creates a SOAP response in XML as seen in Listing 4.
Listing 4. SOAP Response in XML
The response, like the request, uses HTTP and HTTP headers to pass some metadata, including the server type, date, content length, type (“text/xml”) and even the type of SOAP server being run.
The envelope for this particular response, like the request, contains no header. However, it does contain a body, in which the return value (of type “xsd:string”) is returned. While the request uses a namespace of “namesp3:capitalize”, the response uses a namespace of “namesp1:capitalizeResponse”. This is standard in SOAP; XML namespaces are used to identify whether the message contains a request or a response and for which method the response is being sent.
Without any explanation, Listing 5 is the similar debugging output from a call to capitalize_array(reuven, shira, atara):
SOAPsh.pl can demonstrate interactive requests, but SOAP is much more useful when our programs can create and issue their own requests. Listing 6 demonstrates the code for a simple SOAP client that connects to our server on port 8080 of localhost. Notice how the URI is once again the name of the object, and the proxy is the name of the SOAP server.
What is particularly amazing about SOAP::Lite is how it allows us to invoke methods on our object that only exist across the network. That is, the “uri”, “proxy” and “result” methods obviously exist for the SOAP::Lite object. But the “capitalize” method only exists for our remote Text/Caps object. SOAP::Lite is normally smart enough to figure out the difference, passing along any method that cannot be locally resolved.
Our standalone server was meant to be simple, and it is. However, what happens when we begin to get millions of requests per day? Then our standalone server will no longer be able to keep up, and users' method calls will no longer be serviced.
A practical solution is to use a piece of software optimized for receiving many incoming HTTP transactions, namely Apache. Using Apache to handle our incoming SOAP transactions means that we can scale it as high or as low as we need. Our server program no longer needs to take this into consideration; it can focus on the mundane details of receiving SOAP packets and passing them along to Perl modules.
Creating a CGI-based SOAP proxy is not very different from creating a standalone program. Perhaps the most important thing to keep in mind is that you should not use CGI.pm or any other CGI-related module with which you might be familiar. Remember that the CGI program here is the SOAP proxy, and the CGI protocol is being used to transfer the XML-encoded request and response back and forth.
SOAP::Lite comes with so many goodies, it's hard to know when to stop describing them. For example, those of you who have been convinced to use mod_perl in place of CGI will be pleased to know that SOAP::Lite has native mod_perl support, along with CGI and standalone support.
Your own programs can take advantage of the “autodispatch” mechanism we saw above, in which any method name not recognized locally is transmitted to a remote object.
SOAP::Lite can handle the transfer of most data structures supported by SOAP, including objects. In other words, you can invoke new( ) on a remote object, and then invoke various methods on the object returned by new( ). This functionality has existed for years on a number of specific platforms, but the fact that SOAP makes it platform-independent is truly amazing.
Finally, SOAP has grown beyond its exclusive use of HTTP and now supports a variety of other protocols, including some that we might not expect, such as POP3 and SMTP. SOAP::Lite supports all of these protocols; by the time you read this, it will undoubtedly support many more.
Now that we have seen how SOAP can be used in simple circumstances, let's consider how it might be used in more complex situations. For example, let's assume that I have to create an extremely large web site that depends on a back-end relational database. In many cases, as regular readers of this column know, I will prefer to do the implementation in mod_perl and HTML::Mason.
But as the market for server-side Java grows, it's possible that some or all back-end functionality might be available using JavaBeans. Moreover, as Enterprise JavaBeans (EJB) becomes an increasingly pervasive and intriguing technology for distributed applications that require transactions, I might even prefer to do some of the implementation in Java.
With SOAP, I am now free to mix and match languages and platforms as I wish. If I create a SOAP server with access to the appropriate Java objects, there's no reason why my Mason components cannot communicate with a Java middleware layer. In some cases, this might even be preferable even if the system begins as a one-language affair. Given that Perl has no support for networked transactions, we might want to write an initial implementation in Perl and move toward EJB in the future. Using SOAP, this might be possible and even desirable.
Finally, web application servers are already beginning to work with SOAP. Not only does this allow objects on other computers and written in other languages to communicate with a given server, but it opens the door to Internet services that are not necessarily based on the Web. Perhaps web-based newspapers will begin to offer SOAP-based headline systems, taking the same content that is available on their web site but packaging it in such a way that someone can download a customized set of headlines with a single SOAP call. With such a service in place, users could install a desktop (non-web) application that would update itself to display the latest headlines every few minutes.
SOAP heralds the beginning of a new type of distributed Internet application, namely one that can perform remote procedure calls across operating systems and programming languages. No longer does RPC have to be a proprietary, difficult to-understand or difficult-to-invoke process; in the course of an afternoon, you can create a simple distributed application. Just what this means for the future of the Web and the Internet is a good question, but already some are claiming that desktop applications will increasingly be GUI shells that send SOAP requests to centralized servers. Regardless of what the future may bring, the fact that Perl and other free languages can use SOAP means that we will soon be able to communicate more easily than ever. And hey, isn't that the whole point of the Internet?
Reuven M. Lerner owns and runs a small consulting firm specializing in Web and Internet technologies. He and his wife Shira recently celebrated the birth of their daughter, Atara Margalit. You can reach him at reuven@lerner.co.il or on the ATF home page, https://www.lerner.co.il/atf/.