High availability & disaster recovery for print management


A common question that I hear from the field is “How do you handle HA and DR?” High availability (HA) and its oft-forgotten cousin disaster recovery (DR) are terms that no business should ignore. What high availability means to you, however, likely depends on your business application, IT infrastructure, and the end user’s perspective. So the real question for us is: “What does a high availability configuration involve in the context of an enterprise output environment?” Are specific solutions required, or can you use the ones you already have? Are there specific technical challenges associated with output management that you need to be aware of?

First things first: what constitutes “available” for an enterprise output server, and how do you ensure availability? Your first step should be selecting hardware and software that are intrinsically stable (this may sound obvious, but it is still worth stating). When it comes to hardware, the server is not the only consideration. For instance, a server may be up and functioning perfectly according to all platform and application metrics, but if users can’t print, then the server is “down” as far as they are concerned. LRS output management software is built to scale both vertically and horizontally; however, it is still imprudent to rely on one server for anything. The pages of history are full of “unsinkable” ocean liners and datacenters that could “never” go down. Your HA environment will likely involve multiple servers.

Business application and financial considerations will often dictate the type of HA configuration that is used. One of the first questions an LRS engineer may ask when working with a client on high availability is “what are you using now?” This is not to avoid doing the work. VPSX software and its components are designed to work with any number of existing high availability systems. Since the existing solution can probably continue being used, it is a good idea for us to determine what the customer already owns and is comfortable managing on a day-to-day basis.

Another factor is how quickly print output functionality must be restored during an outage. In other words, is downtime measured in terms of convenience, customer experience, monetary loss, or lives? During a system failure, should the system be instantaneously and transparently recovered, or is 5 to 15 minutes an acceptable recovery time? It is easy to say “instantaneous” and that is of course possible, but that level of availability may require more complexity and more infrastructure.

Before designing a high availability solution, there are some key facts to consider. First, the LRS software is already designed to support an installation where the executable components are separated from the application configuration and data. This design can be leveraged to use highly available storage systems like a SAN or NAS. It has also worked well with a variety of hardware clustering systems running on both Windows and UNIX that require a shared data store. Lately, Application Delivery Controllers (commonly referred to as load balancers) have been used. All of these types of HA solutions are currently in use and working well in real customer environments. But there is more to the story.

You need to consider the nature of output. For example, an application may generate a business “document” that is actually a collection of documents that need to be printed in a specific order. If an HA system were configured to simply “deal the cards” to multiple output servers, then those documents could be printed out of order, or an unrelated document may be inadvertently inserted into the bundle. Luckily the LRS software supports a number of approaches to ensure that the systems are highly available, and that output integrity is maintained.
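To make the “deal the cards” problem concrete, here is a minimal Python sketch of one generic way to keep a bundle together: route by a bundle identifier instead of round-robin, so every document in the bundle lands on the same output server in its original sequence. This is an illustration only, not how the LRS software does it; the names `pick_server`, `route_bundle`, and the server labels are hypothetical.

```python
import hashlib
from typing import List, Sequence, Tuple


def pick_server(bundle_id: str, servers: Sequence[str]) -> str:
    """Map a bundle ID to one server, the same one every time."""
    digest = hashlib.sha256(bundle_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


def route_bundle(
    bundle_id: str, documents: Sequence[str], servers: Sequence[str]
) -> List[Tuple[str, int, str]]:
    """Return (server, sequence number, document) for each document.

    All documents share one target server, and the sequence number
    records the order in which they must be printed.
    """
    target = pick_server(bundle_id, servers)
    return [(target, seq, doc) for seq, doc in enumerate(documents, start=1)]
```

For example, `route_bundle("invoice-run-1", ["cover", "invoice", "terms"], ["out-a", "out-b"])` sends all three documents to a single server, numbered 1 through 3. The trade-off is that a simple modulo hash reshuffles bundles when the server list changes, so a real deployment would need something more careful.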

Another common question is: “If I am running more than one Output Server in an active/passive configuration, how do I tell an Application Delivery Controller which server is the active server?” Traditionally, this was done by having the HA appliance check to see if the application is listening on certain TCP ports, specifically IPP, LPR and/or application interface ports. Sometimes, we can even make a call to the application web service. But recently, LRS has developed a utility that can interrogate the LRS solution using a complex set of heuristic rules to determine overall output environment “health.” This same utility can manage more than two VPSX instances and can even assist with an automated failover scenario.
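The traditional port-based check is simple enough to sketch in a few lines of Python. The example below assumes the standard port numbers for IPP (631) and LPR (515); the host names and the function names `port_is_listening` and `first_active` are hypothetical. It shows only the basic “is something listening?” probe, not the LRS health utility, whose rules are not described here.

```python
import socket
from typing import Callable, Optional, Sequence

IPP_PORT = 631  # standard IPP port
LPR_PORT = 515  # standard LPR/LPD port


def port_is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_active(
    servers: Sequence[str],
    ports: Sequence[int] = (IPP_PORT, LPR_PORT),
    probe: Callable[[str, int], bool] = port_is_listening,
) -> Optional[str]:
    """Return the first server answering on every port, or None."""
    for host in servers:
        if all(probe(host, port) for port in ports):
            return host
    return None
```

Calling `first_active(["vpsx-a.example.com", "vpsx-b.example.com"])` would return whichever placeholder host answers on both ports first. A listening port only proves the process is up, which is why a deeper health check is worth having.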

There are also a number of mechanisms and features built into the LRS solution and its supporting components. For instance, when printing from a server application like SAP, Epic, etc., the LRSQ protocol can be used to provide more reliable document delivery as well as specify an alternate output server destination. In the event that the primary LRS server is not available, the print stream is automatically routed to a secondary destination. Though this is a simple feature, I have seen it work spectacularly well in real-life situations when all other high availability provisions have failed (usually due to human error).
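The idea of an alternate destination can be sketched generically. The Python below is not the LRSQ protocol, whose details are not covered here; it only illustrates the pattern of trying a primary destination and falling back to a secondary one when the primary cannot be reached. The `send` callable, `DeliveryError`, and the destination names are hypothetical.

```python
from typing import Callable, List, Sequence


class DeliveryError(Exception):
    """Raised when no destination accepted the print job."""


def deliver_with_fallback(
    job: bytes,
    destinations: Sequence[str],
    send: Callable[[str, bytes], None],
) -> str:
    """Try each destination in order; return the one that accepted the job.

    `send` is expected to raise OSError (for example ConnectionRefusedError)
    when a destination is unreachable.
    """
    errors: List[str] = []
    for destination in destinations:
        try:
            send(destination, job)
            return destination
        except OSError as exc:
            errors.append(f"{destination}: {exc}")
    raise DeliveryError("; ".join(errors))
```

With `destinations=["primary", "secondary"]`, the job goes to `"secondary"` only when sending to `"primary"` raises an error, and the caller learns which server actually took it.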

Also built into the LRS solution is the basic functionality of a queue. What this means is that when a print job is sent to the VPSX server and for some reason the endpoint destination is unreachable due to a problem like a network “hiccup” or printer jam, the job is not lost. We can wait for a predetermined length of time for the device to become available and then send the print job. Similarly, selecting the correct communication protocol for the printer can also add to availability. Most commercial-grade printers support Printer Job Language (PJL). Using PJL communications to the printer allows a job that was interrupted to restart at the page where it left off.
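The queue behavior described above can be illustrated with a small, generic Python sketch: jobs are delivered in order, and a job that cannot be delivered stays at the head of the queue while the sender retries for a predetermined time. This is an assumption-laden illustration, not VPSX code; `drain_queue`, the `send` callable, and the default intervals are hypothetical.

```python
import time
from collections import deque
from typing import Callable, Deque, List


def drain_queue(
    queue: Deque[str],
    send: Callable[[str], None],
    retry_interval: float = 5.0,
    max_wait: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Deliver queued jobs in FIFO order and return those delivered.

    If `send` raises OSError, the job is retried until `max_wait` seconds
    have passed. A job that still cannot be delivered is left in the queue
    (never dropped) and processing stops so that order is preserved.
    """
    delivered: List[str] = []
    while queue:
        job = queue[0]  # peek; remove only after a successful send
        deadline = clock() + max_wait
        while True:
            try:
                send(job)
                break
            except OSError:
                if clock() >= deadline:
                    return delivered
                sleep(retry_interval)
        queue.popleft()
        delivered.append(job)
    return delivered
```

The key design choice is to remove a job only after a successful send, so a network hiccup or printer jam delays the job instead of losing it.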

I could go on and on; there are many ways to assure the availability and recoverability of your enterprise output environment. The best advice I can give is this: establish a consultative relationship with your LRS support team. They have the real-world experience and an extensive network of colleagues (including me) that can help you design a resilient output landscape, one that will withstand any challenge, often using existing infrastructure that your IT department already owns, understands, and trusts.

