Alive, Alert… and Highly Available
Wednesday, April 27, 2016
by Paul Conklin
It’s a beautiful summer day, and you are driving down the road in a convertible with the wind in your hair. Life is good. While rounding a curve, you notice your check engine light has come on. The car is acting fine, but being a conscientious driver, you take it to the nearest shop and have it checked. It turns out to be a relatively inexpensive repair, but the mechanic tells you it could have been far more costly if you had waited much longer.
Wouldn’t it be nice if your mission-critical applications could do the same?
Many highly available applications make use of a “heartbeat” function to see if an application is “alive” and whether users should be redirected to an alternate resource. But “alive” is a fairly low standard and by no means the same as “healthy.” It’s like a hospital patient in a coma. No one would claim such an individual to be in perfect health, but their heart is still beating. If that’s the only criterion you’re tracking, then the results are the same for that patient and a marathon runner.
The same applies in the IT world, and it’s why we developed the LRS/AlertX utility to ensure a stable printing environment. LRS/AlertX acts like a doctor who goes beyond a stethoscope check to ask standard diagnostic questions like “Do you know where you are? Do you know who you are? Do you know what the date is?” In other words, the goal is to assess whether the application is not only “alive,” but is actually “sane.” It’s important to know that the application can interact with its environment in a meaningful way and with acceptable performance. If it can’t, then LRS/AlertX can take action and notify IT staff before minor infrastructure problems jeopardize the delivery of business-critical documents. Specifically, LRS/AlertX can handle:
- Notification: reporting how the environment is running and what actions LRS/AlertX is taking to maintain stability
- Mitigation: restarting any output-related processes that may have died or become unstable
- Validation: confirming that all critical communication channels are working properly
- Communication: communicating with third-party cluster management solutions, network load balancers, etc.
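To make the difference between “alive” and “sane” concrete, here is a minimal Python sketch. It is not LRS/AlertX code; the names `CheckResult`, `check_service`, `probe`, and `max_seconds` are hypothetical and exist only for illustration. The assumption is that a probe performs some meaningful round trip against the application and that a time budget defines “acceptable performance.”

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    alive: bool     # did the service respond at all?
    sane: bool      # did it respond correctly and within the time budget?
    elapsed: float  # seconds the probe took


def check_service(probe: Callable[[], bool], max_seconds: float) -> CheckResult:
    """Classify a service using a hypothetical probe.

    `probe` stands in for any meaningful round trip (for example,
    submitting a tiny test job). It returns True when the answer is
    correct and raises an exception when the service is unreachable.
    """
    start = time.monotonic()
    try:
        correct = probe()
    except Exception:
        # No response at all: the "heartbeat" itself has failed.
        return CheckResult(alive=False, sane=False, elapsed=time.monotonic() - start)
    elapsed = time.monotonic() - start
    # A response arrived, so the service is alive; it is only "sane"
    # if the answer was correct and arrived within the budget.
    sane = correct and elapsed <= max_seconds
    return CheckResult(alive=True, sane=sane, elapsed=elapsed)
```

A plain heartbeat would report only the `alive` field. The `sane` field is what separates the coma patient from the marathon runner: a service that answers slowly or incorrectly is still alive, but it is flagged before users notice.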
In other words, LRS/AlertX actively looks for problems before they occur, mitigates those that do occur, and documents both so administrators can prevent them from occurring in the future. It can also “tune” itself to your particular environment, learning what is normal in order to set a proper baseline (for example, the “normal” amount of time it takes to complete a particular task) and making recommendations about the best configuration.
For example, if it normally takes 2 seconds to complete an audit task, and you have it set to audit every 1 second, the system will automatically adjust the interval to 2 seconds and notify you of the change. Every 24 hours, it sets all of the values back to their original settings. This allows the process to not only be as efficient as possible, but also provide insight as to what is normal for your environment.
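The self-tuning behavior in this example can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual LRS/AlertX logic: the class `AuditSchedule`, its fields, and the rule “raise the interval to the observed duration” are hypothetical, chosen only to mirror the 1-second and 2-second figures above.

```python
from dataclasses import dataclass, field

RESET_AFTER_SECONDS = 24 * 60 * 60  # restore original settings every 24 hours


@dataclass
class AuditSchedule:
    configured_interval: float           # what the administrator set
    interval: float = field(init=False)  # what is currently in effect
    last_reset: float = 0.0              # timestamp of the last reset
    notices: list = field(default_factory=list)

    def __post_init__(self):
        self.interval = self.configured_interval

    def record_run(self, duration: float, now: float) -> None:
        """Update the schedule after an audit that took `duration` seconds."""
        # Assumption: once 24 hours have passed, go back to the configured value.
        if now - self.last_reset >= RESET_AFTER_SECONDS:
            self.interval = self.configured_interval
            self.last_reset = now
            self.notices.append("interval reset to configured value")
        # Assumption: an audit that takes longer than the interval
        # stretches the interval to match, and the change is reported.
        if duration > self.interval:
            self.notices.append(
                f"interval raised from {self.interval}s to {duration}s"
            )
            self.interval = duration


schedule = AuditSchedule(configured_interval=1.0)
schedule.record_run(duration=2.0, now=10.0)
print(schedule.interval, schedule.notices)
```

Resetting periodically means a temporary slowdown does not permanently relax the schedule, while the accumulated notices record what “normal” looked like over the day.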
In the case of serious trouble in your environment (network problems, server failures, etc.), LRS/AlertX is designed to attempt to keep the print infrastructure alive at all costs… all the while communicating with load balancers and other components of your IT environment. This “Last Man Standing” mode kicks in when LRS/AlertX senses there is only one LRS server instance still operational.
It’s a “Check Engine” light that does more than just indicate an impending problem. It’s a canary in a coal mine… one that not only signals danger, but knows how to use a gas mask and help others get to safety. Whatever metaphor you want to apply, LRS/AlertX can add a critical layer of protection to your High Availability print environment so you can spend more time cruising down the road and less time in the mechanic’s shop. Contact LRS to learn more about the LRS/AlertX solution.