Print Data Part 1 – Collect, Report, and Analyse


To make good decisions about your IT environment, you need timely and accurate information. In this series of blog posts, I will discuss the collection, reporting, and analysis of print data from multiple sources. First, let’s consider how to collect print data and how to avoid black holes in your printing metrics.

Given the title of this article, the following question may sound a little odd, but it is worth asking yourself: “Why do I want to collect print data in the first place?”

Depending on your role, there are many ways to answer this question. Here are a few examples:

As the owner of a centrally managed print service, I want to charge back printing costs to individual departments to allocate or recover the cost of printing resources. I also want to understand which printers and multifunction devices (MFDs) are over-, under-, or never utilised.
As the person responsible for providing environmental impact information, I want to understand how much we are printing and what effect this has on the environment.
As a service improvement professional, I want to understand why people are printing certain types of documents, so I can improve our service to end users and customers whilst improving efficiency and reducing costs.
As a security expert, I want to know who printed what, where, and when.

These questions tell us which data points we need in order to analyse and present the information effectively. This is an important principle of reporting: first understand what information is required, then collect that specific data. The questions above require information about the physical print job: number of pages, colour vs. black & white, duplex or simplex, date/time, etc. To get the full picture, this job data must be augmented with contextual information such as the user, department, location, and origin of each job.
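To make this concrete, the record below is a minimal Python sketch of a single print job that combines the physical attributes and the contextual fields listed above. The class name `PrintJob`, its field names, and its helper methods are hypothetical illustrations, not a schema from any particular print accounting product. The sheet calculation assumes one logical page per side and ignores n-up printing.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class PrintJob:
    """One print job record. Hypothetical field names, not a standard schema."""

    # Physical job attributes (typically captured by the printer, spooler, or client)
    job_id: str
    printer: str
    submitted_at: datetime
    pages: int              # logical pages in one copy of the document
    copies: int = 1
    colour: bool = False    # False means black & white
    duplex: bool = False    # False means simplex

    # Contextual attributes, often missing at the point of collection
    user: Optional[str] = None
    department: Optional[str] = None
    location: Optional[str] = None
    origin: Optional[str] = None  # e.g. "client", "spool server", "SAP"

    def sheets(self) -> int:
        """Physical sheets used, assuming duplex puts two pages on each sheet."""
        per_copy = (self.pages + 1) // 2 if self.duplex else self.pages
        return per_copy * self.copies

    def is_complete(self) -> bool:
        """True once the job carries the context needed for chargeback and audit."""
        return all([self.user, self.department, self.location, self.origin])
```

For example, a 5-page document printed duplex with 2 copies uses 6 sheets, and `is_complete()` stays `False` until enrichment fills in the user, department, location, and origin.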

Now that we know which data we want to collect, where can we get it from? Most large organisations have four potential sources of print data that we can use:

Printers
Print Spool Servers
Client devices
Centralised applications such as SAP, ERM, mainframe, etc.

“That’s easy!” I hear you say… just collect the data from the printers and run the reports. Many printers can indeed provide lots of useful information about printing activity through device-based accounting (DBA) or job-based accounting (JBA) data. This data is an ideal starting point for our print data collection quest. But, as anyone who has worked with printers knows, there is always a “but”: many printing devices either lack DBA functionality or do not record where the print job came from. For jobs sent by centralised applications, the printer may simply record the remote or system user. What’s more, label printers, smaller print devices, and nearly all older devices typically have no accounting capability at all.

Print Spool Servers sound like the next best collection point. Surely they must know what is being printed, right? A deeper look reveals that traditional print spool servers are concerned with spooling and routing print traffic rather than recording what is actually being printed. With the move to cloud and centralised infrastructure, enterprises are also typically trying to avoid routing print jobs via traditional print servers, to minimise network traffic and server infrastructure.

Client devices can provide useful insight into what is being sent from the client, either to a print server or directly to a client printer. Unfortunately, they do not provide an easy mechanism for accessing or reporting the data.

Print Spool Servers, client devices, and printers may also fail to account for centralised application printing, which can represent as much as 50% of an organisation’s output. This is especially true when the central application communicates directly with the printer or uses Linux, UNIX, or z/OS spooling. Another issue is that many legacy and/or centralised applications print on the user’s behalf under a different username or a system account, so valuable metadata about who requested the print job is lost. Mainframe and SAP applications also typically contain application-specific metadata that is often lost during transmission to a server or printer.

So now we need to understand what data is available, where it is, and how to collect it.

Print accounting software solutions typically interact with only one or two of these potential sources of information, with varying levels of detail and accuracy. Effective collection of print data often requires the ability not only to collect information from all four of the above sources, but also to augment that data during the printing and/or collection process with information from other sources, such as Active Directory.
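The sketch below illustrates the idea in Python under simplifying assumptions. Each source is assumed to already yield job records as dictionaries, and a plain dictionary called `DIRECTORY` stands in for an Active Directory lookup. The names `collect`, `enrich`, `DIRECTORY`, and `SYSTEM_ACCOUNTS`, and all sample records, are hypothetical. A real implementation would query the directory (for example over LDAP) and would also need to reconcile the same job seen at more than one collection point, which this sketch does not attempt.

```python
from typing import Dict, Iterable, List

# Hypothetical stand-in for an Active Directory lookup:
# maps a lower-case username to department and location attributes.
DIRECTORY: Dict[str, Dict[str, str]] = {
    "jsmith": {"department": "Finance", "location": "London"},
    "akhan": {"department": "Logistics", "location": "Leeds"},
}

# Hypothetical service accounts that print on behalf of real users.
SYSTEM_ACCOUNTS = {"sapsys", "batch01"}


def collect(sources: Dict[str, Iterable[dict]]) -> List[dict]:
    """Merge job records from several collection points, tagging each with its origin."""
    merged = []
    for origin, records in sources.items():
        for record in records:
            merged.append({**record, "origin": origin})
    return merged


def enrich(job: dict, directory: Dict[str, Dict[str, str]]) -> dict:
    """Add directory attributes to a job; flag jobs whose user cannot be attributed."""
    enriched = dict(job)
    user = (job.get("user") or "").lower()
    if user in SYSTEM_ACCOUNTS:
        # The real requester was lost upstream; record that rather than misattribute cost.
        enriched["attribution"] = "system account"
        return enriched
    attrs = directory.get(user)
    if attrs is None:
        enriched["attribution"] = "unknown user"
    else:
        enriched.update(attrs)
        enriched["attribution"] = "directory"
    return enriched


# Example with hypothetical records from the four kinds of source.
sources = {
    "printer": [{"job_id": "p1", "user": "jsmith", "pages": 4}],
    "spool server": [{"job_id": "s1", "user": "akhan", "pages": 2}],
    "client": [],
    "SAP": [{"job_id": "a1", "user": "SAPSYS", "pages": 120}],
}
jobs = [enrich(job, DIRECTORY) for job in collect(sources)]
```

Flagging system-account jobs, rather than charging them to the account itself, keeps the lost-attribution problem visible in reports instead of silently assigning a large share of output to a service user.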

When assessing software for print accounting in your enterprise, make sure you look beyond the obvious sources of print data to avoid falling into the print data black hole.

In Part Two of this series, we will look at using the collected data for routine reporting.