Why is it important to track PDFs?
For one of our clients, a key online goal is to maximise views of their documents, most of which are in PDF format. Like many businesses, they use Google Tag Manager to load tracking scripts such as Google Analytics, which track page views and measure conversions.
The complication was that most of their campaigns put links in printed publications, and in emails to users on their mailing lists, that pointed directly at PDF files to be downloaded. Because these were direct links to files, the tracking scripts on the website never ran for these downloads, leaving an incomplete picture of how many people were actually accessing the documents. There are many simple solutions that track clicks on links to PDF files, but those methods alone tell us nothing about direct pageviews.
We explored several solutions to this problem, and this blog post covers each of them along with its merits and limitations. What worked best for us may not be what best fits your requirements, and the right answer for you may be something else entirely.
Solution 1: embedding in an HTML page
Embedding the PDF in an HTML page satisfied all the tracking requirements, as the tracking code snippets are loaded on this page just as they are on every other page, and most users would not notice any difference when reading documents. However, it causes a problem when users try to save or print using the browser menu. The browser attempts to print an HTML page with a PDF embedded inside it, which behaves very differently from printing a PDF directly. The same happens when saving: the browser saves the HTML page, not the PDF. For security reasons, developers cannot override the browser's behaviour when users use these features.
Ultimately that meant this solution was not appropriate for our needs.
Pros:
- Loads almost as fast as a regular PDF request
- Look and feel of the page seems consistent in most browsers
- Older browsers can display a direct link to the document as a fallback if they do not support embedding PDFs
Cons:
- The browser's save and print functions don’t work correctly (NOTE: printing issues can be overcome with a well-written print stylesheet, but saving cannot be overridden)
- Rendering on mobile devices is unreliable
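As a rough illustration of the embedding approach, the sketch below generates a wrapper page in Python. The function name `render_embed_page` and its parameters are hypothetical, and `tracking_snippet` simply stands in for whatever tag-manager code the rest of the site already uses. The link inside the `<object>` element is the fallback that browsers show when they cannot embed PDFs.

```python
from html import escape


def render_embed_page(pdf_url: str, title: str, tracking_snippet: str = "") -> str:
    """Return an HTML wrapper page that embeds the PDF at pdf_url.

    tracking_snippet is inserted verbatim into <head>; it is a placeholder
    for the site's own analytics or tag-manager code (an assumption here).
    """
    safe_url = escape(pdf_url, quote=True)
    safe_title = escape(title)
    return f"""<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>{safe_title}</title>
{tracking_snippet}
<style>html, body, object {{ margin: 0; width: 100%; height: 100%; }}</style>
</head>
<body>
<object data="{safe_url}" type="application/pdf">
  <!-- Fallback content: rendered only when the browser cannot embed PDFs -->
  <p><a href="{safe_url}">Download {safe_title}</a></p>
</object>
</body>
</html>"""


if __name__ == "__main__":
    print(render_embed_page("/docs/report.pdf", "Annual report"))
```

Because the browser is still showing an HTML page, this sketch inherits the save and print limitations described above.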
Solution 2: PDF.js
Solution 3: Analytics Measurement Protocol
Pros:
- No ‘fake page’ HTML trickery to render the PDF. It can be viewed natively by any device.
- All Google Analytics data can be sent this way
Cons:
- Trouble syncing up new visitors who visit the document first then a page on the website second, as the tracking ID is not retained
- Due to byte serving, multiple requests can be made for a single document, resulting in tracking the same document several times. This would need to be turned off, or compensated for.
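To make the server-side idea concrete, here is a minimal Python sketch, assuming the Universal Analytics (v1) hit format of the Measurement Protocol; GA4 uses a different endpoint and payload. The helper names are hypothetical. `should_track` is one possible way to compensate for byte serving: it counts only requests with no `Range` header or a range starting at byte 0. This is a heuristic, not a guarantee against double counting. `client_id_from_cookie` assumes the usual `GA1.2.<random>.<timestamp>` layout of the `_ga` cookie.

```python
import re
import uuid
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Universal Analytics (v1) collection endpoint.
GA_ENDPOINT = "https://www.google-analytics.com/collect"


def client_id_from_cookie(ga_cookie: Optional[str]) -> str:
    """Reuse the client ID from a `_ga` cookie, else generate a random one."""
    if ga_cookie:
        parts = ga_cookie.split(".")
        if len(parts) >= 4:
            return ".".join(parts[-2:])
    return str(uuid.uuid4())


def should_track(range_header: Optional[str]) -> bool:
    """Track full requests and the first byte range only (heuristic)."""
    if not range_header:
        return True
    match = re.match(r"bytes=(\d*)-", range_header.strip())
    return bool(match) and match.group(1) == "0"


def build_pageview(tracking_id: str, client_id: str, host: str, path: str, title: str) -> str:
    """Return the URL-encoded body of a v1 pageview hit."""
    return urlencode({
        "v": "1",            # protocol version
        "tid": tracking_id,  # property ID, e.g. UA-XXXXX-Y
        "cid": client_id,    # anonymous client ID
        "t": "pageview",     # hit type
        "dh": host,          # document host name
        "dp": path,          # document path
        "dt": title,         # document title
    })


def send_hit(body: str) -> None:
    """POST the hit to the collection endpoint (passing data makes it a POST)."""
    urlopen(GA_ENDPOINT, data=body.encode("utf-8"), timeout=5).close()
```

A visitor who opens the document before ever visiting the website has no `_ga` cookie, so the random fallback ID will not match the one the site later assigns. That is the syncing problem listed above, and this sketch does not solve it.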
Solution 4: interstitial page
An interesting problem with this solution is that if the interstitial page and the final page you land on (the PDF) have different URLs, then a user who shares the final PDF link, or visits it again, bypasses the interstitial page and is not tracked.
So, we must make it impossible to bypass, and to do this we simply redirect from the interstitial page to...itself!
What we can do server-side is check the HTTP Referer header, which will tell us whether this is a fresh request which we need to track, or a request which came from the interstitial page itself.
An example request would work like this:
- A user makes a request for the PDF
- The server checks the Referer; if it does not show that the request came from the interstitial page, the server returns the interstitial page
- The interstitial page loads, tracks all that it needs, then reloads itself
- The server checks the Referer again, and sees that we have already come from the interstitial page (and therefore tracked), meaning we can return the PDF document
- The user receives the document they requested, and the document is handled natively by the browser
One could also use a custom header to achieve the same result as the above.
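The Referer check in the steps above could be sketched in Python as follows. The function names and the `INTERSTITIAL_HTML` template are hypothetical, and the comparison (same host and path as the request) is just one reasonable reading of "came from the interstitial page". The template uses `location.replace()` so that the follow-up request should carry the interstitial page as its Referer, subject to the site's referrer policy. A real page would wait for its tracking calls to finish before navigating.

```python
from typing import Optional
from urllib.parse import urlsplit


def came_from_interstitial(request_url: str, referer: Optional[str]) -> bool:
    """True when the Referer names the same host and path as the request."""
    if not referer:
        return False
    req, ref = urlsplit(request_url), urlsplit(referer)
    return (req.netloc.lower(), req.path) == (ref.netloc.lower(), ref.path)


def choose_response(request_url: str, referer: Optional[str]) -> str:
    """Return "pdf" for an already-tracked request, otherwise "interstitial"."""
    return "pdf" if came_from_interstitial(request_url, referer) else "interstitial"


# Hypothetical interstitial body. Tracking scripts go where the comment is;
# the inline script then requests the same URL again.
INTERSTITIAL_HTML = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Loading document</title>
<!-- tracking snippet goes here -->
</head>
<body>
<p>Your document is loading.</p>
<script>window.location.replace(window.location.href);</script>
</body>
</html>"""


if __name__ == "__main__":
    url = "https://example.com/docs/report.pdf"
    print(choose_response(url, None))  # fresh request
    print(choose_response(url, url))   # request made by the interstitial page
```

Since the same URL returns two different responses, the interstitial response would likely need to be marked as non-cacheable so that the second request actually reaches the server.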
Pros:
- PDFs are handled natively by whatever device requests them
- If a user shares the link when viewing the PDF, or visits it again directly, they are tracked for these actions
Cons:
- The user has to view the interstitial page for a fraction of a second while they are being tracked before getting the document they want; however, you can add whatever you like to this page to help improve the user experience
- Pages cannot be cached in a simple manner, as the same URL returns a different output depending on how it is requested
It has been a long journey working out the best way to handle this, and if you have the same problem as we did, you will need to choose whichever solution suits you best. Each comes with its pros and cons! I cannot say whether there is a ‘perfect’ method. Maybe there is, and I’d love to take any advice on improving this further!