For one of our clients, a key online goal is to maximise views of their documents, most of which are in PDF format. Like many businesses, they use Google Tag Manager to load tracking scripts such as Google Analytics, which they rely on to track page views and measure conversions.
The complication was that most of their campaigns relied on links, in printed publications and in emails to their mailing lists, that pointed directly at the PDF files. Because these links went straight to the files, no page on the website was loaded, so its tracking scripts never ran and these downloads went unrecorded. That left an incomplete picture of how many people were actually accessing the documents. There are many simple solutions that track clicks on links to PDF files, but on their own they tell us nothing about visits that arrive directly at a PDF.
We explored several solutions to this problem, and this blog post covers each of them along with its merits and limitations. The solution that best fits your requirements may not be the one that best fitted ours, and it may be something else entirely.
Embedding the PDF inside an HTML page satisfied all the tracking requirements, as the tracking code snippets were loaded on that page just as they are on every other page, and most users would not notice any difference when reading documents. However, it caused a problem when users tried to save or print using the browser menu. The browser prints an HTML page with a PDF embedded inside it, which behaves very differently from printing a PDF directly. Saving has the same problem: the browser saves the HTML page, not the PDF. For security reasons, developers cannot override the browser's own menu commands when users use these features.
Ultimately, that meant this solution did not meet our needs.
Serving an interstitial page, which loads the tracking scripts and then sends the user on to the PDF, presented an interesting problem. If the interstitial page and the final page the user lands on (the PDF) have different URLs, then anyone who shares that final link, or visits it again, goes straight to the PDF without passing through the interstitial page, and is never tracked.
So we must make the interstitial page impossible to bypass. To do this, we serve it at the PDF's own URL and simply redirect from the interstitial page to… itself!
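The client side of this trick can be sketched as a Python function that renders the interstitial page. It is a minimal illustration, not our production page: the `interstitial_page` name, the `pdf_view` event, the `pdfTitle` variable and the timings are all hypothetical. It pushes an event to the GTM `dataLayer` with `eventCallback` and `eventTimeout`, so the page moves on once the tags have fired, with a plain timer as a fallback in case GTM never loads. It navigates with `location.replace()` rather than `location.reload()`, on the assumption that a reload may re-send the original request's Referer instead of the interstitial page's URL; that is worth verifying in the browsers you support.

```python
import html
import json
from string import Template

# Hypothetical page template; the GTM container snippet is assumed to be added
# to <head> exactly as it is on every other page of the site.
PAGE = Template("""<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>$title_html</title>
  <!-- Google Tag Manager container snippet goes here -->
</head>
<body>
  <p>Loading $title_html&hellip; <a href="">Open the document</a></p>
  <script>
    window.dataLayer = window.dataLayer || [];
    var opened = false;
    function openDocument() {
      if (opened) return;  // the GTM callback and the fallback timer may both fire
      opened = true;
      // Navigate to this same URL so the next request carries it as the Referer.
      window.location.replace(window.location.href);
    }
    window.dataLayer.push({
      event: "pdf_view",            // hypothetical custom event name
      pdfTitle: $title_js,
      eventCallback: openDocument,  // run by GTM once the tags have fired
      eventTimeout: 2000            // GTM runs the callback anyway after 2 s
    });
    // If GTM never loads (e.g. it is blocked), nothing runs eventCallback.
    setTimeout(openDocument, 3000);
  </script>
</body>
</html>
""")


def interstitial_page(pdf_title):
    """Render the interstitial page that is served at the PDF's own URL."""
    # Escape for HTML text, and build a JS string literal that cannot close <script>.
    title_js = json.dumps(pdf_title).replace("<", "\\u003c")
    return PAGE.substitute(title_html=html.escape(pdf_title), title_js=title_js)
```

The empty `href` on the fallback link resolves to the current URL, so a user who clicks it also arrives with the interstitial page as the Referer.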
On the server, we can check the HTTP Referer header, which tells us whether this is a fresh request that we need to track or a request that came from the interstitial page itself.
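A minimal sketch of that check in Python, assuming the interstitial page is served at exactly the same URL as the PDF. The hypothetical `came_from_interstitial` function treats a request as already tracked only when its Referer has the same host and path as the URL being requested; the host and path in the usage lines are placeholders.

```python
from urllib.parse import urlsplit


def came_from_interstitial(referer, host, path):
    """Return True if the Referer header points at the URL now being requested.

    The interstitial page is served at the PDF's own URL, so a Referer with the
    same host and path can only mean the interstitial page has reloaded itself.
    `path` is compared as-is, so pass it percent-encoded, as browsers send it.
    """
    if not referer:
        # No Referer: email link, printed URL, bookmark, etc., so track this visit.
        return False
    parts = urlsplit(referer)
    # Compare host (and any port) case-insensitively; ignore scheme and query string.
    return parts.netloc.lower() == host.lower() and parts.path == path


pdf = "/docs/report.pdf"
print(came_from_interstitial(None, "example.com", pdf))  # False
print(came_from_interstitial("https://example.com/docs/report.pdf", "example.com", pdf))  # True
print(came_from_interstitial("https://example.com/news", "example.com", pdf))  # False
```

Note that a visitor who clicks through from another page on the site still gets the interstitial page, which is what we want, since that visit has not been tracked yet.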
An example request would work like this:
1. A user makes a request for the PDF
2. The server checks the Referer; if it does not indicate that the request came from the interstitial page, the server returns the interstitial page instead of the PDF
3. The interstitial page loads, tracks all that it needs, then reloads itself
4. The server checks the Referer again and sees that the request came from the interstitial page (so the visit has already been tracked), meaning it can return the PDF document
5. The user receives the document they requested, and the document is handled natively by the browser
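The five steps can be sketched as a small WSGI application using only the Python standard library. This is an illustration under stated assumptions rather than how our server is actually built: `DOCUMENTS` is a hypothetical mapping from URL paths to files, `INTERSTITIAL` is a stripped-down placeholder for the real tracking page, and the response headers are our own choices. Because the same URL returns either HTML or a PDF depending on the Referer, both responses are marked `Cache-Control: no-store` so that no cache hands out one in place of the other.

```python
from urllib.parse import unquote, urlsplit

# Hypothetical mapping of URL paths to PDF files on disk.
DOCUMENTS = {"/docs/report.pdf": "report.pdf"}

# Placeholder for the real interstitial page, which would include the GTM
# snippet and only navigate to itself once the tracking has fired.
INTERSTITIAL = (b"<!DOCTYPE html><html><head><title>Loading</title></head><body>"
                b"<script>window.location.replace(window.location.href);</script>"
                b"</body></html>")


def came_from_interstitial(environ):
    """True if the Referer is this same URL, i.e. the interstitial page reloading."""
    referer = environ.get("HTTP_REFERER")
    if not referer:
        return False
    parts = urlsplit(referer)
    # WSGI servers typically percent-decode PATH_INFO as latin-1; decode the Referer path to match.
    requested = environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")
    return (parts.netloc.lower() == environ.get("HTTP_HOST", "").lower()
            and unquote(parts.path, encoding="latin-1") == requested)


def app(environ, start_response):
    path = environ.get("PATH_INFO", "")
    if path not in DOCUMENTS:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found"]
    if not came_from_interstitial(environ):
        # Steps 1-3: a fresh request, so serve the tracking page at this same URL.
        start_response("200 OK", [
            ("Content-Type", "text/html; charset=utf-8"),
            # Send the full URL as Referer on same-origin requests, even if the
            # site otherwise uses a stricter policy.
            ("Referrer-Policy", "same-origin"),
            # The same URL serves HTML or a PDF depending on Referer, so don't cache.
            ("Cache-Control", "no-store"),
        ])
        return [INTERSTITIAL]
    # Steps 4-5: already tracked, so return the PDF for the browser to handle natively.
    with open(DOCUMENTS[path], "rb") as f:
        body = f.read()
    start_response("200 OK", [
        ("Content-Type", "application/pdf"),
        ("Content-Disposition", "inline"),
        ("Cache-Control", "no-store"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

For local testing, the app can be served with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`. One caveat worth keeping in mind: if a browser extension or proxy strips the Referer, every request looks fresh and the interstitial page would keep reloading, so some form of loop guard may be worth adding.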
A custom header could also be used instead of the Referer to achieve the same result.
It has been a long journey working out the best way to handle this. If you have the same problem we did, you will need to choose whichever solution suits you best; each comes with its own pros and cons! I cannot say whether there is a ‘perfect’ method (maybe there is), and I’d love to hear any advice on improving this further!