For one of our clients, a key online goal is to maximise views of their documents, most of which are in PDF format. To track page views and measure conversions they use Google Tag Manager, as many businesses do, to bring in tracking scripts such as Google Analytics.
The complication was that most of their campaigns put links in printed publications, and in emails sent to their mailing lists, that pointed directly at PDF files. Because these were direct links to files, the tracking scripts on the website could not record the downloads, leaving an incomplete picture of how many people were actually accessing the documents. There are many simple solutions that track clicks on links to PDF files, but on their own these tell us nothing about direct views, where no link on the site is ever clicked.
We explored several solutions to this problem, which are covered in this blog post along with their individual merits and limitations. What was best for us might not be best for your requirements, and the right answer for you might be something else entirely.
The main problem with tracking PDFs is that the client's tracking codes rely on JavaScript, and a request for a PDF file returns no JavaScript, just the file to be downloaded. One solution is to create an HTML page which contains the JavaScript along with an embedded PDF that takes up 100% of the page height and width. The browser window then appears to display a PDF just as it would when you open one directly, except that what has loaded is an HTML page, not a PDF file.
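A minimal sketch of such a wrapper page, generated server-side in Python. The function name `render_pdf_wrapper`, the `tracking_snippet` parameter and the example path are illustrative assumptions rather than the code we used; the snippet is inserted verbatim, so it is assumed to be trusted markup such as your tag manager's container snippet.

```python
from html import escape


def render_pdf_wrapper(pdf_url: str, title: str, tracking_snippet: str = "") -> str:
    """Return an HTML page that embeds a PDF at full window size.

    pdf_url must point at the raw file, not at this wrapper page.
    tracking_snippet is inserted as-is and must be trusted markup.
    """
    safe_url = escape(pdf_url, quote=True)
    safe_title = escape(title)
    return f"""<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>{safe_title}</title>
  <style>
    html, body {{ height: 100%; margin: 0; }}
    object {{ display: block; width: 100%; height: 100%; border: 0; }}
  </style>
  {tracking_snippet}
</head>
<body>
  <object data="{safe_url}" type="application/pdf">
    <a href="{safe_url}">Download the PDF</a>
  </object>
</body>
</html>"""
```

Calling `render_pdf_wrapper("/files/report.pdf", "Annual Report")` returns the markup as a string; the link inside `<object>` is only shown by browsers that cannot display the PDF inline.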
This satisfied all the tracking requirements, as the tracking code snippets were brought in on this page as they are on every other page, and most users would not notice any difference when reading documents. However, it presents a problem if users try to save or print using the browser menu. The browser attempts to print an HTML page with a PDF embedded inside it, which behaves very differently from printing a PDF directly. The same happens when saving: the browser saves the HTML page, not the PDF. For security reasons, developers cannot override the browser’s own behaviour for these features.
Ultimately that meant this solution was not appropriate for our needs.
Pros
Cons
Another route we explored was pdf.js, made by Mozilla and individual contributors, which would give our PDFs a consistent look and feel across browsers, allow us to embed JavaScript tracking, and provide improved print functionality. However, its browser compatibility was not good enough for our needs, and pdf.js still cannot overcome the issue with the browser’s File->Save function, which still tries to save the document as an HTML page (see this example to try it yourself). The pros and cons are otherwise the same as for the embedded PDF solution.
The Measurement Protocol for Google Analytics allows us to send data to Google Analytics server-side, without needing JavaScript. Using it, we were able to send all the data to Google Analytics with a small amount of server-side code before returning the PDF file to the user. This meant that the PDF could be viewed natively by any device that accessed it, and we were able to track everything we needed.
While this was great and seemed to solve all the issues, there was a problem. Google Analytics was not the only piece of JavaScript tracking we needed to take care of; the site also had other tracking scripts which were just as important to the client, and unfortunately these did not support a server-side solution. This ruled out the Measurement Protocol.
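As a rough illustration, a server-side pageview hit could be built and sent along these lines. This assumes the Universal Analytics (v1) form of the protocol; GA4 properties use a different endpoint and payload format. The function names and the example IDs are hypothetical, and the client ID is assumed to be an anonymous identifier such as a random UUID.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Universal Analytics (v1) collection endpoint.
COLLECT_URL = "https://www.google-analytics.com/collect"


def build_pageview_payload(tracking_id: str, client_id: str, path: str,
                           title: str = "", host: str = "") -> bytes:
    """Encode a v1 Measurement Protocol pageview hit as a form body."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # property ID, e.g. "UA-XXXXX-Y"
        "cid": client_id,    # anonymous client ID
        "t": "pageview",     # hit type
        "dp": path,          # document path
    }
    if title:
        params["dt"] = title  # document title
    if host:
        params["dh"] = host   # document host name
    return urlencode(params).encode("ascii")


def send_pageview(payload: bytes, timeout: float = 2.0) -> None:
    """POST the hit; network errors are ignored so the download is never blocked."""
    try:
        urlopen(Request(COLLECT_URL, data=payload), timeout=timeout).close()
    except OSError:
        pass  # URLError and socket timeouts are both OSError subclasses
```

The request handler would call `send_pageview(build_pageview_payload(...))` and then return the PDF as normal. Swallowing errors is a design choice in this sketch: a failed tracking call costs one data point, whereas a failed download costs a reader.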
Pros
Cons
Finally, we created an interstitial page, which would always appear before accessing the PDF file. You may have seen this on other sites, when requesting a download for example, where a message perhaps reading something like “If your file does not load immediately, please click here” is displayed, along with a link to the file. This is commonly used for advertising space or, as in our case, to take care of some JavaScript business!
It works like this: you request the PDF file you want and are given a page with text similar to the example above, which itself redirects you to the PDF once all the JavaScript tracking has been taken care of.
This solution presented an interesting problem. If the URL of the interstitial page differs from the URL of the page you finally land on (the PDF), then a user who shares that final link, or visits it again, bypasses the interstitial page and is not tracked.
So, we must make it impossible to bypass, and to do this we simply redirect from the interstitial page to…itself!
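A sketch of what such a self-redirecting interstitial page might look like, again rendered from Python. Everything named here is an assumption for illustration: `render_interstitial` is hypothetical, the fixed `delay_ms` is a simple stand-in for "once tracking has finished", and the script assumes that navigating to the current URL sends the page's own address as the Referer, which is worth confirming in the browsers you support.

```python
from html import escape


def render_interstitial(pdf_url: str, tracking_snippet: str = "",
                        delay_ms: int = 1500) -> str:
    """Return an interstitial page that tracks, then requests its own URL again.

    pdf_url is the address the user asked for, which is also this page's URL.
    tracking_snippet is inserted as-is and must be trusted markup.
    """
    safe_url = escape(pdf_url, quote=True)
    return f"""<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>Loading your document</title>
  {tracking_snippet}
  <script>
    // After the delay, navigate to this same URL. The fragment is left
    // out so the browser makes a real request, not an in-page jump.
    setTimeout(function () {{
      window.location.replace(window.location.pathname + window.location.search);
    }}, {int(delay_ms)});
  </script>
</head>
<body>
  <p>If your file does not load immediately, please
     <a href="{safe_url}">click here</a>.</p>
</body>
</html>"""
```

The fallback link points at the same URL too, so a click on it should arrive at the server carrying the interstitial page as its Referer, just like the scripted navigation.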
What we can do server-side is check the HTTP Referer header, which will tell us whether this is a fresh request which we need to track, or a request which came from the interstitial page itself.
An example request would work like this:
1. A user makes a request for the PDF
2. The server checks the Referer, and if the Referer does not indicate that the request came from the interstitial page, then we give the user the interstitial page
3. The interstitial page loads, tracks all that it needs, then reloads itself
4. The server checks the Referer again, and sees that we have already come from the interstitial page (and therefore tracked), meaning we can return the PDF document
5. The user receives the document they requested, and the document is handled natively by the browser
One could also use a custom header to achieve the same result as the above.
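The Referer check in the steps above can be sketched as a small, framework-independent decision function. The names `came_from_interstitial` and `choose_response` are hypothetical, the `"pdf"` and `"interstitial"` strings are placeholders for the real response bodies, and the caching headers are an assumption about how to stop intermediaries from mixing up two different responses served at one URL.

```python
from typing import Optional
from urllib.parse import urlsplit


def came_from_interstitial(request_url: str, referer: Optional[str]) -> bool:
    """True when the Referer points at the same host and path as the request."""
    if not referer:
        return False
    req, ref = urlsplit(request_url), urlsplit(referer)
    return (req.netloc.lower(), req.path) == (ref.netloc.lower(), ref.path)


def choose_response(request_url: str, referer: Optional[str]) -> dict:
    """Decide what to serve for a PDF URL: interstitial first, then the file."""
    if came_from_interstitial(request_url, referer):
        body, content_type = "pdf", "application/pdf"
    else:
        body, content_type = "interstitial", "text/html; charset=utf-8"
    return {
        "body": body,
        "headers": {
            "Content-Type": content_type,
            # Same URL, two possible bodies: keep caches from mixing them up.
            "Vary": "Referer",
            "Cache-Control": "no-store",
        },
    }
```

For example, `choose_response(url, None)` selects the interstitial, while `choose_response(url, url)` selects the PDF. One consequence of this design is that a client which never sends a Referer would keep receiving the interstitial, so it may be worth testing against privacy tools that strip the header.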
Pros
Cons
It has been a long journey working out the best way to handle this, and if you have the same problem as we did, you will need to choose whichever solution suits you best, as each comes with its own pros and cons! I cannot say whether there is a ‘perfect’ method. Maybe there is, and I’d love to hear any advice on improving this further!