Guide: A Practitioner’s Tips for Reducing the Impact of Sampling
Do you have large data sets being collected via Google Analytics? If you have more than 250,000 visits within the time period you are analyzing, you will likely encounter sampling. Why? Sampling kicks in for data sets larger than 250k visits in a single property (or 500k if you maximize the sampling ratio) in order to reduce processing times in the interface, so marketers can view their data and make decisions more quickly.
[Screenshot: the sampling alert message shown in the Google Analytics interface]
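You can also detect sampling programmatically: the Core Reporting API (v3) flags sampled responses via the `containsSampledData`, `sampleSize`, and `sampleSpace` fields. A minimal sketch, where the response object is a mock rather than a live API call:

```javascript
// Compute the effective sampling ratio from a Core Reporting API (v3)
// response. The field names are real API fields; the response is mocked.
function samplingRatio(response) {
  if (!response.containsSampledData) return 1; // unsampled report
  return Number(response.sampleSize) / Number(response.sampleSpace);
}

// Example: 250k sessions read out of a 1M-session date range.
const mockResponse = {
  containsSampledData: true,
  sampleSize: '250000',   // sessions actually read
  sampleSpace: '1000000'  // sessions in the full date range
};
console.log(samplingRatio(mockResponse)); // 0.25
```

A ratio well below 1 is a good signal that one of the strategies below is worth the effort.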
While the speed to insight is great, the sampling ratio can at times make it harder to analyze your data efficiently. To get the most out of my data, I’ve invested time and effort in several ways of reducing the impact of sampling:
1. Remove unnecessary pages or sites from being tracked in your analytics properties.
Streamline the data being collected in each unique property to ensure your visit base is no higher than necessary for the information you are trying to analyze. Removing unnecessary pages lowers the total visit count per time period, which can reduce your sampling rates.
2. Create regional vs global views of data in different properties.
This can be done in a few ways: either by double- (or triple-) tagging pages with multiple property IDs, or by using a tag management system.
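With classic Universal Analytics, double tagging means creating a second, named tracker on the page. A minimal sketch (the `UA-XXXXX-*` property IDs are placeholders; the `ga` stub just queues commands the way the standard analytics.js bootstrap snippet does before the library loads):

```javascript
// Command-queue stub mirroring the analytics.js bootstrap, so this
// snippet is self-contained. On a real page, analytics.js loads
// asynchronously and processes the queued commands.
var ga = function () { (ga.q = ga.q || []).push(arguments); };

// Global property plus a named regional tracker (placeholder IDs).
ga('create', 'UA-XXXXX-1', 'auto');              // global property
ga('create', 'UA-XXXXX-2', 'auto', 'regional');  // regional property
ga('send', 'pageview');                          // hit to the global tracker
ga('regional.send', 'pageview');                 // hit to the regional tracker
console.log(ga.q.length); // 4 queued commands
```

The regional property only ever receives the hits you explicitly send it, so its visit base stays small and its reports stay unsampled longer.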
3. Use GTM to send visit data to multiple properties, filtering out unnecessary duplicate data.
Building on the previous tip, I’ve found Google Tag Manager to be an awesome tool to help with sampling. By sending hits to multiple analytics properties (enabled by GTM), I can have a global view (which will likely encounter more sampling) as well as more filtered views, whether by campaign, website, or region. This has enabled me to give regional leads access to nearly unsampled data at the country and region level, because their visit base is much lower than that of the global property.
(Bonus) If you have GA Premium, export unsampled reports.
Google Analytics Premium customers have access to unsampled downloads for many reports. These reports are a great complement to the strategies outlined above for reducing the impact of sampling.
*Please note that these thoughts and tips are my own and not those of my employer.
It is worth noting that if you double- or triple-tag your websites, there is a 500-hit-per-session limit, which can be reached relatively quickly with more than two trackers.
If all of the above is not enough (note also that GA Premium unsampled reports are limited to 3M rows of data), you can use RGA, an R library that connects to the GA API, fetches data day by day (to avoid triggering sampling), and then adds it all up. This way you can request one or two years’ worth of data, well beyond the 3M rows GA Premium or the 1M rows GA standard allows. 😀
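The day-by-day approach is easy to replicate against the API directly in any language. A sketch of the splitting-and-summing logic, where `fetchSessionsForDay` is a hypothetical stand-in for a real Core Reporting API call (here it just returns canned numbers):

```javascript
// Split a date range into single days and sum the per-day results, so
// each individual query stays under the sampling threshold.
// fetchSessionsForDay is a hypothetical placeholder for a real API call.
function sumByDay(startDate, endDate, fetchSessionsForDay) {
  let total = 0;
  const day = new Date(startDate);
  const end = new Date(endDate);
  while (day <= end) {
    total += fetchSessionsForDay(day.toISOString().slice(0, 10));
    day.setUTCDate(day.getUTCDate() + 1); // next day (UTC, avoids TZ drift)
  }
  return total;
}

// Example with a canned fetcher returning 1000 sessions per day.
const total = sumByDay('2015-01-01', '2015-01-07', () => 1000);
console.log(total); // 7000 (7 days x 1000 sessions)
```

The trade-off is query volume: one request per day means API quota becomes the constraint instead of sampling, so batch responsibly.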