Solving the Google Analytics Sampling Problem – Without GA Premium

If you’ve got a lot of site traffic, you’ve more than likely experienced sampling in your Google Analytics reports. Typical fixes include Google Analytics Premium, filtered views, or changing date ranges. Premium can be the right solution, but only if you really need the other features it offers. Being forced to change your views or date ranges, or to rely on other hacks, creates disconnected data. Not to mention, data is still sampled in the interface with Premium. You must export it to BigQuery to view the unsampled data, which means you lose the visualization component of reporting completely.

There is one solution that eliminates the need for Premium and keeps all your data intact, without sampling. We call it ChannelMix, but really any data management platform will do.

To understand how ChannelMix eliminates sampling, let’s first clarify what sampling is and how Google Analytics uses it.

What is sampling?

Sampling is a common technique used in statistics to infer an outcome of a larger group using data gathered from a representative subset. In other words, instead of obtaining data from an entire population, sampling allows analysts to extrapolate trends using only a small group.  

To give some timely context, think of it in terms of presidential public opinion polling. In many cases, it would be impossible (or at least wholly unrealistic) for pollsters to gather data from everyone within the state of Missouri, for example, in the run-up to the primary elections on March 15th. Instead, they often use the views of only a few thousand randomly selected individuals to represent the opinions of the over 4 million registered voters in the state.

The obvious problem with this, as you would imagine, is that the data is prone to mistakes, namely something called sampling error. This term represents the statistical imprecision that results from using only a fraction of the pollable population, leaving a margin of error with upper and lower bounds around the predicted result. For example, if a candidate is polling at 30 percent, they may have a margin of error of +/- 4 percentage points. This suggests that, when the official ballots are counted, the candidate may actually have an outcome of anywhere between 26 and 34 percent of the vote. That’s a massive swing!
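
To see where a figure like +/- 4 percent can come from, here is a minimal Python sketch using the standard normal-approximation formula for a polled proportion. The 95 percent confidence level (z = 1.96) and the sample size of 504 respondents are assumptions picked for illustration; they do not come from any actual poll.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Normal-approximation margin of error for a proportion p from n respondents."""
    return z * math.sqrt(p * (1 - p) / n)


def interval(p, n, z=1.96):
    """Lower and upper bounds around the polled proportion."""
    moe = margin_of_error(p, n, z)
    return p - moe, p + moe


# Hypothetical poll: candidate at 30 percent among 504 respondents.
moe = margin_of_error(0.30, 504)
low, high = interval(0.30, 504)
print(f"margin of error: +/- {moe:.1%}")
print(f"plausible range: {low:.1%} to {high:.1%}")
```

Because the margin shrinks with the square root of the sample size, quadrupling the number of respondents only cuts the margin in half.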

“Cool story, bro. But how is this relevant to Google Analytics?” you might be saying to yourself. Well, Google Analytics often uses these same statistical principles and techniques when dealing with your data.

How does Google Analytics use sampling?

For Google Analytics, “sampling occurs automatically when more than 500,000 sessions (25M for Premium) are collected for a report.” Sampling is handled this way for a practical reason: It allows Google Analytics to speed up the query process and produce reports on large datasets faster. The problem is that if you want to manipulate the data through custom reports, filters, or segmentation, those operations run on sampled data. And the more you manipulate the sampled data, the less accurate it becomes. Thus, if you’re looking at a report of ad revenue generated by your site, it could be wildly inaccurate and may not provide you with a clear picture of how much money you’re bringing in over a given period. That’s because Google Analytics is extrapolating trends based on fractions of the data collected, leaving the report with a potentially wide margin of error.
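
The effect is easy to reproduce with a toy simulation. The sketch below is not Google’s actual sampling algorithm; it assumes plain random sampling over made-up session data (100,000 sessions, roughly 2 percent of which earn revenue) and simply scales the sampled revenue back up to the full population.

```python
import random


def estimate_total(values, sample_rate, rng):
    """Estimate sum(values) from a random sample scaled up to the full population."""
    k = max(1, round(len(values) * sample_rate))
    sample = rng.sample(values, k)
    return sum(sample) * len(values) / k


rng = random.Random(42)  # fixed seed so runs are repeatable

# Made-up data: revenue per session in cents; about 2% of sessions earn $1 to $200.
sessions = [rng.randint(100, 20000) if rng.random() < 0.02 else 0
            for _ in range(100_000)]

true_total = sum(sessions)
for rate in (1.0, 0.5, 0.1, 0.01):
    estimate = estimate_total(sessions, rate, rng)
    error = (estimate - true_total) / true_total
    print(f"sample rate {rate:>4.0%}: estimate off by {error:+.1%}")
```

At a 100 percent sample rate the estimate matches the true total exactly. As the rate drops, the estimate leans on fewer and fewer revenue-generating sessions, so the error tends to grow.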

SOLUTION: ChannelMix

With ChannelMix, data is collected each night via the Google Analytics API and securely stored in our data management platform. This method eliminates sampling because the data is pulled at a daily level (or multiple times a day, if necessary), so each pull stays under the predefined thresholds for sampling. Therefore, every dataset that you use through ChannelMix is in its purest and most accurate form, giving you unparalleled accuracy in your reporting data. You can use any data visualization tool you like to build reports and perform analysis, all for a fraction of the cost of Premium.
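
The general idea of pulling data one day at a time can be sketched in a few lines of Python. This is only an illustration of the approach, not ChannelMix’s actual implementation: `fetch_report` is a hypothetical placeholder for whatever function calls the reporting API, and `fake_fetch` stands in for it here so the example runs on its own. It also assumes that no single day exceeds the sampling threshold.

```python
from datetime import date, timedelta


def daily_ranges(start, end):
    """Yield one (day, day) pair for each date from start to end, inclusive."""
    day = start
    while day <= end:
        yield day, day
        day += timedelta(days=1)


def collect(start, end, fetch_report):
    """Call fetch_report once per day and concatenate the returned rows."""
    rows = []
    for day_start, day_end in daily_ranges(start, end):
        rows.extend(fetch_report(day_start, day_end))
    return rows


def fake_fetch(day_start, day_end):
    # Hypothetical stand-in for a real API call; returns one dummy row per day.
    return [{"date": day_start.isoformat(), "sessions": 0}]


rows = collect(date(2016, 3, 1), date(2016, 3, 15), fake_fetch)
print(len(rows), "daily rows collected")
```

Each request covers a single day, and the stored rows can then be combined over any date range in the reporting tool of your choice.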

Interested in learning how ChannelMix can help you sidestep this sampling issue?
Contact us today.