Google Analytics Sampling Drastically Underreports AMP Traffic

If you’re a free Google Analytics user, you’ll want to read this.

Free Google Analytics accounts sample data when you have more than 500,000 sessions in a given date range or comparison window. Sampling is a method to save processing time and to speed up reporting. However, for clients who report AMP and canonical traffic to the same property, sampling in Google Analytics can underreport AMP traffic by nearly 50%.

While AMP traffic is underreported, the difference is made up by other traffic sources, so the total number of sampled sessions stays within a reasonable margin of error. For example, organic AMP traffic may be 47% lower than it ought to be, while Pinterest, Reddit and non-AMP organic traffic will be 5-10% higher than usual.

This is an unfortunate pattern that can make AMP performance appear artificially low.

We reported the issue to Google in January 2018. While we’ve sent examples to verify our findings, we haven’t heard of any updates or seen any changes to sampling.

What Is Sampling?

Sampling is a method that takes a subset of your data with the reasonable assumption that it’s statistically close to the entire population. It improves data collection performance, and when done correctly does not degrade output too much. For instance, rather than ask all 7.4 million Washingtonians if they’re Seahawks fans, a sample of a few thousand Washingtonians could answer the question with some level of accuracy.

When data is sampled, there’s some margin of error, based on the population size and standard deviation from the estimated value. As the size of the sample increases, the margin of error tends to shrink, meaning you’re closer to what is likely the true value.
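To make the relationship between sample size and margin of error concrete, here is a minimal sketch using the standard textbook formula for a proportion estimated from a simple random sample. It is an illustration only, not Google Analytics' actual sampling method. The function name, the 95% confidence level (z = 1.96) and the worst-case proportion of 0.5 are all assumptions made for the example, and the population is assumed to be large enough that its size has little effect.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample.

    Assumes a large population and a 95% confidence level (z = 1.96).
    proportion=0.5 is the worst case, giving the widest margin.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# Print the margin, in percentage points, for increasingly large samples.
for n in (100, 1_000, 10_000):
    print(f"n={n:>6}: +/- {margin_of_error(n) * 100:.1f} percentage points")
```

Under these assumptions the margin falls from about 9.8 percentage points with 100 respondents to about 3.1 with 1,000 and about 1.0 with 10,000.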

Google Analytics samples data when the entire data set contains more than 500,000 sessions. These samples tend to have an error of 2-7% and, as the full data set grows, there’s potential for greater error. However, we’ve found that while other traffic sources within the sampled set fall within a reasonable margin of error, AMP traffic sees an excessive error of 40-50%.

How Did We Determine Google Analytics Underreports Sampled AMP Traffic?

At the start of 2018, when implementing Google’s new AMP Client ID, we began updating clients’ AMP analytics tags to report to their canonical property. Soon after, one of our clients emailed us, frustrated that their year-over-year organic mobile traffic was down. We investigated their total mobile clicks in Search Console and saw that year-over-year traffic was actually increasing, with AMP leading the way in terms of clicks, position and CTR. Because this client was reporting AMP traffic to both the canonical and AMP analytics properties, we were able to compare the two, and we noticed a discrepancy between them.


Next, we compared the session counts from the two analytics properties with Search Console to determine which beacon was more accurate, and found that the original AMP analytics property reported sessions far closer to Search Console clicks.

Shortening the time span, so data was unsampled, we saw daily organic clicks rebound to closely match Search Console and the AMP analytics property:

Date | Search Console Clicks | Unsampled AMP Analytics Sessions | Unsampled Canonical GA Sessions | Sampled Canonical GA Sessions
--- | --- | --- | --- | ---
2018/1/1 | 10,032 | 10,619 | 10,526 | 6,171
2018/1/2 | 9,029 | 9,400 | 9,310 | 4,556
2018/1/3 | 8,804 | 9,170 | 9,101 | 4,758
2018/1/4 | 9,110 | 9,495 | 9,427 | 4,498

The sampled AMP sessions are roughly 40-50% lower than the unsampled “true” value.
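The shortfall can be checked directly from the table above. This short sketch compares the sampled and unsampled canonical GA session counts for each day. The variable and function names are ours, chosen for illustration.

```python
# (date, unsampled canonical GA sessions, sampled canonical GA sessions),
# copied from the table above.
rows = [
    ("2018/1/1", 10526, 6171),
    ("2018/1/2", 9310, 4556),
    ("2018/1/3", 9101, 4758),
    ("2018/1/4", 9427, 4498),
]


def shortfall_pct(unsampled, sampled):
    """Percentage by which the sampled count falls below the unsampled count."""
    return (unsampled - sampled) / unsampled * 100


for day, unsampled, sampled in rows:
    print(f"{day}: sampled is {shortfall_pct(unsampled, sampled):.1f}% lower")
```

For these four days the shortfall works out to between about 41% and 52%.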

After identifying the sampling error, we looked into sampled traffic for other clients and saw the same issue emerge.

What Does This Mean for AMP Users?

Any sampled data in GA that contains AMP traffic is going to be suspect at best and misleading at worst. In light of this sampling issue, we recommend being extra vigilant and making sure the date ranges you review for AMP traffic aren’t sampled. If you look at a date range with more than 500,000 sessions, sampling will kick in and your AMP traffic will look woefully anemic compared to the unsampled equivalent.

For some smaller sites, Google’s sample could be close to the full set. The sampled report displays the sample rate when you hover over the yellow shield at the top of the screen.

But as the date range and number of sessions grow, that percentage will get lower.

What to Do Going Forward

Underreported AMP traffic in sampled data can be frustrating, but there are some ways to work around the issue and maintain reporting accuracy.

Dual-report AMP traffic to both AMP and canonical Analytics properties

A free workaround is to have a secondary Google Analytics property that receives just AMP traffic. This way you can get basic metrics, such as sessions, users and page load speed. Engagement metrics on the canonical Analytics property may not be 100% accurate, but they’ll be close enough.

Keep your date ranges short enough to not be sampled

If your site doesn’t get a ton of traffic, this is pretty easy and straightforward. But we have clients where sampling occurs with just a few days of traffic, and with ranges that short, small variances can be misconstrued as major wins or losses.

Stitch together unsampled date ranges

If your reports need to cover quarters or any other date range long enough to trigger sampling, breaking the range down into smaller unsampled pieces and then adding them back together may be the best option. This can also be automated using the Google Analytics API or the Google Analytics Spreadsheet Add-On.
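As a rough sketch of the stitching approach, the code below splits a long date range into short chunks and sums a session count across them. The fetch_sessions callable is hypothetical: it stands in for whatever call you use to pull an unsampled session count for a short range, and fake_fetch simply invents 1,000 sessions per day so the example runs on its own. The 7-day chunk size is also an assumption; pick whatever length keeps each request under the sampling threshold for your property.

```python
from datetime import date, timedelta


def split_date_range(start, end, max_days):
    """Yield inclusive (chunk_start, chunk_end) pairs covering start..end."""
    chunk_start = start
    while chunk_start <= end:
        chunk_end = min(chunk_start + timedelta(days=max_days - 1), end)
        yield chunk_start, chunk_end
        chunk_start = chunk_end + timedelta(days=1)


def stitched_sessions(start, end, max_days, fetch_sessions):
    """Sum session counts over short chunks of the date range.

    fetch_sessions is a hypothetical callable taking (chunk_start, chunk_end)
    and returning the unsampled session count for that chunk.
    """
    return sum(
        fetch_sessions(chunk_start, chunk_end)
        for chunk_start, chunk_end in split_date_range(start, end, max_days)
    )


def fake_fetch(chunk_start, chunk_end):
    """Stand-in for a real reporting call: pretends every day has 1,000 sessions."""
    return ((chunk_end - chunk_start).days + 1) * 1000


# First quarter of 2018 (90 days), fetched one week at a time.
total = stitched_sessions(date(2018, 1, 1), date(2018, 3, 31), 7, fake_fetch)
print(total)
```

Additive metrics such as sessions and pageviews can be summed this way. Unique users cannot, because a visitor who returns in a later chunk would be counted again.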

Look at unsegmented traffic

Sampling only occurs when additional segments, dimensions or filters are added, and there are some ways to get at your AMP traffic performance without having to apply segments or filters.

The device report (Audience->Mobile->Devices) allows you to see mobile and desktop traffic without sampling. However, this is all mobile performance, not isolated to show organic or other traffic sources.

In most reports, you can set the hostname as the primary dimension. This way, you can see AMP performance, specifically, though the non-AMP hostname will contain mobile, tablet and desktop traffic.

Analytics 360

Another solution is to upgrade to Analytics 360, which starts at $150,000 a year and only samples when there are more than 100 million sessions in a date range.

WompMobile

You can always turn to WompMobile, the world’s leading AMP development agency, which serves more than 15 million AMP pages each day. Our platform provides scalability and ensures AMPs are fully featured, robust and guaranteed to be fast. As part of our service, we provide ongoing analytics reporting to accurately track and prove the value of AMP.

Conclusion

Data sampling in Google Analytics can lead to inaccurate AMP performance reporting. We’re continuing to work with Google to find a solution to this issue, and hopefully we’ll reach a resolution soon. In the meantime, the methods above can help you find the accurate data you’re looking for. The good news: with the above workarounds in place, WompMobile consistently sees increased traffic, engagement and conversions when AMP is added to a property and measured correctly.