Google Analytics Data Retention: Where did my data go?!

As the deadline for GDPR (General Data Protection Regulation) compliance loomed, Google Analytics rolled out a new feature called “data retention controls.” This seemingly helpful and innocuous little setting went into effect on May 25th, 2018 and with it, some of your most meaningful Google Analytics data is now lost forever! If you have not already taken a closer look, you should do it now – like RIGHT NOW.


Disclaimer: Talk to your legal counsel.

First things first: I’m not a lawyer, and I can’t speak to your business model. It’s up to you to decide how to use the information provided here, confirm its validity, and adjust settings appropriately for your business. The GDPR establishes huge fines for non-compliance, and the reach of this European regulation is long. However, while many US companies are affected, many are not (or at least not to the extent they might think) and the consequences of compliance may be greater than the risks of non-compliance.

General Data Protection Regulation (GDPR)

Before diving into data retention settings within Google Analytics, let’s take a quick look at the GDPR. If you are not already somewhat familiar, the GDPR is a European regulation designed to clarify and expand the rights of individuals in the EU by putting control over personal data back into the hands of the people.

Transparency, Consent, Access & Control

In broad strokes, companies that do business with people within the EU now have a responsibility to honor the rights of “data subjects” through:

  • Transparency – Organizations must be clear about what data they collect, how it is used, who it might be shared with, and how the data subject can exercise their rights.
  • Consent – Organizations must obtain clear consent from the data subject in order to capture and process personal data, and must provide a means to retract that consent.
  • Access and Control – Organizations must provide access to personal data, and must provide a means to correct, erase, and move that data.

To be clear, this is a massive oversimplification. I’ve left out quite a bit, but it should give you a fair idea of what the GDPR is all about. What makes this regulation different from those in the past is the level of specificity and the breadth of impact – from the responsibilities of organizations and the rights of data subjects, to the severity of fines, which can reach the larger of €20,000,000 or 4% of total annual revenue from the preceding year.
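To make the “larger of” rule concrete, here is a minimal sketch of the arithmetic. The function name and the revenue figures are hypothetical, and the result is only the ceiling described above, not a prediction of any actual fine.

```python
def max_fine_ceiling_eur(annual_revenue_eur: float) -> float:
    """Return the larger of EUR 20,000,000 or 4% of the preceding
    year's annual revenue (the ceiling described in the article)."""
    flat_cap = 20_000_000
    revenue_cap = annual_revenue_eur * 4 / 100
    return max(flat_cap, revenue_cap)


# Hypothetical revenue figures, for illustration only.
for revenue in (50_000_000, 500_000_000, 2_000_000_000):
    ceiling = max_fine_ceiling_eur(revenue)
    print(f"revenue {revenue:>13,} EUR -> ceiling {ceiling:>13,.0f} EUR")
```

Under this rule the flat €20,000,000 figure applies until annual revenue passes €500,000,000, after which the 4% figure is the larger one.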

Google Analytics and Data Retention

To help users of its popular website data collection platform comply with these new regulations, Google introduced several new features. The one with probably the biggest impact is the new data retention settings.

Leading up to the GDPR compliance deadline, Google sent emails to users informing them about the new settings and urging them to take action. However, even today many organizations have NOT taken action or have left the settings at the default values. I have already noticed this for many of our clients.

If you have not changed your settings, any user data that is more than 26 months old is now gone forever. Worse, if you do not update these settings you will continue to lose data. So what can you do about this? Should you change these settings?

What do the Data Retention settings do?

Deep in the Admin area of Google Analytics, you will find the Data Retention settings. These data retention settings allow you to configure how long Google will store individual user-level and event-level data within the system. Every time a user visits the website, Analytics records a series of actions the user takes (like viewing a page, completing a form, making a purchase, etc.) and the date/time this action took place. Once any recorded action (and other associated data) is older than the established retention period, it will be deleted and no longer available.

There are two different settings available here:

  • User and event data retention – This setting controls the “retention period” – the length of time Google will keep user and event data. Any data older than this setting will be purged. There are five possible values here: 14, 26, 38, or 50 months, plus an option that disables expiration entirely: “Do not automatically expire.” The default value is 26 months, or two years and two months.
  • Reset on new activity – When this setting is “ON”, it restarts the clock each time the same user visits the site. For example, if a user visited your site a month ago and then returned to the site today, the expiration date for ALL of the actions recorded from both of these visits will be reset to the current date plus the retention period. By default, this setting is “ON.”
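The interaction between the two settings can be sketched with a little date arithmetic. This is a simplified model based only on the description above, not Google's actual implementation; the helper names and the visit dates are hypothetical.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date forward by whole calendar months, clamping the day
    to the last day of the target month when necessary."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def expiration_dates(visits, retention_months, reset_on_new_activity):
    """Map each visit date of ONE user to the date its data would expire.

    Assumption: with reset ON, all of the user's visits expire relative
    to the most recent visit; with reset OFF, each visit expires
    relative to its own date.
    """
    latest = max(visits)
    return {
        visit: add_months(latest if reset_on_new_activity else visit,
                          retention_months)
        for visit in visits
    }


# Hypothetical user: first visit a month before the second one.
visits = [date(2018, 4, 25), date(2018, 5, 25)]
for reset in (True, False):
    print("reset ON" if reset else "reset OFF")
    for visit, expires in expiration_dates(visits, 26, reset).items():
        print(f"  visit {visit} expires {expires}")
```

With the default 26-month period and reset ON, both visits expire on 2020-07-25. With reset OFF, the April visit expires a month earlier, on 2020-06-25.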

At first glance, 26 months of data doesn’t seem so bad, especially if the clock keeps getting reset every time the user visits again. Google’s documentation is also fairly reassuring:

“Keep in mind that standard aggregated Google Analytics reporting is not affected. The user and event data managed by this setting is needed only when you use certain advanced features like applying custom segments to reports or creating unusual custom reports.”

But the devil is in the details. “Aggregated reports” are the default, canned reports. Once you start customizing them to gain better insight, you are running “ad-hoc reports” against the raw data within Analytics.

How do these settings impact reporting?

Default Reports vs. Ad-Hoc Reports

The power of data provided by a platform like Google Analytics comes from the ability to slice and dice the data to answer questions and gain insights:

  • Is a visitor MORE or LESS likely to generate a sale or lead after visiting the blog?
  • How does the mobile user conversion rate compare to that of desktop users?
  • Are visitors from search more likely to return compared to visitors from referrals?

Meaningful questions like these can be answered using Google Analytics, but not with basic, out-of-the-box reports. We need to use more advanced features to generate an “ad-hoc” report. Google’s documentation clarifies the difference between “default reports” and “ad-hoc reports.” To summarize:

Default Reports

Analytics stores one complete, unfiltered set of data for each property in each account. For each reporting view in a property, Analytics also creates tables of aggregated dimensions and metrics from the complete, unfiltered data. When you run a default report, Analytics queries the tables of aggregated data to quickly deliver unsampled results.


Ad-hoc Reports

If you modify a default report in some way, for example, by applying a segment, filter or secondary dimension, or if you create a custom report with a combination of dimensions and metrics that don’t exist in a default report, you are generating an ad-hoc query of Analytics data.

Ad-hoc reports, such as those that use custom segments or a filtered table, rely on the complete, unfiltered data, and this data is now automatically deleted. In many cases, even a single day of purged data can render entire metrics useless. For example, let’s say we wanted to investigate the impact of a website redesign on user engagement: Do we see more return visits to the site after the launch of the redesign?

To do this, we could take a look at a 3-month period before the launch of the redesign and compare that to a 3-month period after the launch. In this example, the launch of the redesign took place on August 8th, 2016. Unfortunately, as of the date this article was published, ALL user data before June 15th, 2016 is gone, so we have less than two months of pre-launch data instead of the three we need, and can no longer perform this analysis.

This image shows what the report SHOULD look like, using only data from June 15th, 2016 onward. This report compares the period June 15th - Sept 14th, 2017 to the same period the previous year.

Report showing valid data.

Here is what the report looks like when even a SINGLE DAY of purged user data is included. You can see that by shifting the reporting period just one day earlier, we lose ALL user data.

Report showing missing user data.
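The one-day shift shown in the two reports can be checked before running a report by counting how many days of a date range fall before the purge cutoff. This is a minimal sketch: the cutoff is the June 15th, 2016 date from the example above, and the function name is hypothetical.

```python
from datetime import date, timedelta

# Cutoff from the example above: user data before this date was purged.
PURGE_CUTOFF = date(2016, 6, 15)


def purged_days_in_range(start: date, end: date,
                         cutoff: date = PURGE_CUTOFF) -> int:
    """Count the days in the inclusive range [start, end] that fall
    before the purge cutoff."""
    if start >= cutoff:
        return 0
    last_purged_day = min(end, cutoff - timedelta(days=1))
    return (last_purged_day - start).days + 1


# The comparison period from the valid report, then the same period
# shifted one day earlier at the start.
print(purged_days_in_range(date(2016, 6, 15), date(2016, 9, 14)))
print(purged_days_in_range(date(2016, 6, 14), date(2016, 9, 14)))
```

Any result above zero means the range reaches into purged data, which, as the second report shows, is enough to wipe out the user metrics for the whole comparison.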

What to Do Now

As I mentioned previously, it is important to consult with your legal team about your specific scenario. In my opinion, most organizations should adjust these settings immediately in order to prevent further loss of data. Many U.S.-based organizations are not affected by the GDPR at all, and even for those that are, I haven’t seen anything within the GDPR that mandates a specific length of time you may retain the information or a date by which you must purge it. In addition, compliance requires major updates to the privacy policy, and without these updates your organization will remain non-compliant regardless of your data retention settings.

Change the Data Retention Settings

To adjust these settings within Google Analytics, click on the Admin button at the bottom of the toolbar on the left. Select the account and property that you intend to edit, and then click on “Tracking Info” to reveal the additional options. Click on “Data Retention” to access the settings.

Access data retention settings.

Adjust “User and event data retention” to “Do not automatically expire.” This setting will disable the automatic deletion of data, allowing you to start rebuilding your historical data for analysis. The change does not take effect for 24 hours after you save it, so you have time to revert it if you change your mind.

Adjust data retention settings.

I hope this information has been helpful! The sooner you can make this settings adjustment, the sooner you can start rebuilding your data within Google Analytics.
