The numbers are the Pearson Correlation Coefficients. r=0.9483 indicates a very good fit between the target data and the query time series. Here’s what ‘alpine touring’ looks like next to the time series that we uploaded:
A few things to note here:
- Any query containing ‘alpine touring’ will contribute to the time series for ‘alpine touring’. This includes queries like ‘alpine touring skis’, ‘alpine touring vacations’, etc.
- The data is aggregated to weekly counts. Each week goes from one Sunday to the next. The points for 2006/01/01, for example, include queries from the start of Sunday, January 1, 2006 to the end of Saturday, January 7, 2006. Google Correlate contains data starting from January 5, 2003 (the first Sunday of 2003).
- The vertical grid lines mark the beginning of each year.
- The units on the y-axis are standard deviations away from the mean. Each time series is normalized so that its mean is 0.0 and its standard deviation is 1.0. This puts all series on the same scale so that they’re easier to compare. This also explains why the ‘Winter Wave’ time series ranges from -1.4 to +1.4, even though the input series only ranged from 0 to 1.
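The normalization described in the last bullet can be sketched in a few lines of Python. This is only an illustration: the cosine-shaped input is a hypothetical stand-in for the Winter Wave series (the real data is not reproduced here), and the use of the population standard deviation is an assumption.

```python
import math
import statistics

def standardize(series):
    """Rescale a series to mean 0 and standard deviation 1 (z-scores)."""
    mean = statistics.fmean(series)
    stdev = statistics.pstdev(series)  # population standard deviation (assumed)
    return [(value - mean) / stdev for value in series]

# Hypothetical input: one year of weekly values on a cosine wave between 0 and 1.
weeks = 52
wave = [0.5 + 0.5 * math.cos(2 * math.pi * k / weeks) for k in range(weeks)]

z = standardize(wave)
print(round(min(z), 2), round(max(z), 2))  # prints -1.41 1.41
```

A wave confined to the range 0 to 1 has a standard deviation of about 0.35, so its peaks sit roughly 1.4 standard deviations from the mean, which is consistent with the range quoted above.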
Google Correlate only shows you positive correlations. But sometimes the negative correlations can be just as interesting. If you want to see queries which are negatively correlated with your data, just multiply your input data by -1 in your spreadsheet program before uploading it to Google Correlate.
Here are the negative correlations for the seasons time series:
0.9729 boat trailer
0.9664 trumpet vine
0.9630 golf course
0.9626 rotary mower
0.9618 gary fisher
0.9603 deck railing
0.9597 used bikes
0.9590 pig roast
0.9578 bike carrier
0.9577 course rating
So the time series for the query ‘boat trailer’ had a correlation of r=-0.9729 with the original ‘Winter Wave’ time series. As you might expect, the queries which are negatively correlated with winter are summer queries.
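The sign flip works because negating one series negates the Pearson coefficient. A minimal sketch, using made-up toy values rather than real query data:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy data: a "winter" signal and a query that peaks in summer.
winter = [1.0, 0.8, 0.3, 0.0, 0.2, 0.7]
summer_query = [0.1, 0.2, 0.8, 1.0, 0.9, 0.3]

r = pearson(winter, summer_query)          # negative: the series move oppositely
flipped = [-value for value in winter]     # what "multiply by -1" does
r_flipped = pearson(flipped, summer_query)  # same magnitude, positive sign
print(r, r_flipped)
```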
Holdouts and Missing Data
Sometimes you don’t have a complete time series or would prefer to hold out a portion of your data for testing. You can accomplish this in Google Correlate by putting blank values in your data when you upload it:
For example, here is the Winter Wave time series with 2006 and 2007 withheld:
If you look closely, you can see that the blue line has a gap between the end of 2005 and the start of 2008. When computing correlations, these weeks will be ignored in the time series for candidate queries. This means that, if you build a model for your time series using query data, you can use this held out portion of the time series as a test set.
Removing selected weeks from uploaded data sets is a general technique which can be used for other purposes as well. For instance, if your uploaded data has a large spike over a small time period, that spike may have a large (and unwanted) influence on the results. If you withhold the spiking weeks from your data set, you can remove their influence entirely.
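As a rough illustration of how blank weeks drop out of the calculation, the sketch below computes a Pearson coefficient over only the weeks where the target has a value. The function name and the toy data are hypothetical, and this is not a description of Google Correlate's internals.

```python
import math

def pearson_ignoring_blanks(target, candidate):
    """Pearson r over only the weeks where the target has a value."""
    pairs = [(t, c) for t, c in zip(target, candidate) if t is not None]
    n = len(pairs)
    mean_t = sum(t for t, _ in pairs) / n
    mean_c = sum(c for _, c in pairs) / n
    cov = sum((t - mean_t) * (c - mean_c) for t, c in pairs)
    var_t = sum((t - mean_t) ** 2 for t, _ in pairs)
    var_c = sum((c - mean_c) ** 2 for _, c in pairs)
    return cov / math.sqrt(var_t * var_c)

# None marks withheld weeks; the candidate's wild values there are never seen.
target = [1.0, 2.0, None, None, 5.0, 6.0]
candidate = [2.0, 4.0, 90.0, -70.0, 10.0, 12.0]

r = pearson_ignoring_blanks(target, candidate)
print(r)
```

The two spiking weeks in the candidate have no influence on the result because the corresponding target weeks are blank.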
Building a Model with Query Data
Note: Statistical modeling is a fine art. This example is presented simply as a demonstration of what’s possible, not as a demonstration of good modeling techniques.
Having found queries which are correlated with the winter, we can use them to build a model. Using the Winter Wave with holdout, we get a list of queries whose time series is correlated with the winter. If you click “Export data as CSV” on that page, you’ll get a CSV file containing weekly time series for the top few results.
You can import this data into a spreadsheet or your favorite numerical analysis tool to do the modeling. For example, in this spreadsheet, we built a very simple model by summing up the time series for the 20 most highly-correlated queries. We then computed the Pearson Correlation Coefficient between the target time series and the model estimates on the holdout period (2006-2007), which was r=0.979. This indicates that the query data was able to predict previously-unseen real-world data.
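The summing model described above could be sketched as follows. The series here are invented toy values with only two queries (the spreadsheet used 20), and the holdout indices are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def sum_model(query_series):
    """Element-wise sum of several equally long time series."""
    return [sum(values) for values in zip(*query_series)]

# Hypothetical data: the last four weeks were withheld when finding the queries.
target = [0.0, 0.5, 1.0, 0.5, 0.0, 0.5, 1.0, 0.5]
holdout = range(4, 8)
queries = [
    [0.1, 0.6, 0.9, 0.4, 0.1, 0.5, 1.0, 0.6],
    [0.0, 0.4, 1.1, 0.6, 0.0, 0.6, 0.9, 0.4],
]

model = sum_model(queries)
r_holdout = pearson([target[i] for i in holdout], [model[i] for i in holdout])
print(r_holdout)
```

Evaluating only on the held-out weeks is what makes the coefficient a test of prediction rather than of fit.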
Of course, there are better ways to model whether it’s winter in the United States. But it is interesting that we can do so exclusively with query data. A similar sequence turned influenza data from the CDC into Google Flu Trends and there are no doubt other time series which can be modeled in a similar way.
Correlate by States
The examples thus far have worked exclusively with time series. Google Correlate can also find queries whose popularity correlates with a data set across space rather than time.
As a simple example, let’s create a data set which is 1 for every state in New England but 0 for all other states:
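Such a data set could be built programmatically. The sketch below uses two-letter postal abbreviations as keys, which is an assumption made for brevity and not necessarily the labels Google Correlate expects.

```python
# The six New England states.
NEW_ENGLAND = {"CT", "ME", "MA", "NH", "RI", "VT"}

# All 50 states plus the District of Columbia.
ALL_STATES = (
    "AL AK AZ AR CA CO CT DE DC FL GA HI ID IL IN IA KS KY LA ME MD MA MI MN "
    "MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT VA WA "
    "WV WI WY"
).split()

# 1 for New England, 0 everywhere else.
dataset = {state: (1 if state in NEW_ENGLAND else 0) for state in ALL_STATES}

print(sum(dataset.values()), "states set to 1 out of", len(dataset))
```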
Here are the queries whose popularity is most highly-correlated with this New England data set:
0.9903 gorges grant hotel
0.9863 england basketball
0.9846 boston dirt dog
0.9829 new england association of schools and colleges
0.9815 new england map
0.9805 hood ice cream
0.9800 map of new england
0.9799 new england inns
0.9794 new england recruiting report
As before, these are Pearson Correlation values. But what does it mean for a query to be correlated with this US states data set? Let’s look at the maps for our New England set and the query “map of new england” side-by-side:
Left: our “New England” data set. Right: the popularity of the query “map of new england”.
The maps indicate that the query ‘map of new england’ is popular in states where our data set has a 1 and not popular in states where our data has a 0. Clicking the “Scatter plot” link on the result makes this more explicit:
The six points on the top right are the six states in New England. The smattering of dots on the lower left are the other 44 states and the District of Columbia. This makes it clear that the query ‘map of new england’ is popular in the six states in New England and nowhere else.
For the New England data set, Google Correlate brings back queries which are characteristic of the New England region. If you have a data set which can be broken down by state, uploading it to Google Correlate may give you insight into some of the driving factors behind your data.
The same techniques discussed for the time series examples also apply to states correlation. If you don’t specify a state then it will be held out. In particular, it is often useful to hold out the District of Columbia which is an outlier in many data sets.
Google Correlate makes an attempt to filter out queries which are unlikely to be interesting. These include:
- Queries with a low correlation value (less than r=0.6)
- Misspelled queries
- Pornographic queries
- Rare queries
- Queries which only correlate with a small portion of the time series
For more information about the filtering operations performed by Google Correlate, please refer to the Google Correlate Whitepaper.
Protecting User Privacy
At Google, we are keenly aware of the trust our users place in us, and of our responsibility to protect their privacy. Google Correlate can never be used to identify individual users because we rely on anonymized, aggregated counts of how often certain search queries occur each week. We rely on millions of search queries issued to Google over time, and the patterns we observe in the data are only meaningful across large populations of Google search users. You can learn more about how this data is used and how Google protects users' privacy at our Privacy Center.
What is Google Correlate?
Google Correlate is a tool on Google Trends which enables you to find queries with a similar pattern to a target data series. The target can either be a real-world trend that you provide (e.g., a data set of event counts over time) or a query that you enter.
What information is provided by Google Correlate?
Google Correlate uses web search activity data to find queries with a similar pattern to a target data series. The results can be viewed on the Google Correlate website or downloaded as a CSV file for further analysis.
How is Google Correlate different from Google Trends or Google Insights for Search?
Google Correlate is like Google Trends in reverse. With Google Trends, you type in a query and get back a data series of activity (over time or in each US state). With Google Correlate, you enter a data series (the target) and get back a list of queries whose data series follows a similar pattern.
How up to date is the information provided by Google Correlate?
Google Correlate contains web search activity data from January 2003 to present. This data is updated weekly.
What’s the difference between comparing US states and comparing time series?
Google Correlate lets you search in two different ways. The US states option lets you find queries which have similar state-by-state patterns. The time series option lets you find queries that have similar patterns across time. For more information, see the Google Correlate Tutorial.
How can I use the information I find on Google Correlate?
You're free to use any of the information you find on Google Correlate, subject to the Google Terms of Service. Please attribute it to Google as follows: "Data Source: Google Correlate (http://www.google.com/trends/correlate) ".
What are the units for the data in the chart and CSV file download?
The units are standard deviations above the mean. If you download any data, you should find that all of the time series have mean 0 and standard deviation 1. For more information see this Wikipedia article on standard score.
How should I format my CSV file for upload to Correlate?
For us-weekly data, your csv should be of the form:
where YYYY-MM-DD are weeks starting on Sundays.
For us-states data, your csv should be of the following form:
Note that for both CSV files, comment lines (starting with a # character) are ignored.
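If your raw data is daily, it has to be rolled up into Sunday-start weeks before upload. The sketch below shows one way to do that. The exact column layout is not reproduced above, so the two-column "date,value" rows produced here are an assumption, as are the function names.

```python
from collections import defaultdict
from datetime import date, timedelta

def week_start(day):
    """Return the Sunday that begins the week containing `day`."""
    days_since_sunday = (day.weekday() + 1) % 7  # weekday(): Monday=0 ... Sunday=6
    return day - timedelta(days=days_since_sunday)

def weekly_csv_lines(daily_counts):
    """Sum daily counts into Sunday-start weeks and format them as CSV rows."""
    totals = defaultdict(float)
    for day, count in daily_counts.items():
        totals[week_start(day)] += count
    lines = ["# weekly totals, weeks start on Sunday"]  # comment line, ignored on upload
    for start in sorted(totals):
        # Assumed layout: YYYY-MM-DD,value
        lines.append(f"{start.isoformat()},{totals[start]:g}")
    return lines

# Hypothetical daily counts.
daily = {date(2006, 1, 1): 3, date(2006, 1, 7): 2, date(2006, 1, 8): 5}
lines = weekly_csv_lines(daily)
print("\n".join(lines))
```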
How will Google use data sets that I upload to Google Correlate?
We will not use any of your content for any purpose except to provide you with the Google Correlate Service, monitor traffic and detect spam and fraud. See the Terms of Service for more information.
This tool makes search information public. What about my personal search data?
Your personal search data remains safe and private. Our graphs are based on aggregated data from millions of Google searches over time. Moreover, the results Google Correlate displays are produced by an automated system. See our Privacy Center for more about how we use search query data.
How do you determine the location of a query?
Google Correlate uses IP address information from our server logs to make a best guess about where queries originated.
Can you tell more about what Google does with my personal search data?
Please read more at the Google Privacy FAQ.
How has sampling changed for Google Correlate?
In December 2011, we added support for time series correlations on a number of new countries. As part of this change, we reduced our sample size for US states and US time series to match that of the other countries. While this does not have much of an effect on popular queries, it may cause a noticeable increase in variance for queries with lower volumes.
Have an interesting finding you want to tell us about? A technical problem? A feature request? A question not answered by this FAQ?
Please send us feedback!
Google Correlate for web traffic analysis
Both my partner and I were asking: what factors influence website traffic? How does one find any correlations in business intelligence related to organic searches? This post was born out of my attempt to join traffic data from the business blog (the data source being Google Analytics) with real organic queries made in Google, in order to get some insight into which items my traffic correlates with, and specifically how these items (i.e., those which people are searching for) might have influenced website traffic.
The Google Lab project called Google Correlate is not as well known to most website owners as Google Analytics or Google Adwords are.
Google Correlate trial idea
Google has not just indexed the whole web but also, being the leading search engine, has accumulated organic search term queries over the years. These search terms vary in popularity over time. Why not use Google’s courtesy to find correlations between the organic traffic flow of my site and these organic queries? Now I can get the site’s time series from Google Analytics and insert it into Correlate Labs to find out what the “chemistry” is. What’s going to be the result? The relevant organic queries that are closest to my site incoming traffic data in time span. This might give me more insight into what has influenced the traffic, and what I might do to improve the content strategy or the AdWords campaign direction.
Of course there might be some correlations with weird search terms, yet it does not eliminate the overall relationship between people’s online searches and the organic website traffic. In addition, Google Correlate gives an opportunity for any arbitrary search query to find correlated queries based on the same Pearson’s linear correlation coefficient.
1. Prepare data
The Google Correlate tutorial is here. The first thing is to export the site’s web traffic data from Google Analytics to a spreadsheet (of course you can export data from another analytics tool if preferred). If you want to evaluate only search traffic visitors for correlation, then in your Google Analytics dashboard go to Traffic Sources -> Sources -> Search -> Overview. Then in the chart’s upper menu click Export -> Google Spreadsheets.
The raw data appear in Google Docs in this way:
Remove the header lines and preceding empty lines. Then manually enter the dates for the first and second week like 10/6/2012 and 10/13/2012 and, selecting both, auto fill them down. Don’t forget to delete the last sum value in the visitors column. Now your data should look like this:
Your data/time series is ready for evaluation.
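The same clean-up can be done in code instead of by hand. This is only a sketch under stated assumptions: the exported values are taken to be an already-parsed list of weekly visitor counts with a trailing total, and `prepare_rows` is a hypothetical helper, not part of any Google tool.

```python
from datetime import date, timedelta

def prepare_rows(first_week, weekly_visitors_with_total):
    """Pair each weekly count with its date and drop the trailing sum row."""
    visitors = weekly_visitors_with_total[:-1]  # last value is the column total
    rows = []
    for index, count in enumerate(visitors):
        day = first_week + timedelta(weeks=index)  # same step as the auto fill
        rows.append((f"{day.month}/{day.day}/{day.year}", count))
    return rows

# Hypothetical export: three weekly counts followed by their sum.
exported = [120, 135, 98, 353]
rows = prepare_rows(date(2012, 10, 6), exported)
for row in rows:
    print(row)
```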
2. Upload data
Now press Ctrl+A (Cmd+A) and Ctrl+C (Cmd+C) to copy the entire set, and open Google Correlate in a new tab. In Google Correlate, click the “Enter your own data” link on the upper line to upload the time series.
Now you just choose the right time interval, whether Monthly Time Series or Weekly Time Series, and in the Edit area (at bottom) insert the copied data set. Unfortunately the Google Lab has not yet provided a direct way for data upload from the Analytics account. So now your data are in the correlation engine, which is ready to start.
3. Correlation computing
Google Correlate will compute the Pearson Correlation Coefficient between any given time series and the time series for every query in its database. The queries that Correlate engine shows you are the ones with the highest correlation coefficient (i.e. closest to r=1.0). Results with the highest coefficients give useful insight. Thus the search engine associates the data you provided with organic search queries from Google database. The result might not look very encouraging, yet it still gives food for brainstorming on how to relate with some public searches, improving requests for content, developing strategy and so on. In my case I got some information about seasonal traffic changes and new key words for exploring.
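Conceptually, the ranking step looks like the sketch below: score every query series against the target and keep the highest coefficients. The tiny in-memory "database" and the function names are hypothetical, and a brute-force loop like this is only an illustration, not how Google Correlate searches its data at scale.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def top_correlated(target, query_db, limit=10):
    """Return (r, query) pairs sorted from highest to lowest correlation."""
    scored = [(pearson(target, series), query) for query, series in query_db.items()]
    scored.sort(reverse=True)
    return scored[:limit]

# Hypothetical site traffic and query series.
traffic = [1.0, 2.0, 3.0, 4.0, 5.0]
query_db = {
    "rising": [2.0, 4.0, 6.0, 8.0, 11.0],
    "falling": [5.0, 4.0, 3.0, 2.0, 1.0],
    "noisy": [3.0, 1.0, 4.0, 1.0, 5.0],
}

ranked = top_correlated(traffic, query_db)
for r, query in ranked:
    print(f"{r:.4f} {query}")
```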
4. Correlation by State
Google Correlate also provides search term statistics for US States. Here the popularity of the search terms correlates with a data set related to a certain US area. From the Google tutorial page: “Search terms are often popular in some states and less popular in others. To find terms whose pattern of activity across the United States reflects your own US states dataset, enter your data using the link above”. In this case you need to choose ‘US States’ and enter or upload custom data to get the information about which search queries are popular in which states. Please read the tutorial to see how to do it.
In the next post I’ll show how to find, for any given search term, the other search terms in Google’s database whose frequency correlates with it, using the same Google Correlate.
This experimental tool in Google Labs has been a handy means for finding data relations in time series and even in data broken down by area (limited to US states so far). This is by no means the most powerful web traffic information tool, but it might provide some hints for SEO and business bloggers regarding other factors which influence web traffic, in addition to traffic analytics data and search statistics from Google Webmaster Tools.
How To Use It For Marketing, SEO & Content
Google Correlate is a tool that Google rolled out in 2011. It’s a close cousin to Google Trends – it’s actually Google Trends in reverse (and just as powerful).
Since its launch it has received some buzz and become a useful tool among academics, but it’s never become a part of the standard toolset among marketers the way Google Trends has.
There are a lot of reasons for that. Google hasn’t publicized Correlate the way it has the general Trends toolset. Correlate even has features that sporadically break. But I think the slow adoption is because marketers don’t realize the potential of Google Correlate, or even how it works.
According to Google, it is –
a tool on Google Trends which enables you to find queries with a similar pattern to a target data series. The target can either be a real-world trend that you provide (e.g., a data set of event counts over time) or a query that you enter.
In other words, the pattern generates the keywords rather than the keywords generating the pattern. So you do have to think in reverse.
Before we go into specific use cases for Google Correlate – a couple notes on caveats.
First, Google Correlate does not pull absolute search volume. Just like Trends, it is based on share of total volume. All terms are relative to each other. You still need to use Keyword Planner to find search volume.
Second, correlation does not equal causation. Just because terms correlate with each other does not mean they share a causal relationship. There’s a lot of noise in the Correlate data, but plenty of hidden gems too. You’ll have to use best judgement.
Here are five ways to use Google Correlate and integrate it into your marketing research toolset.
Market & Persona Research
In Google Correlate, you can find correlations across time AND space. If you are trying to define your target personas for a new product or service, you can use each correlation to get deep insights about your audience.
First – finding audience correlations across space.
Sounds generic and boring, right?
What if I told you that you could use Google Correlate to find out which US state to roll your product out in first? That’s what the Compare US States feature is for.
Imagine you are trying to roll out a new Pu-erh tea brand. You want to focus your marketing efforts on a geographic area.
Put Pu-erh tea into Google Correlate. Select the term that you think most aligns with it (chinese herbal in this case), and Google Correlate will show what states most closely correlate with those interests.
Today you learned that Washington state has an outsized interest in Pu-erh Tea and chinese herbals – and that New Mexico looks like a very interesting test market as well.
Imagine you have a gardening blog and are writing about tomatoes. You need to know how to best customize growing directions. Head to Google Correlate and search “grow tomatoes,” select a complementary term such as “how to grow tomatoes,” and check the map.
You just learned that the southern Plains and western Southeast correlate most closely for growing tomatoes.
Second, from those same searches, you can use the correlations for a window into your personas.
Look at the tomatoes search again –
Note how highly the terms “Mid size truck,” “online homeschool,” “prophecy,” “lyrics amazing grace” all correlate with “grow tomatoes.”
Those terms alone can define a highly specific reader persona. If you can’t figure out the audience for your gardening website, it’s probably someone who drives a mid size pickup truck, lives in Oklahoma, homeschools her kids, is a Baptist Christian, and reads books like Left Behind. She has plenty of land, sun and water for growing tomatoes, has a DIY streak, and is conservative politically.
Now you know who you are writing for.
For your Pu-erh tea startup, let’s look at it again –
Alright, so your target persona is someone who lives in the Western US. He loves Anthony Bourdain, high end photography, and experimenting with new tea brands. He is into astrology and all aspects of traditional Eastern thought. He’s very left-leaning politically and a bit conspiratorial. He also likes organic certification on all products from soaps to teas.
Don’t guess at personas and market research. Use Google Correlate.
Content Strategy & Buyer Journey Mapping
“Buyer Journey Mapping” is marketing jargon for “people do research before they buy.” In the traditional marketing funnel, customers become aware of your product, they move to interested, then to desire then to action. At each stage they have different questions and concerns.
Marketing campaigns and content strategies are generally built around one stage or a sequence of stages. If you want to target customers in the interested phase, then you’ll create content that focuses on how to use a product. If you are targeting customers in the desire phase, then you’ll create comparison or “get a deal” type content.
Either way, the goal is to always be there as customers are moving down the funnel. If you’re there from the time they become aware through interest and desire, then you’ll be the one they buy from.
Google’s Correlate Time Series shift is a perfect fit for this type of research. It works best for seasonal companies. However, any business can use it provided you have some sort of time cycle or event.
The simplest example is a costume store. People buy costumes a little before Halloween. What type of content could you publish to get in front of people in the weeks leading up to Halloween?
To do that we’ll shift the time series back by 1 week.
And let’s go 1 month.
If I were a Halloween retailer, I would invest in evergreen content around local festivals and helping plan parties & crafts. Both allow you to get in front of customers and place a retargeting pixel on their browser a few weeks before the sudden rush to buy costumes.
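The idea behind shifting a time series can be sketched as a lagged correlation: compare this week's target with what a candidate query looked like some weeks earlier. The code below assumes that "shifting back" means the candidate leads the target, uses invented toy data, and is not a description of how Google Correlate implements the feature.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def lagged_pearson(target, candidate, lag):
    """Correlate target[t] with candidate[t - lag] (candidate leads by `lag` weeks)."""
    if lag == 0:
        return pearson(target, candidate)
    return pearson(target[lag:], candidate[:-lag])

# Hypothetical weekly series: the candidate query peaks two weeks before the target.
costume_sales = [0, 0, 0, 0, 1, 3, 9, 3, 1, 0, 0, 0]
party_ideas = [0, 0, 1, 3, 9, 3, 1, 0, 0, 0, 0, 0]

best_lag = max(range(5), key=lambda lag: lagged_pearson(costume_sales, party_ideas, lag))
print(best_lag)  # prints 2
```

Queries that score highest at a lag of a few weeks are the ones people search for before the main event, which is what makes them candidates for early-funnel content.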
Note that Google Correlate’s data is very sparse. You have to look at it as a whole to create content ideas that make sense.
For industries that aren’t seasonal – you’re not out of luck. You just have to focus on industry events or anything in your industry that might have a time window.
Once you have that “hook,” it’s just a matter of cycling through time series to find ideas.
Google Correlate will not give you the exact keyword to target, but it will give you plenty of general ideas to execute on.
Suppose you are a phone retailer. People may buy phones in December or September more than other months, but the business has demand throughout the year.
To create a content strategy or a buyer journey map, you have to pick a term that would have a definite pattern that is related to your business.
Let’s look at iPhone 6. It launched and had a lot of buzz at a specific point in time.
The terms with the highest correlation in the week of launch offer plenty of “bottom of funnel” ideas. You need to have plenty of content on specific questions.
But let’s shift back 2 weeks.
Now this type of data is very interesting. You could use this for content strategy or for timing a retargeting campaign.
Promote content around simple ways to replace your iPhone 5 battery, then retarget that audience 2 weeks later with offers to buy an iPhone 6.
You can also shift the time series forward to get ideas on upsells or content to retarget to purchasers of your product.
There’s noise, but still plenty of opportunity with the right hooks in Google Correlate.
General Keyword Research
Keyword research is foundational to search engine optimization. But it’s also gotten more difficult in the past few years with (not provided), the shift to Keyword Planner and the death of the Google Autosuggest API.
The best keyword researchers I’ve seen do a couple things differently than others in the industry.
First, they generate a ridiculously huge list of potentially useful keywords to curate.
Second, they research laterally, digging into topics and platforms that are related to but not exactly similar to the set of common sense target keywords.
Both techniques lead to finding high volume, high quality keywords to target while also providing an angle of attack so that you aren’t competing head to head against the Amazons and Wikipedias of the world.
Google Correlate helps on both.
First, Google Correlate generates lists that correlate with your target keyword by default. It won’t generate hundreds and hundreds that you can copy and paste. But, it does offer a CSV export.
What I like to do is to take my initial “seed” keyword list, and run several through Google Correlate to see what it generates. I’ll export all the keywords and combine it with other sources like Reddit, Wikipedia, and others to get a giant list to edit down.
Second, Google Correlate generates lists that, obviously, correlate. By definition, these are terms that have the same search trends as your target keyword, but are different.
It’s very useful in industries that you aren’t very familiar with. Again, enter your known target keyword into Google Correlate and see what shares a pattern.
Instead of exporting them right off though, look through the terms that complement or share some relation to your target keyword.
There will be a lot of noise, but also lots of hidden gems that you would have never found otherwise.
These are the type of suggestions you’re looking for in keyword research. Productivity tip – there’s a link to Google Search by each term.
If you’re the type of SEO to go beyond just Keyword Planner, start using Google Correlate in your research.
Find Trending Topics
If you are trying to develop content around a trending topic, Google Correlate can provide great insight, especially since you can “teach” it to give you the exact type of trending topic.
First, you need to define a trending pattern, that is, a pattern that goes rapidly up and to the right in a short timeframe.
Second, you upload your own data or upload a drawing for a custom made pattern, and let Google find keywords that match that correlation.
This will show terms that had a rapid rise, a plateau, then a surge of new interest. You’ll get some noise, but all the terms that you see will have the same trending growth pattern.
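If you would rather generate a pattern than draw one, a simple curve will do. The sketch below builds a flat-then-rising weekly series from a logistic function. The shape, the parameter values, and the function name are all assumptions for illustration, and this is a simpler shape than the rise, plateau, and surge described above.

```python
import math

def trending_pattern(weeks, midpoint, steepness=0.5):
    """Flat-then-rising weekly series: near 0 early, climbing quickly around `midpoint`."""
    return [1.0 / (1.0 + math.exp(-steepness * (week - midpoint))) for week in range(weeks)]

# Hypothetical target: two years of weekly values with the take-off near week 90.
pattern = trending_pattern(weeks=104, midpoint=90)
print(round(pattern[0], 3), round(pattern[-1], 3))
```

Moving `midpoint` earlier or later changes when the take-off happens, and a larger `steepness` makes the rise more abrupt.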
Other Complementary Uses
Google Correlate also makes a great complement to other tools. You can use it to make informed decisions on everything from PR outreach to cross-sells in your online store.
For example, take the ideas from the marketing funnel earlier. A lot of PR outreach has to do with planning and timing. You’re not going to get a Christmas PR placement if you do outreach in December. You can however, use Google Trends and Google Correlate together to pinpoint how interest moves through the year.
For cross selling products, use Google Correlate to see what products have the same search patterns as your main product. Factor those into your recommendation engine.
As a corollary to persona research, you can use Google Correlate for competitive analysis. Search for big brands or competitors and see what terms align with those branded searches.
Lastly, keep in mind that you can customize and search based on a single time series to get cleaner or different data. For example, search IKEA correlations only in the Fall:
Google Correlate is an incredibly powerful tool. It’s counter-intuitive to use, but can be as useful as more well-known tools like Google Trends or Google Autosuggest.
The best way to learn to use it is to go use it and play around with it.
Have other interesting uses? Let me know via Twitter or email!
Google Correlate: New Google Tool Is Google Trends in Reverse
Google has just released a nifty new addition to Google Labs. Today we welcome Google Correlate.
Google Correlate – The Opposite of Trends
While Google Trends and Google Insights allow you to enter a search term and see the search trends, Google notes that:
Researchers told us they want to enter the trend of some real world activity and see which search terms best match that trend. In other words, they wanted a system that was like Google Trends but in reverse.
Google Correlate is the response, allowing you to upload your own data series and get back a list of search terms that correspond with the real world trend.
Taking Google Correlate for a Test Drive
Upon first learning about Google Correlate, I was more than a little confused; science has never been my strong suit. But the Google Correlate Tutorial is very helpful in explaining the details of how Google Correlate works in (mostly) layman’s terms.
Excited to experiment with Google’s latest plaything, I was at first put off to find that you are prompted to input your own time data series. Being quite allergic to anything more scientific than Bill Nye the Science Guy, naturally I don’t just have time data lying around.
No worries, Google thought of that.
What you can do instead is put in your own search query, and then Google Correlate creates time series data based off of the query. Google has a nifty little comic book form to explain:
Correlating with “Sunflower”
Testing this process out, I used the term “sunflower.” Google Correlate converts my query into a time series and pops back a list of queries that correlate when “sunflower” is searched.
It seems appropriate that terms like “dress for a wedding” and “a bike” correlate with “sunflower,” all reminders of spring time.
When highlighting the term “a bike” Google Correlate shows me a line chart below. Not only do I get to see how the two terms correlate, I also get to see how searching for these terms has generally increased quite a bit over the years as people go to Google more often to answer their questions.
Correlations by US State
Google Correlate also has the option to look at queries whose popularity correlates with a data set across space instead of time, using US states.
Google uses the example of creating a data set which is 1 for the New England states and 0 for all other states.
Using this data set, you are shown queries that correlate with New England. Google had the New England data all ready for me, so I used that and found that, unsurprisingly, “lobster cooking time” was a popular search query.
Selecting that query, Google Correlate gives a map on which you can scroll over and see which states are most connected with this query.
Known for its lobster paraphernalia, Maine was the New England state with the most “lobster cooking time” queries.
In addition, I can view that same data in a scatter plot if I choose.
Google Correlate will be a big help for researchers, and could be a handy tool for some marketing strategies as well.
Are there certain times of the year people are more likely to search for your product?
What queries are being searched alongside yours?