Polish specialist Max Cyrec ran a three-month SEO experiment analyzing how different types of internal links work. We translated his article on the experiment and its conclusions so that you can use these life hacks on your own sites.
3 months of experiments: what games did I play with Googlebot
Discussions on how Googlebot works – what it can and cannot see, which links it follows and how it affects SEO – often flare up on online forums and Facebook thematic groups. In this article I will tell you about the results of a three-month experiment.
Almost every day, Googlebot came to me like a friend for a beer.
Sometimes he was alone:
(02/09/2018 18:29:49): 126.96.36.199 /page1.html Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
(02/09/2018 19:45:23): 188.8.131.52 /page5.html Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
(02/09/2018 21:01:10): 184.108.40.206 /page3.html Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
(02/09/2018 21:01:11): 220.127.116.11 /page2.html Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
(02/09/2018 23:32:45): 18.104.22.168 /page6.html Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Sometimes he brought friends with him:
(09/16/2018 19:16:56): 22.214.171.124 /page1.html Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko; Google Search Console) Chrome/41.0.2272.118 Safari/537.36
(09/16/2018 19:26:08): 126.96.36.199 /image.jpg Googlebot-Image/1.0
(08/27/2018 23:37:54): 188.8.131.52 /page2.html Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
And we had fun playing different games:
Catch Me If You Can: I watched how Googlebot likes to follow 301 redirects, view pictures, and run away from canonical links.
Hide and Seek: Googlebot hid in hidden content, which, according to its parents, it disapproves of and avoids.
Survival Game: I prepared traps and waited for him to fall into them.
Obstacle race: I set up obstacles with different difficulty levels to see how my little friend handles them.
As you can guess, Googlebot did not disappoint me. We had a lot of fun and became good friends.
Let’s get down to business.
I created a site for a travel agency that offers interstellar flights to uncharted planets in our galaxy and beyond.
The content looked presentable, but in fact the text was a bunch of nonsense.
The structure of the experimental site looked like this:
I provided unique content and made sure that every anchor, header, and alt tag, as well as other attributes, was globally unique.
In the descriptions, I used the words anchor1, anchor2, etc. While you are reading the article, keep the image with the site structure open in a separate window – it will be more convenient for you.
Part 1: First Link Rule
I wanted to check the rule that, out of several links to the same page, Google only counts the first. I wanted to know whether this rule could be circumvented, and how it affects optimization.
According to the rule, if you have two links to the same subpage on the same page, the second will be ignored. Googlebot will ignore the anchor of the second and every subsequent link when determining the page's position in the search engine.
This problem is often overlooked. It commonly appears in online stores, where the navigation menu greatly distorts the structure of the site.
Most stores have a static drop-down menu that gives, for example, 4 links to the main categories and 25 hidden links to subcategories. When mapping the structure of the site, Googlebot sees all the links on every page with the menu. This leads to all pages being equally important, with their weight distributed evenly. It looks something like this:
The most common, but in my opinion wrong, structure
The example in the picture cannot be called a correct structure, because all categories link to each other from every page with the menu. Therefore, the home page, all categories and all subcategories have an equal number of inbound links, and the weight of the site is distributed evenly between them. The weight of the homepage is thus divided among the 24 categories and subcategories, so each of them receives only about 4% of the homepage's weight.
What should the structure look like:
If you need to quickly test the structure of a page and view it the way Google does, use Screaming Frog.
In this example, the weight of the homepage is divided by 4, so each category receives 25% of it. The categories then distribute their weight among their subcategories. Internal linking improves as a result.
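The percentages above come from simply dividing a page's weight evenly among its unique outgoing links. A minimal sketch of that arithmetic (a deliberately simplified model: real PageRank iterates and applies a damping factor, and the six subcategories per category is my hypothetical figure, not a number from the experiment):

```python
# Simplified internal-linking weight model: a page's weight is split
# evenly among the unique pages it links to. This only reproduces the
# article's back-of-the-envelope percentages, not real PageRank.
def weight_per_link(page_weight, unique_outlinks):
    return page_weight / unique_outlinks

# Flat menu: the homepage links to all 24 categories and subcategories.
flat = weight_per_link(1.0, 24)      # ~4% each

# Hierarchy: the homepage links to 4 main categories only; each category
# then splits its share among its subcategories (6 is a hypothetical count).
category = weight_per_link(1.0, 4)   # 25% each
subcategory = weight_per_link(category, 6)

print(flat, category, subcategory)
```

With the hierarchy, a subcategory still receives a meaningful share of the category's weight instead of competing with every other page on the site.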
For example, if you write an article on an online store's blog and want to link to it from one of the subcategories, Googlebot will notice the link while crawling the site. In the first example structure, it will not, because of the first-link rule: if a link to the subcategory is already in the site menu, the link in the article will be ignored.
I started this SEO experiment by doing the following:
First, on page1.html, I added a link to the subpage page2.html as a classic dofollow link with the anchor anchor1.
Then, in the text on the same page, I added slightly modified links to the same subpage to check whether Googlebot would follow them.
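To make the setup concrete, here is what the first-link rule means in practice: of several links on one page pointing at the same URL, only the first anchor is kept. This is my illustration in Python, not Google's actual logic, and the HTML snippet is a stand-in for page1.html:

```python
from html.parser import HTMLParser

class FirstLinkParser(HTMLParser):
    """Keeps only the first anchor text seen for each href,
    mimicking the 'first link counts' rule as usually described."""
    def __init__(self):
        super().__init__()
        self.first_anchor = {}   # href -> first anchor text seen
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # setdefault keeps the first anchor; later ones are dropped
            self.first_anchor.setdefault(self._href, "".join(self._text).strip())
            self._href = None

html = '''
<a href="page2.html">anchor1</a>
<p>Some text with a second <a href="page2.html">anchor2</a> link.</p>
'''
p = FirstLinkParser()
p.feed(html)
print(p.first_anchor)  # {'page2.html': 'anchor1'} - anchor2 is ignored
```

The experiment below tests whether modified variants of the second link (hash, parameter, redirect, canonical) can escape this rule.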
I tested the following solutions:
To speed up indexing, I pointed one external dofollow link at the site's home page, with the URL as the anchor.
I waited until page2.html began ranking for the phrase from the first dofollow link (anchor1) coming from page1.html. This made-up phrase, like all the others I tested, did not appear on the landing page itself. I assumed that if the other links worked, page2.html would also rank in the search results for the phrases from those links. It took about 45 days. And then I was able to draw the first important conclusion.
A page where the keyword appears neither in the content nor in the title meta tag, but is used as the anchor of a link pointing to it, can rank higher in the search results than a page that contains the word but is not linked with it.
Moreover, the homepage page1.html, which contained the tested phrase, was the strongest page on the site: 78% of the subpages linked to it. Yet it ranked lower for the phrase than the subpage page2.html, to which the anchor with the search phrase pointed.
Below are the 4 kinds of links I tested. All of them appeared after the first dofollow link leading to page2.html.
Link to a site with an anchor
The first of the additional links was a link with a fragment identifier (hash). I wanted to see whether Googlebot would follow the link and index page2.html for the phrase anchor2, even though the link still leads to page2.html: the URL changes to page2.html#testhash and uses the anchor anchor2.
Unfortunately, Googlebot did not remember this connection and did not pass weight to the subpage page2.html for this phrase. As a result, for the phrase anchor2 the search results contained only the subpage page1.html, where the word appeared in the link anchor.
Link to the site with the parameter
page2.html?parameter=1
At the beginning, Googlebot was interested in the part of the link after the question mark and in the anchor anchor3.
Googlebot was intrigued and tried to figure out what it meant. To avoid indexing duplicate content under different URLs, page2.html had a canonical tag pointing to itself. The logs registered 8 visits to this address, but the conclusions are rather sad: after 2 weeks, Googlebot began to visit it much less often. In the end, it left and never returned.
Neither page2.html nor the URL with the parameter1 parameter was ranked for the phrase anchor3. According to Search Console, this link does not exist – it is not counted among incoming links. Yet anchor3 is still listed as an anchor phrase.
Link to the site from a redirect
I wanted to get Googlebot to take a closer look at my site. Every few days, Googlebot followed the dofollow link with anchor4 on page1.html leading to page3.html, which redirected with a 301 to page2.html. Unfortunately, as with the parameter link, after a month and a half page2.html had still not ranked for the phrase anchor4 from the redirecting link on page1.html.
However, in the Google Search Console, in the Anchor Texts section, anchor4 is visible and indexed. This may mean that after some time the redirect began to work as expected. Therefore, page2.html will be ranked in anchor4 search results, despite the fact that this is the second link to the same landing page on the site.
Link to a page using the canonical tag
On page1.html, I placed a dofollow link to page5.html with the anchor anchor5. Page5.html had unique content, but carried a canonical tag pointing to page2.html.
<link rel="canonical" href="https://example.com/page2.html" />
Page page5.html was indexed despite the canonical tag.
Page5.html did not rank in search results for anchor5.
The page page5.html is ranked by the phrases used in the text of the page. This means that Googlebot completely ignored the canonical tags.
It seems that you cannot use rel=canonical to prevent some content from being indexed.
Part 2. Crawl budget
A crawl budget is the volume of pages that Googlebot can crawl on a site in a given period of time.
When I planned my SEO strategy, I wanted to make Googlebot dance to my tune. I tracked SEO processes at the level of server logs, which helped me a lot. Thanks to this, I knew about every little movement of the bot and how it reacted to changes: restructuring the site, completely reworking the internal linking, changing how information was displayed.
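Watching the bot at the server-log level can start as simply as filtering access-log lines by user agent. A minimal sketch (the log format here is invented for illustration; the user-agent string is trivial to fake, so serious monitoring should also verify the IP with a reverse DNS lookup):

```python
# Minimal sketch: pick Googlebot hits out of raw access-log lines.
# Matching on the user-agent alone is not proof of a real Googlebot
# visit - the string is easy to fake.
def googlebot_hits(log_lines):
    return [line for line in log_lines if "Googlebot" in line]

logs = [
    '1.2.3.4 /page1.html "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '5.6.7.8 /page1.html "Mozilla/5.0 (Windows NT 10.0) Chrome/70.0 Safari/537.36"',
    '9.8.7.6 /image.jpg "Googlebot-Image/1.0"',
]
print(len(googlebot_hits(logs)))  # 2 (desktop Googlebot and Googlebot-Image)
```

From counts like these per URL, you can see which parts of the site eat the crawl budget.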
One of the tasks during the SEO campaign was to rebuild the site so that Googlebot would only visit links that it can index and that we want indexed. That is, the Google index should contain only the pages that matter to us for SEO, and Googlebot should crawl only those pages. This is not obvious to everyone; for example, an online store may introduce filtering by color, size and price by manipulating URL parameters: example.com/women/shoes/?color=red&size=40&price=200-250
It may turn out that allowing Googlebot to crawl such dynamic links forces it to spend its time carefully checking and indexing them instead of crawling the pages that matter.
Such dynamic links are not only useless, but also potentially harmful to SEO. This is because they can be mistakenly perceived as useless content. This can lead to the site losing ground.
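One common way to keep crawlers away from such filter combinations is a robots.txt pattern on the query parameters. A hypothetical fragment (the exact rules depend on which parameters you do want indexed; Google's crawler supports `*` wildcards in these patterns):

```
User-agent: *
Disallow: /*?*color=
Disallow: /*?*size=
Disallow: /*?*price=
```

Whether to block such URLs outright or instead consolidate them with canonical tags depends on the store; blocking saves crawl budget but also hides any links on those pages.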
During the experiment, I also wanted to test ways of structuring content without using rel="nofollow". To do this, I blocked Googlebot in the robots.txt file or placed part of the HTML code in frames invisible to the bot.
Googlebot easily navigated to page4.html and indexed the whole page. The subpage is not ranked in search results for the phrase anchor6 and cannot be found in the Anchor Texts section of the Google Search Console. Conclusion: the link did not transfer weight.
The link does not transfer weight – it is neutral.
I decided to raise the stakes, but to my surprise, Googlebot overcame the obstacle less than 2 hours after the link was posted.
To serve this link, I used an external function that read the target URL from the data and performed the redirect – only for the user, to the landing page page9.html, or so I had hoped. As in the previous cases, page9.html was fully indexed.
Interestingly, despite the lack of inbound links, page9.html was the third most popular page on Googlebot after page1.html and page2.html. I used this method before to structure web services. However, as we can see, it no longer works. In SEO, nothing works forever.
Part 3. Hidden content
In the last test, I decided to check whether the bot indexes hidden content, or whether Google renders such a page without the hidden text, as some experts say.
I wanted to confirm or refute this hypothesis. To do this, I placed more than 2,000 characters of text on page12.html, hid about 20% of it with CSS, and added a Show More button. Inside the hidden text was a link to page13.html with the anchor anchor9.
There is no doubt that the bot can render the page; we can see this in Google Search Console and PageSpeed Insights. However, my experiment showed that the hidden block of text was fully indexed. The phrases hidden in the text ranked in the search results, and Googlebot followed the links hidden in the text. Moreover, link anchors from the hidden block were visible in the Anchor Texts section of Google Search Console, and page13.html began to rank in the search results for the keyword anchor9.
This is very important for online stores, where content is often located in hidden tabs. Now we are sure that Googlebot sees the content in hidden tabs, indexes it and transfers the weight of the links hidden in it.
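The reason is simple: text hidden with CSS or behind a Show More button is still present in the downloaded HTML source, so any parser can read it. A sketch of that (my illustration, assuming the hidden block is marked with an inline display:none style; a real page might hide it via a stylesheet class instead):

```python
from html.parser import HTMLParser

class HiddenTextParser(HTMLParser):
    """Collects text inside elements styled display:none - hidden from
    users by CSS, but still present in the HTML source a crawler fetches."""
    def __init__(self):
        super().__init__()
        self._depth = 0          # nesting depth inside a hidden element
        self.hidden_text = []

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style", "")
        if self._depth or "display:none" in style.replace(" ", ""):
            self._depth += 1

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth and data.strip():
            self.hidden_text.append(data.strip())

html = '''
<p>Visible intro.</p>
<div style="display: none">Hidden tab text with
  <a href="page13.html">anchor9</a></div>
'''
p = HiddenTextParser()
p.feed(html)
print(p.hidden_text)  # ['Hidden tab text with', 'anchor9']
```

The hidden anchor9 link is just as visible to such a parser as any other link on the page.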
The most important conclusion I drew from this experiment: there is no direct way to get around the first-link rule with modified links – parameter links, 301 redirects, canonical tags or anchor (hash) links.
Googlebot can see and index the content hidden in the tabs, and follow the links in it.
Source: Search Engine Land