Google Detecting Hidden Content Resulting in Filter Penalties
A patent was issued in 2013 to Google’s very own Matt Cutts, in regards to methods used to discover hidden content, and hidden links. While many of you may be thinking “This doesn’t affect me, I don’t hide text or links!”, the problem is that many templates and themes for WordPress and other CMS systems have built-in functionality for hiding content.
Of course, Google wants what you show to Google to exactly match what the end user’s experience to be when they visit your page. If you are hiding text and content, or hiding links — Google would consider this as manipulating your rankings, by providing stuffed data that isn’t actually available on your page. If someone finds your page, based on content that is supposedly there, yet that content is hidden when they arrive on the page, that can provide a negative user experience. As such, Google would reduce your rankings based on the perceived manipulation.
I’m listing this one first, because it is something that many SEOs violate unwittingly. An extremely common tactic that many templates use is a “tabbing” structure, where there are several tabs displayed. Each tab has a block of content behind it, that is hidden until the tab is clicked, exposing the content. While this can be a helpful method to paginate content into more easily assimilated smaller chunks of text, it is a method that is used by webspammers to stuff a page full of content in the hopes of ranking higher.
Google cannot tell the difference,
whether your intention of hiding the content is to benefit a user or not. Using this sort of tabbed content sends a negative signal to Google. Certainly, the page can rank if it has enough positive signals — but it is best if you avoid all possible negative signals you can. Following are some wordings found in Matt Cutt’s patent:
Make sure Google has access to your CSS and JS files.
Google examines your .css and .js files to make certain text is not being hidden. If you inadvertently block Google from accessing these resources, by placing them in a directory where Google is prohibited from visiting via the robots.txt file, Google would then not be able to render how your page looks. This would be bad from the perspective of Google not knowing whether you’re trying to hide text, but furthermore — it wouldn’t know if your page is compliant for mobile traffic. Making .js and .css files inaccessible is not recommended, and make sure your robots file isn’t blocking the /wp-content/ directory (where many .css and .js files are located), because that is where your WordPress theme is typically found.
Another part of the patent involves links that are virtually hidden by the fact that the anchor text is so small. For instance, if a link is fashioned as such: <a href=”http://example.com/”>.</a>, Google would determine that link is being hidden. The patent also mentions a dash being used, and one could also assume that a 1 pixel image would also trip their filter.
Another method of link hiding involves using images that are the same color as the background. The patent states:
One technique for hiding links involves the use of a very small image (e.g., a 1.times.1 pixel graphic interchange format (GIF)) that is used as a hyperlink. The image can be made to be so small that the image is not visible to users viewing the document, but may still be considered by search engines when ranking documents. In other situations, large images (e.g., 300 pixels wide and 200 pixels high) that are hyperlinks may be used that are the same color or similar color to the background.
To avoid having a problem with this portion of Google’s filter, make all your links obvious. The text in them should be of sufficient length, and if linking with an image, the image color and background color should not be the same.
Image processing compared to document layout.
I found this next part amazing — that Google would go through such extents to try to find hidden text. This is what Google is capable of:
As an example, assume that a document has a blue background, a white background image placed on the blue background, and white text written in a table having a transparent background that that is placed on the background image.
That blockquoted section continues, explaining how Google will know that if you have a blue background, and a section of your layout has a white image set as the background of a div, and that div has white text, Google can detect it. I find it amazing that they process documents to that high of a degree.
Can Google determine whether you’re cloaking? Some sophisticated cloakers use various methods beyond simply looking for “Googlebot” as a user agent, going so far as to record all the IP addresses Google is known to visit through. Following is a portion of the patent pertaining to this.
FIG. 1 is an exemplary diagram of a system 100 in which systems and methods consistent with the principles of the invention may be implemented. System 100 may include multiple clients 110 connected to servers 120 and 130 via a network 140. Network 140 may include a local area network (LAN), a wide area network (WAN), a telephone network, such as the Public Switched Telephone Network (PSTN), an intranet, the Internet, a similar or dissimilar network, or a combination of networks. Two clients 110 and three servers 120/130 have been illustrated as connected to network 140 in FIG. 1 for simplicity. In practice, there may be more or fewer clients 110 and/or servers 120/130. Also, in some instances, a client 110 may perform the functions of a server 120/130 and a server 120/130 may perform the functions of a client 110.
Clients 110 may include devices, such as wireless telephones, personal computers, personal digital assistants (PDAs), lap tops, etc., threads or processes running on these devices, and/or objects executable by these devices. Servers 120/130 may include server devices, threads, and/or objects that operate upon, search, or maintain documents in a manner consistent with the principles of the invention. Clients 110 and servers 120/130 may connect to network 140 via wired, wireless, or optical connections.
Google’s hidden text and cloaking patent involves reading of web pages through a variety of networks, and through a variety of devices, in order to test whether you’re serving the same types of content to all of them.
In summary, everyone should verify that their templates do not hide content via menu or tabs. This is the one that potentially affects everyone. If you are someone who considers various blackhat methods to stuff text into a document, via content hidden through .js/.css or through cloaking, Google has infrastructure in place to detect it. These are negative signals. They may, or may not affect your rankings depending on whether your site has other positive or negative signals going on, which would either protect you from, or exacerbate penalties for various forms of hidden text on your pages.