Tag Archive | "Javascript"

SearchCap: Delisted from Google, Cyber Monday paid search & JavaScript SEO

Below is what happened in search today, as reported on Search Engine Land and from other places across the web.



Please visit Search Engine Land for the full article.


Search Engine Land: News & Info About SEO, PPC, SEM, Search Engines & Search Marketing

Posted in Latest NewsComments Off

The Minimum Viable Knowledge You Need to Work with JavaScript & SEO Today

Posted by sergeystefoglo

If your work involves SEO at some level, you’ve most likely been hearing more and more about JavaScript and the implications it has on crawling and indexing. Frankly, Googlebot struggles with it, and many websites utilize modern-day JavaScript to load in crucial content today. Because of this, we need to be equipped to discuss this topic when it comes up in order to be effective.

The goal of this post is to equip you with the minimum viable knowledge required to do so. This post won’t go into the nitty gritty details, describe the history, or give you extreme detail on specifics. There are a lot of incredible write-ups that already do this — I suggest giving them a read if you are interested in diving deeper (I’ll link out to my favorites at the bottom).

In order to be effective consultants when it comes to the topic of JavaScript and SEO, we need to be able to answer three questions:

  1. Does the domain/page in question rely on client-side JavaScript to load/change on-page content or links?
  2. If yes, is Googlebot seeing the content that’s loaded in via JavaScript properly?
  3. If not, what is the ideal solution?

With some quick searching, I was able to find three examples of landing pages that utilize JavaScript to load in crucial content.

I’m going to be using Sitecore’s Symposium landing page through each of these talking points to illustrate how to answer the questions above.

We’ll cover the “how do I do this” aspect first, and at the end I’ll expand on a few core concepts and link to further resources.

Question 1: Does the domain in question rely on client-side JavaScript to load/change on-page content or links?

The first step to diagnosing any issues involving JavaScript is to check if the domain uses it to load in crucial content that could impact SEO (on-page content or links). Ideally this will happen anytime you get a new client (during the initial technical audit), or whenever your client redesigns/launches new features of the site.

How do we go about doing this?

Ask the client

Ask, and you shall receive! Seriously though, one of the quickest/easiest things you can do as a consultant is contact your POC (or developers on the account) and ask them. After all, these are the people who work on the website day-in and day-out!

“Hi [client], we’re currently doing a technical sweep on the site. One thing we check is if any crucial content (links, on-page content) gets loaded in via JavaScript. We will do some manual testing, but an easy way to confirm this is to ask! Could you (or the team) answer the following, please?

1. Are we using client-side JavaScript to load in important content?

2. If yes, can we get a bulleted list of where/what content is loaded in via JavaScript?”

Check manually

Even on a large e-commerce website with millions of pages, there are usually only a handful of important page templates. In my experience, it should only take an hour max to check manually. I use the Chrome Web Developers plugin, disable JavaScript from there, and manually check the important templates of the site (homepage, category page, product page, blog post, etc.)

In the example above, once we turn off JavaScript and reload the page, we can see that we are looking at a blank page.

As you make progress, jot down notes about content that isn’t being loaded in, is being loaded in wrong, or any internal linking that isn’t working properly.

At the end of this step we should know if the domain in question relies on JavaScript to load/change on-page content or links. If the answer is yes, we should also know where this happens (homepage, category pages, specific modules, etc.)

Crawl

You could also crawl the site (with a tool like Screaming Frog or Sitebulb) with JavaScript rendering turned off, and then run the same crawl with JavaScript turned on, and compare the differences with internal links and on-page elements.

For example, it could be that when you crawl the site with JavaScript rendering turned off, the title tags don’t appear. In my mind this would trigger an action to crawl the site with JavaScript rendering turned on to see if the title tags do appear (as well as checking manually).

Example

For our example, I went ahead and did a manual check. As we can see from the screenshot below, when we disable JavaScript, the content does not load.

In other words, the answer to our first question for this pages is “yes, JavaScript is being used to load in crucial parts of the site.”

Question 2: If yes, is Googlebot seeing the content that’s loaded in via JavaScript properly?

If your client is relying on JavaScript on certain parts of their website (in our example they are), it is our job to try and replicate how Google is actually seeing the page(s). We want to answer the question, “Is Google seeing the page/site the way we want it to?”

In order to get a more accurate depiction of what Googlebot is seeing, we need to attempt to mimic how it crawls the page.

How do we do that?

Use Google’s new mobile-friendly testing tool

At the moment, the quickest and most accurate way to try and replicate what Googlebot is seeing on a site is by using Google’s new mobile friendliness tool. My colleague Dom recently wrote an in-depth post comparing Search Console Fetch and Render, Googlebot, and the mobile friendliness tool. His findings were that most of the time, Googlebot and the mobile friendliness tool resulted in the same output.

In Google’s mobile friendliness tool, simply input your URL, hit “run test,” and then once the test is complete, click on “source code” on the right side of the window. You can take that code and search for any on-page content (title tags, canonicals, etc.) or links. If they appear here, Google is most likely seeing the content.

Search for visible content in Google

It’s always good to sense-check. Another quick way to check if GoogleBot has indexed content on your page is by simply selecting visible text on your page, and doing a site:search for it in Google with quotations around said text.

In our example there is visible text on the page that reads…

“Whether you are in marketing, business development, or IT, you feel a sense of urgency. Or maybe opportunity?”

When we do a site:search for this exact phrase, for this exact page, we get nothing. This means Google hasn’t indexed the content.

Crawling with a tool

Most crawling tools have the functionality to crawl JavaScript now. For example, in Screaming Frog you can head to configuration > spider > rendering > then select “JavaScript” from the dropdown and hit save. DeepCrawl and SiteBulb both have this feature as well.

From here you can input your domain/URL and see the rendered page/code once your tool of choice has completed the crawl.

Example:

When attempting to answer this question, my preference is to start by inputting the domain into Google’s mobile friendliness tool, copy the source code, and searching for important on-page elements (think title tag, <h1>, body copy, etc.) It’s also helpful to use a tool like diff checker to compare the rendered HTML with the original HTML (Screaming Frog also has a function where you can do this side by side).

For our example, here is what the output of the mobile friendliness tool shows us.

After a few searches, it becomes clear that important on-page elements are missing here.

We also did the second test and confirmed that Google hasn’t indexed the body content found on this page.

The implication at this point is that Googlebot is not seeing our content the way we want it to, which is a problem.

Let’s jump ahead and see what we can recommend the client.

Question 3: If we’re confident Googlebot isn’t seeing our content properly, what should we recommend?

Now we know that the domain is using JavaScript to load in crucial content and we know that Googlebot is most likely not seeing that content, the final step is to recommend an ideal solution to the client. Key word: recommend, not implement. It’s 100% our job to flag the issue to our client, explain why it’s important (as well as the possible implications), and highlight an ideal solution. It is 100% not our job to try to do the developer’s job of figuring out an ideal solution with their unique stack/resources/etc.

How do we do that?

You want server-side rendering

The main reason why Google is having trouble seeing Sitecore’s landing page right now, is because Sitecore’s landing page is asking the user (us, Googlebot) to do the heavy work of loading the JavaScript on their page. In other words, they’re using client-side JavaScript.

Googlebot is literally landing on the page, trying to execute JavaScript as best as possible, and then needing to leave before it has a chance to see any content.

The fix here is to instead have Sitecore’s landing page load on their server. In other words, we want to take the heavy lifting off of Googlebot, and put it on Sitecore’s servers. This will ensure that when Googlebot comes to the page, it doesn’t have to do any heavy lifting and instead can crawl the rendered HTML.

In this scenario, Googlebot lands on the page and already sees the HTML (and all the content).

There are more specific options (like isomorphic setups)

This is where it gets to be a bit in the weeds, but there are hybrid solutions. The best one at the moment is called isomorphic.

In this model, we’re asking the client to load the first request on their server, and then any future requests are made client-side.

So Googlebot comes to the page, the client’s server has already executed the initial JavaScript needed for the page, sends the rendered HTML down to the browser, and anything after that is done on the client-side.

If you’re looking to recommend this as a solution, please read this post from the AirBNB team which covers isomorphic setups in detail.

AJAX crawling = no go

I won’t go into details on this, but just know that Google’s previous AJAX crawling solution for JavaScript has since been discontinued and will eventually not work. We shouldn’t be recommending this method.

(However, I am interested to hear any case studies from anyone who has implemented this solution recently. How has Google responded? Also, here’s a great write-up on this from my colleague Rob.)

Summary

At the risk of severely oversimplifying, here’s what you need to do in order to start working with JavaScript and SEO in 2018:

  1. Know when/where your client’s domain uses client-side JavaScript to load in on-page content or links.
    1. Ask the developers.
    2. Turn off JavaScript and do some manual testing by page template.
    3. Crawl using a JavaScript crawler.
  2. Check to see if GoogleBot is seeing content the way we intend it to.
    1. Google’s mobile friendliness checker.
    2. Doing a site:search for visible content on the page.
    3. Crawl using a JavaScript crawler.
  3. Give an ideal recommendation to client.
    1. Server-side rendering.
    2. Hybrid solutions (isomorphic).
    3. Not AJAX crawling.

Further resources

I’m really interested to hear about any of your experiences with JavaScript and SEO. What are some examples of things that have worked well for you? What about things that haven’t worked so well? If you’ve implemented an isomorphic setup, I’m curious to hear how that’s impacted how Googlebot sees your site.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!


Moz Blog

Posted in Latest NewsComments Off

JavaScript & SEO: Making Your Bot Experience As Good As Your User Experience

Posted by alexis-sanders

Understanding JavaScript and its potential impact on search performance is a core skillset of the modern SEO professional. If search engines can’t crawl a site or can’t parse and understand the content, nothing is going to get indexed and the site is not going to rank.

The most important questions for an SEO relating to JavaScript: Can search engines see the content and grasp the website experience? If not, what solutions can be leveraged to fix this?


Fundamentals

What is JavaScript?

When creating a modern web page, there are three major components:

  1. HTML – Hypertext Markup Language serves as the backbone, or organizer of content, on a site. It is the structure of the website (e.g. headings, paragraphs, list elements, etc.) and defining static content.
  2. CSS – Cascading Style Sheets are the design, glitz, glam, and style added to a website. It makes up the presentation layer of the page.
  3. JavaScript – JavaScript is the interactivity and a core component of the dynamic web.

Learn more about webpage development and how to code basic JavaScript.

javacssseo.gif

Image sources: 1, 2, 3

JavaScript is either placed in the HTML document within <script> tags (i.e., it is embedded in the HTML) or linked/referenced. There are currently a plethora of JavaScript libraries and frameworks, including jQuery, AngularJS, ReactJS, EmberJS, etc.

JavaScript libraries and frameworks:

What is AJAX?

AJAX, or Asynchronous JavaScript and XML, is a set of web development techniques combining JavaScript and XML that allows web applications to communicate with a server in the background without interfering with the current page. Asynchronous means that other functions or lines of code can run while the async script is running. XML used to be the primary language to pass data; however, the term AJAX is used for all types of data transfers (including JSON; I guess “AJAJ” doesn’t sound as clean as “AJAX” [pun intended]).

A common use of AJAX is to update the content or layout of a webpage without initiating a full page refresh. Normally, when a page loads, all the assets on the page must be requested and fetched from the server and then rendered on the page. However, with AJAX, only the assets that differ between pages need to be loaded, which improves the user experience as they do not have to refresh the entire page.

One can think of AJAX as mini server calls. A good example of AJAX in action is Google Maps. The page updates without a full page reload (i.e., mini server calls are being used to load content as the user navigates).

Related image

Image source

What is the Document Object Model (DOM)?

As an SEO professional, you need to understand what the DOM is, because it’s what Google is using to analyze and understand webpages.

The DOM is what you see when you “Inspect Element” in a browser. Simply put, you can think of the DOM as the steps the browser takes after receiving the HTML document to render the page.

The first thing the browser receives is the HTML document. After that, it will start parsing the content within this document and fetch additional resources, such as images, CSS, and JavaScript files.

The DOM is what forms from this parsing of information and resources. One can think of it as a structured, organized version of the webpage’s code.

Nowadays the DOM is often very different from the initial HTML document, due to what’s collectively called dynamic HTML. Dynamic HTML is the ability for a page to change its content depending on user input, environmental conditions (e.g. time of day), and other variables, leveraging HTML, CSS, and JavaScript.

Simple example with a <title> tag that is populated through JavaScript:

HTML source

DOM

What is headless browsing?

Headless browsing is simply the action of fetching webpages without the user interface. It is important to understand because Google, and now Baidu, leverage headless browsing to gain a better understanding of the user’s experience and the content of webpages.

PhantomJS and Zombie.js are scripted headless browsers, typically used for automating web interaction for testing purposes, and rendering static HTML snapshots for initial requests (pre-rendering).


Why can JavaScript be challenging for SEO? (and how to fix issues)

There are three (3) primary reasons to be concerned about JavaScript on your site:

  1. Crawlability: Bots’ ability to crawl your site.
  2. Obtainability: Bots’ ability to access information and parse your content.
  3. Perceived site latency: AKA the Critical Rendering Path.

Crawlability

Are bots able to find URLs and understand your site’s architecture? There are two important elements here:

  1. Blocking search engines from your JavaScript (even accidentally).
  2. Proper internal linking, not leveraging JavaScript events as a replacement for HTML tags.

Why is blocking JavaScript such a big deal?

If search engines are blocked from crawling JavaScript, they will not be receiving your site’s full experience. This means search engines are not seeing what the end user is seeing. This can reduce your site’s appeal to search engines and could eventually be considered cloaking (if the intent is indeed malicious).

Fetch as Google and TechnicalSEO.com’s robots.txt and Fetch and Render testing tools can help to identify resources that Googlebot is blocked from.

The easiest way to solve this problem is through providing search engines access to the resources they need to understand your user experience.

!!! Important note: Work with your development team to determine which files should and should not be accessible to search engines.

Internal linking

Internal linking should be implemented with regular anchor tags within the HTML or the DOM (using an HTML tag) versus leveraging JavaScript functions to allow the user to traverse the site.

Essentially: Don’t use JavaScript’s onclick events as a replacement for internal linking. While end URLs might be found and crawled (through strings in JavaScript code or XML sitemaps), they won’t be associated with the global navigation of the site.

Internal linking is a strong signal to search engines regarding the site’s architecture and importance of pages. In fact, internal links are so strong that they can (in certain situations) override “SEO hints” such as canonical tags.

URL structure

Historically, JavaScript-based websites (aka “AJAX sites”) were using fragment identifiers (#) within URLs.

  • Not recommended:
    • The Lone Hash (#) – The lone pound symbol is not crawlable. It is used to identify anchor link (aka jump links). These are the links that allow one to jump to a piece of content on a page. Anything after the lone hash portion of the URL is never sent to the server and will cause the page to automatically scroll to the first element with a matching ID (or the first <a> element with a name of the following information). Google recommends avoiding the use of “#” in URLs.
    • Hashbang (#!) (and escaped_fragments URLs) – Hashbang URLs were a hack to support crawlers (Google wants to avoid now and only Bing supports). Many a moon ago, Google and Bing developed a complicated AJAX solution, whereby a pretty (#!) URL with the UX co-existed with an equivalent escaped_fragment HTML-based experience for bots. Google has since backtracked on this recommendation, preferring to receive the exact user experience. In escaped fragments, there are two experiences here:
      • Original Experience (aka Pretty URL): This URL must either have a #! (hashbang) within the URL to indicate that there is an escaped fragment or a meta element indicating that an escaped fragment exists (<meta name=”fragment” content=”!”>).
      • Escaped Fragment (aka Ugly URL, HTML snapshot): This URL replace the hashbang (#!) with “_escaped_fragment_” and serves the HTML snapshot. It is called the ugly URL because it’s long and looks like (and for all intents and purposes is) a hack.

Image result

Image source

  • Recommended:
    • pushState History API – PushState is navigation-based and part of the History API (think: your web browsing history). Essentially, pushState updates the URL in the address bar and only what needs to change on the page is updated. It allows JS sites to leverage “clean” URLs. PushState is currently supported by Google, when supporting browser navigation for client-side or hybrid rendering.
      • A good use of pushState is for infinite scroll (i.e., as the user hits new parts of the page the URL will update). Ideally, if the user refreshes the page, the experience will land them in the exact same spot. However, they do not need to refresh the page, as the content updates as they scroll down, while the URL is updated in the address bar.
      • Example: A good example of a search engine-friendly infinite scroll implementation, created by Google’s John Mueller (go figure), can be found here. He technically leverages the replaceState(), which doesn’t include the same back button functionality as pushState.
      • Read more: Mozilla PushState History API Documents

Obtainability

Search engines have been shown to employ headless browsing to render the DOM to gain a better understanding of the user’s experience and the content on page. That is to say, Google can process some JavaScript and uses the DOM (instead of the HTML document).

At the same time, there are situations where search engines struggle to comprehend JavaScript. Nobody wants a Hulu situation to happen to their site or a client’s site. It is crucial to understand how bots are interacting with your onsite content. When you aren’t sure, test.

Assuming we’re talking about a search engine bot that executes JavaScript, there are a few important elements for search engines to be able to obtain content:

  • If the user must interact for something to fire, search engines probably aren’t seeing it.
    • Google is a lazy user. It doesn’t click, it doesn’t scroll, and it doesn’t log in. If the full UX demands action from the user, special precautions should be taken to ensure that bots are receiving an equivalent experience.
  • If the JavaScript occurs after the JavaScript load event fires plus ~5-seconds*, search engines may not be seeing it.
    • *John Mueller mentioned that there is no specific timeout value; however, sites should aim to load within five seconds.
    • *Screaming Frog tests show a correlation to five seconds to render content.
    • *The load event plus five seconds is what Google’s PageSpeed Insights, Mobile Friendliness Tool, and Fetch as Google use; check out Max Prin’s test timer.
  • If there are errors within the JavaScript, both browsers and search engines won’t be able to go through and potentially miss sections of pages if the entire code is not executed.

How to make sure Google and other search engines can get your content

1. TEST

The most popular solution to resolving JavaScript is probably not resolving anything (grab a coffee and let Google work its algorithmic brilliance). Providing Google with the same experience as searchers is Google’s preferred scenario.

Google first announced being able to “better understand the web (i.e., JavaScript)” in May 2014. Industry experts suggested that Google could crawl JavaScript way before this announcement. The iPullRank team offered two great pieces on this in 2011: Googlebot is Chrome and How smart are Googlebots? (thank you, Josh and Mike). Adam Audette’s Google can crawl JavaScript and leverages the DOM in 2015 confirmed. Therefore, if you can see your content in the DOM, chances are your content is being parsed by Google.

adamaudette - I don't always JavaScript, but when I do, I know google can crawl the dom and dynamically generated HTML

Recently, Barry Goralewicz performed a cool experiment testing a combination of various JavaScript libraries and frameworks to determine how Google interacts with the pages (e.g., are they indexing URL/content? How does GSC interact? Etc.). It ultimately showed that Google is able to interact with many forms of JavaScript and highlighted certain frameworks as perhaps more challenging. John Mueller even started a JavaScript search group (from what I’ve read, it’s fairly therapeutic).

All of these studies are amazing and help SEOs understand when to be concerned and take a proactive role. However, before you determine that sitting back is the right solution for your site, I recommend being actively cautious by experimenting with small section Think: Jim Collin’s “bullets, then cannonballs” philosophy from his book Great by Choice:

“A bullet is an empirical test aimed at learning what works and meets three criteria: a bullet must be low-cost, low-risk, and low-distraction… 10Xers use bullets to empirically validate what will actually work. Based on that empirical validation, they then concentrate their resources to fire a cannonball, enabling large returns from concentrated bets.”

Consider testing and reviewing through the following:

  1. Confirm that your content is appearing within the DOM.
  2. Test a subset of pages to see if Google can index content.
  • Manually check quotes from your content.
  • Fetch with Google and see if content appears.
  • Fetch with Google supposedly occurs around the load event or before timeout. It’s a great test to check to see if Google will be able to see your content and whether or not you’re blocking JavaScript in your robots.txt. Although Fetch with Google is not foolproof, it’s a good starting point.
  • Note: If you aren’t verified in GSC, try Technicalseo.com’s Fetch and Render As Any Bot Tool.

After you’ve tested all this, what if something’s not working and search engines and bots are struggling to index and obtain your content? Perhaps you’re concerned about alternative search engines (DuckDuckGo, Facebook, LinkedIn, etc.), or maybe you’re leveraging meta information that needs to be parsed by other bots, such as Twitter summary cards or Facebook Open Graph tags. If any of this is identified in testing or presents itself as a concern, an HTML snapshot may be the only decision.

2. HTML SNAPSHOTS
What are HTmL snapshots?

HTML snapshots are a fully rendered page (as one might see in the DOM) that can be returned to search engine bots (think: a static HTML version of the DOM).

Google introduced HTML snapshots 2009, deprecated (but still supported) them in 2015, and awkwardly mentioned them as an element to “avoid” in late 2016. HTML snapshots are a contentious topic with Google. However, they’re important to understand, because in certain situations they’re necessary.

If search engines (or sites like Facebook) cannot grasp your JavaScript, it’s better to return an HTML snapshot than not to have your content indexed and understood at all. Ideally, your site would leverage some form of user-agent detection on the server side and return the HTML snapshot to the bot.

At the same time, one must recognize that Google wants the same experience as the user (i.e., only provide Google with an HTML snapshot if the tests are dire and the JavaScript search group cannot provide support for your situation).

Considerations

When considering HTML snapshots, you must consider that Google has deprecated this AJAX recommendation. Although Google technically still supports it, Google recommends avoiding it. Yes, Google changed its mind and now want to receive the same experience as the user. This direction makes sense, as it allows the bot to receive an experience more true to the user experience.

A second consideration factor relates to the risk of cloaking. If the HTML snapshots are found to not represent the experience on the page, it’s considered a cloaking risk. Straight from the source:

“The HTML snapshot must contain the same content as the end user would see in a browser. If this is not the case, it may be considered cloaking.”
Google Developer AJAX Crawling FAQs

Benefits

Despite the considerations, HTML snapshots have powerful advantages:

  1. Knowledge that search engines and crawlers will be able to understand the experience.
    • Certain types of JavaScript may be harder for Google to grasp (cough… Angular (also colloquially referred to as AngularJS 2) …cough).
  2. Other search engines and crawlers (think: Bing, Facebook) will be able to understand the experience.
    • Bing, among other search engines, has not stated that it can crawl and index JavaScript. HTML snapshots may be the only solution for a JavaScript-heavy site. As always, test to make sure that this is the case before diving in.

"It's not just Google understanding your JavaScript. It's also about the speed." -DOM - "It's not just about Google understanding your Javascript. it's also about your perceived latency." -DOM

Site latency

When browsers receive an HTML document and create the DOM (although there is some level of pre-scanning), most resources are loaded as they appear within the HTML document. This means that if you have a huge file toward the top of your HTML document, a browser will load that immense file first.

The concept of Google’s critical rendering path is to load what the user needs as soon as possible, which can be translated to → “get everything above-the-fold in front of the user, ASAP.”

Critical Rendering Path – Optimized Rendering Loads Progressively ASAP:

progressive page rendering

Image source

However, if you have unnecessary resources or JavaScript files clogging up the page’s ability to load, you get “render-blocking JavaScript.” Meaning: your JavaScript is blocking the page’s potential to appear as if it’s loading faster (also called: perceived latency).

Render-blocking JavaScript – Solutions

If you analyze your page speed results (through tools like Page Speed Insights Tool, WebPageTest.org, CatchPoint, etc.) and determine that there is a render-blocking JavaScript issue, here are three potential solutions:

  1. Inline: Add the JavaScript in the HTML document.
  2. Async: Make JavaScript asynchronous (i.e., add “async” attribute to HTML tag).
  3. Defer: By placing JavaScript lower within the HTML.

!!! Important note: It’s important to understand that scripts must be arranged in order of precedence. Scripts that are used to load the above-the-fold content must be prioritized and should not be deferred. Also, any script that references another file can only be used after the referenced file has loaded. Make sure to work closely with your development team to confirm that there are no interruptions to the user’s experience.

Read more: Google Developer’s Speed Documentation


TL;DR – Moral of the story

Crawlers and search engines will do their best to crawl, execute, and interpret your JavaScript, but it is not guaranteed. Make sure your content is crawlable, obtainable, and isn’t developing site latency obstructions. The key = every situation demands testing. Based on the results, evaluate potential solutions.

Thanks: Thank you Max Prin (@maxxeight) for reviewing this content piece and sharing your knowledge, insight, and wisdom. It wouldn’t be the same without you.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!


Moz Blog

Posted in Latest NewsComments Off

How Does Google Handle CSS + Javascript "Hidden" Text? – Whiteboard Friday

Posted by randfish

Does Google treat text kept behind “read more” links with the same importance as non-hidden text? The short answer is “no,” but there’s more nuance to it than that. In today’s Whiteboard Friday, Rand explains just how the search engine giant weighs text hidden from view using CSS and JavaScript.

How Google handles CSS and Javascript

Click on the whiteboard image above to open a high-resolution version in a new tab!

Video Transcription

Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week we’re going to chat a little bit about hidden text, hidden text of several kinds. I really don’t mean the spammy, black on a black background, white on a white background-like, hidden text type of keyword stuffing from the ’90s and early 2000s. I’m talking about what we do with CSS and JavaScript with overlays and with folders inside a page, that kind of hidden text.

It’s become very popular in modern web design to basically use CSS or to use JavaScript to load text after a user has taken some action on a page. So perhaps they’ve clicked on a separate section of your e-commerce page about your product to see other information, or maybe they’ve clicked a “read more” link in an article to read the rest of the article. This actually creates problems with Google and with SEO, and they’re not obvious problems, because when you use something like Google’s fetch and render tool or when you look at Google’s cache, Google appears to be able to crawl and parse all of that text. But they’re not treating all of it equally.

So here’s an example. I’ve got this text about coconut marble furnishings, which is just a ridiculous test phrase that I’m going to use for this purpose. But let’s say I’ve got page A, which essentially shows the first paragraph of this text, and then I have page B, which only shows part of the first sentence and then a “read more” link, which is very common in lots of articles.

Many folks do this, by the way, because they want to get engagement data about how many people actually read the rest of the piece. Others are using it for serving advertising, or they’re using it to track something, and some people are using it just because of the user experience it provides. Maybe the page is crowded with other types of content. They want to make sure that if someone wants to display this particular piece or that particular piece, that it’s available to them in as convenient a format as possible for design purposes or what have you.

What’s true in these instances is that Google is not going to treat what happens after this “read more” link is clicked, which is that the rest of this text would become visible here, they’re not going to treat that with the same weight that they otherwise would.

All other things being equal

So they’re on similar domains, they have similar link profiles, all that other kind of stuff.

  • A is going to outrank B for “coconut marble furnishings” even though this is in the title here. Because this text is relevant to that keyword and is serving to create greater relevance, Google is going to weight this one higher.
  • It’s also true that the content that’s hidden behind this “read more” here, it doesn’t matter. If it’s CSS-based, JavaScript-based, post load or loaded when the HTML is, it doesn’t matter, it’s going to be weighted less by Google. It will be treated as though that text were not as important.
  • Interestingly, fascinatingly, perhaps, Bing and Yahoo do not appear to discern between these. So they’ll treat these more equally. Google is the only one who seems to be, at least right now, from some test data — I’ll talk about that in a sec — who is treating these differently, who is basically weighting this hidden content less.

Best practices for SEO and “hidden” text

So what can we discern from this? What should SEOs do when we’re working with our web design teams and with our content teams around these types of issues?

I. We have to expect that any time we hide text with CSS, with JavaScript, what have you, that it will have less ranking influence. It’s not that it won’t be counted at all. If I were to search for “hardwood-like material creates beautiful shine,” like that exact phrase in Google with quotes, both of these pages would come up, this one almost certainly first, but both of these pages would come up.

So Google knows the text is there. It just isn’t counting it as highly. It’s like content that isn’t carrying the same weight as it would if it were visible by default. So, given that we know that, we have to decide in the tradeoff situation whether it’s worth it to lose the ranking value and the potential visitors in exchange for whatever we’re gaining by having this element.

II. We’ve got to consider some creative alternatives. It is possible to make text visible by default and to instead have something like an overlay element. We could have a brief overlay here that’s easily close-able with a message. Maybe that could give us the same types of engagement statistics, because 95% of people are going to close that before they scroll down, or they’re going to receive a popover message or those kinds of things. Granted, as we’ve discussed previously on Whiteboard Friday, overlays have their own issues that we need to be aware of, but these are possible. We can also measure scroll depth by doing some JavaScript tracking. There’s lots of software that does that by default and plenty of GitHub repositories, that are open source, that we could use to track that. So there might be other ways to get the same goals.

III. If it is the case that you have to use the “read more” or any other text hiding elements, I would urge you to go ahead and place the crucial information, including the keyword phrases and the most related terms and phrases that you know are going to be very important to rankings, up in that most visible top portion of the page so that you maximize the ranking weight of the most important pieces rather than losing those below or behind whatever sorts of post-loading situation you’ve got. Make those the default visible portions of text.

I do want to give special thanks. One of the reasons that we know this, certainly Google has mentioned it on occasion, but over the course of the last few years there’s been a lot of skepticism, especially from folks in the web design community who have sort of said, “Look, it seems like Google can see this. It doesn’t seem to be a problem. When I search in quotes for this text, Google is bringing it back.” That has been correct.

But thanks to Shai — I’m sorry if I mispronounce your name — Aharony from Reboot Online, RebootOnline.com, and I’ll link to the specific test that they performed, but they performed some wonderful, large-scale, long-term tests of CSS, of text area, of visible text, and of JavaScript hiding across many domains over a long period and basically proved to us that what Google says is in fact true, that they are treating this text behind here with less weight. So we really appreciate the efforts of folks like that, who go through intense effort to give us the truth about how Google works.

That said, we will hopefully see you again next week for another edition of Whiteboard Friday. Take care.

Video transcription by Speechpad.com

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!


Moz Blog

Posted in Latest NewsComments Off


Advert