Tag Archive | "Technical"

Rewriting the Beginner’s Guide to SEO, Chapter 5: Technical Optimization

Posted by BritneyMuller

After a short break, we’re back to share our working draft of Chapter 5 of the Beginner’s Guide to SEO with you! This one was a whopper, and we’re really looking forward to your input. Giving beginner SEOs a solid grasp of just what technical optimization for SEO is and why it matters — without overwhelming them or scaring them off the subject — is a tall order indeed. We’d love to hear what you think: did we miss anything you think is important for beginners to know? Leave us your feedback in the comments!

And in case you’re curious, check back on our outline, Chapter One, Chapter Two, Chapter Three, and Chapter Four to see what we’ve covered so far.


Chapter 5: Technical Optimization

Basic technical knowledge will help you optimize your site for search engines and establish credibility with developers.

Now that you’ve crafted valuable content on the foundation of solid keyword research, it’s important to make sure it’s not only readable by humans, but by search engines too!

You don’t need to have a deep technical understanding of these concepts, but it is important to grasp what these technical assets do so that you can speak intelligently about them with developers. Speaking your developers’ language is important because you will likely need them to carry out some of your optimizations. They’re unlikely to prioritize your asks if they can’t understand your request or see its importance. When you establish credibility and trust with your devs, you can begin to tear away the red tape that often blocks crucial work from getting done.

Pro tip: SEOs need cross-team support to be effective

It’s vital to have a healthy relationship with your developers so that you can successfully tackle SEO challenges from both sides. Don’t wait until a technical issue causes negative SEO ramifications to involve a developer. Instead, join forces for the planning stage with the goal of avoiding the issues altogether. If you don’t, it can cost you in time and money later.

Beyond cross-team support, understanding technical optimization for SEO is essential if you want to ensure that your web pages are structured for both humans and crawlers. To that end, we’ve divided this chapter into three sections:

  1. How websites work
  2. How search engines understand websites
  3. How users interact with websites

Since the technical structure of a site can have a massive impact on its performance, it’s crucial for everyone to understand these principles. It might also be a good idea to share this part of the guide with your programmers, content writers, and designers so that all parties involved in a site’s construction are on the same page.

1. How websites work

If search engine optimization is the process of optimizing a website for search, SEOs need at least a basic understanding of the thing they’re optimizing!

Below, we outline the website’s journey from domain name purchase all the way to its fully rendered state in a browser. An important component of the website’s journey is the critical rendering path, which is the process of a browser turning a website’s code into a viewable page.

This is important for SEOs to understand for a few reasons:

  • The steps in this webpage assembly process can affect page load times, and speed is not only important for keeping users on your site, but it’s also one of Google’s ranking factors.
  • Google renders certain resources, like JavaScript, on a “second pass.” Google will look at the page without JavaScript first, then a few days to a few weeks later, it will render JavaScript, meaning SEO-critical elements that are added to the page using JavaScript might not get indexed.

Imagine that the website loading process is your commute to work. You get ready at home, gather your things to bring to the office, and then take the fastest route from your home to your work. It would be silly to put on just one of your shoes, take a longer route to work, drop your things off at the office, then immediately return home to get your other shoe, right? That’s sort of what inefficient websites do. This chapter will teach you how to diagnose where your website might be inefficient, what you can do to streamline, and the positive ramifications on your rankings and user experience that can result from that streamlining.

Before a website can be accessed, it needs to be set up!

  1. Domain name is purchased. Domain names like moz.com are purchased from a domain name registrar such as GoDaddy or HostGator. These registrars are just organizations that manage the reservations of domain names.
  2. Domain name is linked to IP address. The Internet doesn’t understand names like “moz.com” as website addresses without the help of domain name servers (DNS). The Internet uses a series of numbers called an Internet protocol (IP) address (ex: 127.0.0.1), but we want to use names like moz.com because they’re easier for humans to remember. We need to use a DNS to link those human-readable names with machine-readable numbers.

How a website gets from server to browser

  1. User requests domain. Now that the name is linked to an IP address via DNS, people can request a website by typing the domain name directly into their browser or by clicking on a link to the website.
  2. Browser makes requests. That request for a web page prompts the browser to make a DNS lookup request to convert the domain name to its IP address. The browser then makes a request to the server for the code your web page is constructed with, such as HTML, CSS, and JavaScript.
  3. Server sends resources. Once the server receives the request for the website, it sends the website files to be assembled in the searcher’s browser.
  4. Browser assembles the web page. The browser has now received the resources from the server, but it still needs to put it all together and render the web page so that the user can see it in their browser. As the browser parses and organizes all the web page’s resources, it’s creating a Document Object Model (DOM). The DOM is what you can see when you right click + “inspect element” on a web page in your Chrome browser (learn how to inspect elements in other browsers).
  5. Browser makes final requests. The browser will only show a web page after all the page’s necessary code is downloaded, parsed, and executed, so at this point, if the browser needs any additional code in order to show your website, it will make an additional request from your server.
  6. Website appears in browser. Whew! After all that, your website has now been transformed (rendered) from code to what you see in your browser.

Pro tip: Talk to your developers about async!

Something you can bring up with your developers is shortening the critical rendering path by setting scripts to “async” when they’re not needed to render content above the fold, which can make your web pages load faster. Async tells the DOM that it can continue to be assembled while the browser is fetching the scripts needed to display your web page. If the DOM has to pause assembly every time the browser fetches a script (called “render-blocking scripts”), it can substantially slow down your page load.

It would be like going out to eat with your friends and having to pause the conversation every time one of you went up to the counter to order, only resuming once they got back. With async, you and your friends can continue to chat even when one of you is ordering. You might also want to bring up other optimizations that devs can implement to shorten the critical rendering path, such as removing unnecessary scripts entirely, like old tracking scripts.
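
To make that concrete, here’s a minimal sketch (the script filename is made up) of the difference between a render-blocking script and an async one:

        <!-- Render-blocking: the browser pauses DOM assembly to fetch and run this -->
        <script src="/js/reviews-widget.js"></script>

        <!-- Async: the browser keeps assembling the DOM while this downloads -->
        <script src="/js/reviews-widget.js" async></script>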

Now that you know how a website appears in a browser, we’re going to focus on what a website is made of — in other words, the code (programming languages) used to construct those web pages.

The three most common are:

  • HTML – What a website says (titles, body content, etc.)
  • CSS – How a website looks (color, fonts, etc.)
  • JavaScript – How it behaves (interactive, dynamic, etc.)

HTML: What a website says

HTML stands for hypertext markup language, and it serves as the backbone of a website. Elements like headings, paragraphs, lists, and content are all defined in the HTML.

Here’s an example of a webpage, and what its corresponding HTML looks like:
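For instance, a bare-bones page might be marked up something like this (the content here is just a placeholder):

        <!DOCTYPE html>
        <html>
          <head>
            <title>How to Bake a Cake</title>
          </head>
          <body>
            <h1>How to Bake a Cake</h1>
            <p>Baking a cake is easier than you think. Here's what you'll need:</p>
            <ul>
              <li>Flour</li>
              <li>Sugar</li>
              <li>Eggs</li>
            </ul>
          </body>
        </html>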

HTML is important for SEOs to know because it’s what lives “under the hood” of any page they create or work on. While your CMS likely doesn’t require you to write your pages in HTML (ex: selecting “hyperlink” will allow you to create a link without you having to type in “a href=”), it is what you’re modifying every time you do something to a web page such as adding content, changing the anchor text of internal links, and so on. Google crawls these HTML elements to determine how relevant your document is to a particular query. In other words, what’s in your HTML plays a huge role in how your web page ranks in Google organic search!

CSS: How a website looks

CSS stands for cascading style sheets, and this is what causes your web pages to take on certain fonts, colors, and layouts. HTML was created to describe content, rather than to style it, so when CSS entered the scene, it was a game-changer. With CSS, web pages could be “beautified” without requiring manual coding of styles into the HTML of every page — a cumbersome process, especially for large sites.

It wasn’t until 2014 that Google’s indexing system began to render web pages more like an actual browser, as opposed to a text-only browser. A black-hat SEO practice that tried to capitalize on Google’s older indexing system was hiding text and links via CSS for the purpose of manipulating search engine rankings. This “hidden text and links” practice is a violation of Google’s quality guidelines.

Components of CSS that SEOs, in particular, should care about:

  • Since style directives can live in external stylesheet files (CSS files) instead of your page’s HTML, it makes your page less code-heavy, reducing file transfer size and making load times faster.
  • Browsers still have to download resources like your CSS file, so compressing them can make your web pages load faster, and page speed is a ranking factor.
  • Having your pages be more content-heavy than code-heavy can lead to better indexing of your site’s content.
  • Using CSS to hide links and content can get your website manually penalized and removed from Google’s index.

JavaScript: How a website behaves

In the earlier days of the Internet, web pages were built with HTML. When CSS came along, webpage content had the ability to take on some style. When the programming language JavaScript entered the scene, websites could now not only have structure and style, but they could be dynamic.

JavaScript has opened up a lot of opportunities for non-static web page creation. When someone attempts to access a page that is enhanced with this programming language, that user’s browser will execute the JavaScript against the static HTML that the server returned, resulting in a web page that comes to life with some sort of interactivity.

You’ve definitely seen JavaScript in action — you just may not have known it! That’s because JavaScript can do almost anything to a page. It could create a pop up, for example, or it could request third-party resources like ads to display on your page.

JavaScript can pose some problems for SEO, though, since search engines don’t view JavaScript the same way human visitors do. That’s because of client-side versus server-side rendering. Most JavaScript is executed in a client’s browser. With server-side rendering, on the other hand, the files are executed at the server and the server sends them to the browser in their fully rendered state.

SEO-critical page elements such as text, links, and tags that are loaded on the client’s side with JavaScript, rather than represented in your HTML, are invisible from your page’s code until they are rendered. This means that search engine crawlers won’t see what’s in your JavaScript — at least not initially.

Google says that, as long as you’re not blocking Googlebot from crawling your JavaScript files, they’re generally able to render and understand your web pages just like a browser can, which means that Googlebot should see the same things as a user viewing a site in their browser. However, due to this “second wave of indexing” for client-side JavaScript, Google can miss certain elements that are only available once JavaScript is executed.

There are also some other things that could go wrong during Googlebot’s process of rendering your web pages, which can prevent Google from understanding what’s contained in your JavaScript:

  • You’ve blocked Googlebot from JavaScript resources (ex: with robots.txt, like we learned about in Chapter 2)
  • Your server can’t handle all the requests to crawl your content
  • The JavaScript is too complex or outdated for Googlebot to understand
  • The JavaScript “lazy loads” content into the page only after the crawler has finished with the page and moved on, so the crawler never sees it.

Needless to say, while JavaScript does open a lot of possibilities for web page creation, it can also have some serious ramifications for your SEO if you’re not careful. Thankfully, there is a way to check whether Google sees the same thing as your visitors. To see how Googlebot views a page, use Google Search Console’s “Fetch and Render” tool. From your site’s Google Search Console dashboard, select “Crawl” from the left navigation, then “Fetch as Google.”

From this page, enter the URL you want to check (or leave blank if you want to check your homepage) and click the “Fetch and Render” button. You also have the option to test either the desktop or mobile version.

In return, you’ll get a side-by-side view of how Googlebot saw your page versus how a visitor to your website would have seen the page. Below, Google will also show you a list of any resources they may not have been able to get for the URL you entered.

Understanding the way websites work lays a great foundation for what we’ll talk about next, which is technical optimizations to help Google understand the pages on your website better.

2. How search engines understand websites

Search engines have gotten incredibly sophisticated, but they can’t (yet) find and interpret web pages quite like a human can. The following sections outline ways you can better deliver content to search engines.

Help search engines understand your content by structuring it with Schema

Imagine being a search engine crawler scanning down a 10,000-word article about how to bake a cake. How do you identify the author, recipe, ingredients, or steps required to bake a cake? This is where schema (Schema.org) markup comes in. It allows you to spoon-feed search engines more specific classifications for what type of information is on your page.

Schema is a way to label or organize your content so that search engines have a better understanding of what certain elements on your web pages are. This code provides structure to your data, which is why schema is often referred to as “structured data.” The process of structuring your data is often referred to as “markup” because you are marking up your content with organizational code.

JSON-LD is Google’s preferred schema markup (announced in May ‘16), which Bing also supports. To view a full list of the thousands of available schema markups, visit Schema.org or view the Google Developers Introduction to Structured Data for additional information on how to implement structured data. After you implement the structured data that best suits your web pages, you can test your markup with Google’s Structured Data Testing Tool.
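
As a rough illustration (the values are placeholders, not a complete implementation), JSON-LD for the cake recipe above would sit inside a script tag in your HTML, something like:

        <script type="application/ld+json">
        {
          "@context": "https://schema.org",
          "@type": "Recipe",
          "name": "Simple Chocolate Cake",
          "author": {
            "@type": "Person",
            "name": "Jane Baker"
          },
          "recipeIngredient": ["flour", "sugar", "eggs", "cocoa powder"]
        }
        </script>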

In addition to helping bots like Google understand what a particular piece of content is about, schema markup can also enable special features to accompany your pages in the SERPs. These special features are referred to as “rich snippets,” and you’ve probably seen them in action. They’re things like:

  • Top Stories carousel
  • Review stars
  • Sitelinks search boxes
  • Recipes

Remember, using structured data can help enable a rich snippet to be present, but does not guarantee it. Other types of rich snippets will likely be added in the future as the use of schema markup increases.

Some last words of advice for schema success:

  • You can use multiple types of schema markup on a page. However, if you mark up one element, like a product for example, and there are other products listed on the page, you must also mark up those products.
  • Don’t mark up content that is not visible to visitors and follow Google’s Quality Guidelines. For example, if you add review structured markup to a page, make sure those reviews are actually visible on that page.
  • If you have duplicate pages, Google asks that you mark up each duplicate page with your structured markup, not just the canonical version.
  • Provide original and updated (if applicable) content on your structured data pages.
  • Structured markup should be an accurate reflection of your page.
  • Try to use the most specific type of schema markup for your content.
  • Marked-up reviews should not be written by the business. They should be genuine unpaid business reviews from actual customers.

Tell search engines about your preferred pages with canonicalization

When Google crawls the same content on different web pages, it sometimes doesn’t know which page to index in search results. This is why the canonical tag was invented: to help search engines index the preferred version of a piece of content and not all of its duplicates.

The rel=”canonical” tag allows you to tell search engines where the original, master version of a piece of content is located. You’re essentially saying, “Hey search engine! Don’t index this; index this source page instead.” So, if you want to republish a piece of content, whether exactly or slightly modified, but don’t want to risk creating duplicate content, the canonical tag is here to save the day.

Proper canonicalization ensures that every unique piece of content on your website has only one URL. To prevent search engines from indexing multiple versions of a single page, Google recommends having a self-referencing canonical tag on every page on your site. Without a canonical tag telling Google which version of your web page is the preferred one, http://www.example.com could get indexed separately from http://example.com, creating duplicates.
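
The tag itself is a single line in the <head> of each page. In that example, both the www and non-www versions would carry the same tag pointing at whichever URL you prefer (a placeholder URL is used here):

        <link rel="canonical" href="http://www.example.com/" />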

“Avoid duplicate content” is an Internet truism, and for good reason! Google wants to reward sites with unique, valuable content — not content that’s taken from other sources and repeated across multiple pages. Because engines want to provide the best searcher experience, they will rarely show multiple versions of the same content, opting instead to show only the canonicalized version, or if a canonical tag does not exist, whichever version they deem most likely to be the original.

Pro tip: Distinguishing between content filtering & content penalties
There is no such thing as a duplicate content penalty. However, you should try to keep duplicate content from causing indexing issues by using the rel=”canonical” tag when possible. When duplicates of a page exist, Google will choose a canonical and filter the others out of search results. That doesn’t mean you’ve been penalized. It just means that Google only wants to show one version of your content.

It’s also very common for websites to have multiple duplicate pages due to sort and filter options. For example, on an e-commerce site, you might have what’s called a faceted navigation that allows visitors to narrow down products to find exactly what they’re looking for, such as a “sort by” feature that reorders results on the product category page from lowest to highest price. This could create a URL that looks something like this: example.com/mens-shirts?sort=price_ascending. Add in more sort/filter options like color, size, material, brand, etc. and just think about all the variations of your main product category page this would create!

To learn more about different types of duplicate content, this post by Dr. Pete helps distill the different nuances.

3. How users interact with websites

In Chapter 1, we said that despite SEO standing for search engine optimization, SEO is as much about people as it is about search engines themselves. That’s because search engines exist to serve searchers. This goal helps explain why Google’s algorithm rewards websites that provide the best possible experiences for searchers, and why some websites, despite having qualities like robust backlink profiles, might not perform well in search.

When we understand what makes searchers’ web browsing experience optimal, we can create those experiences for maximum search performance.

Ensuring a positive experience for your mobile visitors

Given that well over half of all web traffic today comes from mobile, it’s safe to say that your website should be accessible and easy to navigate for mobile visitors. In April 2015, Google rolled out an update to its algorithm that would promote mobile-friendly pages over non-mobile-friendly pages. So how can you ensure that your website is mobile friendly? Although there are three main ways to configure your website for mobile, Google recommends responsive web design.

Responsive design

Responsive websites are designed to fit the screen of whatever type of device your visitors are using. You can use CSS to make the web page “respond” to the device size. This is ideal because it prevents visitors from having to double-tap or pinch-and-zoom in order to view the content on your pages. Not sure if your web pages are mobile friendly? You can use Google’s mobile-friendly test to check!
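
Under the hood, responsive design usually starts with a viewport meta tag plus CSS media queries that adjust the layout for smaller screens. A minimal sketch (the class name and breakpoint are arbitrary):

        <meta name="viewport" content="width=device-width, initial-scale=1">

        <style>
          .page-content { width: 960px; margin: 0 auto; }

          /* On narrow screens, let the content fill the viewport instead */
          @media (max-width: 600px) {
            .page-content { width: 100%; }
          }
        </style>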

AMP

AMP stands for Accelerated Mobile Pages, and it is used to deliver content to mobile visitors at speeds much greater than with non-AMP delivery. AMP is able to deliver content so fast because it delivers content from its cache servers (not the original site) and uses a special AMP version of HTML and JavaScript. Learn more about AMP.

Mobile-first indexing

As of 2018, Google started switching websites over to mobile-first indexing. That change sparked some confusion between mobile-friendliness and mobile-first, so it’s helpful to disambiguate. With mobile-first indexing, Google crawls and indexes the mobile version of your web pages. Making your website compatible with mobile screens is good for users and your performance in search, but mobile-first indexing happens independently of mobile-friendliness.

This has raised some concerns for websites that lack parity between mobile and desktop versions, such as showing different content, navigation, links, etc. on their mobile view. A mobile site with different links, for example, will alter the way in which Googlebot (mobile) crawls your site and sends link equity to your other pages.

Breaking up long content for easier digestion

When sites have very long pages, they have the option of breaking them up into multiple parts of a whole. This is called pagination and it’s similar to pages in a book. In order to avoid giving the visitor too much all at once, you can break up your single page into multiple parts. This can be great for visitors, especially on e-commerce sites where there are a lot of product results in a category, but there are some steps you should take to help Google understand the relationship between your paginated pages: the rel=”next” and rel=”prev” tags.

You can read more about pagination in Google’s official documentation, but the main takeaways are that:

  • The first page in a sequence should only have rel=”next” markup
  • The last page in a sequence should only have rel=”prev” markup
  • Pages that have both a preceding and following page should have both rel=”next” and rel=”prev”
  • Since each page in the sequence is unique, don’t canonicalize them to the first page in the sequence. Only use a canonical tag to point to a “view all” version of your content, if you have one.
  • When Google sees a paginated sequence, it will typically consolidate the pages’ linking properties and send searchers to the first page

Pro tip: rel=”next/prev” should still have anchor text and live within an <a> link
This helps ensure that Google picks up the rel=”next” and rel=”prev” markup.
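
Putting that together, page 2 of a paginated category might carry both the head markup and crawlable links in the body, something like this (the URLs are placeholders):

        <head>
          <link rel="prev" href="https://www.example.com/mens-shirts?page=1">
          <link rel="next" href="https://www.example.com/mens-shirts?page=3">
        </head>

        <body>
          <a href="https://www.example.com/mens-shirts?page=1" rel="prev">Previous page</a>
          <a href="https://www.example.com/mens-shirts?page=3" rel="next">Next page</a>
        </body>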

Improving page speed to mitigate visitor frustration

Google wants to serve content that loads lightning-fast for searchers. We’ve come to expect fast-loading results, and when we don’t get them, we’ll quickly bounce back to the SERP in search of a better, faster page. This is why page speed is a crucial aspect of on-site SEO. We can improve the speed of our web pages by taking advantage of tools like the ones we’ve mentioned below. Click on the links to learn more about each.

Images are one of the main culprits of slow pages!

As discussed in Chapter 4, images are one of the top reasons for slow-loading web pages! In addition to image compression, optimizing image alt text, choosing the right image format, and submitting image sitemaps, there are other technical ways to optimize the speed and the way in which images are shown to your users. Some primary ways to improve image delivery are as follows:

SRCSET: How to deliver the best image size for each device

The SRCSET attribute allows you to have multiple versions of your image and then specify which version should be used in different situations. This piece of code is added to the <img> tag (where your image is located in the HTML) to provide unique images for specific-sized devices.

This is like the concept of responsive design that we discussed earlier, except for images!

This doesn’t just speed up your image load time, it’s also a unique way to enhance your on-page user experience by providing different and optimal images to different device types.

Pro tip: There are more than just three image size versions!
It’s a common misconception that you just need a desktop, tablet, and mobile-sized version of your image. There are a huge variety of screen sizes and resolutions. Learn more about SRCSET.
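
Here’s a hedged sketch of what SRCSET looks like in practice (the filenames, widths, and breakpoint are invented, and only three sizes are shown to keep it short); the browser picks whichever file best fits the device:

        <img src="cake-800.jpg"
             srcset="cake-480.jpg 480w, cake-800.jpg 800w, cake-1600.jpg 1600w"
             sizes="(max-width: 600px) 480px, 800px"
             alt="Chocolate layer cake on a cake stand">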

Show visitors image loading is in progress with lazy loading

Lazy loading occurs when you go to a webpage and, instead of seeing a blank white space where an image will be, a blurry lightweight version of the image (or a colored box) appears in its place while the surrounding text loads. After a few seconds, the image loads in full resolution. The popular blogging platform Medium does this really well.

The low-resolution version loads first, followed by the full high-resolution version. This also helps to optimize your critical rendering path! While all of your other page resources are being downloaded, you’re showing a low-resolution teaser image that tells users that things are happening/being loaded. For more information on how you should lazy load your images, check out Google’s Lazy Loading Guidance.
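
Implementations vary, but a common pattern is to ship a tiny placeholder in src and keep the full image in a data attribute, then have a small script (not shown here) swap it in as the image scrolls into view. A rough sketch, with placeholder filenames and a class name borrowed from a common lazy-loading convention:

        <img src="cake-tiny-blurred.jpg"
             data-src="cake-800.jpg"
             class="lazyload"
             alt="Chocolate layer cake on a cake stand">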

Improve speed by condensing and bundling your files

Page speed audits will often make recommendations such as “minify resource,” but what does that actually mean? Minification condenses a code file by removing things like line breaks and spaces, as well as abbreviating code variable names wherever possible.

“Bundling” is another common term you’ll hear in reference to improving page speed. The process of bundling combines a bunch of files written in the same coding language into one single file. For example, a bunch of JavaScript files could be combined into one larger file to reduce the number of JavaScript files a browser has to request.

By both minifying and bundling the files needed to construct your web page, you’ll speed up your website and reduce the number of your HTTP (file) requests.
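
A simplified before-and-after (the filenames are made up): bundling collapses several script requests into one, and minification shrinks the file that’s left:

        <!-- Before: three separate, unminified files -->
        <script src="/js/menu.js"></script>
        <script src="/js/carousel.js"></script>
        <script src="/js/forms.js"></script>

        <!-- After: one bundled, minified file and a single HTTP request -->
        <script src="/js/site.bundle.min.js"></script>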

Improving the experience for international audiences

Websites that target audiences from multiple countries should familiarize themselves with international SEO best practices in order to serve up the most relevant experiences. Without these optimizations, international visitors might have difficulty finding the version of your site that caters to them.

There are two main ways a website can be internationalized:

  • Language
    Sites that target speakers of multiple languages are considered multilingual websites. These sites should add something called an hreflang tag to show Google that your page has a copy in another language (there’s a sketch of what this looks like after this list). Learn more about hreflang.
  • Country
    Sites that target audiences in multiple countries are called multi-regional websites and they should choose a URL structure that makes it easy to target their domain or pages to specific countries. This can include the use of a country code top level domain (ccTLD) such as “.ca” for Canada, or a generic top-level domain (gTLD) with a country-specific subfolder such as “example.com/ca” for Canada. Learn more about locale-specific URLs.
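
Here’s that hreflang sketch, with placeholder URLs. Each tag pairs a language (and optionally a region) with the URL that serves it, and x-default names the fallback version:

        <link rel="alternate" hreflang="en-us" href="https://example.com/" />
        <link rel="alternate" hreflang="en-ca" href="https://example.com/ca/" />
        <link rel="alternate" hreflang="fr-ca" href="https://example.com/fr-ca/" />
        <link rel="alternate" hreflang="x-default" href="https://example.com/" />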

You’ve researched, you’ve written, and you’ve optimized your website for search engines and user experience. The next piece of the SEO puzzle is a big one: establishing authority so that your pages will rank highly in search results.


An 8-Point Checklist for Debugging Strange Technical SEO Problems

Posted by Dom-Woodman

Occasionally, a problem will land on your desk that’s a little out of the ordinary. Something where you don’t have an easy answer. You go to your brain and your brain returns nothing.

These problems can’t be solved with a little bit of keyword research and basic technical configuration. These are the types of technical SEO problems where the rabbit hole goes deep.

The very nature of these situations defies a checklist, but it’s useful to have one for the same reason we have them on planes: even the best of us can and will forget things, and a checklist will provide you with places to dig.


Fancy some examples of strange SEO problems? Here are four examples to mull over while you read. We’ll answer them at the end.

1. Why wasn’t Google showing 5-star markup on product pages?

  • The pages had server-rendered product markup and they also had Feefo product markup, including ratings being attached client-side.
  • The Feefo ratings snippet was successfully rendered in Fetch & Render, plus the mobile-friendly tool.
  • When you put the rendered DOM into the structured data testing tool, both pieces of structured data appeared without errors.

2. Why wouldn’t Bing display 5-star markup on review pages, when Google would?

  • The review pages of client & competitors all had rating rich snippets on Google.
  • All the competitors had rating rich snippets on Bing; however, the client did not.
  • The review pages had correctly validating ratings schema on Google’s structured data testing tool, but did not on Bing.

3. Why were pages getting indexed with a no-index tag?

  • Pages with a server-side-rendered no-index tag in the head were being indexed by Google across a large template for a client.

4. Why did any page on a website return a 302 about 20–50% of the time, but only for crawlers?

  • A website was randomly throwing 302 errors.
  • This never happened in the browser and only in crawlers.
  • User agent made no difference; location or cookies also made no difference.

Finally, a quick note. It’s entirely possible that some of this checklist won’t apply to every scenario. That’s totally fine. It’s meant to be a process for everything you could check, not everything you should check.

The full checklist

You can download the checklist template here (just make a copy of the Google Sheet):

Get the checklist spreadsheet

The pre-checklist check

Does it actually matter?

Does this problem only affect a tiny amount of traffic? Is it only on a handful of pages and you already have a big list of other actions that will help the website? You probably need to just drop it.

I know, I hate it too. I also want to be right and dig these things out. But in six months’ time, when you’ve solved twenty complex SEO rabbit holes and your website has stayed flat because you didn’t re-write the title tags, you’re still going to get fired.

But hopefully that’s not the case, in which case, onwards!

Where are you seeing the problem?

We don’t want to waste a lot of time. Have you heard this wonderful saying? “If you hear hooves, it’s probably not a zebra.”

The process we’re about to go through is fairly involved and it’s entirely up to your discretion if you want to go ahead. Just make sure you’re not overlooking something obvious that would solve your problem. Here are some common problems I’ve come across that were mostly horses.

  1. You’re underperforming from where you should be.
    1. When a site is under-performing, people love looking for excuses. Weird Google nonsense can be quite a handy thing to blame. In reality, it’s typically some combination of a poor site, higher competition, and a failing brand. Horse.
  2. You’ve suffered a sudden traffic drop.
    1. Something has certainly happened, but this is probably not the checklist for you. There are plenty of common-sense checklists for this. I’ve written about diagnosing traffic drops recently — check that out first.
  3. The wrong page is ranking for the wrong query.
    1. In my experience (which should probably preface this entire post), this is usually a basic problem where a site has poor targeting or a lot of cannibalization. Probably a horse.

Factors that make it more likely you’ve got a more complex problem, one that requires you to don your debugging shoes:

  • A website that has a lot of client-side JavaScript.
  • Bigger, older websites with more legacy.
  • Your problem is related to a new Google property or feature where there is less community knowledge.

1. Start by picking some example pages.

Pick a couple of example pages to work with — ones that exhibit whatever problem you’re seeing. No, this won’t be representative, but we’ll come back to that in a bit.

Of course, if it only affects a tiny number of pages then it might actually be representative, in which case we’re good. It definitely matters, right? You didn’t just skip the step above? OK, cool, let’s move on.

2. Can Google crawl the page once?

First we’re checking whether Googlebot has access to the page, which we’ll define as a 200 status code.

We’ll check in four different ways to expose any common issues:

  1. Robots.txt: Open up Search Console and check in the robots.txt validator.
  2. User agent: Open Dev Tools and verify that you can open the URL with both Googlebot and Googlebot Mobile.
    1. To get the user agent switcher, open Dev Tools.
    2. Check the console drawer is open (the toggle is the Escape key)
    3. Hit the … and open “Network conditions”
    4. Here, select your user agent!

  3. IP Address: Verify that you can access the page with the mobile testing tool. (This will come from one of the IPs used by Google; any checks you do from your computer won’t.)
  4. Country: The mobile testing tool will visit from US IPs, from what I’ve seen, so we get two birds with one stone. But Googlebot will occasionally crawl from non-American IPs, so it’s also worth using a VPN to double-check whether you can access the site from any other relevant countries.
    1. I’ve used HideMyAss for this before, but whatever VPN you have will work fine.

We should now have an idea whether or not Googlebot is struggling to fetch the page once.

Have we found any problems yet?

If we can re-create a failed crawl with a simple check above, then Googlebot is probably failing consistently to fetch our page, and it’s typically for one of those basic reasons.

But it might not be. Many problems are inconsistent because of the nature of technology. ;)

3. Are we telling Google two different things?

Next up: Google can find the page, but are we confusing it by telling it two different things?

This is most commonly seen, in my experience, because someone has messed up the indexing directives.

By “indexing directives,” I’m referring to any tag that defines the correct index status or page in the index which should rank. Here’s a non-exhaustive list:

  • No-index
  • Canonical
  • Mobile alternate tags
  • AMP alternate tags

An example of providing mixed messages would be:

  • No-indexing page A
  • Page B canonicals to page A

Or:

  • Page A has a canonical in a header to A with a parameter
  • Page A has a canonical in the body to A without a parameter

If we’re providing mixed messages, then it’s not clear how Google will respond. It’s a great way to start seeing strange results.
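
To make that second example concrete (the URLs are invented): the HTTP response header and the HTML are naming two different canonicals, and it’s anyone’s guess which one Google will honor:

        <!-- Sent in the HTTP response headers: -->
        <!--   Link: <https://example.com/page?ref=nav>; rel="canonical" -->

        <!-- Meanwhile, in the HTML <head>: -->
        <link rel="canonical" href="https://example.com/page">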

Good places to check for the indexing directives listed above are:

  • Sitemap
    • Example: Mobile alternate tags can sit in a sitemap
  • HTTP headers
    • Example: Canonical and meta robots can be set in headers.
  • HTML head
    • This is where you’re probably already looking; you’ll need it for comparison.
  • JavaScript-rendered vs hard-coded directives
    • You might be setting one thing in the page source and then rendering another with JavaScript, i.e. you would see something different in the HTML source from the rendered DOM.
  • Google Search Console settings
    • There are Search Console settings for ignoring parameters and country localization that can clash with indexing tags on the page.

A quick aside on rendered DOM

This page has a lot of mentions of the rendered DOM on it (18, if you’re curious). Since we’ve just had our first, here’s a quick recap about what that is.

When you load a webpage, the first request is the HTML. This is what you see in the HTML source (right-click on a webpage and click View Source).

This is before JavaScript has done anything to the page. This didn’t use to be such a big deal, but now so many websites rely heavily on JavaScript that most people quite reasonably won’t trust the initial HTML.

The rendered DOM is the technical term for the page once all the JavaScript has run and all the page alterations have been made. You can see this in Dev Tools.

In Chrome you can get that by right clicking and hitting inspect element (or Ctrl + Shift + I). The Elements tab will show the DOM as it’s being rendered. When it stops flickering and changing, then you’ve got the rendered DOM!

4. Can Google crawl the page consistently?

To see what Google is seeing, we’re going to need to get log files. At this point, we can check to see how it is accessing the page.

Aside: Working with logs is an entire post in and of itself. I’ve written a guide to log analysis with BigQuery; I’d also really recommend trying out Screaming Frog Log Analyzer, which has done a great job of handling a lot of the complexity around logs.

When we’re looking at crawling there are three useful checks we can do:

  1. Status codes: Plot the status codes over time. Is Google seeing different status codes than you when you check URLs?
  2. Resources: Is Google downloading all the resources of the page?
    1. Is it downloading all your site-specific JavaScript and CSS files that it would need to generate the page?
  3. Page size follow-up: Take the max and min of all your pages and resources and diff them. If you see a difference, then Google might be failing to fully download all the resources or pages. (Hat tip to @ohgm, where I first heard this neat tip).

Have we found any problems yet?

If Google isn’t getting 200s consistently in our log files, but we can access the page fine when we try, then there are clearly still some differences between Googlebot and ourselves. What might those differences be?

  1. It will crawl more than us
  2. It is obviously a bot, rather than a human pretending to be a bot
  3. It will crawl at different times of day

This means that:

  • If our website is doing clever bot blocking, it might be able to differentiate between us and Googlebot.
  • Because Googlebot puts more stress on our web servers, they might behave differently for it than they do for us. When websites have a lot of bots or visitors at once, they might take certain actions to help keep the website online. They might turn on more computers to power the website (this is called scaling), they might also attempt to rate-limit users who are requesting lots of pages, or serve reduced versions of pages.
  • Servers run tasks periodically; for example, a listings website might run a daily task at 01:00 to clean up all its old listings, which might affect server performance.

Working out what’s happening with these periodic effects is going to be fiddly; you’re probably going to need to talk to a back-end developer.

Depending on your skill level, you might not know exactly where to lead the discussion. A useful structure for a discussion is often to talk about how a request passes through your technology stack and then look at the edge cases we discussed above.

  • What happens to the servers under heavy load?
  • When do important scheduled tasks happen?

Two useful pieces of information to enter this conversation with:

  1. Depending on the regularity of the problem in the logs, it is often worth trying to re-create the problem by attempting to crawl the website with a crawler at the same speed/intensity that Google is using to see if you can find/cause the same issues. This won’t always be possible depending on the size of the site, but for some sites it will be. Being able to consistently re-create a problem is the best way to get it solved.
  2. If you can’t, however, then try to provide the exact periods of time where Googlebot was seeing the problems. This will give the developer the best chance of tying the issue to other logs to let them debug what was happening.

If Google can crawl the page consistently, then we move onto our next step.

5. Does Google see what I can see on a one-off basis?

We know Google is crawling the page correctly. The next step is to try and work out what Google is seeing on the page. If you’ve got a JavaScript-heavy website you’ve probably banged your head against this problem before, but even if you don’t this can still sometimes be an issue.

We follow the same pattern as before. First, we try to re-create it once. The following tools will let us do that:

  • Fetch & Render
    • Shows: Rendered DOM in an image, but only returns the page source HTML for you to read.
  • Mobile-friendly test
    • Shows: Rendered DOM and returns rendered DOM for you to read.
    • Not only does this show you rendered DOM, but it will also track any console errors.

Is there a difference between Fetch & Render, the mobile-friendly testing tool, and Googlebot? Not really, with the exception of timeouts (which is why we have our later steps!). Here’s the full analysis of the difference between them, if you’re interested.

Once we have the output from these, we compare them to what we ordinarily see in our browser. I’d recommend using a tool like Diff Checker to compare the two.

Have we found any problems yet?

If we encounter meaningful differences at this point, then in my experience it’s typically either from JavaScript or cookies.

Why?

We can isolate each of these by:

  • Loading the page with no cookies. This can be done simply by loading the page with a fresh incognito session and comparing the rendered DOM here against the rendered DOM in our ordinary browser.
  • Using the mobile testing tool to see the page with Chrome 41 and comparing that against the rendered DOM we normally see with Inspect Element.

Yet again we can compare them using something like Diff Checker, which will allow us to spot any differences. You might want to use an HTML formatter to help line them up better.

We can also see the JavaScript errors thrown using the Mobile-Friendly Testing Tool, which may prove particularly useful if you’re confident in your JavaScript.

If, using this knowledge and these tools, we can recreate the bug, then we have something that can be replicated and it’s easier for us to hand off to a developer as a bug that will get fixed.

If we’re seeing everything is correct here, we move on to the next step.

6. What is Google actually seeing?

It’s possible that what Google is seeing is different from what we recreate using the tools in the previous step. Why? A couple main reasons:

  • Overloaded servers can have all sorts of strange behaviors. For example, they might be returning 200 codes, but perhaps with a default page.
  • JavaScript is rendered separately from pages being crawled and Googlebot may spend less time rendering JavaScript than a testing tool.
  • There is often a lot of caching in the creation of web pages and this can cause issues.

We’ve gotten this far without talking about time! Pages don’t get crawled instantly, and crawled pages don’t get indexed instantly.

Quick sidebar: What is caching?

Caching is often a problem if you get to this stage. Unlike JS, it’s not talked about as much in our community, so it’s worth some more explanation in case you’re not familiar. Caching is storing something so it’s available more quickly next time.

When you request a webpage, a lot of calculations happen to generate that page. If you then refreshed the page when it was done, it would be incredibly wasteful to just re-run all those same calculations. Instead, servers will often save the output and serve you the output without re-running them. Saving the output is called caching.

Why do we need to know this? Well, we’re already well out into the weeds at this point and so it’s possible that a cache is misconfigured and the wrong information is being returned to users.

There aren’t many good beginner resources on caching which go into more depth. However, I found this article on caching basics to be one of the more friendly ones. It covers some of the basic types of caching quite well.

How can we see what Google is actually working with?

  • Google’s cache
    • Shows: Source code
    • While this won’t show you the rendered DOM, it is showing you the raw HTML Googlebot actually saw when visiting the page. You’ll need to check this with JS disabled; otherwise, on opening it, your browser will run all the JS on the cached version.
  • Site searches for specific content
    • Shows: A tiny snippet of rendered content.
    • By searching for a specific phrase on a page, e.g. inurl:example.com/url “only JS rendered text”, you can see if Google has managed to index a specific snippet of content. Of course, it only works for visible text and misses a lot of the content, but it’s better than nothing!
    • Better yet, do the same thing with a rank tracker, to see if it changes over time.
  • Storing the actual rendered DOM
    • Shows: Rendered DOM
    • Alex from DeepCrawl has written about saving the rendered DOM from Googlebot. The TL;DR version: Google will render JS and post to endpoints, so we can get it to submit the JS-rendered version of a page that it sees. We can then save that, examine it, and see what went wrong.

Have we found any problems yet?

Again, once we’ve found the problem, it’s time to go and talk to a developer. The advice for this conversation is identical to the last one — everything I said there still applies.

The other knowledge you should go into this conversation armed with: how Google works and where it can struggle. While your developer will know the technical ins and outs of your website and how it’s built, they might not know much about how Google works. Together, this can help you reach the answer more quickly.

The obvious sources for this are resources and presentations given by Google themselves. Of the various resources that have come out, I’ve found these two to be some of the more useful ones for giving insight into first principles:

But there is often a difference between statements Google will make and what the SEO community sees in practice. All the SEO experiments people tirelessly perform in our industry can also help shed some insight. There are far too many to list here, but here are two good examples:

7. Could Google be aggregating your website across others?

If we’ve reached this point, we’re pretty happy that our website is running smoothly. But not all problems can be solved just on your website; sometimes you’ve got to look to the wider landscape and the SERPs around it.

Most commonly, what I’m looking for here is:

  • Similar/duplicate content to the pages that have the problem.
    • This could be intentional duplicate content (e.g. syndicating content) or unintentional (competitors’ scraping or accidentally indexed sites).

Either way, they’re nearly always found by doing exact searches in Google, i.e. taking a relatively specific piece of content from your page and searching for it in quotes.

Have you found any problems yet?

If you find a number of other exact copies, then it’s possible they might be causing issues.

The best description I’ve come up with for “have you found a problem here?” is: do you think Google is aggregating together similar pages and only showing one? And if it is, is it picking the wrong page?

This doesn’t just have to be on traditional Google search. You might find a version of it on Google Jobs, Google News, etc.

To give an example, if you are a reseller, you might find content isn’t ranking because there’s another, more authoritative reseller who consistently posts the same listings first.

Sometimes you’ll see this consistently and straightaway, while other times the aggregation might be changing over time. In that case, you’ll need a rank tracker for whatever Google property you’re working on to see it.

Jon Earnshaw from Pi Datametrics gave an excellent talk on the latter (around suspicious SERP flux) which is well worth watching.

Once you’ve found the problem, you’ll probably need to experiment to find out how to get around it, but the easiest factors to play with are usually:

  • De-duplication of content
  • Speed of discovery (you can often improve this by putting up a 24-hour RSS feed of all the new content that appears)
  • Lowering syndication

8. A roundup of some other likely suspects

If you’ve gotten this far, then we’re sure that:

  • Google can consistently crawl our pages as intended.
  • We’re sending Google consistent signals about the status of our page.
  • Google is consistently rendering our pages as we expect.
  • Google is picking the correct page out of any duplicates that might exist on the web.

And your problem still isn’t solved?

And it is important?

Well, shoot.

Feel free to hire us…?

As much as I’d love for this article to list every SEO problem ever, that’s not really practical. So, to finish off, let’s go through two more common gotchas and principles that didn’t really fit in elsewhere, followed by the answers to the four problems we listed at the beginning.

Invalid/poorly constructed HTML

You and Googlebot might be seeing the same HTML, but it might be invalid or wrong. Googlebot (and any crawler, for that matter) has to provide workarounds when the HTML specification isn’t followed, and those can sometimes cause strange behavior.

The easiest way to spot it is either by eyeballing the rendered DOM with the tools above or by using an HTML validator.

The W3C validator is very useful, but will throw up a lot of errors/warnings you won’t care about. The closest I can give to a one-line summary of which ones are useful is:

  • Look for errors
  • Ignore anything to do with attributes (won’t always apply, but is often true).

The classic example of this is breaking the head.

An iframe isn’t allowed in the head code, so Chrome will end the head and start the body. Unfortunately, it takes the title and canonical with it, because they fall after it — so Google can’t read them. The head code should have ended in a different place.
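
A hedged sketch of what that looks like in practice (the filenames and URLs are invented). Everything below was meant to sit inside the <head>, but because the iframe isn’t valid there, the parser closes the head early and the title and canonical end up in the body, where they’re no longer read as head tags:

        <head>
          <script src="/js/ad-tech.js"></script>
          <iframe src="https://ads.example.com/frame.html"></iframe>
          <!-- the parser implicitly ends the head and starts the body here -->
          <title>Blue Widget | Example Shop</title>
          <link rel="canonical" href="https://www.example.com/blue-widget">
        </head>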

Oliver Mason wrote a good post that explains an even more subtle version of this in breaking the head quietly.

When in doubt, diff

Never underestimate the power of trying to compare two things line by line with a diff from something like Diff Checker. It won’t apply to everything, but when it does it’s powerful.

For example, if Google has suddenly stopped showing your featured markup, try to diff your page against a historical version either in your QA environment or from the Wayback Machine.


Answers to our original 4 questions

Time to answer those questions. These are all problems we’ve had clients bring to us at Distilled.

1. Why wasn’t Google showing 5-star markup on product pages?

Google was seeing both the server-rendered markup and the client-side-rendered markup; however, the server-rendered side was taking precedence.

Removing the server-rendered markup meant the 5-star markup began appearing.

2. Why wouldn’t Bing display 5-star markup on review pages, when Google would?

The problem came from the references to schema.org.

        <div itemscope itemtype="https://schema.org/Movie">
          <h1 itemprop="name">Avatar</h1>
          <span>Director: <span itemprop="director">James Cameron</span> (born August 16, 1954)</span>
          <span itemprop="genre">Science fiction</span>
          <a href="../movies/avatar-theatrical-trailer.html" itemprop="trailer">Trailer</a>
        </div>

We diffed our markup against our competitors and the only difference was we’d referenced the HTTPS version of schema.org in our itemtype, which caused Bing to not support it.

C’mon, Bing.

3. Why were pages getting indexed with a no-index tag?

The answer for this was in this post. This was a case of breaking the head.

The developers had installed some ad-tech in the head and inserted a non-standard tag, i.e. not one of:

  • <title>
  • <style>
  • <base>
  • <link>
  • <meta>
  • <script>
  • <noscript>

This caused the head to end prematurely and the no-index tag was left in the body where it wasn’t read.

4. Why did any page on a website return a 302 about 20–50% of the time, but only for crawlers?

This took some time to figure out. The client had an old legacy website running on two servers, one for the blog and one for the rest of the site. The issue started occurring shortly after a migration of the blog from a subdomain (blog.client.com) to a subdirectory (client.com/blog/…).

At surface level everything was fine; if a user requested any individual page, it all looked good. A crawl of all the blog URLs to check they’d redirected was fine.

But we noticed a sharp increase of errors being flagged in Search Console, and during a routine site-wide crawl, many pages that were fine when checked manually were causing redirect loops.

We checked using Fetch and Render, but once again, the pages were fine.

Eventually, it turned out that when a non-blog page was requested very quickly after a blog page (which, realistically, only a crawler is fast enough to achieve), the request for the non-blog page would be sent to the blog server.

These would then be caught by a long-forgotten redirect rule, which 302-redirected deleted blog posts (or other duff URLs) to the root. This, in turn, was caught by a blanket HTTP to HTTPS 301 redirect rule, which would be requested from the blog server again, perpetuating the loop.

For example, requesting https://www.client.com/blog/ followed quickly enough by https://www.client.com/category/ would result in:

  • 302 to http://www.client.com – This was the rule that redirected deleted blog posts to the root
  • 301 to https://www.client.com – This was the blanket HTTPS redirect
  • 302 to http://www.client.com – The blog server doesn’t know about the HTTPS non-blog homepage and it redirects back to the HTTP version. Rinse and repeat.

This explained the periodic 302 errors, and it meant we could work with their devs to fix the problem.

What are the best brainteasers you’ve had?

Let’s hear them, people. What problems have you run into? Let us know in the comments.

Also credit to @RobinLord8, @TomAnthonySEO, @THCapper, @samnemzer, and @sergeystefoglo_ for help with this piece.


SearchCap: Bing Ads quality policy and conversion changes, Apple search ads & technical SEO

Below is what happened in search today, as reported on Search Engine Land and from other places across the web.

The post SearchCap: Bing Ads quality policy and conversion changes, Apple search ads & technical SEO appeared first on Search Engine Land.



Please visit Search Engine Land for the full article.


Search Engine Land: News & Info About SEO, PPC, SEM, Search Engines & Search Marketing


How to Find and Fix 14 Technical SEO Problems That Can Be Damaging Your Site Now

Posted by Joe.Robison

Who doesn’t love working on low-hanging fruit SEO problems that can dramatically improve your site?

Across all businesses and industries, the low-effort, high-reward projects should jump to the top of the list of things to implement. And it’s nowhere more relevant than tackling technical SEO issues on your site.

Let’s focus on easy-to-identify, straightforward-to-fix problems. Most of these issues can be uncovered in an afternoon, and it’s possible they can solve months’ worth of traffic problems. While there may not be groundbreaking, complex issues that will fix SEO once and for all, there are easy things to check right now. If your site already checks out for all of these, then you can go home today and start decrypting RankBrain tomorrow.


Real quick: The definition of technical SEO is a bit fuzzy. Does it include everything that happens on a site except for content production? Or is it just limited to code and really technical items?

I’ll define technical SEO here as aspects of a site comprising more technical problems that the average marketer wouldn’t identify and take a bit of experience to uncover. Technical SEO problems are also generally, but not always, site-wide problems rather than specific page issues. Their fixes can help improve your site as a whole, rather than just isolated pages.

You’d think that, with all the information out there on the web, many of these would be common knowledge. I’m sure my car mechanic thought the same thing when I busted my engine because I forgot to put oil in it for months. Simple oversights can destroy your machine.


The target audience for this post is beginning to intermediate SEOs and site owners that haven’t inspected their technical SEO for a while, or are doing it for the first time. If just one of these 14 technical SEO problems below is harming your site, I think you’d consider this a valuable read.

This is not a complete technical SEO audit checklist, but a summary of some of the most common and damaging technical SEO problems that you can fix now. I highlighted these based on my own real-world experience analyzing dozens of client and internal websites. Some of these issues I thought I’d never run into… until I did.

This is not a replacement for a full audit, but looking at these right now can actually save you thousands of dollars in lost sales, or worse.

1. Check indexation immediately

Have you ever heard (or asked) the question: “Why aren’t we ranking for our brand name?”

To the website owner, it’s a head-scratcher. To the seasoned SEO, it’s an eye-roll.

Can you get organic traffic to your site if it doesn’t show up in Google search? No.

I love it when complex problems are simplified at a higher level. Sergey Stefoglo at Distilled wrote an article that broke down the complex process of a technical SEO audit into two buckets: indexing and ranking.

The concept is that, instead of going crazy with a 239-point checklist with varying priorities, you sit back and ask the first question: Are the pages on our site indexing?

You can get those answers pretty quickly with a quick site search directly in Google.

What to do: Type site:{yoursitename.com} into Google search and you'll immediately see how many pages on your site are indexed.


What to ask:

  • Is that approximately the number of pages that we'd expect to be indexed?
  • Are we seeing pages in the index that we don’t want?
  • Are we missing pages in the index that we want to rank?

What to do next:

  • Go deeper and check different buckets of pages on your site, such as product pages and blog posts
  • Check subdomains to make sure they’re indexing (or not)
  • Check old versions of your site to see if they’re mistakenly being indexed instead of redirected
  • Look out for spam in case your site was hacked, going deep into the search result to look for anything uncommon (like pharmaceutical or gambling SEO site-hacking spam)
  • Figure out exactly what’s causing indexing problems.

2. Robots.txt

Perhaps the single most damaging character in all of SEO is a simple “/” improperly placed in the robots.txt file.

Everybody knows to check the robots.txt, right? Unfortunately not.

One of the biggest culprits behind ruined organic traffic is a well-meaning developer who forgot to change the robots.txt file after redeveloping your website.

You would think this would be solved by now, but I'm still repeatedly running into random sites that have their entire site blocked because of this one problem.

What to do: Go to yoursitename.com/robots.txt and make sure it doesn’t show “User-agent: * Disallow: /”.


What to do next:

  • If you see “Disallow: /”, immediately talk to your developer. There could be a good reason it’s set up that way, or it may be an oversight.
  • If you have a complex robots.txt file, like many ecommerce sites, you should review it line-by-line with your developer to make sure it’s correct.
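
If you'd rather script the check, here's a minimal Python sketch using the standard library's robots.txt parser (example.com is a placeholder; add a few URLs you genuinely care about):

from urllib.robotparser import RobotFileParser

# Programmatic version of the manual check above.
rp = RobotFileParser("https://www.example.com/robots.txt")
rp.read()

for url in ("https://www.example.com/", "https://www.example.com/products/widget"):
    for agent in ("*", "Googlebot"):
        verdict = "allowed" if rp.can_fetch(agent, url) else "BLOCKED"
        print(f"{agent:10} {verdict:8} {url}")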

3. Meta robots NOINDEX

NOINDEX can be even more damaging than a misconfigured robots.txt at times. A mistakenly configured robots.txt won’t pull your pages out of Google’s index if they’re already there, but a NOINDEX directive will remove all pages with this configuration.

Most commonly, the NOINDEX is set up when a website is in its development phase. Since so many web development projects are running behind schedule and pushed to live at the last hour, this is where the mistake can happen.

A good developer will make sure this is removed from your live site, but you must verify that’s the case.

What to do:

  • Manually do a spot-check by viewing the source code of your page and looking for a robots meta tag such as <meta name="robots" content="NOINDEX, FOLLOW"> or <meta name="robots" content="NOINDEX, NOFOLLOW">
  • 90% of the time you'll want the tag to be either "INDEX, FOLLOW" or absent entirely. If you see a NOINDEX directive, you need to take action.
  • It’s best to use a tool like Screaming Frog to scan all the pages on your site at once

What to do next:

  • If your site is constantly being updated and improved by your development team, set a reminder to check this weekly or after every new site upgrade
  • Even better, schedule site audits with an SEO auditor software tool, like the Moz Pro Site Crawl
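
To spot-check at a small scale without a crawler, a minimal Python sketch (placeholder URLs, requests library) that looks for noindex in both the robots meta tag and the X-Robots-Tag header:

import re
import requests

# Spot-check a handful of important URLs for a noindex directive, whether it arrives
# in a robots meta tag or in an X-Robots-Tag HTTP header.
pattern = re.compile(r'<meta[^>]+name=["\']robots["\'][^>]*content=["\']([^"\']*)["\']', re.I)

for url in ("https://www.example.com/", "https://www.example.com/products/"):
    resp = requests.get(url, timeout=10)
    metas = [m for m in pattern.findall(resp.text) if "noindex" in m.lower()]
    header = resp.headers.get("X-Robots-Tag", "")
    if metas or "noindex" in header.lower():
        print(f"{url} is set to noindex (meta: {metas}, header: {header!r}) -- take action")
    else:
        print(f"{url}: no noindex found")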

4. One version per URL: URL Canonicalization

The average user doesn't really care if your home page shows up as all of these separately:

  • http://example.com
  • http://www.example.com
  • https://example.com
  • https://www.example.com
  • https://www.example.com/index.html

But the search engines do, and this configuration can dilute link equity and make your work harder.

Google will generally decide which version to index, but they may index a mixed assortment of your URL versions, which can cause confusion and complexity.

Moz’s canonicalization guide sums it up perfectly:

"For SEOs, canonicalization refers to individual web pages that can be loaded from multiple URLs. This is a problem because when multiple pages have the same content but different URLs, links that are intended to go to the same page get split up among multiple URLs. This means that the popularity of the pages gets split up."

It’s likely that no one but an SEO would flag this as something to fix, but it can be an easy fix that has a huge impact on your site.

What to do:

  • Manually enter in multiple versions of your home page in the browser to see if they all resolve to the same URL
  • Look also for HTTP vs HTTPS versions of your URLs — only one should exist
  • If they don’t, you’ll want to work with your developer to set up 301 redirects to fix this
  • Use the “site:” operator in Google search to find out which versions of your pages are actually indexing

What to do next:

  • Scan your whole site at once with a scalable tool like Screaming Frog to find all pages faster
  • Set up a schedule to monitor your URL canonicalization on a weekly or monthly basis
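
A quick way to script the manual check is to request the common variants and watch where each one resolves. A minimal Python sketch, with example.com standing in for your own domain:

import requests

# Request the common homepage variants and confirm they all resolve to a single URL.
variants = (
    "http://example.com/",
    "http://www.example.com/",
    "https://example.com/",
    "https://www.example.com/",
    "https://www.example.com/index.html",
)

for url in variants:
    resp = requests.get(url, allow_redirects=True, timeout=10)
    chain = " -> ".join(str(hop.status_code) for hop in resp.history) or "no redirect"
    print(f"{url}  [{chain}]  ends at {resp.url} ({resp.status_code})")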

5. Rel=canonical

Although the rel=canonical tag is closely related to the canonicalization mentioned above, it should be noted separately because it's used for more than resolving the same version of a slightly different URL.

It’s also useful for preventing page duplication when you have similar content across different pages — often an issue with ecommerce sites and managing categories and filters.

I think the best example of using this properly is how Shopify’s platform uses rel=canonical URLs to manage their product URLs as they relate to categories. When a product is a part of multiple categories, there are as many URLs as there are categories that product is a part of.

For example, Boll & Branch is on the Shopify platform, and on their Cable Knit Blanket product page we see that from the navigation menu, the user is taken to https://www.bollandbranch.com/collections/baby-blankets/products/cable-knit-baby-blanket.

But looking at the rel=canonical, we see it’s configured to point to the main URL:

<link rel="canonical" href="https://www.bollandbranch.com/products/cable-knit-baby-blanket" />

And this is the default across all Shopify sites.

Every ecommerce and CMS platform comes with a different default setting on how they handle and implement the rel=canonical tag, so definitely look at the specifics for your platform.

What to do:

  • Spot-check important pages to see if they’re using the rel=canonical tag
  • Use a site scanning software to list out all the URLs on your site and determine if there are duplicate page problems that can be solved with a rel=canonical tag
  • Read more on the different use cases for canonical tags and when best to use them
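
To spot-check canonical tags programmatically, a minimal Python sketch (placeholder URLs; the regex is a rough check that assumes rel comes before href in the link tag, not a full parser):

import re
import requests

# Pull the rel=canonical from a few key templates.
canonical = re.compile(r'<link[^>]+rel=["\']canonical["\'][^>]*href=["\']([^"\']+)["\']', re.I)

for url in (
    "https://www.example.com/collections/baby-blankets/products/cable-knit-baby-blanket",
    "https://www.example.com/products/cable-knit-baby-blanket",
):
    html = requests.get(url, timeout=10).text
    match = canonical.search(html)
    print(url, "->", match.group(1) if match else "NO CANONICAL TAG FOUND")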

6. Text in images

Text in images — it’s such a simple concept, but out in the wild many, many sites are hiding important content behind images.

Yes, Google can somewhat understand text in images, but it's not nearly as sophisticated as we would hope in 2017. The best practice for SEO is to keep important text out of images.

Google's Gary Illyes has confirmed that it's unlikely Google's crawler can reliably recognize text within images.


CognitiveSEO ran a great test on Google’s ability to extract text from images, and there’s evidence of some stunning accuracy from Google’s technology:


Yet, the conclusion from the test is that image-to-text extraction technology is not being used for ranking search queries:


The conclusion from CognitiveSEO is that “this search was proof that the search engine does not, in fact, extract text from images to use it in its search queries. At least not as a general rule.”

And although H1 tags are not as important as they once were, it's still an on-site SEO best practice to display your most important headline prominently as real text.

This is actually most important for large sites with many, many pages, such as massive ecommerce sites, because those sites can realistically rank their product or category pages with just a simple keyword-targeted main headline and a string of text.

What to do:

  • Manually inspect the most important pages on your site, checking if you’re hiding important text in your images
  • At scale, use an SEO site crawler to scan all the pages on your site. Look for whether H1 and H2 tags are being found on pages across your site. Also look for the word count as an indication.

What to do next:

  • Create a guide for content managers and developers so that they know the best practice in your organization is to not hide text behind images
  • Collaborate with your design and development team to get the same design look that you had with text embedded in images, but using CSS instead for image overlays
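
For a rough scripted version of the headline check, a minimal Python sketch with placeholder URLs standing in for your own category and product pages:

import re
import requests

# Rough check that key templates expose their headline as crawlable text rather than an image.
for url in ("https://www.example.com/category/blankets", "https://www.example.com/products/widget"):
    html = requests.get(url, timeout=10).text
    h1 = re.findall(r"<h1[^>]*>(.*?)</h1>", html, re.S | re.I)
    h2_count = len(re.findall(r"<h2[^>]*>", html, re.I))
    headline = re.sub(r"<[^>]+>", "", h1[0]).strip() if h1 else "MISSING"
    print(f"{url} | H1: {headline!r} | H2 tags: {h2_count}")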

7. Broken backlinks

If not properly overseen by a professional SEO, a website migration or relaunch project can spew out countless broken backlinks from other websites. This is a golden opportunity for recovering link equity.

Some of the top pages on your site may have become 404 pages after a migration, so the backlinks pointing back to these 404 pages are effectively broken.

Two types of tools are great for finding broken backlinks — Google Search Console, and a backlink checker such as Moz, Majestic, or Ahrefs.

In Search Console, you'll want to review your top 404 errors; the report prioritizes the top errors by broken backlinks.

What to do:

  • After identifying your top pages with backlinks that are dead, 301 redirect these to the best pages
  • Also look for broken links caused by the linking site typing your URL wrong or messing up the link code on their end; this is another rich source of link opportunities

What to do next:

  • Use other tools such as Mention or Google Alerts to keep an eye on unlinked mentions that you can reach out to for an extra link
  • Set up a recurring site crawl or manual check to look out for new broken links
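
If you have an export from your backlink tool, a minimal Python sketch can flag which link targets are 404ing. The filename and "Target URL" column below are assumptions, so adjust them to match your export:

import csv
import requests

# Check the status of backlink targets exported from your backlink tool.
with open("backlinks.csv", newline="") as f:
    targets = {row["Target URL"] for row in csv.DictReader(f)}

for url in sorted(targets):
    status = requests.head(url, allow_redirects=False, timeout=10).status_code
    if status == 404:
        print(f"{url} returns 404; a 301 to the closest live page would reclaim this link equity")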

8. HTTPS is less optional

What was once only necessary for ecommerce sites is now becoming more of a necessity for all sites.

Google just recently announced that they would start marking any non-HTTPS site as non-secure if the site accepts passwords or credit cards:

“To help users browse the web safely, Chrome indicates connection security with an icon in the address bar. Historically, Chrome has not explicitly labelled HTTP connections as non-secure. Beginning in January 2017 (Chrome 56), we’ll mark HTTP pages that collect passwords or credit cards as non-secure, as part of a long-term plan to mark all HTTP sites as non-secure.”

What’s even more shocking is Google’s plan to label all HTTP URLs as non-secure:

“Eventually, we plan to label all HTTP pages as non-secure, and change the HTTP security indicator to the red triangle that we use for broken HTTPS.”


Going even further, it's not out of the realm of possibility that Google will start giving HTTPS sites even more of an algorithmic ranking benefit over HTTP.

It's also not unfathomable that "not secure" warnings will start showing up for sites directly in the search results, before a user even clicks through to the site. Google currently displays this for hacked sites, so there's a precedent set.

This goes beyond just SEO, as this overlaps heavily with web development, IT, and conversion rate optimization.

What to do:

  • If your site currently has HTTPS deployed, run your site through Screaming Frog to see how the pages are resolving
  • Ensure that all pages are resolving to the HTTPS version of the site (same as URL canonicalization mentioned earlier)

What to do next:

  • If your site is not on HTTPS, start mapping out the transition, as Google has made it clear how important it is to them
  • Properly manage a transition to HTTPS by enlisting an SEO migration strategy so as not to lose rankings
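
To verify the HTTPS resolution at a small scale, a minimal Python sketch with placeholder URLs; in practice, feed in a sample from your crawl:

import requests

# Confirm that HTTP versions of a URL sample 301 to their HTTPS equivalents.
sample = (
    "http://www.example.com/",
    "http://www.example.com/blog/",
    "http://www.example.com/products/widget",
)

for url in sample:
    resp = requests.get(url, allow_redirects=True, timeout=10)
    first_hop = resp.history[0].status_code if resp.history else None
    ok = resp.url.startswith("https://") and first_hop == 301
    print(f"{url} -> {resp.url} (first hop: {first_hop}) {'OK' if ok else 'CHECK THIS'}")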

9. 301 & 302 redirects

Redirects are an amazing tool in an SEO’s arsenal for managing and controlling dead pages, for consolidating multiple pages, and for making website migrations work without a hitch.

301 redirects are permanent and 302 redirects are temporary. The best practice is to always use 301 redirects when permanently redirecting a page.

301 redirects can be confusing for those new to SEO trying to properly use them:

  • Should you use them for all 404 errors? (Not always.)
  • Should you use them instead of the rel=canonical tag? (Sometimes, not always.)
  • Should you redirect all the old URLs from your previous site to the home page? (Almost never, it’s a terrible idea.)

They're a lifesaver when used properly, but a pain when you have no idea what to do with them.

With great power comes great responsibility, and it’s vitally important to have someone on your team who really understands how to properly strategize the usage and implementation of 301 redirects across your whole site. I’ve seen sites lose up to 60% of their revenue for months, just because these were not properly implemented during a site relaunch.

Despite some recent statements that 302 redirects pass authority as efficiently as 301s, it's not advisable to rely on 302s for permanent moves. Recent studies have tested this and shown that 301s are the gold standard, and Mike King's striking example shows that the power of 301s over 302s remains.

What to do:

  • Do a full review of all the URLs on your site and look at redirect usage at a high level
  • If using 302 redirects incorrectly for permanent redirects, change these to 301 redirects
  • Don't go redirect-crazy on all 404 errors; use them only for pages receiving links or traffic, to keep your redirects list manageable

What to do next:

  • If using 302 redirects, discuss with your development team why your site is using them
  • Build out a guide for your organization on the importance of using 301s over 302s
  • Review the redirects implementation from your last major site redesign or migration; there are often tons of errors
  • Never redirect all the pages from an old site to the home page unless there’s a really good reason
  • Include redirect checking in your monthly or weekly site scan process
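
A minimal Python sketch for flagging temporary redirects across a URL list so they can be reviewed against the guidance above (urls.txt is a hypothetical file with one URL per line):

import requests

# Flag temporary redirects that may need to become 301s.
with open("urls.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

for url in urls:
    resp = requests.head(url, allow_redirects=False, timeout=10)
    if resp.status_code in (302, 303, 307):
        print(f"{url} answers with a temporary {resp.status_code}; confirm it shouldn't be a 301")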

10. Meta refresh

I thought meta refreshes were gone for good and would never be a problem, until they were. I ran into a client using them on their brand-new, modern site when migrating from an old platform, and I quickly recommended that we turn these off and use 301 redirects instead.

The meta refresh is a client-side (as opposed to server-side) redirect and is not recommended by Google or professional SEOs.

If implemented, it looks something like this:

<meta http-equiv="refresh" content="0; url=http://www.example.com/" />

It’s a fairly simple one to check — either you have it or you don’t, and by and large there’s no debate that you shouldn’t be using these.

Google’s John Mu said:

“I would strongly recommend not using meta refresh-type or JavaScript redirects like that if you have changed your URLs. Instead of using those kinds of redirects, try to have your server do a normal 301 redirect. Search engines might recognize the JavaScript or meta refresh-type redirects, but that’s not something I would count on — a clear 301 redirect is always much better.”

And Moz’s own redirection guide states:

“They are most commonly associated with a five-second countdown with the text ‘If you are not redirected in five seconds, click here.’ Meta refreshes do pass some link juice, but are not recommended as an SEO tactic due to poor usability and the loss of link juice passed.”

What to do:

  • View the source of your key pages (or run a site crawl) and search for any meta refresh tags
  • Replace any meta refreshes used as redirects with server-side 301 redirects

What to do next:

  • Communicate to your developers the importance of using 301 redirects as a standard and never using meta refreshes unless there’s a really good reason
  • Schedule a monthly check to monitor redirect type usage
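
A minimal Python sketch for flagging meta refreshes on a few key pages (placeholder URLs):

import re
import requests

# Flag any meta refresh on a handful of key pages.
refresh = re.compile(r'<meta[^>]+http-equiv=["\']refresh["\'][^>]*>', re.I)

for url in ("https://www.example.com/", "https://www.example.com/old-page"):
    html = requests.get(url, timeout=10).text
    if refresh.search(html):
        print(f"{url} uses a meta refresh; replace it with a server-side 301")
    else:
        print(f"{url}: no meta refresh found")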

11. XML sitemaps

XML sitemaps help Google and other search engine spiders crawl and understand your site. Most often they have the biggest impact for large and complex sites that need to give extra direction to the crawlers.

Google’s Search Console Help Guide is quite clear on the purpose and helpfulness of XML sitemaps:

“If your site’s pages are properly linked, our web crawlers can usually discover most of your site. Even so, a sitemap can improve the crawling of your site, particularly if your site meets one of the following criteria:

- Your site is really large.

- Your site has a large archive of content pages that are isolated or not well linked to each other.

- Your site is new and has few external links to it.”

A few of the biggest problems I’ve seen with XML sitemaps while working on clients’ sites:

  • Not creating it in the first place
  • Not including the location of the sitemap in the robots.txt
  • Allowing multiple versions of the sitemap to exist
  • Allowing old versions of the sitemap to exist
  • Not keeping Search Console updated with the freshest copy
  • Not using sitemap indexes for large sites

What to do:

  • Review the list above and make sure you're not making any of these mistakes
  • Check the number of URLs submitted and indexed from your sitemap within Search Console to get an idea of the quality of your sitemap and URLs

What to do next:

  • Monitor indexation of URLs submitted in XML sitemap frequently from within Search Console
  • If your site grows more complex, investigate ways to use XML sitemaps and sitemap indexes to your advantage, as Google limits each sitemap to 10MB and 50,000 URLs
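
A minimal Python sketch for sanity-checking a sitemap's size and a sample of its URLs. The sitemap location is a placeholder, and a sitemap index would need one extra level of parsing:

import requests
import xml.etree.ElementTree as ET

# Pull the sitemap, count its URLs, and spot-check a sample for non-200 responses.
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
xml_bytes = requests.get("https://www.example.com/sitemap.xml", timeout=10).content
locs = [loc.text for loc in ET.fromstring(xml_bytes).findall(".//sm:loc", ns)]

print(f"{len(locs)} URLs listed (the per-file limit is 50,000)")
for url in locs[:25]:
    status = requests.head(url, allow_redirects=False, timeout=10).status_code
    if status != 200:
        print(f"{url} returns {status}; it probably shouldn't be in the sitemap")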

12. Unnatural word count & page size

I recently ran into this issue while reviewing a site: most pages on the site didn't have more than a few hundred words, but a scan of the site using Screaming Frog showed nearly every page having 6,000–9,000 words.

It made no sense. But upon viewing the source code, I saw that there was some Terms and Conditions text that was meant to be displayed on only a single page, but it was embedded on every page of the site with a "display: none;" CSS style.

This can slow down the load speed of your page and could possibly trigger some penalty issues if seen as intentional cloaking.

In addition to word count, there can be other code bloat on the page, such as inline Javascript and CSS. Although fixing these problems would fall under the purview of the development team, you shouldn’t rely on the developers to be proactive in identifying these types of issues.

What to do:

  • Scan your site and compare calculated word count and page size with what you expect
  • Review the source code of your pages and recommend areas to reduce bloat
  • Ensure that there’s no hidden text that can trip algorithmic penalties

What to do next:

  • There could be a good reason for hidden text in the source code from a developer’s perspective, but it can cause speed and other SEO issues if not fixed.
  • Review page size and word count across all URLs on your site periodically to keep tabs on any issues
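
A minimal Python sketch for keeping tabs on page size, rough word count, and inline display:none usage (placeholder URLs); compare the numbers against what you'd expect for each template:

import re
import requests

# Compare raw page size, rough visible word count, and inline display:none blocks.
for url in ("https://www.example.com/", "https://www.example.com/products/widget"):
    html = requests.get(url, timeout=10).text
    text = re.sub(r"<script.*?</script>|<style.*?</style>|<[^>]+>", " ", html, flags=re.S | re.I)
    hidden = len(re.findall(r'style=["\'][^"\']*display:\s*none', html, re.I))
    print(f"{url}: {len(html) / 1024:.0f} KB of HTML, ~{len(text.split())} words, "
          f"{hidden} inline display:none attributes")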

13. Speed

You’ve heard it a million times, but speed is key — and definitely falls under the purview of technical SEO.

Google has clearly stated that speed is a small part of the algorithm:

“Like us, our users place a lot of value in speed — that’s why we’ve decided to take site speed into account in our search rankings. We use a variety of sources to determine the speed of a site relative to other sites.”

Even with this clear SEO directive, and obvious UX and CRO benefits, speed is at the bottom of the priority list for many site managers. With mobile search clearly cemented as just as important as desktop search, speed is even more important and can no longer be ignored.

In his awesome Technical SEO Renaissance post, Mike King said speed is the most important thing to focus on in 2017 for SEO:

“I feel like Google believes they are in a good place with links and content so they will continue to push for speed and mobile-friendliness. So the best technical SEO tactic right now is making your site faster.”

Moz’s page speed guide is a great resource for identifying and fixing speed issues on your site.

What to do:

  • Audit your site speed and page speed using SEO auditing tools
  • Unless you’re operating a smaller site, you’ll want to work closely with your developer on this one. Make your site as fast as possible.
  • Continuously push for resources to focus on site speed across your organization.
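
For a very rough scripted baseline, a minimal Python sketch that samples server response times (placeholder URLs). It measures only how quickly the HTML comes back, so it's no substitute for a proper page speed audit, just a quick way to spot slow templates:

import requests

# Crude server response-time sample.
for url in ("https://www.example.com/", "https://www.example.com/blog/"):
    timings = [requests.get(url, timeout=30).elapsed.total_seconds() for _ in range(3)]
    print(f"{url}: average {sum(timings) / len(timings):.2f}s over {len(timings)} requests")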

14. Internal linking structure

Your internal linking structure can have a huge impact on your site’s crawlability from search spiders.

Where does it fall on your list of priorities? It depends. If you’re optimizing a massive site with isolated pages that don’t fall within a clean site architecture a few clicks from the home page, you’ll need to put a lot of effort into it. If you’re managing a simple site on a standard platform like WordPress, it’s not going to be at the top of your list.

You want to think about these things when building out your internal linking plan:

  • Scalable internal linking with plugins
  • Using optimized anchor text without over-optimizing
  • How internal linking relates to your main site navigation

I built out this map of a fictional site to demonstrate how different pages on a site can connect to each other through both navigational site links and internal links:

Website navigation with internal links diagram.

Source: Green Flag Digital

Even with a rock-solid site architecture, putting a focus on internal links can push some sites higher up the search rankings.

What to do:

  • Test out manually how you can move around your site by clicking on in-content, editorial-type links on your blog posts, product pages, and important site pages. Note where you see opportunity.
  • Use site auditor tools to find and organize the pages on your site by internal link count. Are your most important pages receiving sufficient internal links?

What to do next:

  • Even if you build out the perfect site architecture, there’s more opportunity for internal link flow — so always keep internal linking in mind when producing new pages
  • Train content creators and page publishers on the importance of internal linking and how to implement links effectively.
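
A minimal Python sketch of a tiny, capped crawl that tallies how many internal links point at each page, so you can see whether important pages are being starved. The start URL is a placeholder, and the crawl makes no attempt to skip non-HTML resources:

from collections import Counter, deque
from urllib.parse import urljoin, urlparse
import re
import requests

# Tally inbound internal links per page over a small sample of the site.
start = "https://www.example.com/"
domain = urlparse(start).netloc
inbound, seen, queue = Counter(), {start}, deque([start])

while queue and len(seen) <= 200:
    page = queue.popleft()
    try:
        html = requests.get(page, timeout=10).text
    except requests.RequestException:
        continue
    for href in re.findall(r'<a[^>]+href=["\']([^"\'#]+)', html, re.I):
        link = urljoin(page, href).split("?")[0]
        if urlparse(link).netloc != domain:
            continue
        inbound[link] += 1
        if link not in seen:
            seen.add(link)
            queue.append(link)

for page, count in inbound.most_common(20):
    print(count, page)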

Conclusion

Here’s a newsflash for site owners: It’s very likely that your developer is not monitoring and fixing your technical SEO problems, and doesn’t really care about traffic to your site or fixing your SEO issues. So if you don’t have an SEO helping you with technical issues, don’t assume your developer is handling it. They have enough on their plate and they’re not incentivized to fix SEO problems.

I've run into many technical SEO issues during and after website migrations when not properly managed with SEO in mind. I'm compelled to highlight the disasters that can go wrong if this isn't looked after closely by an expert. Case studies of site migrations gone terribly wrong are a topic for another day, but I implore you to take technical SEO seriously for the benefit of your company.

Hopefully this post has helped clarify some of the most important technical SEO issues that may be harming your site today and how to start fixing them. For those who have never taken a look at the technical side of things, some of these really are easy fixes and can have a hugely positive impact on your site.



Moz Blog


Automating Technical Reporting for SEO

Posted by petewailes

As the web gets more complex, with JavaScript framework and library front ends on websites, progressive web apps, single-page apps, JSON-LD, and so on, we’re increasingly seeing an ever-greater surface area for things to go wrong. When all you’ve got is HTML and CSS and links, there’s only so much you can mess up. However, in today’s world of dynamically generated websites with universal JS interfaces, there’s a lot of room for errors to creep in.

The second problem we face with much of this is that it’s hard to know when something’s gone wrong, or when Google’s changed how they’re handling something. This is only compounded when you account for situations like site migrations or redesigns, where you might suddenly archive a lot of old content, or re-map a URL structure. How do we address these challenges then?

The old way

Historically, the way you’d analyze things like this is through looking at your log files using Excel or, if you’re hardcore, Log Parser. Those are great, but they require you to know you’ve got an issue, or that you’re looking and happen to grab a section of logs that have the issues you need to address in them. Not impossible, and we’ve written about doing this fairly extensively both in our blog and our log file analysis guide.

The problem with this, though, is fairly obvious. It requires that you look, rather than making you aware that there’s something to look for. With that in mind, I thought I’d spend some time investigating whether there’s something that could be done to make the whole process take less time and act as an early warning system.

A helping hand

The first thing we need to do is to set our server to send log files somewhere. My standard solution to this has become using log rotation. Depending on your server, you’ll use different methods to achieve this, but on Nginx it looks like this:

# time_iso8601 looks like this: 2016-08-10T14:53:00+01:00
if ($time_iso8601 ~ "^(\d{4})-(\d{2})-(\d{2})") {
    set $year $1;
    set $month $2;
    set $day $3;
}

access_log /var/log/nginx/$year-$month-$day-access.log;

This allows you to view logs for any specific date or set of dates by simply pulling the data from files relating to that period. Having set up log rotation, we can then set up a script, which we’ll run at midnight using Cron, to pull the log file that relates to yesterday’s data and analyze it. Should you want to, you can look several times a day, or once a week, or at whatever interval best suits your level of data volume.

The next question is: What would we want to look for? Well, once we’ve got the logs for the day, this is what I get my system to report on:

30* status codes

Generate a list of all pages hit by users that resulted in a redirection. If the page linking to that resource is on your site, redirect it to the actual end point. Otherwise, get in touch with whomever is linking to you and get them to sort the link to where it should go.

404 status codes

Similar story. Any 404ing resources should be checked to make sure they’re supposed to be missing. Anything that should be there can be investigated for why it’s not resolving, and links to anything actually missing can be treated in the same way as a 301/302 code.

50* status codes

Something bad has happened and you’re not going to have a good day if you’re seeing many 50* codes. Your server is dying on requests to specific resources, or possibly your entire site, depending on exactly how bad this is.

Crawl budget

A list of every resource Google crawled, how many times it was requested, how many bytes were transferred, and time taken to resolve those requests. Compare this with your site map to find pages that Google won’t crawl, or that it’s hammering, and fix as needed.

Top/least-requested resources

Similar to the above, but detailing the most and least requested things by search engines.

Bad actors

Many bots looking for vulnerabilities will make requests to things like wp_admin, wp_login, 404s, config.php, and other similar common resource URLs. Any IP address that makes repeated requests to these sorts of URLs can be added automatically to an IP blacklist.

Pattern-matched URL reporting

It’s simple to use regex to match requested URLs against pre-defined patterns, to report on specific areas of your site or types of pages. For example, you could report on image requests, Javascript files being called, pagination, form submissions (via looking for POST requests), escaped fragments, query parameters, or virtually anything else. Provided it’s in a URL or HTTP request, you can set it up as a segment to be reported on.

Spiky search crawl behavior

Log the number of requests made by Googlebot every day. If it increases by more than x%, that’s something of interest. As a side note, with most number series, a calculation to spot extreme outliers isn’t hard to create, and is probably worth your time.
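
To make this concrete, here's a minimal Python sketch of that kind of daily report, assuming the combined log format and the access_log naming pattern from the Nginx snippet above. Anything that spikes here, such as a flood of 5xx codes or a persistent bad-actor IP, is exactly what you'd wire up to the email alerts described below.

import re
from collections import Counter
from datetime import date, timedelta

# Minimal daily report over yesterday's rotated access log.
yesterday = date.today() - timedelta(days=1)
logfile = f"/var/log/nginx/{yesterday:%Y-%m-%d}-access.log"

line_re = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:\S+) (\S+) [^"]*" (\d{3}) (?:\d+|-) "[^"]*" "([^"]*)"')
suspect = ("wp-login", "wp-admin", "xmlrpc", "config.php")
statuses, crawled, bad_actors = Counter(), Counter(), Counter()

with open(logfile) as f:
    for line in f:
        match = line_re.match(line)
        if not match:
            continue
        ip, path, status, agent = match.groups()
        statuses[status[0] + "xx"] += 1
        if "Googlebot" in agent:
            crawled[path] += 1
        if any(s in path for s in suspect):
            bad_actors[ip] += 1

print("Status code mix:", dict(statuses))
print("Most-crawled by Googlebot:", crawled.most_common(10))
print("Possible bad actors:", bad_actors.most_common(10))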

Outputting data

Depending on the importance of any particular section, you can then set the data up to be logged in a couple of ways. Firstly, large numbers of 40* and 50* status codes or bad-actor requests would be worth triggering an email for. This can let you know in a hurry if something's happening which potentially indicates a large issue. You can then get on top of whatever that may be and resolve it as a matter of priority.

The data as a whole can also be set up to be reported on via a dashboard. If you don’t have that much data in your logs on a daily basis, you may simply want to query the files at runtime and generate the report fresh each time you view it. On the other hand, sites with a lot of traffic and thus larger log files may want to cache the output of each day to a separate file, so the data doesn’t have to be computed. Obviously the type of approach you use to do that depends a lot on the scale you’ll be operating at and how powerful your server hardware is.

Conclusion

Thanks to server logs and basic scripting, there's no reason you should ever have a situation where something's amiss on your site and you don't know about it. Proactive notification of technical issues is a necessity in a world where Google crawls at an ever-faster rate, meaning that they could start pulling your rankings down thanks to site downtime or errors within a matter of hours.

Set up proper monitoring and make sure you’re not caught short!



Moz Blog


Struggling to Write for Technical Experts? Try These 3 Powerful Content Marketing Practices


Engineers and other technical experts take to the web to educate themselves on their options now more than ever before.

When sifting through online content, engineers and other experts in their fields want facts, not a hard sell. They’re conducting serious research.

In fact, according to a study by CEB in partnership with Google, 57 percent of the B2B purchasing process has been completed by the time someone contacts a salesperson.

So, as content marketers, we need to give them the information they need to make smart purchasing decisions. But engineers have already studied for years to accrue their subject matter expertise. Can marketers actually talk intelligently to them online?

A marketer’s challenge lies in extracting the best information and translating it into relatable content, while not sacrificing accuracy in the process.

Journalists like Ed Yong and Carl Zimmer bring cutting-edge science to the masses on a regular basis, and content marketers can follow their lead.

Here are three content marketing tips that non-experts can use when writing about technical subjects.

1. Gather facts from experts

When you interview experts within your clients’ companies and mine their heads for their hard-earned knowledge, you’ll find that many of them love to be asked about their fields.

It’s not every day a layperson asks a metallurgist about induction furnaces or an architect about designing aircraft hangars.

Fair warning: At first these interviews might be overwhelming or intimidating. Marketers often feel afraid to ask “dumb” questions.

However, a state of “non-knowledge” is a great place to start. Admit to your expert that you’re not too familiar with their topic, and they’ll realize that they need to start at the beginning.

Ask them to use long-form phrases instead of acronyms, and never let them gloss over something you don’t understand just to keep the conversation flowing. Asking for clarification shows how closely you’re following along.

Pro tip

Get approval from a company’s communications or marketing department before talking to any technical experts. You want to make sure they don’t divulge any proprietary or protected technical information that could get them in trouble.

Once you’ve written your content, have the company’s legal team review and approve any information that will be published outside the company.

2. Supplement your interviews with your own research

Read what’s already been written so that your target audience doesn’t have to, and synthesize that content in a way that is straightforward and easy to understand.

Google Scholar and government websites are resources you could use to conduct your own research.

For example, if new EPA rules affect how your client engineers their generators, go right to the source. Government agencies will have published those rules, so familiarize yourself with them and learn how they’ll affect your client’s customers.

3. Clarify and satisfy

It’s a content marketer’s job to simplify ideas so that they’re accessible, but not so much that they’re inaccurate.

Metaphors and storytelling are great techniques to incorporate into your content.

Can you make a connection between something complicated and something that’s encountered by most people on a regular basis?

You also want to consider the different types of people who may read your content. Will highly technical engineering content resonate with your target audience? Or do you need to produce content for a decision-maker who’s considering how an investment will affect the bottom line or deliver ROI?

If both types of people are part of your audience, consider how your content marketing strategy can satisfy both perspectives.

The power of education

When writing online content for a technical audience, it’s imperative to keep your overall goal in mind.

You want to cultivate trust by providing education — not by being a pushy salesperson.

B2B content marketing that informs helps marketers give technical audiences the content they’re looking for. And they’re likely to remember where they got that help when it comes time to buy.

Do you write for a technical audience? Share your methods in the comments below.

The post Struggling to Write for Technical Experts? Try These 3 Powerful Content Marketing Practices appeared first on Copyblogger.


Copyblogger


The Technical SEO Renaissance: The Whys and Hows of SEO’s Forgotten Role in the Mechanics of the Web

Posted by iPullRank

Web technologies and their adoption are advancing at a frenetic pace. Content is a game that every type of team and agency plays, so we’re all competing for a piece of that pie. Meanwhile, technical SEO is more complicated and more important than ever before and much of the SEO discussion has shied away from its growing technical components in favor of content marketing.

As a result, SEO is going through a renaissance wherein the technical components are coming back to the forefront and we need to be prepared. At the same time, a number of thought leaders have made statements that modern SEO is not technical. These statements misrepresent the opportunities and problems that have sprouted on the backs of newer technologies. They also contribute to an ever-growing technical knowledge gap within SEO as a marketing field and make it difficult for many SEOs to solve our new problems.

That resulting knowledge gap that's been growing for the past couple of years influenced me to, for the first time, "tour" a presentation. I'd been giving my Technical SEO Renaissance talk in one form or another since January because I thought it was important to stoke a conversation around the fact that things have shifted and many organizations and websites may be behind the curve if they don't account for these shifts. A number of things have happened that prove I've been on the right track since I began giving this presentation, so I figured it's worth bringing the discussion here to continue it. Shall we?

An abridged history of SEO (according to me)

It’s interesting to think that the technical SEO has become a dying breed in recent years. There was a time when it was a prerequisite.


Personally, I started working on the web in 1995 as a high school intern at Microsoft. My title, like everyone else who worked on the web then, was “webmaster.” This was well before the web profession splintered into myriad disciplines. There was no Front End vs. Backend. There was no DevOps or UX person. You were just a Webmaster.

Back then, before Yahoo, AltaVista, Lycos, Excite, and WebCrawler entered their heyday, we discovered the web by clicking linkrolls, using Gopher, Usenet, IRC, from magazines, and via email. Around the same time, IE and Netscape were engaged in the Browser Wars and you had more than one client-side scripting language to choose from. Frames were the rage.

Then the search engines showed up. Truthfully, at this time, I didn’t really think about how search engines worked. I just knew Lycos gave me what I believed to be the most trustworthy results to my queries. At that point, I had no idea that there was this underworld of people manipulating these portals into doing their bidding.

Enter SEO.


SEO was born of a cross-section of these webmasters, the subset of computer scientists that understood the otherwise esoteric field of information retrieval and those “Get Rich Quick on the Internet” folks. These Internet puppeteers were essentially magicians who traded tips and tricks in the almost dark corners of the web. They were basically nerds wringing dollars out of search engines through keyword stuffing, content spinning, and cloaking.

Then Google showed up to the party.


Early Google updates started the cat-and-mouse game that would shorten some perpetual vacations. To condense the last 15 years of search engine history into a short paragraph, Google changed the game from being about content pollution and link manipulation through a series of updates starting with Florida and more recently Panda and Penguin. After subsequent refinements of Panda and Penguin, the face of the SEO industry changed pretty dramatically. Many of the most arrogant “I can rank anything” SEOs turned white hat, started software companies, or cut their losses and did something else. That’s not to say that hacks and spam links don’t still work, because they certainly often do. Rather, Google’s sophistication finally discouraged a lot of people who no longer have the stomach for the roller coaster.

Simultaneously, people started to come into SEO from different disciplines. Well, people always came into SEO from very different professional histories, but it started to attract a lot more actual "marketing" people. This makes a lot of sense because SEO as an industry has shifted heavily into a content marketing focus. After all, we've got to get those links somehow, right?


Naturally, this begat a lot of marketers marketing to marketers about marketing who made statements like “Modern SEO Requires Almost No Technical Expertise.”

Or one of my favorites, that may have attracted even more ire: “SEO is Makeup.”


While I, naturally, disagree with these statements, I understand why these folks would contribute these ideas in their thought leadership. Irrespective of the fact that I’ve worked with both gentlemen in the past in some capacity and know their predispositions towards content, the core point they’re making is that many modern Content Management Systems do account for many of our time-honored SEO best practices. Google is pretty good at understanding what you’re talking about in your content. Ultimately, your organization’s focus needs to be on making something meaningful for your user base so you can deliver competitive marketing.

If you remember the last time I tried to make the case for a paradigm shift in the SEO space, you’d be right in thinking that I agree with that idea fundamentally. However, not at the cost of ignoring the fact that the technical landscape has changed. Technical SEO is the price of admission. Or, to quote Adam Audette, “SEO should be invisible,” not makeup.

Changes in web technology are causing a technical renaissance

In SEO, we often criticize developers for always wanting to deploy the new shiny thing. Moving forward, it’s important that we understand the new shiny things so we can be more effective in optimizing them.

SEO has always had a healthy fear of JavaScript, and with good reason. Despite the fact that search engines have had the technology to crawl the web the same way we see it in a browser for at least 10 years, it has always been a crapshoot as to whether that content actually gets crawled and, more importantly, indexed.

When we’d initially examined the idea of headless browsing in 2011, the collective response was that the computational expense prohibited it at scale. But it seems that even if that is the case, Google believes enough of the web is rendered using JavaScript that it’s a worthy investment.

Over time more and more folks would examine this idea; ultimately, a comment from this ex-Googler on Hacker News would indicate that this has long been something Google understood needed conquering:

This was actually my primary role at Google from 2006 to 2010.

One of my first test cases was a certain date range of the Wall Street Journal’s archives of their Chinese language pages, where all of the actual text was in a JavaScript string literal, and before my changes, Google thought all of these pages had identical content… just the navigation boilerplate. Since the WSJ didn’t do this for its English language pages, my best guess is that they weren’t trying to hide content from search engines, but rather trying to work around some old browser bug that incorrectly rendered (or made ugly) Chinese text, but somehow rendering text via JavaScript avoided the bug.

The really interesting parts were (1) trying to make sure that rendering was deterministic (so that identical pages always looked identical to Google for duplicate elimination purposes) (2) detecting when we deviated significantly from real browser behavior (so we didn’t generate too many nonsense URLs for the crawler or too many bogus redirects), and (3) making the emulated browser look a bit like IE and Firefox (and later Chrome) at the some time, so we didn’t get tons of pages that said “come back using IE” er “please download Firefox”.

I ended up modifying SpiderMonkey’s bytecode dispatch to help detect when the simulated browser had gone off into the weeds and was likely generating nonsense.

I went through a lot of trouble figuring out the order that different JavaScript events were fired off in IE, FireFox, and Chrome. It turns out that some pages actually fire off events in different orders between a freshly loaded page and a page if you hit the refresh button. (This is when I learned about holding down shift while hitting the browser’s reload button to make it act like it was a fresh page fetch.)

At some point, some SEO figured out that random() was always returning 0.5. I’m not sure if anyone figured out that JavaScript always saw the date as sometime in the Summer of 2006, but I presume that has changed. I hope they now set the random seed and the date using a keyed cryptographic hash of all of the loaded javascript and page text, so it’s deterministic but very difficult to game. (You can make the date determistic for a month and dates of different pages jump forward at different times by adding an HMAC of page content (mod number of seconds in a month) to the current time, rounding down that time to a month boundary, and then subtracting back the value you added earlier. This prevents excessive index churn from switching all dates at once, and yet gives each page a unique date.)

Now, consider the JavaScript usage statistics across the web from BuiltWith.

JavaScript is obviously here to stay. Most of the web is using it to render content in some form or another. This means there’s potential for search quality to plummet over time if Google couldn’t make sense of what content is on pages rendered with JavaScript.

Additionally, Google’s own JavaScript MVW framework, AngularJS, has seen pretty strong adoption as of late. When I attended Google’s I/O conference a few months ago, the recent advancements of Progressive Web Apps and Firebase were being harped upon due to the speed and flexibility they bring to the web. You can only expect that developers will make a stronger push.


Sadly, despite BuiltVisible’s fantastic contributions to the subject, there hasn’t been enough discussion around Progressive Web Apps, Single-Page Applications, and JavaScript frameworks in the SEO space. Instead, there are arguments about 301s vs 302s. Perhaps the latest spike in adoption and the proliferation of PWAs, SPAs, and JS frameworks across different verticals will change that. At iPullRank, we’ve worked with a number of companies who have made the switch to Angular; there’s a lot worth discussing on this specific topic.

Additionally, Facebook’s contribution to the JavaScript MVW frameworks, React, is being adopted for the very similar speed and benefits of flexibility in the development process.

However, regarding SEO, the key difference between Angular and React is that, from the beginning, React had a renderToString function built in which allows the content to render properly from the server side. This makes the question of indexation of React pages rather trivial.

AngularJS 1.x, on the other hand, has birthed an SEO best practice wherein you pre-render pages using a headless browser-driven snapshot service such as Prerender.io, Brombone, etc. This is somewhat ironic, as AngularJS is Google's own product. More on that later.

View Source is dead

As a result of the adoption of these JavaScript frameworks, using View Source to examine the code of a website is an obsolete practice. What you’re seeing in View Source is not the computed Document Object Model (DOM). Rather, you’re seeing the code before it’s processed by the browser. The lack of understanding around why you might need to view a page’s code differently is another instance where having a more detailed understanding of the technical components of how the web works is more effective.

Depending on how the page is coded, you may see variables in the place of actual content, or you may not see the completed DOM tree that’s there once the page has loaded completely. This is the fundamental reason why, as soon as an SEO hears that there’s JavaScript on the page, the recommendation is to make sure all content is visible without JavaScript.

To illustrate the point further, consider the View Source view of Seamless.com. If you look for the meta description or the rel-canonical on this page, you'll find variables in the place of the actual copy.

If instead you look at the code in the Elements section of Chrome DevTools or Inspect Element in other browsers, you'll find the fully executed DOM. You'll see the variables are now filled in with copy: the URL for the rel-canonical is on the page, as is the meta description.

Since search engines are crawling this way, you may be missing out on the complete story of what’s going on if you default to just using View Source to examine the code of the site.
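
A minimal Python sketch of the comparison, assuming the requests library plus Selenium with a local Chrome/chromedriver install; the URL is a placeholder for a JavaScript-heavy page:

import requests
from selenium import webdriver

# Compare the raw HTML (what View Source shows) with the rendered DOM after JavaScript runs.
url = "https://www.example.com/some-js-heavy-page"

raw = requests.get(url, timeout=10).text
driver = webdriver.Chrome()
driver.get(url)
rendered = driver.page_source
driver.quit()

for label, html in (("raw HTML", raw), ("rendered DOM", rendered)):
    has_canonical = 'rel="canonical"' in html
    print(f"{label}: {len(html)} bytes, canonical {'present' if has_canonical else 'missing'}")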

HTTP/2 is on the way

One of Google’s largest points of emphasis is page speed. An understanding of how networking impacts page speed is definitely a must-have to be an effective SEO.

Before HTTP/2 was announced, the HyperText Transfer Protocol specification had not been updated in a very long time. In fact, we’ve been using HTTP/1.1 since 1999. HTTP/2 is a large departure from HTTP/1.1, and I encourage you to read up on it, as it will make a dramatic contribution to the speed of the web.


Quickly though, one of the biggest differences is that HTTP/2 will make use of one TCP (Transmission Control Protocol) connection per origin and "multiplex" the stream. If you've ever taken a look at the issues that Google PageSpeed Insights highlights, you'll notice that one of the primary things that always comes up is limiting the number of HTTP requests. This is what multiplexing helps eliminate; HTTP/2 opens up one connection to each server, pushing assets across it at the same time, often making determinations of required resources based on the initial resource. With browsers requiring Transport Layer Security (TLS) to leverage HTTP/2, it's very likely that Google will make some sort of push in the near future to get websites to adopt it. After all, speed and security have been common threads throughout everything in the past five years.


As of late, more hosting providers have been highlighting the fact that they are making HTTP/2 available, which is probably why there’s been a significant jump in its usage this year. The beauty of HTTP/2 is that most browsers already support it and you don’t have to do much to enable it unless your site is not secure.


Definitely keep HTTP/2 on your radar, as it may be the culmination of what Google has been pushing for.

SEO tools are lagging behind search engines

When I think critically about this, SEO tools have always lagged behind the capabilities of search engines. That's to be expected, though, because SEO tools are built by smaller teams and the most important things must be prioritized. A lack of technical understanding may lead you to believe the information from the tools you use even when it's inaccurate.

When you review some of Google’s own documentation, you’ll find that some of my favorite tools are not in line with Google’s specifications. For instance, Google allows you to specify hreflang, rel-canonical, and x-robots in HTTP headers. There’s a huge lack of consistency in SEO tools’ ability to check for those directives.

It’s possible that you’ve performed an audit of a site and found it difficult to determine why a page has fallen out of the index. It very well could be because a developer was following Google’s documentation and specifying a directive in an HTTP header, but your SEO tool did not surface it. In fact, it’s generally better to set these at the HTTP header level than to add bytes to your download time by filling up every page’s <head> with them.
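
A minimal Python sketch for surfacing header-level directives (placeholder URL); a canonical or hreflang set at this level arrives in the Link header rather than in the page's code:

import requests

# Check for directives delivered in HTTP headers rather than the <head>.
resp = requests.head("https://www.example.com/some-page", allow_redirects=False, timeout=10)

for header in ("X-Robots-Tag", "Link"):
    if header in resp.headers:
        print(f"{header}: {resp.headers[header]}")
# Example of what header-level canonical and hreflang annotations look like:
#   Link: <https://www.example.com/some-page>; rel="canonical"
#   Link: <https://www.example.com/de/some-page>; rel="alternate"; hreflang="de"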

Google is crawling headless, despite the computational expense, because they recognize that so much of the web is being transformed by JavaScript. Recently, Screaming Frog made the shift to render the entire page using JS.

To my knowledge, none of the other crawling tools are doing this yet. I do recognize the fact that it would be considerably more expensive for all SEO tools to make this shift because cloud server usage is time-based and it takes significantly more time to render a page in a browser than to just download the main HTML file. How much time?

A ton more time, actually. I just wrote a simple script that just loads the HTML using both cURL and HorsemanJS. cURL took an average of 5.25 milliseconds to download the HTML of the Yahoo homepage. HorsemanJS, on the other hand, took an average of 25,839.25 milliseconds or roughly 26 seconds to render the page. It’s the difference between crawling 686,000 URLs an hour and 138.

Ideally, SEO tools would extract the technologies in use on the site or perform some sort of DIFF operation on a few pages and then offer the option to crawl headless if it’s deemed worthwhile.

Finally, Google’s specs on mobile also say that you can use client-side redirects. I’m not aware of a tool that tracks this. Now, I’m not saying leveraging JavaScript redirects for mobile is the way you should do it. Rather that Google allows it, so we should be able to inspect it easily.

Luckily, until SEO tools catch up, Chrome DevTools does handle a lot of these things. For instance, the HTTP Request and Response headers section will show you x-robots, hreflang, and rel-canonical HTTP headers.

You can also use DevTools' GeoLocation Emulator to view the web as though you are in a different location. For those of you who have fond memories of the nearEquals query parameter, this is another way you can get a sense of where you rank in precise locations.

Chrome DevTools also allows you to plug in your Android device and control it from your browser. There’s any number of use cases for this from an SEO perspective, but Simo Ahava wrote a great instructional post on how you can use it to debug your mobile analytics setup. You can do the same on iOS devices in Safari if you have a Mac.

What truly are rankings in 2016?

Rankings are a funny thing and, truthfully, have been for some time now. I, myself, was resistant to the idea of averaged rankings when Google rolled them out in Webmaster Tools/Search Console, but average rankings actually make a lot more sense than what we look at in standard ranking tools. Let me explain.

SEO tools pull rankings based on a situation that doesn’t actually exist in the real world. The machines that scrape Google are meant to be clean and otherwise agnostic unless you explicitly specify a location. Effectively, these tools look to understand how rankings would look to users searching for the first time with no context or history with Google. Ranking software emulates a user who is logging onto the web for the first time ever and the first thing they think to do is search for “4ft fishing rod.” Then they continually search for a series of other related and/or unrelated queries without ever actually clicking on a result. Granted, some software may do other things to try and emulate that user, but either way they collect data that is not necessarily reflective of what real users see. And finally, with so many people tracking many of the same keywords so frequently, you have to wonder how much these tools inflate search volume.

The bottom line is that we are ignoring true user context, especially in the mobile arena.

Rankings tools that allow you to track mobile rankings usually let you define one context or they will simply specify “mobile phone” as an option. Cindy Krum’s research indicates that SERP features and rankings will be different based on the combination of user agent, phone make and model, browser, and even the content on their phone.

Rankings tools also ignore the user’s reality of choice. We’re in an era where so many elements comprise the SERP that #1 is simply NOT #1. In some cases, #1 is the 8th choice on the page and far below the fold.

With AdWords having a 4th ad slot, organic being pushed far below the fold, and users not being sure of the difference between organic and paid, being #1 in organic doesn’t mean what it used to. So when we look at rankings reports that tell us we’re number one, we’re often deluding ourselves as to what outcome that will drive. When we report that to clients, we’re not focusing on actionability or user context. Rather, we are focusing entirely on vanity.

Of course, rankings are not a business goal; they’re a measure of potential or opportunity. No matter how much we talk about how they shouldn’t be the main KPI, rankings are still something that SEOs point at to show they’re moving the needle. Therefore we should consider thinking of organic rankings as being relative to the SERP features that surround them.

In other words, I’d like to see rankings include both the standard organic 1–10 ranking as well as the absolute position with regard to Paid, local packs, and featured snippets. Anything else is ignoring the impact of the choices that are overwhelmingly available to the user.

Recently, we’ve seen some upgrades to this effect with Moz making a big change to how they are surfacing features of rankings and I know a number of other tools have highlighted the organic features as well. Who will be the first to highlight the Integrated Search context? After all, many users don’t know the difference.

What is cloaking in 2016?

Cloaking is officially defined as showing search engines something different from what users see. What does that mean when Google allows adaptive and responsive sites and crawls both headless and text-based? What does that mean when Googlebot respects 304 response codes?

Under adaptive and responsive models, it’s often the case that more or less content is shown for different contexts. This is rare for responsive, as it’s meant to reposition and size content by definition, but some implementations may instead reduce content components to make the viewing context work.

When a site responds to screen resolution by changing what content is shown, and additional content exists beyond the resolution that Googlebot renders, how does Google distinguish that from cloaking?

Similarly, the 304 response code is a way to indicate to the client that the content has not been modified since the last time it visited; therefore, there’s no reason to download it again.
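
To make that concrete, here’s a sketch of the conditional exchange (the URL and dates are made up): the client re-sends the validator it saved from a previous visit, and the server answers with an empty 304 instead of the full body.

GET /widgets.html HTTP/1.1
Host: example.com
If-Modified-Since: Tue, 09 Aug 2016 14:30:00 GMT

HTTP/1.1 304 Not Modified
Last-Modified: Tue, 09 Aug 2016 14:30:00 GMT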

Googlebot adheres to this response code to keep from being a bandwidth hog. So what’s to stop a webmaster from getting one version of the page indexed, changing it, and then returning a 304?

I don’t know that there are definitive answers to those questions at this point. However, based on what I’m seeing in the wild, these have proven to be opportunities for technical SEOs that are still dedicated to testing and learning.

Crawling

The accessibility of content remains a fundamental component that SEOs must examine. What has changed is the type of analytical effort that needs to go into it. It’s been established that Google’s crawling capabilities have improved dramatically, and people like Eric Wu have done a great job of surfacing the granular detail of those capabilities with experiments like JSCrawlability.com.

Similarly, I wanted to try an experiment to see how Googlebot behaves once it loads a page. Using LuckyOrange, I attempted to capture a video of Googlebot once it gets to the page:

I installed the LuckyOrange script on a page that hadn’t been indexed yet and set it up so that it only fires if the user agent contains “googlebot.” Once I was set up, I then invoked Fetch and Render from Search Console. I’d hoped to see mouse scrolling or an attempt at a form fill. Instead, the cursor never moved and Googlebot was only on the page for a few seconds. Later on, I saw another hit from Googlebot to that URL and then the page appeared in the index shortly thereafter. There was no record of the second visit in LuckyOrange.

While I’d like to do more extensive testing on a bigger site to validate this finding, my hypothesis from this anecdotal experience is that Googlebot will come to the site and make a determination of whether a page/site needs to be crawled using the headless crawler. Based on that, they’ll come back to the site using the right crawler for the job.

I encourage you to give it a try as well. You don’t have to use LuckyOrange — you could use HotJar or anything else like it — but here’s my code for LuckyOrange:

jQuery(function() {
    // Only load LuckyOrange when the visitor identifies itself as Googlebot
    window.__lo_site_id = XXXX;
    if (navigator.userAgent.toLowerCase().indexOf('googlebot') > -1)
    {
        var wa = document.createElement('script');
        wa.type = 'text/javascript';
        wa.async = true;
        wa.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://cdn') + '.luckyorange.com/w.js';
        var s = document.getElementsByTagName('script')[0];
        s.parentNode.insertBefore(wa, s);
        // Tag the recording so Googlebot sessions are easy to find
        window._loq = window._loq || [];
        window._loq.push(["tag", "Googlebot"]);
    }
});

The moral of the story, however, is that what Google sees, how often they see it, and so on are still primary questions that we need to answer as SEOs. While it’s not sexy, log file analysis is an absolutely necessary exercise, especially for large-site SEO projects — perhaps now more than ever, due to the complexities of sites. I’d encourage you to listen to everything Marshall Simmonds says in general, but especially on this subject.

To that end, Google’s Crawl Stats in Search Console are utterly useless. These charts tell me what, exactly? Great, thanks Google, you crawled a bunch of pages at some point in February. Cool!

There are any number of log file analysis tools out there, from Kibana in the ELK stack to other tools such as Logz.io. However, the Screaming Frog team has made leaps and bounds in this arena with the recent release of their Log File Analyzer.

Of note with this tool is how easily it handles millions of records, which I hope is an indication of things to come with their Spider tool as well. Irrespective of who makes the tool, the insights that it helps you unlock are incredibly valuable in terms of what’s actually happening.

We had a client last year that was adamant that their losses in organic were not the result of the Penguin update. They believed that it might be due to turning off other traditional and digital campaigns that may have contributed to search volume, or perhaps seasonality or some other factor. Pulling the log files, I was able to layer all of the data from when all of their campaigns were running and show that it was none of those things; rather, Googlebot activity dropped tremendously right after the Penguin update and at the same time as their organic search traffic. The log files made it definitively obvious.

It follows conventionally held SEO wisdom that Googlebot crawls based on the pages that have the highest quality and/or quantity of links pointing to them. In layering the number of social shares, links, and Googlebot visits for our latest clients, we’re finding that there’s more correlation between social shares and crawl activity than links. In the data below, the section of the site with the most links actually gets crawled the least!

These are important insights that you may just be guessing at without taking the time to dig into your log files.

How log files help you understand AngularJS

Like any other web page or application, every request results in a record in the logs. But depending on how the server is set up, there are a ton of lessons that can come out of it with regard to AngularJS setups, especially if you’re pre-rendering using one of the snapshot technologies.

For one of our clients, we found that oftentimes when the snapshot system needed to refresh its cache, it took too long and timed out. Googlebot understands these as 5XX errors.

This behavior leads to those pages falling out of the index, and over time we saw pages jump back and forth between ranking very highly and disappearing altogether, or another page on the site taking its place.

Additionally, we found that there were many instances wherein Googlebot was being misidentified as a human user. In turn, Googlebot was served the AngularJS live page rather than the HTML snapshot. However, despite the fact that Googlebot was not seeing the HTML snapshots for these pages, these pages were still making it into the index and ranking just fine. So we ended up working with the client on a test to remove the snapshot system on sections of the site, and organic search traffic actually improved.

This is directly in line with what Google is saying in their deprecation announcement of the AJAX Crawling scheme. They are able to access content that is rendered using JavaScript and will index anything that is shown at load.

That’s not to say that HTML snapshot systems are not worth using. The Googlebot behavior for pre-rendered pages is that they tend to be crawled more quickly and more frequently. My best guess is that this is due to the crawl being less computationally expensive for them to execute. All in all, I’d say using HTML snapshots is still the best practice, but definitely not the only way for Google to see these types of sites.

According to Google, you shouldn’t serve snapshots just for them, but for the speed enhancements that the user gets as well.

In general, websites shouldn’t pre-render pages only for Google — we expect that you might pre-render pages for performance benefits for users and that you would follow progressive enhancement guidelines. If you pre-render pages, make sure that the content served to Googlebot matches the user’s experience, both how it looks and how it interacts. Serving Googlebot different content than a normal user would see is considered cloaking, and would be against our Webmaster Guidelines.

These are highly technical decisions that have a direct influence on organic search visibility. From my experience in interviewing SEOs to join our team at iPullRank over the last year, very few of them understand these concepts or are capable of diagnosing issues with HTML snapshots. These issues are now commonplace and will only continue to grow as these technologies continue to be adopted.

However, if we’re to serve snapshots to the user too, it begs the question: Why would we use the framework in the first place? Naturally, tech stack decisions are ones that are beyond the scope of just SEO, but you might consider a framework that doesn’t require such an appliance, like MeteorJS.

Alternatively, if you definitely want to stick with Angular, consider Angular 2, which supports the new Angular Universal. Angular Universal serves “isomorphic” JavaScript, which is another way to say that it pre-renders its content on the server side.

Angular 2 has a whole host of improvements over Angular 1.x, but I’ll let these Googlers tell you about them.

Since before all of these crazy frameworks reared their confusing heads, Google has had one line of thought about emerging technologies: “progressive enhancement.” With many new IoT devices on the horizon, we should be building websites to serve content for the lowest common denominator of functionality and save the bells and whistles for the devices that can render them.

If you’re starting from scratch, a good approach is to build your site’s structure and navigation using only HTML. Then, once you have the site’s pages, links, and content in place, you can spice up the appearance and interface with AJAX. Googlebot will be happy looking at the HTML, while users with modern browsers can enjoy your AJAX bonuses.

In other words, make sure your content is accessible to everyone. Shoutout to Fili Weise for reminding me of that.

Scraping is the fundamentally flawed core of SEO analysis

Scraping is fundamental to everything that our SEO tools do. cURL is a command-line tool and library (libcurl) for making and handling HTTP requests. Most popular programming languages have bindings for the library and, as such, most SEO tools leverage it or something similar to download web pages.

Think of cURL as working similarly to downloading a single file from an FTP server; in the case of a web page, that single file is the main HTML, so the page can’t be viewed in its entirety because you’re not downloading all of the required resources.

This is a fundamental flaw of most SEO software for the very same reason View Source is not a valuable way to view a page’s code anymore. Because there are a number of JavaScript and/or CSS transformations that happen at load, and Google is crawling with headless browsers, you need to look at the Inspect (element) view of the code to get a sense of what Google can actually see.
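
One quick way to get at the rendered DOM rather than the raw source is from the DevTools console itself. As a small sketch (copy() is a DevTools console utility, not standard JavaScript), the following grabs the post-JavaScript markup so you can diff it against View Source:

// Run in the Chrome DevTools console on the page you're auditing
var renderedHtml = document.documentElement.outerHTML;
console.log('Rendered DOM length: ' + renderedHtml.length);
copy(renderedHtml); // puts the rendered markup on your clipboard for diffing against View Source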

This is where headless browsing comes into play.

One of the more popular headless browsing libraries is PhantomJS. Many tools outside of the SEO world are written using this library for browser automation. Netflix even has one for scraping and taking screenshots called Sketchy. PhantomJS is built from a rendering engine called QtWebkit, which is to say it’s forked from the same code that Safari (and Chrome before Google forked it into Blink) is based on. While PhantomJS is missing the features of the latest browsers, it has enough features to support most things we need for SEO analysis.

As you can see from the GitHub repository, HTML snapshot software such as Prerender.io is written using this library as well.

PhantomJS has a series of wrapper libraries that make it quite easy to use in a variety of different languages. For those of you interested in using it with NodeJS, check out HorsemanJS.
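
As a rough sketch of what that looks like (assuming the node-horseman package; treat the chained method names as illustrative rather than gospel), fetching the rendered HTML of a page might look something like this:

// npm install node-horseman (which drives a local PhantomJS install)
var Horseman = require('node-horseman');
var horseman = new Horseman();

horseman
  .open('https://www.example.com/')  // load the page in headless PhantomJS
  .html()                            // return the rendered markup, post-JavaScript
  .then(function (html) {
    console.log('Rendered HTML length: ' + html.length);
    return horseman.close();         // shut the PhantomJS process down
  });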

For those of you that are more familiar with PHP, check out PHP PhantomJS.

A more recent and better qualified addition to the headless browser party is Headless Chromium. As you might have guessed, this is a headless version of the Chrome browser. If I were a betting man, I’d say what we’re looking at here is some sort of toned-down fork of Googlebot.

To that end, this is probably something that SEO companies should consider when rethinking their own crawling infrastructure in the future, if only for a premium tier of users. If you want to know more about Headless Chrome, check out what Sami Kyostila and Alex Clarke (both Googlers) had to say at BlinkOn 6:

Using in-browser scraping to do what your tools can’t

Although many SEO tools cannot examine the fully rendered DOM, that doesn’t mean that you, as an individual SEO, have to miss out. Even without leveraging a headless browser, Chrome can be turned into a scraping machine with just a little bit of JavaScript. I’ve talked about this at length in my “How to Scrape Every Single Page on the Web” post. Using a little bit of jQuery, you can effectively select and print anything from a page to the JavaScript Console and then export it to a file in whatever structure you prefer.

Scraping this way allows you to skip a lot of the coding that’s required to make sites believe you’re a real user, like authentication and cookie management that has to happen on the server side. Of course, this way of scraping is good for one-offs rather than building software around.
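
For example (hypothetical selectors, and assuming the page you’re on already loads jQuery), pulling every link’s URL and anchor text into a tab-separated blob you can paste into a spreadsheet takes just a few lines in the console:

// Run in the Chrome DevTools console on a page that loads jQuery
var rows = [];
jQuery('a[href]').each(function () {
  var $link = jQuery(this);
  rows.push($link.attr('href') + '\t' + $link.text().trim());
});
console.log(rows.join('\n')); // or copy(rows.join('\n')) to put it on the clipboard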

ArtooJS is a bookmarklet built to support in-browser scraping, automate scraping across a series of pages, and save the results to a file as JSON.

A more fully featured solution for this is the Chrome Extension, WebScraper.io. It requires no code and makes the whole process point-and-click.

How to approach content and linking from the technical context

Much of what SEO has been doing for the past few years has devolved into the creation of more content for more links. I don’t know that adding anything to the discussion around how to scale content or build more links is of value at this point, but I suspect there are some opportunities for existing links and content that are not top-of-mind for many people.

Google Looks at Entities First

Googlers announced recently that they look at entities first when reviewing a query. An entity is Google’s representation of proper nouns in their system to distinguish persons, places, and things, and inform their understanding of natural language. At this point in the talk, I ask people to put their hands up if they have an entity strategy. I’ve given the talk a dozen times at this point and there have only been two people to raise their hands.

Bill Slawski is the foremost thought leader on this topic, so I’m going to defer to his wisdom and encourage you to read his work on entities.

I would also encourage you to use a natural language processing tool like AlchemyAPI or MonkeyLearn. Better still, use Google’s own Natural Language Processing API to extract entities. The difference between your standard keyword research and entity strategies is that your entity strategy needs to be built from your existing content. So in identifying entities, you’ll want to do your keyword research first and then run those landing pages through an entity extraction tool to see how they line up. You’ll also want to run your competitor landing pages through those same entity extraction APIs to identify what entities are being targeted for those keywords.
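
As a rough sketch of that workflow (the analyzeEntities endpoint shown is from Google’s Cloud Natural Language API; YOUR_API_KEY and the sample copy are placeholders), extracting entities from a block of landing page text can be as simple as:

// POST a block of copy to Google's Natural Language API and list the entities it finds
var body = {
  document: {
    type: 'PLAIN_TEXT',
    content: 'The Empire State Building is an office tower in New York City.'
  },
  encodingType: 'UTF8'
};

fetch('https://language.googleapis.com/v1/documents:analyzeEntities?key=YOUR_API_KEY', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(body)
})
  .then(function (response) { return response.json(); })
  .then(function (data) {
    // Each entity comes back with a name, a type (PERSON, LOCATION, etc.), and a salience score
    data.entities.forEach(function (entity) {
      console.log(entity.name + ' (' + entity.type + '): ' + entity.salience);
    });
  });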

TF*IDF

Similarly, Term Frequency/Inverse Document Frequency or TF*IDF is a natural language processing technique that doesn’t get much discussion on this side of the pond. In fact, topic modeling algorithms have been the subject of much-heated debates in the SEO community in the past. The issue of concern is that topic modeling tools have the tendency to push us back towards the Dark Ages of keyword density, rather than considering the idea of creating content that has utility for users. However, in many European countries they swear by TF*IDF (or WDF*IDF — Within Document Frequency/Inverse Document Frequency) as a key technique that drives up organic visibility even without links.
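
If the math is fuzzy, here’s a minimal sketch of the idea (toy documents, no stemming or stop-word handling): a term scores highly when it’s frequent in the page being analyzed but rare across the rest of the corpus.

// Toy TF*IDF: tf(term, doc) * log(totalDocs / docsContainingTerm)
var docs = [
  'carbon fiber fishing rod for saltwater fishing',
  'graphite fishing rod warranty information',
  'how to book a deep sea charter'
].map(function (text) { return text.split(' '); });

function tf(term, doc) {
  var count = doc.filter(function (word) { return word === term; }).length;
  return count / doc.length;
}

function idf(term, allDocs) {
  var containing = allDocs.filter(function (doc) {
    return doc.indexOf(term) > -1;
  }).length;
  return Math.log(allDocs.length / containing); // assumes the term appears somewhere in the corpus
}

function tfidf(term, doc, allDocs) {
  return tf(term, doc) * idf(term, allDocs);
}

console.log(tfidf('fishing', docs[0], docs));   // frequent on the page but common in the corpus: lower weight
console.log(tfidf('saltwater', docs[0], docs)); // rare in the corpus: higher weight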

After hanging out in Germany a bit last year, some folks were able to convince me that taking another look at TF*IDF was worth it. So, we did and then we started working it into our content optimization process.

In Searchmetrics’ 2014 study of ranking factors, they found that while TF*IDF itself actually had a negative correlation with visibility, relevant terms and proof terms have strong positive correlations.

Image via Searchmetrics

Based on their examination of these factors, Searchmetrics made the call to drop TF*IDF from their analysis altogether in 2015 in favor of proof terms and relevant terms. Year over year, the positive correlation holds for those types of terms, albeit not as strongly.

Images via Searchmetrics

In Moz’s own 2015 ranking factors, we find that LDA- and TF*IDF-related items remain among the highest on-page content factors.

In effect, no matter what model you look at, the general idea is to use related keywords in your copy in order to rank better for your primary target keyword, because it works.

Now, I can’t say we’ve examined the tactic in isolation, but I can say that the pages that we’ve optimized using TF*IDF have seen bigger jumps in rankings than those without it. While we leverage OnPage.org’s TF*IDF tool, we don’t follow it using hard and fast numerical rules. Instead, we allow the related keywords to influence ideation and then use them as they make sense.

At the very least, this sort of technical optimization of content needs to be revisited. While you’re at it, you should consider the other tactics that Cyrus Shepard called out as well in order to get more mileage out of your content marketing efforts.

302s vs 301s — seriously?

As of late, a reexamination of the 301 vs. 302 redirect has come back up in the SEO echo chamber. I get the sense that Webmaster Trends Analysts in the public eye either like attention or are just bored, so they’ll issue vague tweets just to see what happens.

For those of you who prefer to do work rather than wait for Gary Illyes to tweet, all I’ve got is some data to share.

Once upon a time, we worked with a large media organization. As is par for the course with these types of organizations, their tech team was resistant to implementing many of our recommendations. Yet they had millions of links, both internal and external, pointing to URLs that returned 302 response codes.

After many meetings, and a more compelling business case, the one substantial thing that we were able to convince them to do was switch those 302s into 301s. Nearly overnight there was an increase in rankings in the 1–3 rank zone.

Despite seasonality, there was a jump in organic Search traffic as well.

To reiterate, the only substantial change at this point was the 302 to 301 switch. It resulted in a few million more organic search visits month over month. Granted, this was a year ago, but until someone can show me the same happening or no traffic loss when you switch from 301s to 302s, there’s no discussion for us to have.

Internal linking, the technical approach

Under the PageRank model, it’s an axiom that the flow of link equity through the site is an incredibly important component to examine. Unfortunately, so much of the discussion with clients is only on the external links and not about how to better maximize the link equity that a site already has.

There are a number of tools out there that bring this concept to the forefront. For instance, Searchmetrics calculates and visualizes the flow of link equity throughout the site. This gives you a sense of where you can build internal links to make other pages stronger.

Additionally, Paul Shapiro put together a compelling post on how you can calculate a version of internal PageRank for free using the statistical computing software R.

Either of these approaches is incredibly valuable for giving more visibility to content, and both fall very much into the bucket of what technical SEO can offer.
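
If you’d rather not fire up R, the underlying idea fits in a few lines of any language. Here’s a minimal sketch (a hypothetical four-page site and the standard 0.85 damping factor) that runs the classic power iteration over an internal link graph you might pull from a crawl:

// Internal link graph from a crawl: each page maps to the pages it links to (hypothetical site)
var links = {
  '/': ['/products', '/blog', '/contact'],
  '/products': ['/', '/contact'],
  '/blog': ['/', '/products'],
  '/contact': ['/']
};

var pages = Object.keys(links);
var damping = 0.85;
var ranks = {};
pages.forEach(function (page) { ranks[page] = 1 / pages.length; });

// Power iteration: repeatedly redistribute each page's rank across its outlinks
for (var i = 0; i < 50; i++) {
  var next = {};
  pages.forEach(function (page) { next[page] = (1 - damping) / pages.length; });
  pages.forEach(function (page) {
    var outlinks = links[page];
    outlinks.forEach(function (target) {
      next[target] += damping * ranks[page] / outlinks.length;
    });
  });
  ranks = next;
}

console.log(ranks); // pages with more (and stronger) internal links accumulate more rank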

Structured data is the future of organic search

The popular one-liner is that Google is looking to become the presentation layer of the web. I say, help them do it!

There has been much discussion about how Google is taking our content and attempting to cut our own websites out of the picture. With the traffic boon that the industry has seen from sites making it into the featured snippet, it’s pretty obvious that, in many cases, there’s more value for you in Google taking your content than in them not.

With voice search appliances on mobile devices and the forthcoming Google Home, there’s only one answer that the user receives. That is to say that the Star Trek computer Google is building is not going to read every result — just one. These answers are fueled by rich cards and featured snippets, which are in turn fueled by structured data.

Google has actually done us a huge favor regarding structured data in updating the specifications to allow JSON-LD. Before this, Schema.org was a matter of making very tedious and specific changes to code with little ROI. Now structured data powers a number of components of the SERP and can simply be placed in the <head> of a document quite easily. Now is the time to revisit implementing the extra markup. Builtvisible’s guide to structured data remains the gold standard.
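
For instance, a minimal Organization block (placeholder values) now just drops into the <head> as JSON-LD instead of being woven into the template’s markup:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Company",
  "url": "https://www.example.com/",
  "logo": "https://www.example.com/logo.png",
  "sameAs": [
    "https://twitter.com/example",
    "https://www.facebook.com/example"
  ]
}
</script>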

Page speed is still Google’s obsession

Google has very aggressive expectations around page speed, especially for the mobile context. They want the above-the-fold content to load within one second. However, 800 milliseconds of that time is pretty much out of your control.

Image via Google

Based on what you can directly affect, as an SEO, you have 200 milliseconds to make content appear on the screen. A lot of what can be done on-page to influence the speed at which things load is optimizing the page for the critical rendering path.

Image via Nianpeng Li

To understand this concept, first we have to take a bit of a step back to get a sense of how browsers construct a web page.

  1. The browser takes the uniform resource locator (URL) that you specify in your address bar and performs a DNS lookup on the domain name.
  2. Once a socket is open and a connection is negotiated, it then asks the server for the HTML of the page you’ve requested.
  3. The browser begins to parse the HTML into the Document Object Model until it encounters CSS, then it starts to parse the CSS into the CSS Object Model.
  4. If at any point it runs into JavaScript, it will pause the DOM and/or CSSOM construction until the JavaScript completes execution, unless it is asynchronous.
  5. Once all of this is complete, the browser constructs the Render Tree, which then builds the layout of the page and finally the elements of the page are painted.

In the Timeline section of Chrome DevTools, you can see the individual operations as they happen and how they contribute to load time. In the timeline at the top, you’ll always see the visualization as mostly yellow because JavaScript execution takes the most time out of any part of page construction. JavaScript causes page construction to halt until the script execution is complete. This is called “render-blocking” JavaScript.

That term may sound familiar to you because you’ve poked around in PageSpeed Insights looking for answers on how to make improvements and “Eliminate Render-blocking JavaScript” is a common one. The tool is primarily built to support optimization for the Critical Rendering Path. A lot of the recommendations involve issues like sizing resources statically, using asynchronous scripts, and specifying image dimensions.
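
As a quick illustration (assuming an external script at /assets/app.js), the fix is often just a matter of how the script tag is declared:

<script src="/assets/app.js"></script>        <!-- render-blocking: parsing halts until this runs -->
<script src="/assets/app.js" async></script>  <!-- downloads in parallel, executes as soon as it's ready -->
<script src="/assets/app.js" defer></script>  <!-- downloads in parallel, executes after parsing finishes -->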

Additionally, external resources contribute significantly to page load time. For instance, I always see Chartbeat’s library taking 3 or more seconds just to resolve the DNS. These are all things that need to be reviewed when considering how to make a page load faster.

If you know much about the Accelerated Mobile Pages (AMP) specification, a lot of what I just highlighted might sound very familiar to you.

Essentially, AMP exists because Google believes the general public is bad at coding. So they made a subset of HTML and threw a global CDN behind it to make your pages hit the 1 second mark. Personally, I have a strong aversion to AMP, but as many of us predicted at the top of the year, Google has rolled AMP out beyond just the media vertical and into all types of pages in the SERP. The roadmap indicates that there is a lot more coming, so it’s definitely something we should dig into and look to capitalize on.

Using pre-browsing directives to speed things up

To support site speed improvements, most browsers have pre-browsing resource hints. These hints allow you to indicate to the browser that a file will be needed later in the page, so while the components of the browser are idle, it can download or connect to those resources now. Chrome specifically looks to do these things automatically when it can, and may ignore your specification altogether. However, these directives operate much like the rel-canonical tag — you’re more likely to get value out of them than not.

Image via Google

  • Rel-preconnect – This directive allows you to resolve the DNS, initiate the TCP handshake, and negotiate the TLS tunnel between the client and server before you need to. When you don’t do this, these things happen one after another for each resource rather than simultaneously. As the diagram below indicates, in some cases you can shave nearly half a second off just by doing this. Alternatively, if you just want to resolve the DNS in advance, you could use rel-dns-prefetch.

    If you see a lot of idle time in your Timeline in Chrome DevTools, rel-preconnect can help you shave some of that off.

    You can specify rel-preconnect with

    <link rel="preconnect" href="https://domain.com">

    or rel-dns-prefetch with

    <link rel="dns-prefetch" href="//domain.com">

  • Rel-prefetch – This directive allows you to download a resource for a page that will be needed in the future. For instance, if you want to pull the stylesheet of the next page or download the HTML for the next page, you can do so by specifying it as
    <link rel="prefetch" href="nextpage.html">
  • Rel-prerender – Not to be confused with the aforementioned Prerender.io, rel-prerender is a directive that allows you to load an entire page and all of its resources in an invisible tab. Once the user clicks a link to go to that URL, the page appears instantly. If the user instead clicks on a link that you did not specify as the rel-prerender, the prerendered page is deleted from memory. You specify the rel-prerender as follows:
    <link rel="prerender" href="nextpage.html">

    I’ve talked about rel-prerender in the past in my post about how I improved our site’s speed 68.35% with one line of code.

    There are a number of caveats that come with rel-prerender, but the most important one is that you can only specify one page at a time and only one rel-prerender can be specified across all Chrome threads. In my post I talk about how to leverage the Google Analytics API to make the best guess at the URL the user is likely going to visit next.

    If you’re using an analytics package that isn’t Google Analytics, or if you have ads on your pages, it will falsely count prerender hits as actual views of the page. What you’ll want to do is wrap any JavaScript that you don’t want to fire until the page is actually in view in a Page Visibility API check (see the sketch after this list). Effectively, you’ll only fire analytics or show ads when the page is actually visible.

    Finally, keep in mind that rel-prerender does not work with Firefox, iOS Safari, Opera Mini, or Android’s browser. Not sure why they didn’t get invited to the pre-party, but I wouldn’t recommend using it on a mobile device anyway.

  • Rel-preload and rel-subresource – Following the same pattern as above, rel-preload and rel-subresource allow you to load things within the same page before they are needed. Rel-subresource is Chrome-specific, while rel-preload works for Chrome, Android, and Opera.
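
Here’s a minimal sketch of the Page Visibility check mentioned in the rel-prerender notes above (firePixel is a placeholder for whatever analytics hit or ad call you’re deferring):

function firePixel() {
  // placeholder for your analytics hit or ad call
}

if (document.visibilityState === 'visible') {
  firePixel();
} else {
  // The page was prerendered or opened in a background tab; wait until it's actually viewed
  document.addEventListener('visibilitychange', function onVisible() {
    if (document.visibilityState === 'visible') {
      document.removeEventListener('visibilitychange', onVisible);
      firePixel();
    }
  });
}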

Finally, keep in mind that Chrome is sophisticated enough to make attempts at all of these things. Your resource hints help them develop the 100% confidence level to act on them. Chrome is making a series of predictions based on everything you type into the address bar and it keeps track of whether or not it’s making the right predictions to determine what to preconnect and prerender for you. Check out chrome://predictors to see what Chrome has been predicting based on your behavior.

Image via Google

Where does SEO go from here?

Being a strong SEO requires a series of skills that’s difficult for a single person to be great at. For instance, an SEO with strong technical skills may find it difficult to perform effective outreach or vice-versa. Naturally, SEO is already stratified between on- and off-page in that way. However, the technical skill requirement has continued to grow dramatically in the past few years.

There are a number of skills that have always given technical SEOs an unfair advantage, such as web and software development skills or even statistical modeling skills. Perhaps it’s time to officially further stratify technical SEO from traditional content-driven on-page optimizations, since much of the skillset required is more that of a web developer and network administrator than that of what is typically thought of as SEO (at least at this stage in the game). As an industry, we should consider a role of an SEO Engineer, as some organizations already have.

At the very least, the SEO Engineer will need to have a grasp of all of the following to truly capitalize on these technical opportunities:

  • Document Object Model – An understanding of the building blocks of web browsers is fundamental to understanding how front-end developers manipulate the web as they build it.
  • Critical Rendering Path – An understanding of how a browser constructs a page and what goes into the rendering of the page will help with the speed enhancements that Google is more aggressively requiring.
  • Structured Data and Markup – An understanding of how metadata can be specified to influence how Google understands the information being presented.
  • Page Speed – An understanding of the rest of the coding and networking components that impact page load times is the natural next step to getting page speed up. Of course, this is a much bigger deal than SEO, as it impacts the general user experience.
  • Log File Analysis – An understanding of how search engines traverse websites and what they deem as important and accessible is a requirement, especially with the advent of new front-end technologies.
  • SEO for JavaScript Frameworks – An understanding of the implications of leveraging one of the popular frameworks for front-end development, as well as a detailed understanding of how, why, and when an HTML snapshot appliance may be required and what it takes to implement them is critical. Just the other day, Justin Briggs collected most of the knowledge on this topic in one place and broke it down to its components. I encourage you to check it out.
  • Chrome DevTools – An understanding of one of the most powerful tools in the SEO toolkit, the Chrome web browser itself. Chrome DevTools’ features coupled with a few third-party plugins close the gaps for many things that SEO tools cannot currently analyze. The SEO Engineer needs to be able to build something quick to get the answers to questions that were previously unasked by our industry.
  • Accelerated Mobile Pages & Facebook Instant Articles – Facebook Instant Articles is a similar specification to AMP, and if the AMP roadmap is any indication, I suspect it will be difficult for the two to continue to exist separately.
  • HTTP/2 – An understanding of how this protocol will dramatically change the speed of the web and the SEO implications of migrating from HTTP/1.1.

Let’s Make SEO Great Again

One of the things that always made SEO interesting and its thought leaders so compelling was that we tested, learned, and shared that knowledge so heavily. It seems that that culture of testing and learning was drowned in the content deluge. Perhaps many of those types of folks disappeared as the tactics they knew and loved were swallowed by Google’s zoo animals. Perhaps our continually eroding data makes it more and more difficult to draw strong conclusions.

Whatever the case, right now, there are far fewer people publicly testing and discovering opportunities. We need to demand more from our industry, our tools, our clients, our agencies, and ourselves.

Let’s stop chasing the content train and get back to making experiences that perform.


Why Effective, Modern SEO Requires Technical, Creative, and Strategic Thinking – Whiteboard Friday

Posted by randfish

There’s no doubt that quite a bit has changed about SEO, and that the field is far more integrated with other aspects of online marketing than it once was. In today’s Whiteboard Friday, Rand pushes back against the idea that effective modern SEO doesn’t require any technical expertise, outlining a fantastic list of technical elements that today’s SEOs need to know about in order to be truly effective.


For reference, here’s a still of this week’s whiteboard. Click on it to open a high resolution image in a new tab!

Video transcription

Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week I’m going to do something unusual. I don’t usually point out these inconsistencies or sort of take issue with other folks’ content on the web, because I generally find that that’s not all that valuable and useful. But I’m going to make an exception here.

There is an article by Jayson DeMers, who I think might actually be here in Seattle — maybe he and I can hang out at some point — called “Why Modern SEO Requires Almost No Technical Expertise.” It was an article that got a shocking amount of traction and attention. On Facebook, it has thousands of shares. On LinkedIn, it did really well. On Twitter, it got a bunch of attention.

Some folks in the SEO world have already pointed out some issues around this. But because of the increasing popularity of this article, and because I think there’s, like, this hopefulness from worlds outside of kind of the hardcore SEO world that are looking to this piece and going, “Look, this is great. We don’t have to be technical. We don’t have to worry about technical things in order to do SEO.”

Look, I completely get the appeal of that. I did want to point out some of the reasons why this is not so accurate. At the same time, I don’t want to rain on Jayson, because I think that it’s very possible he’s writing an article for Entrepreneur, maybe he has sort of a commitment to them. Maybe he had no idea that this article was going to spark so much attention and investment. He does make some good points. I think it’s just really the title and then some of the messages inside there that I take strong issue with, and so I wanted to bring those up.

First off, some of the good points he did bring up.

One, he wisely says, “You don’t need to know how to code or to write and read algorithms in order to do SEO.” I totally agree with that. If today you’re looking at SEO and you’re thinking, “Well, am I going to get more into this subject? Am I going to try investing in SEO? But I don’t even know HTML and CSS yet.”

Those are good skills to have, and they will help you in SEO, but you don’t need them. Jayson’s totally right. You don’t have to have them, and you can learn and pick up some of these things, and do searches, watch some Whiteboard Fridays, check out some guides, and pick up a lot of that stuff later on as you need it in your career. SEO doesn’t have that hard requirement.

And secondly, he makes an intelligent point that we’ve made many times here at Moz, which is that, broadly speaking, a better user experience is well correlated with better rankings.

You make a great website that delivers great user experience, that provides the answers to searchers’ questions and gives them extraordinarily good content, way better than what’s out there already in the search results, generally speaking you’re going to see happy searchers, and that’s going to lead to higher rankings.

But not entirely. There are a lot of other elements that go in here. So I’ll bring up some frustrating points around the piece as well.

First off, there’s no acknowledgment — and I find this a little disturbing — that the ability to read and write code, or even HTML and CSS, which I think are the basic place to start, is helpful or can take your SEO efforts to the next level. I think both of those things are true.

So being able to look at a web page, view source on it, or pull up Firebug in Firefox or something and diagnose what’s going on and then go, “Oh, that’s why Google is not able to see this content. That’s why we’re not ranking for this keyword or term, or why even when I enter this exact sentence in quotes into Google, which is on our page, this is why it’s not bringing it up. It’s because it’s loading it after the page from a remote file that Google can’t access.” These are technical things, and being able to see how that code is built, how it’s structured, and what’s going on there, very, very helpful.

Some coding knowledge also can take your SEO efforts even further. I mean, so many times, SEOs are stymied by the conversations that we have with our programmers and our developers and the technical staff on our teams. When we can have those conversations intelligently, because at least we understand the principles of how an if-then statement works, or what software engineering best practices are being used, or they can upload something into a GitHub repository, and we can take a look at it there, that kind of stuff is really helpful.

Secondly, I don’t like that the article overly reduces all of this information that we have about what we’ve learned about Google. So he mentions two sources. One is things that Google tells us, and others are SEO experiments. I think both of those are true. Although I’d add that there’s sort of a sixth sense of knowledge that we gain over time from looking at many, many search results and kind of having this feel for why things rank, and what might be wrong with a site, and getting really good at that using tools and data as well. There are people who can look at Open Site Explorer and then go, “Aha, I bet this is going to happen.” They can look, and 90% of the time they’re right.

So he boils this down to, one, write quality content, and two, reduce your bounce rate. Neither of those things are wrong. You should write quality content, although I’d argue there are lots of other forms of quality content that aren’t necessarily written — video, images and graphics, podcasts, lots of other stuff.

And secondly, that just doing those two things is not always enough. So you can see, like many, many folks look and go, “I have quality content. It has a low bounce rate. How come I don’t rank better?” Well, your competitors, they’re also going to have quality content with a low bounce rate. That’s not a very high bar.

Also, frustratingly, this really gets in my craw. I don’t think “write quality content” means anything. You tell me. When you hear that, to me that is a totally non-actionable, non-useful phrase that’s a piece of advice that is so generic as to be discardable. So I really wish that there was more substance behind that.

The article also makes, in my opinion, the totally inaccurate claim that modern SEO really is reduced to “the happier your users are when they visit your site, the higher you’re going to rank.”

Wow. Okay. Again, I think broadly these things are correlated. User happiness and rank is broadly correlated, but it’s not a one to one. This is not like a, “Oh, well, that’s a 1.0 correlation.”

I would guess that the correlation is probably closer to like the page authority range. I bet it’s like 0.35 or something correlation. If you were to actually measure this broadly across the web and say like, “Hey, were you happier with result one, two, three, four, or five,” the ordering would not be perfect at all. It probably wouldn’t even be close.

There’s a ton of reasons why sometimes someone who ranks on Page 2 or Page 3 or doesn’t rank at all for a query is doing a better piece of content than the person who does rank well or ranks on Page 1, Position 1.

Then the article suggests five and sort of a half steps to successful modern SEO, which I think is a really incomplete list. So Jayson gives us:

  • Good on-site experience
  • Writing good content
  • Getting others to acknowledge you as an authority
  • Rising in social popularity
  • Earning local relevance
  • Dealing with modern CMS systems (which he notes most modern CMS systems are SEO-friendly)

The thing is there’s nothing actually wrong with any of these. They’re all, generally speaking, correct, either directly or indirectly related to SEO. The one about local relevance, I have some issue with, because he doesn’t note that there’s a separate algorithm for sort of how local SEO is done and how Google ranks local sites in maps and in their local search results. Also not noted is that rising in social popularity won’t necessarily directly help your SEO, although it can have indirect and positive benefits.

I feel like this list is super incomplete. Okay, I brainstormed just off the top of my head in the 10 minutes before we filmed this video a list. The list was so long that, as you can see, I filled up the whole whiteboard and then didn’t have any more room. I’m not going to bother to erase and go try and be absolutely complete.

But there’s a huge, huge number of things that are important, critically important for technical SEO. If you don’t know how to do these things, you are sunk in many cases. You can’t be an effective SEO analyst, or consultant, or in-house team member, because you simply can’t diagnose the potential problems, rectify those potential problems, identify strategies that your competitors are using, be able to diagnose a traffic gain or loss. You have to have these skills in order to do that.

I’ll run through these quickly, but really the idea is just that this list is so huge and so long that I think it’s very, very, very wrong to say technical SEO is behind us. I almost feel like the opposite is true.

We have to be able to understand things like:

  • Content rendering and indexability
  • Crawl structure, internal links, JavaScript, Ajax. If something’s post-loading after the page and Google’s not able to index it, or there are links that are accessible via JavaScript or Ajax, maybe Google can’t necessarily see those or isn’t crawling them as effectively, or is crawling them, but isn’t assigning them as much link weight as they might be assigning other stuff, and you’ve made it tough to link to them externally, and so they can’t crawl it.
  • Disabling crawling and/or indexing of thin or incomplete or non-search-targeted content. We have a bunch of search results pages. Should we use rel=prev/next? Should we robots.txt those out? Should we disallow from crawling with meta robots? Should we rel=canonical them to other pages? Should we exclude them via the protocols inside Google Webmaster Tools, which is now Google Search Console?
  • Managing redirects, domain migrations, content updates. A new piece of content comes out, replacing an old piece of content, what do we do with that old piece of content? What’s the best practice? It varies by different things. We have a whole Whiteboard Friday about the different things that you could do with that. What about a big redirect or a domain migration? You buy another company and you’re redirecting their site to your site. You have to understand things about subdomain structures versus subfolders, which, again, we’ve done another Whiteboard Friday about that.
  • Proper error codes, downtime procedures, and not found pages. If your 404 pages turn out to all be 200 pages, well, now you’ve made a big error there, and Google could be crawling tons of 404 pages that they think are real pages, because you’ve made it a status code 200, or you’ve used a 404 code when you should have used a 410, which is a permanently removed, to be able to get it completely out of the indexes, as opposed to having Google revisit it and keep it in the index.

Downtime procedures. So there’s specifically a… I can’t even remember. It’s a 5xx code that you can use. Maybe it was a 503 or something that you can use that’s like, “Revisit later. We’re having some downtime right now.” Google urges you to use that specific code rather than using a 404, which tells them, “This page is now an error.”

Disney had that problem a while ago, if you guys remember, where they 404ed all their pages during an hour of downtime, and then their homepage, when you searched for Disney World, was, like, “Not found.” Oh, jeez, Disney World, not so good.

  • International and multi-language targeting issues. I won’t go into that. But you have to know the protocols there. Duplicate content, syndication, scrapers. How do we handle all that? Somebody else wants to take our content, put it on their site, what should we do? Someone’s scraping our content. What can we do? We have duplicate content on our own site. What should we do?
  • Diagnosing traffic drops via analytics and metrics. Being able to look at a rankings report, being able to look at analytics connecting those up and trying to see: Why did we go up or down? Did we have less pages being indexed, more pages being indexed, more pages getting traffic less, more keywords less?
  • Understanding advanced search parameters. Today, just today, I was checking out the related parameter in Google, which is fascinating for most sites. Well, for Moz, weirdly, related:oursite.com shows nothing. But for virtually every other site, well, most other sites on the web, it does show some really interesting data, and you can see how Google is connecting up, essentially, intentions and topics from different sites and pages, which can be fascinating, could expose opportunities for links, could expose understanding of how they view your site versus your competition or who they think your competition is.

Then there are tons of parameters, like in URL and in anchor, and da, da, da, da. In anchor doesn’t work anymore, never mind about that one.

I have to go faster, because we’re just going to run out of these. Like, come on. Interpreting and leveraging data in Google Search Console. If you don’t know how to use that, Google could be telling you, you have all sorts of errors, and you don’t know what they are.

  • Leveraging topic modeling and extraction. Using all these cool tools that are coming out for better keyword research and better on-page targeting. I talked about a couple of those at MozCon, like MonkeyLearn. There’s the new Moz Context API, which will be coming out soon, around that. There’s the Alchemy API, which a lot of folks really like and use.
  • Identifying and extracting opportunities based on site crawls. You run a Screaming Frog crawl on your site and you’re going, “Oh, here’s all these problems and issues.” If you don’t have these technical skills, you can’t diagnose that. You can’t figure out what’s wrong. You can’t figure out what needs fixing, what needs addressing.
  • Using rich snippet format to stand out in the SERPs. This is just getting a better click-through rate, which can seriously help your site and obviously your traffic.
  • Applying Google-supported protocols like rel=canonical, meta description, rel=prev/next, hreflang, robots.txt, meta robots, x robots, NOODP, XML sitemaps, rel=nofollow. The list goes on and on and on. If you’re not technical, you don’t know what those are, you think you just need to write good content and lower your bounce rate, it’s not going to work.
  • Using APIs from services like AdWords or MozScape, or hrefs from Majestic, or SEM refs from SearchScape or Alchemy API. Those APIs can have powerful things that they can do for your site. There are some powerful problems they could help you solve if you know how to use them. It’s actually not that hard to write something, even inside a Google Doc or Excel, to pull from an API and get some data in there. There’s a bunch of good tutorials out there. Richard Baxter has one, Annie Cushing has one, I think Distilled has some. So really cool stuff there.
  • Diagnosing page load speed issues, which goes right to what Jayson was talking about. You need that fast-loading page. Well, if you don’t have any technical skills, you can’t figure out why your page might not be loading quickly.
  • Diagnosing mobile friendliness issues
  • Advising app developers on the new protocols around App deep linking, so that you can get the content from your mobile apps into the web search results on mobile devices. Awesome. Super powerful. Potentially crazy powerful, as mobile search is becoming bigger than desktop.

Okay, I’m going to take a deep breath and relax. I don’t know Jayson’s intention, and in fact, if he were in this room, he’d be like, “No, I totally agree with all those things. I wrote the article in a rush. I had no idea it was going to be big. I was just trying to make the broader points around you don’t have to be a coder in order to do SEO.” That’s completely fine.

So I’m not going to try and rain criticism down on him. But I think if you’re reading that article, or you’re seeing it in your feed, or your clients are, or your boss is, or other folks are in your world, maybe you can point them to this Whiteboard Friday and let them know, no, that’s not quite right. There’s a ton of technical SEO that is required in 2015 and will be for years to come, I think, that SEOs have to have in order to be effective at their jobs.

All right, everyone. Look forward to some great comments, and we’ll see you again next time for another edition of Whiteboard Friday. Take care.

Video transcription by Speechpad.com


Every Marketer Should Be Technical

Posted by Jamie

There's been a lot of talk of roles like growth hackers, marketing ninjas, and technical marketing in the past year. Regardless of whether or not you subscribe to these labels, technical skills are becoming a requirement for success in online marketing. The marketers who know SQL, can write code, leverage APIs, and perform quantitative analysis will be the most desirable and productive individuals in our industry. Those without these skills will find it increasingly difficult to find ideal career opportunities.

I've prepared this guide as an overview to the technical skills that are most helpful in online marketing and included a directory of resources to help you get there.

Growth hacker, growth cracker; the labels don't matter, but the skills do!

There's been plenty of discussion in the past year on the importance of growth hacking, who is and isn't a growth hacker, and if growth hacking is really just marketing. Although I appreciate this conversation, I think we're overlooking what's most important.

Instead of focusing on what those who do technical marketing call themselves, I'd rather we explore what it means to be technical and help each other develop those skills. Refer to these marketers however you like; what really matters is what we're capable of as professionals.

Can a marketer be technical? Of course. (And developers can be phenomenal marketers, too.)

I started my own career as a developer and slowly became more focused on marketing as the years progressed. I worked as a developer when Netscape Navigator was popular and Yahoo was #1 in search. However, I was a pretty lousy developer, slapping things together with table tags and transparent gifs. I was fortunate enough to keep my job because that's how most of us did web development back then.

I'm actually more technical now as a full-time marketer than I ever was back then because I've been fortunate to continually be exposed to, work with, and do work that requires technical skills. And that's really only because it was a matter of necessity in the organizations I've worked in. So, if an unfocused individual like me can do this, anyone can.

Better examples can be found in the phenomenal marketing and technical skills of individuals like Richard Baxter, Vanessa Fox, Wil Reynolds, Alex Schultz, Tom Critchlow, or Michelle Robbins. All of these individuals have different stories of how they developed their capabilities, but I'd bet they all share a passion for staying up late, tinkering, and hacking away at their work, with a strong desire to always be developing new skills. (As a side note, I feel so fortunate to work in an industry with so many individuals like this, and it's been an absolute pleasure to learn from them.)

Developers can be remarkable marketers, too, and some of the best marketers I've known work as developers first and foremost. The one difference is that a lack of marketing skills is not likely to prevent an engineer from being successful at their work. Marketers, on the other hand, are going to have a much more difficult time doing their work without some semblance of technical skills, which brings me to my next thought:

Generalist/specialists are the new minimum viable professional

For generations, professionals have been pressured to be either a generalist or specialist. The generalists were the managers who oversaw operations, and had a holistic view of how marketing was accomplished, but were less capable of doing the work themselves. Generalists relied upon specialists who knew how to write, design, code, or analyze. And for generations of marketing, this worked just fine.

But this trend just doesn't cut it anymore. To be successful nowadays, you need to have both breadth and depth of skills. You have to know what to ask for and how it's done. Without both of these capabilities, you're prone to be less efficient than a colleague or competitor who has them.

This is especially pronounced in the startup world, where budgets are constrained and companies can't afford to hire both managers and specialists. And this trend explains why the growth hacker meme is so popular in startup communities. You have to be able to do everything to hack it at a startup.

I like to refer to these individuals as generalist specialists. These are individuals who have not only a broad knowledge of marketing channels, methods, and techniques, but also the specialist technical knowledge to understand what's possible and what's not, and to do the work themselves.

Know what to ask for, or just do the work yourself

Perhaps my favorite reason to develop these skills is the ability to communicate better with everyone in your organization. If you know what's possible, then you'll know what to ask for when you work with developers, designers, and analysts. And in many cases, you'll be able to just do the work yourself.

What is a technical marketer capable of?

Stated simply, a great technical marketer can devise, develop, launch, and analyze their marketing campaigns with little or no assistance. The example I've prepared below is fictitious, and it's not meant as a one-size-fits-all prescription. I happen to be using a fictitious marketer at Incase, a company I randomly chose, but whose products I really like.

So, let's take a look at the process and capabilities a technical marketer would use to manage their efforts.

1. Find something to improve

A technical marketer can review their efforts and find and prioritize opportunities for improvement. In this case, our marketer has decided to try to increase repeat purchases.

2. Devise a strategy

From there, they need to determine how they are going to accomplish that.

3. Forecast the improvement

The next step is to estimate the efficacy of the campaign to see if it's worth their time and effort. It looks like it is!
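To make that concrete, here's a rough back-of-the-envelope forecast, sketched in Python. Every number in it (list size, baseline rate, expected lift, average order value, hours of work) is a made-up placeholder for our fictitious Incase marketer; swap in your own figures.

```python
# Back-of-envelope forecast for the repeat-purchase email campaign.
# All numbers below are hypothetical placeholders -- swap in your own.

email_list_size = 40_000          # customers with exactly one purchase
baseline_repeat_rate = 0.02       # share who repeat-purchase on their own
expected_lift = 0.005             # extra repeat purchases driven by the email
average_order_value = 85.00       # dollars per repeat order

incremental_orders = email_list_size * expected_lift
incremental_revenue = incremental_orders * average_order_value

hours_of_work = 24                # design, code, QA, analysis
hourly_cost = 75.00

roi = incremental_revenue / (hours_of_work * hourly_cost)

print(f"Incremental orders:  {incremental_orders:,.0f}")
print(f"Incremental revenue: ${incremental_revenue:,.2f}")
print(f"Rough ROI multiple:  {roi:.1f}x")
```

Even a crude model like this tells you whether the campaign is worth a few days of your time before you write a single line of HTML.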

4. Pull customer list from database

The marketer would then use SQL to query their database for the appropriate users to generate an email list.
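As an illustration, here's a minimal sketch using Python's built-in sqlite3 module. The schema (customers and orders tables) and the "one-time buyers in the last 12 months" criteria are assumptions for this fictitious example; your database, driver, and column names will differ.

```python
import sqlite3

# Hypothetical schema: customers(id, email, created_at) and
# orders(id, customer_id, total, created_at). Adjust names to your own database.
QUERY = """
SELECT c.email,
       c.id              AS customer_id,
       COUNT(o.id)       AS order_count,
       MAX(o.created_at) AS last_order_at
FROM customers AS c
JOIN orders    AS o ON o.customer_id = c.id
WHERE c.email IS NOT NULL
GROUP BY c.id, c.email
HAVING COUNT(o.id) = 1                           -- one-time buyers only
   AND MAX(o.created_at) >= DATE('now', '-12 months')
ORDER BY last_order_at DESC;
"""

def pull_email_list(db_path: str) -> list[tuple]:
    """Return (email, customer_id, order_count, last_order_at) rows."""
    with sqlite3.connect(db_path) as conn:
        return conn.execute(QUERY).fetchall()

if __name__ == "__main__":
    for email, customer_id, _, last_order in pull_email_list("store.db"):
        print(email, customer_id, last_order)
```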

5. Wireframe the email, and write the copy

From there, they would create a simple wireframe and draft the email copy.

6. Design and code the HTML for the email template

Next up is creating the HTML template, first using an image editor like Photoshop, and then developing the HTML and CSS.

7. Instrument end-to-end tracking

The marketer will then ensure that there is end-to-end tracking in place, and likely place a few test orders to confirm it's all working properly.
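One common way to get that end-to-end tracking is to tag every link in the email with UTM parameters so your analytics package can attribute visits and orders back to the campaign. Here's a small helper sketched in Python with only the standard library; the campaign values are hypothetical, while the utm_* names follow the usual Google Analytics conventions.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit, parse_qsl

# Hypothetical campaign settings.
UTM = {
    "utm_source": "newsletter",
    "utm_medium": "email",
    "utm_campaign": "repeat-purchase-2024-06",
}

def tag_link(url: str, extra: dict | None = None) -> str:
    """Append UTM (and any extra) parameters to a landing-page URL."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    params = dict(parse_qsl(query))
    params.update(UTM)
    if extra:
        params.update(extra)
    return urlunsplit((scheme, netloc, path, urlencode(params), fragment))

print(tag_link("https://www.example.com/backpacks", {"utm_content": "hero-cta"}))
# -> https://www.example.com/backpacks?utm_source=newsletter&utm_medium=email&...
```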

8. Launch the campaign

It's time to send the campaign and wait for the results. Meanwhile, our fictitious marketer enjoys a bland, but reasonably-priced American beer.

9. Evaluate the results

A few days later, the marketer collects analytics from the various systems, combines them in Excel, and calculates the quantitative impact of the campaign.
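If you'd rather do that arithmetic in code than in Excel, here's a simple sketch that joins a hypothetical export of campaign sends against an orders export and computes the conversion rate and attributed revenue. The file and column names are assumptions for this example.

```python
import csv

# Hypothetical CSV exports: sends.csv has a customer_id column,
# orders.csv has customer_id and total columns.

def load_column(path: str, column: str) -> list[str]:
    with open(path, newline="") as f:
        return [row[column] for row in csv.DictReader(f)]

sent_ids = set(load_column("sends.csv", "customer_id"))

converted, revenue = set(), 0.0
with open("orders.csv", newline="") as f:
    for row in csv.DictReader(f):
        if row["customer_id"] in sent_ids:
            converted.add(row["customer_id"])
            revenue += float(row["total"])

rate = len(converted) / len(sent_ids)
print(f"Emails sent:        {len(sent_ids):,}")
print(f"Buyers:             {len(converted):,}")
print(f"Conversion rate:    {rate:.2%}")
print(f"Attributed revenue: ${revenue:,.2f}")
```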

10. Automate for ongoing success

The marketer determines the campaign was successful enough to repeat each month and develops a script that will automate the process, as sketched below.
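What that automation script looks like depends entirely on your stack. A minimal version can be as simple as a wrapper that runs each step's script in order and gets scheduled with cron; the script names below are hypothetical placeholders.

```python
import datetime
import subprocess
import sys

# Minimal automation wrapper (hypothetical): run each step's script in order
# and stop if any of them fails. Schedule with cron, e.g. "0 9 1 * *".
STEPS = [
    "pull_email_list.py",   # step 4
    "send_campaign.py",     # steps 5-8 (your ESP's sending script)
    "evaluate_results.py",  # step 9
]

def main() -> int:
    for script in STEPS:
        print(f"{datetime.datetime.now():%Y-%m-%d %H:%M} running {script}")
        result = subprocess.run([sys.executable, script])
        if result.returncode != 0:
            print(f"{script} failed with exit code {result.returncode}")
            return result.returncode
    return 0

if __name__ == "__main__":
    raise SystemExit(main())
```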

11. Correlate those that receive email with purchases

Ever the ambitious individual, the marketer then performs some statistical analysis to determine if those who receive email campaigns have a higher propensity to make purchases on the site. 
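The post leaves the choice of analysis open; one straightforward option is to compare the repeat-purchase rate of email recipients against a holdout group using a two-proportion z-test. The counts below are invented for illustration.

```python
from math import sqrt

# Hypothetical counts: did email recipients repeat-purchase more often than
# a comparable group that didn't receive the campaign?
recipients, recipient_buyers = 40_000, 1_050
holdout,    holdout_buyers   = 10_000,   215

p1 = recipient_buyers / recipients
p2 = holdout_buyers / holdout

# Two-proportion z-test (normal approximation).
pooled = (recipient_buyers + holdout_buyers) / (recipients + holdout)
se = sqrt(pooled * (1 - pooled) * (1 / recipients + 1 / holdout))
z = (p1 - p2) / se

print(f"Recipient repeat rate: {p1:.2%}")
print(f"Holdout repeat rate:   {p2:.2%}")
print(f"z-statistic:           {z:.2f}  (|z| > 1.96 ~ significant at the 5% level)")
```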

12. Rinse and repeat

After a successful campaign, the marketer begins all over again, armed with additional experience on what sort of campaigns are successful, and is better prepared to be successful in the future.

What does it take to get there? Here's a recipe to develop your technical skills.

The capabilities demonstrated above show a fictional marketer who is able to run a successful campaign with little or no assistance from others. So, how do you get there? Primarily, by jumping in, trying it out, and learning as you go.

To help you on your way, I've put together a recipe of skills with links to resources. Some resources are better than others, and you can pursue them in any order you'd like. Have better resources than what I've included? Please feel free to contribute them in the comments.

Databases and SQL

Pull your own data. Understand how databases work and create your own.

Web development

Build web pages and emails. Use JavaScript to add functionality. Utilize server side scripting.

Web technology

Understand how HTTP and web servers work. Harness the power of the query string.

Web design and UX

Pick up some design skills. Give better wireframes to your designers, or design it yourself.

Copywriting

Learn to write for the web, email, and social marketing channels. Be creative AND pithy.

Analytics

Know how you’re doing. Evaluate performance and determine how to make it better.

Forecasting and statistics

Predict the future. Create a forecast or budget. Run correlations and regression analyses.

Technical SEO

Become an SEO-friendly web developer. Use your knowledge of HTTP to fix on-site issues.

Content platforms and hosting

Know how to publish your wonderful words and code. Use the right tool for the job.

E-commerce tech

Learn to accept money graciously. Discover how SSL works, PCI compliance, and industry vendors.


Many paths, one result: an unstoppable force of capability, limited only by your own creativity.

Developing technical skills isn't about becoming indispensable; it's about developing the capabilities to be self-reliant when necessary and to provide significant value to your organization. These skills help you not only in doing your own work, but in working with your team and other individuals. In other words, these skills will remain valuable for your entire career.

I'd love to know what you think in the comments. What resources do you like? What have you used to bolster your technical skills?

(Some images provided by Shutterstock.)

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!


SEOmoz Daily SEO Blog

Posted in Latest News | Comments Off

Biting the Bullet of Technical Debt

Posted by MozCTO

Rand has talked about the technical debt that is impacting our ability to grow and deliver new products. We knew we’d have to bite that bullet at some point, but sometimes it’s not a clean bite…you’ve got to gnaw away at it until you finally break through.

To that end, we created an 18-month roadmap to pay back that technical debt, and have worked out the stepping stones needed for each team to chip away at that proverbial bullet. It’s going to take a lot of hard work and some of our funding to help get us there, with the ultimate goals of giving you, our customers, greater value, enabling further growth, and getting to 99.9% uptime. We’ll update you as we take each step along the way. But for now, take a look at the roadmap as we see it.  

Get to 99.9% Uptime

The first step on the road to success is upgrading system operations. We’re focusing our efforts here on hardening our network infrastructure and increasing system redundancy and monitoring, with the following key goals:

  • Better and redundant equipment: We’re implementing the network at our own co-location facility in a way that allows us to grow and is not as vulnerable to equipment failures. We are also moving off hosted servers, load balancers, and switches in favor of our own equipment. The new equipment is much higher quality, and will be duplicated here in Seattle and at our colocation site in Herndon, Virginia.
  • Rigorous monitoring: I love that we have enthusiastic customers willing to tweet when one of our systems is down, but that is not the normal way to monitor systems! Our system administrators are implementing monitoring not only on our servers, but also on the jobs, queues, and a plethora of other things that keep our service running. Increased monitoring will help us catch problems before the servers go down, and hopefully head off problems like the latest rankings outage before they affect our customers.
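To put that 99.9% goal in perspective, it helps to translate the uptime target into a downtime budget. Here's the quick arithmetic as a simple Python sketch.

```python
# What a 99.9% uptime target allows, expressed as a rough downtime budget.
target = 0.999

minutes_per_day = 24 * 60
for label, minutes in [("day", minutes_per_day),
                       ("30-day month", 30 * minutes_per_day),
                       ("year", 365 * minutes_per_day)]:
    allowed = (1 - target) * minutes
    print(f"Allowed downtime per {label}: {allowed:.1f} minutes")
```

That works out to roughly 43 minutes per month, or a little under nine hours per year, which is why the redundancy and monitoring work above matters so much.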

The Tech Ops Team

  • Mark, Sr. Director
  • David, Principal Engr
  • Stephen, Sys Admin
  • Jacob, Sys Admin
  • Nicholas, Tech Writer
  • Fay, Database Architect
  • Dave K, Office Admin
  • New System/Network Engineer
  • New DBA

The Tech Ops Stepping Stones (roadmap image)

Deliver Our Largest, Freshest, Most Reliable Index

In parallel with this systems work, we are also working on our applications' reliability and scalability. The Big Data team's work includes:

  • More reliable data processing: We’re moving our processing out of the cloud and onto our own hardware.
  • Fix things right: We now have the luxury of the time and a little cash in the bank to do things right. We’re not going to cobble together a hack that will get us over the hump today, but will come back to bite us tomorrow.
  • Improve the index: Our goal is to triple our index size, getting back to where we were in May 2012, and to release more frequently and with greater freshness, ultimately creating a new index every 7-10 working days.

The Big Data Team

  • Carin, Senior Manager
  • Phil, Principal Engineer
  • Brandon, Principal Engineer
  • Martin, Principal Engineer
  • Doug, Senior Engineer
  • Dan, Engineer
  • Maura, Senior Engineer
  • Sarfraz, TPM
  • Kenny, Web Dev
  • Brad K, Senior Engineer
  • David B., Engineer

The Big Data Stepping Stones (roadmap image)

Make Everything Bullet-proof

The Production Engineering Team (PE) is knee-deep in the bowels of the production systems: reviewing code, suggesting where new or more hardware could be used, and making things more maintainable and bullet-proof in general. PE has already implemented code changes to our core systems over the last few weeks to address some of the current sticking points. Some of the things this team is working on:

  • New servers: We’re in the process of standing up over 200 new servers.
  • Reducing complexity: We’re reducing the types of databases and queuing systems we run on. We’re picking systems that either we can support or that have dependable support to help us reach our goal of 99.9% uptime. Between data storage/retrieval and queuing, we have 7 (that I know of) different types of systems.  We aim to get down to one queuing system and two or three different database types.

For more information on these recent fixes, check out the blog post Where are My Rankings?

The Production Engineering Team

  • Shawn, Senior Manager
  • Thomas, Senior Engineer
  • David W., Engineer
  • Evan, Engineer
  • Ben, Engineer
  • Ethel, SDET
  • Shelly, TPM
  • New Ruby Engineer
  • New Ruby Engineer

The Production Engineering Stepping Stones (roadmap image)

Net New Development

The Net New Development Team is working on implementing new product features. Shhhhh!

The Net New Development Team

  • Walt, Sr. Software Manager
  • Chris, TPM
  • Andrew, SDET
  • Myron, Senior Engineer
  • Marty, Engineer
  • Patrick, Engineer
  • Brandon R., Engineer
  • Ben K., Engineer
  • Wes, Principal Engineer
  • John, Senior Engineer
  • AK, Engineer
  • Jason, Engineer
  • Koos, Engineer

Net New Development Stepping Stones: Top Secret!

Rock the Marketing Website

Inbound Engineering is the team focused on the Marketing website. The team goals are:

  • Create new services: Create the Common Email service, the new Moz Authorization service, and the front end for Q&A.
  • Upgrade billing: Upgrade our billing infrastructure for more reliable payment processing.
  • Upgrade the website: Build additional functionality into the marketing website.

Inbound Engineering Team

  • Casey, Senior Web Manager
  • Dudley, Senior Director
  • Devin, PHP Engineer
  • New PHP Engineer
  • New PHP Engineer
  • New PHP Engineer

Inbound Stepping Stones (roadmap image)

Make Tweets Sing

The Followerwonk team is working on advancing the customer experience and digging deeper into Twitter and what makes Tweets sing. We're going to use split-testing against specific goals to measure customer experience, which will help us decide on the designs and features that our customers like best.

Followerwonk Team

  • Peter, Followerwonk Founder
  • Galen, Software Engineer
  • Marc, Software Engineer
  • Amy, TPM

Followerwonk Stepping Stones (roadmap image)

Test and Document

In lockstep with these teams, our test and doc folks are adding testing and documentation that will improve quality and communication across the company. These teams are still small, but are already having a big impact. We have already seen an improvement in our last index release, where testing contributed to it going out with no issues.

Test and Docs Team

  • Lisa, Technical Writer
  • Nicholas, Technical Writer
  • Ethel, SDET
  • Andrew, SDET

Docs and Test Roadmaps (roadmap images)

Sharing Our Success

As we take each step along our technical roadmap we will share our accomplishments, turning these planned stepping stones green over the next 18 months. As we gnaw away at our technical debt, we hope you’ll start seeing benefits from the changes along the way. Stay tuned!

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!


SEOmoz Daily SEO Blog

Posted in Latest News | Comments Off

