What is canonicalization

Canonicalization is the process of selecting the representative (canonical) URL of a piece of content. Consequently, a canonical URL is the URL of the page that Marfeel chose as the most representative from a set of duplicate pages. Often called deduplication, this process helps Marfeel show only one version, with consolidated metrics, of otherwise duplicate content.

There are many reasons why a site may have duplicate content:

  • Page Technologies: AMP vs Web vs Apps
  • Content syndication
  • Mobile site, e.g. m.yoursite.com

Marfeel utilizes the canonical page for two primary purposes:

  1. As the main source from which to extract all the editorial metadata, in an attempt to build a visual representation of how Googlebot sees the page. Marfeel never extracts information directly from AMP, FBIA or native applications. They must all point to a valid canonical link.
  2. As the URL to which pageviews and other events are attributed.

How Marfeel indexes and chooses the canonical URL

When Marfeel crawls a page, it determines the primary content (or centerpiece) of each page following this process:

  1. Custom mrf:canonical. There are situations, such as syndicated content from a third-party site, where you usually want to attribute that traffic to your own domain. In these situations you can use the mrf:canonical meta property to tell Marfeel to use that URL instead of the canonical one.
<meta property="mrf:canonical" content="https://yourdomain.com/yourcanonical" />
INFO:
Please note that the use of mrf:canonical will not have any effect on SEO, Google Analytics or any other tool. This is a Marfeel-specific parameter used only for reporting convenience.
  2. By default, Marfeel assigns the traffic to the canonical URL declared on the page, as Google recommends.
<link rel="canonical" href="{{my-canonical-url}}"/>
  3. If there isn’t a canonical URL, Marfeel uses the current URL without any params.
INFO:
Marfeel does not recommend relying on this method. It can lead to duplicated URLs, or to many URLs being aggregated into one if you use params.
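
The order of precedence described above can be sketched in a few lines of Python. This is only an illustration, not Marfeel's actual crawler: the names choose_canonical and _CanonicalTagParser are hypothetical, and it assumes that "without any params" means dropping the query string (the fragment is dropped as well).

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, urlunsplit


class _CanonicalTagParser(HTMLParser):
    """Collects the first mrf:canonical meta property and rel=canonical link."""

    def __init__(self):
        super().__init__()
        self.mrf_canonical = None
        self.link_canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and attr.get("property") == "mrf:canonical":
            self.mrf_canonical = self.mrf_canonical or attr.get("content")
        elif tag == "link" and (attr.get("rel") or "").lower() == "canonical":
            self.link_canonical = self.link_canonical or attr.get("href")


def choose_canonical(html, page_url):
    """Hypothetical helper mirroring the three steps in the list above."""
    parser = _CanonicalTagParser()
    parser.feed(html)
    # Step 1: a custom mrf:canonical wins over everything else.
    if parser.mrf_canonical:
        return parser.mrf_canonical
    # Step 2: otherwise use the standard rel=canonical link.
    if parser.link_canonical:
        return parser.link_canonical
    # Step 3: fall back to the current URL with query and fragment removed
    # (assumption: this is what "without any params" refers to).
    scheme, netloc, path, _query, _fragment = urlsplit(page_url)
    return urlunsplit((scheme, netloc, path, "", ""))
```

Calling choose_canonical with the HTML of a syndicated page that carries both tags returns the mrf:canonical value, while a page with no canonical tags at all falls back to its own URL stripped of params.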

Canonical URL Considerations

Marfeel periodically crawls the canonical URL of a page based on its last update date. Marfeel tries to reinforce Google’s best practices on canonicals.

Mandatory canonical URLs

Correctly placed canonical link tags in the form of <link rel="canonical" href="{{my-canonical-url}}"/> are essential for both SEO and Marfeel’s understanding of a site. If they are missing or incorrect, editorial metadata won’t be correctly assigned to each article.

Valid canonical URL

Marfeel always tracks the traffic of all URLs in a site, whether they have valid structured data or not. URLs are only enriched with the information of their canonicals when the canonical has valid structured data.

Canonical URLs are case sensitive

As Google’s John Mueller clarified, URLs are case sensitive, so it matters whether the characters are uppercase or lowercase.

If two articles declare canonicals that differ only in uppercase or lowercase characters, Marfeel considers them two different editorials. For example, the following are treated as distinct URLs:

https://mydomain.com/path/to/url
https://mydomain.com/path/to/URL
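
One way to catch this kind of accidental split before it reaches reporting is to look for declared canonicals that collide once lowercased. The sketch below is an assumption about how such an audit could be done on your own list of URLs; find_case_variants is a hypothetical helper, not a Marfeel feature.

```python
from collections import defaultdict


def find_case_variants(canonical_urls):
    """Return groups of distinct URLs that become identical once lowercased."""
    groups = defaultdict(set)
    for url in canonical_urls:
        groups[url.lower()].add(url)
    # Keep only the groups where more than one spelling was declared.
    return [sorted(variants) for variants in groups.values() if len(variants) > 1]
```

Each returned group lists the spellings that would be counted as separate editorials.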

Canonicals must have absolute paths

Use absolute paths rather than relative paths with the rel="canonical" link element. As Google notes, even though relative paths are supported, they can cause problems in the long run (for example, if you unintentionally allow your testing site to be crawled), and thus Marfeel doesn’t support them.
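
A quick way to check whether a canonical value is absolute is to confirm that it carries both a scheme and a host. This is a minimal sketch using Python's standard library; is_absolute_canonical is a hypothetical name, and limiting the scheme to http and https is an assumption.

```python
from urllib.parse import urlsplit


def is_absolute_canonical(href):
    """True when the href has an http(s) scheme and a host, i.e. is absolute."""
    parts = urlsplit(href.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

Under this check, https://mydomain.com/path/to/url passes, while /path/to/url and the protocol-relative //mydomain.com/path/to/url do not.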

Canonicals can’t be redirections

When the declared canonical of a page returns a 3xx, Marfeel crawlers don’t follow the redirect. A 3xx on a canonical is a strong signal that the target of the redirect should become the canonical, and Google might interpret it as an incorrect element or a misconfiguration.
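
To verify that a canonical answers directly rather than redirecting, you can request it once without following redirects and inspect the status code. The sketch below uses Python's http.client, which does not follow redirects on its own. The function names are hypothetical, and using a HEAD request is an assumption: some servers answer HEAD differently from GET.

```python
import http.client
from urllib.parse import urlsplit


def is_redirect_status(status):
    """True for any 3xx status code."""
    return 300 <= status < 400


def fetch_status(url, timeout=10.0):
    """Request the URL once, without following redirects, and return its status."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()
```

A canonical for which is_redirect_status(fetch_status(url)) is true should be replaced by the final target of the redirect.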

Canonicals can’t have a hash or hashbang

In the same way as Googlebot does, Marfeel ignores the hash in URLs to avoid duplicating URLs.
mydomain.com/hello#world will become mydomain.com/hello
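
The same normalization can be reproduced with Python's standard library, for example to clean canonicals before comparing them. This is a sketch of the behavior described above, not Marfeel's code; strip_fragment is a hypothetical name.

```python
from urllib.parse import urldefrag


def strip_fragment(url):
    """Drop everything from the first '#' onward, including hashbang routes."""
    return urldefrag(url).url
```

Note that a hashbang URL such as https://mydomain.com/#!/hello collapses to https://mydomain.com/, so every hashbang route on a site would resolve to the same URL.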

AMP and Web canonical recommendations for syndicated content

In case you are using AMP, the proper configuration should be:

  1. Original content:
     • URL: thirdpartydomain.com/theurl
     • canonical: thirdpartydomain.com/theurl
  2. Syndicated content in your website, web version:
     • URL: yourdomain.com/yoururl
     • canonical: thirdpartydomain.com/theurl
     • mrf:canonical: yourdomain.com/yoururl
  3. Syndicated content in your website, AMP version:
     • URL: amp.yourdomain.com/yoururl
     • canonical: yourdomain.com/yoururl
INFO:
Please note that while this is a widespread method, Google discourages the use of rel="canonical" as a syndication method.

Links of interest

  1. What is canonicalization
  2. How to specify a canonical
  3. Canonicalization troubleshooting