What is canonicalization and how Marfeel selects canonical URLs

Canonicalization is the process of selecting the representative (canonical) URL of a piece of content. A canonical URL is the URL that Marfeel chooses as the most representative from a set of duplicate pages. Often called deduplication, this process ensures that Marfeel shows only one version of otherwise duplicate content, with its metrics consolidated.

There are many reasons why a site may have duplicate content:

  • Page technologies: AMP vs Web vs Apps
  • Content syndication
  • Mobile sites, e.g. m.yoursite.com

Marfeel utilizes the canonical page for two primary purposes:

  1. As the main source from which to extract all the editorial metadata, creating a visual representation of how Googlebot sees the page. Marfeel never extracts information directly from AMP, FBIA or native applications. They must all point to a valid canonical link.
  2. As the URL to which pageviews and other events are attributed.

How Marfeel indexes and chooses the canonical URL

When Marfeel crawls a page, it determines the primary content (or centerpiece) of each page following a three-step priority order.

  1. Custom mrf:canonical. For situations like syndicated content from a third-party site where you want to attribute that traffic to your domain, you can use the mrf:canonical meta property to instruct Marfeel to use that URL instead of the canonical one.
<meta property="mrf:canonical" content="https://yourdomain.com/yourcanonical" />

You can also use the name attribute as an alternative to the property attribute.

INFO:
Please note that mrf:canonical has no effect on SEO, Google Analytics or any other tool. It is a Marfeel-specific parameter used only for reporting convenience.
  2. Standard rel=canonical. By default, Marfeel assigns the traffic to the canonical URL declared on the page, as Google recommends.
<link rel="canonical" href="{{my-canonical-url}}"/>
  3. Fallback to the current URL. If there is no canonical URL, Marfeel uses the current URL without any query parameters.
INFO:
Marfeel does not recommend relying on this method. It can lead to duplicated URLs or, if your pages use query parameters to serve different content, to many URLs being aggregated into one.
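
The priority order above can be sketched in a few lines of Python. This is only an illustration of the three steps as described in this article, not Marfeel's actual crawler code. The names CanonicalTagParser and resolve_canonical are hypothetical, and the sketch assumes the first matching tag on the page wins.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, urlunsplit


class CanonicalTagParser(HTMLParser):
    """Collects the first mrf:canonical meta tag and rel=canonical link, if present."""

    def __init__(self):
        super().__init__()
        self.mrf_canonical = None
        self.rel_canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and self.mrf_canonical is None:
            # The key may be given in either the property or the name attribute.
            key = attr.get("property") or attr.get("name")
            if key == "mrf:canonical" and attr.get("content"):
                self.mrf_canonical = attr["content"]
        elif tag == "link" and self.rel_canonical is None:
            rels = (attr.get("rel") or "").lower().split()
            if "canonical" in rels and attr.get("href"):
                self.rel_canonical = attr["href"]


def resolve_canonical(html, current_url):
    """Hypothetical helper: apply the three-step priority order to one page."""
    parser = CanonicalTagParser()
    parser.feed(html)
    if parser.mrf_canonical:       # step 1: custom mrf:canonical
        return parser.mrf_canonical
    if parser.rel_canonical:       # step 2: standard rel=canonical
        return parser.rel_canonical
    # Step 3: fall back to the current URL with query string and fragment removed.
    parts = urlsplit(current_url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
```

For a page that declares neither tag, resolve_canonical("<html></html>", "https://yourdomain.com/article?utm_source=x") returns the URL without its query string, which is why pages that differ only by parameters end up aggregated under the fallback.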

Canonical URL considerations

Marfeel periodically crawls the canonical URL of a page based on its last update date. Marfeel tries to reinforce Google’s best practices on canonicals.

Mandatory canonical URLs

Correctly placed canonical link elements in the form of <link rel="canonical" href="{{my-canonical-url}}"/> are essential for both SEO and Marfeel’s understanding of a site. If they are missing or incorrect, editorial metadata won’t be correctly assigned to each article.

Valid canonical URL

Marfeel always tracks the traffic of all URLs in a site whether they have valid structured data or not. URLs are only enriched with the information of their canonicals when the canonical has valid structured data.

Canonical URLs are case sensitive

As Google’s John Mueller clarified, URLs are case sensitive, so it matters whether the characters are uppercase or lowercase.

If two articles declare canonicals that differ only in upper or lower case, Marfeel treats them as two distinct editorials. For example, these two URLs are different canonicals:

https://mydomain.com/path/to/url
https://mydomain.com/path/to/URL
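
As a minimal sketch of the consequence, assume pageviews are grouped by the exact canonical string with no case normalization. The function pageviews_by_canonical is a hypothetical name used only for this illustration.

```python
from collections import Counter


def pageviews_by_canonical(declared_canonicals):
    """Count pageviews per canonical, comparing the strings exactly as declared."""
    # No lowercasing or other normalization is applied on purpose.
    return Counter(declared_canonicals)


counts = pageviews_by_canonical([
    "https://mydomain.com/path/to/url",
    "https://mydomain.com/path/to/URL",
    "https://mydomain.com/path/to/url",
])
```

Here counts holds two separate entries, so the metrics of what is editorially one article are split across two canonicals.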

Canonicals must have absolute paths

Use absolute paths rather than relative paths with the rel="canonical" link element. As Google notes, relative paths are supported but can cause problems in the long run (for example, if you unintentionally allow your testing site to be crawled). For this reason, Marfeel does not support them.
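
A check along these lines can be run before publishing. This is a sketch that treats a canonical as absolute only when it carries both an http(s) scheme and a host. The helper name is_absolute_canonical is hypothetical, and rejecting protocol-relative URLs is an assumption of this sketch, not a documented Marfeel rule.

```python
from urllib.parse import urlsplit


def is_absolute_canonical(href):
    """Return True only if href has an http(s) scheme and a host."""
    parts = urlsplit(href)
    # Relative paths have no scheme and no host. Protocol-relative URLs
    # ("//host/path") have a host but no scheme and are rejected here too.
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

With this helper, "https://mydomain.com/path/to/url" passes, while "/path/to/url" and "path/to/url" do not.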

Canonicals cannot be redirections

When the canonical declared on a page returns a 3xx response, Marfeel crawlers do not follow the redirection. A 3xx on a canonical is a strong signal that the target of the redirect should be the canonical instead, and Google might interpret it as an incorrect element or a misconfiguration.

Canonicals cannot have a hash or hashbang

As Googlebot does, Marfeel ignores the hash fragment in URLs to avoid duplicating them. For example:
mydomain.com/hello#world becomes mydomain.com/hello
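
The same normalization can be reproduced with Python's standard library. This is a sketch of the behavior described above, not Marfeel's implementation, and strip_fragment is a hypothetical name.

```python
from urllib.parse import urldefrag


def strip_fragment(url):
    """Drop everything from the first '#' onwards, including hashbang routes."""
    return urldefrag(url).url
```

Because a hashbang route such as https://mydomain.com/#!/article/1 lives entirely in the fragment, it collapses to https://mydomain.com/ and cannot identify an individual article.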

AMP and Web canonical recommendations for syndicated content

If you are using AMP, the proper configuration should be:

  1. Original content:
     • URL: thirdpartydomain.com/theurl
     • canonical: thirdpartydomain.com/theurl
  2. Syndicated content on your website, web version:
     • URL: yourdomain.com/yoururl
     • canonical: thirdpartydomain.com/theurl
     • mrf:canonical: yourdomain.com/yoururl
  3. Syndicated content on your website, AMP version:
     • URL: amp.yourdomain.com/yoururl
     • canonical: yourdomain.com/yoururl
INFO:
Please note that while this is a widespread method, Google discourages using rel=canonical as a syndication method.
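
To see why this configuration consolidates traffic, the sketch below applies the priority order (mrf:canonical first, then rel=canonical) to the two pages hosted on your domain. The dictionaries, the https:// scheme and the helper attributed_url are assumptions made for illustration only.

```python
def attributed_url(page):
    """Pick mrf:canonical first, then rel=canonical, else the page URL itself."""
    # The real fallback also strips query parameters; that step is omitted here.
    return page.get("mrf_canonical") or page.get("canonical") or page["url"]


pages = [
    {   # Syndicated content, web version
        "url": "https://yourdomain.com/yoururl",
        "canonical": "https://thirdpartydomain.com/theurl",
        "mrf_canonical": "https://yourdomain.com/yoururl",
    },
    {   # Syndicated content, AMP version
        "url": "https://amp.yourdomain.com/yoururl",
        "canonical": "https://yourdomain.com/yoururl",
    },
]

attributions = [attributed_url(page) for page in pages]
```

Both entries in attributions resolve to https://yourdomain.com/yoururl, so web and AMP traffic for the syndicated article is reported under your own URL rather than the third party's.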

What is canonicalization in Marfeel?

Canonicalization is the process of selecting the representative (canonical) URL of a piece of content. Marfeel uses the canonical URL to extract editorial metadata and to attribute pageviews and other events, consolidating metrics for duplicate pages into a single version.

How does Marfeel determine the canonical URL of a page?

Marfeel follows a three-step priority: first it checks for a custom mrf:canonical meta tag, then it uses the standard rel=canonical link element as recommended by Google, and finally, if neither is present, it falls back to the current URL stripped of query parameters.

What are common canonical URL mistakes that affect Marfeel metrics?

Common mistakes include using relative paths instead of absolute URLs, pointing canonicals to redirecting URLs (3xx responses), including hash fragments in canonical URLs, and case mismatches between canonical declarations. All of these can cause duplicated or missing editorial metadata.