Canonicalization is the process of selecting the representative (canonical) URL of a piece of content. Consequently, a canonical URL is the URL of the page that Marfeel chose as the most representative from a set of duplicate pages. Often called deduplication, this process helps Marfeel show only one version, with consolidated metrics, of otherwise duplicate content.
There are many reasons why a site may have duplicate content:
- Page Technologies: AMP vs Web vs Apps
- Content syndication
- Mobile sites, e.g. m.yoursite.com
Marfeel utilizes the canonical page for two primary purposes:
- As the main source from which to extract all the editorial metadata, in an attempt to create a visual representation of how Googlebot sees the page. Marfeel never extracts information directly from AMP, FBIA or native applications. They must all point to a valid canonical link.
- As the URL to which pageviews and other events are attributed.
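As a rough illustration of the second purpose, the sketch below rolls up pageviews recorded on several variants of an article under a single canonical URL. The event data, the `canonical_of` mapping and the function name are hypothetical and chosen only for this example; it does not describe how Marfeel's pipeline is actually implemented.

```python
from collections import Counter


def consolidate_pageviews(events, canonical_of):
    """Sum pageviews per canonical URL.

    events: iterable of (url, pageviews) pairs.
    canonical_of: dict mapping a variant URL to its canonical URL.
    URLs missing from the mapping are counted under themselves.
    """
    totals = Counter()
    for url, views in events:
        totals[canonical_of.get(url, url)] += views
    return dict(totals)


# Hypothetical variants of the same article (web, AMP, mobile site).
canonical_of = {
    "https://amp.yoursite.com/story": "https://yoursite.com/story",
    "https://m.yoursite.com/story": "https://yoursite.com/story",
}
events = [
    ("https://yoursite.com/story", 120),
    ("https://amp.yoursite.com/story", 30),
    ("https://m.yoursite.com/story", 50),
]
print(consolidate_pageviews(events, canonical_of))
```

All three variants end up under `https://yoursite.com/story`, which is the consolidation that deduplication is meant to provide.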
How Marfeel indexes and chooses the canonical URL
When Marfeel crawls a page, it determines the primary content (or centerpiece) of that page by checking the following, in order:
- Custom mrf:canonical. There are situations, such as syndicated content from a third-party site, where you usually want to attribute the traffic to your own domain. In these situations you can use the mrf:canonical meta property to tell Marfeel to use that URL instead of the canonical one.
<meta property="mrf:canonical" content="https://yourdomain.com/yourcanonical" />
Please note that mrf:canonical has no effect on SEO, Google Analytics or any other tool. It is a Marfeel-specific parameter used only for reporting convenience.
- By default, Marfeel assigns the traffic to the canonical URL declared on the page, as Google recommends.
<link rel="canonical" href="{{my-canonical-url}}"/>
- If there is no canonical URL, Marfeel uses the current URL without any parameters.
Marfeel does not recommend relying on this fallback. It can lead to duplicated URLs or, if your site uses parameters to distinguish pages, to many URLs being aggregated into one.
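The order of precedence described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not Marfeel's crawler. The names `resolve_canonical` and `_CanonicalTags` are hypothetical, and "without any params" is assumed here to mean dropping the query string (the fragment is dropped as well).

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, urlunsplit


class _CanonicalTags(HTMLParser):
    """Collects the first mrf:canonical meta and the first rel=canonical link."""

    def __init__(self):
        super().__init__()
        self.mrf_canonical = None
        self.link_canonical = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("property") == "mrf:canonical":
            self.mrf_canonical = self.mrf_canonical or a.get("content")
        elif tag == "link" and (a.get("rel") or "").lower() == "canonical":
            self.link_canonical = self.link_canonical or a.get("href")


def resolve_canonical(html, current_url):
    """Return the URL that traffic would be attributed to (illustrative only)."""
    parser = _CanonicalTags()
    parser.feed(html)
    if parser.mrf_canonical:       # 1. custom mrf:canonical wins
        return parser.mrf_canonical
    if parser.link_canonical:      # 2. standard rel="canonical"
        return parser.link_canonical
    # 3. fallback: current URL with query string and fragment removed (assumption)
    parts = urlsplit(current_url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
```

For a syndicated page that declares both tags, the function returns the mrf:canonical value. For a page with neither tag, `https://a.com/b?x=1` falls back to `https://a.com/b`, which shows how pages that differ only by parameters would be merged.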
Canonical URL Considerations
Marfeel periodically crawls the canonical URL of a page based on its last update date. Marfeel tries to reinforce Google’s best practices on canonicals.
Mandatory canonical URLs
Correctly placed canonical link tags in the form of <link rel="canonical" href="{{my-canonical-url}}"/> are essential for both SEO and Marfeel’s understanding of a site. If they are missing or incorrect, editorial metadata won’t be correctly assigned to each article.
Valid canonical URL
Marfeel always tracks the traffic of all URLs in a site, whether they have valid structured data or not. URLs are only enriched with the information of their canonicals when the canonical has valid structured data.
Canonical URLs are case sensitive
As Google’s John Mueller clarified, URLs are case sensitive, so it matters whether the characters are uppercase or lowercase.
If two articles declare canonicals that differ only in upper or lower case, Marfeel will consider them two different editorials:
https://mydomain.com/path/to/url
https://mydomain.com/path/to/URL
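One way to spot this problem before it splits your metrics is to look for canonicals that collide once case is ignored. The helper below is a hypothetical sketch for auditing your own list of canonical URLs and is not part of any Marfeel tooling.

```python
from collections import defaultdict


def find_case_collisions(canonicals):
    """Group canonical URLs that differ only by letter case."""
    groups = defaultdict(set)
    for url in canonicals:
        # Lowercasing is used only as a grouping key; the URLs themselves
        # are still treated as distinct, as described above.
        groups[url.lower()].add(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


collisions = find_case_collisions([
    "https://mydomain.com/path/to/url",
    "https://mydomain.com/path/to/URL",
    "https://mydomain.com/other",
])
print(collisions)
```

Each returned group is a set of URLs that would be counted as separate editorials even though they probably refer to the same article.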
Canonicals must have absolute paths
Use absolute paths rather than relative paths with the rel="canonical" link element. As disclosed by Google, even though relative paths are supported, they can cause problems in the long run (for example, if you unintentionally allow your testing site to be crawled), and thus Marfeel doesn’t support them.
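A quick check for this rule could look like the sketch below. It assumes that "absolute" means the href carries both an http(s) scheme and a host; the function name is hypothetical.

```python
from urllib.parse import urlsplit


def is_absolute_canonical(href):
    """True when href has an http(s) scheme and a host (assumed definition)."""
    parts = urlsplit(href)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(is_absolute_canonical("https://mydomain.com/path/to/url"))
print(is_absolute_canonical("/path/to/url"))
```

Under this definition, protocol-relative values such as `//mydomain.com/path` are also rejected because they carry no scheme.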
Canonicals can’t be redirections
When the declared canonical of a page returns a 3xx, Marfeel crawlers don’t follow the redirection. A 3xx on a canonical is a strong signal that the target of the redirect should become the canonical, and Google might interpret it as an incorrect element or a misconfiguration.
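To verify that a canonical answers directly rather than with a redirect, you can request it without following redirects and inspect the status code. The sketch below uses Python's `http.client`, which never follows redirects on its own. The function names are hypothetical, and a HEAD request is an assumption (some servers answer HEAD differently from GET).

```python
import http.client
from urllib.parse import urlsplit


def is_redirect(status):
    """True for any 3xx status code."""
    return 300 <= status < 400


def fetch_status(url, timeout=10):
    """Issue a single HEAD request and return the raw status code."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()


def canonical_is_direct(url):
    """True when the canonical does not answer with a redirect."""
    return not is_redirect(fetch_status(url))
```

Calling `canonical_is_direct("https://yourdomain.com/yourcanonical")` performs a real network request, so run it against your own pages only.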
Canonicals can’t have a hash or hashbang
In the same way as Googlebot does, Marfeel ignores the hash in URLs to avoid duplicating URLs.
mydomain.com/hello#world will become mydomain.com/hello
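The same normalization can be reproduced with the standard library's `urldefrag`, shown here as a minimal sketch. The wrapper name `strip_fragment` is hypothetical.

```python
from urllib.parse import urldefrag


def strip_fragment(url):
    """Drop everything from the first '#' onwards, including hashbangs."""
    return urldefrag(url).url


print(strip_fragment("mydomain.com/hello#world"))
print(strip_fragment("https://mydomain.com/#!/hello"))
```

Note that a hashbang URL collapses to the part before `#!`, so distinct hashbang routes would all map to the same URL.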
AMP and Web canonical recommendations for syndicated content
If you are using AMP, the proper configuration is:
- Original content:
  - URL: thirdpartydomain.com/theurl
  - canonical: thirdpartydomain.com/theurl
- Syndicated content in your website, web version:
  - URL: yourdomain.com/yoururl
  - canonical: thirdpartydomain.com/theurl
  - mrf:canonical: yourdomain.com/yoururl
- Syndicated content in your website, AMP version:
  - URL: amp.yourdomain.com/yoururl
  - canonical: yourdomain.com/yoururl
Please note that while this is a widespread method, Google discourages the use of the canonical link as a syndication method.