Marfeel creates a visual representation of how Googlebot sees your site, based on its structured data. When a user visits a page and triggers an event to Marfeel, the Editorial Crawler crawls the page and detects, extracts, and audits the structured data and extra metadata of the canonical url, such as the title, author, or section it belongs to.
This article lists issues you may find interesting
There might be cases where extracted articles lack editorial information like the title or the author. Instead they show a plain url, such as
https://domain.com/path/to/article in the example below:
There are different situations that can cause the Marfeel Editorial Crawler to fail:
- WAF or Web Application Firewall. The Marfeel Editorial Crawler follows good citizen practices to throttle the number of concurrent requests per site, but if you have a WAF it might block the Marfeel crawlers. Follow these steps to whitelist them.
- URL with a non-existing canonical or without a title or an H1. Marfeel crawls all the information from the informed canonical url. If that fails, then the editorial information won’t be reported correctly.
- Yoast in combination with WPRocket cache plugin in Wordpress. Read more about some known issues with this set up.
- Detection of external sites. If you see domains that you don’t own, you’ll want to review your canonicals strategy.
- Using an article preview in your CMS may activate our SDK for traffic tracking. Keep in mind that if the link is not yet published, our web crawler cannot access or analyze the content. Essential plan users and above will benefit from our crawler’s persistent attempts to access the content, gradually reducing its frequency upon repeated failures. On Free plans, the crawler will halt attempts after 10 consecutive failures.
There might be cases when using reporting that you see listed hosts and urls that you don’t operate.
There are a few known situations when this can happen:
- When your articles specify a canonical outside of your property
- When users use a reverse proxy
- Shared Google Tag Manager across sites
- Audits on referral pages
- Sites copying your content including Marfeel tracking
By default, Marfeel attributes the traffic to the canonical URL informed on the page. If you are using syndicated content from a third-party site you might have to keep their canonical. On Marfeel all the traffic will be classified under the canonical url and domain, which might be different from your main domain.
If you want you can change the attribution using mrf:canonical
There are platforms and tools that allow users to browse sites using a reverse proxy. The users consume your site content from sites like
anonymousviewer.org. These sites get a copy of your site and serve it to their readers. Marfeel JS tracks these sessions and respects the canonical they inform which, in lots of cases, is rewritten to the proxied domain.
In case Marfeel is implemented via Google Tag Manager make sure it’s only active on the desired sites. In multi property GTM instances you might deploy the pixel to multiple properties by mistake.
If that’s the case get in touch with the Marfeel Support and we’ll try to help providing a list of the URLs generating the hits.
The Marfeel Editorial crawler crawls any url with a real user hit. If it’s under the same domain, It also crawls the referral url to provide
Previous pages information.
URLs discovered by the Editorial crawler are then processed by the Audits crawler.
There are publishers that add the Marfeel pixel only on certain folders or urls that live within the main domain of a different publication. i.e. The marfeel pixel is only present on
domainA.com/folder/article but not on
domainA.com. When a user coming from a page like
domainA.com/any/referral recirculates to
domainA.com/folder/article the Marfeel Editorial Crawler will crawl both the article and
domainA.com/any/referral urls. If any audit triggers on the latter, Marfeel will report the issues too even if the pixel is not present on the referral pages.