The Marfeel tracker does not send page metadata with every request; it sends only the essentials. This keeps it lightweight and minimizes bandwidth consumption. All other data is obtained through our crawlers.
Many urls may point to the same content. Marfeel crawlers only crawl canonical urls and their amphtml counterparts, and all urls pointing to the same canonical are stored as aliases.
Make sure the amphtml link rel elements are correctly set in all your content so that Marfeel crawling works properly.
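To check a page before it is crawled, the link rel elements can be read straight from its HTML. The following is a minimal sketch using only the Python standard library; `find_link_rels` and `LinkRelParser` are hypothetical helper names, not part of any Marfeel tooling, and the sample urls are placeholders.

```python
from html.parser import HTMLParser


class LinkRelParser(HTMLParser):
    """Collects the href of <link rel="canonical"> and <link rel="amphtml">."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        href = attr.get("href")
        # rel may hold several space-separated tokens; compare case-insensitively.
        for rel in (attr.get("rel") or "").lower().split():
            if rel in ("canonical", "amphtml") and href:
                # Keep the first occurrence of each rel value.
                self.links.setdefault(rel, href)


def find_link_rels(html):
    """Return a dict such as {"canonical": "...", "amphtml": "..."}."""
    parser = LinkRelParser()
    parser.feed(html)
    return parser.links


if __name__ == "__main__":
    sample = (
        '<head><link rel="canonical" href="https://example.com/story">'
        '<link rel="amphtml" href="https://example.com/story/amp"></head>'
    )
    print(find_link_rels(sample))
```

A page whose result lacks the `amphtml` key simply has no AMP counterpart declared in its markup.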
All Marfeel bots follow these rules in order to be good web citizens:
- Sites are not proactively crawled to identify new content. Marfeel only crawls urls with active users.
- Marfeel limits the number of concurrent requests to each of our clients’ servers. Re-crawls are rate limited to 1000 requests every 5 minutes.
- All assets are centrally cached so different bots may reuse them without having to fetch them separately.
- Redirects are not followed unless necessary.
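To make the re-crawl limit concrete, here is a small sliding-window limiter configured with the figures quoted above (1000 requests per 300 seconds). It is only an illustration of what such a limit means; how Marfeel actually enforces it is not described here, and the class name `SlidingWindowLimiter` is an assumption made for this sketch.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allows at most max_requests within any window of window_seconds."""

    def __init__(self, max_requests=1000, window_seconds=300):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def allow(self, now):
        """Record a request at time `now` (seconds) if the limit permits it."""
        # Discard requests that have fallen out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) < self.max_requests:
            self._timestamps.append(now)
            return True
        return False


if __name__ == "__main__":
    limiter = SlidingWindowLimiter()
    allowed = sum(limiter.allow(now=0.0) for _ in range(1200))
    print(allowed)  # requests accepted out of 1200 sent at the same instant
```

The same class can be replayed over timestamps from your access logs to see whether observed bot traffic stays within the stated limit.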
Marfeel currently uses 3 types of crawlers.
The Marfeel Editorial Crawler crawls a url and builds the editorial profile of a page using its metadata. It crawls urls when they first get a hit and every time the content is modified.
The user agent used by the editorial crawler is:
Mozilla/5.0 (compatible; NewsRoom.BI/0.1; +http://www.newsroom.bi/bot.html)
In order to detect structured data, meta tags and many other potential issues in our clients’ HTML, Marfeel periodically crawls all relevant urls (the ones that have traffic) using the following user agents:
Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA51N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.67 Mobile Safari/537.36 (compatible; mrfCompass-Booldog/1.0)
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36(KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36 (compatible; mrfCompass-Booldog/1.0)
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; mrfCompass-Marshall/1.0)
mrfCompass-Booldog will crawl each url initially using a mobile user agent, and if a vary: User-Agent header is received in the response, it will crawl it using a desktop user agent as well.
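The mobile-first rule can be sketched as a small decision function. This is an illustration of the behavior described above, not Marfeel's code; `varies_on_user_agent` and `crawl_plan` are hypothetical names, and the response headers are assumed to be available as a plain dict.

```python
def varies_on_user_agent(headers):
    """True if any Vary header lists User-Agent (names compared case-insensitively)."""
    for name, value in headers.items():
        if name.lower() == "vary":
            # Vary carries a comma-separated list of header names.
            tokens = [token.strip().lower() for token in value.split(",")]
            if "user-agent" in tokens:
                return True
    return False


def crawl_plan(mobile_response_headers):
    """Return which user agent types should crawl the url, mobile first."""
    plan = ["mobile"]
    if varies_on_user_agent(mobile_response_headers):
        plan.append("desktop")
    return plan


if __name__ == "__main__":
    print(crawl_plan({"Vary": "Accept-Encoding, User-Agent"}))
    print(crawl_plan({"Vary": "Accept-Encoding"}))
```

In practice this means a site that serves different markup to desktop browsers should send a vary: User-Agent header, otherwise only the mobile version is inspected.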
mrfCompass-Marshall will crawl all amphtml links found by
Flowcards that load content directly from specific urls also use a bot to fetch that content. This bot identifies itself with the following user agent:
Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA51N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.67 Mobile Safari/537.36 (compatible; mrfCompass-Jukebox/1.0)
The crawl frequency respects the cache-control header returned.
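As an illustration of how a cache-control value can translate into a re-fetch interval, the sketch below extracts the `max-age` directive. Which directives the bot actually honors is not documented here, so treating `max-age` as the interval is an assumption; `max_age_seconds` is a hypothetical helper.

```python
def max_age_seconds(cache_control):
    """Return the max-age value in seconds, or None if it is absent or invalid."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.strip().lower() == "max-age":
            value = value.strip().strip('"')
            if value.isdigit():
                return int(value)
    return None


if __name__ == "__main__":
    print(max_age_seconds("public, max-age=600"))
    print(max_age_seconds("no-store"))
```

A longer `max-age` on the urls that Flowcards load from should therefore lead to fewer fetches by this bot.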
Social experiences like Facebook, Twitter (X), Telegram, Reddit and LinkedIn use the following user agent:
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36 (compatible; mrfCompass-Social/1.0)
These experiences/services use Marfeel’s public IPs when crawling your site.
Many hosting and CDN providers include WAF services that may consider Marfeel bots to be potentially malicious and block them.
To make sure Marfeel can access and monitor your website, you can either whitelist the user agents mentioned above or whitelist our list of static IPs available here.
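If your WAF or log pipeline lets you script the user agent check, the product tokens listed on this page can be matched as substrings. The sketch below is an assumption about how you might wire this up; `marfeel_bot_token` is a hypothetical helper, and the version numbers are deliberately left out of the tokens so that a version bump does not break the match.

```python
# Product tokens taken from the user agent strings listed on this page.
MARFEEL_BOT_TOKENS = (
    "NewsRoom.BI/",
    "mrfCompass-Booldog/",
    "mrfCompass-Marshall/",
    "mrfCompass-Jukebox/",
    "mrfCompass-Social/",
)


def marfeel_bot_token(user_agent):
    """Return the matching bot name, or None if the user agent is not a listed bot."""
    for token in MARFEEL_BOT_TOKENS:
        if token in user_agent:
            return token.rstrip("/")
    return None


if __name__ == "__main__":
    ua = "Mozilla/5.0 (compatible; NewsRoom.BI/0.1; +http://www.newsroom.bi/bot.html)"
    print(marfeel_bot_token(ua))
```

User agent strings are trivial to spoof, so a match here is best combined with an IP or DNS based check before traffic is trusted.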
If you are using Cloudflare as your CDN provider, you can whitelist Marfeel crawlers’ IPs following these steps:
- In your Cloudflare console, click the firewall icon on the Tools tab.
- List Marfeel’s crawler IP addresses under the IP Access Rules:
a. Enter the IP address
b. Choose Whitelist as the action to apply
c. Choose the website where the whitelisting rules should apply
- Click add
- Repeat for each IP
All Marfeel crawler IP addresses offer a reverse DNS lookup pointing to crawler.marfeel.com.
You can use it to verify the authenticity of Marfeel bots by following these steps:
- Run a reverse DNS lookup on the accessing IP address from your logs, using the host command.
- Verify that the domain name is crawler.marfeel.com.
- Run a forward DNS lookup on the domain name retrieved in step 1 using the host command.
- Verify that the returned address is the same as the original accessing IP address from your logs.
$ host 18.104.22.168
22.214.171.124.in-addr.arpa domain name pointer crawler.marfeel.com.
$ host crawler.marfeel.com
crawler.marfeel.com is an alias for vampiresquid.het.mrf.io.
vampiresquid.het.mrf.io has address 126.96.36.199
vampiresquid.het.mrf.io has address 188.8.131.52
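The same reverse-then-forward check can be automated. The following is a minimal sketch using Python's standard `socket` module, assuming IPv4 addresses and the crawler.marfeel.com hostname shown in the output above; `is_marfeel_crawler` is a hypothetical helper, and the two lookup parameters exist only so the function can be exercised without network access.

```python
import socket

EXPECTED_HOST = "crawler.marfeel.com"


def is_marfeel_crawler(ip, reverse_lookup=None, forward_lookup=None):
    """Verify an IPv4 address with a reverse DNS lookup followed by a forward one."""
    reverse_lookup = reverse_lookup or socket.gethostbyaddr
    forward_lookup = forward_lookup or socket.gethostbyname_ex

    # Step 1: reverse lookup of the accessing IP address.
    try:
        hostname, _aliases, _addresses = reverse_lookup(ip)
    except OSError:
        return False

    # Step 2: the PTR record must name the expected host.
    if hostname.rstrip(".").lower() != EXPECTED_HOST:
        return False

    # Step 3: forward lookup of the name returned in step 1.
    try:
        _name, _aliases, addresses = forward_lookup(hostname)
    except OSError:
        return False

    # Step 4: the original IP address must be among the resolved addresses.
    return ip in addresses


if __name__ == "__main__":
    import sys

    for candidate in sys.argv[1:]:
        print(candidate, is_marfeel_crawler(candidate))
```

Both lookups matter: the reverse record alone can be set by whoever controls the IP range, so the forward lookup is what confirms that the hostname really resolves back to the same address.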