Interestingly, they rotate user agents.
A few days ago they were identifying as "FaceBot" (not "FacebookBot").
When that began to be blocked, they switched to reusing the "Facebookexternalhit" user agent they also use for redirects, one that people are less likely to block.
That's not interesting. That's borderline fraud. Once someone gets a huge bill they'll have standing to sue, and should.
What's "borderline fraud"? Not fraud?
Meta as a company always acts like the most unprofessional idiots when they feel under pressure. They have all the time and resources to do it right, and they never do.
Isn't this the crawler that generates link previews on Facebook? They have others for training and AI stuff. Oh well, one never knows with Meta...
—— The facebookexternalhit/1.1 user agent you're seeing in the logs is a Facebook crawler, specifically used by Facebook’s servers to fetch content (like Open Graph metadata) when:
Someone shares a link on Facebook or Messenger
Facebook needs to generate a preview (title, image, description) for that URL
Genuine question: how do they know the bot is from Facebook, apart from what's written in the user agent?
They're cropped off to the side, but I assume the IPs making those requests are in a block owned by Facebook.
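For what it's worth, the "is this IP in a block owned by Facebook" check can be sketched in a few lines of Python with the standard `ipaddress` module. This is only a rough illustration: the name `ASSUMED_CRAWLER_NETWORKS` is made up, and the prefixes in it are documentation-reserved placeholder ranges, not Meta's real ones. You would have to substitute whatever prefixes Meta currently announces.

```python
import ipaddress

# PLACEHOLDER prefixes taken from the documentation-reserved ranges.
# These are NOT Meta's real networks; replace them with the prefixes
# you have verified yourself.
ASSUMED_CRAWLER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_from_assumed_networks(remote_addr, networks=ASSUMED_CRAWLER_NETWORKS):
    """Return True if remote_addr is a valid IP inside one of the networks."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Not a parseable IP address (e.g. a garbled log field).
        return False
    # Compare only against networks of the same IP version.
    return any(addr.version == net.version and addr in net for net in networks)


if __name__ == "__main__":
    for ip in ("192.0.2.17", "198.51.100.1", "2001:db8::1", "not-an-ip"):
        print(ip, is_from_assumed_networks(ip))
```

A request whose user agent claims to be facebookexternalhit but whose source IP falls outside the verified ranges is just someone borrowing the name, since the user agent string is trivially spoofable.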
Would a 404 or a 403 be more appropriate? What if you just want Meta crawlers to go away forever?
Better yet… 402
https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/...
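To make the 403 / 404 / 402 choice concrete, here is a minimal sketch of a WSGI app that answers matching user agents with whichever status you pick. The names `BLOCKED_UA_TOKENS`, `pick_status` and `app` are made up for illustration, and matching on the user agent string alone is an assumption that only holds while the crawler keeps sending one of those strings. Roughly: 403 says "understood, but refused", 404 pretends the page does not exist, and 402 is reserved and nonstandard, so clients are not required to do anything in particular with it.

```python
from http import HTTPStatus

# User agent substrings mentioned in this thread, lowercased (illustrative list).
BLOCKED_UA_TOKENS = ("facebookexternalhit", "facebot")


def pick_status(user_agent, blocked_status=HTTPStatus.FORBIDDEN):
    """Return blocked_status for matching user agents, 200 OK otherwise."""
    ua = (user_agent or "").lower()
    if any(token in ua for token in BLOCKED_UA_TOKENS):
        return blocked_status
    return HTTPStatus.OK


def app(environ, start_response):
    """Minimal WSGI app: reply with the chosen status and its reason phrase."""
    status = pick_status(environ.get("HTTP_USER_AGENT"))
    body = status.phrase.encode("utf-8")
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response(f"{status.value} {status.phrase}", headers)
    return [body]


if __name__ == "__main__":
    from wsgiref.simple_server import make_server

    # Serve on localhost:8000 for a quick manual test with curl -A.
    with make_server("127.0.0.1", 8000, app) as server:
        server.serve_forever()
```

Swapping the default to `HTTPStatus.PAYMENT_REQUIRED` or `HTTPStatus.NOT_FOUND` is a one-argument change. Whether any given crawler actually backs off on one of these codes is something you would have to observe in your own logs.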