The Facebook crawler is hammering the internet

26 points by jedisct1 3 days ago

jedisct1 3 days ago

Interestingly, they rotate user agents.

A few days ago they were identifying as “FaceBot” (not "FacebookBot").

When sites began blocking that, they switched to reusing the “Facebookexternalhit” user agent they also use for redirects, one that people are less likely to block.

dmitrygr 3 days ago

That’s not interesting. That’s borderline fraud. Once someone gets a huge bill they’ll have standing to sue, and should.
- philipallstar 3 days ago
  
  What's "borderline fraud"? Not fraud?

42lux 3 days ago

Meta as a company always acts like the most unprofessional idiots when they feel under pressure. They have all the time and resources to do it right and they never do.

emot 3 days ago

Isn't this crawler used to generate previews on Facebook? They have others for training and AI stuff. Oh well, one never knows with Meta...

—— The facebookexternalhit/1.1 user agent you're seeing in the logs is a Facebook crawler, specifically used by Facebook’s servers to fetch content (like Open Graph metadata) when:

- Someone shares a link on Facebook or Messenger
- Facebook needs to generate a preview (title, image, description) for that URL

N19PEDL2 3 days ago

Genuine question: how do they know the bot is from Facebook, apart from what's written in the user agent?

extraduder_ire 3 days ago

They're cropped off to the side, but I assume the IPs making those requests are in a block owned by Facebook.
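
A rough sketch of that kind of IP check, in Python. The ranges below are reserved documentation blocks used purely as placeholders, not Facebook's actual ranges, and `ip_in_ranges` is a hypothetical helper name. The real list would have to come from whatever the crawler operator publishes.

```python
import ipaddress

# ASSUMPTION: placeholder networks (reserved documentation ranges),
# NOT Facebook's real blocks. Substitute the operator's published ranges.
ASSUMED_CRAWLER_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def ip_in_ranges(remote_addr, ranges=ASSUMED_CRAWLER_RANGES):
    """Return True if remote_addr falls inside any of the given networks."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Not a parseable IPv4/IPv6 address, so it cannot match.
        return False
    # Membership is False when address and network versions differ.
    return any(addr in net for net in ranges)
```

A request whose user agent claims to be a Facebook crawler but whose address falls outside the published ranges is probably spoofed, since anyone can send any user agent string.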

bediger4000 3 days ago

Would a 404 or a 403 be more appropriate? What if you just want Meta crawlers to go away forever?

mdhb 3 days ago

Better yet… 402
https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/...
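
For anyone who wants to try the status codes discussed above, here is a minimal Python sketch of picking a response by user agent. The token list only contains the names mentioned in this thread, and `status_for` is a hypothetical helper, not part of any framework. Note that 403 says the server refuses the request, 404 says the resource was not found, and 402 is reserved for future use in the HTTP spec; none of them guarantees a crawler will actually stop.

```python
from http import HTTPStatus

# ASSUMPTION: tokens taken from the user agents named in this thread.
# Matching is a case-insensitive substring check.
BLOCKED_TOKENS = ("facebot", "facebookbot", "facebookexternalhit")


def status_for(user_agent, refusal=HTTPStatus.FORBIDDEN):
    """Return the refusal status for matching crawlers, 200 OK otherwise."""
    ua = (user_agent or "").lower()
    if any(token in ua for token in BLOCKED_TOKENS):
        return refusal
    return HTTPStatus.OK
```

Passing `refusal=HTTPStatus.PAYMENT_REQUIRED` gives the 402 variant. Keep in mind that refusing `facebookexternalhit` would presumably also break link previews when someone shares your pages.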