[SOLVED] 403 from requests to Stack Overflow?

I’m pulling a Stack Overflow RSS feed fine locally, but I get a 403 with the same code running in my Glitch project. Any ideas?

Example URL: https://stackoverflow.com/feeds/tag?sort=newest&tagnames=ipfs

Hey @autonome, I’m able to curl that URL from the console of a Glitch project without any difficulty; can you share some more details, or your project’s name?

Could the project be on a banned AWS IP/host? (Yes, Glitch runs on AWS.) It’s more likely a problem with that specific project, though, as I’m able to run curl "https://stackoverflow.com/feeds/tag?sort=newest&tagnames=ipfs" in the console (like @cori) and receive the correct data. It could even just be how you’re sending the request.

Thanks all! Yeah it feels like a banned IP.

Here’s a code example:
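(Simplified; roughly what the request looks like, sketched with Node’s built-in https module.)

```js
// Simplified reproduction: fetch the feed with Node's built-in https module.
const https = require('https');

const url = 'https://stackoverflow.com/feeds/tag?sort=newest&tagnames=ipfs';

https.get(url, (res) => {
  console.log('status:', res.statusCode); // 200 locally, 403 on Glitch

  let body = '';
  res.on('data', (chunk) => { body += chunk; });
  res.on('end', () => console.log(body.slice(0, 300)));
}).on('error', (err) => console.error(err));
```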

Ok, this is interesting…

For the same URL:

  1. In my browser, I get a download of the XML file for the feed
  2. Locally, my node.js code gets the XML of the feed
  3. On Glitch, the same node.js code gets a 403 FORBIDDEN
  4. In the Glitch console, curl gets a 404 HTML page

SOLVED!

All I needed to get the correct response was to set a User-Agent header.

It can be anything: I put “fibblebonkers” and it worked fine. With no user agent, you get a 403.
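For reference, the fix is just one extra header (a sketch, again with Node’s built-in https module; the User-Agent value is arbitrary):

```js
const https = require('https');

const options = {
  hostname: 'stackoverflow.com',
  path: '/feeds/tag?sort=newest&tagnames=ipfs',
  // Any non-empty value works here; omitting the header entirely gets a 403.
  headers: { 'User-Agent': 'fibblebonkers' },
};

https.get(options, (res) => {
  console.log('status:', res.statusCode); // 200 once the header is set
  res.pipe(process.stdout);               // the XML feed body
}).on('error', (err) => console.error(err));
```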


I do hope you keep “fibblebonkers” as your User Agent. :joy::joy::joy:

Yeah, a user-agent of significant length is needed. I answered a similar question over on Meta.SE here. It’s worth mentioning that SE has extra tips on what they expect crawlers to do, which Jeff Atwood wrote up here; basically (see the sketch after the list):

  • Use GZIP requests.
  • Identify yourself.
  • Use the right formats.
  • Be considerate.
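Putting those tips together, a polite request could look something like this (a sketch; the User-Agent string and contact URL are placeholders, not anything SE prescribes):

```js
const https = require('https');
const zlib = require('zlib');

const options = {
  hostname: 'stackoverflow.com',
  path: '/feeds/tag?sort=newest&tagnames=ipfs',
  headers: {
    // Identify yourself: a descriptive User-Agent with a way to contact you.
    'User-Agent': 'my-feed-reader/1.0 (+https://example.com/contact)',
    // Use GZIP requests: ask for a compressed response to save bandwidth.
    'Accept-Encoding': 'gzip',
  },
};

https.get(options, (res) => {
  // Decompress only if the server actually responded with gzip.
  const stream = res.headers['content-encoding'] === 'gzip'
    ? res.pipe(zlib.createGunzip())
    : res;
  stream.pipe(process.stdout); // the XML feed body
}).on('error', (err) => console.error(err));
```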