I’m pulling a StackOverflow RSS feed fine locally but I get 403 with the same code running in my Glitch. Any ideas?
Hey @autonome, I seem to be able to curl that URL from the console of a Glitch project without any difficulty. Can you share some more details or your project’s name?
The project could potentially be on a banned AWS IP/host (yes, Glitch runs on AWS). It’s likely a problem with that project, as I’m able to run curl https://stackoverflow.com/feeds/tag?sort=newest&tagnames=ipfs in the console (like @cori) and receive correct data. It could even just be how you’re sending the request.
Thanks all! Yeah it feels like a banned IP.
Here’s a code example:
Ok, this is interesting…
For the same URL:
- In my browser I get a download of the XML file for the feed
- Locally in my node.js code I get the XML of the feed
- In Glitch running my node.js code I get 403 FORBIDDEN
- In Glitch console using curl I get a 404 html page
All I needed to get the correct response was to set a user agent header.
It could be anything. I put “fibblebonkers” and it worked fine. No user agent, get a 403.
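For anyone landing here later, this is roughly what the fix looks like in Node.js. It is only a sketch using the built-in https module, not the original code from the project. The helper names buildFeedRequestOptions and fetchFeed are made up for illustration; the URL is the one from earlier in the thread.

```javascript
const https = require("https");

// Feed URL quoted earlier in the thread.
const FEED_URL = "https://stackoverflow.com/feeds/tag?sort=newest&tagnames=ipfs";

// Hypothetical helper: builds https.get options that always carry a User-Agent.
function buildFeedRequestOptions(feedUrl, userAgent) {
  if (!userAgent) {
    // Per the thread, a request without a User-Agent came back as 403.
    throw new Error("A non-empty User-Agent is required");
  }
  const { hostname, pathname, search } = new URL(feedUrl);
  return {
    hostname,
    path: pathname + search,
    headers: { "User-Agent": userAgent },
  };
}

// Hypothetical helper: resolves with the status code and the raw response body.
function fetchFeed(feedUrl, userAgent) {
  return new Promise((resolve, reject) => {
    https
      .get(buildFeedRequestOptions(feedUrl, userAgent), (res) => {
        let body = "";
        res.setEncoding("utf8");
        res.on("data", (chunk) => (body += chunk));
        res.on("end", () => resolve({ statusCode: res.statusCode, body }));
      })
      .on("error", reject);
  });
}

// Example call (not executed here):
// fetchFeed(FEED_URL, "fibblebonkers").then(({ statusCode }) => console.log(statusCode));
```

The options builder is split out so the header can be checked without touching the network.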
I do hope you keep “fibblebonkers” as your User Agent.
Yeah, a user-agent that has a significant length is needed. I answered a similar question over on Meta.SE here. It is worth mentioning that SE has extra tips on what they expect crawlers to do, as Jeff Atwood wrote here. Basically:
- Use GZIP requests.
- Identify yourself.
- Use the right formats.
- Be considerate.
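The first two tips in the list above could be sketched in Node.js like this. It covers only "use GZIP" and "identify yourself"; the function names, the app name, and the "name (+contact URL)" User-Agent layout are my own assumptions (the latter is a common crawler convention), not something SE prescribes in this thread.

```javascript
const zlib = require("zlib");

// Hypothetical helper: headers that ask for gzip and say who is calling.
// The "(+url)" suffix is an assumed convention, not an SE requirement.
function politeHeaders(appName, contactUrl) {
  return {
    "User-Agent": `${appName} (+${contactUrl})`,
    "Accept-Encoding": "gzip",
  };
}

// Hypothetical helper: decodes a fully buffered response body.
// contentEncoding is the value of the response's Content-Encoding header.
function decodeBody(buffer, contentEncoding) {
  return contentEncoding === "gzip"
    ? zlib.gunzipSync(buffer).toString("utf8")
    : buffer.toString("utf8");
}
```

Note that Node's https module does not decompress responses on its own, so a request that sends Accept-Encoding: gzip has to handle a gzipped body itself, as decodeBody does here.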