How would I go about getting blog / tumblr posts via Node.js?

I’m building a little bot that pulls in all the text posts from a Tumblr blog, picks one at random, and returns it to the chatbot that called the command.

I’ve got the chatbot all sorted, and can return whatever text I want from the Node.js app, but as I’m still quite new to this kind of programming, I am well and truly flummoxed as to how I would get all the text posts from a Tumblr blog. I’ve tried looking into it in many different ways, but there’s so much I really don’t understand about how to do it.

  • Tumblr keeps talking about its API and how you need to register an entire application with them to get OAuth keys, but I’m only trying to read posts, not post or modify anything, and the blog is public
  • I found you can put /api/read at the end of the Tumblr URL to get an XML document containing the latest 20 posts (and you can add ?start=20 to the end to offset by 20 posts and get the next 20, etc.), but I have no idea how to fetch or parse XML from a link like that, let alone whether it’s viable to do that several times, once per offset, PER command
  • various confusing bits of info about JavaScript not being able to fetch things from another domain because of the same-origin policy, unless you use JSONP (I don’t even fully understand what the difference between JavaScript, JSON and JSONP is X_X )
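The paging idea in the second bullet can be sketched as a small URL builder. This assumes the old v1 read API treats `start` as an offset counted in posts (so the second page of 20 is `start=20`) and accepts a `num` page size capped at 50; `buildReadUrl` and `pageOffsets` are hypothetical helper names, not part of any library.

```javascript
// Hypothetical helpers for paging through Tumblr's v1 /api/read endpoint.
// Assumption: `start` is a post offset and `num` is posts per page (max 50).
const MAX_PAGE_SIZE = 50;

function buildReadUrl(blogName, start = 0, num = 20) {
  const url = new URL(`https://${blogName}.tumblr.com/api/read`);
  url.searchParams.set("start", String(start));
  url.searchParams.set("num", String(Math.min(num, MAX_PAGE_SIZE)));
  return url.toString();
}

// Offsets needed to cover `total` posts, e.g. total 45 with pages of 20 -> [0, 20, 40].
function pageOffsets(total, pageSize = 20) {
  const size = Math.min(pageSize, MAX_PAGE_SIZE);
  const offsets = [];
  for (let start = 0; start < total; start += size) {
    offsets.push(start);
  }
  return offsets;
}
```

For example, `buildReadUrl("bee-facts-official", 20)` gives `https://bee-facts-official.tumblr.com/api/read?start=20&num=20`. Fetching every page once and reusing the result is much cheaper than paging through the blog on each command.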

Any help or guidance AT ALL would be SUPER appreciated!! <3 Sorry for what I presume to be very noob-y questions too.

Hi HedgeWizardly, welcome to the forum!

The part of your post that I can help with is fetching XML from Tumblr.

Here’s what I’ve done:

  • Started a new glitch-mvp-node project
  • Added the ‘node-fetch’ and ‘fast-xml-parser’ packages
  • Made the default route fetch a Tumblr URL (https://jennschiffer.tumblr.com/api/read)
  • Parsed the XML into JSON data using fast-xml-parser
  • Sent it to the frontend

This is just a demo, so it renders a web page, whereas you’re making a bot, but you can adapt it as necessary.

Here’s the key piece of code:

```javascript
// Our main GET home page route, pulls from src/pages/index.hbs
fastify.get("/", function(request, reply) {
  const url = "https://jennschiffer.tumblr.com/api/read";
  fetch(url)
    .then(res => res.text())
    .then(xmlData => parser.parse(xmlData))
    .then(json => {
      //console.log(json.tumblr.posts.post);
      // I have just stringified these so I can display them in the HTML:
      let posts = json.tumblr.posts.post.map(p => JSON.stringify(p, null, '\t'));
      // We have to respond inside the promise handler (.then())
      // so that the data is available at the right time
      let params = {
        posts: posts
      };
      reply.view("/src/pages/index.hbs", params);
    })
    // If the fetch or the parsing fails, log it and answer with an error
    // instead of leaving the request hanging
    .catch(err => {
      console.error(err);
      reply.code(500).send("Could not load posts from Tumblr");
    });
});
```

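Remember the require statements at the top of the file:

```javascript
// node-fetch v2 supports require(); v3 is ESM-only and needs import instead
const fetch = require("node-fetch");
// fast-xml-parser v3 exposes parser.parse(); v4+ uses new XMLParser().parse() instead
const parser = require("fast-xml-parser");
```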
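The same fetch-then-parse step can also be written with async/await plus an explicit HTTP status check. This is a hedged sketch: it assumes Node 18+, where `fetch` is built in (so node-fetch becomes optional), and it takes the XML parse function as a parameter so it works with fast-xml-parser v3’s `parser.parse`. `fetchTumblrJson` and `postsFrom` are hypothetical names. `postsFrom` also guards against a fast-xml-parser quirk: when there is exactly one `<post>`, you get an object rather than an array.

```javascript
// Hypothetical helpers, sketched under these assumptions:
//  - Node 18+, where fetch() is a built-in global (no node-fetch needed)
//  - `parse` is an XML-to-object function such as fast-xml-parser v3's parser.parse

// fast-xml-parser returns an object (not an array) when there is exactly one
// <post>, so normalise to an array before calling .filter()/.map().
function postsFrom(json) {
  const post = json?.tumblr?.posts?.post;
  if (post == null) return [];
  return Array.isArray(post) ? post : [post];
}

async function fetchTumblrJson(url, parse) {
  const res = await fetch(url);
  if (!res.ok) {
    throw new Error(`Tumblr request failed with HTTP ${res.status}`);
  }
  const xmlData = await res.text();
  return parse(xmlData);
}

// Example usage (not run here):
// fetchTumblrJson("https://bee-facts-official.tumblr.com/api/read", xml => parser.parse(xml))
//   .then(json => console.log(postsFrom(json).length))
//   .catch(err => console.error(err));
```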

…And here is the project: https://glitch.com/edit/#!/tumblr-xml-api-json

I would personally pull down the XML in one shot, store it somewhere, perhaps by saving the JSON as a file in your .data directory, and then re-load it from there, so that you aren’t making lots of slow network calls to the Tumblr API.
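A minimal sketch of that caching idea using Node’s built-in fs module. The file path (for example `.data/posts.json`), the `saveCache`/`loadCache` names and the maximum-age check are assumptions for illustration. This version stores the already-parsed posts as JSON rather than the raw XML.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: write the posts plus a timestamp to a JSON file,
// creating the directory (e.g. .data) if it doesn't exist yet.
function saveCache(file, posts) {
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.writeFileSync(file, JSON.stringify({ savedAt: Date.now(), posts }));
}

// Hypothetical helper: return the cached posts, or null if the file is
// missing/unreadable or older than maxAgeMs (the caller should then re-fetch).
function loadCache(file, maxAgeMs) {
  let cached;
  try {
    cached = JSON.parse(fs.readFileSync(file, "utf8"));
  } catch (err) {
    return null;
  }
  if (Date.now() - cached.savedAt > maxAgeMs) return null;
  return cached.posts;
}
```

On each command, the bot would try something like `loadCache(".data/posts.json", 24 * 60 * 60 * 1000)` first and only call Tumblr when it returns null. One day is an arbitrary example; pick whatever matches how often the blog updates.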

In your bot, instead of building params and sending them to the frontend, you could build up your bot response and send that. It has to happen inside the chain of .then() calls, because that is where you have access to the data that came back from the Tumblr API.

As for JSON/JSONP and CORS, those generally affect calls made from a browser to an API on another domain; your bot code runs on the server in Node.js, so you should be OK to ignore that stuff.
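Since the original question also asked about the difference: JSON is a text format for data, while a JavaScript object is the in-memory value your code works with. (JSONP was an older browser workaround that wrapped JSON in a function call loaded through a script tag to get around the same-origin policy, which a Node.js bot doesn’t need.) A minimal illustration of the JSON part, using only built-in functions and made-up sample text:

```javascript
// JSON is just text that describes data...
const text = '{"posts": ["first post", "second post"]}';

// ...JSON.parse turns that text into an ordinary JavaScript object...
const data = JSON.parse(text);
console.log(typeof text);   // string
console.log(typeof data);   // object
console.log(data.posts[1]); // second post

// ...and JSON.stringify turns it back into text (tab-indented, as in the demo code).
const again = JSON.stringify(data, null, "\t");
```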

Hope this helps a bit!! Come back with questions and I’m sure the community will try to help :slight_smile:

Thank you so much, SteGriff!! This is so kind of you and so so much help!

I’m a little ashamed to admit that after looking at and poking at things for a number of hours, and trying to figure out how to do what you suggested, I’m still stumped as to how to access the posts from that point, let alone how to save them into a .data directory.

Sorry to continue to be a pain! Also, I’m not 100% familiar with how this stuff works… does the whole server.js run when an external source tries to access it? I’ve done a couple of basic text-response Twitch bots on Glitch, but this is way beyond that, so far.

Here is my project: Glitch :・゚✧

No shame in that!

So, what’s happened is that you’ve grabbed a bit too much of the piece of code I mentioned; around line 53:

```javascript
client.on('message', (channel, tags, message, self) => {
  if(self) return;
  if(message.toLowerCase().startsWith('!beefact')) {

    // Our main GET home page route, pulls from src/pages/index.hbs
    fastify.get("/", function(request, reply) {
      const url = "https://bee-facts-official.tumblr.com/api/read";
      fetch(url)
        .then(res => res.text())
        .then(xmlData => parser.parse(xmlData))
        .then(json => {
          //console.log(json.tumblr.posts.post);
          let posts = json.tumblr.posts.post.map(p => JSON.stringify(p, null, '\t'));
          // We have to respond inside the promise handler (.then())
          // so that the data is available at the right time
          console.log('testing')
          let params = {
            posts: posts
          };
          //console.log(channel, typeof reply);
          //reply.view("/src/pages/index.hbs", params);
        })
```

The fastify.get bit isn’t relevant to you. That line of code says

“Hey fastify, when someone makes a call to root (/), do this…”

but you’re already inside a block of code that says

“Hey bot client, when someone sends a message, do this”

Anyway, long story short, get rid of the fastify bit, and simplify down to this:

```javascript
if(message.toLowerCase().startsWith('!beefact')) {
  const url = "https://bee-facts-official.tumblr.com/api/read";
  fetch(url)
    .then(res => res.text())
    .then(xmlData => parser.parse(xmlData))
    .then(json => {
      //console.log(json.tumblr.posts.post);
      let posts = json.tumblr.posts.post.map(p => JSON.stringify(p, null, '\t'));
      // We have to respond inside the promise handler (.then())
      // so that the data is available at the right time
      console.log('testing');
      client.say(channel, posts);
    })
    // Log any failure from the fetch, the parse or the reply
    .catch(err => console.error(err));
}
```

This should dump out an ugly JSON string of all the posts into the Twitch reply.

Forget about caching the XML to file to start with. An engineer’s trick is to get the simplest thing working, then add extensions and features later :slight_smile:

A quick follow up after looking at your source data:

Add the he package (“HTML entities”) to package.json and use:

```javascript
const he = require('he');
```

Then to get the text of your facts:

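```javascript
let posts = json.tumblr.posts.post
  .filter(p => p["regular-body"] != null)   // keep only posts that have a text body
  .map(p => he.decode(p["regular-body"]));  // decode HTML entities such as &lt;p&gt;
```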
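The filter/map above still leaves HTML markup such as `<p>` tags in each fact. Here is a hedged sketch of a helper that also strips tags for chat. `textPostsFrom` is a hypothetical name, the `decode` parameter stands in for `he.decode`, and the regex-based tag stripping is a naive assumption: it is fine for simple post bodies but is not a real HTML parser.

```javascript
// Hypothetical helper: pull plain text out of Tumblr "regular" posts.
// `posts` is the parsed array (json.tumblr.posts.post); `decode` would be
// he.decode in the bot, but any string -> string function works.
function textPostsFrom(posts, decode) {
  return posts
    .filter(p => p["regular-body"] != null)
    .map(p => decode(String(p["regular-body"])))
    .map(html => html
      .replace(/<br\s*\/?>/gi, " ") // line breaks -> spaces
      .replace(/<[^>]*>/g, "")      // naive tag strip
      .replace(/\s+/g, " ")         // collapse runs of whitespace
      .trim())
    .filter(text => text.length > 0);
}
```

In the bot, this would be called as `textPostsFrom(json.tumblr.posts.post, he.decode)` inside the `.then()` that receives the parsed JSON.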

Updated project to show the output:
https://tumblr-xml-api-json.glitch.me/

Then your remaining challenge is picking one at random, which has been covered before on this forum: Need help with discordjs bot - #2 by 123HD123
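A minimal sketch of the usual `Math.random` approach (not necessarily what the linked thread does). `pickRandom` is a hypothetical name.

```javascript
// Pick one element uniformly at random; returns undefined for an empty array.
// Math.random() is in [0, 1), so the index is always in [0, items.length - 1].
function pickRandom(items) {
  if (items.length === 0) return undefined;
  return items[Math.floor(Math.random() * items.length)];
}
```

In the bot, that would be something like `client.say(channel, pickRandom(posts))` inside the `.then()`, after checking that at least one post came back.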

Thank you so so much, SteGriff! You’re a miracle and a scholar. I won’t be able to give this a try until after work, but it sounds like you’ve helped bring me out of the big scary confusing bit and back down into territory I’m more familiar with. I appreciate that so much! :slight_smile:

Will keep you updated on my success (or failure).

Cheers!

Sorry! What kind of object / format will I be looking at with the posts variable?

Using that client.say(channel, posts) (which I put inside a try/catch to debug) throws an error:
TypeError: message.startsWith is not a function

I’m coming from C# and Visual Basic, so I was half expecting an array or list of some sort, but can’t seem to figure it out, and as “typeof” just tells me it’s an object, I’ve been unable to figure out what to even google ^^’

(Thank you again for all this!)

Yes, it’s essentially an array of strings, where each string is some HTML.

Sounds like tmi.js expects the parameter to be a string (startsWith is a String method in JS), so pass one post instead of the whole collection.
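Because `typeof` reports `object` for arrays as well as plain objects, `Array.isArray` is the check to reach for. Below is a hedged sketch that turns whatever ends up in `posts` into one chat-sized string. The 500-character cap is an assumption based on Twitch’s usual chat message limit, and `toChatMessage` is a hypothetical helper.

```javascript
// typeof can't tell arrays from objects; Array.isArray can.
console.log(typeof ["a"]);         // object
console.log(Array.isArray(["a"])); // true

// Hypothetical helper: make sure client.say() receives a single string.
// Assumption: chat messages are capped at about 500 characters.
const MAX_CHAT_LENGTH = 500;

function toChatMessage(value, maxLength = MAX_CHAT_LENGTH) {
  // For an array, use its first element (e.g. one post); otherwise stringify.
  const text = Array.isArray(value) ? String(value[0] ?? "") : String(value);
  if (text.length <= maxLength) return text;
  return text.slice(0, maxLength - 1) + "…"; // truncate with an ellipsis
}
```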

console.log is also your friend :slightly_smiling_face:

I thought I’d tried treating it as an array before actually, and failed, but I tried again after what you said and it turns out I was trying to use foreach rather than forEach…!

But that’s given me a big console log output of EXACTLY what I need to get started on familiar territory, so thank you so much again, you’re a lifesaver! <3

Whew, I see the Glitch community is still second to none when it comes to accessibility. I wish I had known about this place a few years back, when I was struggling to get the hang of JS.

And @HedgeWizardly, most people are confused by one aspect of JS or another, but everyone is befuddled by Node in the beginning. It does get easier, so long as you keep asking questions.

Thank you very much, that’s super kind of you! :slight_smile:

Just while I’m still here… any idea why the he.decode function in SteGriff’s code might not be converting &rsquo; and &ldquo; into ’ and “ respectively?

I was going to do a simple string replace on those, but wanted to check whether there was a better way, one that makes sure no other HTML entities get missed!

EDIT: Probably not the most elegant solution but I just ran the decode a second time before I passed the string where it needed to go and that seems to work…! x)

Weird… could be a library bug

:grin::grin::grin:
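Rather than a library bug, a likely explanation is double encoding. The post body is HTML that already contains entities like &rsquo;, and that HTML is then escaped again to sit inside the XML, so &rsquo; arrives as &amp;rsquo;. fast-xml-parser v3 doesn’t decode entities by default, so a single he.decode only peels off the outer layer. If that is the cause, decoding until the text stops changing generalises the “decode twice” fix. This is a sketch only. `decodeFully` is a hypothetical name, and `decode` stands in for `he.decode`.

```javascript
// Hypothetical helper: apply `decode` (e.g. he.decode) repeatedly until the
// string stops changing, to unwrap text that was entity-encoded more than once.
// maxPasses guards against pathological input.
function decodeFully(text, decode, maxPasses = 5) {
  let current = text;
  for (let i = 0; i < maxPasses; i++) {
    const next = decode(current);
    if (next === current) break; // nothing left to decode
    current = next;
  }
  return current;
}
```

One trade-off: a post that deliberately shows a literal entity (for example the text “&amp;”) would also get decoded, so a fixed two passes, as in the edit above, is the more conservative choice.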

This topic was automatically closed 180 days after the last reply. New replies are no longer allowed.