Building a comment system for a static site, part 3


posted | about 11 minutes to read

tags: amazon web services comments hugo tutorial database design web development dynamodb

This post is part of a series:
  1. Building a comment system for a static site, part 1
  2. Building a comment system for a static site, part 2
  3. Building a comment system for a static site, part 3

Sorry for the wait on this one! Life got in the way for a while, and it also turned out that writing this was a bit more of a lift than I thought it was going to be. When we left off, we had a very basic implementation of a comment system. Now, we need to take that implementation and fill in the gaps.

Going back to the original “nice to haves” from the first post, the outstanding items (paraphrased a bit) were comment sanitization, anti-spam, and replies.

We’re going to save the replies thing for last because it’s probably the most difficult, in my opinion, and requires the most work. This kind of mirrors my own approach towards this stuff - I tend to try and hit the low hanging fruit in the nice to haves first after the core features are done. The lowest hanging fruit here is probably comment sanitization, just because it’s a solved problem; there are numerous Node packages out there to strip tags from comments. I used the xss package because it was really straightforward and flexible. Right after parsing the comment body in the POST Lambda, I added the following:

body.comment_text = xss(body.comment_text, {
  whiteList: {
    u: [],
    em: [],
    strong: [],
    i: [],
    b: [],
    pre: [],
    code: [],
    kbd: [],
    a: ['href', 'title']
  },
  stripIgnoreTag: true,
  stripIgnoreTagBody: ['script', 'style']
}).substring(0,1000).replace(/(\n\s*){2,}/g, '<br/><br/>');
body.author = xss(body.author, {whiteList: {}, stripIgnoreTagBody: true}).substring(0,20); // empty whitelist, so no tags survive in the author field

This does a few different things to help with comment formatting. First, it strips out all HTML tags not explicitly whitelisted - you’ll note I am mostly just allowing basic text formatting and links. In addition to removing any other tags, I’m also removing the content of anything enclosed in <script> or <style> tags, as anything that someone’s putting in tags like that is likely to be malicious on some level. Second, I’m limiting comment length to 1000 characters to prevent spam with excessively long comments, and replacing any run of two or more line breaks with two <br/> tags to eliminate excessive whitespace. I’m also removing any tags from the author field and capping it at 20 characters. This gives us safe, reasonably-formatted comments. Pretty simple, right?
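
To see the length cap and the whitespace handling in isolation, here is a minimal sketch of just those two steps with the xss call left out. The formatCommentText name and the maxLength parameter are my own for illustration and aren't part of the Lambda above.

```javascript
// Hypothetical helper: mirrors only the truncation and blank-line handling
// described above. Tag stripping via the xss package is deliberately left out.
function formatCommentText(text, maxLength) {
  return text
    .substring(0, maxLength)                  // cap the length first
    .replace(/(\n\s*){2,}/g, '<br/><br/>');   // two or more line breaks collapse into one paragraph break
}

console.log(formatCommentText('first\n\n\n\nsecond', 1000)); // first<br/><br/>second
```

Note that a single line break is left untouched, so it won't show up as a break once the comment is rendered as HTML.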

Next up, let’s take a look at anti-spam. It’s fairly simple to integrate something like Akismet or CleanTalk if you want to use a managed service, but I decided to take a different route and simply make all commenters manually verify that they were humans by having them click a confirmation link sent to their email. I do this by leveraging Amazon SES, which is fairly easy to use, especially when you’re already using the AWS SDK as we are in this project. That said, this is probably the part where I’d say we’re going to leave the “shallow end” of this exercise, so I’d definitely recommend taking a few deep breaths before jumping into the rest of this post.

First step here is to jump into the AWS console and set up your domain in SES. This will require you to add some entries to your DNS zone, and you’ll also want to make sure that SES is added to your domain’s SPF record (Amazon provides instructions on how to do this during setup). You’ll also need to add the ses:SendEmail action for your domain to your POST Lambda execution role policy in IAM (easiest way is probably with Terraform, as referenced in my previous post). Once this is done, you can jump back to your code.

After you’ve added an email field to your comment submission form (this should be trivial), we’ll start out by modifying our POST function. First step, let’s install the uuid package and then include its v4 generator using const uuidv4 = require('uuid/v4'); (newer releases of uuid drop this deep import in favor of const { v4: uuidv4 } = require('uuid');). We’ll use this to generate an approval_uuid field that we’ll include as part of the write to the database, e.g.:

var approval_uuid = uuidv4();
var ts = Date.now();
var params = {
  TableName: 'comments',
  Item: {
    'author': body.author,
    'text': body.comment_text,
    'ts': ts,
    'post_uid': body.uid,
    'sortKey': ts,
    'approval_uuid': approval_uuid
  }
};

Then, we’ll actually need to send the email. I wrote a separate function for this:

function sendEmail(approval_uuid, email, done) {
  var ses = new AWS.SES();
  var params = {
    Destination: {
      ToAddresses: [
        email
      ]
    },
    Message: {
      Body: {
        Text: {
          Data: 'Your comment on <Domain> requires email verification before appearing on the website. Use the following link: https://<API Gateway URL>/approve?uid=' + approval_uuid,
          Charset: 'UTF-8'
        }
      },
      Subject: {
        Data: 'Approve Comment - <Domain>',
        Charset: 'UTF-8'
      }
    },
    Source: from // verified SES sender address; assumed to be defined elsewhere in the Lambda
  };
  ses.sendEmail(params, done);
}

This should be pretty easy to read. All we do is set the properties of the email, send it to the address that was sent in via POST, and include a URL with the approval UUID. The neat thing here, of course, is that we don’t even have to store the email address anywhere because it doesn’t matter; as long as that link gets clicked, we’re golden (once we write the approval processing function).
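
If you'd like to check the message contents without actually calling SES, one option is to pull the parameter construction out into its own function. This is only a sketch: buildApprovalEmail, the approveUrl argument, and the example addresses are placeholders of mine, not names from the repository.

```javascript
// Hypothetical helper: builds the params object that ses.sendEmail() expects,
// without sending anything.
function buildApprovalEmail(approvalUuid, toAddress, fromAddress, approveUrl) {
  // encodeURIComponent leaves a v4 UUID unchanged, but keeps the link valid
  // if the token format ever changes.
  const link = approveUrl + '?uid=' + encodeURIComponent(approvalUuid);
  return {
    Destination: { ToAddresses: [toAddress] },
    Message: {
      Body: {
        Text: {
          Data: 'Your comment requires email verification before appearing on the website. Use the following link: ' + link,
          Charset: 'UTF-8'
        }
      },
      Subject: { Data: 'Approve Comment', Charset: 'UTF-8' }
    },
    Source: fromAddress
  };
}

// Example with placeholder values
const emailParams = buildApprovalEmail(
  '123e4567-e89b-42d3-a456-426614174000',
  'commenter@example.com',
  'comments@example.com',
  'https://api.example.com/approve'
);
console.log(emailParams.Message.Body.Text.Data);
```

The returned object can then be handed to ses.sendEmail(params, done) exactly as in the function above.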

Next step - we only want to display the comments that don’t have an approval_uuid. This will need to be done in the GET function, so let’s head over there. First, we’ll need to add an async-compatible foreach function (credit to Atinux for this one!):

async function asyncForEach(array, callback) {
  for (let index = 0; index < array.length; index++) {
    await callback(array[index], index, array);
  }
}

Then, we’ll use this after we get our comments back from Dynamo, e.g.:

// ...
var comments = await retrieveComments(params);
var approvedComments = []; // collects only the comments that have been approved
await asyncForEach(comments, async (comment) => {
  if (typeof comment.approval_uuid === "undefined") { //If the approval_uuid is not present, then the comment has been approved.
    approvedComments.push(comment);
  }
});
var response = {
  "isBase64Encoded": false,
  "headers": {"Content-Type": "application/json", "Access-Control-Allow-Origin": origin},
  "statusCode": 200,
  "body": JSON.stringify(approvedComments)
};
// ...
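
Since the approval check itself doesn't await anything, the same filtering could also be written with a plain Array.prototype.filter. Here's a small sketch of that alternative; the filterApproved name and the sample comments are made up for illustration.

```javascript
// A comment counts as approved once its approval_uuid attribute has been removed.
function filterApproved(comments) {
  return comments.filter(function (comment) {
    return typeof comment.approval_uuid === 'undefined';
  });
}

// Made-up sample data
const sampleComments = [
  { author: 'alice', text: 'Approved comment' },
  { author: 'bob', text: 'Still pending', approval_uuid: '123e4567-e89b-42d3-a456-426614174000' }
];
console.log(filterApproved(sampleComments).length); // 1
```

The asyncForEach version is still the better fit if you end up doing database calls per comment, which is where this series is heading.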

Last up, we need to figure out how to actually process an approval. We’ll need another Lambda function for this - mapped to that “/approve” path that I referenced above in the POST stuff. We’ll also need to make a minor change to our DynamoDB table to allow us to look up records based on the approval UUID, so let’s start there.

DynamoDB allows us to define what it calls “secondary indexes”. Essentially, even if we already have a partition key and sort key set on a table, we can define additional fields by which we can query it. By creating a new global secondary index that uses the approval_uuid field as its partition key, we’re able to look up a comment by its approval UUID in a new JavaScript function. Here’s the Terraform for the new secondary index, followed by the new JavaScript approval function:

resource "aws_dynamodb_table" "comments_table" {
  # ... (existing settings; approval_uuid also needs its own attribute block)
  global_secondary_index {
    name               = "approval_uuid-index"
    hash_key           = "approval_uuid"
    projection_type    = "INCLUDE"
    non_key_attributes = ["post_uid", "sortKey"]
  }
}

var AWS = require('aws-sdk');

exports.handler = function(event,context,callback) {
  var dynamoClient = new AWS.DynamoDB.DocumentClient()
  var params = {
    TableName: "comments",
    IndexName: "approval_uuid-index", //This is how we reference the index created in DynamoDB, letting us query based on the approval_uuid field
    KeyConditionExpression: "approval_uuid = :uuid",
    ExpressionAttributeValues: {
      ":uuid": event.queryStringParameters.uid
    }
  };

  dynamoClient.query(params, function(err, data) {
    if(err) {
      console.error('Query failed: ', JSON.stringify(err, null, 2));
      return callback(err); // Without this, the Lambda would just hang until it timed out
    }
    if(data.Items.length === 0) { // Unknown or already-used approval link
      return callback(null, {
        "isBase64Encoded": false,
        "headers": {"Content-Type": "text/html"},
        "statusCode": 404,
        "body": '<html><head><title>Comment Not Found - ajl.io</title></head><body>This approval link is invalid or has already been used.</body></html>'
      });
    }
    data.Items.forEach(function(comment) { // there's only one result - it's a UUID, and a temporary one besides - but a foreach is still the easiest way to write this
      var params = {
        TableName: 'comments',
        Key: {
          "post_uid": comment.post_uid,
          "sortKey": comment.sortKey
        },
        UpdateExpression: "remove approval_uuid", //Taking out the approval_uuid field means that it'll show up in future GET requests to retrieve comments
        ReturnValues: "ALL_NEW"
      };
      dynamoClient.update(params, function(err, data) {
        if (err) {
          console.error('Update failed: ', JSON.stringify(err, null, 2));
          callback(err);
        } else {
          var response = {
            "isBase64Encoded": false,
            "headers": {"Content-Type": "text/html"},
            "statusCode": 200,
            "body": '<html><head><title>Comment Approved - ajl.io</title></head><body>Approval successful! This window can now be closed.</body></html>'
          };
          callback(null, response); // Returns an HTML response so the person who clicked the link isn't left hanging
        }
      });
    });
  });
};

Now, simply give this function read/write permissions on your DynamoDB table, hook it up to API Gateway as described in the previous post, and you’re all set for comment approval! Simple, right?

Last - and probably most complex - it’s time to look at making the comments more of a “conversation” than just a series of posts. I thought about doing something with @ mentions (like Twitter or Slack) but ended up settling on parent-child comments. I figured with @ mentions, I’d have to store stuff like email addresses to provide notifications, and that’s something I really didn’t want to do - that, and I find threaded conversations easier to follow. Your mileage may vary. This lets us explore some flexibility with our DynamoDB stuff anyway - composite sort keys and reuse of fields - so it’ll be fun! Let’s get started.

To begin implementation here, we need to think about how we need to evolve our data a little bit to accommodate threading. Looking at a parent-child relationship, we’ll need to store either the parent post (for top-level comments) or the parent comment ID (for child comments) as a “parent” field. We’ll still partition the table by post ID, but now we have this additional data. We can just use this field for either parent post ID or parent comment ID, since it’s not like these IDs are sequential/numeric or anything. That does mean, though, that we actually need to start storing comment IDs since now they’ll matter. Since we already had the uuid package imported, I just generated a new one and inserted it as comment_id for each new comment. Then, we can return the comment ID as part of each parent comment, and reference that on child comments as part of the POST.

One thing we will have to do, though, is modify our sort key. Since now we have two different things to sort on, we can use something called “composite sort keys” to retrieve either parent or child comments. The way to do this: modify the sortKey field to store the parent field, the timestamp, and the comment ID together, e.g.:

if(typeof body.parent_comment === 'undefined' || body.parent_comment === null) {
  params.Item.sortKey = body.uid + '#' + ts + '#' + comment_id; // top-level comment: the parent is the post itself
} else {
  params.Item.sortKey = body.parent_comment + '#' + ts + '#' + comment_id; // reply: the parent is another comment
}

Then you can retrieve things using the begins_with function in your KeyConditionExpression: post_uid = :uid and begins_with(sortKey, :uid) for parent comments, or post_uid = :uid and begins_with(sortKey, :parent) for child comments. We can still sort by timestamp since that’s the second field in the sortKey.
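
To make the key layout concrete, here's a rough sketch of the two pieces as standalone functions. buildSortKey and buildQueryParams are names I've made up for this example, and the second placeholder is called :prefix rather than reusing :uid or :parent so that one function covers both cases.

```javascript
// Hypothetical helper: composite sort key of the form <parent>#<timestamp>#<comment id>.
// For top-level comments the parent is the post's own uid.
function buildSortKey(postUid, parentComment, ts, commentId) {
  const parent = (typeof parentComment === 'undefined' || parentComment === null)
    ? postUid
    : parentComment;
  return parent + '#' + ts + '#' + commentId;
}

// Hypothetical helper: DocumentClient query params for everything under one parent.
// Pass the post uid as parent for top-level comments, or a comment id for replies.
function buildQueryParams(postUid, parent) {
  return {
    TableName: 'comments',
    KeyConditionExpression: 'post_uid = :uid and begins_with(sortKey, :prefix)',
    ExpressionAttributeValues: {
      ':uid': postUid,
      ':prefix': parent + '#' // trailing '#' avoids matching a longer id that shares a prefix
    }
  };
}

console.log(buildSortKey('my-post', null, 1560000000000, 'c1')); // my-post#1560000000000#c1
console.log(buildSortKey('my-post', 'c1', 1560000000500, 'c2')); // c1#1560000000500#c2
```

Because the timestamp is a fixed-width number of milliseconds, the string ordering of keys under the same parent matches chronological order.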

This is also where the async and await stuff in the GET function becomes more important: we need to make absolutely sure that all the data exists in our result set before we return it to the page. You’ll want to keep this pattern going through the entire function and all of your database calls. From there, though, it’s a simple matter of adding it to your frontend, e.g.:

$.each(comments.rootComments, function(i, commentObject) { //rootComments contains comments with no parents on this slug, sorted by date descending
  var commentDate = new Date(commentObject.ts);
  var formattedDate = "posted " + commentDate.toLocaleDateString('default', {year: 'numeric', month: 'short', day: 'numeric', hour: 'numeric', minute: '2-digit'});
  comments_text += '<div class="conversation card card-body mb-1" data-id="' + commentObject.id + '"><div class="parent-comment" data-id="' + commentObject.id + '"><div class="comment-author">' + commentObject.author + '</div><div class="comment-body">' + commentObject.text + '</div><div class="comment-timestamp text-muted">' + formattedDate + '</div><div class="comment-reply"><a href="#reply-' + commentObject.id + '" class="reply-link">reply</a></div></div>'; //css classes
  if(typeof commentObject.children !== "undefined" && commentObject.children.length > 0) {
    comments_text += '<div class="child-comments card card-body bg-light mb-1">';
    $.each(commentObject.children, function(i2, childComment) { // contains comments, array by parent comment, sorted by date ascending
      var commentDate = new Date(childComment.ts);
      var formattedDate = "posted " + commentDate.toLocaleDateString('default', {year: 'numeric', month: 'short', day: 'numeric', hour: 'numeric', minute: '2-digit'});
      comments_text += '<div class="child-comment" data-id="' + childComment.id + '"><div class="comment-author">' + childComment.author + '</div><div class="comment-body">' + childComment.text + '</div><div class="comment-timestamp">' + formattedDate + '</div></div>';
    });
    comments_text += '</div>';//child
  }
  comments_text += '<a name="reply-' + commentObject.id + '"></a><div class="comment-reply-form"></div></div>'; // /conversation
});
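
The loop above expects a response with a rootComments array whose entries may each carry a children array. Here's a rough sketch of how the GET function could assemble that shape from a flat list of approved comments. It assumes each comment has a parent field (as described earlier), a ts field, and an id field; the id name is taken from the frontend code, and buildThreads is a name I've made up.

```javascript
// Hypothetical helper: groups a flat list of comments into one level of threads.
// Assumes each comment has id, parent (post uid or parent comment id) and ts.
function buildThreads(postUid, comments) {
  const childrenByParent = new Map();
  const rootComments = [];

  comments.forEach(function (comment) {
    if (comment.parent === postUid) {
      // Top-level comment: copy it so the input isn't mutated
      rootComments.push(Object.assign({}, comment, { children: [] }));
    } else {
      if (!childrenByParent.has(comment.parent)) {
        childrenByParent.set(comment.parent, []);
      }
      childrenByParent.get(comment.parent).push(comment);
    }
  });

  rootComments.sort(function (a, b) { return b.ts - a.ts; });      // newest conversation first
  rootComments.forEach(function (root) {
    const replies = childrenByParent.get(root.id) || [];
    root.children = replies.sort(function (a, b) { return a.ts - b.ts; }); // replies oldest first
  });

  return { rootComments: rootComments };
}
```

Replies whose parent isn't a top-level comment in the list (for example, a reply to a comment that hasn't been approved yet) are simply dropped in this sketch.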

I then added a JavaScript onClick event to the .reply-link elements to generate bespoke comment forms for each parent comment - if you didn’t want to do this, you could simply pregenerate the reply forms and include them in the initial comment display frontend JavaScript. Totally up to you.

That pretty much does it - the end of the tutorial for a fully functional Lambda-driven comment system. There’s plenty of other stuff you could do here: a good thing to try yourself might be to write some code to notify you via email every time someone comments on your blog, or to implement cross-origin restrictions leveraging event.headers.Origin so you can only load the comments from your own domain. I’ve made the repository as I use it on my site available on GitHub as well (licensed under Apache 2.0), so please don’t hesitate to work off of that if necessary. I specifically left some of the code a little bit vague in this section because discussing how to modify existing code is a bit more wordy of a proposition. The samples in the GitHub repository reflect a working state with all of the above features implemented, including IAM policies, Lambda functions, DynamoDB database design, and a basic implementation of the cross-origin request filtering that I mentioned. If you need additional help with the frontend code, simply check out the end of the source code of this page for a working example using vanilla JavaScript. I hope you’ve found it helpful, and by all means feel free to leave a comment below if this helped you or if you’ve got any questions.