Migrating services with zero downtime using EFS and lsyncd


posted | about 5 minutes to read

tags: amazon web services efs lsyncd tutorial system administration

Today, I want to talk about a situation I found myself in recently. I have a production mail server that I had to migrate because its OS version was getting quite out of date. Unfortunately, when I stood the server up, I didn't plan on ever migrating it, so I set everything up on local storage. Since email flows in and out all the time, restoring from my nightly snapshot backups was a terrible option that could easily lead to data loss. I needed a way to move my mail to the new server without making it impossible for people to email me, and without the old server dropping mail into a folder that was no longer the "live" mailbox. I designed a solution using lsyncd and Amazon EFS that let me perform this migration with no downtime and no loss of data.

Since I'm already living in the AWS cloud, I had a good idea of how I wanted to handle the file storage in a way that would scale well and still work if I ever needed to migrate the server again. Amazon EFS offers good, scalable storage, and the volume of reads and writes for email isn't high enough that I'd be throttled, even with very low actual storage usage. I spun up a share, waited a couple of minutes for Amazon to create the mount targets, and added them to a security group that only allowed connections from my mail server's security group. Once that was done, I went back into the server and mounted the storage as an NFS share. (EFS gives very clear directions on how to set this up, and it's trivial to translate their instructions into an /etc/fstab entry if that fits your use case.) If you're not in Amazon's cloud, there are other options: Azure Files offers a fairly full-featured SMB share service, or you can leverage network storage in your on-prem environment.
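
To make that concrete, here's roughly what the /etc/fstab entry looks like - the filesystem ID, region, and mount path below are placeholders, so use the values EFS shows you for your own share:

# Placeholder EFS filesystem ID and region - mounts the share at /mnt/efs over NFSv4.1
fs-12345678.efs.us-east-1.amazonaws.com:/  /mnt/efs  nfs4  nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport  0  0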

How to sync the files to the new network storage - and, more importantly, keep them in sync - was a little trickier. I thought about it for a while and realized that rsync almost fit my use case - the whole idea is to keep two locations in sync. The problem is that rsync is a run-once tool rather than something that constantly keeps directories in sync. Still, the concept was right, so I kept digging - and very quickly came across lsyncd, which is, for all intents and purposes, daemonized rsync - perfect for what I was doing. It even has packages available for most widely used Linux distributions!
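
Installation is just a package manager call on the distributions I typically work with - a quick sketch (on RHEL/CentOS-style systems the package lives in the EPEL repository, so enable that first):

# Debian/Ubuntu
sudo apt-get install lsyncd

# RHEL/CentOS - lsyncd comes from EPEL
sudo yum install epel-release
sudo yum install lsyncd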

That said, the installation wasn't totally smooth - the package didn't generate a default config file at /etc/lsyncd/lsyncd.conf.lua, so I had to look at /etc/init.d/lsyncd to see where the config file was supposed to live, create it, and then write the configuration myself. To compound the issue, without a config file in place, systemctl start lsyncd doesn't actually report a failure, so that's definitely something to be mindful of. Here's what I used, which should work in just about any environment:

settings {
  logfile = "/var/log/lsyncd/lsyncd.log",
  statusFile = "/var/log/lsyncd/lsyncd-status.log",
  statusInterval = 20
}

sync {
  default.rsync,
  source = "/var/vmail",      -- Original directory
  target = "/mnt/efs/vmail",  -- New EFS directory
  delay = 5,                  -- Seconds lsyncd waits after the last change before starting a sync
  rsync = {
    archive = true            -- Uses rsync's "-a" flag (preserves permissions, ownership, timestamps, etc.)
  }
}

The lsyncd documentation goes into more depth, but this is a good starting point that should cover a typical use case. (Note that the config file is Lua, so comments use -- rather than #.) With this in place, running systemctl start lsyncd will immediately start syncing the contents of /var/vmail to /mnt/efs/vmail. The log files should help you determine when everything is in sync.
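
As a quick sketch of what that looks like in practice, using the paths from the config above:

sudo systemctl start lsyncd
sudo systemctl enable lsyncd               # optional: survive a reboot while the migration is in flight
sudo tail -f /var/log/lsyncd/lsyncd.log    # watch the initial full sync and later incremental syncs
sudo cat /var/log/lsyncd/lsyncd-status.log # rewritten every statusInterval seconds with what's still pending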

In the meantime, we'll need to get the new server built out. You'll need to make sure the following are done:

- The new server is built and has all of the software your services need installed and configured.
- The EFS share is mounted on the new server (the same /etc/fstab approach from earlier works here).
- Your services are configured to read and write the EFS-backed directories (in my case, pointing the mail server at /mnt/efs/vmail).

I don't want to get into too much in the way of specifics here, since I'm trying to keep this guide pretty general, but in most cases (a mail server is one of them) you should be able to go ahead and start the services, since nothing is using this server yet - all we're doing is getting ready for the cutover.
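
If it helps, here's the kind of pre-cutover sanity check I mean - the service names (postfix and dovecot) are just an example of a typical mail stack, so substitute whatever you actually run:

df -h /mnt/efs                          # confirm the EFS share is mounted on the new server
ls /mnt/efs/vmail                       # confirm the synced mailboxes are visible
sudo systemctl start postfix dovecot    # example service names - start yours and check the logs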

The easiest way to perform the cutover, if you're in AWS, is to simply move the Elastic IP from the old server to the new one. Anything hitting your old endpoint should now start hitting the new one. Once you see no more connections coming into the old server, you can shut it down in preparation for decommissioning - or, if you're not ready for that step or have other services still running on the box, at least stop lsyncd to be extra safe. If you're not in AWS, I still think moving the IP is a better option than updating DNS records or anything similar - that way, you can be sure every service comes over - but you may have a brief window of downtime between taking the IP off the old server and putting it on the new one. As long as you do all of the configuration before the cutover, though, you should be able to keep that window under 5 seconds.
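
If you'd rather script that cutover than click through the console, a single AWS CLI call does it - the allocation ID and instance ID below are placeholders for your own Elastic IP and new instance:

# Re-point the Elastic IP at the new instance (placeholder IDs)
aws ec2 associate-address \
  --allocation-id eipalloc-0123456789abcdef0 \
  --instance-id i-0123456789abcdef0 \
  --allow-reassociation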

I was able to stop here, but depending on your use case and performance needs, you may now need to migrate the files back to local storage on the new server. You can use lsyncd again to do the same thing in reverse - once it finishes syncing, all you have to do is change the paths in your service configuration files and bounce the services. Once that's done, you'll be all set.
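
For reference, the reverse sync block would look something like this - same settings block as before, just with source and target swapped to whatever paths your new server uses:

sync {
  default.rsync,
  source = "/mnt/efs/vmail",  -- EFS directory that's currently live
  target = "/var/vmail",      -- Local directory on the new server
  delay = 5,
  rsync = {
    archive = true
  }
}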