How we implemented container shell access on ECS

At Simply Business, our developers often need terminal access to production-like environments. In the DevOps team, we're keen to support this, but we also need to address the security implications and to audit activity in interactive container sessions. In this post, we share how we've met these needs by building a custom solution for container shell access on top of ECS Exec, integrated with our in-house CI/CD system built with GitHub Actions and CodeBuild.

Recently, the Simply Business DevOps team has been looking into replacing our existing solution, butlerx/wetty, which we use to give developers shell access to Docker containers running in our ECS cluster. To be clear, this post is not a philosophical discussion of shell access to Docker containers; there's plenty of material out there if you want to form an opinion on that. Rather, it's about how we handle legitimate use cases for shell access to containers, given that turning off our existing solution isn't really an option.

Why build a container shell access system?

We've been using butlerx/wetty for several years, and although it has served us well, we wanted to address a number of issues with it, including tight coupling to production deployment pipelines, logging and observability, security, and cost.

In designing a replacement, it was also important that the new solution integrate well with two other systems we've introduced to improve our deployment pipeline and logging: Huxley, our custom-built CI/CD platform, which uses GitHub Actions and CodeBuild to replace our previous Jenkins-based system; and Elastic SIEM, an information security logging system.

As chance would have it, AWS had received enough feedback from developers wanting something like Kubernetes' kubectl exec for ECS containers that it released ECS Exec.

After some investigation, ECS Exec looked like it could offer the kinds of improvement we were after, particularly around security. With ECS Exec, all authorisation is handled through Identity and Access Management (IAM). It can also be set up to use attribute-based access control (ABAC), so that access to services and resources is secured by IAM policies that fit our existing IAM authorisation model, on a per-team basis. And because authorisation goes through IAM, we'd be able to log all of these calls in CloudTrail. Tick!

One of the security requirements for our container shell access system was that the containers providing shell access must not be the same containers that run production workloads. So we decided to provision a new container on ECS as a one-off task, with sleep running under the container's init process, which would provide shell access until sleep ended. This approach also solved a cost issue with the solution we'd initially proposed, which required an ECS service and an application load balancer (ALB) to be provisioned to run the shell and provide access to it.

However, this implementation introduced another issue, this time with ECS Exec's logging: logs were lost if the sleep timeout was reached and the container was deprovisioned before a logged-in user had exited the shell.


Wilson – waiting for cats with pipes

After considering alternative ways to address this, we wrote a small program in Go, which we've named Wilson. Its main goal is to capture the standard I/O of shell sessions and log it to the container's standard output.

We’ve implemented this in Wilson as follows.

  • Wilson runs as a process in the container, replacing the sleep we initially had; the timeout is implemented as a ticker.
  • Wilson takes control of /root/.bashrc, which runs when a user starts an interactive session. With this configuration, each new session provisions a fifo (named pipe) on the file system for that login and then uses exec to start script, which writes to the fifo.
  • Wilson monitors the /proc directory for these logins and starts a child process that runs cat on each fifo.
  • The output of cat is piped to an internal buffer and written to the container's stdout.

In our case, these logs are sent to CloudWatch and then on to SIEM, but the approach would also work with other log forwarders for containers. Wilson gives us real-time, keystroke-level logging of container shell sessions. If any of the configuration is tampered with, Wilson exits with status 1 and the container stops.

Beyond the *nix primitives mentioned above, Wilson needs no further dependencies to work with ECS Exec, as script is already required by the AWS native logging solution for ECS Exec.
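To show how little the .bashrc hook needs beyond mkfifo and script, here is a hypothetical version that Wilson might render at startup. The fifo directory, the WILSON_SESSION guard variable and the exact script flags (-q for quiet, -f to flush after each write, as in util-linux script) are assumptions. The guard matters because script starts a new interactive shell, which reads .bashrc again; without it, the hook would recurse.

```go
package main

import (
	"fmt"
	"os"
)

// bashrcHook renders a hypothetical /root/.bashrc. On a login's first
// interactive shell it creates a fifo named after the shell's PID ($$) and
// execs script, which writes the session's terminal I/O to that fifo.
// exec keeps the PID, so the script process matches the fifo name.
// WILSON_SESSION stops the shell spawned by script from repeating this.
func bashrcHook(fifoDir string) string {
	return fmt.Sprintf(`# Managed by Wilson; changes cause the container to stop.
if [ -z "$WILSON_SESSION" ]; then
  export WILSON_SESSION=1
  mkfifo %[1]s/$$.fifo
  exec script -q -f %[1]s/$$.fifo
fi
`, fifoDir)
}

func main() {
	const fifoDir = "/tmp/wilson" // hypothetical fifo directory
	if err := os.MkdirAll(fifoDir, 0o700); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Written once at startup, before any ECS Exec session begins.
	if err := os.WriteFile("/root/.bashrc", []byte(bashrcHook(fifoDir)), 0o644); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Opening a fifo for writing blocks until a reader opens it, so in this sketch a session cannot proceed until a cat reader is attached to its fifo.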

At the time of writing, we're working on open-sourcing Wilson. Look out for future updates, and check out our other open source project, Version Forget-Me-Not, on GitHub Marketplace.


Andy McKay
