Sustainability Starts at the Codebase

Posted on 2024-08-04

Being Green

As Solution Architects in the modern world, we are charged with the responsibility of ensuring our solutions are sustainable, green, or environmentally friendly.

Doing so means that the data centres we use place less demand on the world's precious resources for manufacturing the hardware that our solutions run on.

A typical area of focus is the use of "green" data centres. These green data centres often use energy-efficient cooling solutions, make use of renewable energy sources and may well have a hardware recycling/re-purposing scheme.

It's not too difficult to achieve the use of green data centres in the cloud, as the big three cloud providers all have "green" credentials ^[1] ^[2] ^[3].

Another area of focus is the efficient utilisation of physical resource through lifecycle management (assessment, maintenance, upgrades), resource optimisation (deploying infrastructure that meets workload requirements, rather than exceeds them) and virtualisation.

This second area of focus is where I'd like to spend some time with you today.

There's a plethora of tools which promise to help you optimise your infrastructure use. These tools usually work by monitoring the CPU and memory usage of your applications and infrastructure for a period of time, then reporting back where you can reduce the compute.
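To make that idea concrete, here is a minimal sketch of the kind of calculation such a tool might perform. It is not how any particular product works: the 95th percentile rule, the 20% headroom, the sample values and the function names `percentile` and `recommendCores` are all assumptions of mine for illustration.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the nearest-rank p-th percentile (0 < p <= 100) of samples.
func percentile(samples []float64, p float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// recommendCores suggests a core count: the observed 95th percentile of
// usage plus a headroom fraction, rounded up, and never below one core.
// Both the percentile and the headroom are assumed values, not a standard.
func recommendCores(usedCores []float64, headroom float64) int {
	need := percentile(usedCores, 95) * (1 + headroom)
	return int(math.Max(1, math.Ceil(need)))
}

func main() {
	// Hypothetical samples: cores in use on a VM provisioned with 8 cores.
	samples := []float64{0.8, 1.1, 0.9, 1.4, 2.2, 1.0, 1.3, 0.7, 1.9, 1.2}
	const provisioned = 8

	fmt.Printf("provisioned=%d recommended=%d\n", provisioned, recommendCores(samples, 0.2))
}
```

A real tool would observe for far longer than ten samples and would look at memory too, but the shape of the decision is the same: compare what was provisioned with what was actually used.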

Actioning the reported information results in a more energy and computationally efficient system (i.e. through a reduction of wasted CPU cycles) and, if you're in the cloud, a more cost-efficient system (i.e. through paying for fewer CPUs and less memory). Rock and roll! Job done... right?

Well, if you've just completed a cloud migration with a load of monolithic applications, this might be the only thing you can do for now. But there's more that can be done.

A Scenario

Let's form a scenario where you have an existing monolithic application, written in Java, running in a Virtual Machine (VM) and are looking to break it up into smaller, containerised applications and deploy them to a Kubernetes cluster.

You and your team have already gone through defining your application boundaries and followed the domain-driven design principles to the letter... there's a plan.

Naturally, as Java aficionados, you've selected Spring Boot as your framework of choice for the back-end API work. Hold the phone for just a moment - there's an opportunity here to revisit the programming language used to craft your application's new roots.

I'm going to be picking on Java a little here, as I've seen it used extensively in an enterprise application developer environment - and it allows me to draw on my experiences.

"Why should I reconsider the programming language used?" I'm glad you asked. Let's revisit the [abridged] Software Development Life Cycle (SDLC) a little...

- Developer writes code
  - Maybe pulling in additional third-party dependencies, which may well be acquired via an internal artifact repository
- Developer pushes code to a remote code repository and creates a Pull Request (PR)
- Continuous Integration (CI) pipeline kicks in
  - Builds application
    - Pulls third-party dependencies in the process
  - Tests application
  - Pushes application build output to an artifact repository
    - Vulnerability scans run here
  - Containerises application
  - Pushes application container to a container registry, ready for deployment
    - Vulnerability scans also run here

Assuming all stages of the CI job passed and completed successfully, the developer's PR can be merged and the updated application is then ready for deployment.

Let’s break down the infrastructure used to complete this single cycle:

Step	Infrastructure	Notes
Developer writes code	Storage, CPU, Memory	For pulling dependencies from the self-hosted artifact repository
Developer pushes code	Storage, CPU, Memory	Only if using a self-hosted source code repository
CI pipeline kicks in	CPU, Memory	Assuming self-hosted pipeline runners - billed in time if not
Pushing to artifact repository	Storage
Pushing to container registry	Storage

You can see that I’ve listed storage as key infrastructure here. Whilst it isn’t as expensive to purchase as computational resource, efficient use of storage means that fewer storage drives need to be provisioned in the data centre.

Another layer here is network. The source code, all of the application dependencies, the final application binary and the containerised application get transmitted over the network.

The size of the data transmitted at each stage of the CI process places demands on bandwidth. That drives how much network infrastructure is required and used within the data centre, and how much bandwidth is left for other processes running simultaneously on the same network.

With our scenario in mind, the size of the application containers also impacts how quickly they can be retrieved from a container registry by a Kubernetes cluster at deployment time. This will have repercussions on the first startup time of your application.

Hopefully, you can see how optimising the storage space your applications demand reverberates through your infrastructure.

In My Experience...

Back to the question: "Why should I reconsider the programming language used?"

At a previous job, I was generously granted some time to rewrite one of our Java REST APIs in GoLang. The driver for this activity was to compare the performance characteristics of the two languages - and to appease my nagging. We'll focus on the impact the experiment had on the infrastructure here though:

Language	Number of Dependencies	Binary Size	Container Size
Java (Spring Boot)	142	43MB	130MB ^[4]
GoLang	3	8.3MB	9.72MB

Take a moment to consider how dramatic a difference in infrastructure usage these numbers imply.

Firstly, there are fewer third-party dependencies in the GoLang version because the application could be built almost completely using GoLang's standard library - only reaching out to a third party for a database driver. This leads to a reduction in caching requirements for the artifact repository and also reduces the surface area for vulnerabilities in the codebase.
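To give a feel for what "almost completely standard library" looks like, here is a minimal sketch of a JSON endpoint that imports nothing from outside the standard library. It is not the code from the experiment: the `order` type, the `/orders/` route and the in-memory map standing in for the database are all hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// order is a hypothetical resource used purely for illustration.
type order struct {
	ID     string `json:"id"`
	Status string `json:"status"`
}

// orders stands in for the database. A real service would query one
// through database/sql plus a third-party driver.
var orders = map[string]order{
	"42": {ID: "42", Status: "shipped"},
}

// newMux registers the routes using only the standard library router.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/orders/", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		id := strings.TrimPrefix(r.URL.Path, "/orders/")
		o, ok := orders[id]
		if !ok {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		if err := json.NewEncoder(w).Encode(o); err != nil {
			http.Error(w, "encoding failed", http.StatusInternalServerError)
		}
	})
	return mux
}

// get sends an in-process GET request to the handler and returns the
// response status code and trimmed body.
func get(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, strings.TrimSpace(rec.Body.String())
}

func main() {
	// Exercise the handler in-process so the sketch runs and exits.
	// A real service would call http.ListenAndServe(":8080", newMux()).
	status, body := get(newMux(), "/orders/42")
	fmt.Println(status, body)
}
```

Routing, JSON encoding and even the test tooling all ship with the language, which is why the dependency list can stay so short.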

The GoLang binary is over five times smaller than the Java equivalent (8.3MB against 43MB). Again, this reduces the utilisation of the artifact repository and demands less bandwidth from the network in the transferring of the application binary.

Finally, the GoLang application container is dramatically smaller than the Java application container. This reduces the storage capacity requirements for the container registry and reduces deployment time. Again, there's less demand on the network in transferring the containerised application to and from the container registry.
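The arithmetic behind those comparisons can be sketched in a few lines. The sizes come from the table above. The figure of 50 image pulls is a hypothetical number of mine, and the estimate ignores layer caching and compression, so treat it as a rough illustration rather than a measurement.

```go
package main

import "fmt"

// artifact holds the sizes reported in the table above, in megabytes.
type artifact struct {
	name        string
	binaryMB    float64
	containerMB float64
}

// ratio reports how many times larger a is than b.
func ratio(a, b float64) float64 {
	return a / b
}

// pullVolumeMB estimates the data moved when an image is pulled `pulls`
// times in full. It ignores layer caching and compression.
func pullVolumeMB(containerMB float64, pulls int) float64 {
	return containerMB * float64(pulls)
}

func main() {
	java := artifact{"Java (Spring Boot)", 43, 130}
	golang := artifact{"GoLang", 8.3, 9.72}

	fmt.Printf("binary ratio: %.1fx\n", ratio(java.binaryMB, golang.binaryMB))
	fmt.Printf("container ratio: %.1fx\n", ratio(java.containerMB, golang.containerMB))

	const pulls = 50 // hypothetical: full image pulls across nodes and deployments
	for _, a := range []artifact{java, golang} {
		fmt.Printf("%s: %.0f MB moved for %d pulls\n", a.name, pullVolumeMB(a.containerMB, pulls), pulls)
	}
}
```

Under those assumptions the binary works out at roughly 5 times smaller and the container at roughly 13 times smaller, and the gap in network traffic grows with every pull.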

Before this post becomes an advertisement for GoLang, I will quickly report that:

- Application binary and container had zero vulnerabilities
- API response times reduced from multiple seconds to microseconds
- Each instance of the application could handle significantly more customer requests before having to scale out horizontally
- Scaling out was much faster because application start-up was near-instant (as opposed to 20-30 seconds for Spring Boot)
- The application used a fraction of the CPU and Memory at runtime

The two versions of the application provided the exact same functionality - they were simply written in two different programming languages.

Whilst I've largely focussed on the impact of storage and network usage so far, it's important to note that the GoLang application also used less computational resource - CPU and Memory. This means that you can fit more applications into the same size of compute infrastructure, or look to reduce the size of that compute infrastructure.

I bet you can guess which version of the application made it through to production...and what language we adopted for new work.

Imagine how different your infrastructure landscape would look after taking some time to evaluate the choice of programming language used in your applications. Maybe the components would be the same but they'd certainly be smaller in size. You'd be reducing the stress on your data centre, doing your bit for the planet and saving some money - all at the same time!

Going Forwards

Choosing a programming language is significantly easier to do when in a greenfield environment - where you're creating new things from scratch.

For brownfield environments, where you have existing applications, you could do something similar to me - take a small application and rewrite it as an experiment. Alternatively, the next time a new piece of development work presents itself, use that as an opportunity to assess how the application should be crafted.

Whilst a new language can involve a learning process for your development teams, it's also a growth opportunity for them. A chance to expand their skillset and look at their work from a fresh perspective.

Matthias Endler and Jeremy Soller discuss the introduction of a new programming language in depth on this episode (about 1 hour in) of the wonderful "Rust In Production" Podcast - it's clearly focused on introducing the Rust programming language into an organisation but the concepts apply to any language you may be considering.

Have the conversation with your development team. Whilst a different programming language isn't always viable, there may be options with your current programming language. With Java, you can reduce your application container size by using jlink or by making use of GraalVM native images.

To Conclude

Carefully considering the codebase has a rippling effect throughout your solution and can lead to more efficient use of your infrastructure. Efficient use of your infrastructure leads to a reduction in hardware needed in the data centres you use, and places less demand on the world's precious resources which are used to manufacture that hardware.

[1] Azure Sustainability
[2] Amazon Web Services Sustainability
[3] Google Cloud Sustainability
[4] The Java container requires the Java Runtime Environment for the application to run. We managed to reduce the size of the container image by over 100MB through arduous use of jlink.