Go Binary Optimization Tricks

Alexei Yuzhakov
7 min read · May 7, 2021

If you’ve ever written a Go program, you’ve probably noticed the size of the final binary: it’s huge. Of course, in the age of gigabit links and terabyte drives, that is not a big deal. But some situations force you to keep the size as small as possible, while you would still prefer to stick with Go instead of switching to C or some other language. So I’m going to explain the possible ways to reduce the size of Go binaries.

Victim Trepanation

To begin with, a bit of context. I have a daemon (a long-running process) that performs quite simple operations. Similar programs (in the way they work) are the DigitalOcean Agent and the Amazon CloudWatch Agent, which collect metrics and send them to central storage. Our daemon does a slightly different job, but that doesn’t matter here.

Some other facts about the daemon:

  • Written in Go (and there is no reason or desire to re-write it in a different language)
  • Installed on a lot of different machines
  • Requires periodic updates

At the beginning of this research, the size of the Go binary was around 11 MB.

Let Me See You Stripped

A compiled binary contains debug information. In my case, this debug information is unnecessary because there is no way to debug the code on the destination machines anyway (they are out of my control). So we can remove it by compiling the binary with the appropriate linker flags or by using the strip utility. The process is called “stripping” and should be familiar to Linux developers. You can find the flag descriptions in the output of go tool link: -s omits the symbol table, and -w omits the DWARF debug information:

go build -ldflags "-s -w" ./...

After this procedure, the size of the binary becomes 8.5 MB. So in my particular case, the debug information costs about 30% on top of the stripped size.
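
You can verify the result with the standard file utility:

file agent

For a binary built with -s, file should report it as “stripped” rather than “not stripped”.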

Compression

The next trick is compression. Of course, we can create a gzipped tar archive and distribute that: the solution works, and the destination machines can decompress such archives.
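
For completeness, the plain-archive route is a one-liner on each side (agent is the binary name used throughout this article):

tar -czf agent.tar.gz agent    # on the build machine
tar -xzf agent.tar.gz          # on the destination machine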

Another way is to use special packers that decompress and launch binaries on the fly. Perhaps the most famous packer is UPX. I first became acquainted with it more than 20 years ago, in the epoch of dial-up modems and as-small-as-possible exe files. UPX is still alive, has a community, and continues to evolve. I missed the point in time when UPX started to work with Go binaries out of the box, but now it works without any additional steps. According to the project’s history, it seems this happened about four years ago, so the approach is rather mature.

Let’s try to pack our binary using UPX:

upx agent

It took 1.5 seconds to compress, and we got a binary that is 3.4 MB in size. An excellent result!

If we check the packer options, we will discover such things as --brute and --ultra-brute. Let’s try the first one:

upx --brute agent

The final size of the binary is 2.6 MB, roughly a quarter of the initial size! But the payoff is a much longer compression time: 134 seconds. That’s pretty long.

Let’s try the option --ultra-brute:

upx --ultra-brute agent

The size of the binary is the same: 2.6 MB (in fact, it became a little smaller: minus 8 KB). But it took an additional 11 seconds to compress, bringing the total to around 145 seconds.

I kept thinking that I wanted the speed of the first run combined with the size of the last one, and I found another approach:

upx --best --lzma agent

I got a 2.6 MB binary, and it took only 4 seconds!

Fat Dependencies

Modern ecosystems make it easy to add external dependencies, and Go is no exception. It is all too easy to pull in some “necessary” dependency to solve a small task, inflating the binary size out of proportion.

There is a good but often ignored practice: monitor the size of your distribution. If you have a chart that tracks revisions against distribution sizes, it is very easy to spot the impact of a change on the size of the binaries.

In my case, the problem was related to the Sentry integration dependency. If you’ve never heard of Sentry, it is a service that collects information about the errors happening in your application. Such systems are usually integrated to improve application quality and to automatically track errors that occur in production. Back to my problem with fat dependencies: let’s check the influence of the Sentry dependency. We start with the original 11 MB binary. Without any stripping, just by removing the Sentry dependency, the size fell to 7.8 MB. After stripping, it became 6.2 MB, nearly half the original size, and that is without any compression!
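
By the way, if you want to see which modules are compiled into an existing binary, the Go toolchain can print the embedded module information (available for module-aware builds since Go 1.13):

go version -m ./agent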

Of course, automatic error tracking is a good thing, and I want to keep it. But in my case, it is cheaper and easier to set up a separate HTTP endpoint that acts as a proxy to Sentry, avoiding the direct integration.
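
For illustration, here is a minimal sketch of such a proxy using only the standard library. The upstream URL, port, and path prefix are hypothetical, and a real endpoint would at least need authentication and rate limiting:

package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Hypothetical upstream: the real Sentry instance the proxy forwards to.
	upstream, err := url.Parse("https://sentry.example.com")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(upstream)
	// The daemons POST error reports here; the heavy Sentry SDK
	// stays out of the daemon binary entirely.
	http.Handle("/sentry/", http.StripPrefix("/sentry", proxy))
	log.Fatal(http.ListenAndServe(":8080", nil))
}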

Compression Once Again

Now we have a binary without fat dependencies. So let’s try the compression once again:

upx --best --lzma agent

The final size of the binary is 1.9 MB! I want to remind you that we started with an 11 MB binary.

The penalty for the small size is launch time, because the binary has to decompress itself first. Brief checks using the time utility show a 170–180 millisecond slowdown. In the context of a long-running daemon, this is almost invisible and not a problem at all. But you should keep this aspect in mind.
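
A rough way to measure this overhead, assuming the daemon has some flag that makes it exit immediately (the --version flag here is hypothetical):

time ./agent --version

Compare the real time for the packed and unpacked binaries to see the decompression cost.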

Alternatives

What else can you do if you want more?

One possible solution to the problem of delivering updates with minimal size is binary patches; Google Chrome, for example, utilizes this concept. There are the bsdiff/bspatch utilities, which help to organize such a process. In my case, the bspatch utility is absent on the destination machines, so I decided against using this approach in production (at least for now). But preliminary experiments demonstrated good results: the patches are relatively small.
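
The workflow with these utilities is straightforward; the version-suffixed file names below are hypothetical:

bsdiff agent-1.0 agent-1.1 agent.patch     # on the build machine
bspatch agent-1.0 agent-1.1 agent.patch    # on the destination machine

The first command produces a small patch from the two binaries; the second reconstructs agent-1.1 from agent-1.0 and the patch.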

Another option (I mentioned it briefly at the beginning) is to choose another programming language. If the smallest size is the main goal, we will end up with C. I’d prefer not to rewrite my code in C because Go brings me more joy.

One more option is gccgo. If the destination machines are similar to each other, we can use it to get a dynamically linked Go binary, which is considerably smaller.

That is not my case (the destination machines have different OSes), but I did the experiment:

go build -compiler gccgo -gccgoflags "-s -w" ./...

The conditions are not equal to the original experiment (this is a different VM and a different OS). Still, right from the start I got a 1.8 MB binary! Dynamically linked, though. After applying UPX compression, we got… a 284 KB binary! Awesome! But don’t forget that the destination machines must be similar; otherwise, there is a high probability of getting the following error:

./agent: error while loading shared libraries: libgo.so.16: cannot open shared object file: No such file or directory
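
To check which shared libraries such a binary expects on a destination machine, use ldd:

ldd ./agent

Any library reported as “not found” (libgo.so in the error above) has to be present before the daemon can start.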

TinyGo is one more exotic option. Unfortunately, I couldn’t compile the current project with it due to a bunch of errors, but I did compile a somewhat simpler project without issues. The resulting binary is dynamically linked, yet it has fewer dependencies than with gccgo, so it is more portable.
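
For reference, a typical TinyGo build looks like this; the -no-debug flag drops debug information, playing a role similar to -s -w:

tinygo build -o agent -no-debug .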

If you have a significant amount of platform-dependent code, build tags can help you reduce the size. The rules can be more complex than simply naming files like _windows.go or _linux.go. The gain depends heavily on the particular situation; in my case, it is almost absent because there is only one destination platform, Linux x86_64 (Mac and ARM support is just an experiment).
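
Here is a minimal sketch of such a split. The package, file, and function names are hypothetical, and the doubled constraint lines keep the files compatible with toolchains both before and after Go 1.17:

// collector_linux.go: compiled only on Linux
//go:build linux
// +build linux

package collector

import "io/ioutil"

// readMetrics reads CPU statistics from /proc, which exists only on Linux.
func readMetrics() ([]byte, error) {
	return ioutil.ReadFile("/proc/stat")
}

// collector_stub.go: compiled for every other platform
//go:build !linux
// +build !linux

package collector

// readMetrics is a no-op stub, so the Linux-only code and its
// dependencies never end up in binaries for other platforms.
func readMetrics() ([]byte, error) {
	return nil, nil
}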

Docker

Sometimes a Go binary is distributed as a Docker container, for example, to achieve full isolation from the host system. In my case, there is another daemon that is distributed exactly this way, so the optimization should be applied to the Docker image as well. There is a quite popular trick here: the multi-stage build:

FROM golang:1.15 as builder
ARG CGO_ENABLED=0
WORKDIR /app
RUN apt-get update && apt-get install -y upx
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN make release

FROM scratch
COPY --from=builder /app/server /server
ENTRYPOINT ["/server"]

Each stage starts with a FROM directive. The first stage has all the dependencies needed to build the binary. Then the FROM scratch directive starts from a completely empty image; we just copy the prepared binary into it and set the entry point for launch.

The make release command consists of go build and upx calls. As a result, we got a 1.5 MB Docker image (the size differs slightly because this is a slightly different daemon). If we built everything in a single stage on top of the golang image, the size would be huge: 902 MB.
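
The release target itself is not shown here, but given the flags used earlier, a plausible sketch looks like this (the output name server matches the COPY path in the Dockerfile; recipe lines in a Makefile must be indented with a tab):

release:
	go build -ldflags "-s -w" -o server .
	upx --best --lzma server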

Conclusion

We started with an 11 MB binary and ended up with 1.9 MB, a nearly sixfold reduction. Stripping the binary and packing it with UPX is a very efficient way to reduce the size. But don’t forget about fat dependencies: in my case, they played a significant role. And if the destination environments are homogeneous, take a look at gccgo.
