Timothy Wolodzko - Makefiles for not-only programmers

Make is commonly used in software development to manage the compilation of source code. Its use cases, however, go far beyond compiling C code: it can serve as a tool for writing any kind of command pipeline. While some may consider this heresy, Makefiles can be quite useful as a place to store a collection of commands to execute. I agree on this with Peter Baumgartner:

After using Makefiles on a few DS projects, my advice is: forget everything you've read about them, put SHELL := /bin/bash at the top, and just use it as a place to name and document project-specific shell commands.
— Peter Baumgartner (@pmbaumgartner) June 22, 2020

For learning make, there is a great, freely available book, Managing Projects with GNU Make by Robert Mecklenburg, as well as extensive online documentation.

Basics and syntax

To use make, you need to define the instructions in a Makefile. The syntax for the instructions is:

target: prerequisites # comment
<TAB> recipe

where the target is usually a file to be built (traditionally a compiled binary), the recipe is the set of shell commands needed to generate that file, and the prerequisites are the names of other targets that need to be up to date before the current one is built.
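As a concrete sketch of this syntax, the rule below builds one file from another. The file names notes.txt and summary.txt are made up for illustration only.

```make
# summary.txt is the target and notes.txt is its prerequisite.
# The recipe runs only if summary.txt is missing or older than notes.txt.
summary.txt: notes.txt
	wc -l notes.txt > summary.txt

# notes.txt has no prerequisites, so it is created only when it does not exist.
notes.txt:
	echo "first note" > notes.txt
```

Running make summary.txt would first create notes.txt if needed, and then build summary.txt from it.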

Using tabs in Makefiles is important. For example, if you indent a recipe with spaces instead of a tab, you will see an error like:

Makefile:3: *** missing separator.  Stop.

Hello World!

Let’s start with the “Hello World!” example. To run it, you need to install make and create a file named Makefile with the following content:

hello:
	echo "Hello World!"

To run it, go to the directory containing the Makefile and run the make hello command from the command line.

Multi-line instructions

The instructions are not limited to a single line; a recipe can consist of any number of tab-prefixed lines.

hello:
	touch hello
	echo "Hello World!" > hello
	cat hello

The example is overly complicated for such a simple task, but it shows how you can define multiple steps to be run sequentially (create an empty file with touch, write “Hello World!” to it using echo, and print it with cat).
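One thing worth knowing about multi-line recipes is that make passes each line to a separate shell. The sketch below (target names are arbitrary) shows the consequence for commands like cd, whose effect does not carry over to the next line.

```make
# Each recipe line runs in its own shell, so the effect of `cd`
# on the first line is gone by the time the second line runs.
separate:
	cd /tmp
	pwd

# Joining the commands with && (and a trailing backslash to continue the
# line) keeps them in one shell invocation, so pwd sees the new directory.
together:
	cd /tmp && \
	pwd
```

Here make separate reports the directory make was started from, while make together reports the temporary directory.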

Make keeps files up-to-date

Make checks whether the target file exists, and if it does, make doesn’t run the recipe to build it. However, if any of the prerequisites is newer than the target, make re-runs the recipe. This is helpful when compiling source code stored in multiple files, since it keeps the compiled binaries up to date with the sources they depend on.

You can try this yourself by running the Makefile below. What happens when any of the files one, two, three, or four does not exist? What if they differ in modification time? Notice that make will print all the instructions executed. At any time, you can run make clean (don’t bother with its syntax for now) to remove all of the files and start from scratch.

.PHONY: clean
clean:
	@ rm -rf one two three four

one two: # this one has two targets!
	touch one
	touch two

three:
	touch three

four: two three
	touch four

The four target depends on three and two, but not on one. So make four checks whether the file four exists, then recursively checks its prerequisites and their prerequisites. A missing file, or a prerequisite with a newer modification time than its target, triggers the recipe for that file and, in turn, for every target that depends on it.

Phony targets

I said that a target is usually a filename, but this doesn’t have to be the case. Let’s again use the trivial Makefile:

hello:
	echo "Hello World!"

If by chance you have a file named hello in the directory containing your Makefile, you would see the following message:

$ touch hello
$ make
make: 'hello' is up to date.

What make did was check that the hello file exists and conclude that it doesn’t need to build it. This check is not relevant for phony targets, i.e. targets without related files, and in such cases make may display “strange” messages like the one above. To disable the check for the target file, list such targets as prerequisites of the special .PHONY target:

.PHONY: hello

hello:
	echo "Hello World!"

Phony targets are quite common. For example, you could add targets like help to print the help, clean to remove unnecessary files from the working directory, test to run unit tests, etc.
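A minimal sketch of such a set of phony targets might look as follows. The paths and the run_tests.sh script are placeholders that I made up, so adjust them to whatever your project actually contains.

```make
.PHONY: help clean test

help:
	@ echo "Available targets: help, clean, test"

# The paths below are placeholders for whatever your project generates.
clean:
	rm -rf build/ *.tmp

# Assumes a hypothetical run_tests.sh script in the project root.
test:
	./run_tests.sh
```

Because all three names are declared phony, they run even if a file called help, clean, or test happens to exist.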

Don’t show the commands

When running the instructions, make by default will print the commands that were invoked, for example when running:

hello:
	echo "Hello World!"

we’ll see the following result:

$ make hello
echo "Hello World!"
Hello World!

Printing the command can be silenced by adding @ at the beginning of the line:

silent-hello:
	@ echo "Hello World!"

so make will only print the result:

$ make silent-hello
Hello World!

Using variables

Makefiles can use variables that can be modified when calling make from the command line. For example, with the following Makefile:

MESSAGE ?= "Hello World!"

hello:
	@ echo $(MESSAGE)

if MESSAGE is not provided, it prints the default:

$ make hello
Hello World!

but we can provide it either from the environment:

$ MESSAGE="Hi!!!" make hello
Hi!!!

or as a parameter:

$ make hello MESSAGE="Hi there!"
Hi there!

To set variables, you can use = (recursive expansion), := (simple expansion), ?= (set if absent), and += (append). Variables are evaluated each time make is called, so values passed on the command line do not persist between calls:

$ make hello MESSAGE="Hi"
Hi
$ make hello MESSAGE="Bye"
Bye
$ make hello
Hello World!

This may be even more obvious with another trivial Makefile:

TIME != date

once:
	@ echo $(TIME)

twice:
	@ echo $(TIME)
	@ sleep 5
	@ echo $(TIME)

Every time you call make once, it will print a different time, but calling make twice will print the same time twice, since the TIME variable was evaluated only once per call. The above code uses != to run the right-hand side as a shell command and assign its output to the variable on the left-hand side. Alternatively, you could use the shell function to execute the external command:

TIME := $(shell date)
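The assignment operators mentioned above differ mainly in when the right-hand side is expanded. Here is a small sketch of the difference; the variable names NAME, LAZY, and EAGER are arbitrary and exist only for this example.

```make
NAME = World

# = (recursive): the right-hand side is expanded every time LAZY is used,
# so it picks up the final value of NAME.
LAZY = Hello $(NAME)

# := (simple): the right-hand side is expanded once, at this line.
EAGER := Hello $(NAME)

# ?= assigns only if the variable is not set yet (NAME is, so nothing changes).
NAME ?= Nobody

# += appends to the existing value, separated by a space.
NAME += and friends

show:
	@ echo "$(LAZY)"
	@ echo "$(EAGER)"
```

Running make show prints "Hello World and friends" for LAZY and "Hello World" for EAGER, because only the recursive variable sees the later change to NAME.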

Make imports environment variables as make variables, so you can access them with the usual $(...) syntax or the equivalent ${...} form. For example, on Unix systems the USER environment variable holds the username of the currently logged-in user, so we can access it with:

who:
	@ echo ${USER}
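To make the distinction between make variables and shell variables more tangible, here is a small sketch (the target name who-all is arbitrary). In make, $(USER) and ${USER} are equivalent references to a make variable, while $$ passes a literal dollar sign to the shell, which then does its own expansion.

```make
who-all:
	@ echo "make variable, parentheses: $(USER)"
	@ echo "make variable, braces:      ${USER}"
	@ echo "shell variable:             $$USER"
```

All three lines should print the same username, but the first two are substituted by make before the shell runs, and the last one is substituted by the shell itself.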

Macros

Besides variables, make supports macros. A macro can be a set of commands, for example:

define commands
   @ echo "Hello!"
   @ echo "It's $(shell date)"
endef

default:
	$(commands)
	@ echo "Bye!"

Another usage may be to “paste” the parameters into a command:

define tz
   --utc \
   '+%Y-%m-%d %H:%M:%S %Z'
endef

utctime:
	date $(tz)

This may be useful if we repeat some commands or parameters in the code and do not want to repeat ourselves.
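As a sketch of that reuse, the macro below is shared by two targets. The names announce, build, and deploy are invented for the example, and $@ is make's automatic variable holding the name of the target being built.

```make
# A set of commands shared by two targets.
define announce
	@ echo "Starting $@"
	@ echo "Started at $(shell date)"
endef

.PHONY: build deploy

build:
	$(announce)
	@ echo "building..."

deploy:
	$(announce)
	@ echo "deploying..."
```

Since the macro is expanded inside each recipe, $@ resolves to build in the first target and to deploy in the second.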

Conditional statements

Make supports conditional statements: ifeq, ifneq, ifdef, and ifndef. The tricky part is that the conditional directives must not start with a tab, even inside a rule, so the formatting needs to be:

COND ?= false

default:
ifeq "$(COND)" "true"
	@ echo "It's true"
else
	@ echo "It's false"
endif
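The ifdef and ifndef directives work in a similar way, but they test whether a variable has a non-empty value. A minimal sketch, where VERBOSE, NAME, and Q are hypothetical variables introduced only for this example:

```make
# `make run VERBOSE=1` turns command echoing on; plain `make run` keeps it off.
ifdef VERBOSE
Q :=
else
Q := @
endif

# ifndef is the mirror image: provide a fallback when nothing was given.
ifndef NAME
NAME := World
endif

run:
	$(Q) echo "Hello $(NAME)!"
```

Here Q expands to @ unless VERBOSE is set, so the echo command itself is shown only in verbose mode.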

Self-documenting the Makefile

It is useful to provide the user with some documentation of what the Makefile can do. While there is no built-in solution for that, it can be easily achieved with Makefile comments. A simple and useful solution was described in a blog post by François Zaninotto:

hello: ## Say hello
	echo "Hello World!"

help: ## Print help
	@ grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | sort | awk 'BEGIN {FS = ":.*?## "}; {printf "\033[36m%-30s\033[0m %s\n", $$1, $$2}'

As you can see, it assumes that the documentation is prefixed with ## and prints those lines when you call make help. In some cases, it might be useful to set help as the default goal, by making it the first target or by setting .DEFAULT_GOAL := help.

Other uses of make

Makefiles work well together with Docker: instead of requiring the user to run the necessary commands by hand, you can provide them with ready-made recipes.

REPO ?= my-repository
TAG ?= my-image-0.1
IMAGE ?= $(REPO):$(TAG)

.PHONY: image push build

build: image push

image:
	docker build -t $(IMAGE) -f Dockerfile .

push:
	docker push $(IMAGE)

Make can also be used for many other tasks, like setting up Python environments, running unit tests and linters, making API calls, running Terraform to set up cloud-based architecture, and other tasks that need to be run repeatedly or by different users.

It can also be used for data science projects, where we are interested in building pipelines that download the data, preprocess it, do feature engineering, train the model, validate the results, save them, etc., as described by Zachary M. Jones, Rob J. Hyndman, Mark Sellors, Mike Bostock, Byron J. Smith, Jenny Bryan, and others.
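To give a rough idea of what such a pipeline could look like, here is a sketch. Everything in it is hypothetical: the URL, the directory layout, and the Python scripts are placeholders that I invented, not real resources. It uses two of make's automatic variables: $@ (the target) and $< (the first prerequisite).

```make
# Hypothetical layout: scripts live in src/, artifacts in data/ and models/.
# The URL and script names are placeholders, not real resources.
DATA_URL ?= https://example.com/dataset.csv

.PHONY: all
all: models/report.txt

data/raw.csv:
	mkdir -p data
	curl -fsSL -o $@ $(DATA_URL)

data/clean.csv: data/raw.csv src/preprocess.py
	python src/preprocess.py $< $@

models/model.pkl: data/clean.csv src/train.py
	mkdir -p models
	python src/train.py $< $@

models/report.txt: models/model.pkl src/validate.py
	python src/validate.py $< $@
```

Listing the scripts as prerequisites means that editing, say, src/train.py re-runs training and validation, but not the download or the preprocessing.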

Change the defaults

By default, make runs recipes with sh. However, /bin/sh is often just a symbolic link that can point to different shells on different systems, so for consistency it might be worth setting the SHELL variable explicitly, e.g. SHELL=bash.

Some other useful defaults include using “strict” mode in bash with .SHELLFLAGS := -eu -o pipefail -c, making make warn about references to undefined variables, and turning off the built-in implicit rules meant for compiling source code files:

MAKEFLAGS += --warn-undefined-variables
MAKEFLAGS += --no-builtin-rules
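Put together, the top of a Makefile with these defaults could look like the sketch below. It assumes GNU Make 3.82 or later (older versions ignore .SHELLFLAGS) and that bash is installed. The check target is a made-up example to show the effect of pipefail.

```make
# Use bash instead of whatever /bin/sh points to.
SHELL := bash
# -e: stop at the first failing command; -u: treat unset shell variables as
# errors; -o pipefail: a pipeline fails if any command in it fails;
# -c: required so that bash reads the recipe line as a command string.
.SHELLFLAGS := -eu -o pipefail -c

MAKEFLAGS += --warn-undefined-variables
MAKEFLAGS += --no-builtin-rules

.PHONY: check
check:
	@ false | true
	@ echo "not reached when pipefail is active"
```

With pipefail active, the first recipe line fails because false fails, so make stops before the echo. With the default sh flags, the pipeline would count as successful because its last command, true, succeeds.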

In newer versions of make (GNU Make 3.82 and later) you can switch away from tabs for indenting the instructions by setting the .RECIPEPREFIX variable, though it is probably easier to stick to tabs when using make.
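For completeness, a minimal sketch of what this looks like, using > as the prefix character (any single character would do):

```make
# Assumes GNU Make 3.82 or later, where .RECIPEPREFIX was introduced.
.RECIPEPREFIX = >

hello:
> echo "Hello World!"
```

Every recipe line in the file then has to start with > instead of a tab.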

If you want to change the default goal, so that a plain make builds a target other than the first one in the Makefile, set the .DEFAULT_GOAL variable to the name of the desired target.
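A tiny sketch of the effect, with two arbitrary targets:

```make
.DEFAULT_GOAL := hello

bye:
	@ echo "Bye!"

hello:
	@ echo "Hello World!"
```

Without the first line, running make with no arguments would build bye, since it comes first. With it, make prints "Hello World!".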