heptadecagram.net


Effective Use of make

Any C, C++, or Fortran programmer (and likely any remaining Pascal or Modula-2 programmer) has needed to build their programs. The most common tool for this task is make. Many developers dislike make, and a large majority do not understand how to use it, which easily leads to frustration. Understanding that make is a declarative, fourth-generation language is the key to getting the most out of it.

make is a Fourth-Generation Language

The majority of programming languages that a developer interacts with are known as Third-Generation Languages. These are languages that are more ergonomic than assembly, but where the act of computation is still present in the syntax.

int total = 0;
for (int i=0; i < data_len; ++i) {
	if (data[i] % 2 == 0) {
		total += data[i]*data[i];
	}
}
C code to sum up even squares

In the above C code, the programmer must "program the loop". They initialize values to zero. They create an index variable that walks through the range of numbers. Each line of code decomposes into a few assembly instructions.

total = sum(n**2 for n in data if n % 2 == 0)
Python code to sum up even squares

Here, Python straddles the gap between third- and fourth-generation languages. The precise order of instructions is not specified, but the instructions themselves are still quite visible.

A Fourth-Generation Language abstracts away the bookkeeping to optimize for a specific domain. Rather than manually tracking details like loop indices and registers, the programmer can express higher-level ideas or operations. This is more than just having a large standard library; it affects the syntax and expressiveness of the language. The algorithm to use is decided by the language runtime!

SELECT SUM(n*n)
FROM data
WHERE n % 2 = 0;
SQL code to sum up even squares

In SQL, the programmer usually specifies the desired results, and the runtime determines how best to get those results. The database may choose to iterate through an index, or walk the rows in a table, or something else entirely. A programmer can generally inspect what a fourth-generation language is doing, but most of the time the runtime makes a reasonable decision.

How does this apply to make? Well, to start with, these "language generations" are almost entirely discarded and unused in the present. The terms emerged with the rise of so-called third-generation languages, to distinguish their ease of use from that of earlier languages. But the idea of an evolution of languages was appealing enough that many computer programmers in the 1970s tried to create the next big innovation. make's syntax is one result: Makefile syntax is customized to the domain of resolving file dependencies.

make is Declarative

A declarative language is one where the programmer declares what they want, and the language determines the best way to satisfy that demand. This is in contrast to an imperative language, where the programmer spells out the specific steps the program should take. make syntax, while it allows the author to write imperative commands to be run in order, is best used with as few imperative commands as possible. Instead, a well-written Makefile declares relationships between items, and the make program figures out the necessary work.

As a programmer, much of this involves getting out of the way. make already possesses a number of recipes that tell it how to build programs. Let those recipes do the work!

Refactoring a Makefile

final-program.o: final-program.c lib1.h lib2.h
	cc -Wall -c -Wextra final-program.c -o final-program.o

lib1.o: lib1.c lib1.h
	cc -Wall -DPOSIX_SOURCE -c lib1.c -o lib1.o

lib2.o: lib2.c lib2.h lib1.h
	cc -Wall -Wextra -c lib2.c -o lib2.o

final-program: final-program.o lib1.o lib2.o
	cc *.o -lm -o final-program
A typical novice Makefile

For starters, likely through copy-and-paste errors, the author is turning on -Wextra for only two of the three C files. Rather than copy and paste something over and over, a programmer should look for a way to simplify. In this case, the compiler flags can be put into a variable and then used throughout the Makefile. Since make reads the entire file before running any recipes, and expands the variables in a recipe only when that recipe runs, it doesn't matter where in the file this variable is defined! The conventional variable for C compiler flags is CFLAGS.

final-program.o: final-program.c lib1.h lib2.h
	cc $(CFLAGS) -c final-program.c -o final-program.o

lib1.o: lib1.c lib1.h
	cc $(CFLAGS) -DPOSIX_SOURCE -c lib1.c -o lib1.o

lib2.o: lib2.c lib2.h lib1.h
	cc $(CFLAGS) -c lib2.c -o lib2.o

CFLAGS += -Wall -Wextra

final-program: final-program.o lib1.o lib2.o
	cc *.o -lm -o final-program
Refactored to use CFLAGS

Similarly, the desired compiler is not always the default cc program. By using a standard variable, CC, to hold the name of the compiler, the project becomes much more portable. Any make variable can be overridden on the make command line. This allows for a very flexible build; a user could make a debug build by setting CFLAGS from the command line.

final-program.o: final-program.c lib1.h lib2.h
	$(CC) $(CFLAGS) -c final-program.c -o final-program.o

lib1.o: lib1.c lib1.h
	$(CC) $(CFLAGS) -DPOSIX_SOURCE -c lib1.c -o lib1.o

lib2.o: lib2.c lib2.h lib1.h
	$(CC) $(CFLAGS) -c lib2.c -o lib2.o

CFLAGS += -Wall -Wextra

final-program: final-program.o lib1.o lib2.o
	$(CC) *.o -lm -o final-program
Refactored to use CC
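
As a rough illustration of how command-line overrides behave, the sketch below adds a hypothetical show-flags target (not part of the original Makefile) that simply echoes the variables. It assumes GNU make and that CC and CFLAGS are not already set in the environment.

```makefile
# Hypothetical helper target for inspecting variables; the name
# "show-flags" is an assumption, not something make provides.
CFLAGS += -Wall -Wextra

show-flags:
	@echo "CC=$(CC) CFLAGS=$(CFLAGS)"

.PHONY: show-flags
```

Running make show-flags should report the built-in default compiler along with -Wall -Wextra. Running make show-flags CC=clang CFLAGS=-g should report clang and only -g, because in GNU make a command-line assignment takes precedence over ordinary assignments in the Makefile, including those made with +=.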

The big reveal of Makefiles is that most explicit recipes are unnecessary. make comes with a bevy of built-in rules that describe how to build most targets. For instance, if the command is make readlib.o, the make program will check the files in the directory and try to find one that can be transformed via one of its rules into readlib.o! This might be a C file (readlib.c), a Fortran file (readlib.f), a C++ file (readlib.C, readlib.cpp, or readlib.cc), or a file in some other compilable source language.

These rules are almost always superior to what a programmer would write. They are controllable via make variables, such as CFLAGS, LDLIBS, and others.

For instance, in GNU make, if make target is run in a directory containing target.c, make applies a built-in rule that expands to the following:

target: target.c
	$(CC) $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH) target.c $(LOADLIBES) $(LDLIBS) -o target

It invokes the C compiler, passing in any compiler, preprocessor, and linker flags, compiling the source file and linking any specified libraries into the requested target. And it can do this without a Makefile! This is built into make!
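
For reference, the built-in rules behind this behavior look roughly like the following in GNU make's internal database (viewable with make -p). This is a paraphrase for illustration rather than something to copy into a Makefile, and other make implementations may define these rules differently.

```makefile
# Approximation of GNU make's built-in C rules, as printed by `make -p`.
# Shown only to illustrate where each variable lands on the command line.
COMPILE.c = $(CC) $(CFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c
LINK.o = $(CC) $(LDFLAGS) $(TARGET_ARCH)
OUTPUT_OPTION = -o $@

# Compile one C source file into an object file.
%.o: %.c
	$(COMPILE.c) $(OUTPUT_OPTION) $<

# Link object files into a program; libraries come after the objects.
%: %.o
	$(LINK.o) $^ $(LOADLIBES) $(LDLIBS) -o $@
```

Because $^ expands to every prerequisite, a rule that lists several object files as prerequisites of a program, with no recipe of its own, is enough for the link rule to pick them all up.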

With this in mind, the Makefile becomes much, much simpler. The only lines in the Makefile become relevant ones, rather than copy-and-paste boilerplate. Notice that only one rule now specifies what shell commands to run.

# The first target in a file is the default target
# Change LDLIBS variable only when this target is involved
final-program: final-program.o lib1.o lib2.o
final-program: LDLIBS += -lm

# Change this variable for all targets
CFLAGS += -Wall -Wextra -Wpedantic

# Change this variable for this target only
lib1.o: CPPFLAGS += -DPOSIX_SOURCE

# This target wants to add a compiler flag to all things compiled,
# then builds the program.
debug: CFLAGS += -g
debug: final-program

clean:
	$(RM) final-program *.o

.PHONY: debug clean
A clear Makefile
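
One thing the simplified Makefile no longer records is the header prerequisites from the original version, so editing lib1.h would not trigger a rebuild. A minimal sketch of restoring them declaratively, still without writing any recipes, could look like this. The second option is an alternative that assumes GCC or Clang, whose -MMD and -MP options write .d dependency files next to the object files.

```makefile
# Option 1: declare header prerequisites by hand (taken from the
# original novice Makefile); the built-in rules still do the compiling.
final-program.o: lib1.h lib2.h
lib1.o: lib1.h
lib2.o: lib2.h lib1.h

# Option 2 (assumes GCC or Clang): let the compiler emit the same
# information into .d files, then include whatever has been generated.
CPPFLAGS += -MMD -MP
-include $(wildcard *.d)
```

If the second option is used, the clean target would probably want to remove *.d as well.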

Most people are taught how to write Makefiles in the same way they are taught how to code: "Do this, then this, then this." In other words, an imperative flow. "A Makefile consists of a number of rules. A rule is the name of the output file, the input files, and the code to run to make the output file." A novice would then make sure that every output file has a rule explicitly telling make how to build it.

Instead, a Makefile should give guidance through variables, and only offer up a set of steps when actually necessary. A large number of projects could easily be maintained through such simple Makefiles. If a specific compilation unit requires a feature test preprocessor macro, then the variable is changed for just that unit. If certain libraries are needed for the final target, then the variable is changed for that final target. This is clearer, more reproducible, more portable, and more robust in the face of errors than manually twiddling shell invocations.

make is a domain-specific language for managing build dependencies. The fact that it minimizes work by checking for stale binaries is more of an add-on than the main feature. It keeps track of which flags to use and in which order to build dependencies. So what are some best practices for writing such Makefiles?

Best Practices for Makefiles

Put the default target first

Running make with no specified target will build the first target in the Makefile. So, put the desired target first. Makefiles are declarative, so the order otherwise generally does not matter (prerequisite targets can be defined after the default target).
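
As a small sketch of this advice, a conventional all target can sit at the top and simply name what should be built. The all name is a common convention rather than anything make requires, and the .DEFAULT_GOAL line is a GNU make feature shown here as an optional alternative.

```makefile
# First target in the file, so plain `make` builds it.
all: final-program

# Prerequisites can be declared later in the file; order does not matter here.
final-program: final-program.o lib1.o lib2.o

# GNU make only: name the default goal explicitly instead of relying on position.
.DEFAULT_GOAL := all

.PHONY: all
```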

Use the right variables

Where the version of make supports them, use the appropriate variable for the problem at hand. Need to add a linker library under GNU make? Reach for the LDLIBS variable, not LDFLAGS. Setting something for the preprocessor? CPPFLAGS is the correct choice. If a command is needed, use the Makefile variable for that command, like $(CC) or $(RM).
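
A sketch of how these variables might be divided up under GNU make follows. The include directory and the /opt/example/lib path are hypothetical placeholders, and the specific flags are only examples.

```makefile
# Preprocessor options: include paths and macro definitions.
CPPFLAGS += -Iinclude -D_POSIX_C_SOURCE=200809L

# Compiler options: warnings, optimization, debug info.
CFLAGS += -Wall -Wextra -O2

# Linker options such as library search paths (placed before the objects).
LDFLAGS += -L/opt/example/lib

# Libraries to link (placed after the objects by the built-in link rule).
LDLIBS += -lm
```

The distinction between LDFLAGS and LDLIBS matters because of where the built-in link rule places each one: many linkers resolve symbols from left to right, so a library named before the object files that use it may not be linked in as expected.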

Alter variables for specific targets that need them

Rather than have all files compile with -D_POSIX_C_SOURCE, only set it for the targets that actually need the feature test macro. This keeps each individual build small and compact, and easier to debug when things go wrong.

Alter variables for all targets when simpler

Every compilation probably wants a number of warnings enabled. So, set the compiler flags to enable those warnings globally, across all targets. If a project has a majority of builds requiring a feature test macro, then it should probably be global, too!

Avoid manual shell commands at all costs

The spaghettification of Makefiles can largely be laid at the feet of manual shell commands. Nearly every target imaginable can be constructed using the built-in rules, so use them! Some targets (like clean) still need manual shell lines, but almost no build artifact does.
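
When a recipe genuinely is needed, it can still stay close to the declarative style by using a pattern rule and automatic variables instead of hard-coded file names. The sketch below is hypothetical: it assumes a project that generates a header from a .h.in template containing an @VERSION@ placeholder, which is not part of the original example.

```makefile
# Hypothetical value substituted into generated headers.
VERSION = 1.0

# Build any foo.h from foo.h.in; $< is the first prerequisite, $@ is the target.
%.h: %.h.in
	sed 's/@VERSION@/$(VERSION)/' $< > $@
```

Written this way, the rule describes a relationship between kinds of files, and make decides when and for which files the recipe has to run.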