Effective Use of make
Any C, C++, or Fortran programmer (and likely any remaining Pascal or Modula-2 programmer) has needed to build their programs.
The most common tool for this task is make.
Many developers dislike make.
A large majority of developers do not understand how to use make, which can easily lead to frustration.
Understanding that make is a declarative, fourth-generation language is the key to getting the most out of it.
make is a Fourth-Generation Language
The majority of programming languages that a developer interacts with are known as Third-Generation Languages. These are languages that are more ergonomic than assembly, but where the act of computation is still present in the syntax.
int total = 0;
for (int i = 0; i < data_len; ++i) {
    if (data[i] % 2 == 0) {
        total += data[i] * data[i];
    }
}
In the above C code, the programmer must "program the loop". They initialize values to zero. They create an index variable that walks through the range of numbers. Each line of code decomposes into a few assembly instructions.
total = sum(n**2 for n in data if n % 2 == 0)
Here, Python straddles the gap between third- and fourth-generation languages. The precise order of instructions is not specified, but the instructions themselves are still quite visible.
A Fourth-Generation Language abstracts away the bookkeeping to optimize for a specific domain. Rather than manually tracking details like loop indices and registers, the programmer can express higher-level ideas or operations. This is more than just having a large standard library; it affects the syntax and expressiveness of the language. The language runtime decides which algorithm to use!
SELECT SUM(n*n)
FROM data
WHERE n < 20
AND n % 2 = 0;
In SQL, the programmer usually specifies the desired results, and the runtime determines how best to get those results. The database may choose to iterate through an index, or walk the rows in a table, or something else entirely. A programmer can generally inspect what a fourth-generation language is doing, but most of the time the runtime makes a reasonable decision.
How does this apply to make?
Well, to start with, these "language generations" are almost entirely discarded and unused in the present.
The terms emerged with the rise of so-called third-generation languages, to distinguish their ease of use from that of earlier languages.
But the idea of an evolution of languages was appealing enough that many computer programmers in the 1970s tried to create the next big innovation.
This includes make's syntax.
Makefile syntax is customized to the domain of resolving file dependencies.
make is Declarative
A declarative language is one where the programmer declares what they want, and the language determines the best way to deliver it.
This is in contrast to an imperative language, where the programmer spells out (in the language) exactly what steps the program should take.
make syntax allows the author to write imperative programs to be run in order, but it is best used with as few imperative commands as possible.
Instead, a well-written makefile declares relationships between items, and the make program figures out the necessary work.
As a programmer, much of this involves getting out of the way.
make already possesses a number of recipes that tell it how to build programs.
Let those recipes do the work!
Refactoring a Makefile
final-program.o: final-program.c lib1.h lib2.h
	cc -Wall -c -Wextra final-program.c -o final-program.o

lib1.o: lib1.c lib1.h
	cc -Wall -DPOSIX_SOURCE -c lib1.c -o lib1.o

lib2.o: lib2.c lib2.h lib1.h
	cc -Wall -Wextra -c lib2.c -o lib2.o

final-program: final-program.o lib1.o lib2.o
	cc *.o -lm -o final-program
For starters, likely through copy-and-paste errors, the author enables the extra compiler warnings (-Wextra) for only two of the three C files.
Rather than copy and paste something over and over, a programmer should look for a way to simplify.
In this case, these compiler flags can be put into a variable, and then used throughout the Makefile.
Since make reads the entire file before running any rules, and expands the variables in a recipe only when that recipe runs, it doesn't matter where in the file this variable is defined!
The conventional variable for C compiler flags is called CFLAGS.
final-program.o: final-program.c lib1.h lib2.h
	cc $(CFLAGS) -c final-program.c -o final-program.o

lib1.o: lib1.c lib1.h
	cc $(CFLAGS) -DPOSIX_SOURCE -c lib1.c -o lib1.o

lib2.o: lib2.c lib2.h lib1.h
	cc $(CFLAGS) -c lib2.c -o lib2.o

CFLAGS += -Wall -Wextra

final-program: final-program.o lib1.o lib2.o
	cc *.o -lm -o final-program
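The point about definition order can be checked with a small experiment. The following is only a sketch: the show target is invented for illustration, and it assumes GNU make with no CFLAGS set in the environment.

```make
# Hypothetical demo: the recipe refers to CFLAGS before the Makefile sets it.
show:
	@echo "CFLAGS is: $(CFLAGS)"

# Assigned after its use. A recipe is expanded only when it runs,
# which happens after make has read the whole file.
CFLAGS += -Wall -Wextra

.PHONY: show
```

Under those assumptions, make show should print "CFLAGS is: -Wall -Wextra". One caveat: variables that appear in a target or prerequisite list are expanded as the line is read, so for those the order of definition does matter.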
Similarly, the desired compiler is not always the default cc program.
By using a standard variable, CC, to hold the name of the compiler, the project becomes much more portable.
Any make variable can be overridden from the command line when invoking make.
This allows for a very flexible build; a user could make a debug build by modifying CFLAGS from the command line.
final-program.o: final-program.c lib1.h lib2.h
	$(CC) $(CFLAGS) -c final-program.c -o final-program.o

lib1.o: lib1.c lib1.h
	$(CC) $(CFLAGS) -DPOSIX_SOURCE -c lib1.c -o lib1.o

lib2.o: lib2.c lib2.h lib1.h
	$(CC) $(CFLAGS) -c lib2.c -o lib2.o

CFLAGS += -Wall -Wextra

final-program: final-program.o lib1.o lib2.o
	$(CC) *.o -lm -o final-program
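To make the command-line overrides concrete, here is a minimal sketch. The file hello.c is hypothetical, and the commands in the comments assume GNU make defaults with neither CC nor CFLAGS set in the environment.

```make
# Sketch: a one-file project (hello.c is a hypothetical source file).
CFLAGS += -Wall -Wextra

hello: hello.c
	$(CC) $(CFLAGS) hello.c -o hello

# Commands make would run for each invocation:
#   make             ->  cc -Wall -Wextra hello.c -o hello
#   make CC=clang    ->  clang -Wall -Wextra hello.c -o hello
#   make CFLAGS=-g   ->  cc -g hello.c -o hello
```

Note the last case: a value given on the command line replaces the ordinary assignments in the Makefile, including +=, so the warning flags are dropped. If the Makefile's flags must survive, GNU make's override directive (override CFLAGS += -Wall -Wextra) appends to the command-line value instead.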
The big reveal of Makefiles is that most explicit rules are unnecessary.
make comes with a bevy of rules that describe how to build most targets.
For instance, if the command is make readlib.o, the make program will check the files in the directory and try to find a file that can be transformed via one of the rules into readlib.o!
This might mean a Fortran file (readlib.f), a C++ file (readlib.C, readlib.cpp, or readlib.cc), or some other compilable source language.
These rules are almost always superior to what a programmer would write.
They are controllable via make variables, such as CFLAGS, LDLIBS, and others.
For instance, in GNU make, if make target is the command in a directory containing target.c, make applies a built-in rule equivalent to the following:
target: target.c
	$(CC) $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH) target.c $(LOADLIBES) $(LDLIBS) -o target
It invokes the C compiler, passing in any compiler, preprocessor, and linker flags, and builds the requested target from the source file and any specified libraries.
And it can do this without a Makefile!
This is built into make!
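For reference, the relevant entries in GNU make's built-in database look roughly like the following. This is an abridged transcription of what GNU make prints with make -p; other make implementations define their rules differently, so treat it as a sketch rather than a specification.

```make
# Abridged from GNU make's built-in database (as printed by `make -p`).
# None of this needs to appear in a Makefile; make already knows it.
COMPILE.c = $(CC) $(CFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c
LINK.c = $(CC) $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) $(TARGET_ARCH)
LINK.o = $(CC) $(LDFLAGS) $(TARGET_ARCH)
OUTPUT_OPTION = -o $@

# Compile one C source file into an object file.
%.o: %.c
	$(COMPILE.c) $(OUTPUT_OPTION) $<

# Build a program directly from a single C source file.
%: %.c
	$(LINK.c) $^ $(LOADLIBES) $(LDLIBS) -o $@

# Link a program from object files ($^ is every prerequisite).
%: %.o
	$(LINK.o) $^ $(LOADLIBES) $(LDLIBS) -o $@
```

Running make -p -f /dev/null prints the complete database, which is a handy way to see which variables a given built-in rule consults.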
With this in mind, the Makefile becomes much, much simpler. The only lines in the Makefile become relevant ones, rather than copy-and-paste boilerplate. Notice that only one rule now specifies what shell commands to run.
# The first target in a file is the default target
# Change LDLIBS variable only when this target is involved
final-program: final-program.o lib1.o lib2.o
final-program: LDLIBS += -lm

# Change this variable for all targets
CFLAGS += -Wall -Wextra -Wpedantic

# Change this variable for this target only
lib1.o: CPPFLAGS += -DPOSIX_SOURCE

# This target wants to add a compiler flag to all things compiled,
# then builds the program.
debug: CFLAGS += -g
debug: final-program

clean:
	$(RM) final-program *.o

.PHONY: debug clean
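One thing the simplified Makefile no longer records is the header dependencies listed in the original version. In GNU make, a rule with prerequisites but no recipe adds those prerequisites while still leaving the recipe to the built-in rule, so the relationships can be declared without any shell commands. A sketch, reusing the file names from the original Makefile:

```make
# Prerequisite-only rules: there is no recipe here, so the built-in
# %.o: %.c rule still supplies the compile command.
final-program.o: lib1.h lib2.h
lib1.o: lib1.h
lib2.o: lib2.h lib1.h
```

With these lines added, editing lib1.h should cause all three object files to be rebuilt, while editing lib2.h rebuilds only final-program.o and lib2.o.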
Most people are taught how to write Makefiles in the same way they are taught how to code:
"Do this, then this, then this."
In other words, an imperative flow.
"A Makefile consists of a number of rules.
A rule is the name of the output file, the input files, and the code to run to make the output file."
A novice would then make sure that every output file has a rule explicitly telling make how to build it.
Instead, a Makefile should give guidance through variables, and only offer up a set of steps when actually necessary. A large number of projects could easily be maintained through such simple Makefiles. If a specific compilation unit requires a feature test preprocessor macro, then the variable is changed for just that unit. If certain libraries are needed for the final target, then the variable is changed for that final target. This is clearer, more reproducible, more portable, and more robust in the face of errors than manually twiddling shell invocations.
make is a domain-specific language for managing build dependencies.
The fact that it minimizes work by checking for stale binaries is more of an add-on than the main feature.
It keeps track of which flags to use and the order in which to build dependencies.
So what are some best practices for writing such Makefiles?
Best Practices for Makefiles
Put the default target first
Running make with no specified target will build the first target specified in the Makefile.
So, specify that desired target first.
Makefiles are declarative, so generally the order does not matter (the targets it depends on can be defined after the default target).
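When reordering a Makefile is impractical, GNU make also allows the default to be named explicitly through its special .DEFAULT_GOAL variable. The sketch below reuses the targets from the earlier example and is specific to GNU make; other implementations may not support the variable.

```make
# 'clean' appears first here, so it would normally be the default target.
clean:
	$(RM) final-program *.o

final-program: final-program.o lib1.o lib2.o

# Name the default explicitly instead (GNU make only).
.DEFAULT_GOAL := final-program

.PHONY: clean
```

Putting the desired target first remains the more portable choice; the variable is an escape hatch, not a replacement for it.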
Use the right variables
For versions of make that support them, use the appropriate variable for the problem at hand.
Need to add a linker library under GNU make?
Reach for the LDLIBS variable, not LDFLAGS.
Setting something for the preprocessor?
CPPFLAGS is the correct choice.
If a command is needed, use the Makefile variable versions of those commands, like $(CC) or $(RM).
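As a rough guide to which variable holds what, consider the sketch below. The /opt/example paths and the example library are hypothetical; the variable names are the ones GNU make's built-in C rules consult.

```make
# Preprocessor: include paths and -D macros.
CPPFLAGS += -I/opt/example/include

# C compiler: warnings, optimization, debug information.
CFLAGS += -Wall -Wextra

# Linker options such as library search paths.
LDFLAGS += -L/opt/example/lib

# Libraries to link against.
LDLIBS += -lexample -lm

final-program: final-program.o lib1.o lib2.o
```

The distinction between LDFLAGS and LDLIBS is not cosmetic. GNU make's built-in link rule places $(LDFLAGS) before the object files and $(LDLIBS) after them, and some linkers resolve symbols from left to right, so a library named in LDFLAGS can end up being ignored.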
Alter variables for specific targets that need them
Rather than have all files compile with -D_POSIX_C_SOURCE, only set it for the targets that actually need the feature test macro.
This keeps each individual build small and compact, and easier to debug when things go wrong.
Alter variables for all targets when simpler
Every compilation probably wants a number of warnings enabled. So, set the compiler flags to enable those warnings globally, across all targets. If a project has a majority of builds requiring a feature test macro, then it should probably be global, too!
Avoid manual shell commands at all costs
The spaghettification of Makefiles can largely be laid at the feet of manual shell commands.
Nearly every target imaginable can be constructed using the built-in rules, so use them!
Some targets (like clean) still need manual shell lines, but almost no build artifact does.
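When a build artifact genuinely has no built-in rule, the same spirit applies: write the recipe once as a pattern rule and declare everything else as relationships. The sketch below assumes the pandoc command-line tool is installed, and the file names are hypothetical.

```make
# Hypothetical: build HTML documentation from Markdown with pandoc.
# $< is the first prerequisite (the .md file); $@ is the target (the .html file).
%.html: %.md
	pandoc $< -o $@

# Declare what is wanted; the pattern rule above supplies the how.
docs: README.html INSTALL.html

.PHONY: docs
```

Adding another document then means adding one name to the docs line rather than copying a recipe.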