In this tutorial, you will:

A short aside on compilers

In this class, we’re hoping to use (at least mainly) the clang C++ compiler. Clang is a compiler frontend built on LLVM, a project that began here at UIUC, and is generally considered more modern than gcc, with more informative error messages, while being a mostly drop-in replacement for it. It provides the default C/C++ compiler on Apple systems, and is becoming increasingly popular in both industrial and everyday use. In previous semesters, we taught this course using the gcc C++ compiler exclusively. This tutorial will be carried out chiefly using clang, and you are encouraged to follow suit. However, the equivalent gcc command will be provided as well, for historical reasons and your interest.

Take note of the difference between a language and a compiler: a language is defined by a standard, and a compiler is one implementation of that standard. (Fun fact: neither the gcc C++ compiler nor the clang C++ compiler is fully C++ standard compliant.) In practice, the differences should not overly concern you in this class. However, if you run both halves of one of the clang/gcc command pairs below, such as the pair that invokes their respective preprocessors, you may find that the two compilers do in fact behave differently internally.

Introduction to compilation

From your CS 225 git directory, run the following on EWS:

git fetch release
git merge release/maketutorial -m "Merging initial maketutorial files"
cd maketutorial/hello

If you’re on your own machine, you may need to run:

git fetch release
git merge --allow-unrelated-histories release/maketutorial -m "Merging initial maketutorial files"
cd maketutorial/hello

(Make sure you’ve followed the directions on the Course Setup page to set up your git repository first.)

Open up the file hello_world.cpp in your favourite text editor, and let’s walk through what it’s doing.

All in all, a very bare-bones Hello, world implementation.

Let’s try compiling it manually:

clang++ hello_world.cpp -o hello

or (note that the syntax is the same):

g++ hello_world.cpp -o hello

The -o flag tells the compiler what to name the executable; without it, the default name is a.out. Now run the program:

./hello

The ./ simply tells your shell to search the current directory for the executable, rather than its normal executable paths. If all goes well, you should see Hello, world! printed as output. But now let’s try to get a little more in-depth. You can get rid of the executable you made by typing:

rm hello

And an

ls

should verify its disappearance. Run the following command:

clang++ -save-temps hello_world.cpp -o hello

or:

g++ -save-temps hello_world.cpp -o hello

The flag -save-temps tells the compiler to retain the temporary files it makes when we compile our program… so we can look at them! Listing the contents of your current directory should yield four new files: naturally the executable hello, but also hello_world.ii, hello_world.s, and hello_world.o, the temporary files we asked the compiler to save, and our guides into the slightly more technical aspects of basic compilation.

Running the macro preprocessor: What is hello_world.ii?

Run the following line:

clang++ -E hello_world.cpp -o preprocessed.ii

or

g++ -E hello_world.cpp -o preprocessed.ii

Then:

cat preprocessed.ii

If all goes well, your terminal will spit out a large amount of somewhat unintelligible code, but at the bottom, there’s the code for our Hello, world program (with the comment stripped out). So what did the preprocessor do?

All it really did for this program was replace our “include” directive (#include <iostream>) with the actual text of the header file we included (and, of course, strip the comment out).

What does that actually mean? Well, if you were able to compile this program at all, then somewhere on the machine that compiled it (be it virtual, remote, or physically present) there exists a file called iostream, which contains the C++ code for the I/O streams library. If you were using clang, it will typically be located in the directory where the library libc++ (libcxx) is installed, or alongside libstdc++ if your clang is set up to use gcc’s library. If you were using gcc, it’s in the directory where libstdc++ (libstdcxx) is installed. Don’t worry about the specific libraries; they don’t really matter here. But if you were so inclined, you would be able to find the code on your own machine. There is no magic involved here.

Back to the preprocessed code. In this case, the only included header was iostream, but the preprocessor would do exactly the same thing for any other header. If you had a million include directives, it would go through each one, find the file you referenced, and paste its contents into your program, so that when you use a function, object, or class declared in one of those files, it makes sense to the compiler. An example here is std::cout, an output stream object declared in iostream, which you wouldn’t have been able to use without including the header. Of course, the preprocessor has plenty of other jobs as well, but we won’t cover them now.

Question: Why did we enclose the header name, iostream, in angle brackets? It’s not just so our code looks cooler. We could have said #include "iostream" too (feel free to try it out), so what’s the difference? In clang and gcc, angle brackets tell the preprocessor to look only in the standard compiler include paths, while quotes tell it to search the directory containing the current source file first, and to fall back on the standard paths only if that fails. Note that the true standard definition is a little more complicated than this: technically, both forms search in an “implementation-defined manner” (any implementation could treat them differently if it so wished), but that’s not very important for us.

Now you can run:

cat hello_world.ii

Look familiar? That’s the output file the preprocessor dumped, and it is identical to the output you saw when you ran the preprocessor yourself. This is the file that the compiler really compiles—not your plain, unpreprocessed source file.

If you want to be sure, try running:

diff hello_world.ii preprocessed.ii

diff returns no output if the files it’s comparing are identical. Make sure that both hello_world.ii and preprocessed.ii were produced by the same compiler, though!

The actual compilation step: What is hello_world.s?

Now let’s take a look at the next temporary file. Print the contents of hello_world.s:

cat hello_world.s

For those of you who have seen assembly code before, the output should be recognisable. If you haven’t, assembly is the low-level intermediate between normal, higher-level programming languages like C++, and the machine code that your computer actually executes. In this case, the compiler (this is the step of compilation that’s actually called compilation) has translated the preprocessed source code from C++ to assembly, and dumped the output as hello_world.s. Let’s ask our compiler to directly compile the code that we preprocessed into assembly code:

clang++ -S preprocessed.ii -o compiled.s

or

g++ -S preprocessed.ii -o compiled.s

Use diff to verify that the files are the same (again, remember to make sure that both hello_world.s and compiled.s were produced by the same compiler):

diff hello_world.s compiled.s

If you used gcc, there shouldn’t be any differences. With clang, the only line that should be different is a line stating what preprocessed file the assembly was generated from.

Question: Why don’t we just write everything in assembly language? Well, for one, it’s kind of annoying to write all the time, and higher level ideas are harder to keep abstract without our human-friendly programming languages. Perhaps more importantly, assembly isn’t portable in the slightest. Each assembly language is specific to a particular architecture, so what assembles and runs on my machine may not run without alteration on yours. That’s pretty annoying, and compilers work pretty well, so most people normally leave the assembly to them.

Assembly: What is hello_world.o?

The next step is assembling the code, which just means translating the assembly code from hello_world.s into machine code. The result is known as object code, and its standard suffix is .o. Unlike .s files, you’re likely to see quite a few .o files as you continue in this course. That doesn’t mean you have to read them, though. If you run:

cat hello_world.o

you’ll quickly realise that reading them would be a somewhat unrealistic expectation anyway.

If you want to ask your compiler to assemble your assembly code, you can do this:

clang++ -c compiled.s -o assembled.o

or

g++ -c compiled.s -o assembled.o

Linking: Generating the final executable.

Linking is the final step, and arguably the most important and relevant to you. It’s the part you’ll interact with most, and besides perhaps flat out failure to compile at all, it’s the part of compiling you’ll be most confused by, particularly at the beginning of this class, when you’re responsible for all of your own compilation. Linking problems are some of the most notorious issues people have early on in this class… so pay attention to it, and perhaps you will be spared the “undefined reference” trauma.

Hint for the future

“Undefined reference” errors are pretty much always linking errors, and you will probably have them. Remember this.

All a linker does is take all the object files tossed out by the assembling step, and join them together into a single executable—in this case, the file hello which you ran earlier. We only have one object file in our Hello, world program, so this linking process is very uninteresting, but very soon (like, later in this tutorial), you’ll be dealing with multiple object files.

Run the following, to have our compiler link our object file and output our final executable, hello_manual:

clang++ assembled.o -o hello_manual

or

g++ assembled.o -o hello_manual

Feel free to verify that it does exactly the same thing as our original executable, hello:

./hello_manual
./hello

Congratulations, you’ve just compiled your own miniature program!

Dealing with multiple object files

Let’s visit the example directory animals now.

cd ../animals/
ls

The files you’ll see listed are dog.hpp, dog.cpp, and main.cpp. Feel free to check out the source code. dog.hpp is a C++ header file, what we’d call the definition of the Dog class, and dog.cpp is a source file, the implementation for said class. You’ll become more familiar with the details of that relationship as the class moves on, but right now, just know that together, they make the Dog class. main.cpp might look more familiar to you. It’s a lot like hello_world.cpp from the last exercise, in that it has some includes and it has an executable main function. In that main function, it calls a constructor for the class Dog, and asks the object it creates to do a number of things. But including the Dog header file doesn’t actually make the source code available. First, compile the main object file:

clang++ -c main.cpp -o main.o

or

g++ -c main.cpp -o main.o

Then, try linking it into the executable dog_program:

clang++ main.o -o dog_program

or

g++ main.o -o dog_program

That’s what we did before for our Hello, world program, so what happened this time? You got a bunch of “undefined reference” errors, and if you remember what we said a few paragraphs up, “undefined reference” errors are pretty much always linking errors. The linker is telling us that it can’t find the compiled code for the function Dog::bark() (or any other Dog function), because main.o contains only the code from main.cpp, and the Dog functions are implemented elsewhere. The solution is to compile a separate object file for the Dog class. In general, you’ll have one object file per .cpp source file, compiled together with its header file (.h or .hpp) and other necessary dependencies. So let’s compile an object file for the Dog class.

clang++ -c dog.cpp

or

g++ -c dog.cpp

And then:

ls

You’ll see that it added a new file called dog.o, the object file for the Dog class. (If you include the header in the compilation, you’ll also see a .h.gch or .hpp.gch file. The .gch file is a precompiled header; in the future, it will be used in preference to the plain header when fulfilling an #include "dog.hpp" directive.) So now, if we want to link these together, we do this:

clang++ dog.o main.o -o dog_program

or

g++ dog.o main.o -o dog_program

And that should complete just fine. Try running it like so:

./dog_program

But what happens if we change something? If we change something in main.cpp, like the Dog’s name, we have to recompile main.o and then run that final linking command again. Likewise, if we change something in the Dog class itself, like adding a new function or changing an implementation, we have to recompile the Dog object file, and then link it with the main object file again. That may not seem like a big deal now, but it gets annoying extremely fast when you have more than a single tiny class.

Introducing the program make

Those of you with some experience in compilation are probably aware of a common Unix utility called make. It’s a program extremely widely used on Unix based systems (Microsoft also has a Visual Studio spinoff called nmake), generally to build executable program files from source files. (Don’t let the “expected use” case fool you, though—make is not a program limited by the narrow realm of compilation, as you’ll see before this tutorial is over.)

The best instruction is by example, so let’s build a basic Makefile for our dog_program. Open a file called Makefile (make sure it’s titlecase—make will recognise the lowercase makefile as well, but our autograder won’t, so it’s good to get into the habit now) with your preferred text editor (mine is emacs, yours may not be, so replace “emacs” with your editor of choice if you disagree):

emacs Makefile

Note that you won’t see the new file in your directory until you save it.

Makefile rules are written in the format:

target : tgt_dependency1 tgt_dependency2 ...
	command

So if our target is dog.o, what are the dependencies (the files needed to make the target)? They’re dog.cpp and dog.hpp, of course. And the command is the same as the one we used to compile the object file to begin with. So our rule for dog.o, the dog object file, will look like this:

dog.o : dog.cpp dog.hpp
	clang++ -c dog.cpp

Copy that into your new Makefile, and save it (for the makefile examples, I won’t explicitly give you the gcc equivalents, but if you want to use gcc instead, just replace all references to clang++ with g++). Now let’s write a rule for main.o:

main.o : main.cpp dog.hpp
	clang++ -c main.cpp

Tabbing in makefiles

Remember: the tab is very important—if you don’t tab the second line of a rule, you’ll get the error “*** missing separator. Stop.” Don’t forget your tabs!
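To see the tab requirement in action without touching your real Makefile, here is a small sketch. It assumes GNU make is installed; tab_demo, spaces.mk, tabs.mk, and the greet target are made-up names. The two files differ only in how the command line is indented, and the -f flag tells make which file to read.

```shell
mkdir -p tab_demo
# The \t in the second printf is a real tab character; the first file uses four spaces.
printf 'greet :\n    echo hello\n' > tab_demo/spaces.mk
printf 'greet :\n\techo hello\n' > tab_demo/tabs.mk

# Indented with spaces: make stops before running anything.
( cd tab_demo && make -f spaces.mk ) || echo "spaces: make stopped with an error"

# Indented with a tab: the command runs.
( cd tab_demo && make -f tabs.mk )
```

Many editors silently convert tabs to spaces, so this is worth checking in your editor's settings if you hit the error unexpectedly.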

For the demonstration to have any real effect, first remove everything in the directory besides dog.cpp, dog.hpp, main.cpp, and Makefile, and then execute make.

rm dog.o main.o dog_program
make

If you ls now, you’ll see that it’s built the target dog.o (and left the precompiled header as well). But what is make doing?

An aside about the order in which make interprets makefiles

When called, make will search the current directory for a file called Makefile or makefile (again, for your sanity and grades, please only use Makefile, titlecase, with a capitalised M). If it finds one, it will try to build the target of the first rule in the file, and if one of the dependencies of that target does not yet exist (or is out of date), it will search for a rule that creates it. So for example, if I have a makefile like so:

animal_assembly : moose goose cat
	command
moose : antlers hooves fur
	command
goose : beak wings webbed_feet interest_in_bread
	command
cat : whiskers evil_personality
	command

then make, when called with no arguments, will attempt to build the target animal_assembly. Assuming the dependencies moose, goose, and cat are already present in the directory and up to date, it will not rerun the commands for them, and will build animal_assembly from what’s present. If moose and cat are available, but goose is not, it will note that moose is present, see that goose is not present, look for a rule to build goose, find the rule, build goose, and then note that cat is present and build animal_assembly. If none of moose, goose, and cat are present, it will have to build all of them using the rules available.

But what if you put the target for moose first?

moose : antlers hooves fur
	command
animal_assembly : moose goose cat
	command
goose : beak wings webbed_feet interest_in_bread
	command
cat : whiskers evil_personality
	command

Well, then if make is called with no arguments, it will make the target moose and stop. If you wanted it to make animal_assembly, you would then have to call it like so:

make animal_assembly

So a good rule of thumb is to put the final and most important rule (for our purposes, the one that finally links the object files together into an executable) at the top.
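The behaviour described above can be tried out with a runnable version of the animal makefile. This is a simplified sketch, assuming GNU make: the leaf dependencies (antlers, beak, and so on) are dropped, every command is just a touch that creates an empty file with the target's name, and order_demo is a scratch directory invented for the example. Each printf writes one line of the Makefile, with \t producing the tab.

```shell
mkdir -p order_demo
{
    printf 'animal_assembly : moose goose cat\n'
    printf '\ttouch animal_assembly\n'
    printf 'moose :\n'
    printf '\ttouch moose\n'
    printf 'goose :\n'
    printf '\ttouch goose\n'
    printf 'cat :\n'
    printf '\ttouch cat\n'
} > order_demo/Makefile

(
    cd order_demo
    # Naming a target builds that target and nothing else.
    make goose
    [ ! -e moose ] && echo "moose has not been built yet"
    # With no argument, make builds the first target in the file. goose is
    # already present, so only moose and cat are built before animal_assembly.
    make
    ls
)
```

Watch the commands make prints during the second run: there is no touch goose among them.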

Now back to our dog example. For our dog program, what the above means is that we should put the rule for the whole program at the top. How should we write it? Well, perhaps as you’d expect at this point:

dog_program : dog.o main.o
	clang++ dog.o main.o -o dog_program

Put that at the top of your makefile, save it, and run make again.

make
ls

Now you should see the executable dog_program, which should behave as it has in all previous post-compilation incarnations.

Now let’s do one final thing—in general, you should do this when writing your own Makefiles, but it’s especially useful for instructive purposes: we’ll write a clean rule.

clean :
	rm -f dog_program *.o

Add that to the bottom of your Makefile (as long as it’s not the top, it doesn’t really matter, but in long Makefiles, you want to separate the clean targets from real compilation-relevant targets for clarity), save it, and run make again, passing clean as an argument to invoke the clean rule:

make clean
ls

What happened? We’ve deleted all of the executables and compilation byproducts that we created, to clean up the directory. But the most notable thing about this rule compared to the others we’ve seen is that it a.) lacks dependencies and b.) doesn’t perform anything compilation-related in its command. Let’s talk about those two things a bit.

The dependency list

The dependency list you write for a target exists so that make knows what other targets to ensure you have before you run the command, but if the targets are guaranteed to be present and make isn’t responsible for updating them, make technically doesn’t need to check for anything. (It does not parse the actual command you give it, so it will not know what files to look for based on that.) Try deleting the dependency list of the target dog.o, and then running:

make clean
make dog.o

Since dog.cpp and dog.hpp are present in the directory, and make never has to build them itself (as it does for dog.o), make will never have errors when running that rule. But if you deleted the dependency list for the target dog_program and ran:

make clean
make

make will output an error that the recipe for target 'dog_program' failed, because dog.o was not in the dependency list, and make therefore did not check to make sure it existed. As such, it didn’t bother to build it. As for including dependencies that make will never have to build (such as .h/.hpp and .cpp files), they matter too: make reruns a rule whenever one of the listed dependencies is newer than the target, so without them a target would not be rebuilt after you edit its source. It’s also simply good practice to document the dependencies of each target thoroughly. It’s cleaner for other people to read, and it’s a good way for you to confirm that you’re doing what you wanted to do, particularly late at night when the lines start to blur together. And now onto point B.
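One more illustration of the dependency list before moving on. The sketch below, which assumes GNU make, builds two targets from the same input file: one rule lists the input as a dependency and the other does not. The directory deps_demo and all file names in it are invented for this example.

```shell
mkdir -p deps_demo
{
    printf 'with_deps.txt : input.txt\n'
    printf '\tcp input.txt with_deps.txt\n'
    printf 'without_deps.txt :\n'
    printf '\tcp input.txt without_deps.txt\n'
} > deps_demo/Makefile

(
    cd deps_demo
    echo "first version" > input.txt
    make with_deps.txt without_deps.txt   # neither target exists, so both are built
    sleep 1                               # make sure the edit gets a later timestamp
    echo "second version" > input.txt
    make with_deps.txt without_deps.txt   # only the rule that lists input.txt reruns
    cat with_deps.txt
    cat without_deps.txt
)
```

After the second run, with_deps.txt holds the new text while without_deps.txt is stale: make had no way of knowing that it depended on input.txt.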

make will run anything you ask it to, because it’s not as smart as you think it is

This is what we were referring to earlier, when we said make was not limited to compilation-related commands. Let’s move over to a different directory, for some make-related messing about.

cd ../file_meddling/
ls

As you can see, the Makefile is currently the only thing in this directory. It’s a very small and simple one, so open it up with your favourite text editor, and try guessing what it will do. It’s not compilation—it’s something altogether much sillier. When you have your prediction, execute make:

make
ls

And now there’s a new file in the directory. The command

cat silly_file

will yield the somewhat accurate phrase “Hello, there is nothing important here”—I say somewhat because while the file and indeed the phrase itself are completely unimportant, the concept is, in fact, important. make is not a magical program that intuits the mysterious delicacies of compilation by parsing incomprehensible syntax and making anything more of it than what you yourself put there. make is simply executing the command you gave it, and it does so blindly, and without any particular personal interest in the results. Feel free to execute the following now:

make move_file
ls

Now, when make executes the rule for the target move_file, it simply renames the file silly_file to something even more ungainly. And finally:

make delete_file
ls

removes the file altogether. Usually a rule like this will be named clean, and it’s very acceptable to stick to that convention for the rest of your life. However, to illustrate that there is nothing magical about the target name clean (or indeed, any target name at all), in this Makefile, we are using the clean target to populate our directory with junk. Try it:

make clean
ls

Note that there are now five empty junk files (the directory is not cleaner), and feel free to remove them:

make really_clean

(For the future, it is recommended that this educational example not be taken too deeply to heart. Conventions exist for a reason, and that reason is usually to make everybody’s lives easier. It is always worth knowing, though, that conventions are ultimately just that—conventions.)

Another important concept is understanding the control flow. In what order would the commands have to have gone in order to create a new file and fill it with text? Cheerfully, make will tell you which command it’s running as it executes each one, but don’t take that for granted. Walk through the Makefile yourself. In fact, let’s do it together.

The first rule you hit is the rule for the target all. all is a phony target, commonly used both in the real world and in CS225, placed at the top of a Makefile, which, in its typical use case, will list all relevant targets which produce executables as dependencies. This ensures that make will compile all of the executables for which there are rules listed. In this case, we’ve just put it at the top because we can. It, of course, is not currently responsible for any executables.

When you read the rule for all, you see the dependency listed is fill_file_with_nonsense. Obviously fill_file_with_nonsense doesn’t actually exist in the directory, so we skip down to the rule for fill_file_with_nonsense. The dependency listed is create_file, which also isn’t a real file, so we skip to the rule for create_file, which tells us it has no dependencies, and to touch silly_file. touch is a standard Unix program that can create, as we have done here, an empty file.

Once that’s done, we can finish up the rule to “build” fill_file_with_nonsense, which writes the string “Hello, there is nothing important here” into the newly created file silly_file.

Then we can finish up “building” the target all, for which the command is to print the string “I have mostly created a lot of junk today!” to standard out. And so it does. Take note that, of course, it “builds” none of the targets that are not present in its direct control flow, so the unmentioned targets have to be explicitly passed as arguments to make in order for it to build them.
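The walk-through above can be replayed in a scratch directory. The Makefile written below is a reconstruction from the description only, so the real one in file_meddling may be worded differently (and it contains more rules than these three); meddling_demo is a made-up directory name, and GNU make is assumed.

```shell
mkdir -p meddling_demo
{
    printf 'all : fill_file_with_nonsense\n'
    printf '\techo "I have mostly created a lot of junk today!"\n'
    printf 'fill_file_with_nonsense : create_file\n'
    printf '\techo "Hello, there is nothing important here" > silly_file\n'
    printf 'create_file :\n'
    printf '\ttouch silly_file\n'
} > meddling_demo/Makefile

# make announces each command as it runs it: first the touch, then the echo
# into silly_file, and finally the echo that belongs to all.
( cd meddling_demo && make )
cat meddling_demo/silly_file
```

The commands run in the reverse of the order in which the rules appear in the file, because make has to satisfy each dependency before it can run the rule that asked for it.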

Just to be really clear, let’s add another rule to our Makefile. Open the Makefile in your text editor of choice, and write the rule open_file:

open_file :
	gedit another_silly_file

(If you do not have gedit installed, use another text editor.) Now run:

make open_file

and the gedit text editor will open another_silly_file. Feel free to make a little change and run make open_file again. It will open the same file. And because of our cleverly repetitive naming scheme, we can even delete it with

make delete_file

So hopefully now the basics are painfully clear. Let’s move on.

Marvelous macros

Now let’s gloss over a basic component of makefile syntax that we’ve so far neglected to mention. Makefile syntax allows for a certain kind of variable called a macro. Macros are useful in a makefile for essentially the same reason that variables are useful in a normal program: they allow you to define, in one place, parts of your makefile which will appear repeatedly, and if you later decide to change that part, well, it’s a single change, rather than the countlessly many that are possible in large makefiles. In this class, you will never actually need macros to write an effective and mostly unrepetitive makefile, but it’s not a bad habit to get into, so let’s see an example.

cd ../macro_intro/

You may notice that our Hello, world example from ages ago has returned, and now we have a makefile for it. Open up the Makefile. There’s some rather strange syntax in here, so let’s try to break it down.

First, we’ve defined a macro called CXX. Unfortunately, this is a special macro, so we’re going to ignore it briefly and jump to FLAGS. FLAGS is a macro we defined to refer to the flags we’re passing our compiler; in this case, the flag is -O, an optimisation option that turns on a series of other flags which it’s not important for you to know right now (see the clang/gcc documentation for that information). FLAGS of course isn’t restricted in value to valid flags: we could have said FLAGS = some moose have large antlers and make would have been perfectly happy with that, until the call to clang++ failed later (you can try it out; make will actually try to execute clang++ some moose have large antlers hello_world.cpp -o hello).

Now let’s talk about CXX. Not all macro names in the Makefile language are completely without meaning—there is a certain set of names which do have a default meaning. In this case, we’ve defined CXX = clang++. The CXX macro’s default value is usually g++ on Linux systems, so if we never defined the macro CXX, when we used it in the command to compile the executable, it would have probably used g++ instead. Try running make right now, and you should see the following output:

make
clang++ -O hello_world.cpp -o hello

But if you delete the line that says CXX = clang++, what happens?

make
g++ -O hello_world.cpp -o hello

Feel free to replace the line now.

When you call a macro, enclose it like so: $(MACRO). That’s simply makefile language syntax. (You may have noticed that my example macro’s name was all uppercase—as in fact, all of my macros thus far have been. This is not syntactically required, but it is conventional.)

So that explains most of what’s going on in this file, but the strange symbols $? and $@ remain, perhaps, mysteries. As you might guess, those are also macros—they’re special predefined macros in the makefile language, with the respective meanings “names of the dependencies (newer than the target)” and “name of the target”, so in this case, $? refers to hello_world.cpp (provided that you make clean before you make), and $@ refers to hello, incidentally (purposefully) the name of the executable created as well. Using shorthand like this is a good motivation to name targets after the file the rule creates (this is, of course, also conventional, and increases the readability of your Makefiles drastically). Special predefined macros aren’t important for you to know—there are others we haven’t yet mentioned—but as you go about life in CS225 and the real world, you are bound to come across them.
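To watch $(MACRO), $@, and $? expand without involving a compiler, here is a small sketch that assumes GNU make. The macro WORDS, the directory macros_demo, and the file names are all invented for this example; the rule simply records what each macro expanded to.

```shell
mkdir -p macros_demo
{
    printf 'WORDS = some moose have large antlers\n'
    printf 'report.txt : part1.txt part2.txt\n'
    printf '\techo "$(WORDS): target=$@ newer=$?" > $@\n'
} > macros_demo/Makefile

(
    cd macros_demo
    touch part1.txt part2.txt
    make               # report.txt does not exist yet, so $? lists both parts
    cat report.txt
    sleep 1            # make sure the next change gets a later timestamp
    touch part2.txt
    make               # now only part2.txt is newer than report.txt
    cat report.txt
)
```

The single quotes around each printf argument matter here: they stop the shell from expanding $@ and $? itself, so the text reaches the Makefile untouched and make does the expanding.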

Compiler and linker flags in CS225

For this class we are going to have a very standard set of flags to pass during compilation and linking. We are going to define these as macros in each assignment’s Makefile. Here is an example of what those look like (taken from lab_intro):

# This defines our compiler and linker, as we've seen before.
CXX = clang++
LD = clang++

# These are the options we pass to the compiler.
# -std=c++1y means we want to use the C++14 standard (called 1y in this version of Clang).
# -stdlib=libc++ specifies that we want to use the standard library implementation called libc++
# -c specifies making an object file, as you saw before
# -g specifies that we want to include "debugging symbols" which allows us to use a debugging program.
# -O0 specifies to do no optimizations on our code.
# -Wall, -Wextra, and -pedantic tell the compiler to look out for common problems with our code. -Werror makes it so that these warnings stop compilation.
CXXFLAGS = -std=c++1y -stdlib=libc++ -c -g -O0 -Wall -Wextra -Werror -pedantic

# These are the options we pass to the linker.
# The first two are the same as the compiler flags.
# -l<something> tells the linker to link with the pre-installed library lib<something>.
# Here we want to link with libpng (since we use it in our code), libc++abi (support code that libc++ relies on), and libpthread (threading support). Remember libc++ is the standard library implementation; it is selected by -stdlib=libc++.
LDFLAGS = -std=c++1y -stdlib=libc++ -lpng -lc++abi -lpthread

A final diversion: The makefile language is Turing complete?

Limited the uses may be for such information, but particularly thanks to its support for lambda abstractions and combinators, the makefile language is actually a complete functional programming language. Will you ever need to write a Fibonacci number generator in the makefile language? Probably not, but you certainly can.

cd ../functional_fun/
make

This will, of course, get quite slow as n gets large (the naive solution takes exponential time), so I suggest you stop the process with a well timed Ctrl-C as it begins to lag.

fin

That concludes the tutorial on compilation and Makefiles. If you have any questions, please feel free to look up the concepts yourself, or take them to the CS225 Piazza newsgroup, or ask your TAs or classmates for help.