The process of language translation
By reading this chapter first, you’ll
get the basic flavor of what it is like to program with objects in
C++, and you’ll also discover some of the reasons for the
enthusiasm surrounding this language. This should be enough to
carry you through Chapter 3, which can be a bit exhausting since it
contains most of the details of the C language.
The user-defined data type, or
class, is what
distinguishes C++ from traditional procedural languages. A class is
a new data type that you or someone else creates to solve a
particular kind of problem. Once a class is created, anyone can use
it without knowing the specifics of how it works, or even how
classes are built. This chapter treats classes as if they are just
another built-in data type available for use in programs.
Classes that someone else has created are
typically packaged into a library. This chapter uses several of the class
libraries that come with all C++ implementations. An especially
important standard library is iostreams, which (among other things)
allows you to read from files and the keyboard, and to write to
files and the display. You’ll also see the very handy
string class, and the vector container from the
Standard C++ Library. By the end of the chapter, you’ll see
how easy it is to use a pre-defined library of classes.
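As a first taste of using library classes as if they were built-in types, here is a minimal sketch. The helper names joinWords and printWords are hypothetical (they do not come from the text); only string, vector, and the iostreams output stream are library facilities.

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical helper: joins the words with single spaces,
// using only the standard string and vector classes.
std::string joinWords(const std::vector<std::string>& words) {
  std::string result;
  for (std::size_t i = 0; i < words.size(); ++i) {
    if (i > 0) result += ' ';
    result += words[i];
  }
  return result;
}

// Hypothetical helper: writes the joined words to any output stream,
// for example std::cout for the display or a file stream for a file.
void printWords(std::ostream& out, const std::vector<std::string>& words) {
  out << joinWords(words) << '\n';
}
```

A program would typically fill a vector with push_back() and then call printWords(std::cout, words) from main(). Nothing here requires knowing how string or vector are implemented.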
In order to create your first program you
must understand the tools used to build applications.
All computer languages are translated from
something that tends to be easy for a human to understand
(source code) into something that is executed on a computer
(machine instructions).
Traditionally, translators fall into two classes:
interpreters and compilers.
Interpreters
An interpreter translates source code into
activities (which may comprise groups of machine instructions) and
immediately executes those activities. BASIC, for example, has been a popular
interpreted language. Traditional BASIC interpreters translate and
execute one line at a time, and then forget that the line has been
translated. This makes them slow, since they must re-translate any
repeated code. BASIC has also been compiled, for speed. More modern
interpreters, such as those for the Python language, translate the entire program
into an intermediate language that is then executed by a much
faster interpreter[25].
Interpreters have many advantages. The
transition from writing code to executing code is almost immediate,
and the source code is always available so the interpreter can be
much more specific when an error occurs. The benefits often cited
for interpreters are ease of interaction and rapid development (but
not necessarily execution) of programs.
Interpreted languages often have severe
limitations when building large projects (Python seems to be an
exception to this). The interpreter (or a reduced version) must
always be in memory to execute the code, and even the fastest
interpreter may introduce unacceptable speed restrictions. Most
interpreters require that the complete source code be brought into
the interpreter all at once. Not only does this introduce a space
limitation, it can also cause more difficult bugs if the language
doesn’t provide facilities to localize the effect of
different pieces of code.
Compilers
A compiler translates source code directly
into assembly language or machine instructions. The eventual end
product is a file or files containing machine code. This is an
involved process, and usually takes several steps. The transition
from writing code to executing code is significantly longer with a
compiler.
Programs generated by a compiler tend to require much less space to
run, and to run much more quickly, than interpreted programs, although
how much depends on the acumen of the compiler writer. Although size and
speed are probably the most often cited reasons for using a
compiler, in many situations they aren’t the most important
reasons. Some languages (such as C) are designed to allow pieces of
a program to be compiled independently. These pieces are eventually
combined into a final executable program by a tool called
the linker. This
process is called separate compilation.
Separate compilation has many benefits. A
program that, taken all at once, would exceed the limits of the
compiler or the compiling environment can be compiled in pieces.
Programs can be built and tested one piece at a time. Once a piece
is working, it can be saved and treated as a building block.
Collections of tested and working pieces can be combined into libraries for use by other
programmers. As each piece is created, the complexity of the other
pieces is hidden. All these features support the creation of large
programs[26].
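Separate compilation can be sketched in a single listing by marking where the file boundaries would fall. The file names in the comments (area.h, report.cpp, area.cpp) and the function names are assumptions made for illustration only. In a real project each part would be its own file, compiled on its own and combined by the linker.

```cpp
// --- would live in a header, e.g. area.h (hypothetical name) ---
// Declaration: tells the compiler that the function exists and
// which types it takes and returns.
double rectangleArea(double width, double height);

// --- would live in one source file, e.g. report.cpp (hypothetical) ---
// This code can be compiled with only the declaration in view.
double roomArea() {
  return rectangleArea(4.0, 2.5);
}

// --- would live in another source file, e.g. area.cpp (hypothetical) ---
// Definition: compiled separately. The linker connects the call in
// roomArea() to this code.
double rectangleArea(double width, double height) {
  return width * height;
}
```

The author of roomArea() needs only the declaration. The body of rectangleArea() can be written, tested, and even replaced without recompiling the code that calls it, provided the declaration stays the same.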
Compiler debugging features have improved significantly over
time. Early compilers only generated machine code, and the
programmer inserted print statements to see what was going on. This
is not always effective. Modern compilers can insert information
about the source code into the executable program. This information
is used by powerful source-level debuggers to show exactly
what is happening in a program by tracing its progress through the
source code.
Some compilers tackle the compilation-speed
problem by performing in-memory compilation. Most compilers
work with files, reading and writing them in each step of the
compilation process. In-memory compilers keep the compiler program
in RAM. For small programs, this can seem as responsive as an
interpreter.
The compilation process
To program in C and C++, you need to
understand the steps and tools in the compilation process. Some
languages (C and C++, in particular) start compilation by running a
preprocessor on the
source code. The preprocessor is a simple program that replaces
patterns in the source code with other patterns the programmer has
defined (using preprocessor directives). Preprocessor directives are used to save
typing and to increase the readability of the code. (Later in the
book, you’ll learn how the design of C++ is meant to
discourage much of the use of the preprocessor, since it can cause
subtle bugs.) The pre-processed code is often written to an
intermediate file.
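One kind of subtle bug that pattern replacement can cause is shown in the sketch below. The macro and function names are hypothetical, and the macro is deliberately written without parentheses to expose the problem.

```cpp
// The preprocessor replaces text; it knows nothing about expressions.
// Deliberately careless macro (hypothetical): no parentheses.
#define SQUARE_MACRO(x) x * x

// An ordinary inline function: the argument is evaluated first,
// then multiplied by itself.
inline int square(int x) { return x * x; }

int macroResult() {
  // Expands to: 1 + 2 * 1 + 2, which is 5 rather than 9.
  return SQUARE_MACRO(1 + 2);
}

int functionResult() {
  // The argument is 3, so the result is 9.
  return square(1 + 2);
}
```

The function version behaves as a reader would expect because the compiler, not a textual substitution, handles the argument.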
Compilers usually do their work in two
passes. The first pass parses the pre-processed code. The compiler
breaks the source code into small units and organizes it into a
structure called a tree. In the expression “A + B” the elements
‘A’, ‘+’, and ‘B’ are leaves on the parse tree.
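To make the idea of a tree concrete, here is a toy sketch that builds the tree for “A + B” by hand and then walks it. The Node type and the function names are hypothetical. A real compiler uses far richer structures, and many place the operator in an interior node rather than a leaf.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical node type: a label plus zero or more children.
struct Node {
  std::string label;
  std::vector<Node> children;
};

// Creates a node with no children (a leaf).
Node makeLeaf(const std::string& text) {
  Node n;
  n.label = text;
  return n;
}

// Builds the tree for "A + B" by hand: one "expression" node
// whose children are the three leaves A, +, and B.
Node parseAPlusB() {
  Node root;
  root.label = "expression";
  root.children.push_back(makeLeaf("A"));
  root.children.push_back(makeLeaf("+"));
  root.children.push_back(makeLeaf("B"));
  return root;
}

// Walks the tree depth-first and collects leaf labels left to right,
// much as a later pass would visit the nodes.
void collectLeaves(const Node& node, std::vector<std::string>& out) {
  if (node.children.empty()) {
    out.push_back(node.label);
    return;
  }
  for (std::size_t i = 0; i < node.children.size(); ++i) {
    collectLeaves(node.children[i], out);
  }
}
```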
A global optimizer is sometimes used between the first and
second passes to produce smaller, faster code.
In the second pass, the code generator walks through the parse tree
and generates either assembly language code or machine code for the
nodes of the tree. If the code generator creates assembly code, the
assembler must then be run. The end result in both cases is an
object module (a file that
typically has an extension of .o or .obj). A
peephole optimizer is
sometimes used in the second pass to look for pieces of code
containing redundant assembly-language statements.
The use of the word “object” to describe chunks of machine
code is an unfortunate artifact. The word came into use before
object-oriented programming was in general use.
“Object” is used in the same sense as
“goal” when discussing compilation, while in
object-oriented programming it means “a thing with
boundaries.”
The linker combines a list of object modules into an
executable program that can be loaded and run by the operating
system. When a function in one object module makes a reference to a
function or variable in another object module, the linker resolves
these references; it makes sure that all the external functions and
data you claimed existed during compilation do exist. The linker also adds a special
object module to perform start-up activities.
The linker can search through special files
called libraries in order to resolve all its references. A
library contains a collection
of object modules in a single file. A library is created and
maintained by a program called a librarian.
Static type checking
The compiler performs type checking during the first pass. Type
checking tests for the proper use of arguments in functions and
prevents many kinds of programming errors. Since type checking
occurs during compilation instead of when the program is running,
it is called static type checking.
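A small sketch of what the check looks like in practice follows. The function names are hypothetical, and the rejected call is left in a comment so that the listing still compiles.

```cpp
#include <string>

// Hypothetical function that accepts only an int argument.
int doubled(int value) { return value * 2; }

int staticCheckDemo() {
  int ok = doubled(21);  // accepted: the argument is an int

  // If the next line were uncommented, the compiler would reject it,
  // because a std::string cannot be converted to an int. No program
  // is produced, so the mistake never reaches a running system.
  // int bad = doubled(std::string("21"));

  return ok;
}
```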
Some object-oriented languages (notably
Java) perform some type
checking at runtime (dynamic type checking). If combined
with static type checking, dynamic type checking is more powerful than
static type checking alone. However, it also adds overhead to
program execution.
C++ uses static type checking because the
language cannot assume any particular runtime support for bad
operations. Static type checking notifies the programmer about
misuses of types during compilation, and thus maximizes execution
speed. As you learn C++, you will see that most of the language
design decisions favor the same kind of high-speed,
production-oriented programming the C language is famous
for.
You can disable static type checking in
C++. You can also do your own dynamic type checking – you
just need to write the code.
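Both ideas can be sketched together. In the listing below, a void* pointer erases the static type, which is one way to step around the compiler's checking, and a hand-written tag restores a check at runtime. The Tag, Tagged, and tryReadInt names are hypothetical. This is one possible approach under those assumptions, not a technique prescribed by the language.

```cpp
// Hypothetical tag recording what a void* really points to.
enum Tag { TagInt, TagDouble };

// A void* carries no type information, so the compiler cannot
// check how the data is used. The tag is maintained by hand.
struct Tagged {
  Tag tag;
  void* data;
};

// Hand-written dynamic type check: the cast happens only when
// the tag says the data is an int. Returns false otherwise and
// leaves 'out' unchanged.
bool tryReadInt(const Tagged& t, int& out) {
  if (t.tag != TagInt) return false;
  out = *static_cast<int*>(t.data);
  return true;
}
```

The check costs a comparison every time it runs, which is the kind of execution overhead that static type checking avoids.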