Your Ad Here



























                     LET'S BUILD A COMPILER!

                                By

                     Jack W. Crenshaw, Ph.D.

                          16 April 1989


                       Part IX: A TOP VIEW


*****************************************************************
*                                                               *
*                        COPYRIGHT NOTICE                       *
*                                                               *
*   Copyright (C) 1989 Jack W. Crenshaw. All rights reserved.   *
*                                                               *
*****************************************************************


INTRODUCTION

In  the  previous  installments,  we  have  learned  many of  the
techniques required to  build  a full-blown compiler.  We've done
both  assignment   statements   (with   Boolean   and  arithmetic
expressions),  relational operators, and control constructs.   We
still haven't  addressed procedure or function calls, but even so
we  could  conceivably construct a  mini-language  without  them.
I've  always  thought  it would be fun to see just  how  small  a
language  one  could  build  that  would still be useful.   We're
ALMOST in a position to do that now.  The  problem  is: though we
know  how  to  parse and translate the constructs, we still don't
know quite how to put them all together into a language.

In those earlier installments, the  development  of  our programs
had  a decidedly bottom-up flavor.  In  the  case  of  expression
parsing,  for  example,  we  began  with  the  very lowest  level
constructs, the individual constants  and  variables,  and worked
our way up to more complex expressions.

Most people regard  the  top-down design approach as being better
than  the  bottom-up  one.  I do too,  but  the  way  we  did  it
certainly seemed natural enough for the kinds of  things  we were
parsing.

You mustn't get  the  idea, though, that the incremental approach
that  we've  been  using  in  all these tutorials  is  inherently
bottom-up.  In  this  installment  I'd  like to show you that the
approach can work just as well when applied from the top down ...
maybe better.  We'll consider languages such as C and Pascal, and
see how complete compilers can be built starting from the top.

In the next installment, we'll  apply the same technique to build
a  complete  translator  for a subset of the KISS language, which
I'll be  calling  TINY.    But one of my goals for this series is
that you will  not only be able to see how a compiler for TINY or
KISS  works,  but  that you will also be able to design and build
compilers for your own languages.  The C and Pascal examples will
help.    One  thing I'd like you  to  see  is  that  the  natural
structure of the compiler depends very much on the language being
translated, so the simplicity and  ease  of  construction  of the
compiler  depends  very  much  on  letting the language  set  the
program structure.
                              
It's  a bit much to produce a full C or Pascal compiler here, and
we won't try.   But we can flesh out the top levels far enough so
that you can see how it goes.

Let's get started.


THE TOP LEVEL

One of the biggest  mistakes  people make in a top-down design is
failing  to start at the true top.  They think they know what the
overall structure of the  design  should be, so they go ahead and
write it down.

Whenever  I  start a new design, I always like to do  it  at  the
absolute beginning.   In  program design language (PDL), this top
level looks something like:


     begin
        solve the problem
     end


OK, I grant  you that this doesn't give much of a hint as to what
the next level is, but I  like  to  write it down anyway, just to
give me that warm feeling that I am indeed starting at the top.

For our problem, the overall function of a compiler is to compile
a complete program.  Any definition of the  language,  written in
BNF,  begins here.  What does the top level BNF look like?  Well,
that depends quite a bit on the language to be translated.  Let's
take a look at Pascal.


THE STRUCTURE OF PASCAL

Most  texts  for  Pascal  include  a   BNF   or  "railroad-track"
definition of the language.  Here are the first few lines of one:


      ::=   '.'

      ::= PROGRAM 

      ::=  


We can write recognizers  to  deal  with  each of these elements,
just as we've done before.  For each one, we'll use  our familiar
single-character tokens to represent the input, then flesh things
out a little at a time.    Let's begin with the first recognizer:
the program itself.
                              
To translate this, we'll  start  with a fresh copy of the Cradle.
Since we're back to single-character  names, we'll just use a 'p'
to stand for 'PROGRAM.'

To a fresh copy of the cradle, add the following code, and insert
a call to it from the main program:


{--------------------------------------------------------------}
{ Parse and Translate A Program }

procedure Prog;
var  Name: char;
begin
   Match('p');            { Handles program header part }
   Name := GetName;
   Prolog(Name);
   Match('.');
   Epilog(Name);
end;
{--------------------------------------------------------------}


The procedures  Prolog and Epilog perform whatever is required to
let the program interface with the operating system,  so  that it
can execute as a program.  Needless to  say,  this  part  will be
VERY OS-dependent.  Remember, I've been emitting code for a 68000
running under the OS I use, which is SK*DOS.   I  realize most of
you are using PC's  and  would rather see something else, but I'm
in this thing too deep to change now!

Anyhow, SK*DOS is a  particularly  easy OS to interface to.  Here
is the code for Prolog and Epilog:


{--------------------------------------------------------------}
{ Write the Prolog }

procedure Prolog;
begin
   EmitLn('WARMST EQU $A01E');
end;


{--------------------------------------------------------------}
{ Write the Epilog }

procedure Epilog(Name: char);
begin
   EmitLn('DC WARMST');
   EmitLn('END ' + Name);
end;
{--------------------------------------------------------------}
                              
As usual, add  this  code  and  try  out the "compiler."  At this
point, there is only one legal input:


     px.   (where x is any single letter, the program name)


Well,  as  usual  our first effort is rather unimpressive, but by
now  I'm sure you know that things  will  get  more  interesting.
There is one important thing to  note:   THE OUTPUT IS A WORKING,
COMPLETE, AND EXECUTABLE PROGRAM (at least after it's assembled).

This  is  very  important.  The  nice  feature  of  the  top-down
approach is that at any stage you can  compile  a  subset  of the
complete language and get  a  program that will run on the target
machine.    From here on, then, we  need  only  add  features  by
fleshing out the language constructs.  It's all  very  similar to
what we've been doing all along, except that we're approaching it
from the other end.


FLESHING IT OUT

To flesh out  the  compiler,  we  only have to deal with language
features  one by one.  I like to start with a stub procedure that
does  nothing, then add detail in  incremental  fashion.    Let's
begin  by  processing  a block, in accordance with its PDL above.
We can do this in two stages.  First, add the null procedure:


{--------------------------------------------------------------}
{ Parse and Translate a Pascal Block }

procedure DoBlock(Name: char);
begin
end;
{--------------------------------------------------------------}


and modify Prog to read:


{--------------------------------------------------------------}
{ Parse and Translate A Program }

procedure Prog;
var  Name: char;
begin
   Match('p');
   Name := GetName;
   Prolog;
   DoBlock(Name);
   Match('.');
   Epilog(Name);
end;
{--------------------------------------------------------------}


That certainly  shouldn't change the behavior of the program, and
it doesn't.  But now the  definition  of Prog is complete, and we
can proceed to flesh out DoBlock.  That's done right from its BNF
definition:


{--------------------------------------------------------------}
{ Parse and Translate a Pascal Block }

procedure DoBlock(Name: char);
begin
   Declarations;
   PostLabel(Name);
   Statements;
end;
{--------------------------------------------------------------}


The  procedure  PostLabel  was  defined  in  the  installment  on
branches.  Copy it into your cradle.

I probably need to  explain  the  reason  for inserting the label
where I have.  It has to do with the operation of SK*DOS.  Unlike
some OS's,  SK*DOS allows the entry point to the main  program to
be  anywhere in the program.  All you have to do is to give  that
point a name.  The call  to  PostLabel puts that name just before
the first executable statement  in  the  main  program.  How does
SK*DOS know which of the many labels is the entry point, you ask?
It's the one that matches the END statement  at  the  end  of the
program.

OK,  now  we  need  stubs  for  the  procedures Declarations  and
Statements.  Make them null procedures as we did before.

Does the program  still run the same?  Then we can move on to the
next stage.


DECLARATIONS

The BNF for Pascal declarations is:


      ::= (