Cobcy for Hackers
Introduction
The purpose of this document is to explain the internal design of Cobcy and
get you started as a Cobcy hacker. Here I will briefly try to explain some of
the functions in the source code (since this is a pretty big task, chances
are you are still reading an incomplete document).
Some global portability conventions: I use the
#ifndef __MSDOS__
...
#endif
directive to help those who want to compile under DOS. I have not heard
any success stories yet, so these guards are mostly guesswork.
Another macro I use is CAN_HAVE_STDIO, an unfortunate name
perhaps, inserted because some compilers, like some of the older versions
of gcc, don't like to mix iostreams and stdio. Current gcc does not have
this problem. I also use the NDEBUG macro to isolate debug messages.
I chose that because it is compatible with assert. An example
would be:
#ifndef NDEBUG
cout << "\tIn GenMove\n";
#endif
All code should be written using the codef and declf streams, with the
former being the output .c file and the latter the .h file.
These streams are opened and closed in seminit.cc.
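A minimal sketch of these conventions, assuming a hypothetical helper named EmitCounterDecl. In Cobcy itself codef and declf are global streams; here they are passed in as parameters so the fragment can be tried with string streams.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper, not part of Cobcy: writes a declaration to the
// header stream and the matching definition to the code stream.
// declf stands for the .h output, codef for the .c output.
void EmitCounterDecl (std::ostream& declf, std::ostream& codef,
                      const std::string& name)
{
    assert (!name.empty());
#ifndef NDEBUG
    // Debug trace; disappears when compiled with -DNDEBUG, like assert.
    std::cerr << "\tIn EmitCounterDecl\n";
#endif
    declf << "extern long " << name << ";\n";
    codef << "long " << name << " = 0;\n";
}
```

Calling it with two std::ostringstream objects shows which text lands in which output file.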
Modules
Contents
Cobcy code is roughly divided into three parts - semantic actions,
code generating classes, and glue. Semantic actions are stored in files
starting with sem, code generating classes are in files starting
with sym, and everything else is glue routines. This is of course
with the exception of .l and .y files, which constitute the parser.
Semantic actions deal with the stack and all parser-related stuff. Code
generating classes do just that - generate most of the actual code.
main.cc
This file contains routines to process command-line arguments, set
compiler configuration, and anything else that does not have anything
to do with code generation. CobcyConfig, the global
configuration structure, is defined and set up here. main also
opens the source file and starts the parser. When the parser is
finished, main checks for errors and removes the output files if any are found.
semarith.cc
This defines arithmetic-related and movement semantic actions.
Since many functions require rounding, a global variable
RoundResult is defined here with type BOOL.
Most of these simply involve plenty of tedious setup with not much
interesting stuff going on. Basically all of these generate some
sort of an assignment statement or a string assignment function with
possible conversion (all from cobfunc.c).
Present functions:
- void GenMove (void)
- Record moves are not implemented.
- void GenAdd (void)
- void GenSubtract (void)
- void GenMultiply (void)
- void GenDivide (void)
- Divide does generate a zero check as an if statement,
but the check stands alone unless ON SIZE ERROR appends an
else clause to handle the error.
- void GenCompute (void)
- Same thing as all the others.
- void SetResultRounding (void)
- Sets the RoundResult variable to TRUE. All of the above
functions will reset it to FALSE.
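The zero check described for GenDivide could look roughly like the sketch below. EmitDivide is a hypothetical name, and the operands are plain strings here, whereas the real action takes them from the parser stack; the exact text Cobcy emits may differ.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical sketch of the shape of code a divide action might emit.
// onSizeError is the C statement for an ON SIZE ERROR clause, or empty
// if the COBOL source did not supply one.
void EmitDivide (std::ostream& codef, const std::string& dest,
                 const std::string& num, const std::string& den,
                 const std::string& onSizeError = "")
{
    assert (!dest.empty() && !num.empty() && !den.empty());
    // Guard the division so the generated program never divides by zero.
    codef << "if (" << den << " != 0)\n";
    codef << "    " << dest << " = " << num << " / " << den << ";\n";
    // Without ON SIZE ERROR the guard stands alone: a zero divisor
    // simply skips the assignment.
    if (!onSizeError.empty()) {
        codef << "else\n";
        codef << "    " << onSizeError << "\n";
    }
}
```

With no handler the output is a bare if; passing a handler such as "q = 0;" adds the else branch.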
semconio.cc
This file contains user interaction routines, which at present include
only displaying and accepting text.
Functions:
- void GenAccept (void)
- Processes the ACCEPT clause. Expects a marked list of identifiers on
the stack, processing them one at a time, generating code to
assign the appropriate value to each. ACCEPT sources other than
console generate unimplemented function stubs, so the error
will only be reported by gcc.
- void GenDisplay (void)
- Expects a marked list of identifiers on
the stack, processing them one at a time, generating code to
print each one using a separate fprintf statement. At the end
an fprintf printing a newline is generated.
- void SetDisplayOutput (void)
- Expects an identifier on the stack.
Sets the file stream to DISPLAY to. All Cobcy files are of type FILE,
so GenDisplay can just pass the needed one to fprintf.
- void SetAcceptSource (AcceptSourceType NewSrc)
- This one is a little more tricky, with source as an enum. GenAccept
will parse it to determine where the data should come from.
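As an illustration of the output GenDisplay is described as producing, here is a small sketch. EmitDisplay is a hypothetical name, and it assumes every displayed item is a C string printed with "%s"; the real action would presumably pick the format from each symbol's type.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch: one fprintf per displayed item, then a final
// fprintf that prints the newline. stream is the C name of the FILE
// to display to, as chosen by something like SetDisplayOutput.
void EmitDisplay (std::ostream& codef, const std::string& stream,
                  const std::vector<std::string>& ids)
{
    assert (!stream.empty());
    for (const std::string& id : ids)
        codef << "fprintf (" << stream << ", \"%s\", " << id << ");\n";
    codef << "fprintf (" << stream << ", \"\\n\");\n";
}
```

Because the target stream is just a name passed through to fprintf, displaying to a file needs no special case.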
semcontrol.cc
This file contains code generation of control structures. Paragraphs are
implemented as separate functions with a same-name label at the beginning.
The label is there to allow faster loops. See GenGoto for more details.
There is an arbitrary limit on the number of paragraphs to allow STOP RUN
to work. The limit is set to 32000 at present, which is a lot of paragraphs.
The compiler will not warn you about it, but if you have
50000 paragraphs and STOP RUN in paragraph 3, you'll go to paragraph 32000.
I'll fix that if anyone needs that many.
Global variables are ParagraphList, which is a queue of all
paragraphs for generation of the main control switch in generated C main().
See seminit.cc for more detail. CurPar
points to the current paragraph object. CurLoopVar is the
name of the last declared loop variable, needed because there are several
loop generation routines which need to use this. And the last one is
LoopNesting, an integer denoting current nesting level, used
mostly for indentation.
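The paragraph mechanism can be pictured with the runnable sketch below. It is only an approximation of the generated program: the paragraph names, the PAR_ constants, and the trace string are hypothetical, and the real generated code is C, not C++.

```cpp
#include <cassert>
#include <string>

// Hypothetical paragraph sequence constants, of the kind that would be
// written to the header file.
enum { PAR_INIT = 0, PAR_COUNT = 1, PAR_DONE = 2, PAR_END = 3 };

static int counter = 0;
static std::string trace;       // records the order paragraphs ran in

// Each paragraph is a function returning the index of the next one.
static int ParInit (void) { trace += "I"; counter = 0; return PAR_COUNT; }

static int ParCount (void)
{
ParCount:                       // same-name label at the top of the paragraph
    trace += "C";
    if (++counter < 3)
        goto ParCount;          // GO TO the current paragraph: no recursion
    return PAR_DONE;            // GO TO elsewhere: hand control to the switch
}

static int ParDone (void) { trace += "D"; return PAR_END; }

// Stand-in for the switch in the generated main(), with _cpi as the
// track variable naming the paragraph to call next.
int RunParagraphs (void)
{
    int _cpi = PAR_INIT;
    while (_cpi != PAR_END) {
        assert (_cpi >= PAR_INIT && _cpi < PAR_END);
        switch (_cpi) {
            case PAR_INIT:  _cpi = ParInit();  break;
            case PAR_COUNT: _cpi = ParCount(); break;
            case PAR_DONE:  _cpi = ParDone();  break;
        }
    }
    return counter;
}
```

The goto inside ParCount is what keeps a self-referencing loop from growing the call stack, while every other jump goes through the return value and the switch.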
- void GenParagraph (void)
- Expects an identifier on the stack.
Closes the current paragraph, a C function, and starts a new one.
Calls GenEndProc to close the current paragraph.
- void GenGoto (void)
- Expects an identifier on the stack. If the destination is the current
paragraph, use a C goto statement. This comes in handy with loops,
because it thus avoids recursion. Otherwise a return
statement is generated to give the switch in main the location
of the paragraph to go to. This mechanism works with bad code :)
- void GenPerform (void)
- This just generates a function call.
Expects an integer on top, number of times to perform the function, and
an identifier right under it. Will build a for loop
around the call to perform many times.
- void GenEndProc (void)
- Generates function closing, i.e. a return statement and a closing brace.
- void BeginCompound (void)
- Inserts a brace and increments indent level.
- void EndCompound (void)
- Inserts a brace and decrements indent level.
- void GenStartProc (void)
- Calls GenParagraph.
- void GenStartIf (void)
- Prints "if ("
- void GenStartElsif (void)
- Prints "else if ("
- void GenEndIf (void)
- Prints ")\n"
- void GenElse (void)
- Prints "else ("
- void GenBool (void)
- Expects two identifiers if first is a unary, like "ALPHABETIC",
expects three otherwise. Generates a C conditional in parentheses.
- void GenConnect (void)
- Expects a connector on the stack and prints it directly.
(connector == &&, ||)
- void GenStopRun (void)
- Generates an exit statement.
- void GenParagraphCalls (void)
- Writes paragraph sequence constants to the header stream and generates
the big switch statement in main, with _cpi being the
track variable, set to the sequence constant of the paragraph to
call. The switch is basically a mapping of the former to the latter.
- void GenStartLoop (void)
- Prints "for (" and increments indentation and loop nesting.
- void GenLoopInit (void)
- Generates the statement inside the for
to init the loop variable. Expects the value to be on top of the
stack and the variable identifier under it.
Sets CurLoopVar.
- void GenLoopCondition (void)
- Generates the loop condition. This is not a boolean, it is just
" <= " always because of the way COBOL works. Expects the end
value identifier or a constant on the stack.
- void GenLoopIncrement (void)
- Expects the increment value on the stack. If not specified by the
source, the parser sets it to 1.
- void GenEmptyClause (void)
- Generates a ";", C empty statement.
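The four loop actions above compose a C for header piece by piece. The sketch below shows one way that could fit together; the Emit names are hypothetical, operands are passed as strings instead of coming from the stack, and the "+=" form of the increment is an assumption.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Mirrors the CurLoopVar global described above: the init step stores
// the variable name so the later steps can reuse it.
static std::string CurLoopVar;

void EmitStartLoop (std::ostream& codef) { codef << "for ("; }

void EmitLoopInit (std::ostream& codef, const std::string& var,
                   const std::string& start)
{
    assert (!var.empty());
    CurLoopVar = var;
    codef << var << " = " << start << "; ";
}

// The comparison is always "<=", never a general boolean expression.
void EmitLoopCondition (std::ostream& codef, const std::string& end)
{
    codef << CurLoopVar << " <= " << end << "; ";
}

// step is "1" when the COBOL source gives no increment.
void EmitLoopIncrement (std::ostream& codef, const std::string& step)
{
    codef << CurLoopVar << " += " << step << ")\n";
}
```

Called in order with "i", "1", "n", and "1", these produce a single line reading for (i = 1; i <= n; i += 1).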
semdecl.cc
This file contains functions related to declarations in the data division.
Since parsing is sequential and a record is declared before its entries,
several global variables are used to keep track of parenting properties.
NestedRecordList is a stack of CobolRecords, the
top of which denotes the record that is the parent of the current
ParentRecord. I have no idea why I didn't just use Top for
this, but I am afraid to change it. After all, it works :)
A record is pushed onto this stack when it is declared and popped when a
record with a smaller or equal level number is declared. Yes, level numbers
are what determines nesting. The numbers themselves do not mean a thing
currently, unlike in real COBOL.
ParentRecord denotes the parent of the current scope.
SFQueue is borrowed externally from
semfile.cc to declare special names, which are treated
essentially as any other file.
VarInit is a queue holding the variable objects that need
to be initialized.
- void DeclareRecordLevel (void)
- Declares one variable/record declaration. In COBOL these are usually
on separate lines, so that's consistent with 'level'.
Expects a 4-entry list of stuff on the stack, a detail taken care of by
the parser. The difference between a record and a variable is
decided by the presence of a picture field.
Adoption of variables is another fine point. Each child needs to know
who its parent is to set itself up. But if the child is a record,
it is not finished setting up until it is closed, so the parent is
not told to adopt until that moment. Variables can be adopted
immediately upon declaration.
- void CloseScopeLevels (WORD LastLevel)
- This is the function that takes care of closing records. It checks whether
LastLevel will close any records, does that if so, and
has the parent adopt each closed record.
- void InitializeVariables (void)
- Creates initialization code for variables in VarInit.
- void DeclareSpecialName (void)
- Declares a special name. See above general notes for this file for
discussion.
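The push and pop rule for NestedRecordList can be sketched as follows. This is a simplified stand-in, not the real code: Record replaces CobolRecord, the Closed vector is hypothetical, and adoption is reduced to recording the child's name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for CobolRecord.
struct Record {
    std::string name;
    unsigned level;
    std::vector<std::string> children;   // names of adopted children
};

static std::vector<Record> NestedRecordList;   // used as a stack
static std::vector<Record> Closed;             // finished top-level records

// Closes every open record whose level number is >= lastLevel. A record
// is adopted by the one beneath it only now, when it is fully set up.
void CloseScopeLevels (unsigned lastLevel)
{
    while (!NestedRecordList.empty() &&
           NestedRecordList.back().level >= lastLevel) {
        Record done = NestedRecordList.back();
        NestedRecordList.pop_back();
        if (NestedRecordList.empty())
            Closed.push_back (done);
        else
            NestedRecordList.back().children.push_back (done.name);
    }
}

// Declaring a record first closes whatever the new level ends, then
// pushes the new record as the innermost open scope.
void DeclareRecord (const std::string& name, unsigned level)
{
    assert (level > 0);
    CloseScopeLevels (level);
    NestedRecordList.push_back (Record{name, level, {}});
}
```

Declaring levels 1, 5, 5 in that order leaves the second level-5 record open under the level-1 record, with the first level-5 record already adopted.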
semfile.cc
This file contains file-related actions. Since COBOL has extensive file
handling capabilities, this is a pretty big module. This also includes
record association functions. Any particular implementation details are
in the CobolFile class, to which most of the functions will
delegate. The record declaration is reused from
semdecl.cc, since they are pretty much identical.
The function that binds the record to the file descriptor is called by
the parser after the first record level is declared.
- void BeginFileDecl (void)
- This function is for the SELECT statement.
Expects an identifier on top of the stack designating the COBOL name
to call the file. If the name is the reserved identifier
printer, this file is declared to output to a temporary
file, a print spool, which is flushed and printed when the file is closed.
(The exact sequence is flush, close, and print.)
The FileToSelect variable is set here for later association
with its other properties. This is the function that enters the
file descriptor into the symbol table, so every fd must be SELECTed
before anything else.
- void EndFileDecl (void)
- This function is for the SELECT statement.
FileToSelect is set to NULL. Perhaps error checking could
be added here for missing properties?
- void GenFileDesc (void)
- This is a semantic action for the FD statement in the file section of
the data division.
Expects a file name on top and under it an identifier
describing a previously SELECTed file.
Appends the file record to the FDInit queue to be
associated with its parent record. This is done because although
the record can be told what the parent fd is (it isn't), the fd cannot
associate with the record at that point (after the first statement
of record declaration), so this queue is created to associate all
fds with their actual records, not just names of the records.
- void AssociateFileRecord (void)
- This copies the name of the associated record to the file record.
See GenFileDesc for explanation.
- void GenOpen (OpenModeType mode)
- Generates a statement to open the file. Expects a list of fds on the stack.
The mode is the same for all.
- void GenClose (void)
- Generates a statement to close the file. Expects a list of fds on the stack.
- void GenRead (void)
- Expects a record or a fd identifier on the stack, seeks, calls
GenReadData, GenReadEnd, and GenEOFCheck in CobolFile.
- void GenWrite (void)
- Expects a source record, and a record or a fd identifier on the stack,
sets up for appending, calls
GenWriteData and GenWriteEnd in CobolFile.
- void GenRewrite (void)
- The only difference between this and GenWrite is seeking instead of
setting up for append.
- void AssociateRecordsWithFD (void)
- This loops through FDInit queue, calling the function to
bond records and file descriptors.
- void SetFileStatus (void)
- Expects the status on stack and passes it to CobolFile.
- void SetAccessMode (AccessModeType mode)
- void SetOrganization (OrganizationType org)
- These just delegate.
- void SetRelativeKey (void)
- void SetRecordKey (void)
- Expects identifier on stack and delegates. There is no distinction
between record and relative keys since each file can only have
one at a time.
- void OpenSpecialFiles (void)
- void CloseSpecialFiles (void)
- These look through SFQueue for files that are opened at
startup and closed at shutdown. A printer is such a file for instance.
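The deferred association done through the FDInit queue might be sketched like this. FileDesc, Records, and QueueFileDesc are hypothetical stand-ins for CobolFile, the symbol table, and the queueing step in GenFileDesc; only the queue-then-bind idea is taken from the description above.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <string>

// Hypothetical stand-in for CobolFile.
struct FileDesc {
    std::string recordName;               // known when the FD is parsed
    const std::string* record = nullptr;  // bound later, once it exists
};

static std::map<std::string, std::string> Records;  // record name -> C name
static std::queue<FileDesc*> FDInit;                // fds awaiting association

// At FD time only the name of the record is known, so queue the fd.
void QueueFileDesc (FileDesc& fd, const std::string& recordName)
{
    assert (!recordName.empty());
    fd.recordName = recordName;
    FDInit.push (&fd);
}

// After all declarations, bind each queued fd to its actual record.
// Returns the number of fds whose record was found.
int AssociateRecordsWithFD (void)
{
    int bound = 0;
    while (!FDInit.empty()) {
        FileDesc* fd = FDInit.front();
        FDInit.pop();
        auto it = Records.find (fd->recordName);
        if (it != Records.end()) {
            fd->record = &it->second;
            ++bound;
        }
    }
    return bound;
}
```

The fd stays unbound until the record has been entered, which is the ordering problem the queue exists to solve.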
seminfo.cc
This file contains informational functions, which just generate comments,
taken from identification division. Available functions:
- void SetProgramName (void)
- void SetSourceComputer (void)
- void SetObjectComputer (void)
seminit.cc
This file contains:
- void InitializeVariables (void)
- Creates a separate function to assign all initial values specified
upon declaration. This is a little more convenient for initializing
long variables than an inline initialization.
- void FinishDecl (void)
- Closes all scopes and associates file descriptors.
- void StartCode (void)
- Generates the header for the first paragraph.
- void EndCode (void)
- This generates main() and all related things in it.
- void StartProgram (void)
- Opens output files and writes initial headers there, like the include files,
standard variables, etc.
- void EndProgram (void)
- Closes output files.
semutil.cc
This file contains accessory routines to make coding a bit easier and
perhaps a bit more portable.
- void WriteError (char * str)
- Writes a compiler error message. Sets the error flag, so that
the glue routines can delete the output files so as not to confuse
the user as to whether compilation was successful or not.
- void WriteWarning (char * str)
- Writes a compiler warning message.
- void NIY (char * str)
- Call this function from every stub you make. It just generates a
comment in the output code and prints a compiler warning.
- void GenComment (char * str)
- Generates a comment in the output code.
- BOOL ErrorOccured (void)
- This function returns TRUE if WriteError was ever called.
- void PrintConstant (StackEntry * entry, ostream& os)
- Prints constant from the given stack entry on os.
- CobolSymbol * LookupIdentifier (char * id)
- Returns the CobolSymbol associated with the given
identifier, generating a compiler error if not found.
- void PrintIdentifier (char * id, ostream& os)
- A shortcut to print the C name of id on os.
- void GenIndent (void)
- Indents to the current indentation level.
- WORD CountIdentifiers (void)
- Counts identifiers on the stack in a marked list. Removes the mark.
- void ReverseIdentifiers (WORD nIds)
- Reverses the order of nIds identifiers on the stack.
- void Push (StackEntryKind kind)
- Called by the parser, pushes the given kind of a thing on the stack using
parser's global variables.
- BOOL IsInSet (char c, char * set)
- This is the same as the member template in adtlib; I wrote
this before I wrote that. Use either one.
- void PrintStackEntry (StackEntry * se)
- A debug function, enabled only if NDEBUG is undefined.
Prints the given stack entry to stderr.
- void PrintStack (void)
- A debug function, enabled only if NDEBUG is undefined.
Prints the whole stack to stderr.
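The marked-list helpers are easiest to see on a toy stack. In the sketch below the semantic stack is a vector of strings and an empty string plays the role of the mark; both are assumptions made for illustration, and the real StackEntry type is richer than this.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the semantic stack; back() is the top.
static std::vector<std::string> SemStack;
static const std::string MARK = "";

// Counts the entries above the topmost mark and removes the mark,
// leaving the entries themselves in place.
unsigned CountIdentifiers (void)
{
    unsigned n = 0;
    while (n < SemStack.size() && SemStack[SemStack.size() - 1 - n] != MARK)
        ++n;
    assert (n < SemStack.size());    // a mark must be present
    SemStack.erase (SemStack.end() - 1 - n);
    return n;
}

// Reverses the top nIds entries so that they pop in source order.
void ReverseIdentifiers (unsigned nIds)
{
    assert (nIds <= SemStack.size());
    std::reverse (SemStack.end() - nIds, SemStack.end());
}
```

An action that expects a marked list would typically count first, reverse, and then pop that many entries, which yields the identifiers in the order they appeared in the COBOL source.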
Codegen classes
This part is still under construction, so you may be better off just reading
the header files for now.
Contents
Hierarchy
Streamable
|
+--- CobolSymbol
| |
| +--- CobolData
| | |
| | +--- CobolRecord
| | |
| | +--- CobolVar
| |
| +--- CobolFile
| |
| +--- CobolLabel
|
+--- CobolConstant
|
+--- PictureType
Class descriptions
Defined in symbase.h and symbase.cc. Defines an abstract symbol
table entry. This is an abstract virtual class, so don't try to make any.
Restricts the symbol name to 50 characters (MAX_SYMBOL_LENGTH) and the
prefix name to 80 characters (MAX_PREFIX_LENGTH).
Defines the following member variables:
- ParentCobolName [MAX_SYMBOL_LENGTH]
- COBOL name of the parent, if any.
- Prefix [MAX_PREFIX_LENGTH]
- The list of all the parents in a C structure format, prepended to
the variable name to make it legal. This does not contain
the prefix at all times. This is just a buffer variable.
- CName [MAX_SYMBOL_LENGTH]
- C variable name.
- FullCName [MAX_FULLNAME_LENGTH]
- Cached Prefix + CName.
- CobolName [MAX_SYMBOL_LENGTH]
- COBOL name of the variable or whatever.