Cobcy for Hackers


Introduction

The purpose of this document is to explain the internal design of Cobcy and get you started as a Cobcy hacker. I will try to explain briefly some of the functions in the source code. Since this is a pretty big task, chances are you are still reading an incomplete document.

Some global portability conventions: I use

#ifndef __MSDOS__
...
#endif
directives to help those who want to compile under DOS. I have not heard any success stories yet, so these are mostly guesses. Another macro I use is CAN_HAVE_STDIO, an unfortunate name perhaps, inserted because some compilers, such as some older versions of gcc, don't like to mix iostreams and stdio. Current gcc does not have this problem. I also use the NDEBUG macro to isolate debug messages. I chose it because it is compatible with assert. An example would be:
#ifndef NDEBUG
    cout << "\tIn GenMove\n";
#endif

All code should be written using the codef and declf streams, the former being the output .c file and the latter the .h file. These streams are opened and closed in seminit.cc.
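
As a rough illustration of this convention, here is a minimal sketch of an action that writes a declaration to declf and the matching statement to codef. The two streams are modelled as string streams only to keep the sketch self-contained, and GenCounterSketch with its output text is hypothetical, not taken from the Cobcy sources.

```cpp
#include <sstream>
#include <string>

// Stand-ins for the two output streams. Assumption: in Cobcy these are
// file streams opened in seminit.cc; string streams keep the sketch
// self-contained.
std::ostringstream codef;   // text of the generated .c file
std::ostringstream declf;   // text of the generated .h file

// Hypothetical action: the declaration goes to the header stream, the
// statement that uses it goes to the code stream.
void GenCounterSketch (const std::string& name)
{
    declf << "extern long " << name << ";\n";
    codef << "    " << name << " = 0;\n";
}
```

Calling GenCounterSketch ("total") leaves one line in each stream, which is the split an action is expected to respect.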

Modules

Cobcy code is roughly divided into three parts: semantic actions, code generating classes, and glue. Semantic actions are stored in files starting with sem, code generating classes are in files starting with sym, and everything else is glue routines. The exception, of course, is the .l and .y files, which constitute the parser. Semantic actions deal with the stack and all parser-related stuff. Code generating classes do just that: generate most of the actual code.

main.cc

This file contains routines to process command-line arguments, set the compiler configuration, and do anything else that has nothing to do with code generation. CobcyConfig, the global configuration structure, is defined and set up here. main also opens the source file and starts the parser. When the parser is finished, main checks for errors and removes the output files if any are found.

semarith.cc

This file defines arithmetic-related and movement semantic actions. Since many functions require rounding, a global variable RoundResult of type BOOL is defined here. Most of these actions involve plenty of tedious setup with not much of interest going on. Basically, each of them generates some sort of assignment statement or a call to a string assignment function with possible conversion (all from cobfunc.c). Present functions:
void GenMove (void)
Record moves are not implemented.
void GenAdd (void)
void GenSubtract (void)
void GenMultiply (void)
void GenDivide (void)
Divide generates the zero check as a plain if statement. It stays that way unless ON SIZE ERROR appends an else clause to handle the error.
void GenCompute (void)
Same thing as all the others.
void SetResultRounding (void)
Sets the global rounding flag described above to TRUE. All of the above functions reset it to FALSE.
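
The zero check and the rounding flag can be pictured with a minimal sketch. The text Cobcy really emits is not reproduced here: GenDivideSketch, its parameters, and the floor-based rounding form are assumptions made for illustration. Only the shape matters, a plain if that leaves room for an else clause, plus a flag that is reset after use.

```cpp
#include <string>

// Stand-in for the rounding flag kept in semarith.cc.
static bool RoundResult = false;

// Builds the C text for "dest = dividend / divisor" guarded by a zero
// check. Hypothetical output format; the if is left without an else so
// that an ON SIZE ERROR handler could be appended afterwards.
std::string GenDivideSketch (const std::string& dest,
                             const std::string& dividend,
                             const std::string& divisor)
{
    std::string expr = dividend + " / " + divisor;
    if (RoundResult)
        expr = "floor (" + expr + " + 0.5)";    // assumed rounding form
    RoundResult = false;    // every arithmetic action resets the flag
    return "if (" + divisor + " != 0)\n    " + dest + " = " + expr + ";\n";
}
```

Setting RoundResult to true before one call affects that call only, which mirrors the reset behaviour described for SetResultRounding.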

semconio.cc

This file contains user interaction routines, which at present include only displaying and accepting text. Functions:
void GenAccept (void)
Processes the ACCEPT clause. Expects a marked list of identifiers on the stack and processes them one at a time, generating code to assign the appropriate value to each. ACCEPT sources other than the console generate unimplemented function stubs, so the error will be reported only by gcc.
void GenDisplay (void)
Expects a marked list of identifiers on the stack and processes them one at a time, generating code to print each one using a separate fprintf statement. At the end, an fprintf printing a newline is generated.
void SetDisplayOutput (void)
Expects an identifier on the stack. Sets the file stream to DISPLAY to. All Cobcy files are of type FILE, so GenDisplay can just pass the needed one to fprintf.
void SetAcceptSource (AcceptSourceType NewSrc)
This one is a little more tricky, with source as an enum. GenAccept will parse it to determine where the data should come from.
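
The output shape described for GenDisplay can be sketched as follows. This is not the Cobcy implementation: GenDisplaySketch is a hypothetical helper, every item is assumed to print with "%s", and the stream argument stands for the C name of the FILE chosen by SetDisplayOutput.

```cpp
#include <string>
#include <vector>

// Emits one fprintf per identifier, then one more for the newline.
// Assumptions: all items use the "%s" format, and `stream` is the C
// name of the destination FILE pointer.
std::string GenDisplaySketch (const std::vector<std::string>& ids,
                              const std::string& stream)
{
    std::string out;
    for (const std::string& id : ids)
        out += "    fprintf (" + stream + ", \"%s\", " + id + ");\n";
    out += "    fprintf (" + stream + ", \"\\n\");\n";
    return out;
}
```

For two identifiers this yields three fprintf statements, the last of which prints only the newline.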

semcontrol.cc

This file contains code generation for control structures. Paragraphs are implemented as separate functions with a same-name label at the beginning. The label is there to allow faster loops; see GenGoto for more details. There is an arbitrary limit on the number of paragraphs to allow STOP RUN to work. The limit is currently set to 32000, which is a lot of paragraphs. The compiler will not warn you about it, but if you have 50000 paragraphs and STOP RUN in paragraph 3, you'll go to paragraph 32000. I'll fix that if anyone needs that many.

The global variables are as follows. ParagraphList is a queue of all paragraphs, used to generate the main control switch in the generated C main(); see seminit.cc for more detail. CurPar points to the current paragraph object. CurLoopVar is the name of the last declared loop variable, needed because several loop generation routines use it. LoopNesting is an integer denoting the current nesting level, used mostly for indentation.
void GenParagraph (void)
Expects an identifier on the stack. Closes the current paragraph, a C function, and starts a new one. Calls GenEndProc to close the current paragraph.
void GenGoto (void)
Expects an identifier on the stack. If the destination is the current paragraph, a C goto statement is used. This comes in handy with loops, because it avoids recursion. Otherwise a return statement is generated to give the switch in main the location of the paragraph to go to. This mechanism works even with bad code :)
void GenPerform (void)
This just generates a function call. Expects an integer on top, the number of times to perform the function, and an identifier right under it. Builds a for loop around the call to perform it many times.
void GenEndProc (void)
Generates function closing, i.e. a return statement and a closing brace.
void BeginCompound (void)
Inserts a brace and increments indent level.
void EndCompound (void)
Inserts a brace and decrements indent level.
void GenStartProc (void)
Calls GenParagraph.
void GenStartIf (void)
Prints "if ("
void GenStartElsif (void)
Prints "else if ("
void GenEndIf (void)
Prints ")\n"
void GenElse (void)
Prints "else ("
void GenBool (void)
Expects two identifiers if the first is a unary operator, like "ALPHABETIC", and three otherwise. Generates a C conditional in parentheses.
void GenConnect (void)
Expects a connector on the stack and prints it directly. (connector == &&, ||)
void GenStopRun (void)
Generates an exit statement.
void GenParagraphCalls (void)
Writes paragraph sequence constants to the header stream and generates the big switch statement in main, with _cpi being the track variable, set to the sequence constant of the paragraph to call. The switch is basically a mapping of the former to the latter.
void GenStartLoop (void)
Prints "for (" and increments indentation and loop nesting.
void GenLoopInit (void)
Generates the statement inside the for to init the loop variable. Expects the value to be on top of the stack and the variable identifier under it. Sets CurLoopVar.
void GenLoopCondition (void)
Generates the loop condition. This is not a boolean, it is just " <= " always because of the way COBOL works. Expects the end value identifier or a constant on the stack.
void GenLoopIncrement (void)
Expects the increment value on the stack. If not specified by the source, the parser sets it to 1.
void GenEmptyClause (void)
Generates a ";", C empty statement.
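
The paragraph dispatch described under GenGoto and GenParagraphCalls can be pictured with a small runnable model. This is a sketch under assumptions, not generated Cobcy output: the paragraph names, the P_ constants, and the way STOP RUN is modelled as an out-of-range index are all hypothetical. Only _cpi, the same-name label, and the switch come from the description above.

```cpp
#include <string>

// Sequence constants, one per paragraph (hypothetical names).
enum { P_START = 0, P_COUNT = 1, P_DONE = 2, P_STOP = 32000 };

static int counter = 0;
static std::string trace;   // records which paragraphs ran, for checking

// Each paragraph is a function that returns the next paragraph to run.
static int Start (void)
{
    trace += "S";
    return P_COUNT;         // continue with the next paragraph
}

static int Count (void)
{
Count:                      // same-name label at the top of the function
    trace += "C";
    if (++counter < 3)
        goto Count;         // GO TO the current paragraph: no call, no recursion
    return P_DONE;          // GO TO elsewhere: hand control back to the switch
}

static int Done (void)
{
    trace += "D";
    return P_STOP;          // STOP RUN modelled as an index past all paragraphs
}

// The dispatcher that the generated main() would contain.
std::string RunProgramSketch (void)
{
    int _cpi = P_START;
    while (_cpi != P_STOP) {
        switch (_cpi) {
            case P_START: _cpi = Start(); break;
            case P_COUNT: _cpi = Count(); break;
            case P_DONE:  _cpi = Done();  break;
            default:      _cpi = P_STOP;  break;
        }
    }
    return trace;
}
```

Because a jump to another paragraph is a return followed by a switch case, control never nests deeper than one call, however tangled the GO TO statements in the source are.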

semdecl.cc

This file contains functions related to declarations in the data division. Since parsing is sequential and a record is declared before its entries, several global variables are used to keep track of parenting properties. NestedRecordList is a stack of CobolRecords, the top of which denotes the record that is the parent of the current ParentRecord. I have no idea why I didn't just use Top for this, but I am afraid to change it. After all, it works :) A record is pushed onto this stack when it is declared and popped when a record with a smaller or equal level number is declared. Yes, level numbers are what determines nesting. Currently the numbers carry no other meaning, unlike in real COBOL. ParentRecord denotes the parent of the current scope. SFQueue is borrowed externally from semfile.cc to declare special names, which are treated essentially as any other file. VarInit is a queue holding the variable objects that need to be initialized.
void DeclareRecordLevel (void)
Processes one variable or record declaration. In COBOL these are usually on separate lines, so that's consistent with 'level'. Expects a 4-entry list on the stack, a detail taken care of by the parser. A declaration is treated as a variable if it has a picture field and as a record otherwise. Adoption of variables is another fine point. Each child needs to know who its parent is to set itself up. But if the child is a record, it is not finished setting up until it is closed, so the parent is not told to adopt it until that moment. Variables can be adopted immediately upon declaration.
void CloseScopeLevels (WORD LastLevel)
This is the function that takes care of closing records. It checks whether LastLevel closes any records, closes them if so, and has the parent adopt each closed record.
void InitializeVariables (void)
Creates initialization code for variables in VarInit.
void DeclareSpecialName (void)
Declares a special name. See above general notes for this file for discussion.
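
The level-number rule can be modelled with a plain stack. The following is a minimal sketch, not the code in semdecl.cc: RecordSketch, the adoption log, and both function names are hypothetical, and only records are tracked (variables, which are adopted immediately, are left out).

```cpp
#include <string>
#include <vector>

struct RecordSketch {       // hypothetical stand-in for CobolRecord
    int level;
    std::string name;
};

static std::vector<RecordSketch> NestedRecords;   // open records, innermost last
static std::vector<std::string> Adoptions;        // log of "parent<-child"

// Closes every open record whose level is greater than or equal to
// lastLevel. A closed record is adopted by its parent at that moment.
void CloseScopeLevelsSketch (int lastLevel)
{
    while (!NestedRecords.empty() && NestedRecords.back().level >= lastLevel) {
        RecordSketch closed = NestedRecords.back();
        NestedRecords.pop_back();
        if (!NestedRecords.empty())
            Adoptions.push_back (NestedRecords.back().name + "<-" + closed.name);
    }
}

// Declares a record: first close whatever the new level ends, then open it.
void DeclareRecordSketch (int level, const std::string& name)
{
    CloseScopeLevelsSketch (level);
    NestedRecords.push_back (RecordSketch { level, name });
}
```

Declaring levels 1, 5, 10 and then 5 again closes the level 10 and the first level 5 records in that order, so each parent adopts a child only once the child is complete.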

semfile.cc

This file contains file-related actions. Since COBOL has extensive file handling capabilities, this is a pretty big module. This also includes record association functions. Any particular implementation details are in CobolFile class, to which most of the functions will delegate. The record declaration is reused from semdecl.cc, since they are pretty much identical. The function that binds the record to the file descriptor is called by the parser after the first record level is declared.
void BeginFileDecl (void)
This function is for the SELECT statement. Expects an identifier on top of the stack designating the COBOL name of the file. If the name is the reserved identifier printer, the file is declared to output to a temporary file, a print spool, which is flushed and printed when the file is closed. (The exact sequence is flush, close, and print.) The FileToSelect variable is set here for later association with the file's other properties. This is the function that enters the file descriptor into the symbol table, so every fd must be SELECTed before anything else.
void EndFileDecl (void)
This function is for the SELECT statement. FileToSelect is set to NULL. Perhaps error checking could be added here for missing properties?
void GenFileDesc (void)
This is the semantic action for the FD statement in the file section of the data division. Expects a file name on top and, under it, an identifier describing a previously SELECTed file. Appends the file record to the FDInit queue so it can be associated with its parent record later. This is done because, although the record can be told what its parent fd is (it isn't), the fd cannot associate with the record at that point (after the first statement of the record declaration). The queue exists to associate all fds with their actual records, not just the names of the records.
void AssociateFileRecord (void)
This copies the name of the associated record to the file record. See GenFileDesc for explanation.
void GenOpen (OpenModeType mode)
Generates a statement to open the file. Expects a list of fds on the stack. The mode is the same for all.
void GenClose (void)
Generates a statement to close the file. Expects a list of fds on the stack.
void GenRead (void)
Expects a record or a fd identifier on the stack, seeks, calls GenReadData, GenReadEnd, and GenEOFCheck in CobolFile.
void GenWrite (void)
Expects a source record, and a record or a fd identifier on the stack, sets up for appending, calls GenWriteData and GenWriteEnd in CobolFile.
void GenRewrite (void)
The only difference between this and GenWrite is seeking instead of setting up for append.
void AssociateRecordsWithFD (void)
This loops through FDInit queue, calling the function to bond records and file descriptors.
void SetFileStatus (void)
Expects the status on stack and passes it to CobolFile.
void SetAccessMode (AccessModeType mode)
void SetOrganization (OrganizationType org)
These just delegate.
void SetRelativeKey (void)
void SetRecordKey (void)
These expect an identifier on the stack and delegate. There is no distinction between record and relative keys since each file can only have one at a time.
void OpenSpecialFiles (void)
void CloseSpecialFiles (void)
These look through SFQueue for files that are opened at startup and closed at shutdown. A printer is such a file for instance.
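
The deferred binding through the FDInit queue, described under GenFileDesc and AssociateRecordsWithFD, can be modelled as below. This is a sketch under assumptions: FileSketch, the two maps, and the function names are hypothetical and stand in for CobolFile and the symbol table.

```cpp
#include <map>
#include <queue>
#include <string>

struct FileSketch {             // hypothetical stand-in for CobolFile
    std::string recordName;     // known as soon as the FD is parsed
    bool bound;                 // true once the record object is attached
};

static std::map<std::string, FileSketch> Files;   // SELECTed files
static std::map<std::string, int> Records;        // records declared so far
static std::queue<std::string> FDInit;            // files awaiting binding

// At FD time only the record's name exists, so remember it and queue
// the file for later.
void GenFileDescSketch (const std::string& file, const std::string& record)
{
    Files[file].recordName = record;
    Files[file].bound = false;
    FDInit.push (file);
}

// Run once the declarations are complete: drain the queue and bind each
// file to its record. Returns the number of files bound.
int AssociateRecordsWithFDSketch (void)
{
    int nBound = 0;
    while (!FDInit.empty()) {
        FileSketch& f = Files[FDInit.front()];
        FDInit.pop();
        if (Records.count (f.recordName)) {
            f.bound = true;
            ++nBound;
        }
    }
    return nBound;
}
```

The queue is what lets the FD be processed before its record exists: the name is stored immediately and the real association waits until every record has been declared.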

seminfo.cc

This file contains informational functions, which just generate comments, taken from identification division. Available functions:
void SetProgramName (void)
void SetSourceComputer (void)
void SetObjectComputer (void)

seminit.cc

This file contains initialization and shutdown actions for the generated program. Functions:
void InitializeVariables (void)
Creates a separate function to assign all initial values specified upon declaration. This is a little more convenient for initializing long variables than an inline initialization.
void FinishDecl (void)
Closes all scopes and associates file descriptors.
void StartCode (void)
Generates the header for the first paragraph.
void EndCode (void)
This generates main() and all related things in it.
void StartProgram (void)
Opens output files and writes initial headers there, like the include files, standard variables, etc.
void EndProgram (void)
Closes output files.

semutil.cc

This file contains accessory routines to make coding a bit easier and perhaps a bit more portable.
void WriteError (char * str)
Writes a compiler error message. Sets the error flag so that the glue routines can delete the output files, leaving the user in no doubt as to whether compilation was successful.
void WriteWarning (char * str)
Writes a compiler warning message.
void NIY (char * str)
Call this function from every stub you make. It just generates a comment in the output code and prints a compiler warning.
void GenComment (char * str)
Generates a comment in the output code.
BOOL ErrorOccured (void)
This function returns TRUE if WriteError was ever called.
void PrintConstant (StackEntry * entry, ostream& os)
Prints constant from the given stack entry on os.
CobolSymbol * LookupIdentifier (char * id)
Returns the CobolSymbol associated with the given identifier, generating a compiler error if not found.
void PrintIdentifier (char * id, ostream& os)
A shortcut to print the C name of id on os.
void GenIndent (void)
Indents to the current indentation level.
WORD CountIdentifiers (void)
Counts identifiers on the stack in a marked list. Removes the mark.
void ReverseIdentifiers (WORD nIds)
Reverses the order of nIds identifiers on the stack.
void Push (StackEntryKind kind)
Called by the parser, pushes the given kind of a thing on the stack using parser's global variables.
BOOL IsInSet (char c, char * set)
This is the same as member template in adtlib; I wrote this before I wrote that. Use either one.
void PrintStackEntry (StackEntry * se)
A debug function, enabled only if NDEBUG is undefined. Prints the given stack entry to stderr.
void PrintStack (void)
A debug function, enabled only if NDEBUG is undefined. Prints the whole stack to stderr.
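
The marked-list convention used by CountIdentifiers and ReverseIdentifiers can be sketched on a plain vector. The real stack holds StackEntry objects; here strings are used, and the MARK value and both function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

static const std::string MARK = "<mark>";   // hypothetical mark entry
static std::vector<std::string> Stack;      // back() is the top of the stack

// Counts the entries above the nearest mark and removes the mark itself.
std::size_t CountIdentifiersSketch (void)
{
    std::size_t n = 0;
    while (n < Stack.size() && Stack[Stack.size() - 1 - n] != MARK)
        ++n;
    if (n < Stack.size())
        Stack.erase (Stack.end() - 1 - n);   // drop the mark
    return n;
}

// Reverses the order of the top nIds entries, leaving the rest alone.
void ReverseIdentifiersSketch (std::size_t nIds)
{
    if (nIds > Stack.size())
        nIds = Stack.size();
    std::reverse (Stack.end() - nIds, Stack.end());
}
```

An action that wants its operands in source order would call the first function to learn the list length and the second to undo the reversal that pushing onto a stack causes.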

Codegen classes

This part is still under construction, so you may be better off just reading the header files for now.

Hierarchy

Streamable
 |
 +--- CobolSymbol
 |     |
 |     +--- CobolData
 |     |     |
 |     |     +--- CobolRecord
 |     |     |
 |     |     +--- CobolVar
 |     |
 |     +--- CobolFile
 |     |
 |     +--- CobolLabel
 |
 +--- CobolConstant
 |
 +--- PictureType

Class descriptions

CobolSymbol

Defined in symbase.h and symbase.cc. Defines an abstract symbol table entry. This is an abstract class, so don't try to instantiate it. Restricts the symbol name to 50 characters (MAX_SYMBOL_LENGTH) and the prefix name to 80 characters (MAX_PREFIX_LENGTH). Defines the following member variables:
ParentCobolName [MAX_SYMBOL_LENGTH]
COBOL name of the parent, if any.
Prefix [MAX_PREFIX_LENGTH]
The list of all the parents in C structure format, prepended to the variable name to make it legal. This does not contain the prefix at all times; it is just a buffer variable.
CName [MAX_SYMBOL_LENGTH]
C variable name.
FullCName [MAX_FULLNAME_LENGTH]
Cached Prefix + CName.
CobolName [MAX_SYMBOL_LENGTH]
COBOL name of the variable or whatever.
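
How the cached full name relates to the other two buffers can be shown with a minimal sketch. SymbolNameSketch and SetNames are hypothetical, the dotted prefix format is an assumption, and so is the value of MAX_FULLNAME_LENGTH, which this document does not give.

```cpp
#include <cstddef>
#include <cstring>

const std::size_t MAX_SYMBOL_LENGTH = 50;
const std::size_t MAX_PREFIX_LENGTH = 80;
// Assumption: the real value is not stated here. The sum is used so the
// concatenation below always fits.
const std::size_t MAX_FULLNAME_LENGTH = MAX_PREFIX_LENGTH + MAX_SYMBOL_LENGTH;

struct SymbolNameSketch {
    char Prefix [MAX_PREFIX_LENGTH];
    char CName [MAX_SYMBOL_LENGTH];
    char FullCName [MAX_FULLNAME_LENGTH];

    // Copies both parts with truncation, then rebuilds the cached name.
    void SetNames (const char* prefix, const char* cname)
    {
        std::strncpy (Prefix, prefix, MAX_PREFIX_LENGTH - 1);
        Prefix [MAX_PREFIX_LENGTH - 1] = '\0';
        std::strncpy (CName, cname, MAX_SYMBOL_LENGTH - 1);
        CName [MAX_SYMBOL_LENGTH - 1] = '\0';
        std::strcpy (FullCName, Prefix);    // fits: at most 79 + 49 + 1 bytes
        std::strcat (FullCName, CName);
    }
};
```

With a prefix of "customer.address." and a C name of "street", the cached full name becomes "customer.address.street". Names longer than the limits are cut rather than overflowing the buffers.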