shasm
NAME
shasm - binary file assembler written entirely in the GNU Bash shell
PAGE DATE
Jan. 2001
INTERFACE
shasm filename
or interactively,
. shasm
Output is written to the created files output and listing.
DESCRIPTION
Good news
shasm is a trivially extensible, utterly flexible collection of unix
shell routines for assembling arbitrary binary files, including 80386
machine language programs. You probably already know a lot about how it
works. shasm can be run on any computer you can get Bash or
similar installed on. It uses only the shell. shasm provides gobs of
user-feedback.
Bad News
shasm is about 100 times slower than gas. The functionality provided is
just what I need to assemble a particular programming language on 386+.
shasm needs a fairly featureful shell; ash won't cut it.
local-scope jargoneering
bytes, duals and quads are integers of 1, 2
and 4 bytes respectively. An oper is the first byte or the
characteristic byte pair of a specific x86 machine instruction. An
argument is any syntactic modifier to an instruction other than
prefixes. Arguments in shasm are separated by spaces. An operand
is the actual value the instruction will act on at runtime, as defined by
the arguments. Note that I've based this on "instruction" without
defining that. It usually means the thing there is an Intel name for, but
not always. A macro in shasm is a shell expression or routine
additional to what shasm provides.
amble
shasm is an assembler written in the GNU Bash unix-style command
interpreter "shell" to make my H3sm 3-stack programming language
maximally portable. The initial shasm is for that purpose, and is for a
subset of the x86 instruction set which is most useful for systems
programming languages and operating systems. A side-effect of that goal is
that shasm is a flexible and relatively easy-to-use means of creating any
kind of arbitrary binary file. Because it's a set of scripts, the scripts
themselves are the executable, the authoritative documentation, and are
part of the user interface. shasm does not use anything external to the
shell, such as sed, dd and so on. Most assemblers these days are geared to
run in the background supporting a high-level language. shasm is more for
coding machine language directly, interactively.
The initial shasm provides machine code assembly for a subset of the x86
instruction set. shasm implements a non-cryptic set of names for x86
instructions that I find helpful called asmacs. If you prefer
Intel names you can easily transliterate them back in, for the most part.
There are a few names that don't map one-to-one though.
shasm interprets the file to be assembled as a shell script. The opcodes
in shasm are shell subroutines (functions), and any routine in shasm, and
any functionality of Bash, is available throughout the assembly source.
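To make the "opcodes are shell functions" idea concrete, here is a minimal sketch. The names emit, nop, assembled and the hex-text output are my own stand-ins for illustration, not shasm's actual routines; only here is a name taken from this page.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins, NOT shasm's real routines.
declare -i here=0      # current assembly address
assembled=""           # hex text standing in for the binary output

# emit: append each argument as one byte and advance here
emit () {
    local b hex
    for b in "$@"; do
        printf -v hex '%02x ' "$(( b & 0xff ))"
        assembled+=$hex
        here+=1
    done
}

# An "opcode" is just a shell function that emits bytes.
nop () { emit 0x90; }  # 0x90 is the x86 NOP encoding

# "Assembly source" is ordinary shell, so a loop works anywhere.
for i in 1 2 3; do nop; done
echo "here=$here bytes: $assembled"
```

Because the source file is simply run by the shell, nothing special is needed to mix instructions with loops, tests or echo.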
shell argument syntax
shasm's syntax is actually the behavior of shell argument processing.
Usually one machine instruction and its arguments are assembled by one
shell routine and its following arguments. An operator is followed by
space-separated arguments. Arguments to the operator can be shell
expressions if they are contiguous or quoted. The exception is that
instruction prefixes are handled like separate instructions by shasm.
Instruction delimiting and argument delimiting are thus shell-style.
Instructions are separated by ends of lines, and lines may be continued
with a terminating \ or subdivided with ; as per usual
in the shell. Arguments are separated by spaces, also as is typical in the
shell. shasm itself doesn't do any character-by-character parsing/lexing,
so some things that are usually prefixes in other assemblers are separate
tokens in shasm. For example, there are two separators for the source and
destination sides of an instruction's arguments. They are to and
from. These are the equivalent of a comma in e.g. GNU gas, and
must be separated from other arguments by spaces.
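The following sketch shows why to and from have to be separate words: the shell has already split the line into arguments before any routine sees it. split_args and the placeholder arguments A and B are hypothetical, invented for this illustration; shasm's own routines may well do this differently.

```shell
#!/usr/bin/env bash
# split_args: hypothetical helper that sorts arguments into the ones
# before and the ones after a `to` or `from` token. No character-level
# parsing is done; the shell has already split the words.
split_args () {
    local side=left word
    left=()
    right=()
    separator=""
    for word in "$@"; do
        case $word in
            to|from) separator=$word; side=right ;;
            *)  if [ "$side" = left ]; then
                    left+=("$word")
                else
                    right+=("$word")
                fi ;;
        esac
    done
}

split_args A to B
echo "before '$separator': ${left[*]}   after: ${right[*]}"

# A contiguous shell expression is expanded before the routine runs.
split_args $((4*4)) from B
echo "before '$separator': ${left[*]}   after: ${right[*]}"
```

Writing Ato B or A toB would hand the routine different words, which is why the separators need spaces around them.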
The most important variable is here, which is the current
assembly address. here is equivalent to the period (.) in other assemblers (and is
degenerately analogous to HERE in Forth). here is a declared integer. You
can use here in Bash expressions as you see fit. L is the label
specifier, and fillto is equivalent to the .org directive of
other assemblers. fillto fills from here to the address specified with
zero-bytes.
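A rough sketch of how here and a fillto-like routine interact, assuming nothing about shasm's internals: fillto_sketch and out are made-up names, and the output is hex text rather than a real binary file.

```shell
#!/usr/bin/env bash
declare -i here=0   # declared integer, as described above
out=""              # hex text standing in for the binary output

# fillto_sketch ADDRESS: pad with zero bytes from here up to ADDRESS.
# Hypothetical stand-in for fillto, for illustration only.
fillto_sketch () {
    local -i target=$1
    while (( here < target )); do
        out+="00 "
        here+=1
    done
}

here=3              # pretend three bytes are already assembled
fillto_sketch 8
echo "here=$here padding: $out"

# here can be used in any Bash arithmetic expression.
echo "next 16-byte boundary: $(( (here + 15) & ~15 ))"
```

Since here carries the integer attribute, plain assignments and += on it are arithmetic, which is what makes it handy inside expressions.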
The high-level utility of the shell provides many other features typical
of assemblers implicitly. Examples:
- . <filename> is your .include directive.
- MOV () { copy $* ; } renames an opcode or
shasm routine.
- Shell routines more complex than the preceding constitute "macros".
- Suffixes and other constructs are implicit to shell string concatenation.
- declare -i pi=314159 declares an integer constant.
- echo can send arbitrary progress info to the user anytime
- shell conditional and looping constructs can control assembly
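Several of the items above can be tried together in a few lines. In this sketch copy is only a stand-in that records its arguments (the real copy assembles an instruction), and repeat, log, A and B are hypothetical names of my own.

```shell
#!/usr/bin/env bash
# Stand-in for a shasm routine: it only records what it was given.
log=()
copy () { log+=("copy $*"); }

MOV () { copy "$@"; }     # rename; "$@" keeps quoted arguments intact
declare -i pi=314159      # integer constant

# A "macro": a shell routine that runs a command N times.
repeat () {
    local -i n=$1
    shift
    while (( n-- > 0 )); do "$@"; done
}

repeat 2 MOV A to B

# Conditionals and echo give control and progress info during assembly.
if (( pi > 300000 )); then
    echo "recorded ${#log[@]} instructions: ${log[0]}"
fi
```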
I use the terms "byte", "dual" and "quad" for integers of 1, 2 and 4
bytes. The directives bytes, duals and quads
assemble integers literally. They take one or more numeric or expression
arguments, as is typical for shell commands. Bash and other recent
unix-like shells provide a rich set of operators, but expression syntax is
tricky. shasm itself is full of examples. For each argument to e.g.
"bytes", one integer of the size specified (a byte in this case) is
appended to the assembly. Arguments with values too large for the type being
appended are truncated, and the low-order bits survive. For
x86 there are also operand qualifiers called byte, dual
and quad. These are not directives.
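The truncation rule is easy to see in a sketch. The routines below are hypothetical look-alikes (emit_int, bytes_sketch and so on), not the real directives, and they assume low-byte-first order, which is the x86 convention; whether shasm lays the bytes out exactly this way is an assumption on my part.

```shell
#!/usr/bin/env bash
out=""   # hex text standing in for the binary output

# emit_int SIZE VALUE...: append each VALUE as SIZE bytes, low byte first.
# Anything above SIZE bytes is simply dropped (truncated).
emit_int () {
    local -i size=$1 v i
    local arg hex
    shift
    for arg in "$@"; do
        v=$(( arg ))                       # arguments may be expressions
        for (( i = 0; i < size; i++ )); do
            printf -v hex '%02x ' "$(( (v >> (8 * i)) & 0xff ))"
            out+=$hex
        done
    done
}

bytes_sketch () { emit_int 1 "$@"; }
duals_sketch () { emit_int 2 "$@"; }
quads_sketch () { emit_int 4 "$@"; }

bytes_sketch 0x41 0x1234   # 0x1234 is truncated to 0x34
duals_sketch 0x1234        # 34 12
quads_sketch 3+4           # 07 00 00 00
echo "$out"
```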
Assembler directives are machine-independent. shasm is therefore split
into two scripts: the main one and the one for the CPU in question.
Currently you have one choice of CPU: x86. shasm has no linker, sections,
or debugging functionality. Please let me know if any of that changes.
The L style labels are for branch resolution. If you want to label some
point in the assembly for other uses do
mydatalabel=$here
and be careful with name conflicts. The ascii directive assembles
a string.
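A data label is nothing more than a shell variable holding a copy of here, so ordinary arithmetic works on it. In this sketch ascii_sketch is a hypothetical stand-in that only advances here by the string length; the starting address and the variable names are made up.

```shell
#!/usr/bin/env bash
declare -i here=0x100          # arbitrary starting address for the demo

# Stand-in for the ascii directive: only advances here, emits nothing.
ascii_sketch () { here+=${#1}; }

greeting=$here                 # label the start of the string
ascii_sketch "hello"
greeting_len=$(( here - greeting ))

echo "greeting at $greeting, $greeting_len bytes long"
```

The name-conflict warning applies here too: greeting lives in the same namespace as every other shell variable and routine in the assembly.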
A shasm opcode writes to two output files, output and
a.list. output is the raw binary assembly, and a.list is a
hexadecimal/octal listing. A three-digit item in a.list, such as 234, is a
byte in octal, whereas a two-digit item such as 22 is hex. The 386 ModR/M and
SIB bytes get built as octal
and might as well be displayed that way. Hopefully by the time you read
this there will be some ELF goodies in the shasm package for
running or linking shasm-generated code, and perhaps a libsys.a as shasm
source.
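Octal suits the ModR/M byte because its fields are 2, 3 and 3 bits wide, so each field is exactly one octal digit. The helper below is a hypothetical illustration of that arithmetic, not shasm's code.

```shell
#!/usr/bin/env bash
# modrm MOD REG RM: build a ModR/M byte and print it as three octal digits.
# Hypothetical helper; the field layout (2/3/3 bits) is the x86 definition.
modrm () {
    printf '%03o\n' "$(( ($1 << 6) | ($2 << 3) | $3 ))"
}

modrm 3 0 1            # mod=3, reg=0, rm=1 reads straight off as 301
printf '%02x\n' 0301   # the same byte in hex is c1
```

In octal the three fields can be read directly from the digits, whereas the hex form c1 hides them.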
A shell is roughly 100 times slower than compiled C at low-level stuff.
I've just tried to avoid making shasm unnecessarily worse than that. More
importantly, I don't see any data capacities in shasm that are likely to
be exceeded by any reasonable file of code. I do think shasm makes machine
language less daunting, and may be useful for playing around with other
types of binary data files.
x86 shasm has its own seedoc for operator syntax and so on.