shasm

NAME

shasm - binary file assembler written entirely in the GNU Bash shell

PAGE DATE

Jan. 2001

INTERFACE

shasm filename

or interactively,

. shasm

Output goes to the created files output and listing.

DESCRIPTION

Good news

shasm is a trivially extensible, utterly flexible collection of unix shell routines for assembling arbitrary binary files, including 80386 machine language programs. You probably already know a lot about how it works. shasm can be run on any computer you can get Bash or similar installed on. It uses only the shell. shasm provides gobs of user-feedback.

Bad News

shasm is about 100 times slower than gas. The functionality provided is just what I need to assemble a particular programming language on 386+. shasm needs a fairly featureful shell; ash won't cut it.

local-scope jargoneering

bytes, duals and quads are integers of 1, 2 and 4 bytes respectively. An oper is the first byte or the characteristic byte pair of a specific x86 machine instruction. An argument is any syntactic modifier to an instruction other than prefixes. Arguments in shasm are separated by spaces. An operand is the actual value the instruction will act on at runtime, as defined by the arguments. Note that I've based this on "instruction" without defining that. It usually means the thing there is an Intel name for, but not always. A macro in shasm is a shell expression or routine additional to what shasm provides.

amble

shasm is an assembler written in the GNU Bash unix-style command interpreter "shell" to make my H3sm 3-stack programming language maximally portable. The initial shasm is for that purpose, and is for a subset of the x86 instruction set which is most useful for systems programming languages and operating systems. A side-effect of that goal is that shasm is a flexible and relatively easy-to-use means of creating any kind of arbitrary binary file. Because it's a set of scripts, the scripts themselves are the executable, the authoritative documentation, and part of the user interface. shasm does not use anything external to the shell, such as sed, dd and so on. Most assemblers these days are geared to run in the background supporting a high-level language. shasm is more for coding machine language directly, interactively.

The initial shasm provides machine code assembly for a subset of the x86 instruction set. shasm implements a non-cryptic set of names for x86 instructions, called asmacs, that I find helpful. If you prefer Intel names you can easily transliterate them back in, for the most part. There are a few names that don't map one-to-one though.

shasm interprets the file to be assembled as a shell script. The opcodes in shasm are shell subroutines (functions), and any routine in shasm, and any functionality of Bash, is available throughout the assembly source.

shell argument syntax

shasm's syntax is actually the behavior of shell argument processing. Usually one machine instruction and its arguments are assembled by one shell routine and its following arguments. An operator is followed by space-separated arguments. Arguments to the operator can be shell expressions if they are contiguous or quoted. The exception is that instruction prefixes are handled like separate instructions by shasm. Instruction delimiting and argument delimiting are thus shell-style. Instructions are separated by ends of lines, and lines may be continued with a terminating \ or subdivided with ; as per usual in the shell. Arguments are separated by spaces, also as is typical in the shell. shasm itself doesn't do any character-by-character parsing/lexing, so some things that are usually prefixes in other assemblers are separate tokens in shasm. For example, there are two separators for the source and destination sides of an instruction's arguments. They are to and from. These are the equivalent of a comma in e.g. GNU gas, and must be separated from other arguments by spaces.
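
A minimal sketch of the argument handling described above, assuming nothing about shasm's internals. The function show_args is hypothetical and is not a shasm routine; it only shows how Bash itself splits a line into an operator and its space-separated arguments, and why an expression has to be contiguous or quoted to arrive as one argument.

```shell
#!/usr/bin/env bash
# show_args is a hypothetical stand-in for an opcode routine; it is NOT part of shasm.
# It prints how many arguments the shell handed over, then each argument on its own line.
show_args() {
    echo "count=$#"
    local arg
    for arg in "$@"; do
        echo "arg=[$arg]"
    done
}

# "to" arrives as an ordinary argument because it is set off by spaces.
show_args A to B

# A contiguous arithmetic expansion is evaluated by the shell and passed as one argument.
show_args $((4+4)) to B

# An expression containing spaces must be quoted to stay one argument.
show_args "4 + 4" to B

# ; subdivides a line, and a trailing \ continues one.
show_args A to B; show_args \
    C
```

Because the shell does all of the splitting, a routine like this never has to scan its input character by character, which is the design point made above.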

The most important variable is here, which is the current assembly address. here is equivalent to period in other assemblers (and is degenerately analogous to HERE in Forth). here is a declared integer. You can use here in Bash expressions as you see fit. L is the label specifier, and fillto is equivalent to the .org directive of other assemblers. fillto fills from here to the address specified with zero-bytes.
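
The following is a rough model of the ideas above, not shasm's actual code. The names emit_byte, fill_to, out, mark and size are hypothetical; only here is taken from the text. It shows a declared-integer address counter, a saved address, and zero-filling up to a target address.

```shell
#!/usr/bin/env bash
# Hypothetical model only: these routines mimic the description, they are not shasm's.
declare -i here=0          # integer attribute, as a "declared integer" would have
out=$(mktemp)              # scratch file standing in for the binary output

# emit_byte VALUE: append the low 8 bits of VALUE as one byte and advance here.
emit_byte() {
    printf "\\$(printf '%03o' $(( $1 & 0xFF )))" >> "$out"
    here=$(( here + 1 ))
}

# fill_to ADDRESS: pad with zero bytes from here up to ADDRESS.
fill_to() {
    local -i target=$1
    while (( here < target )); do
        emit_byte 0
    done
}

emit_byte 0x90
emit_byte 0x90
mark=$here                 # remember the current address in an ordinary variable
fill_to 16
size=$(( $(wc -c < "$out") ))
echo "mark=$mark here=$here size=$size"
rm -f "$out"
```

If the target address is at or below here, this sketch simply does nothing; whether shasm's fillto behaves the same way is not stated in this page.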

The high-level utility of the shell provides many other features typical of assemblers implicitly. Examples:

I use the terms "byte", "dual" and "quad" for integers of 1, 2 and 4 bytes. The directives bytes, duals and quads assemble integers literally. They take one or more numeric or expression arguments, as is typical for shell commands. Bash and other recent unix-like shells provide a rich set of operators, but expression syntax is tricky. shasm itself is full of examples. For each argument to e.g. "bytes", one integer of the size specified (a byte in this case) is appended to the assembly. Arguments with larger values than the type being appended are truncated, low-significance end surviving the truncate. For x86 there are also operand qualifiers called byte, dual and quad. These are not directives.
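
As a sketch of the truncation rule above, the hypothetical helper le_hex below (not a shasm routine) prints the bytes that would be appended for each argument at a given width. Lowest-order-byte-first ordering is an assumption based on the x86 target; this page does not state the byte order the directives use.

```shell
#!/usr/bin/env bash
# le_hex WIDTH VALUE...: hypothetical helper, not part of shasm.
# Prints WIDTH bytes per VALUE as two-digit hex, lowest-order byte first
# (assumed little-endian). Higher-order bits beyond WIDTH bytes are dropped.
le_hex() {
    local -i width=$1 v i
    shift
    local arg line=""
    for arg in "$@"; do
        v=$(( arg ))       # each argument may be a number or an arithmetic expression
        for (( i = 0; i < width; i++ )); do
            line+=$(printf '%02x ' $(( (v >> (8 * i)) & 0xFF )))
        done
    done
    echo "${line% }"
}

le_hex 1 0x41 0x1234       # second value is truncated to its low byte
le_hex 2 0x1234 0x12345    # second value is truncated to its low two bytes
le_hex 4 1 "3+4"           # a quoted expression is evaluated as one argument
```

The mask-and-shift in the inner loop is the whole rule: only the low-significance end of each value survives.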

Assembler directives are machine-independent. shasm is therefore split into two scripts: the main one and the one for the CPU in question. Currently you have one choice of CPU: x86. shasm has no linker, sections, or debugging functionality. Please let me know if any of that changes.

The L style labels are for branch resolution. If you want to label some point in the assembly for other uses do


	mydatalabel=$here

and be careful with name conflicts. The ascii directive assembles a string.

A shasm opcode writes to two output files, output and a.list. output is the raw binary assembly, and a.list is a hexadecimal/octal listing. An item in a.list of the form 234 is a byte in octal, whereas an item of the form 22 is a byte in hex. The 386 modR/M and SIB bytes get built as octal and might as well be displayed that way. Hopefully by the time you read this there will be some ELF goodies in the shasm package for running or linking shasm-generated code, and perhaps a libsys.a as shasm source.
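
A short sketch of why octal suits these bytes: a modR/M byte is a 2-bit mod field followed by 3-bit reg and r/m fields, so each octal digit is exactly one field. The helper modrm is hypothetical and is not taken from shasm.

```shell
#!/usr/bin/env bash
# modrm MOD REG RM: pack the three modR/M fields into one byte value.
# Hypothetical helper for illustration only; not a shasm routine.
modrm() {
    echo $(( ($1 << 6) | ($2 << 3) | $3 ))
}

b=$(modrm 3 0 3)
printf 'octal %03o  hex %02x\n' "$b" "$b"   # prints: octal 303  hex c3

b=$(modrm 1 2 5)
printf 'octal %03o  hex %02x\n' "$b" "$b"   # prints: octal 125  hex 55
```

In the octal column the three field values can be read straight off the digits, whereas the hex form splits the reg field across two digits.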

A shell is roughly 100 times slower than compiled C at low-level stuff. I've just tried to avoid making shasm unnecessarily worse than that. More importantly, I don't see any data capacities in shasm that are likely to be exceeded by any reasonable file of code. I do think shasm makes machine language less daunting, and may be useful for playing around with other types of binary data files.

x86 shasm has its own seedoc for operator syntax and so on.