What I'm talking about here is using just the shell itself. One of the main reasons for writing an assembler in sh is to minimize dependencies. Sometimes you want a toolchain with one, count 'em, ONE, link.
Fundamentally, one must be able to emit arbitrary bytes, normally to a file. Not what the shell is about, usually. Well, Bash can do
    echo -ne "\125" >> some_file

...and interpret the \125 as an octal representation of a byte. That's the worst of it. Even ash has similar, but more bothersome, functionality. We can now make arbitrary binary files. (125 octal is ASCII U, by the way.)
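Here is a minimal sketch of the same byte-emitting idea wrapped in a function. The name emit_byte and the variable outfile are made up for illustration and are not shasm's own. It uses printf rather than echo -ne, because echo's escape handling varies between shells and Bash versions (the current Bash manual documents the octal escape for echo -e as \0nnn), while printf accepts \nnn in its format string.

```shell
#!/bin/bash
# emit_byte and outfile are hypothetical names for this sketch.
outfile=$(mktemp)

emit_byte () {
    # "\\$1" expands to a backslash followed by the octal digits,
    # which printf then interprets as a single byte value.
    printf "\\$1" >> "$outfile"
}

emit_byte 125      # ASCII U
emit_byte 110      # ASCII H
emit_byte 012      # newline

cat "$outfile"     # prints: UH
```

Dumping the file with od -An -to1 "$outfile" shows the same three octal values that went in.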
Having the capability to output arbitrary binary data is not quite the ability to produce the desired binary data. Machine codes are nasty little composites. They have funny bitfields and so on. Being able to output a byte doesn't necessarily mean we can assemble the right byte. Let's take the 80386 "SIB" byte as an example. This is a byte appended to an opcode if the opcode makes a memory reference using a fancy addressing scheme. In bits, this is the format of a SIB byte...
    <---most significant
      scale   index   base      <---field
       7 6    5 4 3   2 1 0     <---bits

Lo and behold, the above fields just happen to map directly to the characters in an octal representation. Stuff like this is why octal was popular at one time to begin with. This is also related to the fact that the 386 and 68k have registers in clumps of 8. "index" and "base" in the SIB are register specs. These become the characters 0 through 7. Seen this way, it almost looks like it WAS designed for this. <mANIacAl laUGhteR>
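As a sketch of what that buys you: a SIB byte can be built by pasting three digits side by side, with no arithmetic at all. The function name sib_byte is hypothetical and is not taken from shasm.

```shell
#!/bin/bash
# sib_byte is a hypothetical helper for this sketch.
# scale: 0-3 (multiplier 1, 2, 4, 8); index and base: register numbers 0-7.
sib_byte () {
    local scale=$1 index=$2 base=$3
    # The three digits written next to each other ARE the octal value
    # of the byte, so string concatenation does the bit packing.
    printf "\\$scale$index$base"
}

# [eax + ecx*4]: scale=2 (x4), index=1 (ecx), base=0 (eax) -> octal 210
sib_byte 2 1 0 | od -An -to1
```

Octal 210 is hex 88, which is binary 10 001 000: scale, index and base, each in its own field.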
We're not quite done with arithmetic, unfortunately. The x86 modR/M byte also is neatly octal, as are the x86 opcodes that take register fields. Other machines may not be so convenient. Even on x86 though, we have flow control branches to resolve. This is the point at which arithmetic becomes unavoidable, as far as I can see, and this is the point at which I decided to concentrate on Bash and not try to get shasm working on ash. </mANIacAl laUGhteR>
On the first pass at assembly time, forward branches in code are jumps to the future. The branch targets don't exist yet. It therefore takes two passes over the code to determine the target addresses of forward branches. All the shasm opcodes get interpreted twice, and the second time is "live". The state created by the first pass has to persist so the second pass can use what was learned on the first pass. This is done by sourcing the assembly source file twice, and changing a pass variable between invocations of the assembly script. Certain actions are conditional on that pass variable. On pass 1 labels are located, and on pass 2 branches are filled in. File output is restricted in pass 1 also. Keep in mind then that your assembly script will be run twice, and you can control per-pass activities with the pass variable.
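The two-pass scheme can be sketched roughly like this. Every name in it (pass, here, emit, label, nop, jmp_short) is invented for the example and should not be read as shasm's real opcodes or variables. It only shows the shape of the idea: source the same script twice, record label addresses on pass 1, and write bytes and fill in displacements on pass 2.

```shell
#!/bin/bash
# All names below are hypothetical, chosen for this sketch only.
out=$(mktemp); src=$(mktemp)

emit () {                       # one byte, given as 3 octal digits
    (( pass == 2 )) && printf "\\$1" >> "$out"   # file output only on pass 2
    (( here += 1 ))             # location counter advances on both passes
}
label () {                      # pass 1: remember where this label lands
    (( pass == 1 )) && printf -v "label_$1" '%d' "$here"
    return 0
}
nop () { emit 220; }            # x86 NOP (hex 90)
jmp_short () {                  # x86 JMP rel8: opcode hex EB, then displacement
    local name="label_$1" disp=0
    # The displacement counts from the end of this 2-byte instruction.
    (( pass == 2 )) && disp=$(( ${!name} - (here + 2) ))
    emit 353
    emit "$(printf '%03o' $(( disp & 255 )))"
}

# The "assembly source" is itself a shell script.
cat > "$src" <<'EOF'
jmp_short done
nop
nop
label done
nop
EOF

for pass in 1 2; do
    here=0
    . "$src"                    # same source, interpreted twice
done

od -An -to1 "$out"              # bytes: 353 002 220 220 220
```

On pass 1 the jump cannot know where "done" is, so it just takes up its two bytes. By pass 2 the label has been recorded at offset 4, and the displacement works out to 2, which skips the two NOPs.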
There is some limited use of N-dimensional arrays of "integers" in shasm. The shell always stores ints as strings though, it seems. There are two times, it seems, when strings get temporarily converted to actual numerical values: in a let statement, and in an array subscript. That means you can do math in an array subscript. That means, in effect, N-dimensional arrays. Unfortunately, the entities being calculated come in as strings. That's why there's some strange if/else "arithmetic" in octacode () in the x86 code.
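A small sketch of the subscript trick, using a made-up 3 x 4 table (the names table, row, col and cols are illustrative only). The subscript of an indexed array is evaluated as arithmetic, so a flat array can stand in for a two-dimensional one.

```shell
#!/bin/bash
# table, row, col and cols are hypothetical names for this sketch.
cols=4
for (( row = 0; row < 3; row++ )); do
    for (( col = 0; col < cols; col++ )); do
        # The subscript row*cols+col is computed by the shell itself.
        table[row*cols+col]=$(( row * 10 + col ))
    done
done

row=2; col=3                      # plain strings, converted inside the subscript
echo "${table[row*cols+col]}"     # prints: 23
```

More dimensions work the same way, with one more multiplier per dimension in the subscript.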
Tests do not convert strings to integers. For integer tests to work the entity has to be an integer type on arrival. I don't exactly get this, but I keep banging.
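One thing that can produce this kind of surprise in current Bash is the difference between the string comparison operators and the integer ones. This is only a guess at the cause. Nothing above says which form of test was involved. The sketch below compares the same two values three ways. The variable names are arbitrary.

```shell
#!/bin/bash
a=10 b=9      # both are stored as strings

# [[ < ]] compares as strings: "10" sorts before "9"
if [[ $a < $b ]]; then s=true; else s=false; fi

# [ -lt ] parses both operands as integers
if [ "$a" -lt "$b" ]; then i=true; else i=false; fi

# (( )) is arithmetic evaluation
if (( a < b )); then m=true; else m=false; fi

echo "[[ < ]]: $s   [ -lt ]: $i   (( < )): $m"
```

Only the first form says 10 is "less than" 9. The other two treat the operands as numbers.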