
PowerPC Assembly Programming on the Mac Mini

By Pramode C.E.

The Mac Mini is a very compact desktop computer designed by Apple. Based on the PowerPC (PPC) G4 CPU, the machine is ideal for those who wish to experiment with GNU/Linux on a non-Intel platform. In this article, we will examine how to get Ubuntu Linux up and running on the Mac Mini. Assembly language skills on a RISC CPU like the PowerPC are very much in demand in the embedded-systems industry - and we shall use the PPC Linux system to do a bit of assembly language hacking!

The Mac Mini Hardware

The Mac Mini is built around a PowerPC CPU clocked at 1.25 GHz (a higher-end version is also available). Figure 2 shows the output of running 'cat /proc/cpuinfo' on my machine:

processor	: 0
cpu		: 7447A, altivec supported
clock		: 1249MHz
revision	: 1.2 (pvr 8003 0102)
bogomips	: 1245.18
machine		: PowerMac10,1
motherboard	: PowerMac10,1 MacRISC3 Power Macintosh 
detected as	: 287 (Unknown Intrepid-based)
pmac flags	: 00000000
L2 cache	: 512K unified
memory		: 256MB
pmac-generation	: NewWorld

There is a 40GB IDE hard disk, two USB ports, one FireWire port, built-in sound and a "slot-loading" CD/DVD drive. The power supply, rated at 85W, is provided as an external 'brick'. The unit does not come with a monitor or keyboard - you have to provide them yourself. Both the keyboard and the mouse are USB-based. I had no difficulty getting my Microsoft USB mouse detected, but I had to try a few different brands before I got my USB keyboard working.

There are some minor hardware peculiarities. One is the absence of an 'eject' button for the CD drive. If you are running Linux or MacOS, software eject will work; otherwise, holding the mouse button down during the boot process will do the trick. Another option is to enter Open Firmware (similar to the CMOS setup on the PC) during the boot process by holding down Alt-Windows-O-F on a PC keyboard (Option-Command-O-F on an Apple keyboard) and then executing the 'eject cd' command. Booting from the CD requires holding down the 'c' key during powerup.

Installing Ubuntu

Ubuntu Linux has a PowerPC edition; the CD image can be downloaded from http://ubuntulinux.org. The Mac Mini comes with MacOS X pre-installed in a single partition that occupies the whole hard disk. The first step, then, is to boot from the Mac OS X installation CD and reinstall OS X into a smaller partition (say 20 GB). Once this is done, you can boot from the Ubuntu installation CD and create a few partitions for Linux to work with. The rest of the Ubuntu installation proceeds very smoothly, leaving you with a working MacOS X/Linux dual-boot system.

Tweaking Ubuntu

Ubuntu is a nice end-user distro, but developers will have to put in some effort to get their favourite tools working. I had to run:

apt-get install gcc
to get 'gcc' working. I downloaded the 2.6.8.1 kernel from kernel.org and tried configuring it with
make menuconfig
which failed because the ncurses development files (the 'ncurses-devel' package) were missing. I solved the problem by getting ncurses-5.4.tgz from a GNU FTP site and installing it from source.

Once the kernel compilation process is over, you will see a file called 'vmlinux' under the root of the kernel source tree. This has to be copied to the '/boot' directory under a different name, e.g. 'mykernel'. PowerPC systems use the 'yaboot' boot loader whose configuration file '/etc/yaboot.conf' looks similar to LILO's config file. Here is what I added to my 'yaboot.conf':

image=/boot/mykernel
	label=mykernel
	read-only
	initrd=/boot/initrd.img
	append="quiet splash"
The 'ybin' program has to be executed to install the boot loader.

Learning PowerPC Assembly

The PowerPC is an architecture specification rather than a single processor. It was originally developed by the Apple-IBM-Motorola (AIM) alliance, and many processors on the market can be called PowerPCs; the Mac Mini uses one called the 7447A. PowerPC chips are used in embedded devices as well as in high-end servers.

Understanding the architecture and the assembly language of a microprocessor is crucial in tasks which involve low-level interaction with the machine, like designing and debugging operating systems, compilers and embedded applications. The Mac Mini running GNU/Linux can be used by universities and engineering colleges to teach computer architecture to their students. We shall examine the basics of PowerPC assembly language programming in the rest of this article, mostly with a view towards understanding the code which GCC generates.

Getting Started

The PowerPC is a Reduced Instruction Set Computer (RISC). All instructions are encoded uniformly in 4 bytes and the only instructions which access memory are load and store instructions. There is a large register set consisting of 32 integer registers, 32 floating point registers, a condition register (CR), a link register (LR) and a few others. Programmers familiar with the x86 instruction set will note the absence of special registers like the stack pointer - the idea is that one of the general purpose registers itself can be used as a stack pointer. An Application Binary Interface (ABI) defines the conventions to be adopted; the SVR4 ABI, which ppc32 Linux follows, requires GPR1 (General Purpose Register 1) to be used as a stack pointer. Also, the ABI requires arguments to a function to be passed in registers starting with GPR3. A function can freely modify GPR3 to GPR12 - the caller is expected to save them if necessary.
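The register conventions just described can be written down as a small C model. This is a sketch, assuming the standard ppc32 SVR4 rules that the first eight integer or pointer arguments travel in r3 through r10 and that r0 and r3 to r12 are volatile (caller-saved); the helper names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical helper: GPR number carrying the n-th (0-based) integer or
   pointer argument under the ppc32 SVR4 ABI, or -1 if that argument is
   passed in memory on the stack instead. */
int ppc32_arg_gpr(int n)
{
    if (n < 0 || n > 7)
        return -1;      /* ninth and later arguments go on the stack */
    return 3 + n;       /* r3, r4, ..., r10 */
}

/* Hypothetical helper: true for registers a callee may clobber without
   restoring them (r0 and r3..r12). Everything else is treated as
   non-volatile here: r1 (stack pointer), r2 and r13 (reserved) and
   r14..r31 (callee-saved). */
bool ppc32_is_volatile(int gpr)
{
    return gpr == 0 || (gpr >= 3 && gpr <= 12);
}
```

Compiling a small multi-argument C function with 'gcc -S' on the Mac Mini and reading the generated code is a quick way to confirm these rules in practice.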

Listing 1 shows a simple assembly language program. Let's see what each of the instructions does.

The instruction:

li 4, 0x10
loads the immediate (constant) value 0x10 to the general purpose register 4; x86 programmers may be bothered by the use of pure numbers to represent registers rather than more meaningful names like r0, r1 etc. The instruction:
add 4, 4, 5
may be thought of as doing the algebraic operation:
r4 = r4 + r5
That is, sum the contents of general purpose registers 4 and 5 and store the result in GPR4. The instruction:
addi 4, 4, 5
does the operation:
r4 = r4 + 5
That is, simply add the constant 5 to the contents of register r4.
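To make the semantics of these three instructions concrete, here is a toy C model of the register file. It is not an emulator: the struct and the function names li, add and addi are our own, and only the behaviour described above is imitated. One real detail is kept: li rD, imm is a simplified mnemonic for addi rD, 0, imm, and addi treats a source register of 0 as the constant zero.

```c
#include <stdint.h>

/* Toy model of the 32 general purpose registers. */
typedef struct {
    uint32_t gpr[32];
} cpu_t;

/* li rD, imm: rD = sign-extended 16-bit immediate */
static void li(cpu_t *c, int d, int16_t imm)
{
    c->gpr[d] = (uint32_t)(int32_t)imm;
}

/* add rD, rA, rB: rD = rA + rB (wraps modulo 2^32) */
static void add(cpu_t *c, int d, int a, int b)
{
    c->gpr[d] = c->gpr[a] + c->gpr[b];
}

/* addi rD, rA, imm: rD = (rA == 0 ? 0 : rA) + sign-extended imm */
static void addi(cpu_t *c, int d, int a, int16_t imm)
{
    uint32_t base = (a == 0) ? 0 : c->gpr[a];
    c->gpr[d] = base + (uint32_t)(int32_t)imm;
}

/* Replays the arithmetic part of Listing 1 and returns r4. */
uint32_t run_listing1_arith(void)
{
    cpu_t c = {0};
    li(&c, 4, 0x10);    /* r4 = 0x10 */
    li(&c, 5, 0x20);    /* r5 = 0x20 */
    add(&c, 4, 4, 5);   /* r4 = r4 + r5 = 0x30 */
    addi(&c, 4, 4, 5);  /* r4 = r4 + 5  = 0x35 */
    return c.gpr[4];
}
```

run_listing1_arith() returns 0x35, the same value GDB reports for r4 after the addi in the session shown later.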

The 'stwu' (store word and update) instruction is a bit tricky. The general format is:

stwu rS, d(rA)
The instruction stores the contents of register rS into the memory location whose effective address is d + rA, and at the same time updates rA to hold that effective address. Since general purpose register r1 serves as the stack pointer, 'stwu 1, -16(1)' stores the current value of the stack pointer at offset -16 from the current top of stack and decrements the stack pointer by 16. This both allocates a 16-byte frame and leaves a pointer to the previous frame (the "back chain") at its base. A sample interaction with 'gdb' shows that this is indeed the case.
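A toy C model of stwu shows the "store, then update" behaviour on real byte values. It assumes a made-up 64-byte memory window starting at 0x7ffff8c0 (chosen so that the stack addresses from the GDB session fit) and stores words big-endian, as ppc32 Linux does; there is no bounds checking in this sketch.

```c
#include <stdint.h>

#define MEM_BASE 0x7ffff8c0u   /* hypothetical start of the modelled memory */
#define MEM_SIZE 64u

static uint8_t mem[MEM_SIZE];
static uint32_t gpr[32];

/* Big-endian 32-bit store: most significant byte at the lowest address. */
static void store_word_be(uint32_t ea, uint32_t v)
{
    uint32_t off = ea - MEM_BASE;
    mem[off]     = (uint8_t)(v >> 24);
    mem[off + 1] = (uint8_t)(v >> 16);
    mem[off + 2] = (uint8_t)(v >> 8);
    mem[off + 3] = (uint8_t)v;
}

/* stwu rS, d(rA): EA = rA + d; store rS at EA; then rA = EA.
   rS is read before rA changes, so "stwu 1, -16(1)" saves the OLD r1. */
static void stwu(int s, int16_t d, int a)
{
    uint32_t ea = gpr[a] + (uint32_t)(int32_t)d;
    store_word_be(ea, gpr[s]);
    gpr[a] = ea;
}
```

Setting gpr[1] to 0x7ffff8e0 and calling stwu(1, -16, 1) leaves gpr[1] at 0x7ffff8d0 with the bytes 7f ff f8 e0 stored there, which matches what 'x/4xb $r1' prints in the GDB session.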

What remains is the instruction 'blr' which should be read as 'branch to link register'. The Link Register (LR) is a special register which holds the return address during a subroutine call. Our 'main' was called from a standard library 'start' routine; LR will have the address of the instruction which main should return to. Doing a 'blr' will result in execution getting transferred to the address contained in the Link Register.

Using GDB to trace programs

The GNU Debugger helps us single-step through assembly language programs; we will also be able to examine the contents of memory locations and registers after executing each instruction. First, we have to compile the program like this:

cc -g listing1.s
and invoke gdb:
gdb ./a.out
Here is a sample interaction with GDB:
Breakpoint 1, main () at listing1.s:5
5		li 4, 0x10
Current language:  auto; currently asm
(gdb) s
6		li 5, 0x20
(gdb) s
7		add 4, 4, 5 
(gdb) p/x $r4
$1 = 0x10
(gdb) p/x $r5
$2 = 0x20
(gdb) s
8		addi 4, 4, 5
(gdb) p/x $r4
$3 = 0x30
(gdb) s
9		stwu 1, -16(1)
(gdb) p/x $r4
$4 = 0x35
(gdb) p/x $r1
$5 = 0x7ffff8e0
(gdb) x/4xb 0x7ffff8e0-16
0x7ffff8d0:	0x7f	0xff	0xf9	0x44
(gdb) s
main () at listing1.s:10
10		addi 1, 1, 16
(gdb) p/x $r1
$6 = 0x7ffff8d0
(gdb) x/4xb $r1
0x7ffff8d0:	0x7f	0xff	0xf8	0xe0
(gdb) p/x $lr
$7 = 0xfebf100
(gdb) s
main () at listing1.s:11
11		blr
(gdb) s
0x0febf100 in __libc_start_main () from /lib/libc.so.6
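The 's' (step) command executes one source line; since each line of listing1.s holds a single instruction, this steps through the program one instruction at a time ('stepi', or 'si', steps by machine instruction in any program). We can print the value of a register, say GPR4, with 'print $r4', or with 'p/x $r4' to print it in hex. Memory can be examined with the 'x' command: 'x/4xb ADDR' displays the four bytes starting at ADDR in hex. Because the PowerPC is big-endian, the bytes 0x7f 0xff 0xf8 0xe0 at the new top of stack are the old stack pointer value 0x7ffff8e0, most significant byte first. We note that executing the 'blr' instruction transferred control to location 0x0febf100, the address the Link Register (LR) was holding.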

The GDB command 'disas' (short for 'disassemble') displays the code in a more readable form, with each instruction's address, its offset from the start of the function, and registers shown by name. Here is the output obtained by running 'disas main':

(gdb) disas main
Dump of assembler code for function main:
0x100003d0 <main+0>:	li	r4,16
0x100003d4 <main+4>:	li	r5,32
0x100003d8 <main+8>:	add	r4,r4,r5
0x100003dc <main+12>:	addi	r4,r4,5
0x100003e0 <main+16>:	blr

The 'objdump -d' command can also be used to disassemble the machine code in an executable.

Subroutine Call

Branching to a subroutine results in the return address being stored in the Link Register - if this subroutine calls another one, the current address in LR will be lost, unless it is saved on the stack. Listing 2 shows a simple C program and Listing 3 is part of its assembly language translation obtained by running:

gcc -S -fomit-frame-pointer listing2.c
Let's try to work out the code line by line.

The first line of 'main' decrements the stack pointer by 16 and stores the old value at the new top of stack. We are basically building a stack frame to hold the local variables defined within the function. Let's say the initial value of the stack pointer is 1000; after the first line, it becomes 984. The next instruction, 'mflr 0', copies the contents of the link register to general purpose register 0, which is then stored onto the stack by the 'stw' instruction at the address found by adding 20 to the stack pointer register r1 (i.e., location 1004).

The next two lines copy the number 3 to r0 and then store it at the location whose effective address is computed by adding 8 to the contents of r1 (i.e., location 992); this is the variable 'm' defined in our C program. The 'load word and zero' (lwz) instruction then loads register r3 with the value of 'm', and the code executes a 'branch and link' to function 'fun'. The 'bl' instruction transfers control to 'fun' and at the same time loads the Link Register with the address of the instruction immediately after the 'bl' in 'main'. The old value of LR (the address 'main' is to return to) is overwritten, but that is not a problem because we have already saved it on the stack.

The function 'fun' sets up its own stack frame and copies the value it received in register r3 onto the stack, thereby creating the local variable 'x'. This value is then loaded into r9, incremented into r0 and copied to r3 (the instruction 'mr 3, 0' copies the value in r0 to r3), since r3 carries the return value. The function returns by doing a 'blr'; the stack pointer is adjusted back to its initial value before the return is executed.

Back in 'main', the value in r3 (the return value) is copied to the variable 'm' stored on the stack. The old value of the link register saved on the stack is copied to r0 after which the 'mtlr' (move to link register) instruction transfers it to LR. The function then returns by doing a 'blr'. The entire sequence of events can be understood clearly by stepping through the code using GDB.
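The whole call sequence can also be traced on paper with a toy C machine. This sketch follows the steps described above for 'main' and starts the stack pointer at 1000 as in the text; memory is a byte array indexed directly by address. Several details are assumptions, not taken from Listing 3: the big-endian helpers, the hypothetical return address RET_IN_MAIN, and the offset 8 used for 'x' inside 'fun'.

```c
#include <stdint.h>
#include <string.h>

static uint8_t ram[2048];     /* toy memory, indexed by address */
static uint32_t r[32];        /* general purpose registers */
static uint32_t lr;           /* link register */

#define RET_IN_MAIN 0x1234u   /* hypothetical address just after "bl fun" */

static void stw(uint32_t v, uint32_t ea)   /* big-endian store word */
{
    ram[ea]     = (uint8_t)(v >> 24);
    ram[ea + 1] = (uint8_t)(v >> 16);
    ram[ea + 2] = (uint8_t)(v >> 8);
    ram[ea + 3] = (uint8_t)v;
}

static uint32_t lwz(uint32_t ea)           /* big-endian load word */
{
    return (uint32_t)ram[ea] << 24 | (uint32_t)ram[ea + 1] << 16 |
           (uint32_t)ram[ea + 2] << 8 | ram[ea + 3];
}

static void stwu_sp(int32_t d)             /* stwu 1, d(1) */
{
    uint32_t ea = r[1] + (uint32_t)d;
    stw(r[1], ea);                         /* save the old stack pointer */
    r[1] = ea;                             /* then move r1 down */
}

static void fun(void)
{
    stwu_sp(-16);              /* fun's own frame */
    stw(r[3], r[1] + 8);       /* local x = incoming argument (offset assumed) */
    r[9] = lwz(r[1] + 8);      /* copy x into r9 */
    r[0] = r[9] + 1;           /* increment, result in r0 */
    r[3] = r[0];               /* mr 3,0: return value goes in r3 */
    r[1] += 16;                /* restore the stack pointer */
    /* blr would now jump to the address in lr (RET_IN_MAIN) */
}

/* Returns the final value of m; caller_lr plays the role of the
   return address that main received from the C start-up code. */
uint32_t trace_main(uint32_t caller_lr)
{
    memset(ram, 0, sizeof ram);
    r[1] = 1000;
    lr = caller_lr;

    stwu_sp(-16);              /* r1: 1000 -> 984 */
    r[0] = lr;                 /* mflr 0 */
    stw(r[0], r[1] + 20);      /* LR saved at 1004 */
    r[0] = 3;                  /* li 0,3 */
    stw(r[0], r[1] + 8);       /* m = 3, stored at 992 */
    r[3] = lwz(r[1] + 8);      /* lwz 3,8(1): argument for fun */
    lr = RET_IN_MAIN;          /* bl fun overwrites LR ... */
    fun();
    stw(r[3], r[1] + 8);       /* m = return value */
    r[0] = lwz(r[1] + 20);     /* ... so reload the saved copy */
    lr = r[0];                 /* mtlr 0 */
    r[1] += 16;                /* r1 back to 1000 */
    return lwz(992);
}
```

After trace_main(0x0febf100) returns 4, lr again holds 0x0febf100 and r[1] is back at 1000, mirroring how 'main' recovers its own return address from location 1004.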

Invoking System Calls

The 'arch/ppc/kernel/misc.S' file under the PPC32 Linux kernel source tree defines a data structure called a 'sys_call_table' which holds pointers to all the system calls defined in the kernel. Here is a part of this array:

	.data
	.align 4
_GLOBAL(sys_call_table)
	.long sys_restart_syscall /* 0 */
	.long sys_exit
	.long ppc_fork
	.long sys_read
We note that the address of the fork handler (ppc_fork) is stored in slot 2 of this array, so the system call number of 'fork' is 2. It's possible to write a simple assembly language program which invokes 'fork': the idea is to execute the 'sc' instruction after storing the call number in r0 and any arguments in r3, r4, etc. Listing 4 demonstrates the idea. After invoking 'fork', the program loops forever using the branch instruction 'b'. If we run 'ps ax' on another console, we can see two copies of 'a.out' running, proof that 'fork' has indeed been invoked!
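Since sys_call_table is just an array of function pointers indexed by the number placed in r0, the dispatch idea can be sketched in C. The four handlers below are hypothetical stand-ins, not the kernel's sys_exit or ppc_fork, and they take a single argument for simplicity; in this toy an out-of-range number yields -38, the negated value of Linux's ENOSYS.

```c
/* Toy system call table: slot number == system call number. */
typedef long (*syscall_fn)(long arg0);

static long toy_restart(long a) { (void)a; return 0; }
static long toy_exit(long a)    { return 100 + a; }
static long toy_fork(long a)    { (void)a; return 42; }
static long toy_read(long a)    { return a * 2; }

static const syscall_fn toy_call_table[] = {
    toy_restart, /* 0 */
    toy_exit,    /* 1 */
    toy_fork,    /* 2 */
    toy_read,    /* 3 */
};

/* Use the value from "r0" as the index and pass "r3" as the argument. */
long toy_dispatch(unsigned long r0, long r3)
{
    if (r0 >= sizeof toy_call_table / sizeof toy_call_table[0])
        return -38;            /* unknown call number */
    return toy_call_table[r0](r3);
}
```

For example, toy_dispatch(2, 0) reaches toy_fork through slot 2. From ordinary C code, glibc's syscall() function (declared in <unistd.h>) performs the real register setup and executes 'sc' for you.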

Taking the address of a variable

Listing 5 is a simple C program in which we store the address of a local variable in a pointer and then dereference the pointer to modify the pointed-to object. Listing 6 is part of the assembly language translation obtained by calling 'gcc -S -fomit-frame-pointer'. The assembly code is not doing anything special: the address of the local variable is simply the stack pointer r1 plus a fixed offset. The situation is a bit different when we try to take the address of a global object. Because each instruction (opcode + operands) is encoded in 32 bits, it is impossible to store a 32-bit address as an operand of a PowerPC instruction; we have to split the address into two 16-bit parts and combine them in a 32-bit register using two instructions. Listing 7 is part of the assembly output that we would get if the variable 'i' in Listing 5 had been defined as a global.
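To generate code like Listings 6 and 7 yourself, a program in the same spirit as Listing 5 (our own, not the original listing) can be compiled with 'gcc -S -fomit-frame-pointer'. In local_demo the pointer should be formed from r1 plus an offset; global_demo uses a hypothetical global named global_i and should produce a two-instruction lis/la sequence like the one discussed below.

```c
int global_i;                 /* hypothetical global: address needs two instructions */

int local_demo(void)
{
    int i = 10;               /* local: lives in this function's stack frame */
    int *p = &i;              /* address = stack pointer (r1) + offset */
    *p = 20;                  /* write through the pointer */
    return i;
}

int global_demo(void)
{
    int *p = &global_i;       /* address assembled from high and low halves */
    *p = 20;
    return global_i;
}
```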

Let's compile the code into an 'a.out' and execute the command:

nm ./a.out
'nm' shows the names and addresses of all the globally visible symbols in your program; on my machine, the variable 'i' was assigned the address 0x100108a8. Split into two parts, the most significant 16 bits are 0x1001 (4097 in decimal) and the least significant 16 bits are 0x08a8 (2216 in decimal). Coming back to the assembly language program, the two lines we are interested in are:
lis 9, i@ha
la 0, i@l(9)
A disassembled listing of the program displays these lines as:
lis r9, 4097
addi r0, r9, 2216
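The assembler has encoded the 'load address' (la) instruction as an 'addi', which has the same effect; la is one of many simplified mnemonics that the assembler maps onto a more general instruction, something very common in PowerPC assembly programming. The notation 'i@l' yields the lower 16 bits of the address of 'i', and 'i@ha' yields the higher 16 bits, "adjusted": because addi sign-extends its 16-bit immediate, a low half of 0x8000 or more would effectively be subtracted, so @ha adds one to the high half in that case to compensate. For 'i', the low half 0x08a8 is below 0x8000, so no adjustment is needed. The 'load immediate shifted' (lis) instruction loads 4097 into the upper 16 bits of r9, clearing the lower 16 bits, and the addi then adds the lower 16 bits of the address of 'i' to produce the full address in r0.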
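The @ha/@l split and its reassembly by lis and addi can be checked with a few lines of C. The function names are hypothetical; the arithmetic follows the rule that addi sign-extends its 16-bit immediate.

```c
#include <stdint.h>

/* Like i@ha: high 16 bits, plus one when bit 15 is set, so that adding
   the sign-extended low half gives back the original address. */
uint16_t ppc_ha(uint32_t addr)
{
    return (uint16_t)((addr + 0x8000u) >> 16);
}

/* Like i@l, viewed the way addi interprets it: a signed 16-bit value. */
int32_t ppc_lo(uint32_t addr)
{
    int32_t lo = (int32_t)(addr & 0xffffu);
    return lo >= 0x8000 ? lo - 0x10000 : lo;
}

/* lis rT, ha ; addi rD, rT, lo  ==>  (ha << 16) + sign-extended lo */
uint32_t ppc_rebuild(uint16_t ha, int32_t lo)
{
    return ((uint32_t)ha << 16) + (uint32_t)lo;
}
```

For the address of 'i' reported by nm, ppc_ha(0x100108a8) is 4097 and ppc_lo(0x100108a8) is 2216, the two constants in the disassembly. For an address such as 0x10018000, ppc_ha gives 0x1002 and ppc_lo gives -32768, and ppc_rebuild still recovers the original address; that case is exactly why @ha exists.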

Smashing the stack

PowerPC programs too are vulnerable to the same buffer overflow attacks so common on x86 architectures. A function saves the contents of the Link Register on the stack if it calls some other function; it is very easy to overflow a buffer and overwrite this return address. Listing 8 shows a C program in which we overflow the buffer 'a' - we are adding 12 to the contents of a[13]. Reading the assembly code produced by doing

cc -S -fomit-frame-pointer listing8.c
tells us that a[13] refers to the memory location where the Link Register has been saved. Adding 12 to it makes the function return to its caller (main) with the next 3 instructions skipped ('m++' takes 3 instructions and each instruction is encoded in 4 bytes, so 3 × 4 = 12 bytes). So the program prints 57 instead of 58. Listing 9 shows the relevant assembly code segment.

Further Reading

We have just had a glimpse into the fascinating world of assembly language programming. Readers interested in Computer Architecture should refer to the book 'Computer Organization and Design - The Hardware/Software Interface' by Patterson and Hennessy to get some idea of the amazing techniques used by microprocessor designers to convert a slice of silicon into a marvel of engineering.

IBM developerWorks routinely publishes articles on POWER CPU architecture and programming. If you would like to learn more about PowerPC assembly programming, start with the introduction at http://www-106.ibm.com/developerworks/library/l-ppc/. There is also a short article on the Mac Mini from an embedded perspective at http://www-128.ibm.com/developerworks/power/library/pa-macmini1/. If you wish to load Debian on your Mac Mini, you might like to consult http://www.sowerbutts.com/linux-mac-mini/.

Many PowerPC CPUs come with a fast SIMD unit called AltiVec; refer to http://www-128.ibm.com/developerworks/power/library/pa-unrollav1/ to learn more about AltiVec optimizations. If you are developing multithreaded applications, http://www-128.ibm.com/developerworks/library/pa-atom/ will tell you how to implement atomic operations in PPC assembly.

 


As a student, I am constantly on the lookout for fun and exciting things to do with my GNU/Linux machine. As a teacher, I try to convey the joy of experimentation, exploration, and discovery to my students. You can read about my adventures with teaching and learning here.

Copyright © 2005, Pramode C.E. Released under the Open Publication license unless otherwise noted in the body of the article. Linux Gazette is not produced, sponsored, or endorsed by its prior host, SSC, Inc.

Published in Issue 117 of Linux Gazette, August 2005
