
[oc] Re: Processor Instruction reply for Andreas



On Mon, Dec 10, 2001 at 06:31:57PM -0000, Paul McFeeters wrote:
> Andreas,
> 
> > > Old Way:
> > >
> > > 	CMP AX,$2000
> > > 	JC  #4000
> > >
> > > Two instructions to do that? Why use two clock cycles for that? Can't we
> > > just have a CMPJxx instruction? Takes two values/register combinations
> > > and based upon the results of the comparisons jumps to the destination
> > > or not.
> 
> > There are two things to be done.  A subtraction and a conditional jump.
> > Just because you put them in the same instruction doesn't make them in
> > one cycle.  If they do execute in one cycle, I guess the cycles on that
> > machine are generally longer.
> 
> Wrong. I tested it out and it makes simple sense. When a compare is done
> then the HDL copies the value of '1' into the relevant flag. As any HDL
> programmer will tell you copying a 32bit value versus 1 bit doesn't take a
> ps longer so my way is quicker.

Ok, I can see that.  Provided that your CPU has a separate unit to
calculate branch targets (so probably not on a simple CPU that uses its
one ALU for determining the target address).
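
A rough C sketch of why the second unit matters, assuming a hypothetical
fused compare-and-jump with a PC-relative target (the function name, the
signed less-than condition and the word-addressed PC are all made up for
illustration): the comparison and the target address are two independent
computations, and both results are needed before the next PC can be
chosen.

```c
#include <stdbool.h>
#include <stdint.h>

/* One step of a hypothetical fused compare-and-jump (the CMPJxx idea).
 * Returns the next program counter; the PC counts instructions here. */
uint32_t cmpj_less_than(uint32_t pc, int32_t a, int32_t b, int32_t offset)
{
    /* Unit 1: the comparator (stands in for the ALU subtraction). */
    bool taken = a < b;

    /* Unit 2: a separate adder for the branch target.  Unsigned
     * arithmetic wraps, so negative offsets jump backwards. */
    uint32_t target = pc + (uint32_t)offset;

    /* Both results must be ready to select the next PC. */
    return taken ? target : pc + 1;
}
```

With only one ALU the two computations have to be done one after the
other, which is where the second cycle comes from.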

> > > This might also eliminate quite a few status bits which may make the ILP
> > > much easier?
> 
> > So you'd always need an extra compare instead of just using the status
> > bits set by previous arithmetic.  Overall a slower architecture, I
> > assume.
> 
> I don't remember any language except assembler doing a IF...THEN statement
> like they do. If you write it out in as a high level language it looks
> really silly doesn't it?

Looking at some compiler output it seems that cmp/test + conditional
branch is more common than other combinations.  That was in some driver
code however.

> SF is Status Flag
> Value1 and Value2 can obviously be immediate values, registers or memory
> 
> SF = IF Value1 < Value2
> IF SF = 1 THEN
> 	PC = JumpPC ;
> 
> I have never seen (or personally written) an assembler program that waits
> to test the bits set from a 'much' previous compare operation. As any
> instruction could (conceivably) alter the status flags the rule in
> assembler programming is test and act.

I didn't mean much earlier.  If you have (in C) something like

	x = something + otherthing;
	if (x == 0) {
		...
	}

then doing a test after the addition is a bit silly.  On m68k a move
sets the flags, so you can also conditionally branch right after a
register load/save.  But of course all that only works on test against
zero/negative, so it isn't as common as I guessed (especially not in
driver code).
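
To make the flag reuse concrete, here is a minimal C model, assuming a
made-up status register with only a zero and a negative flag (the struct
and function names are hypothetical, not any real machine's).  The add
updates the flags as a side effect, so the branch can use them without a
compare in between.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status register: only the zero and negative flags. */
struct flags {
    bool zero;
    bool negative;
};

/* ADD that updates the flags as a side effect. */
int32_t add_set_flags(int32_t a, int32_t b, struct flags *f)
{
    /* Add in unsigned arithmetic so overflow wraps instead of being
     * undefined behaviour. */
    int32_t r = (int32_t)((uint32_t)a + (uint32_t)b);
    f->zero = (r == 0);
    f->negative = (r < 0);
    return r;
}

/* Branch on the zero flag; returns the next program counter. */
uint32_t branch_if_zero(const struct flags *f, uint32_t pc, uint32_t target)
{
    return f->zero ? target : pc + 1;
}
```

A comparison against anything other than zero still needs its own
compare, which is the limitation mentioned above.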

> You have a value of 10 in register A, you want to add 5 to it and store the
> result in register B, how many CPU cycles does it take?
> 
> Currently 2 but in mine only one as my MOV instructions can also modify the
> data in various ways before it arrives at its destination.
> 
> MOV+, A, 5, B

Come on, that isn't the freshest idea.  It's a three operand
architecture and many RISC machines implement that.  So instead of a
confusing MOV+ it's actually something like

	ADD	r1,#5,r2

If you aren't familiar with the ARM instruction set I suggest you take a
look at it.  It implements some funky features like all instructions
being conditional (not just branches, many branches can in fact be
omitted), it's three operand and it can shift/rotate one operand in an
instruction (it doesn't even have a dedicated shift instruction, moves
with modifiers do just fine).

I haven't actually worked with ARMs, but this is one of the coolest
instruction sets I know.
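
As an illustration only, the effect of one such instruction can be
modelled in C like this.  The function name and signature are
hypothetical, and the model covers just a conditional three operand add
with a left-shifted second operand, not the real encoding or the other
shift types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of a single conditional instruction:
 *   if (cond) rd = rn + (rm << shift);
 * When the condition fails the destination keeps its old value. */
uint32_t cond_add_shifted(bool cond, uint32_t rd_old,
                          uint32_t rn, uint32_t rm, unsigned shift)
{
    assert(shift < 32);   /* shifting by 32 or more is undefined in C */
    if (!cond)
        return rd_old;    /* instruction is skipped, no branch needed */
    return rn + (rm << shift);
}
```

With rn and rm both holding the same value and a shift of 2 this is a
multiply by 5 in one instruction, and the condition replaces a short
forward branch.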

> My new commands are all designed to still execute in exactly the same clock
> cycle as there wouldn't be any point in developing an instruction to replace
> 2 instructions and then taking twice as long now would there?

If you have opcode space to spare it can be a good idea.  I still
maintain that letting instructions grow excessively to implement
inflexible combinations is bad.  Your saved CPU cycles will be wasted on
memory cycles (if you intend to go faster than 60MHz, that is, and don't
have all your memory as fast and expensive SRAM).

It's the CPU's task to reorder and combine instructions.  Or think of it
this way: why define a 64 bit opcode to allow cmp and branch in one
instruction instead of having that in two 32 bit instructions?  If the
current instruction is a compare and the CPU just checks if the next one
is a branch, it can execute them in one.  That's relatively simple (at
least not as complicated as a full instruction scheduler) and doesn't
require you to sledgehammer your instruction architecture into shape.
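
A minimal sketch of that check in C, assuming a toy instruction set with
made-up opcode names: the front end looks one instruction ahead and
issues a compare plus the conditional branch right after it as a single
slot.  It only counts issue slots; it does not model timing or
execution.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy ISA, not taken from any real machine. */
enum opcode { OP_ADD, OP_CMP, OP_BRANCH_COND, OP_OTHER };

/* Count issue slots when a compare immediately followed by a
 * conditional branch is fused into one slot. */
size_t count_issue_slots(const enum opcode *prog, size_t n)
{
    assert(prog != NULL || n == 0);

    size_t slots = 0;
    size_t i = 0;
    while (i < n) {
        if (prog[i] == OP_CMP && i + 1 < n &&
            prog[i + 1] == OP_BRANCH_COND) {
            i += 2;   /* the fused pair consumes two instructions */
        } else {
            i += 1;   /* everything else issues on its own */
        }
        slots++;      /* either way only one slot is used */
    }
    return slots;
}
```

The instruction encoding stays at two ordinary instructions; only the
decoder has to know about the pairing.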

-- 
Andreas Bombe <bombe@informatik.tu-muenchen.de>    DSA key 0x04880A44