
Re: [oc] Intel Hyper-threading and CPU design, threads etc...




----- Original Message -----
From: "Paul McFeeters" <paul.mcfeeters@ntlworld.com>
To: <cores@opencores.org>
Sent: Friday, December 07, 2001 11:57 PM
Subject: [oc] Intel Hyper-threading and CPU design, threads etc...


> Jim,
>
> Surely people have already done this "two independent processors on one
> chip"?

Please read my message more closely. What I am proposing is completely
different from Intel's Hyper-threading dual processor in one chip carrier.
Intel's chip would be of absolutely no use on a Win98 system. Their chip
looks like two processors. For their chip to function you need an
SMP-capable operating system (e.g. WinNT).

My proposal choreographs multiple dependent processors in the guise of
a single processor. In this manner a single-processor O/S operates under
the impression that there is a single processor, while some sleight of hand
gets significant portions of the code to run on multiple processors.

> I've already expressed my wish to load 4 copies of a processor core
> into one FPGA,

Look at the Altera site. They show examples of placing many of their NIOS
processors in one core.

Again, the point is not simply more processors in one core. The end product
that I propose to make, using a technology I am not willing to disclose at
this time, permits DOS, Win3.11, Win9x, WinME and WinXP (standard) to run
the O/S and application code as-is, but with effective utilization of
multiple processors. The technique relies on advantages gained from the
virtualization of the processor.


> just waiting for the Damjan to tell me that the OpenRISC
> project is small enough to fit two cores on a 200K Spartan II. Of course
> the problem still comes to concurrent memory access but even that can be
> almost eliminated very cheaply with a little thought. The idea won't
> guarantee every CPU gets 100% zero-wait-state service but I would guess at
> around 90+% efficiency for each processor up to say 16 processors. I ran a
> test a while back on a 4way Xeon box with multithreading and then took 3
> processors out and ran the same test. The 4 processors managed to complete
> the tasks 2.4 times quicker than the single processor so you lose approx.
> 40% of clock cycles due to memory conflicts? When (evil me) rewrote the
> test software to be cruel and have every processor hitting the same memory
> block alternatively reading and writing to it the 4 processors managed 1.8
> times quicker than the single processor, loss of 55% clock cycles there.
> The 4 CPU box with the extra processors was over 10 times the price of a
> single CPU box, great value for money there! lol

You make a very good argument for why the current design of an SMP system
built around the Intel design (even considering Hyper-Threading) shows
the problem of diminishing returns. The MP technique I designed does not
use anything like the Intel design. There will be diminishing returns on my
design as well. However, the slope of the curve (or staircase) will be much
shallower.
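
To put numbers on the diminishing returns in the 4-way Xeon test quoted
above, here is a minimal sketch of the arithmetic. It assumes "efficiency"
means measured speedup divided by processor count; the function name is
made up for illustration and is not from either message.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Fraction of the ideal (linear) speedup that was actually achieved."""
    return speedup / processors


# Speedup figures as reported in the quoted 4-way Xeon test.
reported = [("independent work", 2.4), ("shared memory block", 1.8)]

for label, speedup in reported:
    eff = parallel_efficiency(speedup, 4)
    print(f"{label}: efficiency {eff:.0%}, lost {1 - eff:.0%}")
```

Under that assumption the two runs come out at 60% and 45% efficiency,
which matches the roughly 40% and 55% losses quoted above.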

Setting my design enhancements aside, look at the starting point of the
Transmeta design, and leave out the L1, L2, L3, ... memory caching
techniques, which can be applied to both the Transmeta and Intel (AMD)
designs. There is a dissimilarity that can be exploited in the new design.
In the Transmeta case the code is morphed into a separate memory space and
then executed from there by their VLIW processor. If the multiple internal
strand processors each had their own separate morphed code area, then each
processor could execute with little interference from the others'
processing.

>
> The alternative is for people to use the transputer idea, each CPU has a
> local memory and jobs get passed to it via messages on its high speed
> comms links.

In practice, with the new process, this is what would happen. Except that
the communication path would not be that of a high speed serial link.

>
> We do need smarter processors but surely they would work better with
> smarter instructions too?

The problem is that you have over 100 million computers running out there.
New instructions generally mean new coding. And 100+ million computers
is just too much dead weight to push or shove in a different direction.

The process I've envisioned does not require a change.

> I've got a lot of CPU enhancement ideas but I have to
> concentrate on Buffy-C at the moment so can't implement them. I'll give
> you a quick demonstration though.
>
> Old Way:
>
> CMP AX,$2000
> JC  #4000
>
> Two instructions to do that? Why use two clock cycles for that? Can't we
> just have a CMPJxx instruction?

You are looking at too narrow a scope, and making the common mistake of
assuming that a CISC system executes instructions in the order presented.
This is usually not the case. A well-designed CISC processor may move the
instruction preceding the CMP so that it begins executing after the CMP and
before the JC. Before that, it may already have loaded the address of the
JC from memory into a register, estimated whether the JC is likely to be
taken and, depending upon that estimate, fetched and begun executing the
instruction following the JC to boot.
Instruction execution is not always as it appears.
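
As an illustration of what "estimating" a conditional jump can look like,
here is a small sketch of a two-bit saturating counter, a textbook
branch-prediction scheme. This is an assumption chosen for illustration:
the message does not say which scheme any particular processor uses, and
the class and function names are hypothetical.

```python
class TwoBitPredictor:
    """Two-bit saturating counter: states 0-1 predict not taken, 2-3 taken."""

    def __init__(self) -> None:
        self.state = 1  # start at "weakly not taken" (arbitrary choice)

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_correct(outcomes: list[bool]) -> int:
    """Run one predictor over a branch history and count correct guesses."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct


# A loop-closing jump: taken nine times, then falls through once.
history = [True] * 9 + [False]
print(count_correct(history), "of", len(history), "predicted correctly")
```

With this history the predictor misses only the first jump and the final
fall-through, so the processor's speculative fetch past the JC would be
right most of the time.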

Jim Dempsey

