
Re: [oc] Beyond Transmeta...



And have you ever seen the web site http://www.alphaprocessors.com ?

----- Original Message ----- 
From: Suboner@a...  
To: cores@o...  
Date: Wed, 7 Jun 2000 04:18:49 EDT 
Subject: Re: [oc] Beyond Transmeta... 

> 
> 
> > Yes, that's true - I haven't looked at it this way...
> > So if I understand you correctly, you are trying to calculate each
> > 'bit plane' as fast as possible (from its point of view). I suppose
> > this could theoretically be done faster than calculating 32-bit
> > operands, even for sequential programs.
> > But I am worried that programs would be extremely large.
> 
> Yeah, that is probably one big issue, except when the network is
> configured to work like an x86 or RISC processor. Then what you have is
> a large chunk of memory being used to create the necessary hardware in
> the network itself, so that it can run the software in a normal serial
> manner. This would decrease memory usage but also decrease performance
> (sound familiar? :) the old fight between memory and performance).
> 
> > In how many cycles can you execute this? How many instructions do
> > you need?
> > c = add32(a,b)
> > e = add32(c,d)
> 
> Well, in a network manner with a minimum of 8 1-bit processors, it
> would be like 32 clocks; fewer processors would increase that. But the
> minimum required could be only 1 clock (depending on which bits
> change) if only the first bit changes, or 2 clocks if only 1 bit
> changes. Of course, if the bit causes a turnover with many carries,
> that will increase it. The number of instructions is 1 for the first
> bit, 3 for the second bit and 4 for every following bit. Of course
> there are other ways to arrange a network. I believe that one is the
> most parallel I could create. I may be able to create an even more
> parallel one, but it may not gain much in performance. It would be a
> kind of temporal difference, in that the initial pass could have 63
> instructions to be done in parallel (2 for every bit except the
> last), and every pass after that is 2 instructions. There are many
> ways to configure a network of bits, and each might have benefits
> over the others.
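
To make the "only work on bits that change" idea concrete, here is a minimal sketch in Python. It assumes a plain ripple-carry layout of 1-bit full-adder cells, which is my assumption and not necessarily the network described above, so the counts will not necessarily match the instruction figures quoted. The names incremental_add, settled_carries and full_adder are made up for illustration. Each "clock" re-evaluates, in parallel, only the cells whose inputs changed; a cell whose carry-out changes wakes the next cell on the following clock.

```python
WIDTH = 32  # operand width in bits, as in add32


def bit(x, i):
    """Return bit i of integer x."""
    return (x >> i) & 1


def full_adder(a, b, cin):
    """One 1-bit cell: returns (sum bit, carry out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def settled_carries(a, b, width=WIDTH):
    """Carry chain of a + b once the network is stable (carry into bit 0 is 0)."""
    carries = [0] * (width + 1)
    for i in range(width):
        _, carries[i + 1] = full_adder(bit(a, i), bit(b, i), carries[i])
    return carries


def incremental_add(old_a, old_b, new_a, new_b, width=WIDTH):
    """Start from the settled sum old_a + old_b, switch the inputs to
    new_a and new_b, and re-evaluate only the cells that need it.
    Returns (sum, clocks, cell_evaluations)."""
    carries = settled_carries(old_a, old_b, width)
    changed = (old_a ^ new_a) | (old_b ^ new_b)
    active = {i for i in range(width) if bit(changed, i)}
    clocks = 0
    evaluations = 0
    while active:
        clocks += 1
        evaluations += len(active)
        snapshot = list(carries)  # every cell reads carries from the clock start
        next_active = set()
        for i in active:
            _, cout = full_adder(bit(new_a, i), bit(new_b, i), snapshot[i])
            if cout != carries[i + 1]:
                carries[i + 1] = cout
                if i + 1 < width:
                    next_active.add(i + 1)  # neighbour sees a new carry-in
        active = next_active
    total = 0
    for i in range(width):
        s, _ = full_adder(bit(new_a, i), bit(new_b, i), carries[i])
        total |= s << i
    return total, clocks, evaluations


print(incremental_add(0, 0, 1, 0))                    # (1, 1, 1)
print(incremental_add(0xFFFFFFFF, 0, 0xFFFFFFFF, 1))  # (0, 32, 32)
```

The first call changes only the lowest bit with no carry, so one cell is evaluated in one clock. The second is the turnover case: the carry ripples through every bit, so it takes 32 clocks.
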
> 
> > BTW: I am not such a pessimistic guy trying to criticise everything.
> > When we were developing or2k, such 'comments' were very welcome.
> 
> Well, I was not so sure you were... Actually this conversation has been
> good; there are some things I did not realize about this that were
> brought up through this discussion, like the multiprocessor way of
> viewing the network. If it were not for the questions, I would not have
> tried to look at things in different ways.
> 
> > I suppose you can link data back to the loop start, can't you? In
> > parallel you can detect whether the loop should be finished. Of
> > course this isn't a normal equation anymore...
> > But otherwise I don't see a problem here.
> 
> Oh, yeah... ha... I did not think of that. :) 
> 
> > Yes, that is true for basic blocks, but not for functions. The
> > compiler would create a separate network for each function (there
> > are too many problems otherwise). You cannot link them dynamically
> > together. I won't go into a detailed explanation; even when detecting
> > parallelism between functions there are certain problems, unless the
> > program (the meaning of the program) itself is modified. ILP here
> > stands for inductive logic programming.
> 
> What I'm getting at, though, is not to modify the program's source
> code, but to compile it into a network and to shift the network around
> into a more parallel program. The way I think it would work is to shift
> parts of the network which compose a function into other functions, so
> that as you shift them around they sort of lose themselves as a
> discrete and separate function and instead become an integrated part.
> 
> I'm not exactly sure what you mean by creating separate networks for
> each function. You could mean creating a separate network for each
> function call/usage, which I think you do. For that, I would say that
> you do not necessarily have to do it that way. You could create a
> network that acts like a mini high-level processor (high-level
> instructions), which will reduce the redundant creation of function
> networks by allowing the function to be called in a high-level
> instruction. Instead of being directly connected to each other, they
> would act like many mini processors connected together. If you want to
> think about it differently, try thinking about it as though you could
> either have billions of 1-bit processors working simultaneously
> (virtually, of course, since they only work on bits that change), or
> you could have a few 8-bit processors (faked by the network) that do
> various tasks within the system, or you could have even fewer 32-bit
> processors doing various tasks, or you could have one 64-bit processor
> that does everything (like a normal CPU), the latter taking up the
> least amount of memory, while the 1-bit network takes up the most
> memory. It's a really scalable environment that allows you to create
> any kind of processor that is necessary. It will turn your functions
> into a processor if it needs to; it will basically balance between
> consuming a lot of memory and resources and taking very little memory
> or resources. If it was not for this discussion I don't think I would
> have realized that.
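
The trade-off between one network per call and a shared "mini processor" can be put into a toy cost model. This is only a sketch under loud assumptions: the calls are independent, a replicated network runs all of them in a single pass, a shared unit handles one call per pass, and routing and control overhead are ignored. The names replicated_cost and shared_cost and the cell counts are hypothetical, not taken from any real design.

```python
def replicated_cost(calls, cells_per_function):
    """One private copy of the function network per call site.
    Assumes all calls are independent and finish in one pass."""
    return {"cells": calls * cells_per_function, "passes": 1 if calls else 0}


def shared_cost(calls, cells_per_function):
    """One copy of the function network, invoked like a high-level
    instruction and reused for every call, one call per pass."""
    return {"cells": cells_per_function if calls else 0, "passes": calls}


# Hypothetical numbers: 100 calls to a function that needs 32 one-bit cells.
print(replicated_cost(100, 32))  # {'cells': 3200, 'passes': 1}
print(shared_cost(100, 32))      # {'cells': 32, 'passes': 100}
```

Under these assumptions the replicated layout spends memory to save time and the shared layout does the opposite, which is the balance described in the paragraph above.
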
> 
> Leyland Needham 
> 