I attended last year's Super Computing conference in Portland, Oregon (Nader country), which was quite delightful. The bit I liked most was the presence of industry, academia and the national labs, all showing off what kind of super computing they are selling or what they are doing with their super computer toys. Both the technical sessions and the show floor were entertaining in the content they presented. This was pretty much the same with this year's conference. Unfortunately, there was some overlap between what I saw last year and this year, so the "Wow, this is soooo coool" feeling wasn't quite there as it was last year. Be that as it may, there was some cool stuff which I want to pass on to those of you who don't have the money, time or both to get yourselves down to Dallas to attend the conference. (The conference fee is $700, which is quite on the steep side for the kind of conferences I attend. Then again, you end up paying at least twice that to go to the full COMDEX conference, and you get about 10 times more information out of the Super Computing conference than you get out of the more mainstream COMDEX shows.)
The show lasted 4 days, and I don't want to give you a full detailed account of what I saw; you'd quickly be off surfing to other sites if I did. What I'll do is focus on what I considered to be the highlights of the conference. And as you all know, I'm a bit of a Linux/Open Source/Free Software enthusiast (maybe I should rephrase that as "Free Software/Open Source/Linux enthusiast (FSOSLE)", to give proper credit to Richard Stallman), and thus I'll tend to concentrate on those topics.
The opening day of the conference was Tuesday, Nov 7th. Yup, voting
day. The key note speaker was Steven Wallach, the guy who helped
design the Data General 32-bit Eclipse MV superminicomputer, and is
now with Center Point venture capital firm. His talk was titled
"Petaflops in the Year 2009" and dealt with how he would envision the
Petaflop computers of the future. The main point of his talk was that
the basic core of the future Petaflop computer is being built right
now to service the backbone of the Internet. I must say, Steve Wallach
did convince me of his arguments. The basic problem right now is that
the chip manufacturers or CPU designers or whatever you call these
folk, are starting to reach physical boundaries imposed by Mother
Nature and her laws of physics which govern our universe. Moore's law
only goes so far and there is a barrier which is the speed of
light. It could be that some time in the future, one will be able to
use tachyons in some kind of semiconductor to operate transistors which
effectively switch faster than the speed of light. (Think about it,
with one of them in your PC, when you surf the Internet, you just
don't click from one hyper link to the next, you get to surf through
space-time. Click here to go to a chat room 2 days from now... Click
here to see the price of your stock 10 minutes from now...) Because of
these limitations, the bottlenecks which are forming are the ones
which limit the speed at which you can get data into and out of your
CPU. This is where the work being done by Lucent and others comes into
play. Lucent is trying to get terabytes of data per second through a
routing node. One has to do this by being able to guide the different
wavelengths of light from one input port on the router to an output
port on the router without slowing down the data rate. This
architecture of data in, data out and very high speeds is basically
the inner core of the processor design needed for future super
computer systems. Remember, super computer systems will never be made
up of one big, really fast CPU. They will be made up of many small
nodes, interconnected through some kind of data mesh. Therefore Steve
Wallach emphasizes that in order to break the last bottleneck in
current CPU designs, one needs to push the data around between CPU's
optically and not try to push it in and out electrically. The guys
building the backbone of the Internet are doing this, and thus the
guys building the next generation CPU's should be talking to the guys
over at Lucent. By the way, Steve mumbled something about how Linux
would be running on this Petaflop computer. Look for the announcement
on Slashdot sometime in the year 2009....
The next session after the keynote which I attended was the "Who wants to be a Billionaire" panel discussion. That's a stupid question, of course I want to be a billionaire. The panel discussion was headed up by the same guy who gave the key note, Steve Wallach. There were three guys on the panel. They were Scott Grout of Chorum, Matt Blanton of Startech and Jackie Kimzey of Sevin Rosen Funds. Scott Grout read his introductory comments and didn't say much else. Basically, he worked for some telecommunications company which went through the venture capital funding round and got itself established. Matt Blanton and Jackie Kimzey gave their remarks which again, I can't quite remember the details of. I'm too lazy to check my notes right now.
This concept is so foreign to government research, at least in the government laboratory environment which I work in. Our time is basically worthless and is seldom taken into account when we work on projects. A statement like "the quality of your team" rings rather hollow around here. I think we, as scientists, tend to devalue our time because of the tight job market for positions where one can freely do research with particle colliders. Thus you put up with the fact that Dr. so and so, who just received her Ph.D. in High Energy Physics, has to break out the RJ45 clamp and start cutting CAT-5 cable so that she can wire up the crate controllers for her experiment. That along with having to install and maintain her Linux cluster so that she can store and analyze her data. And forget that trip to XYZ conference: overtime has to be paid to the electricians, because if not, her experiment won't be ready on day one when the accelerator turns on and delivers her beam. And believe me, the unionized electricians only work on her experiment if she pays overtime. And then comes the kicker. "Sorry, you spent too much time developing software and hardware and not enough time doing science. Look at your publication record, it stinks! No tenure for you. Go find a job somewhere else...." Don't think I'm kidding, this is why it's so hard to attract new talent into High Energy and Nuclear physics.
Then we have the other side of the spectrum, the VC side. Some president of some start-up at this panel discussion got up and recounted an anecdote regarding a board meeting he attended. He told the board that he managed to save 300,000 bucks or so because he was able to postpone hiring some people. He was expecting some congratulatory remarks and instead he was scolded. "You have a plan to execute, therefore spend the $300K and execute the plan!" he was told. What I took away from this panel discussion is that when serious money is on the line (850 million bucks is serious money), you don't f..k around. You make sure the plan is right, hire the best of the best to verify this (i.e. hire Nobel laureates to review your circuit design and software flow charts), and make sure the guys to whom you are giving the money can stand up to a brutal review. If they can't, the VC's will be throwing their money away on that proposal. The upside of all this is that if you do get your 1st round of funding, then they will be with you to make sure your plan goes right. And don't expect to retain full ownership of your company; their commission is measured as a percentage ownership of the company you are building. If you don't like it, go down to your local savings and loan and pitch your idea to them; these VC's have another 300 business plans to choose from. You know, this may sound crazy, but after what I've been put through working for the government, I would give my left nut to work in that kind of environment....
First of all, the Blue Gene research team started out by designing a RISC instruction set from scratch. They wanted to use something like the PPC but its instruction set just got too large. This was due to too many people coming and going from the PPC design team and all instructions had to be kept in order to keep backward compatibility. Thus the "typical" RISC architecture had 250 to 300 instructions of which only 50 were really used and some were never used. There were even other instructions, that if used, would break the performance of the chip, and so the instruction had to be turned off by the compiler. After that explanation, it was clear to me that it was a good idea to start the CPU design by tossing out the instruction set and starting from scratch.
The next key concept was to build many small CPU's on one fabrication
die. The idea being that one "CPU chip" would have hundreds of CPU's,
with floating point units scattered throughout the die along with
secondary cache units. Coupled with this idea was the concept that if
one of the CPU's didn't work, the OS would detect this and not use
it. Therefore, if you have a large die of silicon from which you're
going to build your "processor chip", a defect in the fabrication in
the sequencer unit or instruction set memory or whatever would not
cause you to throw out the chip. This is a big problem with today's
CPU manufacturing: 100 microns of bad silicon in the wrong
spot and you have to throw out the CPU. Monty couldn't give exact
figures, but he said that because of this ability of having the OS
turn off just the bad CPUs then the production yields went from very
low to very high. This is very much the same concept as bad blocks on a
hard disk drive.
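To see why turning off bad CPUs can move yields from very low to very high, here is a minimal sketch. It assumes a simple Poisson defect model and a die made up entirely of equal-sized cores, and the defect density and core counts are hypothetical numbers picked for illustration. None of this comes from Monty's talk, where no exact figures were given.

```python
import math

def monolithic_yield(defect_density, die_area):
    """Chance that a die with a single big CPU has zero defects (Poisson model)."""
    return math.exp(-defect_density * die_area)

def redundant_yield(defect_density, die_area, n_cores, min_good):
    """Chance that at least min_good of n_cores equal-sized cores are defect free."""
    # Probability that one small core has no defect.
    p_good = math.exp(-defect_density * die_area / n_cores)
    # Binomial tail: sum over every acceptable count of working cores.
    return sum(
        math.comb(n_cores, k) * p_good**k * (1 - p_good) ** (n_cores - k)
        for k in range(min_good, n_cores + 1)
    )

# Hypothetical numbers: 0.02 defects per mm^2 on a 100 mm^2 die.
DEFECT_DENSITY = 0.02
DIE_AREA = 100.0

print("one big CPU:        ", monolithic_yield(DEFECT_DENSITY, DIE_AREA))
print("100 cores, 90 needed:", redundant_yield(DEFECT_DENSITY, DIE_AREA, 100, 90))
```

With these made-up inputs the single-CPU die is usually scrap, while the many-core die is almost always usable, which is the same trade a hard disk makes when it maps out bad blocks.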
The next concept was that of a water cooled system. The amount of air flow needed to cool a Petaflop machine would require a couple of jet turbo engines providing hurricane equivalent wind forces. Therefore, one had to resort to using water to cool the system. As it turns out, there was great resistance to this idea, but Monty prevailed.
The final idea which I remember was how they were going to connect this Petaflop machine together. The idea was to build cubes of processors and then connect the cubes together with some kind of cabling. The problem was that there was a lot of cable to hook up and it needed to be done right. OSHA got in the way, because if you build something which humans must traverse, like a hallway or a conduit under IBM's Blue Gene, you need to provide enough space for a guy 7 feet tall to run out in case there is a fire. No getting around this requirement. So they built Blue Gene over a special floor which was broken up into a grid. Each grid element could be raised and lowered. So you have to imagine this: a large floor area where you see hundreds of CPU cubes. The operator has to check the connector on one of the CPU cubes. He goes and clicks on some Java thing or other on his console and grid point XY raises up to arm level. He then goes out there, checks the cable, and when done, goes back and clicks on his Java interface and the CPU cube is lowered back into the grid. Definitely 2001: A Space Odyssey stuff.
I believe IBM is on the right track. With this design, they will get their Petaflop computer at about $100 million, give or take a factor of 2 or 3. But what really impressed me about Monty's talk is that he didn't bother to prepare a PowerPoint presentation like the rest of the speakers did. He just got out there in front of the audience and started talking away. I'm not sure if this is a good or bad thing, but it was impressive to say the least. Sort of like watching a no hitter.
The speaker after Monty was Keiji Tani, speaking about the 40 Teraflop machine which Japan is building. The bit which struck me about his talk was that for about $500 million, Japan is building a 40 Teraflopper which will be housed in a building the size of a large basketball stadium and will have about 20,000 kilometers of cabling. The speaker before him described a petaflopper which will be housed in a large auditorium for about $100 million. The two will be ready in about 2 or 3 years. You do the math, but if I were reviewing the Japanese project, it would be hard for me to justify the cost..... My guess is that the Japanese need to build this machine to show the rest of the world that they are players in the HPC game. Just like the US spends hundreds of millions on its Giga and Tera floppers in the national labs scattered about the country. Forget about what's housed in the NSA research labs.
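For anyone who does want to do the math, here is a small sketch using only the round figures quoted above ($500 million for 40 Teraflops, $100 million for a Petaflop, taken here as 1000 Teraflops). The function and variable names are just illustrative, and the Blue Gene price carries the factor of 2 or 3 uncertainty already mentioned.

```python
def dollars_per_teraflop(cost_dollars, teraflops):
    """Price of one Teraflop of peak performance."""
    return cost_dollars / teraflops

# Figures as quoted in the two talks.
japan = dollars_per_teraflop(500_000_000, 40)        # 40 Teraflop machine
blue_gene = dollars_per_teraflop(100_000_000, 1000)  # 1 Petaflop = 1000 Teraflops

print(f"Japanese machine: ${japan:,.0f} per Teraflop")
print(f"Blue Gene:        ${blue_gene:,.0f} per Teraflop")
print(f"Ratio:            {japan / blue_gene:.0f}x")
```

Even if the Blue Gene estimate is off by a factor of 3, the gap in dollars per Teraflop stays well over an order of magnitude.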
The telescope is basically fixed, and the ability to point it is restricted to the positioning of the receiver which sits above the dish of the observatory, and the sweeping of space as the earth turns. Therefore there is a rather elaborate mechanism to move the receiver around above the dish, which gives Arecibo its pointing ability. In order to make this movement of the receiver work, a counterbalance is needed to stabilize the main receiver. So the SETI people were able to install a second receiver on the counterbalance. This made them the parasitic experiment. Those researchers who paid for prime time on the facility got to point the telescope in whatever direction they wished; the SETI people would then pick up whatever signal they could get from wherever their secondary receiver ended up pointing. Sort of like if the guys paying for time on the observatory were looking left, SETI was forced to look right. In the end, this situation worked out OK for them. The SETI researchers were able to scan the sky in a random walk, determined by the other experiments running at the time. David explained that they effectively covered the sky in about 6 months' time.
With that they solved their data collection
problem. Next they needed to solve their number crunching problem and
for that they thought up the seti@home project. What really
surprised them was the willingness of people to donate their idle
computer time to the project. They were hoping for about 100k people
to help out. When they posted their announcement to the Internet, they
got over 400K people signing up to their mailing list. When they went
online for the first time, they got over 200K users requesting data to
be analyzed. They were so overwhelmed by the system overload by
having 200K users requesting data to analyze, that it took them 8
months to e-mail out an announcement to the original 400K who signed up
to their e-mail list. Basically they were totally swamped and had to
work very hard to deal with their success. David talked a bit about the
setup of their system which reminded me of the many data acquisition
talks I've given and heard. One of the interesting details of the
seti@home project I found was that they got a lot of funding from
private, non-science institutions like Paramount Pictures. If I
remember correctly, of the $700K they got in funding $200K was from
these private sources. Paramount was interested in this project
because they wanted to get Captain Picard to throw some big power
switch which would start the whole experiment. That never happened,
but the check did clear. Sun Microsystems donated lots of hardware.
David was very grateful for this contribution and spent some time
plugging them.
They had problems with making sure the data which was returned was actually processed by their client code. Since seti@home has been made a bit of a game with respect to processing the data, a lot of people have faked results so that they can climb up the "who has analyzed the most data" ladder. He also spoke about the Open Source controversy. As it turned out, there were some people "out there" in the Open Source community who were very angry that the client code was not Open Source'ed. At some point, there were some web sites which wanted to boycott the project because of this, and others wanted to launch some kind of attack against the server unless they open source'ed their client code. I was quite ashamed to hear this. He went on to talk about how some users were also angry that the client code was not optimized for the particular hardware which the code was running on. For example, AMD CPU's have some instruction sets which will help speed up FFTs, as do the Intel Pentiums with the MMX instructions. In order to make the code portable, the seti@home guys didn't pay much attention to these issues. So there were some users out there who disassembled the client code, found the portions which did the FFT and replaced that section of the code with their own FFT routines, optimized for their particular CPU instruction set. Now that is hacking. After the talk, I asked David if he realized that if he open sourced his client code, then people would have provided the optimization code for him instead of being forced to disassemble the code. He told me that he worried about the integrity of the code and that he couldn't trust the scientific code put into the client. I understood where he was coming from. If I were to do something similar, start a phenix@home project, then I would have to provide a way of verifying the results of the computations every time someone added in some code.
This verification process could break the @home usefulness of the project. Also, you would have to somehow guarantee that the code, once compiled, was really that same code and not some rogue client which someone put together in order to fake fast data processing time. As it stands now, seti@home has accumulated about 450,000 years of computing time, or an instantaneous computing rate of 20 Teraflops. This is half the size of the computer the Japanese are building, and it essentially cost the SETI research team $40K/Teraflop instead of the $12,500K/Teraflop for which the Japanese can build HPC systems. Also, half of the data out of the Berkeley domain belongs to the seti@home project. That's a cool factoid.
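The worry about verifying results from clients you cannot trust can be made concrete. One generic approach is to hand the same work unit to several clients and accept an answer only when enough of them agree. This is purely an assumption for illustration and not something David described. The function name `accept_result` and the quorum value are hypothetical, and a real project would compare floating point results within a tolerance rather than by exact equality.

```python
from collections import Counter

def accept_result(reported, quorum):
    """Return the agreed result for one work unit, or None if no value reaches quorum.

    reported: results sent back by different clients for the same work unit.
    quorum:   how many identical results are needed before one is trusted.
    """
    if not reported:
        return None
    # Find the most frequently reported value and how often it was seen.
    value, count = Counter(reported).most_common(1)[0]
    return value if count >= quorum else None

# Two honest clients agree, one faked or buggy result is outvoted.
print(accept_result(["peak@1420MHz", "peak@1420MHz", "bogus"], quorum=2))
# No two clients agree, so the work unit would have to be sent out again.
print(accept_result(["a", "b", "c"], quorum=2))
```

The price is that every work unit gets computed more than once, which is exactly the kind of overhead that could eat into the usefulness of an @home project.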
The final session I want to cover is the open source panel discussion which took place at the very end of the conference. The topic was how the high performance computing (HPC) field can take advantage of the open source movement and how the government funding agencies should deal with this matter. As it turns out, there is a committee out there titled the "President's Information Technology Advisory Committee", or PITAC, and they were charged with investigating the matter for the HPC field. The result was the publication of a document titled "Developing Open source Software to Advance High End Computing". The members of the PITAC who worked on this report were present on the panel. The first panel member, Susan Graham of UC Berkeley, basically gave a report on the report. The short side of it was that they recognized the potential of open source software and that the government should take advantage of it and do so now. The government should not take its time on this issue. The next panelist to speak was Todd Needham from Microsoft. This was unique to me, the first time I got to hear a bona fide Microsofter speak about open source software. His general attitude was that Open Source was not pixie dust which you could sprinkle over software and suddenly make it all that more powerful. Which is to say that in general, he was rather negative toward the movement. He had a rather angry and defensive attitude throughout the panel discussion which put me off. I guess it's the fallout of the antitrust lawsuit against Microsoft.
From my notes I was able to get the following from his
introduction. He argued that Open Source is not a development
methodology. In fact, he claimed that many projects are more cathedral
than bazaar. (He gave the Linux kernel development as an example, with
Linus sitting at the top.) He claimed that it is not a security
model. Many eyeballs are not a replacement for a formal design and
review process. (It's interesting to hear that coming from a guy who
works for a company who just had a major break-in which made headlines
around the world...) Open Source does not mean open standards. He also
emphasized that open source license does not mean that you don't have
access to the source code. He did like the idea of managed source
code.
Note: You can find Todd's full presentation in
this .pdf file.
In one of his transparencies he alluded to open source as a way of giving away your intellectual property rights and thus diluting the monetary value of your work. After the introductory talks, there was a question regarding this and he was quite adamant about how bad it was to open source your code and thus lose the dollar value you put into the code. He stated that Microsoft is a company which makes its money off of intellectual property and thus the open source model just doesn't work for them. (If Todd said otherwise, it would be a Slashdot headline for sure....) It must have been interesting to see how the report got out, which recommended the use and adoption of open source software with Todd from Microsoft as one of the committee members.
The next guy who talked was Jose Munoz from DOE. He did a Dave
Letterman by going through the top 10 reasons why Open Source software
is bad in reverse order. The last one being, or rather item #1, the
question "Would you want to fly in an airplane whose complete flight
system was developed using Open Source by the lowest bidder?",
followed by a bullet reading "Whom do you sue when the thing goes
wrong? (assuming you're a survivor)". It's unfortunate that the guy
who works for the same government agency which provides my paycheck
gave such a negative perspective to this issue. It was good to listen
to one of the members of the audience make a statement, at the end of
the session, that if given a choice between the plane running open
source software or something running under a Microsoft OS, he would
much prefer the open source one, given the track record of Microsoft
software. There were a couple of chuckles in the audience and a
blushed smile from Todd of Microsoft.
Note: You can find Jose Munoz's full presentation in
this .pdf file.
The last panelist to speak was from Sandia National Laboratories. His talk was basically in favor of the Open Source software license model. I asked two questions of the panelists. First, I pointed out to them that Linux and the Linux distributions have fostered a new generation of companies selling super computers. I told them that if you walk around the show floor, you see many small companies selling racks of Linux machines. I personally didn't see any companies selling racks of Windows NT/2000 machines. They responded that the big companies would sell you a rack of either Linux or Windows NT PC's and that there was one demonstration booth which had a rack of Windows NT PC's running Beowulf applications. Personally I believe they missed the point I was trying to make, which was that Linux was fostering a new industry made up of young start-ups. The second question that I asked them, which was actually more of a statement than a question, was that they should consider the Internet when they discuss issues relating to open source. "Who owns the Internet? The Internet wouldn't exist if it were owned by anyone." I remember a smile coming across the face of Susan Graham from Berkeley once I finished my statement. Todd from Microsoft decided to answer my question. What I remember of his answer was that he thought AOL did a "damn good job" of hiding all that stuff from the user in creating the front end which their user community uses. Again, I believe he missed my point. To me, AOL was useless until they connected themselves to the Internet, first by providing e-mail and then by providing you with a ppp connection.
My "consider the Internet" statement was the last one given before the panel discussion ended. Of course I could have gone on a rant about my "consider the Internet" statement and kept the panel going for at least another 15 minutes by addressing some of the comments the panelists said, but it was the end of 4 days of conferencing and I had to catch my plane back to New York. Besides, no one wants to hear someone rant on and on and turn a discussion personal. Who knows, I can write up a rant, post that on my web site, and get many orders of magnitude more people to read my rant than the few dozen which were in the conference session at the time....
There were many many more talks and events which happened at the
conference, but it would take much too much time to write about the
whole thing. I tried to touch upon the items which I thought were the
most important. Other talks which were of interest were Dr. Sterling's
talk on Commercial Off The Shelf (COTS) super computers, Eugene
Spafford's talk on security issues on the Internet, and all the stuff
which I saw on the show floor. That's left as a page full of
captioned pictures.
Many thanks go to Duane Clark, Marie Bennington, Tundran and James Burley for submitting e-mails pointing out lots of typos which they found in the text. Again, thank you very much.
I would further like to thank Lee Busby for converting Jose Munoz's and Todd Needham's PowerPoint presentations into the more universal format of PDF.
[All response links are off-site. -Ed.]
Frank Love writes in to tell me about my warts. Actually, everyone has these kinds of warts.
Barry Stinson has comments on my DOE buddy, Jose Munoz and the Open Source panel discussion.
Carl Friedberg, a physicist, agrees with my description of what it's like to work for the government.
Andrew Weiss writes in to let me know that the system which I thought was going to Duke University may in fact be going to U. of Delaware. Also the Bird is not extreme Tux but YoUDee, the U. of Delaware mascot. Thanks Andrew for the clarification.
Brad Lucier was the first to write in informing me that the 1U rack of CPUs belongs to API networks. Thanks for the clarification Brad.
David Kinney from NASA writes in to inform me that the aerial picture of the airport is of Moffett Field, home of NASA Ames Research Center. Thank you David for figuring out what the eye in the sky was looking at.
Rich Brueckner from Sun Microsystems writes in with some details of the Sun booth and the party they threw for the SC2000'ers.
Patrick J Melody from the Naval Research Laboratory's Center for Computational Science, e-mailed me to tell me that they are the guys who demoed the 1.5 Gigabit streaming video demo and the earth surface scan demo.
Andy Meyer has sent in the most detailed description of the aerial photo of the Moffett Federal Airfield so far. Good work Andy.
L. Busby of Lawrence Livermore National Laboratory has some comments regarding the Open Source panel which are worth the read. Thank you L. Busby for the e-mail.
Marc sent in some rather frank advice regarding Open Source panel discussions. I'll use his advice at the next opportunity. Maybe someone else has better advice as to how to react in a public forum to anti Open Source talk?
Todd Needham from Microsoft, who was on the Open Source panel discussion, e-mailed me some comments about this article. I think it's important that his views on the panel and this article be shared with the readers. I replied to Todd who then replied back with further comments. You can read my second reply to Todd here.
Chris Torres writes in to thank me for taking the time to write the article. It's e-mail like yours, Chris, that motivates me to write these articles in the first place. I'm glad you enjoyed the read.
Steve Conway from Conway Communications, sent in a reminder of a very important event which I missed at the show. This being an announcement on "progress on plans for new performance benchmarks for supercomputers and the hiring of DOE/NERSC to develop the new tests." Sorry for missing it and not writing about it in this article.
Casey King, from Australia, writes in a comment or two about the SC2K NOC picture I took. It looks like the networking gurus do aim for that higher cabling standard in the sky... But it's just too high up there to reach.
Gerardo Cisneros of SGI wrote in to clarify one of the comments I wrote in the Open Source panel discussion regarding OS'es used to fly airplanes. I knew what he was referring to, as did everyone in the audience, so I went ahead and filled in his blanks.
Louis H Turcotte, the SC2000 conference chair(!), read the article and has some interesting insights. As it turns out, the conference is organized by volunteers from around the country. He writes, "I would like to share with your readers that SC is a conference totally organized by volunteers - who work for 2-3 years to create the week's worth of conference activities." Quite an impressive effort Louis.
Koen Holtman, from Caltech, wrote in to clarify Jose Munoz's presentation on the Open Source panel. According to what Koen could remember, Jose was playing the role of the devil's advocate, and thus the negative slant of his presentation. Thanks for the clarification Koen.
Of all the people out there on the Internet who read this article (over 10,000 as of 6 days after the initial posting on www.linuxtoday.com), it looks like Richard Stallman found some time to read it and write me some comments on the article. He thanks me for recognizing the importance of the Free Software movement. Remember, it's GNU/Linux!