Oak Ridge soon to house fastest supercomputer?

From: Dave Brockman 



--
"Some things in life can never be fully appreciated nor
understood unless experienced firsthand. Some things in
networking can never be fully understood by someone who neither
builds commercial networking equipment nor runs an operational
network."  RFC 1925


=============================================================== From: Tim Youngblood ------------------------------------------------------

From what I have heard, the issue with Oak Ridge is that the bandwidth into the machine is the big limiter for others outside who would otherwise want to leverage it. Can anyone refute this, or speak to how it has improved? I'm not bashing, just reflecting what informed sources outside our area (aka California) have told me, and I would like the real story.

Tim

=============================================================== From: "Chris St. Pierre" ------------------------------------------------------

I worked in HPC Operations at ORNL for a little over 18 months. Everything in this email is public, and some of it is dated, since I haven't been there since March.

The problem with supercomputers of that size is pretty much always bandwidth. When you've got a filesystem that writes at 240 GB/s, even 10 Gb Internet2 starts looking pretty slow. And that's the current filesystem; I don't know if numbers on the next-generation filesystem are public yet, but it'll be faster.

ORNL was part of the Advanced Networking Initiative test that was demoed last year at the Supercomputing conference, where they brought up (and saturated) a 100 Gb WAN between ORNL, SUNY-Brockport, LLNL, and a few other national labs. That was just a demo, and AFAIK there's no production 100 Gb network yet, but obviously if they can get that kind of throughput it's not purely a bandwidth issue.

The "problem" with ORNL, if you can call it that, is that they do capability computing, not capacity computing. If you've got a job that can use all 225,000 cores in Jaguar (or however many cores Titan will have), you go to the front of the queue, and they don't backfill with a bunch of smaller jobs. They want the big jobs that can't be run anywhere else. That lets ORNL do some really unique science, but it also limits the sheer volume of science they do.

Basically: if you want to run a job at a national lab, you're going to have to transfer your data there, probably over ESnet via GridFTP. That's going to use every bit of the I2 connection you probably have, and with the size of dataset you've probably got, it doesn't matter whether that data's going just down the street to LLNL or across the country to ORNL -- it's bandwidth, not latency, that's the limiter.
Unless, of course, your institution takes advantage of its proximity to a local national lab and has multiple 10 Gb links to it. That may be what your connections in California were referring to; if they're in the Bay Area, they've got LLNL and LBNL right there, and I'm betting Cal and Stanford have some
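The transfer-time arithmetic behind the point above is easy to sketch. A minimal back-of-envelope in Python; the 100 TB dataset size is an illustrative assumption (it is not from the thread), and protocol overhead is ignored, which GridFTP's parallel streams help amortize in practice:

```python
# Back-of-envelope transfer-time math for the link speeds quoted above.
# Assumptions: link speeds in bits/s, dataset sizes in bytes, and a
# link running at full utilization with no protocol overhead.

def transfer_time_seconds(dataset_bytes, link_bits_per_sec):
    """Ideal time to move a dataset over a link at full utilization."""
    return dataset_bytes * 8 / link_bits_per_sec

ONE_TB = 10**12  # 1 terabyte in bytes

# A hypothetical 100 TB dataset over 10 Gb/s Internet2 vs. the
# 100 Gb/s ANI demo link.
for gbps in (10, 100):
    secs = transfer_time_seconds(100 * ONE_TB, gbps * 10**9)
    print(f"100 TB at {gbps} Gb/s: {secs / 3600:.1f} hours")
```

At 10 Gb/s that works out to roughly 22 hours for 100 TB, versus a bit over 2 hours at 100 Gb/s, which is why the wide-area link, not distance, dominates.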

=============================================================== From: Tim Youngblood ------------------------------------------------------

Amen to that. The devil is always in the details. Thanks for adding your experience to the conversation, as well as explaining the gives and the takes.

Tim

=============================================================== From: Aaron Welch ------------------------------------------------------

Hard to outrun dark fiber...

-AW

=============================================================== From: John Aldrich ------------------------------------------------------

> Hard to outrun dark fiber...

Now, I'm no expert, but my recent experience working at Windstream has taught me one thing: it all depends on the equipment connected to that dark fiber. :D Do you have a 10 Gb, 100 Gb, or 1 Tb connection to the net? Do you have that connection directly to your target (probably not)? A lot of it depends on what's between you and your target. Unless you've got a direct connection, or are connected to Internet2 (which I know nothing about and can't speak to), your limitation is going to be your service provider's network.
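The point about what's between you and your target can be stated simply: end-to-end throughput is capped by the slowest hop along the path. A minimal sketch; the hop speeds are made-up numbers for illustration, not from the thread:

```python
# A path is only as fast as its slowest link: the end-to-end ceiling
# is the minimum link speed along the route, no matter how fast your
# own dark fiber is. Hop speeds below are invented for illustration.

def path_ceiling_gbps(hop_speeds_gbps):
    """Best-case throughput (Gb/s) across a multi-hop path, ignoring contention."""
    return min(hop_speeds_gbps)

# Fast fiber at both ends doesn't help if a middle hop is slower:
hops = [100, 40, 10, 100]  # e.g. campus, provider edge, provider core, target
print(path_ceiling_gbps(hops))  # -> 10
```

In practice contention and protocol overhead pull real throughput below even this ceiling, which is why a provider's internal network matters as much as the last-mile link.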