20 Petaflops by 02012

This webpage contains notes for a lecture given at Scottsdale Community College on 1 April 02009. [April Fools' Day and Conficker Solidification Day]

Introduction
   Title:  20 petaflops by 02012 (exaflops [1,000 petaflops] by 02019?)

   Peta- implies 10^15 
         1,000,000,000,000,000 = 1,000 trillion = 1 quadrillion

   flops implies FLoating-Point Operations Per Second.  But for the sake
         of discussion it is okay to replace floating-point operations
         with arithmetic calculations.
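
To make the prefixes concrete, here is a rough back-of-the-envelope sketch (Python is used purely for illustration; the exa-scale workload is a made-up example):

   # Illustrative only: the prefixes from the title, expressed numerically.
   TERA = 10**12    # 1 trillion
   PETA = 10**15    # 1,000 trillion = 1 quadrillion
   EXA  = 10**18    # 1,000 quadrillion

   # A 20-petaflop machine does 20 * 10**15 calculations per second, so a
   # hypothetical workload of 10**18 calculations would take it 50 seconds.
   print(EXA / (20 * PETA))    # 50.0
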
Rewind Two Years Ago

On 4 April 02007, I gave a Kelly Lecture titled "The Next Era of Computing." The motivation for that lecture was that the computing roadmap called for peta-scale computing sometime in 02008. The roadmap was accurate. During June of 02008, the Roadrunner supercomputer was in service executing 1.026 quadrillion calculations per second (1.026 petaflops).

As 02008 neared its end, TOP500.org published the following.

   "The 32nd edition of the closely watched list of the world's 
    TOP500 supercomputers has just been issued, with the 1.105 
    petaflop/s IBM supercomputer at Los Alamos National Laboratory 
    holding on to the top spot it first achieved in June 2008."

   "The Los Alamos system, nicknamed Roadrunner, was slightly enhanced 
    since June and narrowly fended off a challenge by the Cray XT5 
    supercomputer at Oak Ridge National Laboratory called Jaguar. 
    The system, only the second to break the petaflop/s barrier, 
    posted a top performance of 1.059 petaflop/s."

[side-bar] Observe how from June 02008 to November 02008, the processing speed of the Roadrunner supercomputer increased from 1.026 petaflops to 1.105 petaflops, which is a difference of 79 teraflops (or 79 trillion arithmetic calculations per second).

Dan Stanzione, Director of ASU's High-Performance Computing Initiative, described ASU/TGen's "Saguaro 2" supercomputer's ability to do 50 trillion mathematical operations per second (i.e. 50 teraflops) as follows: "That's the equivalent of taking a calculator and doing one operation per second, by hand, continuously for the next one and a half million years." [Note: 50 trillion is 4.5% of 1.105 quadrillion.]
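
Stanzione's "one and a half million years" figure checks out with rough arithmetic (a quick sketch using only the numbers quoted above):

   # 50 trillion operations done by hand at one operation per second
   operations = 50 * 10**12
   seconds_per_year = 60 * 60 * 24 * 365    # about 31.5 million seconds
   print(operations / seconds_per_year)     # roughly 1.6 million years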

As of November 02008, the United States had 290 of the world's top 500 supercomputers. The United Kingdom had 46, France 26, Germany 25, Japan 17, and China 15.

I felt compelled to give another lecture during the spring of 02009 because the computing roadmap now calls for 20 petaflop computing by 02012.

About the 'P' in HPC

HPC stands for High Performance Computing. Historically, the 'P' in HPC stood for "performance;" however, these days it stands for much more.

The following is my attempt to define HPC in one sentence.

   HPC is a highly pervasive computing environment that 
   enables highly productive computing via a highly persistent 
   cyber-infrastructure that exploits highly parallel computing 
   to provide highly powerful computation.

With less than two years left in decade zero of the 21st century, HPC is really HP6C -- High Performance, Productivity, Pervasive, Persistent, Parallel, Powerful Computing.

HPC systems not only provide peta-scale computation, but they also provide powerful storage systems and, in many cases, powerful visualization systems (e.g. ASU's Decision Theater).

HPC systems (which for the most part are hardware) need software and these days the software is way behind the hardware. In other words, today's software is not even close to exploiting the power of HPC systems.

21st Century Informatics

Centuries ago (mid-17th century?) a "computer" was a person who did mathematical calculations. [computer: a person who computes] During the mid-20th century, computers started having their jobs outsourced to machines and we entered the age of "data processing." Over time "data processing" (DP) morphed into "information technology" (IT) which has morphed into "Informatics."

I prefix the term Informatics with 21st Century because these days data processing is being executed by HPC systems. In a nutshell, 21st century Informatics is data processing on steroids.

21st century Informatics is a never-ending work-in-progress. For example, Sir Tim Berners-Lee (the "father" of the WWW [World Wide Web]) is working hard on creating the Semantic WWW. The goal: Turn plain-old-text into information. And, when it comes to information, we're in the Exabyte Age.

[Note: A byte is the basic unit of measure for information storage. For the sake of discussion, it is okay to equate a byte with a character (i.e. letters, digits, the characters generated by typing on a computer keyboard)].
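
To give the Exabyte Age some scale, here is a hedged illustration (it assumes a plain-text book is roughly one million characters, i.e. about one megabyte -- an assumption for illustration only):

   # Rough scale of an exabyte, assuming 1 book = 1 megabyte of plain text
   EXABYTE  = 10**18
   MEGABYTE = 10**6
   print(EXABYTE / MEGABYTE)    # 10**12 -- about a trillion such books per exabyte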

Google is a prime example of a tween-aged 21st century Informatics company and its mission is to "index" all of the "information" in the world.

   "We did a math exercise and the answer was 300 years," Eric
    Schmidt said in response to an audience question asking for 
    a projection of how long the company's mission will take.  
   "Of the approximately 5 million terabytes of information out 
    in the world, only about 170 terabytes have been indexed."

   "IDC predicts that by 02011, the total volume of electronic 
    data created and stored will grow to 1,000% of the 180 exabytes 
    that existed in 02006."

   [side-bar] How does 5 million terabytes in 02005 compare to 180 exabytes in 02006?
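
The side-bar question can be answered with rough arithmetic (keeping in mind that the two figures measure somewhat different things -- Schmidt's estimate of information in the world versus IDC's estimate of electronic data created and stored):

   # 5 million terabytes (02005) versus 180 exabytes (02006)
   TERA = 10**12
   EXA  = 10**18
   schmidt_estimate = 5 * 10**6 * TERA       # 5 million terabytes = 5 exabytes
   idc_estimate     = 180 * EXA              # 180 exabytes
   print(idc_estimate / schmidt_estimate)    # 36.0 -- IDC's figure is 36 times larger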

[side-bar] Rob Pike is a Commander at Google and he was quoted saying the following: "The Web is too large to fit on a single machine so it's no surprise that searching the Web requires the coordination of many machines, too. A single Google query may touch over a thousand machines before the results are returned to the user, all in a fraction of a second."

Informatics (the science of information) comes in many forms... financial informatics; meteorological informatics; chemical informatics; biological informatics; medical informatics; biomedical informatics (ASU--1st undergraduate degree program); healthcare informatics (Obama's healthcare IT); business, social, community informatics (e.g. LinkedIn, Facebook, MySpace, Ning); environmental informatics; and so on.

[side-bar] I went to a nanotechnology session at ASU and the speaker lectured about nano-enabled data storage. The focus of his talk was in units of bits (binary digits), not bytes (typically 1-byte equals 8-bits). He concluded his talk by asking the question: Can anybody give an example of the power of 96-bits?
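
For context on the speaker's question: 96 bits is 12 bytes, and 96 bits can distinguish 2^96 different values (a quick sketch):

   # 96 bits = 12 bytes can represent 2**96 distinct values
   print(96 // 8)    # 12 bytes
   print(2**96)      # roughly 7.9 * 10**28 distinct values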

We are in the Exabyte Information Age, but can anybody give an example of the power of 140-characters (bytes)?

Cyberinfrastructure of Clouds

Sun Microsystems was founded in 01982 and the company's motto is "the network is the computer."

These days "cloud computing" is a buzz-phrase; however, in its simplest form, a "computing cloud" is "a network."

Cloud computing, which aims to be available 99.999% of the time, provides "services" (i.e. applications, programs, software) and "storage." Computing becomes utility-like in a way that is similar to how we use electricity. We don't buy the program foo and install it on our computer; instead, we use program foo (which is stored on a cloud) on an as-needed basis and we pay accordingly.
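
As a rough check on what "99.999% of the time" (five nines) means in practice -- a quick sketch; the figure is an aspiration, not a guarantee:

   # 99.999% availability expressed as allowed downtime per year
   minutes_per_year = 60 * 24 * 365             # 525,600 minutes
   downtime = minutes_per_year * (1 - 0.99999)
   print(downtime)                              # roughly 5.3 minutes of downtime per year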

OpenCloudManifesto.org is "dedicated to the belief that the cloud should be open."

   "The Open Cloud Manifesto establishes a core set of principles 
    to ensure that organizations will have freedom of choice, 
    flexibility, and openness as they take advantage of cloud 
    computing. While cloud computing has the potential to have 
    a positive impact on organizations, there is also potential 
    for lock-in and lost flexibility if appropriate open standards 
    are not identified and adopted." [source: OpenCloudManifesto.org FAQ]

To date, Open Cloud supporters include the following: AT&T, IBM, Cisco, Red Hat, VMware, Sun Microsystems, and many others; however, missing from the list of supporters are Google, Microsoft, HP, Intel, and many others.

The Internet is a network of networks. Vinton Cerf, one of the co-creators of the Internet and Google's Chief Internet Evangelist, says that "interest in cloud computing is appropriate. But the question is, what happens if there's more than one cloud? And how do clouds interact with each other? [...] So intercloud stuff is going to be the next decade's really interesting communications and networking challenge."

Computing clouds require the cyber-infrastructure, and the cyber-infrastructure needs lots of work (i.e. it has too few interstates, it is missing roads and bridges, and many of its existing roads are full of cracker-exploitable potholes).

Last mile bandwidth remains a problem; however, the larger issue remains the Digital Divide.

[side-bar] I read the following after writing the previous paragraph.

   "A scholar at the Discovery Institute Brett Swanson, kicked off 
    the current round of debate about Internet capacity with a piece 
    in the Wall Street Journal. Swanson warned that the rise in online 
    voice and video were threatening the Internet, especially at its 
    'edges,' those last-mile connections to consumers and businesses 
    where bandwidth is least available. 'Without many tens of billions 
    of dollars worth of new fiber optic networks,' he wrote, 'thousands 
    of new business plans in communications, medicine, education, security, 
    remote sensing, computing, the military and every mundane task that 
    could soon move to the Internet will be frustrated. All the innovations 
    on the edge will die.'" [source: ARStechnica.com]

It is easy to make cloud computing sound like a utopian computing world, but the following is a snippet of what Richard Stallman has to say about cloud computing.

   "It's stupidity. It's worse than stupidity: it's a marketing hype 
    campaign.  Do your own computing on your own computer with your 
    copy of a freedom-respecting program.  If you use a proprietary 
    program or somebody else's Web server, you're defenseless. You're 
    putty in the hands of whoever developed that software."--RMS

During January of 02007, Brett Swanson of the Discovery Institute wrote a Wall Street Journal editorial titled "The Coming Exaflood." But there won't be any exaflood of data (i.e. information) without a powerful world-wide cyber-infrastructure supporting HPC-enabled computing clouds.

Energy and Computing

HPC systems require lots of energy and the computing industry is playing a lead role in the R&DDD (Research and Development, Demonstration, Deployment) of energy efficient computing.

[side-bar] "In June 02007 Google completed a 1.6MW solar installation at our Mountain View, CA headquarters - the largest U.S. corporate installation at that time."

HPC systems need access to cheap, reliable and abundant sources of power; they must make efficient use of the power; and they must manage the byproduct of waste heat.

Coming Decades: Nanotechnology/Quantum/DNA Computing

We are limited by our imaginations. So much of the 20th century science fiction will become science non-fiction in the 21st century.

Hyperlinks

Creator: Gerald Thurman [gthurman@gmail.com]
Created: 01 April 2009