
Spring 2010 — Issue 16.3

CROSSROADS STAFF

EDITOR-IN-CHIEF
Chris Harrison, Carnegie Mellon University

DEPARTMENTS CHIEF
Tom Bartindale, University of Newcastle

EDITORS
Ryan K. L. Ko, Nanyang Technological University
James Stanier, University of Sussex
Malay Bhattacharyya, Indian Statistical Institute
Inbal Talgam, Weizmann Institute of Science
Sumit Narayan, University of Connecticut

DEPARTMENT EDITORS
Daniel Gooch, University of Bath
David Chiu, Ohio State University
Rob Simmons, Carnegie Mellon University
Michael Ashley-Rollman, Carnegie Mellon University
Dima Batenkov, Weizmann Institute of Science

COPY CHIEF
Erin Claire Carson, University of California, Berkeley

COPY EDITORS
Leslie Sandoval, University of New Mexico
Scott Duvall, University of Utah
Andrew David, University of Minnesota

ONLINE EDITORS
Gabriel Saldaña, Instituto de Estudios Superiores de Tamaulipas, Mexico
Srinwantu Dey, University of Florida

MANAGING EDITOR AND PROFESSIONAL ADVISOR
Jill Duffy, ACM Headquarters

INSTITUTIONAL REVIEWERS
Ernest Ackermann, Mary Washington College
Peter Chalk, London Metropolitan University
Nitesh Chawla, University of Notre Dame
José Creissac Campos, University of Minho
Ashoke Deb, Memorial University of Newfoundland
Steve Engels, University of Toronto
João Fernandes, University of Minho
Chris Hinde, Loughborough University
Michal Krupka, Palacky University
Piero Maestrini, ISTI-CNR, Pisa
José Carlos Ramalho, University of Minho
Suzanne Shontz, Pennsylvania State University
Roy Turner, University of Maine
Ping-Sing Tsai, University of Texas—Pan American
Andy Twigg, University of Cambridge
Joost Visser, Software Improvement Group
Tingkai Wang, London Metropolitan University
Charles Won, California State University, Fresno

COLUMNS & DEPARTMENTS

LETTER FROM THE EDITOR: PLUGGING INTO THE CLOUD . . . 2
by Chris Harrison, Editor-in-Chief

ELASTICITY IN THE CLOUD . . . 3
by David Chiu

CLOUD COMPUTING IN PLAIN ENGLISH . . . 5
by Ryan K. L. Ko

FEATURES

VOLUNTEER COMPUTING: THE ULTIMATE CLOUD . . . 7
by David P. Anderson
As a collective whole, the resource pool of all the privately-owned PCs in the world dwarfs all others. It's also self-financing, self-updating, and self-maintaining. In short, it's a dream come true for volunteer computing, and the cloud makes it possible.

CLOUDS AT THE CROSSROADS: RESEARCH PERSPECTIVES . . . 10
by Ymir Vigfusson and Gregory Chockler
Despite its ability to cater to business needs, cloud computing is also a first-class research subject, according to two researchers from IBM Haifa Labs.

SCIENTIFIC WORKFLOWS AND CLOUDS . . . 14
by Gideon Juve and Ewa Deelman
How is the cloud affecting scientific workflows? Two minds from the University of Southern California explain.

THE CLOUD AT WORK: INTERVIEWS WITH PETE BECKMAN OF ARGONNE NATIONAL LAB AND BRADLEY HOROWITZ OF GOOGLE . . . 19
by Sumit Narayan and Chris Heiden
Two leaders in the computing world explain how they view cloud computing from the research and industry perspectives.

STATE OF SECURITY READINESS . . . 24
by Ramaswamy Chandramouli and Peter Mell
Fears about the security readiness of the cloud are preventing organizations from leveraging it, and it's up to computing professionals and researchers to start closing that gap.

THE BUSINESS OF CLOUDS . . . 26
by Guy Rosen
Businesses are flocking to cloud computing-based solutions for their business needs. The best way to understand the magnitude of this mass movement is to look at the hard data.

Contact ACM and Order Today!
Phone: 1.800.342.6626 (USA/Canada); +1.212.626.0500 (outside USA/Canada)
Fax: +1.212.944.1318
Internet: http://store.acm.org/acmstore
Postal Address: ACM Member Services, P.O. Box 11405, New York, NY 10286-1405 USA
Please note the offering numbers for fulfilling claims or single order purchase below.
OFFERING #XRDS0163
ISSN#: 1528-4981 (PRINT), 1528-4982 (ELECTRONIC)

Copyright 2010 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page or initial screen of the document. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept., ACM Inc., fax +1 (212) 869-0481, or permissions@acm.org.

Crossroads is distributed free of charge over the internet. It is available from: http://www.acm.org/crossroads/.

Articles, letters, and suggestions are welcomed. For submission information, please contact crossroads@acm.org.

Front cover image courtesy of Opte Project.
Letter from the Editor

Plugging into the Cloud

After reading this issue, I had to seriously reevaluate my perception and definition of cloud computing. Unsurprisingly, given the wide array of computing models it encompasses, agreement among even experts is somewhat elusive.

For end users, cloud computing's inherent intangibility makes it tough to get a good grip on what it is and isn't, where it begins and ends. However, one thing is for sure: Cloud computing is hot and will soon have a big presence on your PC.

Google, already a big player in the consumer space with services like Gmail and Google Docs, is readying Chrome OS, a thin operating system that boots right to a browser. With Chrome OS, document storage and heavy computation (like web searches) will all occur in the cloud. Is this a taste of things to come? Fortunately for programmers and students, Google has opened up its "app engine" back end, joining other powerful services like Amazon's EC2 and Yahoo!'s BOSS. If you've been thinking about getting your feet wet in the cloud, there really isn't a better time to start tinkering!

Open Hack Day
In fact, I'm already guilty. As part of Yahoo!'s Open Hack Day this past October, Bryan Pendleton, Julia Schwarz, and I (all Carnegie Mellon University students) built a cloud-based application in Python we call The Inhabited Web. In the 24 "hacking" hours permitted by the contest, we built the back end on the Google App Engine (appengine.google.com), making it massively parallel and distributed.

The idea, briefly, is to embed a simple visualization into web pages, next to the browser's scroll bar. Small triangles are used to represent users' positions on the current page (scroll position). Collectively, this allows you to see where people are congregating on a web page, perhaps next to a great shopping bargain, interesting news story, or funny video. You can check it out and sign up your web site for the service at www.inhabitedweb.com.

Speaking of the web, we invite you to join our Facebook group (ACM Crossroads) and also to let us know what you think via email (crossroads@acm.org) and Twitter (hashtag #xrds).

I hope you find the current issue stimulating. The whole Crossroads team has been hard at work for three months on this cloud-centric edition of the magazine, and we are very excited about the amazing lineup of feature articles, covering topics from security and entrepreneurship, all the way to volunteer computing. You'll also find interviews with people working on the biggest and best cloud computing systems (see page 19).

Presenting XRDS
This issue also marks the last Crossroads that will arrive in the present format. We're very excited to announce Crossroads will be relaunching as of the next issue with an all-new look and tons of fresh content for students. We've placed special emphasis on recurring columns headed up by our new editorial team. Expect everything from code snippets and school advice, to historical factoids and lab highlights, to event listings and puzzles.

Heading up these departments is a talented team from all over the globe: Daniel Gooch (University of Bath), David Chiu (Ohio State University), Rob Simmons (Carnegie Mellon), Dima Batenkov (Weizmann Institute of Science, Israel), Michael Ashley-Rollman (Carnegie Mellon), and Erin Claire Carson (University of California-Berkeley). I am also very pleased to announce James Stanier (University of Sussex) is now part of the senior editorial team, responsible for soliciting and editing magazine feature articles, joining Ryan K. L. Ko (Nanyang Technological University, Singapore), Inbal Talgam (Weizmann Institute of Science, Israel), Sumit Narayan (University of Connecticut), and Tom Bartindale (Newcastle University).

—Chris Harrison, Editor-in-Chief

Biography
Editor-in-chief Chris Harrison is a PhD student in the Human-Computer Interaction Institute at Carnegie Mellon University. His research interests primarily focus on novel input methods and interaction technologies, especially those that leverage hardware and the environment in new and compelling ways. Over the past four years, he has worked on several projects in the area of social computing and input methods at IBM Research, AT&T Labs, and most recently, Microsoft Research.



Elasticity in the Cloud
By David Chiu

Take a second to consider all the essential services and utilities we consume and pay for on a usage basis: water, gas, electricity. People have long suggested that computing be treated under the same model as most other utility providers.
The case could certainly be made. For instance, a company that supports its own computing infra-
structure may suffer from the costs of equipment, labor, maintenance, and mounting energy bills. It
would be more cost-effective if the company paid some third-party provider for its storage and pro-
cessing requirements based on time and usage.
While it made perfect sense from the client's perspective, the overhead of becoming a computing-as-a-utility provider was prohibitive until recently. Through advancements in virtualization and the ability to leverage existing supercomputing capacities, utility computing is finally being realized. This model is known to most as cloud computing, and leaders such as Amazon's Elastic Compute Cloud (EC2), Azure, Cloudera, and Google's App Engine have already begun offering utility computing to the mainstream.
A simple but interesting property of utility models is elasticity: the ability to stretch and contract services directly according to the consumer's needs.
Elasticity has become an essential expectation of all utility providers. When’s the last time you
plugged in a toaster oven and worried about it not working because the power company might have
run out of power? Sure, it’s one more device that sucks up power, but you’re willing to eat the cost.
Likewise, if you switch to using a more efficient refrigerator, you would expect the provider to charge
you less on your next billing cycle.
What elasticity means to cloud users is that they should design their applications to scale their
resource requirements up and down whenever possible. However, this is not as easy as plugging or
unplugging a toaster oven.

A Departure from Fixed Provisioning


Consider an imaginary application provided by my university, Ohio State. Over the period of a day,
this application requires 100 servers during peak time, but only a small fraction of that during down
time. Without elasticity, Ohio State has two options: provision a fixed 100 servers, or provision fewer than 100.
While the former case, known as over-provisioning, is capable of handling peak loads, it also wastes
servers during down time. The latter case of under-provisioning might address, to some extent, the pres-
ence of idle machines. However, its inability to handle peak loads may cause users to leave the service.
By designing our applications to scale the number of servers according to the load, the cloud offers a departure from the fixed provisioning scheme.
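To make the idea concrete, here is a minimal sketch of the kind of threshold-based scaling loop an elastic application might run. It is illustrative only: the callables get_average_utilization and set_server_count are hypothetical stand-ins for whatever monitoring and provisioning interface a particular cloud exposes.

```python
import time

MIN_SERVERS, MAX_SERVERS = 5, 100
SCALE_UP_AT, SCALE_DOWN_AT = 0.75, 0.25   # average CPU-utilization thresholds

def autoscale(get_average_utilization, set_server_count, servers=MIN_SERVERS):
    """Grow the server pool under heavy load and shrink it when demand falls."""
    while True:
        load = get_average_utilization()              # a value between 0.0 and 1.0
        if load > SCALE_UP_AT and servers < MAX_SERVERS:
            servers = min(MAX_SERVERS, servers * 2)   # react quickly to peaks
        elif load < SCALE_DOWN_AT and servers > MIN_SERVERS:
            servers = max(MIN_SERVERS, servers - 1)   # release idle capacity slowly
        set_server_count(servers)                     # ask the provider to resize the pool
        time.sleep(60)                                # re-evaluate once a minute
```

The asymmetry is deliberate in this sketch: doubling on the way up protects against losing users at a peak, while stepping down one server at a time only costs a little spare capacity.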
To provide an elastic model of computing, providers must be able to support the illusion of an unlimited pool of resources. Because computing resources are unequivocally finite, is elasticity a reality?

Sharing Resources
In the past several years, we have seen a new trend in processor development. CPUs are now shipped with multiple or many cores per chip in an effort to continue the speed-ups predicted by Moore's Law. However, these extra cores (and often even a single core) sit underutilized or completely idle.

System engineers, as a result, turn to statistical multiplexing for maximizing the utilization of today's CPUs. Informally, statistical multiplexing allows a single resource to be shared by splitting it into variable chunks and allocating each to a consumer. In the meantime, virtualization technology, which allows several instances of operating systems to be run on a single host machine, has matured to a point of production. Virtualization has since become the de-facto means toward enabling CPU multiplexing, which allows cloud providers to not only maximize the usage of their own physical resources, but also multiplex their resources among multiple users. From the consumers' perspective, they are afforded a way to allocate on-demand, independent, and more important, fully-controllable systems.

But even with virtualization, the question persists: What if the physical resources run out? If that ever occurred, the provider would simply have to refuse service, which is not what users want to hear.

Currently, for most users, EC2 only allows 20 simultaneous machine instances to be allocated at any time. Another option might be to preempt currently running processes. Although both are unpopular choices, they certainly leave room for the provider to offer flexible pricing options. For instance, a provider can charge a normal price for low-grade users, who might be fine with having their service interrupted very infrequently. High-grade users, on the other hand, can pay a surplus for having the privilege to preempt services and also to prevent from being preempted.

Looking Forward
With the realization of cloud computing, many stakeholders are afforded on-demand access to utilize any amount of computing power to satisfy their relative needs. The elastic paradigm brings with it exciting new development in the computing community. Certainly, scaling applications to handle peak loads has been a long-studied issue. While downscaling has received far less attention in the past, the cloud invokes a novel incentive for applications to contract, which offers a new dimension for cost optimization problems. As clouds gain pace in industry and academia, they identify new opportunities and may potentially transform computing, as we know it.

Biography
David Chiu is a student at The Ohio State University and an editor for Crossroads.



Cloud Computing in Plain English
By Ryan K. L. Ko

I am not an evangelist of cloud computing, and I must admit that, like many, I was once a skeptic.
As a fledgling researcher, I was also quite appalled at how many seasoned researchers were able
to recently claim that their research “has always been” cloud computing. I suppose many people
believe cloud computing is just a buzzword and are also quite turned off by the ever-growing list of
acronyms plaguing the computer science world. But is “cloud computing” really just another buzz-
word brought forward by the software giants, or is there something more?

Significance of the Cloud Computing Era


Fundamentally, cloud computing is a concept that aims to enable end-users to easily create and use
software without a need to worry about the technical implementations and nitty-gritties such as the
software’s physical hosting location, hardware specifications, efficiency of data processing, and so forth.
This concept is already evident in many current technologies that are not explicitly labeled as
cloud computing. For example, end-users no longer need to learn a new language or worry about
the program’s memory requirements to create a Facebook or MySpace application. A small- to
medium-sized enterprise no longer needs to own and maintain actual physical servers to host Web applications but is instead able to lease virtual private servers (VPS) for a monthly subscription fee.
With cloud computing, end-users and businesses can simply store and work on data in a “cloud,”
which is a virtual environment that embodies data centers, services, applications, and the hard-
working folks at the IT companies.
The key difference between this and other similar-sounding approaches, such as grid computing
or utility computing, is in the concept of abstracting services from products. This is done by virtu-
alizing the products (for example, the complex network of computers, servers, and applications that
are used in the back end) so that computing is now accessible to anyone with a computing need of
any size. By accessible, we mean that it is easy for a non-technical person to use this software and
even create his or her own.
This marks the change from the focus on full implementation of computing infrastructures before
the year 2000 to the abstraction of the high-level, value-driven activities from the low-level, techni-
cal activities and details in the present and near future. In the words of those advocating cloud
computing, it means that we are now moving toward services instead of focusing on selling prod-
ucts, and practically anyone can utilize computing to the max. (More technical information on these
services can be found in “The Business of Clouds,” page 26.)
So, what does all this mean for common folks like you and me? It means that we are freed
from the need to upgrade hardware and the need to spend more than half of the time trying to
make a product work, but are now able to focus on the real essence of our activities—the value-
adding activities (cf. Michael Porter’s Competitive Advantage).
With cloud computing, a startup company would no longer need to worry about the RAID
configurations and the number of scheduled backup jobs, but instead could focus on more
important details, such as the actual web content, the number of emails to set up for its employ-
ees, and the file structure and permissions to be granted for its content management structure.

Now, when we look beneath the sales talk and big promises of cloud computing and observe the shifts in trends in our computing approaches, we start to realize that cloud computing is not just another buzzword, but something that embodies this innate attempt by humans to make computing easier. The evolution of computing languages from the first generation (assembly languages) to the more human-readable fourth-generation languages (4GLs, SQL), and the evolution from structural/modular programming to object-oriented programming are both earlier evidences of this trend.

Cloud computing's focus is on empowering Internet users with the ability to focus on value-adding activities and services and outsource the worries of hardware upgrades and technical configurations to the "experts residing" in the virtual cloud.

In today's context, cloud computing loosely means that software you use does not reside on your own computer, but rather on a host computer, accessed via the Internet, run by someone else.

Given this fact, there are bound to be many problems and loopholes. Hence, it is not rare to find researchers claiming that they are working in a research area that contributes to cloud computing. With so much at stake, experts from computer security, service computing, computer networking, software engineering, and many other related areas are crucial people in this turn of a new era.

Imminent Issues
If we are evolving into a cloud-oriented environment and way of doing business, we will need to urgently address both data privacy and data security concerns.

Researchers need to find the right balance between convenience and security. It's a balancing act: when convenience increases, security decreases, and vice versa. As cloud computing is a highly trust-based system, many researchers are now geared toward creating better trust evaluation mechanisms and authentication procedures, while the industry is busy figuring out scalability solutions, data integrity, and security issues.

Once a hacker or malicious attack successfully penetrates the security boundaries of the cloud, or an employee of a cloud vendor betrays the trust of the public, our data and critical information is at the complete mercy of these criminals. To further increase the security, we would need legislation and laws to catch up with the nature of cloud computing, as it will be a borderless and large-impact problem.

How Can Graduates Approach Cloud Computing?
The best way to approach this field is to have a good balance between the quest for knowledge and discernment. Do not bounce on the latest buzzwords you hear. Take a step back and try to see how things fit together. A good way to do this is to organize and draw what you have learned into mind maps. Crossroads has prepared a starter kit (see sidebar), introducing some non-technical links to interesting articles and videos to kickstart your journey.

While it is my greatest wish for you to have a better understanding of cloud computing through this article, I hope that I have also opened up your mind to witnessing the increasing influence of cloud computing in our daily lives.

Biography
Ryan K. L. Ko is a final year PhD candidate at Nanyang Technological University, Singapore, specializing in the semantic web and business process management. He is also an editor for Crossroads.

Sidebar: Cloud Computing Starter Kit
While there are plenty of sites and articles describing cloud computing, not many have an objective view of this high-potential but controversial topic. The following resources have been selected by Crossroads' editors in an attempt to help other students understand the meaning, concerns, and latest trends of cloud computing.

"Like it or not, cloud computing is the wave of the future."
By Therese Poletti, MarketWatch.
www.marketwatch.com/story/like-not-cloud-computing-wave
A layman's summary of the recent cloud computing trend.

"Microsoft to battle in the clouds."
By Rory Cellan-Jones, BBC News.
http://news.bbc.co.uk/2/hi/technology/7693993.stm
See in particular the short video clip on Microsoft Azure in this piece from the BBC.

"Storm warning for cloud computing."
By Bill Thompson, BBC News.
http://news.bbc.co.uk/2/hi/technology/7421099.stm
Highlighting concerns surrounding cloud computing.

"Cloud computing is a trap, warns GNU founder Richard Stallman."
By Bobbie Johnson, Guardian.co.uk
www.guardian.co.uk/technology/2008/sep/29/cloud.computing.richard.stallman
Richard Stallman on why he's against cloud computing.

"Click's Favourite Cloud Links."
From Click, BBC News
http://news.bbc.co.uk/2/hi/programmes/click_online/7464153.stm
See in particular G.ho.st, a global virtual computer hosting site.

"Dell attempts to copyright 'cloud computing.'"
By Agam Shah, for IDG News Service, published on TechWorld
www.techworld.com/opsys/news/index.cfm?newsid=102279
Just for fun, Dell tries to beat other computing companies to the punchline.



Volunteer Computing
The Ultimate Cloud
By David P. Anderson

Computers continue to get faster exponentially, but the computational demands of science are
growing even faster. Extreme requirements arise in at least three areas.

1) Physical simulation: Scientists use computers to simulate physical reality at many levels of scale: molecule, organism, ecosystem, planet, galaxy, universe. The models are typically chaotic, and studying the distribution of outcomes requires many simulation runs with perturbed initial conditions.

2) Compute-intensive analysis of large data: Modern instruments (optical and radio telescopes, gene sequencers, gravitational wave detectors, particle colliders) produce huge amounts of data, which in many cases requires compute-intensive analysis.

3) Biology-inspired algorithms such as genetic and flocking algorithms for function optimization.

These areas engender computational tasks that would take hundreds or thousands of years to complete on a single PC. Reducing this to a feasible interval—days or weeks—requires high-performance computing (HPC). One approach is to build an extremely fast computer—a supercomputer. However, in the areas listed above, the rate of job completion, rather than the turnaround time of individual jobs, is the important performance metric. This subset of HPC is called high-throughput computing.

To achieve high throughput, the use of distributed computing, in which jobs are run on networked computers, is often more cost-effective than supercomputing. There are many approaches to distributed computing:

• cluster computing, which uses dedicated computers in a single location.

• desktop grid computing, in which desktop PCs within an organization (such as a department or university) are used as a computing resource. Jobs are run at low priority, or while the PCs are not being otherwise used.

• grid computing, in which separate organizations agree to share their computing resources (supercomputers, clusters, and/or desktop grids).

• cloud computing, in which a company sells access to computers on a pay-as-you-go basis.

• volunteer computing, which is similar to desktop grid computing except that the computing resources are volunteered by the public.

Each of these paradigms has an associated resource pool: the computers in a machine room, the computers owned by a university, the computers owned by a cloud provider. In the case of volunteer computing, the resource pool is the set of all privately-owned PCs in the world. This pool is interesting for several reasons.

For starters, it dwarfs the other pools. The number of privately-owned PCs is currently 1 billion and is projected to grow to 2 billion by 2015. Second, the pool is self-financing, self-updating, and self-maintaining. People buy new PCs, upgrade system software, maintain their computers, and pay their electric bills. Third, consumer PCs, not special-purpose computers, are state of the art. Consumer markets drive research and development. For example, the fastest processors today are GPUs developed for computer games. Traditional HPC is scrambling to use GPUs, but there are already 100 million GPUs in the public pool, and tens of thousands are already being used for volunteer computing.

History of Volunteer Computing
In the mid-1990s, as consumer PCs became powerful and millions of them were connected to the Internet, the idea of using them for distributed computing arose. The first two projects, GIMPS and distributed.net, were launched in 1996 and 1997. GIMPS finds prime numbers of a particular type, and distributed.net breaks cryptosystems via brute-force search of the key space. Both projects attracted tens of thousands of volunteers and demonstrated the feasibility of volunteer computing.

In 1999 two new projects were launched, SETI@home and Folding@home. SETI@home, from University of California-Berkeley, analyzes data from the Arecibo radio telescope, looking for synthetic signals from space. Folding@home, from Stanford, studies how proteins are formed from gene sequences. These projects received significant media coverage and moved volunteer computing into the awareness of the global public.

These projects all developed their own middleware, the application-independent machinery for distributing jobs to volunteer computers and for running jobs unobtrusively on these computers, as well as web interfaces by which volunteers could register, communicate with other volunteers, and track their progress. Few scientists had the resources or skills to develop such software, and so for several years there were no new projects.

In 2002, with funding from the National Science Foundation, the BOINC project was established to develop general-purpose middleware for volunteer computing, making it easier and cheaper for scientists to use.

The first BOINC-based projects launched in 2004, and today there are about 60 such projects, in a wide range of scientific areas.




Some of the larger projects include Milkyway@home (from Rensselaer Polytechnic Institute; studies galactic structure), Einstein@home (from University of Wisconsin and Max Planck Institute; searches for gravitational waves), Rosetta@home (from University of Washington; studies proteins of biomedical importance), ClimatePrediction.net (from Oxford University; studies long-term climate change), and IBM World Community Grid (operated by IBM; hosts 5-10 humanitarian applications from various academic institutions).

[Figure: ClimatePrediction.net, from Oxford University, simulates the Earth's climate change during the next 100 years.]

Evaluating Volunteer Computing
Volunteer computing can be compared with other high-performance computing paradigms in several dimensions.

Performance. About 900,000 computers are actively participating in volunteer computing. Together they supply about 10 PetaFLOPS (a PetaFLOPS is one quadrillion floating-point operations per second) of computing power; the fraction supplied by GPUs is about 70 percent and growing.

As a comparison, the fastest supercomputer supplies about 1.4 PetaFLOPS, and the largest grids number in the tens of thousands of hosts. So in terms of throughput, volunteer computing is competitive with other paradigms, and it has the near-term potential to greatly surpass them: if participation increases to 4 million computers, each with a 1 TeraFLOPS GPU (the speed of current high-end models) and computing 25 percent of the time, the result will be 1 ExaFLOPS of computing power. Other paradigms are projected to reach this level only in a decade or more. Actually, since 4 million PCs is only 0.4 percent of the resource pool, the near-term potential of volunteer computing goes well beyond Exa-scale.

Cost effectiveness. For scientists, volunteer computing is cheaper than other paradigms—often dramatically so. A medium-scale project (10,000 computers, 100 TeraFLOPS) can be run using a single server computer and one or two staff for roughly $200,000 per year. An equivalent CPU cluster costs at least an order of magnitude more.

Cloud computing is even more expensive. For example, Amazon Elastic Compute Cloud (EC2) instances provide 2 GigaFLOPS and cost $2.40 per day. To attain 100 TeraFLOPS, 50,000 instances would be needed, costing $43.8 million per year. However, studies suggest that cloud computing is cost-effective for hosting volunteer computing project servers.

Resource allocation policy and public outreach. In traditional HPC paradigms, resources are allocated by bureaucracies: funding agencies, institutions, and committees. The public, although it pays for the resources, has no direct voice in their allocation, and doesn't know how they're being used. In volunteer computing, the public has direct control over how resources are allocated, and knows what they're being used for. As a result, public awareness of science is increased, and research projects that are outside of the current academic mainstream can potentially get significant computing resources.

Scientific adoption. Volunteer computing has not yet been widely adopted. Sixty research groups are currently using volunteer computing, while perhaps a hundred times that many could benefit from it. Cluster and grid computing are much more widely used by scientists. The HPC community, on whom scientists rely for guidance, has ignored volunteer computing, perhaps because it offers neither control nor funding. In addition, although BOINC has reduced the barrier to entry, few research groups have the resources and skills needed to operate a project. The most promising solution to this is to create umbrella projects serving multiple scientists and operated at a higher organizational level (for example, at the level of a university).

Energy efficiency. The FLOP/Watt ratio of a PC is lower than that of a supercomputer, and it is tempting to conclude that volunteer computing is less energy-efficient than supercomputing. However, this is not necessarily the case. In cold climates, for example, energy used by a PC may replace energy used by a space heater, to which the PC is thermodynamically equivalent. No study has been done taking such factors into account.

The BOINC Project
The BOINC software consists of two parts: server software that is used to create projects, and client software. Anyone—academic researchers, hobbyists, malicious hackers—can create a project. Projects are independent. Each one operates its own server and provides its own web site. BOINC has no centralized component other than a web site from which its software can be downloaded. On the client side, volunteers install and run client software on their computers. The client software is available for all major platforms, including Windows, Linux, and Mac OS X.

Having installed the client program, volunteers can then attach it to any set of projects, and for each project can assign a resource share that determines how the computer's resources are divided among the projects.

The choice of projects is up to the volunteer. Attaching to a project allows it to run arbitrary executables on one's computer, and BOINC provides only limited (account-based) sandboxing. So the volunteer must assess the project's authenticity, its technical competence, and its scientific merit. The ownership of intellectual property resulting from the project may also be a factor.

BOINC encourages volunteers to participate in multiple projects simultaneously. By doing so, they avoid having their computer go idle if one project is down. Multiple attachment also helps projects whose supply of work is sporadic.
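The resource share idea described above — each attached project gets a volunteer-chosen fraction of the machine — can be illustrated with a small sketch. This is not BOINC's actual scheduler, just a toy proportional split of available CPU-hours; the project names and shares are made up.

```python
def split_cpu_hours(shares, available_hours):
    """Divide available CPU-hours among attached projects in proportion
    to the volunteer's resource shares (toy model, not BOINC's scheduler)."""
    total = sum(shares.values())
    return {project: available_hours * share / total
            for project, share in shares.items()}

# A volunteer attached to three hypothetical projects:
shares = {"climate@home": 100, "protein@home": 50, "primes@home": 50}
print(split_cpu_hours(shares, available_hours=8))
# -> {'climate@home': 4.0, 'protein@home': 2.0, 'primes@home': 2.0}
```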




More generally, by making it easy to join and leave projects, BOINC encourages volunteers to occasionally evaluate the set of available projects, and to devote their computing resources to those projects that, in their view, are doing the most important and best research.

BOINC does accounting of credit, a numerical measure of a volunteer's contribution to a project. The accumulation of a large amount of credit in a particular project can be a disincentive to try other projects. To combat this, BOINC provides a cross-project notion of identity (based on the volunteer's email address). Each project exports its credit statistics as XML files, and various third-party credit statistics sites import these files and display cross-project credit, that is, the volunteer's total credit across all projects.

Even with the modest number (60) of current projects, the process of locating them, reading their web sites, and attaching to a chosen set is tedious, and will become infeasible if the number of projects grows to hundreds or thousands.

BOINC provides a framework for dealing with this problem. A level of indirection can be placed between client and projects. Instead of being attached directly to projects, the client can be attached to a web service called an account manager. The client periodically communicates with the account manager, passing it account credentials and receiving a list of projects to attach to.

This framework has been used by third-party developers to create "one-stop shopping" web sites, where volunteers can read summaries of all existing BOINC projects and can attach to a set of them by checking boxes. The framework could also be used for delegation of project selection, analogous to mutual funds. For example, volunteers wanting to support cancer research could attach to an American Cancer Society account manager. American Cancer Society experts would then select a dynamic weighted "portfolio" of meritorious cancer-related volunteer projects.

[Figure: The BOINC client software lets volunteers attach to projects and monitor the progress of jobs.]

Human Factors
All HPC paradigms involve human factors, but in volunteer computing these factors are particularly crucial and complex. To begin with, why do people volunteer?

This question is currently being studied rigorously. Evidence suggests that there are several motivational factors. One such factor is to support scientific goals, such as curing diseases, finding extraterrestrial life, or predicting climate change. Another factor is community. Some volunteers enjoy participating in the online communities and social networks that form, through message boards and other web features, around volunteer computing projects. Yet another reason people volunteer is the credit incentive. Some volunteers are interested in the performance of computer systems, and they use volunteer computing to quantify and publicize the performance of their computers.

There have been attempts to commercialize volunteer computing by paying participants, directly or via a lottery, and reselling the computing power. These efforts have failed because the potential buyers, such as pharmaceutical companies, are unwilling to have their data on computers outside of their control.

To attract and retain volunteers, a project must perform a variety of human functions. It must develop web content describing its research goals, methods, and credentials. It must provide volunteers with periodic updates (via web or email) on its scientific progress. It must manage the moderation of its web site's message boards to ensure that they remain positive and useful. It must publicize itself by whatever media are available—mass media, alumni magazines, blogs, social networking sites, and so on.

Volunteers must trust projects, but projects cannot trust volunteers. From a project's perspective, volunteers are effectively anonymous. If a volunteer behaves maliciously, for example by intentionally falsifying computational results, the project has no way to identify and punish the offender. In other HPC paradigms, such offenders can be identified and disciplined or fired.

Technical Factors
Volunteer computing poses a number of technical problems. For the most part, these problems are addressed by BOINC, and scientists need not be concerned with them.

Heterogeneity. The volunteer computer population is extremely diverse in terms of hardware (processor type and speed, RAM, disk space), software (operating system and version), and networking (bandwidth, proxies, firewalls). BOINC provides scheduling mechanisms that assign jobs to the hosts that can best handle them. However, projects still generally need to compile applications for several platforms (Windows 32 and 64 bit, Mac OS X, Linux 32 and 64 bit, various GPU platforms). This difficulty may soon be reduced by running applications in virtual machines.

Sporadic availability and churn. Volunteer computers are not dedicated. The time intervals when a computer is on, and when BOINC is allowed to compute, are sporadic and generally unpredictable. BOINC tracks these factors and uses them in estimating job completion times. In addition, computers are constantly joining and leaving the pool of a given project. BOINC must address the fact that computers with many jobs in progress may disappear forever.

Result validation. Because volunteer computers are anonymous and untrusted, BOINC cannot assume that job results are correct, or that the claimed credit is accurate. One general way of dealing with this is replication: send a copy of each job to multiple computers; compare the results; accept the result if the replicas agree; otherwise issue additional replicas. This is complicated by the fact that different computers often do floating-point calculations differently, so that there is no unique correct result.

BOINC addresses this with a mechanism called homogeneous redundancy that sends instances of a given job to numerically identical computers. In addition, redundancy has the drawback that it reduces throughput by at least 50 percent. To address this, BOINC has a mechanism called adaptive replication that identifies trustworthy hosts and replicates their jobs only occasionally.
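The replication scheme just described can be sketched in a few lines. This is only an illustration of the idea — issue each job to several hosts and accept a result once enough replicas agree — not BOINC's actual validator; the tolerance-based comparison stands in for the fuzzy matching that floating-point differences force on real projects.

```python
def results_agree(a, b, tolerance=1e-6):
    # Different hosts round floating-point results differently,
    # so "equal" has to mean "close enough."
    return abs(a - b) <= tolerance

def validate(results, quorum=2, tolerance=1e-6):
    """Return a canonical result once `quorum` replica results agree,
    or None if more replicas still need to be issued."""
    for candidate in results:
        matches = [r for r in results if results_agree(candidate, r, tolerance)]
        if len(matches) >= quorum:
            return sum(matches) / len(matches)   # e.g., average the agreeing replicas
    return None

# Two hosts agree closely; a third (faulty or malicious) does not:
print(validate([3.141592, 3.141593, 2.718281]))  # -> ~3.1415925
print(validate([3.141592]))                      # -> None: issue another replica
```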




Scalability. Large volunteer projects can involve a million hosts and millions of jobs processed per day. This is beyond the capabilities of grid and cluster systems. BOINC addresses this using an efficient server architecture that can be distributed across multiple machines. The server is based on a relational database, so BOINC leverages advances in scalability and availability of database systems. The communication architecture uses exponential backoff after failures, so that the rate of client requests remains bounded even when a server comes up after a long outage.

Security. Volunteer computing poses a variety of security challenges. What if hackers break into a project server and use it to distribute malware to the attached computers? BOINC prevents this by requiring that executables be digitally signed using a secure, offline signing computer. What if hackers create a fraudulent project that poses as academic research while in fact stealing volunteers' private data? This is partly addressed by account-based sandboxing: applications are run under an unprivileged user account and typically have no access to files other than their own inputs and outputs. In the future, stronger sandboxing may be possible using virtual machine technology.

Future of Volunteer Computing
Volunteer computing has demonstrated its potential for high-throughput scientific computing. However, only a small fraction of this potential has been realized. Moving forward will require progress in three areas.

1. Increased participation: The volunteer population has remained around 500,000 for several years. Can it be grown by an order of magnitude or two? A dramatic scientific breakthrough, such as the discovery of a cancer treatment or a new astronomical phenomenon, would certainly help its popularity. Or, the effective use of social networks like Facebook could spur more people to volunteer. Another way to increase participation might be to have computer manufacturers or software vendors bundle BOINC with other products. Currently, Folding@home is bundled with the Sony PlayStation 3 and with ATI GPU drivers.

2. Increased scientific adoption: The set of volunteer projects is small and fairly stagnant. It would help if more universities and institutions created umbrella projects, or if there were more support for higher-level computing models, such as workflow management systems and MapReduce. Two other factors that would increase scientific adoption are the promotion of volunteer computing by scientific funding agencies and increased acceptance of volunteer computing by the HPC and computer science communities.

3. Tracking technology: Today, the bulk of the world's computing power is in desktop and laptop PCs, but in a decade or two it may shift to energy-efficient mobile devices. Such devices, while docked, could be used for volunteer computing.

If these challenges are addressed, and volunteer computing experiences explosive growth, there will be thousands of projects. At this point volunteers can no longer be expected to evaluate all projects, and new allocation mechanisms will be needed. One possibility is the "mutual fund" idea mentioned above; another is something analogous to decision markets, in which individuals are rewarded for participating in new projects that later produce significant results. Such "expert investors" would steer the market as a whole.

Biography
David P. Anderson is a research scientist at the Space Sciences Laboratory at the University of California-Berkeley.

Clouds at the Crossroads


Research Perspectives
By Ymir Vigfusson and Gregory Chockler

Despite its promise, most cloud computing innovations have been almost exclusively driven by
a few industry leaders, such as Google, Amazon, Yahoo!, Microsoft, and IBM. The involvement
of a wider research community, both in academia and industrial labs, has so far been patchy
without a clear agenda. In our opinion, the limited participation stems from the prevalent view that
clouds are mostly an engineering and business-oriented phenomenon based on stitching together
existing technologies and tools.
Here, we take a different stance and claim that clouds are now mature enough to become first-class research subjects, posing a range of unique and exciting challenges deserving collective attention from the research community. For example, the realization of privacy in clouds is a cross-cutting interdisciplinary challenge, permeating the entire stack of any imaginable cloud architecture.

The goal of this article is to present some of the research directions that are fundamental for cloud computing. We pose various challenges that span multiple domains and disciplines. We hope these questions will provoke interest from a larger group of researchers and academics who wish to help shape the course of the new technology.

An Architectural View
The physical resources of a typical cloud are simply a collection of machines, storage, and networking resources collectively representing the physical infrastructure of the data center(s) hosting the cloud computing system. Large clouds may contain some hundreds of thousands of computers.

The distributed computing infrastructure offers a collection of core services that simplify the development of robust and scalable services on top of a widely distributed, failure-prone, physical platform.




The services supported by this layer typically include communication (for example, multicast and publish-subscribe), failure detection, resource usage monitoring, group membership, data storage (such as distributed file systems and key-value lookup services), distributed agreement (consensus), and locking.

The application resource management layer manages the allocation of physical resources to the actual applications and platforms, including higher-level service abstractions (virtual machines) offered to end-users. The management layer deals with problems related to application placement, load balancing, task scheduling, service-level agreements, and others.

Finally, we enumerate some cross-cutting concerns that dissect the entire cloud infrastructure. We will focus on these issues: energy, privacy, consistency, and the lack of standards, benchmarks, and test beds for conducting cloud-related research.

Energy
Large cloud providers are natural power hogs. To reduce the carbon footprint, data centers are frequently deployed in proximity to hydroelectric plants and other clean energy sources. Microsoft, Sun, and Dell have advocated putting data centers in shipping containers consisting of several thousand nodes at a time, thus making deployment easier. Although multi-tenancy and the use of virtualization improve resource utilization over traditional data centers, the growth of cloud provider services has been rapid, and power consumption is a major operating expense for the large industry leaders.

Fundamental questions exist about how, where, and at what cost we can reduce power consumption in the cloud. Here we examine three examples to illustrate potential directions.

Solid-state disks (SSDs) have substantially faster access times and draw less power than regular mechanical disks. The downside is that SSDs are more expensive and lack durability because blocks can become corrupted after 100,000 to 1,000,000 write-erase cycles. SSDs have made their way into the laptop market—the next question is whether cloud data centers will follow [14]. Can we engineer mechanisms to store read-intensive data on SSDs instead of disks?

Google has taken steps to revamp energy use in hardware by producing custom power supplies for computers which have more than double the efficiency of regular ones [12]. They even patented a "water-based" data center on a boat that harnesses energy from ocean tides to power the nodes and also uses the sea for cooling. How can we better design future hardware and infrastructure for improved energy efficiency? How can we minimize energy loss in the commodity machines currently deployed in data centers?

In the same fashion that laptop processors adapt the CPU frequency to the workload being performed, data center nodes can be powered up or down to adapt to variable access patterns, for example, due to diurnal cycles or flash crowds. Some CPUs and disk arrays have more flexible power management controls than simple on/off switches, thus permitting intermediate levels of power consumption [13]. File systems spanning multiple disks could, for instance, bundle infrequently accessed objects together on "sleeper" disks [9]. More generally, how should data and computation be organized on nodes to permit software to decrease energy use without reducing performance?

Privacy Concerns
Storing personal information in the cloud clearly raises privacy and security concerns. Sensitive data are no longer barred by physical obscurity or obstructions. Instead, exact copies can be made in an instant.

Technological advances have reduced the ability of an individual to exercise personal control over his or her personal information, making it elusive to define privacy within clouds [5]. The companies that gather information to deliver targeted advertisements are working toward their ultimate product: you. The amount of information known by large cloud providers about individuals is staggering, and the lack of transparent knowledge about how this information is used has provoked concerns.

Are there reasonable notions of privacy that would still allow businesses to collect and store personal information about their customers in a trustworthy fashion? How much are users willing to pay for additional privacy?

We could trust the cloud partially, while implementing mechanisms for auditing and accountability. If privacy leaks have serious legal repercussions, then cloud providers would have incentives to deploy secure information flow techniques (even if they are heavy-handed) to limit access to sensitive data and to devise tools to locate the responsible culprits if a breach is detected [17]. How can such mechanisms be made practical? Is the threat of penalty to those individuals who are caught compromising privacy satisfactory, or should the cloud be considered an untrusted entity altogether?

If we choose not to trust the cloud, then one avenue of research is to abstract it as a storage and computing device for encrypted information. We could use a recent invention in cryptography called fully homomorphic encryption [10], a scheme allowing sums and multiplications (and hence arbitrary Boolean circuits) to be performed on encrypted data without needing to decrypt it first. Unfortunately, the first implementations are entirely impractical, which begs the question of whether homomorphic encryption can be made practical.

Another approach is to sacrifice the generality of homomorphic encryption. We can identify the most important functions that need to be computed on the private data and devise a practical encryption scheme to support these functions—think MapReduce [7] on encrypted data. As a high-level example, if all emails in Gmail were encrypted by the user's public key and decrypted by the user's web browser, then Gmail could not produce a search index for the mailbox. However, if each individual word in the email were encrypted, Gmail could produce an index (the encrypted words would just look like a foreign language) but would not understand the message contents.

The latter case implies that Gmail could not serve targeted ads to the user. What are the practical points on the privacy versus functionality spectrum with respect to computational complexity and a feasible cloud business model? Secure multiparty computation (SMC) allows mutually distrusting agents to compute a function on their collective inputs without revealing their inputs to other agents [19]. Could we partition sensitive information across clouds, perhaps including a trusted third-party service, and perform SMC on the sensitive data? Is SMC the right model?
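The word-level encryption example above can be made concrete with a short sketch. This is purely illustrative and not a secure construction: a deterministic keyed hash stands in for per-word encryption, and it leaks word frequencies and repeats exactly as the text warns, but it shows how a server could build and query an index over ciphertexts without learning the underlying words.

```python
import hashlib

def enc_word(word, key):
    # Deterministic stand-in for per-word encryption; a real design would
    # use a keyed cipher or PRF held only by the user.
    return hashlib.sha256(key + word.lower().encode()).hexdigest()

def build_index(messages, key):
    """Map each encrypted word to the set of message ids containing it."""
    index = {}
    for msg_id, text in messages.items():
        for word in set(w.strip(".,?!") for w in text.lower().split()):
            index.setdefault(enc_word(word, key), set()).add(msg_id)
    return index

def search(index, query, key):
    return index.get(enc_word(query, key), set())

key = b"user-secret-key"
mailbox = {1: "lunch on Friday?", 2: "project deadline moved to Friday"}
index = build_index(mailbox, key)       # the server only ever sees hashes
print(search(index, "friday", key))     # -> {1, 2}
```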




The most stringent consistency semantics, known as serializability or strong consistency [11], globally orders the service requests and presents them as occurring in an imaginary global sequence. For example, suppose Alice deposits $5 to a bank account with the initial balance of $0 concurrently with Bob's deposit of $10 to the same account. If Carol checks the account balance twice and discovers it first to be $10 and then $15, then no user would ever see $5 as the valid balance of that account (since in this case, Bob's deposit gets sequenced before Alice's). In the database community, this type of semantics is typically implied by ACID (atomicity, consistency, isolation, and durability).

Intuitively, supporting serializability requires the participants to maintain global agreement about the command ordering. Since cloud services are typically massively distributed and replicated (for scalability and availability), reaching global agreement may be infeasible. Brewer's celebrated CAP theorem [2] asserts that it is impossible in a large distributed system to simultaneously maintain (strong) consistency, availability, and to tolerate partitions—that is, network connectivity losses.

Researchers have looked for practical ways of circumventing the CAP theorem. Most work has so far focused on relaxing the consistency semantics, basically substituting serializability or (some of) the ACID properties with weaker guarantees. For instance, it does not matter if Carol and Bob in the example above would see either $5 or $10 as the intermediate balances, as long as both of them will eventually see $15 as the final balance.

This observation underlies the notion of eventual consistency [18], which allows states of the concurrently updated objects to diverge provided that eventually the differences are reconciled, for example, when the network connectivity is restored.
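As a rough illustration of the difference (this sketch is ours, not the authors'; the class and variable names are invented), the Python fragment below models two replicas of the bank account from the example above. Each replica accepts a deposit while disconnected from the other, the replicas briefly disagree, and a later reconciliation step merges the histories so that both converge on the $15 balance that eventual consistency promises.

    # Toy model of eventual consistency: replicas accept updates independently
    # and reconcile later by merging their operation histories. Deposits
    # commute, so the merged state is the same everywhere.
    class AccountReplica:
        def __init__(self):
            self.deposits = []            # operations observed locally

        def deposit(self, amount):
            self.deposits.append(amount)

        def balance(self):
            return sum(self.deposits)

        def reconcile(self, other):
            # Exchange histories; in a real store each operation would carry a
            # unique ID so duplicates could be detected during the merge.
            merged = self.deposits + [d for d in other.deposits
                                      if d not in self.deposits]
            self.deposits = list(merged)
            other.deposits = list(merged)

    r1, r2 = AccountReplica(), AccountReplica()
    r1.deposit(5)                          # Alice's deposit reaches replica 1
    r2.deposit(10)                         # Bob's deposit reaches replica 2
    print(r1.balance(), r2.balance())      # 5 10  -> replicas diverge
    r1.reconcile(r2)                       # connectivity restored
    print(r1.balance(), r2.balance())      # 15 15 -> both eventually agree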
Apart from eventual consistency, other ways of weakening consistency semantics have looked into replacing the single global ordering with multiple orderings. For instance, causal consistency [1] allows different clients to observe different request sequences as long as each observed sequence is consistent with the partial cause-effect order.

Weaker consistency semantics work well only for specific types of applications, such as cooperative editing, but do not easily generalize to arbitrary services. (Just imagine what would happen if withdrawals were allowed in the bank account example above.) Moreover, semantics that are weaker than serializability (or ACID) tend to be difficult to explain to users and developers lacking the necessary technical background.

Yet another problem is that for certain types of data, such as the metadata of a distributed file system, it might be inherently impossible to compromise on strong consistency without risking catastrophic data losses at a massive scale. The research questions here would have to address whether we can produce a comprehensive and rigorous framework to define and reason about the diverse consistency guarantees. The framework should unify both weaker and stronger models and could serve as a basis for a rigorous study of the various consistency semantics of cloud services and their relative power. It should be expressive enough to allow new properties to be both easily introduced, for example by composing the existing basic properties, and understood by both developers and consumers of the cloud services. It should also help to bridge the diverse perspectives on consistency that exist today within different research communities, such as the database and distributed systems communities.

Although it is well understood that a cloud architecture should accommodate both strongly and weakly consistent services, it is unclear how the two can be meaningfully combined within a single system. How should they interact, and what implications would such a model have on performance and scalability?

Current approaches to supporting strong consistency primarily focus on isolating the problem into "islands" of server replicas. While beneficial for scalability, such an approach creates an extra dependency on a set of servers that have to be carefully configured and maintained. Can we make strong consistency services that are more dynamic and easier to reconfigure, providing a simpler and more robust solution?

Standards, Benchmarks, Test Beds
Technical innovations are often followed by standards wars, and cloud computing is no exception. There is a plethora of cloud interoperability alliances and consortia (for example, the Open Cloud Manifesto, the DMTF Open Cloud Standards Incubator, and the Open Group's Cloud Work Group). The largest incumbents in the market are nevertheless reluctant to follow suit and have chosen to define their own standards. While the strategy is understandable, the lack of interoperability may have an adverse effect on consumers, who become locked in to a single vendor. The worry is that clouds become natural monopolies.

The Internet was built on open standards. The question is whether clouds will be as well. Making cloud services open and interoperable may stimulate competition and allow new entrants to enter the cloud market. Customers would be free to migrate their data from a stagnant provider to a new or promising one without difficulty when they so choose. Can the smaller players leverage their collective power to lobby for an open and flexible cloud computing standard that fosters competition while still allowing businesses to profit? Or can this be accomplished by the larger companies or governments? What business models are suitable for an open cloud? On the technical side, could users switch between providers without needing their support, for instance by using a third-party service?

Different cloud providers often adopt similar APIs for physical resources and the distributed computing infrastructure. For instance, MapReduce and Hadoop expose a similar API, as do the various key-value lookup services (Amazon's Dynamo [8], Yahoo!'s PNUTS [6], memcached [4]). Other components have more diverse APIs, for instance locking services like Google's Chubby [3], Yahoo!'s ZooKeeper [16], and real-time event dissemination services. The broad question asks what components and interfaces are the "right" way to provide the cloud properties mentioned previously. A more specific question is how we can compare and contrast different implementations of similar components. For instance, how can we evaluate the properties of key-value stores like PNUTS and Facebook's Cassandra [15]?
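One reason such comparisons are plausible at all is that these stores share a very small core interface. The sketch below is our own illustration (the class is hypothetical and backed by a plain dictionary): benchmarking code written against a get/put/delete surface like this could, in principle, be pointed at memcached, a Dynamo-style store, PNUTS, or Cassandra with only the backend swapped out.

    # Minimal key-value surface shared, roughly, by many cloud lookup services.
    class KeyValueStore:
        """In-memory stand-in; a real backend would be a networked store."""
        def __init__(self):
            self._data = {}

        def put(self, key, value):
            self._data[key] = value

        def get(self, key, default=None):
            return self._data.get(key, default)

        def delete(self, key):
            self._data.pop(key, None)

    store = KeyValueStore()
    store.put("user:42", {"name": "Alice", "balance": 15})
    print(store.get("user:42"))
    store.delete("user:42")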
The most appealing approach is to compare well-defined metrics on benchmark traces, such as the TPC benchmark for databases (www.tpc.org). How can we obtain such traces, or perhaps synthetically generate them until real ones are produced? Also, consensus benchmarks enable researchers outside the major incumbent companies to advance the core cloud technologies.
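Until shared traces exist, synthetic workloads are one way to exercise competing implementations under comparable conditions. The generator below is a simple sketch of ours (the operation mix, key skew, and sizes are arbitrary placeholders), producing a stream of get/put operations that a benchmark harness could replay against any store exposing the interface above.

    # Generate a synthetic key-value trace: a stream of (operation, key) pairs
    # with a skewed key popularity and a configurable read/write mix.
    import random

    def synthetic_trace(num_ops=1000, num_keys=100, read_fraction=0.95, seed=0):
        rng = random.Random(seed)
        for _ in range(num_ops):
            # Skewed access: low-numbered keys are requested far more often.
            key = "key-%d" % int(num_keys * rng.random() ** 3)
            op = "GET" if rng.random() < read_fraction else "PUT"
            yield op, key

    for op, key in synthetic_trace(num_ops=5):
        print(op, key)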
Developing distributed computing infrastructure layers or data storage systems is a hard task, but evaluating them for the massive scale imposed by clouds without access to real nodes is next to impossible. Academics who work on peer-to-peer (P2P) systems, for example, rely heavily on the PlanetLab (www.planet-lab.org) test bed for deployment. PlanetLab comprises more than 1,000 nodes distributed across nearly 500 sites, making it an ideal resource for experimental validation of geographically networked systems that sustain heavy churn (peer arrivals and departures). The nodes in the data centers underlying the cloud tend to be numerous, hierarchically structured with respect to networking equipment, and face limited random churn, but occasionally suffer from large-scale correlated failures.


PlanetLab’s focus on wide-area networks is suboptimal for cloud 10. Gentry, C. 2009. Fully homomorphic encryption using ideal lattices. In
platform research, unfortunately, and the same holds true for other Proceedings of the ACM Symposium on Theory of Computing (STOC’09).
similar resources. A handful of test beds appropriate for cloud 11. Gray, J., and Reuter, A. 1993. Isolation concepts. In Transaction Process-
research have made their debut recently, including Open Cirrus from ing: Concepts and Techniques, chap. 7. Morgan Kaufmann.
HP, Intel and Yahoo!, and the Open Cloud Testbed. We encourage
12. Hoelzle, U. and Weihl, B. 2006. High-efficiency power supplies for home
other players to participate and contribute resources  to  cloud computers and servers. Google Inc. http://services. google. com/blog_re-
research, with the goal of providing a standard test bed with open- sources/PSU_white_paper.pdf.
access, at least for academia, including researchers from underrepre-
13. Khuller, S., Li, J., and Saha, B. 2010. Energy efficient scheduling via par-
sented universities. Who will create the future “CloudLab”?
tial shutdown. In Proceedings of ACM-SIAM Symposium on Discrete Al-
gorithms (SODA).
How to Get Involved
Students and researchers who are interested in shaping cloud comput- 14. Narayanan, D., Donnelly, A., Thereska, E., Elnikety, S., and Rowstron, A.
ing should consider participating in the LADIS (www.cs.cornell.edu/ 2009. Migrating server storage to SSDs: Analysis of tradeoffs. In Proceed-
ings of EuroSys.
projects/ladis2010) or HotCloud (www.usenix.org/events/hotcloud10)
workshops, or the upcoming Symposium on Cloud Computing (SoCC: 15. Ramakrishnan, R. 2009. Data management challenges in the cloud. In
http://research.microsoft.com/en-us/um/redmond/events/socc2010). Proceedings of ACM SIGOPS LADIS. http://www.cs. cornell.
Large industry players are currently driving the research bandwagon edu/projects/ladis2009/talks/ramakrishnan-keynote-ladis2009.pdf.
for cloud computing, but the journey is only beginning. A concerted 16. Reed B. and Junqueira, F. P. 2008. A simple totally ordered broadcast pro-
multi-disciplinary effort is needed to turn the cloud computing prom- tocol. In Proceedings of the 2nd Workshop on Large-Scale Distributed
ise into a success. Systems and Middleware (LADIS’08). ACM.
17. Smith, G. 2007. Principles of secure information flow analysis. In
Biographies Malware Detection, Christodorescu, M., et al. Eds., Springer-Verlag.
Dr. Ymir Vigfusson is a postdoctoral researcher with the Distributed Chap. 13, 291-307.
Middleware group at the IBM Research Haifa Labs. His research is fo- 18. Vogels, W. 2008. Eventually consistent. ACM Queue 6, 6.
cused around distributed systems, specifically, real-world problems
19. Wenliang, D. and Atallah, M. J. 2001. Secure multi-party computation
that embody deep trade-offs. He holds a PhD from Cornell University.
problems and their applications: a review and open problems. In Pro-
Dr. Gregory Chockler is a research staff member in the Distributed Mid- ceedings of the Workshop on New Security Paradigms.
dleware group at the IBM Research Haifa Labs. His research interests span
a wide range of topics in the area of large-scale distributed computing
and  cloud computing. He is one of the founders and organizers of the
ACM/SIGOPS Workshop on Large-Scale Distributed Systems and Middle-
ware (LADIS). He holds a PhD from the Hebrew University of Jerusalem.

References
1. Ahamad, M., Hutto, P. W., Neiger, G., Burns, J. E., and Kohli, P. 1995. Causal memory: Definitions, implementations and programming. Distributed Comput. 9. 37-49.
2. Brewer, E. 2000. Towards robust distributed systems. In Proceedings of Principles of Distributed Computing (PODC).
3. Burrows, M. 2006. The Chubby lock service for loosely-coupled distributed systems. In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI'06). USENIX Association. 335-350.
4. Danga Interactive. memcached: A distributed memory object caching system. http://www.danga.com/memcached/.
5. DeCandia, G., Hastorun, D., Jampani, et al. 2007. Dynamo: Amazon's highly available key-value store. In Proceedings of the 21st ACM SIGOPS Symposium on Operating Systems Principles (SOSP'07). Association for Computing Machinery. 205-220.
6. Cavoukian, A. 2008. Privacy in the clouds. White Paper on Privacy and Digital Identity: Implications for the Internet. http://www.ipc.on.ca/images/Resources/privacyintheclouds.pdf.
7. Cooper, B., Ramakrishnan, R., et al. 2008. PNUTS: Yahoo!'s hosted data serving platform. Proc. VLDB Endow. 1, 2. 1,277-1,288.
8. Dean, J. and Ghemawat, S. 2008. MapReduce: Simplified data processing on large clusters. Comm. ACM 51, 1. 107-113.
9. Ganesh, L., Weatherspoon, H., Balakrishnan, M., and Birman, K. 2007. Optimizing power consumption in large-scale storage systems. In Proceedings of HotOS.
10. Gentry, C. 2009. Fully homomorphic encryption using ideal lattices. In Proceedings of the ACM Symposium on Theory of Computing (STOC'09).
11. Gray, J. and Reuter, A. 1993. Isolation concepts. In Transaction Processing: Concepts and Techniques, chap. 7. Morgan Kaufmann.
12. Hoelzle, U. and Weihl, B. 2006. High-efficiency power supplies for home computers and servers. Google Inc. http://services.google.com/blog_resources/PSU_white_paper.pdf.
13. Khuller, S., Li, J., and Saha, B. 2010. Energy efficient scheduling via partial shutdown. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA).
14. Narayanan, D., Donnelly, A., Thereska, E., Elnikety, S., and Rowstron, A. 2009. Migrating server storage to SSDs: Analysis of tradeoffs. In Proceedings of EuroSys.
15. Ramakrishnan, R. 2009. Data management challenges in the cloud. In Proceedings of ACM SIGOPS LADIS. http://www.cs.cornell.edu/projects/ladis2009/talks/ramakrishnan-keynote-ladis2009.pdf.
16. Reed, B. and Junqueira, F. P. 2008. A simple totally ordered broadcast protocol. In Proceedings of the 2nd Workshop on Large-Scale Distributed Systems and Middleware (LADIS'08). ACM.
17. Smith, G. 2007. Principles of secure information flow analysis. In Malware Detection, Christodorescu, M., et al., Eds., Springer-Verlag. Chap. 13, 291-307.
18. Vogels, W. 2008. Eventually consistent. ACM Queue 6, 6.
19. Wenliang, D. and Atallah, M. J. 2001. Secure multi-party computation problems and their applications: A review and open problems. In Proceedings of the Workshop on New Security Paradigms.



Scientific Workflows and Clouds
By Gideon Juve and Ewa Deelman

In recent years, empirical science has been evolving from physical experimentation to computation-based research. In astronomy, researchers seldom spend time at a telescope, but instead access the large number of image databases that are created and curated by the community [42]. In bioinformatics, data repositories hosted by entities such as the National Institutes of Health [29] provide the data gathered by Genome-Wide Association Studies and enable researchers to link particular genotypes to a variety of diseases.

Besides public data repositories, scientific collaborations maintain community-wide data resources. For example, in gravitational-wave physics, the Laser Interferometer Gravitational-Wave Observatory [3] maintains geographically distributed repositories holding time-series data collected by the instruments and their associated metadata. Along with the large increase in online data, the need to process these data is growing.

In addition to traditional high performance computing (HPC) centers, a nation-wide cyberinfrastructure—a computational environment, usually distributed, that hosts a number of heterogeneous resources; cyberinfrastructure could refer to both grids and clouds or a mix of the two—is being provided to the scientific community, including the Open Science Grid (OSG) [36] and the TeraGrid [47]. These infrastructures, also known as grids [13], allow access to high-performance resources over wide area networks. For example, the TeraGrid is composed of computational and data resources at Indiana University, Louisiana University, University of Illinois, and others. These resources are accessible to users for storing data and performing parallel and sequential computations. They provide remote login access as well as remote data transfer and job scheduling capabilities.

Scientific workflows are used to bring together these various data and compute resources and answer complex research questions. Workflows describe the relationship of the individual computational components and their input and output data in a declarative way. In astronomy, scientists are using workflows to generate science-grade mosaics of the sky [26], to examine the structure of galaxies [46], and, in general, to understand the structure of the universe. In bioinformatics, researchers are using workflows to understand the underpinnings of complex diseases [34, 44]. In earthquake science, workflows are used to predict the magnitude of earthquakes within a geographic area over a period of time [10]. In physics, workflows are used to search for gravitational waves [5] and model the structure of atoms [40]. In ecology, scientists use workflows to explore the issues of biodiversity [21].

Today, workflow applications are running on national and international cyberinfrastructures such as OSG, TeraGrid, and EGEE [11]. The broad spectrum of distributed computing provides unique opportunities for large-scale, complex scientific applications in terms of resource selection, performance optimization, and reliability. In addition to the large-scale cyberinfrastructure, applications can target campus clusters, or utility computing platforms such as commercial [1, 17] and academic clouds [31].

However, these opportunities also bring with them many challenges. It's hard to decide which resources to use and how long they will be needed. It's hard to determine what the cost-benefit tradeoffs are when running in a particular environment. And it's difficult to achieve good performance and reliability for an application on a given system.

Clouds have recently appeared as an option for on-demand computing. Originating in the business sector, clouds can provide computational and storage capacity when needed, which can result in infrastructure savings for a business. One idea driving cloud computing is that businesses can plan only for a sustained level of capacity while reaching out to the cloud for resources in times of peak demand. When using the cloud, consumers pay only for what they use in terms of computational resources, storage, and data transfer in and out of the cloud.

Although clouds were built primarily with business computing needs in mind, they are also being considered in science. In this article we focus primarily on workflow-based scientific applications and describe how they can benefit from the new computing paradigm.

Workflow Applications
Scientific workflows are being used today in a number of disciplines. They stitch together computational tasks so that they can be executed automatically and reliably on behalf of the researcher. One such workflow, Montage [26], is composed of a number of image-processing applications that discover the geometry of the input images on the sky, calculate the geometry of the output mosaic on the sky, re-project the flux in the input images to conform to the geometry of the output mosaic, model the background radiation in the input images to achieve common flux scales and background levels across the mosaic, and rectify the background so that all constituent images conform to a common background level. These normalized images are added together to form the final mosaic. Figure 1 shows a mosaic of the Rho Oph dark cloud created using this workflow.

Montage mosaics can be constructed in different sizes, which dictate the number of images and computational tasks in the workflow. For example, a 4-degree square mosaic (the moon is 0.5 degrees square) corresponds to a workflow with approximately 5,000 tasks and 750 input images. Workflow management systems enable the efficient and reliable execution of these tasks and manage the data products they produce (both intermediate and final).

Figure 2 shows a graphical representation of a small Montage workflow containing 1,200 computational tasks. Workflow management systems such as Pegasus [4, 9, 39] orchestrate the execution of these tasks on desktops, grids, and clouds.


Figure 1: In this 75x90 arcmin view of the Rho Oph dark cloud as seen by 2MASS, the three-color composite is constructed using Montage. J band is shown as blue, H as green, and K as red. (Image courtesy of Bruce Berriman and J. Davy Kirkpatrick.)

Figure 2: A graphical representation of the Montage workflow with 1,200 computational tasks represented as ovals. The lines connecting the tasks represent data dependencies.

Another example is from the earthquake science domain, where researchers use workflows to generate earthquake hazard maps of Southern California [38]. These maps show the maximum seismic shaking that can be expected to happen in a given region over a period of time (typically 50 years).

Figure 3 shows a map constructed from individual computational points. Each point is obtained from a hazard curve (shown around the map) and each curve is generated by a workflow containing approximately 800,000 to 1,000,000 computational tasks [6]. This application requires large-scale computing capabilities such as those provided by the NSF TeraGrid [47].

Figure 3: In this shake map of Southern California, points on the map indicate geographic sites where the CyberShake calculations were performed. The curves show the results of the calculations. (Image courtesy of CyberShake Working Group, Southern California Earthquake Center including Scott Callaghan, Kevin Milner, Patrick Small, and Tom Jordan.)

In order to support such workflows, software systems need to

1) adapt the workflows to the execution environment (which, by necessity, is often heterogeneous and distributed),

2) optimize workflows for performance to provide a reasonable time to solution,

3) provide reliability so that scientists do not have to manage the potentially large numbers of failures, and

4) manage data so that it can be easily found and accessed at the end of the execution.

Science Clouds
Today, clouds are also emerging in academia, providing a limited number of computational platforms on demand: Cumulus [49], Eucalyptus [33], Nimbus [31], OpenNebula [43]. These science clouds provide a great opportunity for researchers to test out their ideas and harden codes before investing more significant resources and money into the potentially larger-scale commercial infrastructure.

To support the needs of a large number of different users with different demands in the software environment, clouds are primarily built using resource virtualization technologies [2, 7, 50] that enable the hosting of a number of different operating systems and associated software and configurations on a single hardware host.

Clouds that provide computational capacity (Amazon EC2 [1], Nimbus, Cumulus) are often referred to as infrastructure as a service (IaaS) because they provide the basic computing resources needed to deploy applications and services. Platform as a service (PaaS) clouds such as Google App Engine [17] provide an entire application development environment including frameworks, libraries, and a deployment container. Finally, software as a service (SaaS) clouds provide complete end-user applications for tasks such as photo sharing, instant messaging [25], and many others.

Commercial clouds were built with business users in mind, but scientific applications can benefit from them as well. Scientists, however, often have different requirements than enterprise customers. In particular, scientific codes often have parallel components and use MPI [18] or shared memory to manage message-based communication between processors. More coarse-grained parallel applications such as workflows rely on a shared file system to pass data between processes.


Additionally, scientific applications are often composed of many interdependent tasks and consume and produce large amounts of data (often in the terabyte range [5, 10]).

Clouds are similar to grids, in that they can be configured (with additional work and tools) to look like a remote cluster, presenting interfaces for remote job submission and data transfer. As such, scientists can use existing grid software and tools to get their work done.

Another interesting aspect of the cloud is that, by default, it includes resource provisioning as part of the usage mode. Unlike the grid, where jobs are often executed on a best-effort basis, when running on the cloud, a user requests a certain amount of resources and has them dedicated for a given duration of time. How many resources and how fast one can request them is an open question.

Resource provisioning is particularly useful for workflow-based applications, where the overheads of scheduling individual, inter-dependent tasks in isolation (as is done by grid clusters) can be very costly. For example, if there are two dependent jobs in the workflow, the second job will not be released to a local resource manager on the cluster until the first job successfully completes. Thus the second job will incur additional queuing delays. In the provisioned case, as soon as the first job finishes, the second job is released to the local resource manager, and since the resource is dedicated, it can be scheduled right away. Thus the overall workflow can be executed much more efficiently.

Virtualization also opens up a greater number of resources to legacy applications. These applications are often very brittle and require a very specific software environment to execute successfully. Today, scientists struggle to make the codes that they rely on for weather prediction, ocean modeling, and many other computations work on different execution sites. No one wants to touch the codes that have been designed and validated many years ago for fear of breaking their scientific quality. Clouds and their use of virtualization technologies may make these legacy codes much easier to run. With virtualization, the environment can be customized with a given OS, libraries, software packages, and the like. The needed directory structure can be created to anchor the application in its preferred location without interfering with other users of the system. The downside is that the environment needs to be created, and this may require more knowledge and effort on the part of the scientists than they are willing or able to spend.

Scientific Workflows
The canonical example of a cloud is Amazon's Elastic Compute Cloud (EC2), which is part of Amazon Web Services (AWS). AWS services provide computational, storage, and communication infrastructure on demand via web-based APIs. AWS offers five major services.

1. Elastic Compute Cloud (EC2): a service for provisioning virtual machine instances from Amazon's compute cluster, which allows users to deploy virtual machine (VM) images with customized operating systems, libraries, and application code on a variety of predefined hardware configurations (CPU, memory, disk).

2. Simple Storage Service (S3): an object-based storage system for the reliable storage of binary objects (typically files), which provides operations to "put" and "get" objects from a global object store that is accessible both inside and outside Amazon's cloud.

3. Elastic Block Store (EBS): a block-based storage system that provides network-attached storage volumes to EC2. Volumes can be attached to an EC2 instance as a block device and formatted for use as a reliable, unshared file system.

4. Simple Queue Service (SQS): a distributed queue service for sending messages between nodes in a distributed application, which allows messages queued by one node to be retrieved and processed by another.

5. SimpleDB: a structured key-value storage service, which enables database records to be stored, indexed, and queried by key.

In addition, Amazon's cloud provides services for monitoring (CloudWatch), parallel computing (Elastic MapReduce), relational storage (RDS), and others.
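As a concrete, hedged example of driving two of these services from a program, the sketch below uses boto, a widely used Python binding for AWS at the time of writing. The AMI identifier, instance type, bucket, and file names are placeholders, AWS credentials are assumed to be available in the environment, and the exact calls may differ across library versions.

    # Sketch: start one EC2 worker and stage an input file into S3 with boto.
    # 'ami-12345678', 'my-workflow-bucket', and the key names are placeholders.
    import boto

    ec2 = boto.connect_ec2()                       # credentials come from the environment
    reservation = ec2.run_instances('ami-12345678', instance_type='m1.small')
    print("started instance", reservation.instances[0].id)

    s3 = boto.connect_s3()
    bucket = s3.create_bucket('my-workflow-bucket')
    key = bucket.new_key('inputs/image_001.fits')
    key.set_contents_from_filename('image_001.fits')   # upload a local input file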
There are many ways to deploy a scientific workflow on a cloud, depending on the services offered by the cloud and the requirements of the workflow management system. Many of the existing workflows were developed for HPC systems such as clusters, grids and supercomputers. Porting these workflows to the cloud involves either adapting the workflow to the cloud or adapting the cloud to the workflow.

Adapting the workflow to the cloud involves changing the workflow to take advantage of cloud-specific services. For example, rather than using a batch scheduler to distribute workflow tasks to cluster nodes, a workflow running on Amazon's cloud could make use of the Simple Queue Service. Adapting the cloud to the workflow involves configuring the cloud to resemble the environment for which the application was created. For example, an HPC cluster can be emulated in Amazon EC2 by provisioning one VM instance to act as a head node running a batch scheduler, and several others to act as worker nodes. One of the great benefits of the cloud for workflow applications is that both adaptation approaches are possible.
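A minimal sketch of the first approach, using boto against the Simple Queue Service, is shown below. This is our own illustration rather than code from any particular workflow system; the queue name and task commands are placeholders. Ready tasks are written to a queue, and each worker VM repeatedly pulls a message, runs the command it carries, and deletes the message.

    # Sketch: distribute workflow tasks through SQS instead of a batch scheduler.
    import subprocess
    import boto
    from boto.sqs.message import Message

    sqs = boto.connect_sqs()
    queue = sqs.create_queue('workflow-tasks')        # placeholder queue name

    # Submission side: enqueue one message per ready task.
    for cmd in ['./run_task_a.sh', './run_task_b.sh']:
        msg = Message()
        msg.set_body(cmd)
        queue.write(msg)

    # Worker side (one loop per worker VM): pull, execute, acknowledge.
    while True:
        messages = queue.get_messages(1)
        if not messages:
            break          # queue drained; a long-lived worker would keep polling
        task = messages[0]
        subprocess.call(task.get_body(), shell=True)
        queue.delete_message(task)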
nologies may make these legacy codes much easier to run. With Scientific workflows require large quantities of compute cycles to
virtualization, the environment can be customized with a given OS, process tasks. In the cloud, these cycles are provided by virtual
libraries, software packages, and the like. The needed directory struc- machines such as those provided by Amazon EC2. Many virtual
ture can be created to anchor the application in its preferred location machine instances must be used simultaneously to achieve the per-
without interfering with other users of the system. The downside is formance required for large scale workflows. These collections of
that the environment needs to be created and this may require more VMs, called “virtual clusters” [12], can be managed using existing off-
knowledge and effort on the part of the scientist than they are willing the-shelf batch schedulers such as PBS [34, 48] or Condor [8, 24].
or able to spend. Setting up a virtual cluster in the cloud involves complex configuration
steps that can be tedious and error-prone. To automate this process,
Scientific Workflows software such as Nimbus Context Broker [22] can be used. This soft-
The canonical example of a cloud is Amazon’s Elastic Compute Cloud ware gathers information about the virtual cluster and uses it to gen-
(EC2), which is part of Amazon Web Services (AWS). AWS services erate configuration files and start services on cluster VMs.
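The flavor of what such contextualization tools automate can be suggested with a small sketch (ours; the addresses and file name are placeholders): once the provisioner reports the worker addresses, a node list in the style expected by a PBS/Torque head node can be generated mechanically rather than by hand.

    # Sketch: turn a list of provisioned worker addresses into a scheduler
    # node list. Addresses and the output path are placeholders.
    worker_ips = ['10.0.0.11', '10.0.0.12', '10.0.0.13']   # reported by the provisioner

    with open('nodes.txt', 'w') as f:
        for ip in worker_ips:
            f.write('%s np=1\n' % ip)      # one execution slot per worker, PBS/Torque style

    print(open('nodes.txt').read())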
In addition to compute cycles, scientific workflows rely on shared storage systems for communicating data between workflow tasks distributed across a group of nodes, and for storing input and output data. To achieve good performance, these storage systems must scale well to handle data from multiple workflow tasks running in parallel on separate nodes.

When running on HPC systems, workflows can usually make use of a high-performance, parallel file system such as Lustre [45], GPFS [41], or Panasas [37]. In the cloud, workflows can either make use of a cloud storage service, or they can deploy their own shared file system.

To use a cloud storage service, the workflow management system would likely need to change the way it manages data. For example, to use Amazon S3, a workflow task needs to fetch input data from S3 to a local disk, perform its computation, then transfer output data from the local disk back to S3. Making multiple copies in this way can reduce workflow performance.
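The staging pattern just described can be wrapped around each task. The sketch below is our illustration using boto (the bucket name, object keys, and command are placeholders): pull the task's input object from S3 to local disk, run the computation, and push the result back, which makes the extra copies, and therefore the overhead, easy to see.

    # Sketch: per-task data staging against S3.
    import subprocess
    import boto

    def run_task(bucket_name, input_key, output_key, command):
        s3 = boto.connect_s3()
        bucket = s3.get_bucket(bucket_name)

        # Stage in: copy the input object from S3 to local disk.
        bucket.get_key(input_key).get_contents_to_filename('input.dat')

        # Compute locally on the staged file.
        subprocess.check_call(command + ['input.dat', 'output.dat'])

        # Stage out: copy the result from local disk back to S3.
        bucket.new_key(output_key).set_contents_from_filename('output.dat')

    run_task('my-workflow-bucket', 'inputs/img1.dat', 'outputs/img1.dat',
             ['./process_image'])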
Another alternative would be to deploy a file system in the cloud that could be used by the workflow. For example, in Amazon EC2, an extra VM can be started to host an NFS file system, and worker VMs can mount that file system as a local partition. If better performance is needed, then several VMs can be started to host a parallel file system such as PVFS [23, 52] or GlusterFS [16].

Although clouds like Amazon's already provide several good alternatives to HPC systems for workflow computation, communication, and storage, there are still challenges to overcome.

Virtualization overhead. Although virtualization provides greater flexibility, it comes with a performance cost. This cost comes from intercepting and simulating certain low-level operating system calls while the VM is running. In addition, there is the overhead of deploying and unpacking VM images before the VM can start. These overheads are critical for scientific workflows because in many cases the entire point of using a workflow is to run a computation in parallel to improve performance. Current estimates put the overhead of existing virtualization software at around 10 percent [2, 15, 51], and VM startup time takes between 15 and 80 seconds depending on the size of the VM image [19, 32]. Fortunately, advances in virtualization technology, such as improved hardware-assisted virtualization, may reduce or eliminate runtime overheads in the future.

Lack of shared or parallel file systems. Although clouds provide many different types of shared storage systems, they are not typically designed for use as file systems. For example, Amazon EBS does not allow volumes to be mounted on multiple instances, and Amazon S3 does not provide a standard file system interface. To run on a cloud like Amazon's, a workflow application must either be modified to use these different storage systems, which takes time, or it must create its own file system using services available in the cloud, which is at least difficult and potentially impossible depending on the file system desired (for example, Lustre cannot be deployed on Amazon EC2 because it requires kernel modifications that EC2 does not allow).

Relatively slow networks. In addition to fast storage systems, scientific workflows rely on high-performance networks to transfer data quickly between tasks running on different hosts. The HPC systems typically used for scientific workflows are built using high-bandwidth, low-latency networks such as InfiniBand [20] and Myrinet [27]. In comparison, most existing commercial clouds are equipped with commodity gigabit Ethernet, which results in poor performance for demanding workflow applications. Fortunately, the use of commodity networking hardware is not a fundamental characteristic of clouds, and it should be possible to build clouds with high-performance networks in the future.

Future Outlook
While many scientists can make use of existing clouds that were designed with business users in mind, in the future we are likely to see a great proliferation of clouds that have been designed specifically for science applications. We already see science clouds being deployed at traditional academic computing centers [14, 28, 30]. One can imagine that these science clouds will be similar to existing clouds, but will come equipped with features and services that are even more useful to computational scientists. Like existing clouds, they will potentially come in a variety of flavors depending on the level of abstraction desired by the user.

IaaS science clouds could provide access to the kinds of high-performance infrastructure found in HPC systems, such as high-speed networks and parallel storage systems. In addition, they could come with science-oriented infrastructure services such as workflow services and batch scheduling services. PaaS science clouds could be similar to the science portals and gateways used today. They could provide tools for scientists to develop and deploy applications using domain-specific APIs and frameworks. Such systems could include access to collections of datasets used by the scientists, such as genome repositories and astronomical image archives. Finally, some commonly used science applications could be deployed using a SaaS model. These applications would allow scientists from around the world to upload their data for processing and analysis.

Additionally, HPC centers are looking at expanding their own infrastructure by relying on cloud technologies to virtualize local clusters, which would allow them to provide customized environments to a wide variety of users in order to meet their specific requirements. At the same time, HPC centers can also make use of commercial clouds to supplement their local resources when user demand is high.

Clearly, clouds can be directly beneficial to HPC centers where the staff is technically savvy. However, the adoption of clouds for domain scientists depends strongly on the availability of tools that would make it easy to leverage the cloud for scientific computations and data management.

Biographies
Gideon Juve is a PhD student in computer science at the University of Southern California. His research interests include distributed and high-performance computing, scientific workflows, and computational science.

Ewa Deelman is a research associate professor at the University of Southern California Computer Science Department and a project leader at the USC Information Sciences Institute, where she heads the Pegasus project, which designs and implements workflow mapping techniques for large-scale workflows running in distributed environments.
References
1. Amazon. Elastic compute cloud. http://aws.amazon.com/ec2/.
2. Barham, P., Dragovic, B., Fraser, K., Hand, S., Harris, T., Ho, A., Neugebauer, R., Pratt, I., and Warfield, A. 2003. Xen and the art of virtualization. In Proceedings of the 19th ACM Symposium on Operating Systems Principles. 164-177.
3. Barish, B. C. and Weiss, R. 1999. LIGO and the detection of gravitational waves. Physics Today 52. 44.
4. Berriman, G. B., Deelman, E., Good, J., Jacob, J., Katz, D. S., Kesselman, C., Laity, A., Prince, T. A., Singh, G., and Su, M.-H. 2004. Montage: A grid enabled engine for delivering custom science-grade mosaics on demand. In SPIE Conference 5487: Astronomical Telescopes.
5. Brown, D. A., Brady, P. R., Dietz, A., Cao, J., Johnson, B., and McNabb, J. 2006. A case study on the use of workflow technologies for scientific analysis: Gravitational wave data analysis. In Workflows for e-Science, Taylor, I., Deelman, E., Gannon, D., and Shields, M., Eds., Springer.


6. Callaghan, S., Maechling, P., Deelman, E., Vahi, K., Mehta, G., Juve, G., Milner, K., Graves, R., Field, E., Okaya, D., Gunter, D., Beattie, K., and Jordan, T. 2008. Reducing time-to-solution using distributed high-throughput mega-workflows—Experiences from SCEC CyberShake. In Proceedings of the 4th IEEE International Conference on e-Science (e-SCIENCE'08).
7. Clark, B., Deshane, T., Dow, E., Evanchik, S., Finlayson, M., Herne, J., and Matthews, J. N. 2004. Xen and the art of repeated research. In Proceedings of the USENIX Annual Technical Conference, FREENIX Track. 135-144.
8. Condor. http://www.cs.wisc.edu/condor.
9. Deelman, E., Singh, G., Su, M.-H., Blythe, J., Gil, Y., Kesselman, C., Mehta, G., Vahi, K., Berriman, G. B., Good, J., Laity, A., Jacob, J. C., and Katz, D. S. 2005. Pegasus: A framework for mapping complex scientific workflows onto distributed systems. Scientific Program. J. 13. 219-237.
10. Deelman, E., Callaghan, S., Field, E., Francoeur, H., Graves, R., Gupta, N., Gupta, V., Jordan, T. H., Kesselman, C., Maechling, P., Mehringer, J., Mehta, G., Okaya, D., Vahi, K., and Zhao, L. 2006. Managing large-scale workflow execution from resource provisioning to provenance tracking: The CyberShake example. In Proceedings of the 2nd IEEE International Conference on e-Science and Grid Computing (e-SCIENCE'06). 14.
11. EGEE Project. Enabling Grids for E-sciencE. http://www.eu-egee.org/.
12. Foster, I., Freeman, T., Keahey, K., Scheftner, D., Sotomayor, B., and Zhang, X. 2006. Virtual clusters for grid communities. In Proceedings of the 6th IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06). 513-520.
13. Foster, I., Kesselman, C., and Tuecke, S. 2001. The anatomy of the grid: Enabling scalable virtual organizations. Int. J. High Perform. Comput. Appl. 15. 200-222.
14. FutureGrid. http://futuregrid.org/.
15. Gilbert, L., Tseng, J., Newman, R., Iqbal, S., Pepper, R., Celebioglu, O., Hsieh, J., and Cobban, M. 2005. Performance implications of virtualization and hyper-threading on high energy physics applications in a grid environment. In Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05).
16. Gluster Inc. GlusterFS. http://www.gluster.org.
17. Google App Engine. http://code.google.com/appengine/.
18. Gropp, W., Lusk, E., and Skjellum, A. 1994. Using MPI: Portable Parallel Programming with the Message Passing Interface. MIT Press, Cambridge, MA.
19. Hyperic Inc. CloudStatus. http://www.cloudstatus.com.
20. InfiniBand Trade Association. InfiniBand. http://www.infinibandta.org/.
21. Jones, M., Ludascher, B., Pennington, D., and Rajasekar, A. 2005. Data integration and workflow solutions for ecology. In Data Integration in Life Sciences.
22. Keahey, K. and Freeman, T. 2008. Contextualization: Providing one-click virtual clusters. In Proceedings of the 4th International Conference on eScience (e-SCIENCE'08).
23. Ligon, W. B. and Ross, R. B. 1996. Implementation and performance of a parallel file system for high performance distributed applications. In Proceedings of the 5th IEEE International Symposium on High Performance Distributed Computing. 471-480.
24. Litzkow, M. J., Livny, M., and Mutka, M. W. 1988. Condor: A hunter of idle workstations. In Proceedings of the 8th International Conference on Distributed Computing Systems. 104-111.
25. Microsoft. Software as a service. http://www.microsoft.com/serviceproviders/saas/default.mspx.
26. Montage. http://montage.ipac.caltech.edu.
27. Myricom. Myrinet. http://www.myri.com/myrinet/.
28. NASA Ames Research Center. Nebula. http://nebula.nasa.gov.
29. NCBI. The database of genotypes and phenotypes (dbGaP). 2009. http://www.ncbi.nlm.nih.gov/gap.
30. NERSC. Magellan. http://www.nersc.gov/nusers/systems/magellan.
31. Nimbus Science Cloud. http://workspace.globus.org/clouds/nimbus.html.
32. Nurmi, D., Wolski, R., Grzegorczyk, C., Obertelli, G., Soman, S., Youseff, L., and Zagorodnov, D. 2008. Eucalyptus: A technical report on an elastic utility computing architecture linking your programs to useful systems. Computer Science Tech. rep. 2008-10. University of California, Santa Barbara.
33. Nurmi, D., Wolski, R., Grzegorczyk, C., Obertelli, G., Soman, S., Youseff, L., and Zagorodnov, D. 2008. The Eucalyptus open-source cloud-computing system. In Cloud Computing and its Applications.
34. Oinn, T., Li, P., Kell, D. B., Goble, C., Goderis, A., Greenwood, M., Hull, D., Stevens, R., Turi, D., and Zhao, J. 2006. Taverna/myGrid: Aligning a workflow system with the life sciences community. In Workflows in e-Science, Taylor, I., Deelman, E., Gannon, D., and Shields, M., Eds., Springer.
35. OpenPBS. http://www.openpbs.org.
36. Open Science Grid. http://www.opensciencegrid.org.
37. Panasas Inc. Panasas. http://www.panasas.com.
38. Paul, R. W. G., Somerville, G., Day, S. M., and Olsen, K. B. 2006. Ground motion environment of the Los Angeles region. Structural Design Tall Special Buildings 15. 483-494.
39. Pegasus. http://pegasus.isi.edu.
40. Piccoli, L. 2008. Lattice QCD workflows: A case study. In Challenging Issues in Workflow Applications (SWBES'08).
41. Schmuck, F. and Haskin, R. 2002. GPFS: A shared-disk file system for large computing clusters. In Proceedings of the 1st USENIX Conference on File and Storage Technologies.
42. Skrutskie, M. F., Schneider, S. E., Stiening, R., Strom, S. E., Weinberg, M. D., Beichman, C., Chester, T., Cutri, R., Lonsdale, C., and Elias, J. 1997. The Two Micron All Sky Survey (2MASS): Overview and status. In The Impact of Large Scale Near-IR Sky Surveys, Garzon, F., et al., Eds., Kluwer Academic Publishing Company, Dordrecht. 25.
43. Sotomayor, B., Montero, R., Llorente, I., and Foster, I. 2008. Capacity leasing in cloud systems using the OpenNebula engine. In Cloud Computing and Applications.
44. Stevens, R. D., Robinson, A. J., and Goble, C. A. 2003. myGrid: Personalised bioinformatics on the information grid. Bioinformatics 19.
45. Sun Microsystems. Lustre. http://www.lustre.org.
46. Taylor, I., Shields, M., Wang, I., and Philp, R. 2003. Distributed P2P computing within Triana: A galaxy visualization test case. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS'03).
47. TeraGrid. http://www.teragrid.org/.
48. Torque. http://supercluster.org/torque.
49. Wang, L., Tao, J., Kunze, M., Rattu, D., and Castellanos, A. C. 2008. The Cumulus Project: Build a scientific cloud for a data center. In Cloud Computing and its Applications. Chicago.
50. Xenidis, J. 2005. rHype: IBM research hypervisor. IBM Research.
51. Youseff, L., Wolski, R., Gorda, B., and Krintz, C. 2006. Paravirtualization for HPC systems. In Lecture Notes in Computer Science, vol. 4331, 474.
52. Yu, W. and Vetter, J. S. 2008. Xen-based HPC: A parallel I/O perspective. In Proceedings of the 8th IEEE International Symposium on Cluster Computing and the Grid (CCGrid'08).



The Cloud at Work,
Interviews with Pete Beckman of Argonne National Lab
and Bradley Horowitz of Google
Pete Beckman, director of the Argonne Leadership Computing Facility, Argonne National Laboratory, interviewed by Sumit Narayan

Pete Beckman is the director of the Argonne Leadership Computing Facility at Argonne National Laboratory (ANL) in Illinois. Argonne National Lab is the United States' first science and engineering research laboratory as well as home to one of the world's fastest supercomputers. Beckman explains cloud computing from a scientist's perspective, and speculates where it might be headed next. (He also notes that Argonne has a well-developed student internship program, but not enough candidates!)
—Sumit Narayan

Sumit Narayan: "Cloud computing" is the new buzz word among computer scientists and technologists. It's used in different ways to define a variety of things. Tell us a little about the origins of cloud computing. What does cloud computing mean to you?

Pete Beckman: Distributed computing, which people have often referred to as "computing out there," as opposed to "computing on your local machine," has been around for a very long time, somewhere around 20 years.

We went from a period of distributed computing, to meta-computing, to grid computing, and now to cloud computing. They're all a little different, but the notion is that the services, either compute or data, are located remotely, and scientists have to then work out the protocols, policies, and security measures necessary to run stuff remotely or to get to the data that is remote.

So, grid computing was focused primarily on sharing the resources among the providers, and the genesis of cloud computing came from some technologies that allowed folks to do this in a very clear and sandboxed way. They provide a very definitive and well-described interface and also allow you to run a piece of code or software remotely.

Virtual machine technology in particular has made this much easier than in the past. Rather than the complicated nature of deciding which software can be run and which packages are available, now you are able to ship the entire virtual machine to other sites, thereby allowing for utility computing, another way we refer to cloud computing.

The challenge that still remains is data, how to best share the data.

SN: What are the key new technologies behind cloud computing? Can you talk a little about the different services available through cloud computing?

PB: From a technical standpoint, the only one that has a strong root in technology is the virtual machine-based "infrastructure as a service" (IaaS). That's the only one that has a technological breakthrough. All the others are just a model breakthrough. In other words, the idea that I could store my data locally or remotely has been around for a long time.

The idea that I can create a web application that makes it look like I'm running something locally when it is really remote—these are model differences in terms of APIs and providing a per-user capacity and so forth. The technology, the one that is really a technological breakthrough, is using and shipping around virtual machines.

An example of a model breakthrough is what people are doing when they say they are going to run their email on the cloud. To an organization, it looks like they have their email present locally like they used to. This is because they never really had it close to them on a server down the hall. They were probably POP'ping, or IMAP'ping, to a server that was within their infrastructure, but probably in another building. When people move their email to the cloud, they are now getting that as a service remotely and are being charged an incremental fee, like a per-user fee.

There is no provisioning that the site has to do for hosting and running their own machines. To the user, it all looks the same, except that the IMAP server is now some other IMAP server. People are doing the same for calendars, HR, travel, etc.

So all these systems that used to reside locally in an IT organization on-site are getting cloud versions, which, essentially, only require a network connection.

Now the virtual machine part, that's really a technological breakthrough that allows me to run anything, not just what they provide in a package like POP or IMAP, but anything I want. That is unique and the new thing over the last couple of years.



SN: Do you think the era of personal computers is coming to a close? Will we increasingly rely on cloud services like Google for search, computation, on Dropbox/S3 for storage, or Skype for communication?

PB: It is changing. Whether or not it comes to a close, probably not. But it is changing dramatically.

Let me give you a couple of examples. There already are netbooks. People are migrating into using a really cheap portable device. We have had lightweight Linux for quite some time now. You may also know about Google's Chrome OS.

I have one of the initial versions of a netbook, and it is really cool. You can do simple student stuff: web, email, PDFs, Skype, basic spreadsheet, and Word documents. But now, a lot of people are using Google Docs for that. If you look at the future of computing, if we really get to a ubiquitous network and all these things rely on the network, then these lightweight portal devices will become the way many people access their machine or their servers, or cloud services.

Some things still need intense local hacking, for example, image or media editing. But even those are making their way out into the cloud. There is a great Photoshop-as-a-service app that lets you upload a picture and then, using a web interface, change the color, scratch it, shrink it, crop it, etc. And again, you don't really require a high-end notebook for that. An inexpensive Atom processor on your netbook is enough.

With respect to media editing, Animoto is one example that is happening in the cloud. However, for most home users, there will still be a gap. Uploading photos or videos of my kids to the cloud for editing is probably still out of reach. But not for long.

SN: Cloud computing is well suited for business needs. As a scientist, why do you think cloud computing is important? What opportunities is cloud computing opening in the scientific community that weren't supported by big computing clusters like Blue Gene?

PB: Scientists have been using computers and hosting data remotely for a long time. They are used to this. It's kind of funny when people who are not from the scientific community visit Argonne, because they imagine scientists here come to use our supercomputer. The fact is, they don't. They are able to do [their computing] using SSH on their laptop from a coffee shop in Italy. The scientific community has been doing its work remotely for a long time, either on a supercomputer or mid-range machines.

But now, instead of the science community provisioning all these servers and mid-range computers, we will be able to allow the cloud to do that for us.

Of course, supercomputers are still super. They are different from cloud resources that are primarily useful for mid-range computing.

Argonne has a project for cloud computing, Magellan, funded by the U.S. Department of Energy for research into the scientific community and the cloud as a way to free up scientists so that they are not provisioning, or setting up two or three servers down the hall.

The other thing that's changing is, in the past, supercomputers had a very well-defined set of software stacks—the package is either in the stack or not, in terms of support for that package. But with IaaS cloud architecture, scientists can customize and make their own special stacks. We see this a lot in metagenomics and biology, where scientists have a very complex workflow of 10 different tools and they want to create a web interface for that. They want it all together in a package so that they can run their genomics application. Doing it in a virtual machine means they don't have to worry whether their package is supported, or if they have the right version of Perl, or if they added the Java bindings to MySQL; they can just put everything together in a virtual machine and ship it around and run it wherever.

SN: Is there anything about cloud computing that worries you, like security, infrastructure? What are the risks?

PB: Security is very complex in a virtualized environment. It becomes a very big challenge, and there are a couple of things we want to be able to do. We really want to give people a total virtual machine. That means we would be giving them root access on a virtual machine.

In the past, the language we have used to describe security has been to say that there is a "user," and an "escalated privileged user," like a root. All the documentation, discussion, and cyber-security plans differentiate those two very clearly. Users and escalated privileged users—administrator, or a root.

In a virtualized environment, when you give someone a virtual machine, they have the root access on that virtual machine, not on the complete infrastructure. As an analogy, you can think of it as someone being able to turn a mobile phone on and off. That's administrator privileges on the phone. But you don't get to control the cell towers. They're still controlled by someone else: the mobile phone service providers. So this notion of security really has to change. We have a lot yet to explore and change, and there will be a lot of research in that space.

Another thing of course will be, if you do something that you are not supposed to do when you are using a virtual machine, who is to blame? Is it the users to whom you handed the virtual machine? Are they responsible for all the security? If they upload their virtual machine that has a bunch of security holes in it and someone gets into that—how do we stop that? How do we manage that risk? How do we scan our own virtual machine? So that's a pretty big research area, and one we will be exploring at Argonne.

SN: How do you think cloud computing would impact education?

PB: Oh, I think cloud computing is just amazingly fun and fantastic for education, largely because of its low barrier to entry. If you look at the Beowulf cluster mailing lists, there are people who have set up their own clusters at various places. These are school-aged kids who have enormous amounts of energy, and they get a couple of old or new computers and wire them together.

Occasionally, you'll see stories about folks who have a 16-node cluster bought from the cheapest parts possible on the planet. These machines have no cases! They're just sitting on a table with their motherboards, and they work just fine.

There certainly is value in that sort of hacking, but a lot of colleges do not have that sort of expertise. Yet, they want to teach parallel processing, MPI programming, and scientific calculations in MATLAB. Cloud computing offers promise here.

I can imagine in the future a student just being handed 100 credit-hours on the cloud. The professor would say, "We are going to do a homework assignment. We are going to write a code that calculates the surface area to volume ratio of this system with each particle moving around." Now, each student has his or her own credit hours that he or she uses in the cloud to do the computation.

That's the sort of thing where we are likely to be headed, and it's fantastic for universities!


More and more students can get access to resources. You can do a simple MATLAB thing or parallel processing, or write a new ray-tracer and do some visualization. However, it will still be mid-range computing. It won't have the impact of a 5,000-core supercomputer that has InfiniBand, but it's a great way to get students started and tinkering.

I can imagine five years from now universities routinely handing out cloud credit hours to students.

SN: What are the opportunities for students to consider, both at ANL and outside?

PB: Argonne has a lot of student interns. We have a fantastic student program. Usually, our problem is that we cannot find enough students! It's not that we don't have enough slots for undergraduates or graduates who are in computational science or computer science—we don't have enough students!

Argonne has a catalog (www.dep.anl.gov) that lists all its projects. Students can essentially apply to a project for a summer position.

But with respect to the Magellan project, we have a web site that we are still working on: www.megallen.alcf.anl.gov. There, we will have a place to apply for cycles or time on the cloud machine. And if a student has a fantastic idea for exploring cloud computing in some way that benefits the lab and is in line with the mission of this lab in understanding cloud computing, then they can get time on the machine.

SN: What is your vision for the future, with respect to cloud computing? Where do you see cloud computing in ten years? Fifty years?

PB: We are slowly moving to faster and faster networks, even with respect to our homes. We can imagine that as we improve that last mile, more things will be stored in the cloud, everything from your home network and 802.11x, all the way to businesses relying on cloud services that will be spread out around multiple data servers. You're never going to be without email, your pictures, your data, because it is replicated somewhere in the cloud. We are rapidly moving towards that.

Now, providing for that mid-range computing is where the science will go: We will be hosting climate data, earth data, and so forth, and we'll allow scientists to slice and dice and explore the data in the cloud here at Argonne and elsewhere. There will always be the need for a supercomputer that goes beyond commodity inexpensive computing to provide a capability that you can do in a high-end environment.

You probably have read stories about Google's data centers and how cheap they are. Google carefully calculates every penny spent, in terms of motherboards and CPUs and hard disks. For high-performance computing, we do that too, but on a different level, optimizing different variables. We're not optimizing the cost to run queries per second that can force an embarrassingly parallel search, but instead, we're trying to figure out how fast we can simulate 100 years of climate change. We need specialized architecture for that, and we will always need high-end architectures like Blue Gene that are low-powered but still massively parallel.

In the future, we will probably see a move toward cloud computing for mid-range capabilities like email, calendaring, and other services that businesses now sometimes host on site. And we will also have space where we will solve the world's most challenging problems in climate, materials, biology, or genomics on very high-performance machines like exa-scale machines.

Bradley Horowitz, vice president of Product Management, Google, interviewed by Chris Heiden

Bradley Horowitz oversees product management for Google Apps, including Gmail, Calendar, Google Talk, Google Voice, Google Docs, Blogger, and Picasa. Before joining Google, he led Yahoo!'s advanced development division, which developed new products such as Yahoo! Pipes, and drove the acquisition of products such as Flickr and MyBlogLog.

Previously, he was co-founder and CTO of Virage, where he oversaw the technical direction of the company from its founding through its IPO and eventual acquisition by Autonomy. Horowitz holds a bachelor's degree in computer science from the University of Michigan and a master's degree from the MIT Media Lab, and was pursuing his PhD there when he co-founded Virage. Here, he discusses the issues affecting cloud computing and how we must address them going forward. The ubiquitous nature of computing today is a perfect fit for computing in the cloud.
—Chris Heiden

Chris Heiden: Describe your background and how it brought you into cloud computing.

Bradley Horowitz: Much of my research has involved looking at what computers do well and what people do well, and figuring out how to marry the two most effectively. Over time, that line has shifted, and computers now do many things that used to require a lot of manual effort. Much of it requires very large networked resources, and very large data sets. A good example is face recognition. These kinds of tasks are much more easily accomplished in the cloud, both in terms of computing power and data sets. And Google runs one of the largest—if not the largest—"cloud computers" on the planet. So it's great to be able to build applications that run on this massively scaled architecture.
More broadly, I've always been interested in the way people socialize and collaborate online. The most interesting part of cloud computing is in the network and the interaction between computers—not in individual computers themselves.

CH: What do you see as the single most important issue affecting cloud computing today?
BH: The network is getting exponentially faster. It's Moore's Law meeting Nielsen's Law. Not only can you pack tremendous computing power into a mobile phone to access the cloud anywhere, but bandwidth is growing exponentially as well, roughly 50 percent per year.
And it's starting to be available everywhere. At this point you expect the network to be fast and always-on. Airplanes used to be off-limits to cloud computing, but not anymore. WiFi is becoming standard on planes. People now buy phones expecting 3G data connections—talking has become the "oh yeah" feature at the bottom of a list of a dozen web and data features.

CH: What are some of the aspects of cloud computing that you are working on that will revolutionize the field?
BH: Everything "cloud" is really about collaboration. It's not just a larger computer—cloud computing gives birth to a new way of working, where the stuff you produce lives in the center of a social circle and can be collaborated on by everyone in that circle. We're trying to make that collaboration process more seamless, so sharing is easy and you can work on stuff together without worrying about the mechanics of version control and invites and the like.
I'm not sure people have quite grasped how much of computing can and will shift to the cloud. It's not that there isn't ever-growing computing power on the desktop or in beefy all-purpose servers. But the part of computing people really care about—that developers are developing for, and that yields the most interesting real-world applications—is the networked part. As an example, you don't use Google Docs to write solo reports and print them out to stick on a shelf somewhere. You use it to get 10 people to hack out a plan everyone's happy with in one hour. It's the networked process that's revolutionary, not the app used in isolation.

CH: What about concerns people have voiced about trusting cloud computing? Do you see these concerns as slowing adoption?
BH: I'd flip that on its head. The providers actually can't keep up with user and customer demand. They can't build this stuff fast enough for what people want to do in the cloud. At universities, for example, it's students demanding their administrators switch to Gmail. Universities like Arizona State and Notre Dame have switched to cloud-based email, and we're seeing big businesses like Motorola making the switch, too.
It helps that the stats are finally starting to get out comparing apples to apples. Desktop and on-premises applications break down far more often than web apps, even if you don't hear about it as often, since it happens behind closed doors. The nice thing about cloud computing is that all these service providers are so publicly accountable for even the slightest glitch. There's no tolerance for downtime, which is great for users.
Google's cloud is made up of a highly resilient network of thousands and thousands of disposable computers, with apps and data spread out on them across geographies. Gmail, for example, is replicated live across multiple datacenters, so if a meteor hits one, the app and data keeps on running smoothly on another.
And efforts like the Data Liberation Front and Google Dashboard make it clear that users maintain control of their data, even when they're using the cloud. They can take their stuff to go anytime they like, and they can always see how the data is being used. The value to you if you're a cloud provider is not in locking in users' data; rather, it's in the service, the flow of data, transactions, choices. Lock-in and closed formats are a losing strategy. They make you lazy as a provider. And on the web, laziness is deadly, since competitors are a click away.

CH: Describe how much of an impact cloud computing will have in the next evolution in computing. How will it affect the everyday computer user?
BH: This shift to cloud computing has already started invisibly for most users. Most people use webmail without realizing it's essentially a "cloud" app. That may be how they start using other cloud apps. They get invited to collaborate on a document online, and they use that doc every day, and then another, and one day they wake up and realize it's been months since they've opened up their desktop software.
But it's going to start getting more obvious as people switch to netbooks and smartphones, and as they stop having to worry about all the mechanics of backing up their discs, worrying about where they stored something, or hassling with document versions. It's already making daily life more mobile and more fluid. Even the word developer is becoming more and more synonymous with web developer. The web is now the primary platform for building apps, not an afterthought you hook into an app after it's built.
Cloud computing is already changing the way businesses, governments, and universities run. The City of Los Angeles just switched to the cloud using Google Apps. Philadelphia International Airport cuts down on delays and costs by coordinating its operations using Google Docs, and it's looking at technologies like Google Wave too.
And it's individuals, too. We hear about soldiers in the desert in Iraq keeping in touch with their families using video chat in Gmail, and checking their voicemail on Google Voice.
The movement away from the desktop and on-premise solutions has been sluggish for software makers with entrenched interests on the desktop. They're moving so slowly compared to what users want. These fast, lightweight, social, mobile apps are what people are actually using to communicate with each other and get work done. They sprouted in the cloud, they're built to run in the cloud, and they're now "growing down" into big enterprises, driven by user demand.
State of Security Readiness
By Ramaswamy Chandramouli and Peter Mell

Cloud computing is a model for enabling convenient, on-demand network access to a shared
pool of configurable computing resources that can be rapidly provisioned and released with
minimal management effort or service provider interaction. With this pay-as-you-go model
of computing, cloud solutions are seen as having the potential to both dramatically reduce costs and
increase the rapidity of development of applications.
However, the security readiness of cloud computing is commonly cited among IT executives as the primary barrier preventing organizations from immediately leveraging this new technology. These problems are real and arise from the nature of cloud computing: broad network access, resource pooling, and on-demand service.
In this article, we survey some of these challenges and the set of security requirements that may be demanded in the context of various cloud service offerings (noted in the article as No. 1, No. 2, and so on). The security challenges and requirements we survey not only involve core security operations, such as encryption of data at rest and in transit, but also contingency-related operations, such as failover measures.
The survey touches upon the various artifacts or entities involved in IT services, such as the users, data, applications, computing platforms, and hardware. We call the enterprise or government agency subscribing to the cloud services the "cloud user" and the entity hosting the cloud services the "cloud provider."
To further refine the definition of cloud computing presented above, we classify cloud computing service offerings into three service models.

Service Models
Software as a service (SaaS). The capability provided to the consumer is the use of a provider's applications running on a cloud infrastructure. The applications are accessible from various client devices through a thin client interface such as a web browser. The consumer does not manage or control the underlying cloud infrastructure, including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings. Examples of this include the case of a cloud provider offering a software application used for a specific business function, such as customer relationship management or human resources management, on a subscription or usage basis rather than the familiar purchase or licensing basis.
Platform as a service (PaaS). The capability provided to the consumer is the deployment of consumer-created or acquired applications onto the cloud infrastructure. These applications are created using programming languages and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure, including network, servers, operating systems, or storage, but has control over the deployed applications and possibly application hosting environment configurations. Examples of this include the case of a cloud provider providing a set of tools for developing and deploying applications using various languages (for example, C, C++, Java) under a whole application framework (JEE, .NET, and so forth).
Infrastructure as a service (IaaS). The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, deployed applications, and possibly limited control of select networking components (for example, host firewalls). Examples of this include the case of a cloud provider providing physical and virtual hardware (servers, storage volumes) for hosting and linking all enterprise applications and storing all enterprise data—in other words, the infrastructure backbone for an enterprise's data center.

Survey of Security Challenges
In reviewing the security challenges and requirements of cloud computing, we will look first at the necessary interactions between the cloud users, the users' software clients, and the cloud infrastructure or services.

The Users
When an enterprise subscribes to a cloud service, it may have a diverse user base consisting of not only its own employees but also its partners, suppliers, and contractors. In this scenario, the enterprise may need an effective identity and access management function and therefore require the following security requirements:
• support for a federation protocol for authentication of users (No. 1), and
• support for a standardized interface to enable the cloud user (or the cloud user's system administrator) to provision and de-provision members of their user base (No. 2).
Many commercial cloud services are now beginning to provide support for the security assertion markup language (SAML) federation protocol (which contains authentication credentials in the form of SAML assertions) in addition to their own proprietary authentication protocol, and hence we do not see a big obstacle in meeting the first of the above requirements.
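To make requirement No. 1 a little more concrete, the sketch below shows what a relying party checks when it consumes a SAML 2.0 assertion: the issuer, the subject, and the validity window. It is a simplified illustration only, written against Python's standard xml.etree module; it deliberately omits XML signature verification, which any real deployment must perform with a vetted SAML library.

    # Simplified sketch of consuming a SAML 2.0 assertion (requirement No. 1).
    # Signature verification is omitted here but is essential in practice.
    import xml.etree.ElementTree as ET
    from datetime import datetime, timezone

    SAML = {"saml": "urn:oasis:names:tc:SAML:2.0:assertion"}

    def check_assertion(assertion_xml: str, expected_issuer: str) -> str:
        root = ET.fromstring(assertion_xml)        # assumes a bare <saml:Assertion>
        issuer = root.findtext("saml:Issuer", namespaces=SAML)
        subject = root.findtext("saml:Subject/saml:NameID", namespaces=SAML)
        conditions = root.find("saml:Conditions", SAML)
        not_after = datetime.fromisoformat(
            conditions.get("NotOnOrAfter").replace("Z", "+00:00"))
        if issuer != expected_issuer:
            raise ValueError("assertion issued by an unexpected identity provider")
        if datetime.now(timezone.utc) >= not_after:
            raise ValueError("assertion has expired")
        return subject  # the federated identity the cloud service may now trust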
As far as the user provisioning and de-provisioning requirement is concerned, many of the cloud providers still use their own proprietary interfaces for user management. There exist common, machine-neutral formats or XML vocabularies for expressing user entitlements or access policies, such as the extensible access control markup language (XACML), and for user provisioning and de-provisioning, such as the service provision markup language (SPML). Until
the user management interface of the cloud provider provides support for these kinds of protocols, the cloud user's control of this important security function cannot be realized.

Access to Data
Data is an enterprise's core asset. What are the security challenges and requirements surrounding access to data stored in the cloud infrastructure?
Driven by citizen safety and privacy measures, government agencies and enterprises (for example, healthcare organizations) may demand of a SaaS, PaaS, or IaaS cloud provider that the data pertaining to their applications be:
• hosted in hardware located within the nation's territory or a specific region, for example, for disaster recovery concerns (No. 3), and
• protected against malicious or misused processes running in the cloud (No. 4).
For many cloud providers, hosting hardware within a specific region can be done easily. However, protecting the data itself from malicious processes in the cloud is often more difficult. For many cloud providers, the competitiveness of the service offering may depend upon the degree of multi-tenancy. This represents a threat exposure, as the many customers of a cloud could potentially gain control of processes that have access to other customers' data.
Given the challenges in protecting access to cloud data, encryption may provide additional levels of security. Some enterprises, due to the sensitive or proprietary nature of data and due to other protection requirements such as intellectual property rights, may need to protect the confidentiality of data and hence may require that both data in transport and data at rest (during storage) be encrypted (Nos. 5 and 6).
While encryption of data in transit can be provided through various security protocols, such as transport layer security and web services-security based on robust cryptographic algorithms, encryption of data at rest requires the additional tasks of key management (for example, key ownership, key rollovers, and key escrow). The cloud environment has a unique ownership structure in the sense that the owner of the data is the cloud user while the physical resources hosting the data are owned by the cloud provider. In this environment, best practices for key management have yet to evolve, and this is one of the areas the standards bodies or industry consortiums have to address in order to meet the encryption requirements of data at rest.
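A minimal illustration of requirement Nos. 5 and 6 is client-side encryption, in which the cloud user keeps custody of the key and only ciphertext ever reaches the provider. The sketch below assumes the third-party Python cryptography package and uses a plain dictionary as a stand-in for a provider's storage API; it says nothing about the harder key rollover and escrow questions raised above.

    # Client-side encryption sketch: the cloud user owns the key, the provider
    # stores only ciphertext. Requires the third-party "cryptography" package.
    # The dictionary below is a stand-in for any provider's object-storage API.
    from cryptography.fernet import Fernet

    cloud_store = {}  # stand-in for the cloud storage provider

    def store_record(key: bytes, name: str, plaintext: bytes) -> None:
        cloud_store[name] = Fernet(key).encrypt(plaintext)  # ciphertext only

    def load_record(key: bytes, name: str) -> bytes:
        return Fernet(key).decrypt(cloud_store[name])

    key = Fernet.generate_key()  # held by the cloud user, never by the provider
    store_record(key, "record-1", b"sensitive enterprise data")
    assert load_record(key, "record-1") == b"sensitive enterprise data"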
Data protection, depending upon the criticality of data, may call for either periodical backups or real-time duplication or replication. This is true in any enterprise IT environment. Hence the cloud user has to look for these capabilities in an IaaS provider offering storage service. We will call this subclass of IaaS cloud provider a cloud storage provider.
Further, if the cloud storage provider has experienced a data breach, or if the cloud user is not satisfied with the data recovery features or data availability (which is also a security parameter) provided by that organization, the latter should have the means to rapidly migrate the data from one cloud storage provider to another.
In some cases, the data protection may also call for capabilities for segmenting data among various cloud storage providers. As a result, secure and rapid data backup and recovery capabilities should be provided for all mission-critical data (No. 7), and common APIs should be required to migrate data from one cloud storage provider to another (No. 8).

Vulnerabilities for PaaS
When developing applications in a PaaS cloud environment, what might leave the application vulnerable? Vulnerabilities represent a major security concern whether applications are hosted internally at an enterprise or offered as a service in the cloud.
In the cloud environment, the custom applications developed by the cloud user are hosted using the deployment tools and run time libraries or executables provided by the PaaS cloud provider. While it is the responsibility of cloud users to ensure that vulnerabilities such as buffer overflows and lack of input validation are not present in their custom applications, they might expect similar and additional properties, such as lack of parsing errors and immunity to SQL injection attacks, to be present in the application framework services provided by a PaaS cloud provider.
Additionally, they have the right to expect that persistent programs such as web servers will be configured not to run as a privileged user (such as root). Further, modern application frameworks based on service-oriented architectures provide facilities for dynamically linking applications based on the dynamic discovery capabilities provided by a persistent program called the directory server. Hence this directory server program also needs to be securely configured.
Based on the above discussion, two security requirements may arise from cloud users: first, that the modules in the application framework provided are free of vulnerabilities (No. 9), and second, that persistent programs such as web servers and directory servers are configured properly (No. 10).
The biggest business factor driving the use of IaaS cloud providers is the high capital cost involved in the purchase and operation of high-performance servers and the network gear involved in linking up the servers to form a cluster to support compute-intensive applications. The economy of service offered by an IaaS cloud provider comes from the maximum utilization of physical servers, and hence it is difficult to think of an IaaS cloud offering without a virtual machine.
While it's critical in PaaS to offer services that ensure the security of developed applications, in IaaS it's critical for the cloud provider to rent secure operating systems to its users. IaaS cloud providers usually offer a platform for subscribers (cloud users) to define their own virtual machines to host their various applications and associated data by running a user-controlled operating system within a virtual machine monitor or hypervisor on the cloud provider's physical servers. In this context, a primary concern of a subscriber to an IaaS cloud service is that their virtual machines are able to run safely without becoming targets of an attack, such as a side channel attack, from rogue virtual machines collocated on the same physical server.
If cloud users are not satisfied with the services provided by the current cloud provider due to security or performance reasons, they should have the capability to de-provision the virtual machines from the unsatisfactory cloud provider and provision them on a new cloud provider of their choice. Users may need to migrate from one virtual machine to another in real time, so as to provide a seamless computing experience for the end users.
These needs translate to the following security requirements:
• the capability to monitor the status of virtual machines and generate instant alerts (No. 11),
• the capability for the user to migrate virtual machines (in non-real time) from one cloud provider to another (No. 12), and
• the capability to perform live migration of VMs from one cloud provider to another or from one cloud region to another (No. 13).
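Requirement No. 11 amounts to a simple control loop. The sketch below polls each virtual machine and raises an alert when one stops reporting a healthy status; get_vm_status() and send_alert() are hypothetical stand-ins, since every provider exposes monitoring and notification through its own API.

    # Sketch of requirement No. 11: poll virtual machine status, raise alerts.
    # get_vm_status() and send_alert() are hypothetical stand-ins for a
    # provider's monitoring and notification APIs.
    import time

    VMS = ["vm-web-1", "vm-web-2", "vm-db-1"]

    def get_vm_status(vm_id: str) -> str:
        return "running"           # placeholder; a real check queries the provider

    def send_alert(message: str) -> None:
        print("ALERT:", message)   # placeholder; e.g., mail or pager integration

    def monitor(poll_seconds: int = 60) -> None:
        while True:
            for vm in VMS:
                status = get_vm_status(vm)
                if status != "running":
                    send_alert(f"{vm} reported status '{status}'")
            time.sleep(poll_seconds)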
Tools to continuously monitor the vulnerabilities of, or attacks on, virtual machines running on a server have already been developed or are under development by many vendors, and hence the first of the above requirements can be easily met. Large-scale adoption of virtual machine import format standards such as the open virtualization format will enable the user to rapidly provision virtual machines in one cloud provider environment and de-provision them from another cloud provider environment that is no longer needed by the cloud user, thus meeting the second requirement above.
Further, a virtual machine migrated using a common import format should not require extensive time to reconfigure under the new environment. Hence common run time formats are also required to enable the newly migrated virtual machine to start execution in the new environment. Live migration of virtual machines (in situations of peak loads) is now possible only if the source and target virtual machines run on physical servers with the same instruction set architecture. The industry is already taking steps to address this limitation. However, since the majority of virtualized environments run the x86 ISA, this is not a major limitation.

Standards
With respect to standards and cloud security readiness, we have made four major observations.
First, some requirements are already met today using existing standards (such as federation protocols for authentication) and technologies (automatic real-time duplication of data for disaster recovery). Second, some requirements can be met if there is more market support for existing standards (XACML and SPML for user provisioning, the open virtualization format for virtual machine migration). Third, some requirements, such as data location and non-multi-tenancy, can be met by restructuring cost models for associated cloud service offerings. And fourth, some requirements can only be met by developing new standards (common run time formats for virtual machines, common APIs for migration of data from one cloud storage provider to another).
While cloud computing presents these challenges, it has the potential to revolutionize how we use information technology and how we manage datacenters. The impact may be enormous with respect to IT cost reduction and increased rapidity and agility of application deployment. Thus, it is critical that we investigate and address these security issues. While some issues may have ready answers (such as existing security standards), others may be more problematic (such as threat exposure due to multi-tenancy).
The ultimate answer is almost certainly multifaceted. Technical solutions will be discovered and implemented. Security standards will enable new capabilities. Finally, differing models and types of clouds will be used for data of varying sensitivity levels to take into account the residual risk.

Biographies
Dr. Ramaswamy Chandramouli is a supervisory computer scientist in the Computer Security Division, Information Technology Laboratory at NIST. He is the author of two textbooks and more than 30 peer-reviewed publications in the areas of role-based access control models, model-based test development, security policy specification and validation, conformance testing of smart card interfaces, and identity management. He holds a PhD in information technology security from George Mason University.
Peter Mell is a senior computer scientist in the Computer Security Division at NIST, where he is the cloud computing and security project lead, as well as vice chair of the interagency Cloud Computing Advisory Council. He is also the creator of the United States National Vulnerability Database and lead author of the Common Vulnerability Scoring System (CVSS) version 2 vulnerability metric used to secure credit card systems worldwide.
The Business of Clouds
By Guy Rosen

At the turn of the 20th century, companies stopped generating their own power and plugged
into the electricity grid. In his now famous book The Big Switch, Nick Carr analogizes those
events of a hundred years ago to the tectonic shift taking place in the technology industry today.
Just as with electricity, businesses are now turning to on-demand, mass-produced computing power as a viable alternative to maintaining their IT infrastructure in-house.
In this article, we'll try to hunt down some hard data in order to shed some light on the magnitude of this shift. We'll also take a look at why it is all so significant, examining what the cloud means for businesses and how it is fueling a new generation of tech startups.

What is the Cloud?
While the exact definition of cloud computing is subject to heated debate, we can use one of the more accepted definitions from NIST, which lays out five essential characteristics: on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service. Of particular interest to us are the three service models NIST describes:
Infrastructure as a service (IaaS) displaces in-house servers, storage, and networks by providing those resources on demand. Instead of purchasing a server, you can now provision one within minutes and discard it when you're finished, often paying by the hour only for what you actually used. (See also "Elasticity in the Cloud," page 3, for more.)
Platform as a service (PaaS) adds a layer to the infrastructure, providing a platform upon which applications can be written and deployed. These platforms aim to focus the programmers on the business logic, freeing them from the worries of the physical (or virtual) infrastructure.
Software as a service (SaaS) refers to applications running on cloud infrastructures, typically delivered to the end user via a web browser. The end user need not understand a thing about the underlying infrastructure or platform! This model has uprooted traditional software, which was delivered on CDs and required installation, possibly even requiring purchase of a server to run on.

The Hype
Research outfit Gartner describes cloud computing as the most hyped subject in IT today. IDC, another leading firm, estimated that cloud IT spending was at $16 billion in 2008 and would reach $42 billion by 2012. Using Google Trends, we can find more evidence of the growing interest in cloud computing by analyzing search volume for the term cloud computing.
It's extraordinary that a term that was virtually unheard of as recently as 2006 is now one of the hottest areas of the tech industry.

The Reality
The big question is whether cloud computing is just a lot of hot air. To add to the mystery, hard data is exceedingly hard to come by. Amazon, the largest player in the IaaS space, is deliberately vague. In its financial reports, the revenues from its IaaS service are rolled into the "other" category.
In an attempt to shed some light on de facto adoption of cloud infrastructure, I conducted some research during 2009 that tries to answer these questions.
The first study, a monthly report titled "State of the Cloud" (see www.jackofallclouds.com/category/state-of-the-cloud/), aims to estimate the adoption of cloud infrastructure among public web sites. It's relatively straightforward to determine whether a given site is running on cloud infrastructure, and if so, from which provider, by examining the site's DNS records as well as the ownership of its IP. Now all we need is a data set that will provide a large number of sites to run this test on. For this, we can use a site listing such as that published by marketing analytics vendor Quantcast.
Quantcast makes available a ranked list of the Internet's top million sites (see www.quantcast.com/top-sites-1). To complete this survey, we'll test each and every one of the sites listed and tally the total number of sites in the cloud and the total number of sites hosted on each provider. In practice, the top 500,000 of these million were used.
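A much-simplified version of that classification test can be written in a few lines of Python. The provider patterns below are illustrative assumptions rather than the rule set used in the study itself, which also inspects the ownership records of each IP address.

    # Simplified sketch of the site-classification test: resolve a site, then
    # reverse-resolve its IP and match the hostname against known provider
    # domains. The PATTERNS table is an illustrative assumption only.
    import socket

    PATTERNS = {
        "Amazon EC2": ("amazonaws.com",),
        "Rackspace Cloud": ("rackspacecloud.com",),
        "GoGrid": ("gogrid.com",),
    }

    def classify(site: str) -> str:
        try:
            host = socket.gethostbyaddr(socket.gethostbyname(site))[0]
        except OSError:
            return "unresolved"
        for provider, suffixes in PATTERNS.items():
            if host.endswith(suffixes):
                return provider
        return "not detected as cloud-hosted"

    for site in ("example.com", "www.quantcast.com"):
        print(site, "->", classify(site))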
The caveat to this technique is that it analyzes a particular cross section of cloud usage and cannot pretend to take in its full breadth. Not included are back end use cases such as servers used for development, for research, or for other internal office systems. This adoption of the cloud among enterprises and backend IT systems has been likened to the dark matter of the universe—many times larger but nearly impossible to measure directly.
For now, let's focus on the achievable and examine the results for the high-visibility category of public web sites. See Figures 1 and 2. From this data, we can draw two main conclusions:
First, on the one hand, cloud infrastructure is in its infancy with a small slice of the overall web hosting market. On the other hand, the cloud is growing rapidly. So rapidly, in fact, that Amazon EC2 alone grew 58 percent in the four months analyzed, equivalent to 294 percent annual growth.

Figure 1: Amazon EC2 has a clear hold on the cloud market.
Figure 2: The top 500,000 sites by cloud provider are shown.

Second, Amazon EC2 leads the cloud infrastructure industry by a wide margin. Amazon is reaping the rewards of being the innovating pioneer. Its first cloud service was launched as early as 2005, and the richness of its offering is at present unmatched.
The second study we'll discuss examined the overall usage of Amazon EC2, based on publicly available data that had been overlooked. Every time you provision a resource from Amazon EC2 (for example, request to start a new server instance), that resource is assigned an ID, such as i-31a74258. The ID is an opaque number that is used to refer to the resource in subsequent commands.
In a simplistic scenario, that ID would be a serial number that increments each time a resource is provisioned. If that were the case, we could perform a very simple yet powerful measurement: we could provision a resource and record its ID at a certain point in time. Twenty-four hours later, we could perform the same action, again recording the ID. The difference between the two IDs would represent the number of resources provisioned within the 24-hour period.
Unfortunately, at first glance Amazon EC2's IDs appear to have no meaning at all and are certainly not trivial serial numbers. A mixture of luck and investigations began to reveal patterns in the IDs. One by one, these patterns were isolated and dissected until it was discovered that underlying the seemingly opaque ID there is, after all, an incrementing serial number.
For example, the resource ID 31a74258 can be translated to reveal the serial number 00000258. (This process was published in detail and can be found in the blog post "Anatomy of an Amazon EC2 Resource ID": www.jackofallclouds.com/2009/09/anatomy-of-an-amazon-ec2-resource-id.) With these serial numbers now visible, we can perform our measurement as described above. Indeed, during a 24-hour period in September 2009, the IDs for several types of resources were recorded, translated, and the resource usage calculated from the differences. See Figure 3.

Figure 3: The chart shows resource usage of Amazon EC2 in the eastern United States in September 2009 over a 24-hour period.
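Once the translation is known, the measurement itself reduces to a subtraction. In the sketch below, decode_serial() is only a stand-in for the translation step documented in that blog post; the real mapping from an ID such as 31a74258 to its serial number is not a straight hexadecimal parse, and the sample IDs are hypothetical.

    # Sketch of the 24-hour measurement. decode_serial() is a stand-in for the
    # ID-to-serial translation described in "Anatomy of an Amazon EC2 Resource
    # ID"; the placeholder below is NOT the real mapping.
    def decode_serial(resource_id: str) -> int:
        return int(resource_id.split("-")[-1], 16)   # placeholder only

    def resources_created(id_at_start: str, id_24h_later: str) -> int:
        """Provision one resource at each end of the window, pass in both IDs."""
        return decode_serial(id_24h_later) - decode_serial(id_at_start)

    print(resources_created("i-31a74258", "i-31b89301"))  # hypothetical IDs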
Over the 24-hour period observed, the quantities of resources provisioned were:
• Instances (servers): 50,242
• Reservations (atomic commands to launch instances): 41,121
• EBS volumes (network storage drives): 12,840
• EBS snapshots (snapshots of EBS volumes): 30,925.
These numbers are incredible to say the least. They show the use of Amazon EC2 to be extensive as well as dynamic. We should recall that these numbers represent the number of resources created and do not provide clues to how many of them exist at any given point in time, because we do not know which resources were later destroyed and when.
The above view is of a single 24-hour period. RightScale, a company that provides management services on top of IaaS, collected IDs from the logs it has stored since its inception and broadened the analysis to a much larger timeframe—almost three years (see http://blog.rightscale.com/2009/10/05/amazon-usage-estimates).
With this perspective, we can clearly witness the substantial growth Amazon EC2 has seen since its launch, from as little as a few hundred instances per day in the early days to today's volumes of 40,000-50,000 daily instances and more.

Is the Cloud Good for Business?
What's driving adoption is the business side of the equation. We can fold the benefits of the cloud into two primary categories: economics and focus.
The first and foremost of the cloud's benefits is cost. Informal polls among customers of IaaS suggest that economics trumps all other factors. Instead of large up-front payments, you pay as you go for what you really use. From an accounting point of view, there are no assets on the company's balance sheet: CAPEX (capital expenditure) becomes OPEX (operating expenditure), an accountant's dream!
When it comes to IT, economies of scale matter. Maintaining 10,000 servers is cheaper per server than maintaining one server alone. Simple geographic factors also come into play: whereas an on-premise server must be powered by the electricity from your local grid, cloud datacenters are often constructed near sources of low-cost electricity such as hydroelectric facilities (so the cloud is good for the environment as well). These cost savings can then be passed on to customers.
The second reason you might use the cloud is in order to focus on your core competencies and outsource everything else. What is the benefit of holding on to on-premise servers, air conditioned server rooms, and enterprise software—not to mention the IT staff necessary to maintain them—when you can outsource the lot? In the new model, your company, be it a legal firm, a motor company, or a multinational bank, focuses on its core business goals. The cloud companies, in turn, focus on their core competency, providing better, more reliable, and cheaper IT. Everyone wins.

Start-Up Companies Love the Cloud
One sector particularly boosted by cloud computing is the tech start-up space. Just a few years ago, building a web application meant you
had to estimate (or guesstimate) the computing power and bandwidth needed and purchase the necessary equipment up front. In practice this would lead to two common scenarios:
1) Underutilization: Before that big break comes and millions come swarming to your web site, you're only using a small fraction of the resources you purchased. The rest of the computing power is sitting idle—wasted dollars.
2) Overutilization: Finally, the big break comes! Unfortunately, it's bigger than expected and the servers come crashing down under the load. To make up for this, teams scramble to set up more servers and the CEO, under pressure, authorizes the purchase of even more costly equipment. To make things worse, a few days later the surge subsides and the company is left with even more idle servers.
If there's something start-up companies don't have much of, it's money, particularly up front. Investors prefer to see results before channeling additional funds to a company. Additionally, experience shows that new companies go through a few iterations of their idea before hitting the jackpot. Under this assumption, what matters is not to succeed cheaply but to fail cheaply so that you have enough cash left for the next round.
Along comes cloud computing. Out goes up-front investment and in comes pay-per-use and elasticity. This elasticity—the ability to scale up as well as down—leaves the two scenarios described above as moot points. Before the big break, you provision the minimal number of required servers in the cloud and pay just for them. When the floods arrive, the cloud enables you to provision as many resources as needed to handle the load, so you pay for what you need but not a penny more. After the surge, you can scale your resources back down.
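In practice, that elasticity is often nothing more than a threshold rule evaluated in a loop, along the lines of the sketch below. The load metric and the launch and terminate calls are hypothetical stand-ins for whatever monitoring and provisioning API a given provider exposes.

    # Threshold-based elasticity sketch. average_load(), launch_server(), and
    # terminate_server() are hypothetical stand-ins for a provider's monitoring
    # and provisioning APIs.
    servers = ["server-1"]

    def average_load() -> float:
        return 0.5                 # placeholder, e.g., normalized CPU or queue depth

    def launch_server() -> None:
        servers.append(f"server-{len(servers) + 1}")   # scale up for the surge

    def terminate_server() -> None:
        if len(servers) > 1:       # keep a minimal footprint
            servers.pop()          # scale back down, stop paying

    def rebalance(high: float = 0.8, low: float = 0.3) -> None:
        load = average_load()
        if load > high:
            launch_server()
        elif load < low:
            terminate_server()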
One of the best-known examples of this is a start-up company called Animoto. Animoto is a web-based service that generates animated videos based on photos and music the user provides. Video generation is a computation-intensive operation, so computing power is of the utmost importance.
At first, Animoto maintained approximately 50 active servers running on Amazon EC2, which was enough to handle the mediocre success they were seeing at the time. Then, one day, its marketing efforts on Facebook bore fruit, the application went viral, and the traffic went through the roof. Over the course of just three days, Animoto scaled up its usage to 3,500 servers. How would this have been feasible, practically or economically, before the age of cloud computing? Following the initial surge, traffic continued to spike up and down for a while. Animoto took advantage of the cloud's elasticity by scaling up and down as necessary, paying only for what they really had to.
The Animoto story illustrates the tidal change for start-ups. It's not surprising to see, therefore, that the number of such companies is consistently on the rise. If you like, cloud computing has lowered the price of buying a lottery ticket for the big game that is the startup economy. It's become so cheap to take a shot that more and more entrepreneurs are choosing the bootstrap route, starting out on their own dime. When they do seek external investment, they find that investors are forking over less and less in initial funding, out of the realization that it now takes less to get a start-up off the ground.

A Bounty of Opportunity
Cloud computing isn't just an enabler for start-ups—the number of start-ups providing cloud-related services is growing rapidly, too. The colossal change in IT consumption has created a ripe opportunity for small, newly formed companies to outsmart the large, well-established, but slow-to-move incumbents.
The classic opportunity is in SaaS applications at the top of the cloud stack. The existing players are struggling to rework their traditional software offerings into the cloud paradigm. In the meantime, start-ups are infiltrating the market with low-cost, on-demand alternatives. These start-ups are enjoying both sides of the cloud equation: on the one hand, the rising need for SaaS and awareness of its validity from consumers; on the other hand, the availability of PaaS and IaaS, which lower costs and reduce time-to-market. Examples of such organizations include Unfuddle (SaaS-based source control running on the Amazon EC2 IaaS) and FlightCaster (a flight delay forecaster running on the Heroku PaaS).
The second major opportunity is down the stack. Although providing IaaS services remains the realm of established businesses, a category of enabling technologies is emerging. Users of IaaS tend to need more than what the provider offers, ranging from management and integration to security and inter-provider mechanisms. The belief among start-ups and venture capitalists alike is that there is a large market for facilitating the migration of big business into the cloud. Examples of such companies include RightScale, Elastra, and my own start-up, Vircado.
The third and final category of start-ups aims to profit from the increased competition between IaaS providers. These providers are in a constant race to widen their portfolio and lower their costs. Start-ups can innovate and be the ones to deliver that sought-after edge, in areas ranging from datacenter automation to virtualization technologies and support management. Examples in this category include Virtensys and ParaScale.
I for one am convinced that, beyond the hype and excitement, the world of IT is undergoing a very real period of evolution. Cloud computing is not a flash flood: it will be years before its full effect is realized.

Biography
Guy Rosen is co-founder and CEO of Vircado, a startup company in the cloud computing space. He also blogs about cloud computing at JackOfAllClouds.com, where he publishes original research and analysis of the cloud market.