
VOLUME 6 • ISSUE 3 • MARCH 2009 • $8.95 • www.stpcollaborative.com

Organically Grown High-Speed Apps (page 10)

Sow, Grow and Harvest Live Load-Test Data

Automate Web Service Performance Testing

Contents

COVER STORY
10  Cultivate Your Applications For Ultra Fast Performance
    To grow the best-performing Web applications, you must nurture them from the
    start and throughout the SDLC. By Mark Lustig and Aaron Cook

18  Sow and Grow Live Test Data
    Step-by-step guidance for entering the maze, picking the safest path and
    finding the most effective tests your data makes possible. By Ross Collard

27  Automate Web Service Testing; Be Ready to Strike
    Automation techniques from the real world will pin your competitors to the
    floor while your team bowls them over with perfect performance.
    By Sergei Baranov

Departments
4  • Editorial
    If your organization isn't thinking about or already employing a center of
    excellence to reduce defects and improve quality, you're not keeping up with
    the IT Joneses.

6  • Contributors
    Get to know this month's experts and the best practices they preach.

7  • Out of the Box
    News and products for testers.

9  • ST&Pedia
    Industry lingo that gets you up to speed.

33 • Best Practices
    When the Java Virtual Machine comes into play, garbage time isn't just for
    basketball players. By Joel Shore

34 • Future Test
    The future of testing is in challenges, opportunities and the Internet.
    By Murtada Elfahal

Software Test & Performance (ISSN- #1548-3460) is published monthly by Redwood Collaborative Media, 105 Maxess Avenue, Suite 207, Melville, NY, 11747. Periodicals postage paid at Huntington, NY and additional mailing offices. Software Test & Performance is
a registered trademark of Redwood Collaborative Media. All contents copyrighted © 2009 Redwood Collaborative Media. All rights reserved. The price of a one year subscription is US $49.95, $69.95 in Canada, $99.95 elsewhere. POSTMASTER: Send changes of address
to Software Test & Performance, 105 Maxess Road, Suite 207, Melville, NY 11747. Software Test & Performance Subscribers Services may be reached at stpmag@halldata.com or by calling 1-847-763-1958.



Ed Notes

Become a Center Of Excellence
By Edward J. Correia

How many defects should people be willing to put up with before they say "To heck with this Web site"? I suppose the answer would depend on how important or unique the Web site was, or how critical its function is to the person using it.
The point isn't the number of errors someone gets before they say adios. The point is that your applications should contain zero defects, should produce zero errors and should have zero untested use cases at deployment time.
Can you imagine that? You might if your company were to implement a Center of Excellence.
A survey of large and small companies instituting such centers revealed that a staggering 87 percent reported "improved quality levels that surpassed their initial expectations." That study, called the Market Snapshot Report: Performance Center of Excellence (CoE), was released last month by analyst firm voke.
The study defines a Performance Center of Excellence as "the consolidation of oriented resources, which typically includes the disciplines of testing, engineering, management and modeling. The CoE helps to centralize scarce and highly specialized resources within the performance organization as a whole."
It questioned performance experts from companies across the U.S., two-thirds of which were listed in the Fortune 500.
Among the key findings was that companies reported "substantial ROI as measured by their ability to recoup maintenance costs in a 12-month period, due largely to the elimination of production defects."
Establishing a CoE also makes sense for smaller companies. "Just as large enterprises realized massive efficiencies of scale by consolidating operational roles into shared service organizations in the '90s, forward-thinking IT organizations today are achieving similar benefits by implementing Performance Centers of Excellence," says Theresa Lanowitz, founder of voke and author of the study. She added that such organizations also realized benefits "that scaled across their entire organization."
Let's face it, we all know that testers get a bad rap. Test departments have to constantly defend their existence, protect their budget and make do with less time and less respect than development teams are generally afforded.
But it seems to me that proposing (and implementing) a CoE has only upside. You increase productivity, efficiency, communication and institutional knowledge through centralization of a department dedicated to application performance, and you reduce costs and increase quality.
"The cost to an organization to build and maintain a production-like environment for performance testing is often prohibitive," the study points out. "However, the consequences of failing to have an accurate performance testing environment may be catastrophic."
Think of it as your company's very own stimulus package.

VOLUME 6 • ISSUE 3 • MARCH 2009

Editor: Edward J. Correia, ecorreia@stpcollaborative.com
Contributing Editors: Joel Shore, Matt Heusser, Chris McMahon
Art Director: LuAnn T. Palazzo, lpalazzo@stpcollaborative.com
Publisher: Andrew Muns, amuns@stpcollaborative.com
Associate Publisher: David Karp, dkarp@stpcollaborative.com
Director of Events: Donna Esposito, desposito@stpcollaborative.com
Director of Marketing and Operations: Kristin Muns, kmuns@stpcollaborative.com
Reprints: Lisa Abelson, abelson@stpcollaborative.com, (516) 379-7097
Subscriptions/Customer Service: stpmag@halldata.com, 847-763-1958
Circulation and List Services: Lisa Fiske, lfiske@stpcollaborative.com
Cover Illustration by The Design Diva, NY
President: Andrew Muns; Chairman: Ron Muns
105 Maxess Road, Suite 207, Melville, NY 11747
Phone +1-631-393-6051, fax +1-631-393-6057
www.stpcollaborative.com



Contributors

AARON COOK and MARK LUSTIG once again provide our lead feature. Beginning on page 10, the test-automation dynamic duo describe how to ensure performance across an entire development cycle, beginning with the definition of service level objectives for a dynamic Web application. They also address real-world aspects of performance engineering, the factors that constitute real performance measurement and all aspects of the cloud.
Aaron is the quality assurance practice leader at Collaborative Consulting and has been with the company for nearly five years. Mark is the director of performance engineering and quality assurance at Collaborative.

In part two of his multipart series on live-data load testing, ROSS COLLARD tackles the issues involved with selecting and capturing the data, then explores how to apply the data in your testing to increase the reliability of predictions. Ross's personal writing style comes alive beginning on page 18, as he taps into his extensive consulting experiences and situations.
A self-proclaimed software quality guru, Ross Collard says he functions best as a trusted senior advisor in information technology. The founder in 1980 of Collard & Company, Ross has been called a Jedi Master of testing and quality by the president of the Association for Software Testing. He has consulted with top-level management from a diverse variety of companies, from Anheuser-Busch to Verizon.

This month we're fortunate to have the tutelage of SERGEI BARANOV on the subject of Web service performance-test automation. On page 27 you'll find his methodology for creating test scenarios that reflect the tendencies of real-world environments. To help you apply these strategies, Sergei introduces best practices for organizing and executing automated load tests and suggests how these practices fit into a Web services application's development life cycle.
Sergei Baranov is a principal software engineer at test-tool maker Parasoft Corp. He began his software career in Moscow, where as an electrical engineer from 1995 to 1996 he designed assembly-language debuggers for data-acquisition equipment and PCs. He's been with Parasoft since 2001.

TO CONTACT AN AUTHOR, please send e-mail to feedback@stpcollaborative.com.

Index to Advertisers

Advertiser                                   URL                          Page
Hewlett-Packard                              www.hp.com/go/alm            36
Lionbridge                                   www.lionbridge.com/spe       25
Seapine                                      www.seapine.com/testcase     5
Software Test & Performance                  www.stpcollaborative.com     6
Software Test & Performance Conference       www.stpcon.com               2
Test & QA Newsletter                         www.stpmag.com/tqa           26
Wildbit                                      www.beanstalkapp.com         35



Out of the Box

The 'Smarte' Way To Do Quality Management
If you're a user of Hewlett-Packard's Quality Center test management platform and have been bamboozled by its clunky or nonexistent integration with JUnit, NUnit and other unit testing frameworks, you might consider an alternative announced this month by SmarteSoft.
The test automation tools maker on March 1 unveiled Smarte Quality Manager, which the company claims offers the same capabilities as HP's ubiquitous suite for about a tenth the cost. Shipping since January, the US$990 per-seat/per-year platform is currently at version 2.1.
SmarteQM is a browser-based platform that uses Ajax to combine management of requirements, releases, test cases, coverage, defects, issues and tasks with general project management capabilities in a consistent user interface. According to SmarteSoft CEO Gordon Macgregor, price and interface are among its main competitive strengths. "With the Rational suite, for example, RequisitePro, Doors, ClearQuest, ClearCase, all have to be separately learned and managed."
Another standout feature, Macgregor said, is its customizable user dashboard. "We're not aware of that in [HP's] TestDirector/Quality Center." The platform also is built around an open API, enabling companies to integrate existing third-party, open source or proprietary software. "That's also a big difference from the competition. [The] open API allows for connecting to all manner of test automation frameworks." Out of the box, SmarteQM integrates with the JUnit, NUnit, PyUnit and TestNG automated unit-testing frameworks. It also works with QuickTestPro and Selenium; integration with LoadRunner is planned. SmarteQM also can export bugs to JIRA, Bugzilla and Microsoft TFS.
In SmarteQM's Test Management module, test cases are mapped to one or more requirements that the test is effectively validating, providing the test coverage for the requirement(s). Each test case includes all the steps and individual actions necessary to complete the test, according to the company.

SmarteLoad Open to Protocols
SmarteSoft also on March 1 released an update to SmarteLoad, its automated load testing tool. New in version 4.5 is the ability to plug in your communication protocol of choice. "Now you can take any Java implementation of a protocol engine and plug it into SmarteLoad and start doing load testing with that protocol. That's unique in the industry," Macgregor claimed.
The plug-in capability also works with proprietary protocols. "You can't just buy a load testing tool off the shelf that supports your custom protocol. This is especially relevant to firms that have proprietary protocols, such as defense and gaming. Let's say you have some protocols for high-performance gaming. You could plug them in with very little effort. We were able to provide [Microsoft] Winsock support in 24 hours, and we don't charge extra for that."
SmarteLoad pricing varies by the number of simulated users, starting at $18,600 for 100 users for the first year, including maintenance and support. SmarteQM 2.1 and SmarteLoad 4.5 are available now.

MS Search Strategy: FAST, FAST, FAST
Microsoft in February unveiled a pair of new search products, central elements of an updated roadmap for its overall enterprise search strategy.
Set for beta in the second half of this year is FAST Search for SharePoint, a new server that extends the capabilities of Microsoft's FAST ESP product and adds its capabilities to Microsoft's Office SharePoint Portal Server. Interested parties can license some of the capabilities now through ESP for SharePoint, a special product created for this purpose that includes license migration to the new product when it's released.
Also an extension of FAST ESP and going to beta in the second half will be FAST Search for Internet Business, with "new capabilities for content integration and interaction management, helping enable more complete and interactive search experiences," according to a Microsoft news release issued on Feb. 10 from the company's FAST Forward 09 Conference in Las Vegas. Pricing for FAST for SharePoint will reportedly start at around US$25,000 per server.



If you’re also using the Server Edition, tion are still designed by hand,” said
Ajax Goes Down you can link with your continuous inte- Kalekos. “By using Qtronic to automate
Smooth With gration system and automate test exe- the test design phase, our customers dra-
cution for functional and acceptance- matically reduce the effort and time
LiquidTest test coverage. required to generate test cases and test
A new UI-testing framework released in A Tester Edition is for test and QA scripts.”
February is claimed to have been built teams that might have less technical Among the major changes, published
with Ajax testing in mind. It’s called knowledge than developers. It outputs on the company’s Web site, is the separa-
LiquidTest, and according to concise scripts in LiquidTest Script, a tion of a single user workspace into a
JadeLiquid Software, it helps developers Groovy derivative that “is powerful but computational server (for generating
and testers “find defects as they occur.” not syntactically complicated,” Scotney tests) and an Eclipse-based client for
An Eclipse RCP app, LiquidTest records wrote, adding that LiquidTests record- Linux, Solaris and Windows.
FireFox and IE browser actions and out- ed with the Tester Edition can be The platform also now supports mul-
puts the results as test cases for Java and replayed with the Developer Edition tiple test-design configurations, each
C#, JUnit, NUnit and TestNG, as well as and vice-versa, enabling close collabo- with its own coverage criteria and selec-
Ruby and Groovy, the company says. It ration between development and tion of script back-ends. “While genera-
supports headless operation through a test/QA teams. tion of test cases is possible without hav-
server component and is also available ing a script back-end (abstract test
as an Eclipse plug-in. case), a user can now configure more
JadeLiquid’s flagship is WebRen- With A Redesigned than one scripting back-end in a test
derer, a pioneering standards-based Java Qtronic, Conformiq design configuration for executable test
rendering engine for Web browsers. scripts,” said the company. Test cases for
According to a post on theserver- Comes to the U.S. multiple test design configurations are
side.com by JadeLiquid’s Anthony Add one to the number of companies generated in parallel, making test gen-
Scotney, many automated testing prod- established in Finland that came to the eration faster by sharing test generation
ucts fall flat when it comes to Ajax. U.S. seeking their fortunes. Conformiq, results between multiple test design
“LiquidTest, however, was architected to which designs test-design automation configurations.
support Ajax from day one. We devel- solutions, last month opened an office in Also new is incremental test-case
oped LiquidTest around an ‘Expec- Saratoga, Calif., and named A.K. Kalekos generation with local test case naming.
tation’ model, so sleeps are not president and CEO; he will run the Generated test cases are stored in a per-
required,” he wrote, referring a com- North American operations. Also part of sistent storage, and previously generat-
mand sometimes used when developing the team as CTO will be Antti Huima, ed and stored test cases can be used as
asynchronous code. The following is a formerly the company’s managing direc- input to subsequent test generation
test case he had recorded against tor and chief architect. Huima was the runs. It’s also now possible to name and
finance.google.com that uses the Ajax- brains behind Qtronic, the company’s rename generated test cases. Version
based textfield: flagship automatic model-to-test case 2.0 improves handling of coverage cri-
public void testMethod() generator. teria, with fine grained control of cov-
{ Qtronic automates the design of erage criteria; structural features can
browser.load("finance.google.com");
browser.click("searchbox", 0);
functional tests for software and sys- be individually selected; coverage crite-
browser.type("B"); tems. According to the company, ria can be blocked, marked a target or
browser.expectingModificationsTo("id('ac- Qtronic also generates browsable docu- as "do not care;" and coverage criteria
list')").type("H");
browser.expectingLoad().click("id('ac- mentation and executable test scripts in status is updated in real time and always
list')/DIV[2]/DIV/SPAN[2]"); Python, Tcl and other standard formats. visible to testers.
assertEquals("BHP Billiton Limited (ADR)",
browser.getValue("BHP Billiton Limited (ADR)")); The tool also allows testers to design Testers can now browse and analyze
} their own output format, for situations generated test cases (and model defects)
when proprietary test execution or in the user interface, including graphical
“As you can see LiquidTest spots the management platforms exist. I/O and execution trace. A simplified
modifications that are happening to the Conformiq in January released plug-in API is now Java compatible, and
DOM as we type "BH," Scotney wrote of Qtronic 2.0, a major rewrite of the eases the task of developing new plug-ins.
the code. Qtronic architecture. The way the com- In February Conformiq received
LiquidTest is available in three edi- pany describes it, the system went “from US$4.2 million in venture funding from
tions. The Developer Edition is intend- single monolithic software to client-serv- investors in Europe and the U.S., led by
ed to help “integrate functional tests er architecture.” Nexit Ventures and Finnish Industry
(as unit tests) into a software develop- “The back-end of the test process, the Investment.
ment process”, he wrote. Headless test- execution of tests, has already been auto-
case execution also permits regression mated in many companies. But the test Send product announcements to
news@stpcollaborative.com
tests at every step of the build process. scripts needed for automated test execu-



ST&Pedia
Translating the jargon of testing into plain English

How Fast is Fast Enough?


We often hear that before any coding begins, the project owners should specify the number of users for each feature, the number of transactions on the system, and the required system response times. We would like to see this happen. We would also like a pony.
Predicting user behavior is not really possible no matter how much testers wish that it were. But once we acknowledge that, we find that a tester can often add a great deal of value to a situation that has a vague and ambiguous problem like "Is the software fast enough?" Instead of giving you easy answers, we want to make your job valuable without burning you out in the process. So we introduce you to patterns of performance testing.
Here are this month's terms:

BOTTLENECK
It's typical to find that one or more small parts of an application are slowing down performance of the entire application. Identifying bottlenecks is a big part of performance testing.

USER FLOW ANALYSIS
A general map of the usage patterns of an application. An example user flow might show that 100 percent of users go to the Login screen, 50 percent go to the Search screen, and 10 percent use the Checkout screen.

PROFILE
A map of how various parts of the application handle load. Profiling is often useful for identifying bottlenecks.

LOG
Almost all applications have some sort of logging capability, usually a text file or a database row that keeps track of what happened when. Adding "... and how long" to the log is a standard development task. Timing information parsed by a tool or spreadsheet can identify particularly slow transactions, thus pointing to a bottleneck. It's not uncommon to encounter situations in which we're brought in to do performance testing only to find that the slow parts of the system are already identified quite nicely in the system logs.

SIMULATION
We usually recommend analyzing data from actual use of the system. When that is not possible, there are tools that will simulate various kinds of situations such as network load, HTTP traffic, and low memory conditions. Excellent commercial and open-source tools exist for simulation.

BACK OF NAPKIN MATH
Refers to the use of logic and mathematics to take known performance behaviors, such as the amount of time between page loads for a typical user or the ratio of reads to updates, and calculate the amount of load to generate to simulate a certain number of users.

PERFORMANCE AND SCALE
Performance generally refers to how the application behaves under a single user; scale implies how the software behaves when accessed by many users at the same time.

SLASHDOT EFFECT
When software is suddenly overwhelmed with a huge and unforeseen number of users. Its origin is from sites linked to by the popular news site Slashdot. A system might perform perfectly well and meet all specifications under normal conditions, but fail when it meets with unexpected success.

BETA/STAGING SYSTEM
Many companies exercise their software themselves for profiling purposes. One video game company we know of performs its profiling every Tuesday at 10:00 am. Everyone in the company drops what they are doing, picks up a game controller, and plays the company's video game product while the network admin simulates network load and the test manager compiles profile information and interviews players.

A few relevant techniques for Web performance management:

WHEN PEOPLE CALL A TESTER
Testers generally get called in to do "testing" when a performance or scaling problem already exists. In such cases, you might not need more measures of performance, but simply to fix the problem and retest.

USER FLOW ANALYSIS
Simulating performance involves predicting what the actual customer will do, running with those predictions and evaluating the results. A useful approach is to use real customer data in the beta or production-like environment. To quote Edward Keyes paraphrasing Arthur C. Clarke: "Sufficiently advanced performance monitoring is indistinguishable from testing."

QUICK WINS
If you have a log, import the data into a spreadsheet, sort it by time-to-execute, and fix the slowest command first. Better yet, examine how often the commands are called and fix operations that are slow and performed often.

SERVICE LEVEL CAPABILITIES
We have had little success actually pulling out expected user requirements (sometimes called Service Level Agreements or SLAs). We find more success in evaluating the software under test and expressing a service level capability. By understanding what the software is capable of, senior management can determine which markets to sell to and whether investing in more scale is required.

Matt Heusser and Chris McMahon are career software developers, testers and bloggers. They're colleagues at Socialtext, where they perform testing and quality assurance for the company's Web-based collaboration software.
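
The LOG and QUICK WINS entries above lend themselves to a small script. The following Java sketch is purely illustrative: the log file name, the comma-separated "timestamp,command,milliseconds" format and the field positions are assumptions, not something prescribed by the column. It totals time per command and prints the operations that are both slow and frequently executed, which is where the quick wins usually hide.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical log analyzer; assumes lines such as "2009-03-01T10:15:00,login,420".
public class SlowCommandReport {
    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get("timing.log"));

        Map<String, Long> totalMillis = new HashMap<>();   // command -> summed duration
        Map<String, Integer> callCount = new HashMap<>();  // command -> number of calls

        for (String line : lines) {
            String[] parts = line.split(",");
            if (parts.length < 3) continue;                // skip malformed lines
            String command = parts[1].trim();
            long millis = Long.parseLong(parts[2].trim());
            totalMillis.merge(command, millis, Long::sum);
            callCount.merge(command, 1, Integer::sum);
        }

        // Commands that cost the most overall: slow AND called often
        totalMillis.entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()))
            .limit(10)
            .forEach(e -> System.out.printf("%-20s total %8d ms over %d calls%n",
                    e.getKey(), e.getValue(), callCount.get(e.getKey())));
    }
}

Sorting the same data by individual duration instead of by total would surface the single slowest transactions, which is the spreadsheet approach described under QUICK WINS.
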
Cultivate Your Crop For High-Performance From The Ground Up

By Aaron Cook and Mark Lustig

Aaron Cook and Mark Lustig work for Collaborative Consulting, a business and technology consultancy.

A primary goal for IT organizations is to create an efficient, flexible infrastructure. Organizations struggle with the desire to be more proactive in addressing and resolving issues, but often take a reactive approach. Conventional behavior in IT is to manage discrete silos (e.g., the middleware layer, the database layer, the UNIX server layer, the mainframe layer). To become more proactive and meet business needs across multiple infrastructure layers, the goal must become proactively managing to business goals.
Performance engineering (PE) is not merely the process of ensuring a delivered system meets reasonable performance objectives. Rather, PE emphasizes the "total effectiveness" of the system, and is a discipline that spans the entire software development lifecycle. By incorporating PE practices throughout an application's life, scalability, capacity and the ability to integrate are determined early, when they are still relatively easy and inexpensive to control.
This article provides a detailed description of the activities across the complete software lifecycle, starting with the definition of and adherence to service level objectives. It also addresses the real-world aspects of performance engineering, notably:
• What is realistic real-world performance for today's dynamic Web applications?
• What is the real measure of performance?
• What aspects of the cloud need to be considered (first mile, middle mile, last mile)?
The Software Development Life Cycle includes five key areas, beginning with business justification and requirements definition. This is followed by the areas of system design, system development/implementation, testing, and deployment/support. As portrayed in Figure 1 (next page), requirements definition must include service level definition; this includes non-functional requirements of response time, throughput, and key measures of business process performance (e.g., response and execution time thresholds of transaction execution).
Across the lifecycle, the focus areas of multiple stakeholders are clearly defined. The engineering group concentrates on design and development/implementation. The QA and PE group focuses on testing activities (functional, integration, user acceptance, performance), while Operations focuses on system deployment and support.
Performance engineering activities occur at each stage in the lifecycle, beginning with platform/environment validation. This continues with performance benchmarking, performance regression, and performance integration. Once the system is running in production, proactive production performance monitoring enables visibility into system performance and overall system health.

Service Level Objectives
Avoid the culture of "It's not a problem until users complain."
Business requirements are the primary emphasis of the analysis phase of any system development initiative. However, many initiatives do not track non-functional requirements such as response time, throughput, and scalability. Key performance objectives and internal incentives should ideally define and report against service level compliance. As the primary goal of IT is to service the business, well-defined service level agreements (SLAs) provide a clear set of objectives identifying activities that are most appropriate to monitor, report, and build incentives around.
A key first step toward defining and implementing SLAs is the identification of the key business transactions, key performance indicators (KPIs) and volumetrics. Development and PE teams should begin the discussion of service level agreements and deliver a draft at the end of the discovery phase. For example, these may include transaction response times, batch processing requirements, and database backup. This also helps determine if a performance test or proof-of-concept test is required in order to validate whether specific service levels are achievable.
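
One way to keep such objectives enforceable is to record them in a form a test harness can check mechanically. The fragment below is a minimal sketch; the transaction names, response-time thresholds and volumes are invented for illustration and are not taken from the article.

import java.util.List;

// Illustrative service level objectives for key business transactions.
// Every name, threshold and volume here is an assumed example value.
public class ServiceLevelObjectives {

    record Slo(String transaction, long responseTimeMsAt90thPct, int peakTransactionsPerHour) {}

    static final List<Slo> SLOS = List.of(
            new Slo("Login",         2_000, 10_000),
            new Slo("AccountDetail", 5_000,  4_000),
            new Slo("Search",        4_000,  6_000));

    // True when a measured 90th-percentile response time meets the stated objective.
    static boolean meetsObjective(Slo slo, long measured90thPctMs) {
        return measured90thPctMs <= slo.responseTimeMsAt90thPct;
    }

    public static void main(String[] args) {
        Slo login = SLOS.get(0);
        // Example: a test run measured Login at 2,400 ms at the 90th percentile.
        System.out.println(login.transaction() + " meets its objective: "
                + meetsObjective(login, 2_400));
    }
}

Keeping the objectives in one place like this also gives the later benchmarking, regression and monitoring activities a single definition to validate against.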

Many organizations rarely, if ever, define service level objectives, and therefore cannot enforce them. Service level agreements should be designed with organization costs and benefits in mind. Setting the agreements too low negatively affects business value. Setting them too high can unnecessarily increase costs. Establishing and agreeing on the appropriate service levels requires IT and the business groups to work together to set realistic, achievable SLAs.

FIG. 1: PLOTTING REQUIREMENTS. Performance engineering activities (platform/environment validation, performance benchmarking, performance regression, performance integration, production performance monitoring) run alongside the development process (define service level objectives, design, configuration/customization, implementation, application testing, product support) and its owners: the engineering group, the QA/performance engineering group, and operations.

Platform/Environment Validation
Once the service levels are understood, platform/environment validation can occur. This will aid in determining whether a particular technical architecture will support an organization's business plan. It works by employing workload characterization and executing stress, load, and endurance tests against proof-of-concept architecture alternatives.
For example, a highly flexible distributed architecture may include a Web server, application server, enterprise service bus, middleware broker, database tier, and mainframe/legacy systems tier. As transactions flow through this architecture, numerous integration points can impact performance. Ensuring successful execution and response time becomes the focus of platform validation. While these efforts may require initial investment and can impact the development timeline, they pale in comparison to the costs associated with retrofitting/reworking a system after development is complete.
In addition, by performing proactive "pre-deployment" capacity planning activities (i.e., modeling), costs can be empirically considered along with anticipated impacts to the infrastructure environment. These impacts include utilization, response time, bandwidth requirements, and storage requirements, to name a few.
The primary goal of a platform validation is to provide an informed estimate of expected performance, enabling a change or refinement in architecture direction based on the available factors. Platform validation must consider workload characterizations such as:
• Types of business transactions
• Current and projected business transaction volumes
• Observed/measured performance (e.g., response time, processor and memory utilization, etc.)
Assumptions must be made for the values of these factors to support the model's workload characterization.

Performance Benchmarking
Performance benchmarking is used as a testing technique to identify the current system behavior under defined load profiles as configured for your production or targeted environment. This technique can define a known performance starting point for your system under test (SUT) before making modifications or changes to the test environment, including application, network or database configurations, or user profiles.
To identify and measure the specific benchmarks, the performance test team needs to develop a workload characterization model of the SUT's real-world performance expectations. This provides a place to initiate the testing process. The team can modify and tune it as successive test runs provide additional information. After the performance test team defines the workload characterization model, the team needs to define a set of user profiles that determine the application pathways that typical classes of users will follow. These profiles are leveraged and combined with estimates from business and technical groups throughout the organization to define the targeted SUT performance behavior criteria. Profiles may also be used in conjunction with predefined performance SLAs as defined by the various constituent business organizations.

FIG. 2: CULTIVATING PERFORMANCE. An iterative approach: review infrastructure & architecture (identify risk areas; review configuration settings, topology and sizing; define points of measurement), define business activity profiles & service levels (types and numbers of users; business activities and frequencies; infrastructure configuration), design & build tests (test data generation; test scripts; user and transaction profiles), then iterate testing and tuning.

Once the profiles are developed and the SLAs determined, the performance test team needs to develop the typical test scenarios that will be modeled and executed in a tool such as LoadRunner or OpenSTA. The main requirement of the tool is that it allows the team to assemble the run-time test execution scenarios that it will use to validate the initial benchmarking assumptions.



The next critical piece of performance benchmarking is to identify the quantity and quality of test data required for the performance test runs. This can be determined by answering a few basic questions:
• Are the test scenarios destructive to the test-bed data?
• Can the database be populated in a manner to capture a snapshot of the database before any test run and restored between test runs?
• Can the test scenarios create the data that they require as part of a set-up script, or does the complexity of the data require that it be created in advance and cleaned up as part of the test scenarios?
One major risk to the test data effort is that any of the test scripts fail during the course of testing. If using actual test scripts, the test runs and the data might have to be recreated anyway using external tools or utilities.
As soon as these test artifacts have been identified, modeled, and developed, the performance test benchmark can begin with an initial test run, modeling a small subset of the potential user population. This is used to shake out any issues with the test scripts or the test data used by the test scripts. It also validates the targeted test execution environment, including the performance test tool(s), test environment, SUT configuration, and initial test profile configuration parameters. In effect, this is a smoke test of the performance test run-time environment.
Once the PE smoke test executes successfully, it is time to reset the environment and data and run the first of a series of benchmark test scenarios. This first scenario will provide significant information and test results that can be leveraged by the performance test team defining the performance benchmark test suites.
The performance test benchmark is considered complete when the test team has captured results for all of the test scenarios making up the test suite. The results must correspond to a repeatable set of system configuration parameters as well as a test bed of data. Together, these artifacts make up the performance benchmark.

FIG. 3: EXPECTED YIELD. Average response times for leading Web sites, from a recent Keynote Business 40 report. Source: www.keynote.com

Figure 2 outlines our overall approach used for assessing the performance and scalability of a given system. These activities represent a best-practices model for conducting performance and scalability assessments.
Each test iteration attempts to identify a system impediment or prove a particular hypothesis. The testing philosophy is to vary one element, then observe and analyze the results. For example, if the results of a test are unsatisfactory, the team may choose to tune a particular configuration parameter, and then re-run the test.

Performance Regression
Performance regression testing is a technique used to validate that SUT changes have not impacted the existing SLAs established during the performance benchmarking test phase. Depending on the nature of your SUT, this can be an important measure of continued quality as the system undergoes functional maintenance, defect-specific enhancements, or performance-related updates to specific modules or areas of the application.
Performance regression testing requires the test team to have performed, at a minimum, a series of benchmark tests designed to establish the current system performance behavior. These automated test scripts and scenarios, along with their associated results, will need to be archived for use and comparison to the results generated for the next version of the application or the next version of the hardware environment. One powerful use of performance regression testing is when an application's data center is upgraded to add capacity or moved to a new server. By executing a series of tests using the same data and test parameters, the results can be compared to ensure that nothing during the upgrade/migration was glossed over, missed, or adversely impacted the modified application run-time environment.
The goal for performance regression testing is repeatability. This requires establishing the same database sizing (number of records) during the test run, using the same test scenarios to generate the results, leveraging as much of the same application footprint during the test run, and using as similar a hardware configuration as possible during the test run.
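
A simple way to make the regression comparison concrete is to diff the new run against the archived benchmark with an agreed tolerance. The sketch below is illustrative only; the transaction names, timings and the 10 percent tolerance are assumptions, not values from the article.

import java.util.Map;

// Illustrative benchmark-vs-regression comparison using a fixed tolerance.
public class RegressionCompare {
    static final double TOLERANCE = 0.10;   // flag anything more than 10% slower (assumed)

    public static void main(String[] args) {
        // Archived benchmark results and the latest regression run, in ms (invented numbers)
        Map<String, Long> benchmarkMs  = Map.of("Login", 1800L, "Search", 3200L, "Checkout", 4100L);
        Map<String, Long> regressionMs = Map.of("Login", 1850L, "Search", 4050L, "Checkout", 4150L);

        benchmarkMs.forEach((transaction, baseline) -> {
            long current = regressionMs.getOrDefault(transaction, baseline);
            double change = (current - baseline) / (double) baseline;
            String verdict = change > TOLERANCE ? "INVESTIGATE" : "ok";
            System.out.printf("%-10s baseline %5d ms, current %5d ms, change %+6.1f%% -> %s%n",
                    transaction, baseline, current, change * 100, verdict);
        });
    }
}

Anything flagged then follows the article's advice: investigate the possible reasons, rerun any tests required, and compare again, unless the change was planned by design.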



The challenge arises when these are the specific items being changed. Typically, this occurs most often when introducing a defect fix or a new version of the application. In such cases, the number of items that are different between test runs is easily managed.
The real challenge for measuring and validating results arises when the underlying application architecture or development platform changes. During those test cycles, the performance engineers need to work closely with the application developers to ensure that the new tests being executed match closely the preexisting benchmarked test results, so that comparisons and contrasts can be identified easily.
The mechanism for executing the performance regressions follows the same model as the initial performance benchmark. The one significant difference is that the work required to identify the test scenarios and create the test data has been performed as part of the performance benchmark exercise. Once the test environment and system are ready for testing, the recommended approach is to run the same smoke test that was used during the initial performance benchmark test. Once the smoke test runs successfully, you can execute the initial benchmark test scenarios and capture the results. Ensure that the SUT is configured the same way, or as similarly as possible, and capture the test run results.
Compare the regression test results to the initial performance test benchmark results. If the results differ significantly, the performance test team should investigate the possible reasons, rerun any tests required, and compare the results again. The goal for the regression tests is to validate that nothing from a performance perspective has changed significantly unless planned. Sometimes, the regression test results differ significantly from the initial benchmark by design. In that case, the regression results have validated a configuration change or a functional system change that the business or end-user community has requested. This is considered a success for this phase of performance testing.

FIG. 4: BUMPER CROP. Best response times for credit card account-detail transactions, from a recent report generated by Gomez. Source: www.gomez.com

Performance Integration
Performance integration testing is a technique used to validate SLAs for application components across a suite of SUT modules. To successfully integrate and compare the performance characteristics of multiple application modules, the performance test team must first decompose the SUT into its constituent components and performance-benchmark each one in isolation. This might seem futile for applications using legacy technologies, but the approach can be used to develop a predictive performance characterization model across an entire suite of modules.
For example, in a simplistic transaction, there may be a number of components called via reference that combine into one logical business transaction. For the purpose of illustration, let's call this business transaction "Login." Login may take the form of a UI component used to gather user credentials including user ID, password, and dynamic token (via an RSA-type key fob). These are sent to the application server via an encrypted HTTP request. The application server calls an RSA Web service to validate the token, and an LDAP service to validate the user ID and password combination. Each of these services returns a success value to the application server. The app server then passes on a success token to the calling Web page, authenticating or denying the user access to the application landing page.
While the business considers Login a single transaction, the underlying application breaks it down into a minimum of three discrete request/response pairs, which result in six exchanges of information. If the end-user community expects a Login transaction to take less than five seconds, for example, and the application when modeled and tested responds within 10 seconds 90 percent of the time, a performance issue has been identified and needs to be solved.
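
The Login example can be reasoned about as the sum of its component round trips. The sketch below is purely illustrative; the component names and timings are invented, not measurements from the article. It totals the per-component times and compares them with the end-user expectation, which is essentially what the integration tests do when isolating where the five-second budget is being spent.

import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative decomposition of a "Login" business transaction into component calls.
public class LoginBudget {
    public static void main(String[] args) {
        long budgetMs = 5_000;   // end-user expectation for the whole business transaction

        // Component -> measured (here: invented) round-trip time in milliseconds
        Map<String, Long> componentMs = new LinkedHashMap<>();
        componentMs.put("Browser to app server (encrypted HTTP request/response)", 1_200L);
        componentMs.put("App server to token validation Web service",              7_300L);
        componentMs.put("App server to LDAP user ID/password check",                 900L);

        long total = componentMs.values().stream().mapToLong(Long::longValue).sum();

        componentMs.forEach((component, ms) ->
                System.out.printf("%-55s %6d ms%n", component, ms));
        System.out.printf("Total %d ms against a budget of %d ms -> %s%n",
                total, budgetMs,
                total <= budgetMs ? "within budget" : "performance issue to isolate");
    }
}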



The performance test team will have mocked up each of the request/response pairs and validated each one individually in order to identify the root cause of the potential performance bottleneck. Without performing this level of testing, the application developer may have limited visibility into component response times when integrated with other components. It is up to the performance test team to help identify and validate with a combination of performance integration and performance regression testing techniques.

Production Performance Monitoring
To be proactive, companies need to implement controls and measures that enable awareness of potential problems or target the problems themselves. Production performance monitoring ensures that a system can support service levels such as response time, scalability, and performance, but more importantly, it enables the business to know in advance when a problem will arise. When difficulties occur, PE, coupled with systems management, can isolate bottlenecks and dramatically reduce time to resolution. Performance monitoring allows proactive troubleshooting of problems when they occur, and developing repairs or "workarounds" to minimize business disruption.
Unfortunately, the nature of distributed systems has made it challenging to build in the monitors and controls needed to isolate bottlenecks and to report on metrics at each step in distributed transaction processing. This problem has been the bane of traditional systems management. However, emerging tools and techniques are beginning to provide end-to-end transactional visibility, measurement, and monitoring.
Tools such as dashboards, performance monitoring databases and root cause analysis relationships allow tracing and correlation of transactions across the distributed system. Dashboard views provide extensive business and system process information, and allow executives to monitor, measure and prepare against forecasted and actual metrics.

FIG. 5: SITE SCOUTING. Search response times tracked by Gomez for media outlets. Source: www.gomez.com

'Good' Performance And A Web Application
In an ideal world, response time would be immediate, throughput would be limitless, and execution time would be instantaneous. During service level definition, it is common for the goals set forth by the business to be more in line with the ideal world, as opposed to the real world. The business must define realistic service levels, and the engineering and operations groups must validate them. In a Web-based system, discrete service levels must be understood by transaction and by page type. Homepages, for example, are optimized for the fastest and most reliable response time. These typically contain static content and highly optimized, strategically located caching services (e.g., Akamai). One company that measures such response times is Keynote Systems (www.keynote.com). The average response time in a recent Keynote Business 40 report (Figure 3, page 13) was 1.82 seconds.
Dynamic transactions traverse multiple architectural tiers, which typically might include a Web server, application server, database server and backend/mainframe server(s). Execution of a dynamic transaction is non-trivial. While more layers and integration points allow for a more flexible system implementation, each integration point adds response and execution time. This overhead may include marshalling/un-marshalling of data, compression/un-compression, and queuing/dequeuing. Independently, these activities might take only milliseconds, but collectively they can add up to seconds.
Common complex dynamic transactions include account details and search. Figure 4 (previous page) shows the best response times from a recent credit card account detail report generated by Gomez (www.gomez.com). Responses range between 8 and 17 seconds, with an average response time of 14 seconds. Users have become accustomed to this length of execution time, and expectations are effectively managed by means of progress bars, messages, animated .gif files or other such methods.
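
Figures such as "responds within 10 seconds 90 percent of the time" or an "average response time of 14 seconds" come straight from the measured samples. The short sketch below shows one common way to compute an average and a nearest-rank 90th percentile; the sample values are invented for the example.

import java.util.Arrays;

// Average and 90th-percentile response times from raw samples (values invented).
public class ResponseTimeStats {
    public static void main(String[] args) {
        double[] samplesMs = {8200, 9100, 9900, 10400, 11200, 12500, 13800, 14600, 15900, 17100};

        double average = Arrays.stream(samplesMs).average().orElse(0);

        double[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        // Nearest-rank percentile: the value at or below which 90% of the samples fall
        int rank = (int) Math.ceil(0.90 * sorted.length) - 1;
        double p90 = sorted[rank];

        System.out.printf("Average: %.0f ms, 90th percentile: %.0f ms%n", average, p90);
    }
}

Reporting a percentile alongside the average matters because a handful of very slow responses can hide behind a respectable-looking mean.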



For media outlets, which typically employ content management engines with multiple databases, Gomez tracks search response times (Figure 5, previous page). These range from four seconds to more than 15 seconds, with an average of around 11 seconds.
Reports such as these provide real performance data that you can use to compare with your UIs. In our consulting engagements, we ideally strive for a response time of 1-2 seconds, which is realistic for static Web content. However, for today's complex dynamic transactions, a more realistic response time across static and dynamic content should be between three and eight seconds. Managing the user experience through the use of techniques including content caching, asynchronous loading and progress bars all aid in effectively managing user expectations and overall user satisfaction.

The Real Measure of Performance
What are we actually measuring when we talk about performance of an application? How do we determine what matters and what doesn't? Does it matter that your end-user population can execute 500 transactions per second if only 10 can log on concurrently and the estimates for the user distribution call for 10,000 simultaneous logins? Conversely, does it matter if your application can successfully support 10,000 simultaneous logins if the end users can't execute the most common application functions as defined by your business groups?
Most testers have heard the complaint that "the application is slow." The first question often heard after that is, "What is slow, exactly?" If the user answers with something like "logging into the application," you now have something to go on. The business user has just defined what matters to them, and that is the key to successfully designing a series of performance tests. Of course, this example implies a client/server system with a UI component. While the example does not speak specifically to a batch or import-type system, the same methodology applies.
When trying to define the real measure of performance, the next step is to define a transaction. There are a number of schools of thought. The first school states that a transaction is a single empirical interaction with the SUT. This definition may be helpful when designing your performance integration test suites. The second school states that a transaction is defined as a business process. This can be extremely helpful when describing to the business community what the observed performance characteristics are for the SUT. The challenge is that the business may not have insight into the underlying technical implementation of a "transaction."
What we find in the real world is that a transaction needs to be defined for each performance test project and then adhered to for the duration of the project testing cycle. This means that a discrete transaction may be defined for the performance integration test phase and then used in concert with additional discrete transactions to create a business process transaction. This technique requires that the performance test team combine results and perform a bit of mathematical computation. The technique has worked successfully in a number of performance engagements.

First Mile, Middle Mile, Last Mile
When considering the performance of Web-based systems, there are variables beyond what you control, and questions of whose control they fall under. Aspects of the Internet cloud, often referred to as the first mile, middle mile, and last mile, become a primary consideration (Figure 6). Root causes of "cloud bottlenecks" often include high latency, inefficient protocols, and sometimes, network packet loss.

FIG. 6: THE FIELD. The Internet cloud between a data center and its remote end users: the first mile (the data center's connection through its carrier/NSP), the middle mile (carrier and ISP peering points and Internet exchanges, measured as round-trip time), and the last mile (each end user's local ISP).

As the majority of applications are dynamic (and hence not able to be cached on proxy servers), the cloud becomes a bottleneck that is difficult to control. The average dynamic Web page contains 20 to 30 components, each requiring an HTTP connection. The maximum round-trip time (RTT) can be as much as 60 times the average RTT based on inefficient routing in the U.S. Optimizing application behavior is typically focused on the distributed infrastructure within our control, including the Web server and the application and database servers. The complete user experience encompasses the client user's connection to the data center. For internal users, this is within the control of the development team. For external users, the optimization model is much more complex.
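
A back-of-napkin model makes the round-trip arithmetic tangible. The sketch below is deliberately crude and every input is an assumption (the component count, round-trip time and number of parallel browser connections are not figures from the article); it only estimates how much of a dynamic page's load time comes from round trips across the middle mile.

// Crude, illustrative estimate of load time driven by round trips; all inputs are assumed.
public class RoundTripEstimate {
    public static void main(String[] args) {
        int componentsPerPage = 25;      // a dynamic page with roughly 20 to 30 components
        double rttMs = 80;               // assumed average round-trip time to the data center
        int parallelConnections = 2;     // assumed concurrent HTTP connections per browser

        // Each component needs at least one round trip; connections work in parallel
        double roundTripWaves = Math.ceil(componentsPerPage / (double) parallelConnections);
        double networkMs = roundTripWaves * rttMs;
        System.out.printf("~%.0f ms spent on round trips alone (%d components, %.0f ms RTT, %d connections)%n",
                networkMs, componentsPerPage, rttMs, parallelConnections);

        // If inefficient routing inflates the round-trip time, the same page slows proportionally
        double inflatedRttMs = rttMs * 4;
        System.out.printf("~%.0f ms if routing inflates the RTT to %.0f ms%n",
                roundTripWaves * inflatedRttMs, inflatedRttMs);
    }
}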



To address this challenge, proxy services companies such as Akamai have emerged.
Companies and users buy the last mile from their local Internet Service Provider. Companies like Akamai and Yahoo buy the first mile of access from major corporate ISPs. The middle mile is unpredictable, and is based on dynamic routing logic and rules that are optimized for the entire Internet, as opposed to optimized access for your users to your application.
The challenges for the middle mile are related to the network: delays at the peering points between routers and servers within the middle mile. No one entity is accountable or responsible for this middle-mile challenge.
The latency associated with the cloud's unpredictability can be addressed, in part, with proxy services, which emphasize reduction in Internet latency. By adding more servers at the "edge," Tier 1 ISPs and local ISPs, all static content is delivered quickly, and oftentimes pre-cached dynamic content can also be delivered. This greatly reduces the number of round trips, enhancing performance significantly. In addition, proxy services strive to optimize routing as a whole, with the goal of reducing overall response time.
The typical breakdown of response time is based on the number of round trips in the middle mile. The more dynamic a Web page, the more round trips required. Optimizing cloud variables will optimize overall response time and user experience.
Performance engineering is a proactive discipline. While an investment in PE might be new to your organization, its cost is more than justified by the efficiency gains it will produce. It is clearly more practical and affordable to invest in systems currently in production, enhancing their stability, scalability and customer experience. This almost always costs less than building a new system from scratch, though doing so is clearly the best way to ensure peak performance across the SDLC. Companies need assurances that their systems can support current and future demands, and performance engineering is an affordable way to provide those assurances. By gathering objective, empirical evidence of a system's performance characteristics and behavior, and making proactive recommendations for its maintenance, the PE investment will surely pay for itself.

MAKING THE CASE FOR PE

Performance engineering has matured beyond load testing, tuning and performance optimization. Today, PE must enable business success beyond application delivery into the operational life cycle, providing the entire enterprise, both business and information technology, with proactive achievement of company objectives.
Performance engineering is a proactive discipline. When integrated throughout an initiative, from start to finish, PE provides a level of assurance for an organization, allowing it to plan systems effectively and ensure business performance and customer satisfaction.
With budgets shrinking, proactive initiatives can be difficult to justify because their immediate return on investment is not readily visible. Emphasis on the business value and ROI of PE must become the priority. The advantages of PE are well understood, including:
• Cost reduction by maximizing infrastructure based on business need.
• Management of business transactions across a multi-tiered infrastructure.
• The quality and service level of mission-critical systems can be defined and measured.
• Implementation of SLAs to ensure that requirements are stated clearly and fulfilled.
• Forecasting and trending are enabled.
But where is the ROI of PE as a discipline? Yes, it's part of maximizing the infrastructure, and yes, it's part of systems stability and customer satisfaction, but these can be difficult to quantify. By understanding the costs of an outage, we can objectively validate the ROI of performance engineering, as operational costs "hide" the true costs of system development.
Costs of downtime in production include recovery, lost productivity, incident handling, unintended negative publicity and lost revenue. In an extreme example, a 15-second timeout in an enterprise application might result in calls to an outsourced customer support center, which, over the course of time, could result in unanticipated support costs in the millions of dollars.
An additional illustration of hidden costs that can be objectively measured to support ROI calculations is the cost of designing and developing a system once, versus the cost of making performance modifications to a system after it is in production and has failed to meet service level expectations.
Non-functional business requirements are not always captured thoroughly. Some examples include:
• A multi-tiered application that can scale to meet the expected load with the proper load-balancing scheme and that can fail over properly to meet the service levels for availability and reliability.
• A technical architecture that was engineered to meet the service levels of today and tomorrow (as volume and utilization increase).
As IT organizations struggle to drive down maintenance costs and fund new projects, an average IT organization can easily spend 75 percent of its budget on ongoing operations and maintenance. IT shops are caught in "firefighting" mode and inevitably dedicate a larger portion of their budgets to maintenance, diverting resources from efforts to deliver new value to the business. Taking a proactive stance will serve to enable reduced operating costs, higher systems availability, and better-performing systems today and in the future. Performance engineering is that proactive capability.
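
The ROI argument in "Making the Case for PE" reduces to simple arithmetic. The sketch below uses invented figures throughout (outage length, revenue, staffing and support-call costs are all assumptions) to show how an outage becomes a number that can be weighed against the cost of a performance engineering effort.

// Illustrative downtime-cost arithmetic; every figure below is an assumed example value.
public class OutageCost {
    public static void main(String[] args) {
        double outageHours = 4;
        double revenuePerHour = 50_000;          // lost revenue while the system is down
        double affectedEmployees = 200;
        double loadedCostPerEmployeeHour = 60;   // lost productivity
        int supportCalls = 3_000;
        double costPerSupportCall = 8;           // outsourced support center handling

        double lostRevenue      = outageHours * revenuePerHour;
        double lostProductivity = outageHours * affectedEmployees * loadedCostPerEmployeeHour;
        double supportCost      = supportCalls * costPerSupportCall;
        double totalCost        = lostRevenue + lostProductivity + supportCost;

        System.out.printf("One outage: revenue %,.0f + productivity %,.0f + support %,.0f = %,.0f%n",
                lostRevenue, lostProductivity, supportCost, totalCost);
        // A PE effort that prevents even a few incidents like this per year pays for itself.
    }
}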



By Ross Collard

Step-by-step guidance for entering the maze, picking the safest path and finding the most effective tests your data makes possible.

This is the second article in a series that began last month with an introduction to live data use in testing, categorization of test projects and the types and sources of live data.
Once you've decided that live data fits your testing efforts, you'll soon be presented with three new questions to answer:
1. What live data should we use?
2. How do we capture and manipulate the data?
3. How do we use it in our testing?
Each of these questions presents its own series of variables, the importance of which will depend on your own situation. Tables 1, 2 and 3 present the commonly encountered issues for each of the three questions.
As you review each issue listed, try making an initial determination of its importance on a scale of critical, important, minor or irrelevant. If you do not know its importance yet, place a "?" by the issue. If you do not understand the brief explanation of the issue, place a "??". In a later article, you will be able to compare your choices to a group of experts' opinions.

Issue 1
To assess the value of live data, we need to know the alternatives.
Everything is relative. Selecting the best type and source of data for a performance test requires awareness of the available alternatives and trade-offs, and the definition of "best" can be highly context-dependent.

Main Alternatives
One alternative to using copies of live data is to devise test scenarios and then script or program automated tests to support these scenarios. Another alternative is to fabricate test data with a data generation tool. A third alternative is to forecast future data by curve fitting and trending. This can be done with live or fabricated data. Other alternatives are hybrids; e.g., an extract of an operational database can be accessed by fabricated transactions coordinated to match the database.
In theory, we do not need live data if we define performance testing as checking system characteristics critical to performance (the ability to support a specified number of threads, database connections, scaling by adding extra servers, no resource leaks, etc.). We need only a load which will show performance is in line with specifications.
In practice, I favor a mix of data sources. While the judgment of experienced testers is invaluable, we all have unrecognized assumptions and biases. Even fabricated data that matches the expected usage patterns tends not to uncover problems we assume will never happen.
The most appropriate framework for comparison may not be live data vs. alternative data sources. Live data in black box testing is "black data": we are using unknown data to test unknown software behavior. The data source alternatives are not the full story. The system vulnerabilities and the comparability of test and production environments also are significant to assessing value.

Data Generators
GIGO (garbage-in, garbage-out) is the predominant way that data generators are utilized. The tool output, called fabricated or synthetic test data, often is focused for the wrong reasons. Over-simplifying the problem, unfamiliarity in using these tools, tool quirks, knowing the test context only superficially, and lack of imagination are not unusual. All can lead to hidden, unwanted patterns in the ...

Allocating Resources
What mix of data from different sources, live and otherwise, is most appropriate? If we do not consider all potential sources of data, our perspective and thus the way we test may be ...
limited.
Testers benefit by allocating their
fabricated data that might give false
readings.
Path And Emerging
efforts appropriately among different Fabricated data often lacks the rich-
test approaches and sources, and
understanding the alternatives helps
ness of reality. Fabricating data is more
difficult when data items have referen-
With The Most
improve these decisions. Though allo- tial integrity or data represents recog-
cations often change as a test project
progresses, having a realistic sense of
nizable entities — no random charac-
ter string can fully replace a stock tick-
Effective Tests
the alternatives at the project initia- er symbol or customer name. Another
tion helps us plan. pitfall with fabricated data is con-
sciously or unconsciously massaging
Possible
Scripted Tests the data so tests always pass. Our job is
Compared to live data extracts, script- to try to break it.
ed test cases tend to be more effective
because each is focused, aimed at con- Determining Value
firming a particular behavior or uncov- How do we assess and compare value?
ering a particular problem. But they “Value to whom?” is a key question.
work only if we know what we are look- The framework in the first article in
ing for. this series identifies characteristics
Compared to a high-volume ap- which influence value. The value of a
proach using an undifferentiated del- data source can be measured by the
uge of tests, the total coverage by a problems it finds, reducing the risk
compact suite of scripted test cases is and impact of releases to stakeholders.
likely to be low. However, the coverage Value depends on what is in the
of important conditions is high because data – what it represents — and what is
of the focusing. The cost of crafting being tested. If the data has a low user
and maintaining individual test cases count, it will not stress connection
usually is high for each test case. tables. The same repeated data

MARCH 2009 www.stpcollaborative.com • 19


LIVE DATA II

TABLE 1: SELECTING THE LIVE DATA TO USE IN TESTING that do not exist in the real world.
These problems originate from bad
Issue Importance to You
assumptions, biases or artifacts of
1. To assess the value of live data, we need to know the alter- the simulation.
natives. 2. Failing to determine actual peak
demands in terms of volume and
2. The live data chosen for testing does not reveal important
behaviors we could encounter in actual operation. mix of the load experienced by
the live system, and to correlate
3. Unenhanced, live data has a low probability of uncovering a this to known periods of inade-
performance problem. quate performance.
4. Test data enhancement is a one-time activity, not ongoing, In either case, mining data from the
agile and exploratory. live system is critical in guiding the
scope and implementation of perform-
5. The data we want is not available, or not easy to derive from
what is available.
ance testing.

6. Background noise is not adequately represented in live data. Live Data Limitations
Live data does not always have the varia-
stream, even if real, likely won’t have ary conditions, for example, is usually tion to adequately test the paths in our
much effect on testing connections not worth the extra effort. applications. We may not always catch the
per second or connection scavenging results of an outlier if we use only unen-
in a stateful device. Unique Benefits of Live Data hanced live data. In a new organization
Value is related to usefulness and The great benefit of live data is undeni- there may not be enough live data —-
thus is relative to the intended use. able: it is reality-based, often with a seem- the quantity of data available is not
Live data is effective some areas, such ingly haphazard, messy richness. The enough to effectively test the growth
as realistic background noise, but less
for others, such as functional testing. TABLE 2: OBTAINING THE LIVE DATA
On the other hand, fabricated data Issue Importance to You
can be designed to match characteris-
tics desirable for a given situation. 7. Live data usually can be monitored and collected only from
limited points, or under limited conditions.
The value of the crafted data is high
for that purpose, usually higher than 8. Tools and methods influence the collection process, in ways
live data. not well understood by the testers.

9. The capture and extract processes change the data, regard-


Baselines less of which tools we employ. The data no longer is repre-
A baseline is the “before” snapshot of sentative, with subtle differences that go unrecognized.
system performance for before-and-
10. The live data is circumscribed.
after comparisons, and is built from
live data. Baseline test suites can be 11. The live data has aged.
effective in catching major architec-
tural problems early in prototypes, 12. The data sample selected for testing is not statistically valid.
when redesign is still practical. 13. Important patterns in the data, and their implications, go
Fabricated data also has proved useful unrecognized.
in uncovering basic problems early,
though considerable time can be data variety, vagaries and juxtapositions potential, (or for the pessimistic, is not
spent solving problems that would are difficult to replicate, even by the most enough pressure to find where the appli-
never occur in realistic operating con- canny tester. cation breaks).
ditions. In areas like building usage profiles In health care or financial organiza-
and demand forecasting, there is no sub- tions, among others, using live data
Trade-Offs stitute for live data. Capacity planning, could expose your company to law
The goal of a load test 90% of the time for example, depends on demand fore- suits. To remain Sarbanes Oxley com-
is to “simulate a real-world transaction casting, which in turn depends on trends. pliant, it may be worthwhile to scram-
load, gauge the scalability and capacity Live data snapshots are captured over a ble live data in the test environment
of the system, surface and resolve (relatively glacial) duration, and com- while retaining data integrity, and still
major bottlenecks”. While thoughtful pared to see the rates of change. The test with enough data points.
augmentation of live data can surface trends are then extrapolated into the
performance anomalies, the largest future, to help determine the trigger Issue 2
degradation tends to occur from appli- points for adding resources. The live data chosen for testing does
cation issues and system software con- not reveal important behaviors we
figuration issues. Unless you have no Common Blunders could encounter in actual operation.
live data at all and must create test Two common blunders in performance This is a risk, not a certainty. Black box
data from scratch, the time spent cre- testing are; live data may or may not reveal important
ating your own data to test data bound- 1. Creating and then solving problems issues. Often it makes sense to supple-

20 • Software Test & Performance MARCH 2009


LIVE DATA II

ment black box , end-to-end tests with Corner cases must be tested. No mat- For brevity, the remainder of the
ones focusing on a particular compo- ter what live data is chosen, it will not nec- issues are summarized and may not be
nent, subsystem, tier or function. There essarily be representative of real world sit- specifically called out.
are probably enough other aspects of the uations — the proper mix of applica-
system that aren’t quite “real” that the tions, user actions and data. For example, Issue 3
representativeness of the data is just one no network is simple, and no simulated Unenhanced, live data has a low
of many issues. combination of traffic will ever exercise it probability of uncovering a per-
fully. Production is where the rubber formance problem.
Testers’ Limited Knowledge meets the road. Smart testers may choose Most live data has been captured under
Most testers do not know the live data to stress as much of the device or infra- “normal” working conditions. The data
content and its meaning, unless they structure as possible, to see how each needs to be seeded with opportunities to
already are closely familiar with the situa- device operates and how it affects other fail, based on a risk assessment and a fail-
tion, or have extensive time and motiva- devices in the network. ure model. Running a copy of live data as
tion to learn about the live data. Although no amount of testing is a test load seems practical but may not
An extensive learning effort often is guaranteed to reveal all important behav- produce reliable performance data.
required because of the live data’s rich- iors, captured live data can reveal many Confluent events may trigger telltale
ness, volume and complexity. that occur in live operation. This is true symptoms in the test lab, such as a cusp
Running a volume test with no under- whether or not we investigate and under- (sudden increase in gradient) in a
standing of the data does not prove any- stand these behaviors. Unrecognized response time curve. If the confluence
thing — whether it passes or fails. behaviors are not detected, but nonethe- and its symptoms are unknown, testers
Developers will not be thrilled with the less are present and possibly will be dis- do not know what behaviors to trigger or
extra work if the test team is not capable covered later. Live data can only trigger patterns to look for.
of determining whether the data created an incomplete sample of operational Live data will reveal important behav-
a valid failure scenario. behaviors. Other behaviors in operation iors, and will also throw false positives
Confluent events may trigger telltale are not included, and some are likely to and negatives. “Important” is in the eye
symptoms of problems in the test lab. But be important. “Important” is in the eye of of the beholder. You could argue that you
if the confluence and its symptoms are the beholder. might need some better testers who
unknown, testers do not know what data In summary, live data can reveal understand the data. Running a volume
values and pattern or sequence to look important behaviors — and also mislead test with no understanding of the data
for. The testers thus cannot check for the by throwing false positives and negatives. isn’t going to prove anything - whether it
pattern’s presence in the live test data. Much depends on the details, for exam- passes or fails. And, a development team
ple, of how we answer questions like will not be thrilled to have to do all the
Over-Tuning Test Data these. What period of time is chosen, and analysis because the test team is not capa-
The representativeness of the test data is why? What are the resonances in a new ble of determining if the data has created
just one of many compromises. Other system being tested? Does the live data a valid failure or success scenario.
aspects of the system often are not suffi- reflect any serialization from the logging
ciently realistic. For example, we might mechanism? What errors resulted in not Issues 4, 5 and 6
better add monitoring capabilities to the logging certain test sessions, or caused Test data enhancements, data unavail-
product than fiddle with the test data. issues from thread safety challenges or ability and background noise.
Beware of becoming too sophisticated and memory leaking? More a craft than a science, test load
losing track of your data manipulations. For Scaling up live data can be difficult if design (TLD) prepares work loads for
example, while live data can provide cov- we are interested in finding a volume- use in performance and load testing. A
erage for random outlying cases in some related failure point, and the data is load is a mix of demands, and can be
situations, it can be a trap to incorporate closely correlated to the volume. denominated in a variety of units of
changes in usage over time.
TABLE 3: USING THE LIVE DATA IN TESTING
Problems That Data Refinement Issue Importance to You
Can't Fix
Using live data reveals one important 14. Running data in a test lab requires replay tools which may
piece of information, which is whether interfere with the testing.
the application will perform problem- 15. Capture and replay contexts are not comparable.
free with that specific data and in a spe-
cific test lab. As for live data revealing any 16. Even if the same test data is re-run under apparently the
same conditions, the system performance differs.
other behaviors, that depends on how
you use your live data. For example, if 17. The test results from using the live data are not actionable.
you read live data from the beginning of
a file for each test run, you expect pre- 18. The test data is not comparably scalable.
dictably repeatable response time graphs. 19. Live data selected for functional testing may not be as effec-
If you randomly seed your live data (start tive if used for performance testing, and vice versa.
at a random point in the file each time),
20. Coincidental resonances and unintended mismatches occur
you may not experience repeatable
between the live data, the system and its environment.
behavior.

MARCH 2009 www.stpcollaborative.com • 21


LIVE DATA II

measure: clicks, transactions, or whatever willing to assume that the behavior of the confusing failures and faults – their rela-
units fit the situation. TLD is one of the system is essentially the same (“equiva- tionships can distract, but unless you are
more important responsibilities of per- lent”) under these three different condi- a Sufi philosopher the causes and effects
formance testers, and many see it as a tions. Similarly, if the first transaction fails fall into place. (Sufis do not believe in
critical competency. to print a check within 15 seconds (to cause and effect.) Remember that one
TLD is situational and heuristic, with John Smith for $85.33), can we assume failure can be caused by many different
four main approaches: that the other two test transactions will faults, and one fault can trigger none to
• Develop test scenarios and script test fail too, and therefore not bother to an indefinitely large number of failures.
cases. (Typically based on docu- process them? Most testers would say yes. There is no one-to-one relationship.
mented or assumed performance Of course, these are only assumptions,
requirements.) not known facts. It could be, though we TLD with Live Data
• Generate volumes of fabricated data. don’t know this, that John Smith has With live data, TLD has a different fla-
(Typically using a homebrew or exactly $85.33 in his bank account. The vor than traditional test case design.
commercial test data generation first transaction for $85.33 works, but a Instead of building a new test case or
tool.) request to print a second check for John modifying an existing one, we seed the
• Copy, massage, enhance live data. Smith for $85.34 or more will not be hon- situation, i.e., embed it into the test
(This depends on the availability of ored. The response time becomes the data from live operations.
live data in a usable form.) duration to deposit sufficient funds in We can survey the pristine data to
• A combination of the first three. John Smith’s account or infinite if the identify opportunities to exploit the sus-
new deposit does not happen. pected vulnerability. If not, we modify or
Performance Requirements What if the system prints one check add opportunities to the original data. If
Requirements are more about user satis- correctly, but because of a misinterpreted the surveying effort is a hassle, we can
faction than metrics targets. Though they requirement is designed to not print a skip it and enrich the original data.
may need to be quantified to be measura- second check for the same person on the
ble, it is more important that the require- same day? We would not find this per- Agile Feedback in Testing
ments reflect the aspirations of users and formance bug if we assume equivalence We may not be aware of test data prob-
stakeholders. If the requirements are not and use only one test transaction. lems unless we organize a feedback
adequate, a common situation, we may Most of us instinctively use equiva- loop. Testing often has a one-way pro-
have to expand the test project scope to lence while we are testing. If one test case gression, with little feedback about
include specifying them. We can capture results in a certain behavior, whether how well the test data worked.
aspirations by conducting a usability test acceptable or not, we simply assume Feedback loops can be informal to the
and a user satisfaction survey. other equivalent test cases would behave point of neglect, not timely and
a similar way without running them. action-oriented, or encumbered with
Using Equivalence and Partitioning To paperwork. Only when live data leads
Confirm Bellwethers Modeling Performance Failures to obviously embarrassing results do
We use equivalence to group similar test Test design is based consciously or other- we question the accepted approach.
situations together, and pick a represen- wise on a theory of error, also called a fail- To skeptics, this acceptance of the
tative member from each group to use as ure model or fault model. status pro is sensible: if it is not bro-
a test case. We want the best representa- In functional testing, an example of ken, don’t fix it. Perhaps the real prob-
tive test case, as within an equivalence an error is returning an incorrect lem is overly anxious nit-pickers with
class (EC) some are more equal than oth- value from a computation. In per- too much time on their hands.
ers. Despite our careful attempts at formance testing, an example is Perhaps our testing is really not that
demarcation, uncertainties mean many returning the correct value but too sensitive to data nuances.
ECs are fuzzy sets (i.e., with fuzzy bound- late to be useful. Another: not being Ask your skeptics: “How credible
aries – membership of the set is not black able to handle more than 1,000 users are your current test results? How con-
and white). The costs to develop test when the specs say up to 10,000 must fident are you when you go live? Do
cases vary. The best representative test be supported concurrently. you risk service outages, or over-engi-
case is the one which reliably covers the When we craft an individual test neer to avoid them?” Many have
largest population of possible test cases at case, we assign an objective to it, either thoughtful insights; more stumble
the lowest cost. to confirm an expected behavior that when asked.
For example, let’s assume that we cre- may be desirable or undesirable, or to Obtaining useful feedback does not
ate a test case to print a check, to pay a try to find a bug. In the latter case, we have to be cumbersome. We build a
person named John Smith $85.33. The effectively reverse-engineer the test prototype test load from live data, run
system prints the check correctly within design, starting with a failure, working it in trials and examine the outcomes
15 seconds, which is our response time backwards to the faults, i.e., possible for complications. Here “build”
goal. Since the system worked correctly causes of failure, and then to the condi- includes “massage and improve”.
with this test transaction, do we need to tions and stimuli that trigger the failure. Without feedback there is no learning,
investigate its behavior if we request a Test cases are then designed to exercise and pressure to deliver a perfect test
check for John Smith in the amount of the system by trying to exploit the spe- load the first time.
$85.32 or $85.34? Probably not. cific vulnerability of interest. I plan at least three iterations of
If everything else like background Our test design is driven by our theo- refining the data. If you have 10 hours
noise remains the same, most of us are ry of error. Do not worry initially about to spend on getting the data right, do

22 • Software Test & Performance MARCH 2009


LIVE DATA II

not spend 8 hours elaborately captur- anomalies are detected. • Data capture and extraction entails
ing it before you have something to These steps are iterative. less work when we regression-test
review. Instead, plan to invest your minor changes to existing systems.
hours in the pattern: 1-2-3-4. Expect The Availability of Live Data
several trials, have a prototype ready Of the six steps above, Step 2 (capturing Typical Live Data to Collect
within one hour, and reserve more or extracting the live data), arguably is Live data is not undifferentiated,
time at the end for refinement than the critical one. This step is feasible only though to the untutored it may
the beginning. appear to come in anonymous sets of
bits. Not all data is equally good for
Agile Feedback in Live Operations our purposes. If this claim is true,
Actionable, responsive and timely feed-
back matters more in live operation than
in the relative safety of a test lab, because
the feedback cycle durations tend to be
• then what are the desirable character-
istics of live data from a performance
testing perspective? To answer, I will
drill down from the performance
much shorter. Systems are labeled as Eliciting requirements goals to the characteristics to monitor
unstable (a) if they have or are suspected or measure, then to the atomic data of
will have unpredictable behavior, and (b) is a major scope interest to us.
when that behavior happens, our correc-
tive reaction times are too slow to prevent increase in testing Performance Goals
damage. The data we want to gather is based on
Put another way, systems with uncer- projects. Do not the testing goals and thus ultimately on
tainty and inertia are hard to control. We the system performance goals. If these
can’t effectively predict their future expand your goals are not explicit, we can elicit
behavior, analyze data in real-time to them in requirements inter views.
understand what events are unfolding,
nor quickly correct behavior we don’t
project without (Caution – eliciting requirements is a
major scope increase in testing proj-
like. By the time we find out that the sys- ects. Do not expand your project with-
tem performance or robustness is poor,
carefully assessing out careful assessing your options.)
conventional responses like tuning, Or the performance goals may be
rewriting and optimizing code, and
your options. outlined in documents like product
adding capacity may be inadequate. marketing strategies, user profiles, fea-
Question: What type of live data helps
facilitate or impede timely feedback?

Preparing Test Loads from Live Data


• ture comparisons and analysis of com-
petitive products. Examples of perform-
ance goals:
• Our users’ work productivity is
While the specifics vary, typically the superior to the comparable pro-
process of preparing a test load from live ductivity of competing organiza-
data includes these six steps. Some or all with comparable prior or parallel experi- tions or competing systems.
of the steps typically are repeated in a ence, and its difficulty decreases with the • System response times
series of cycles: more experience we have. under normal working con-
Step 1: Determine the characteristics Obtaining test data for a breakthrough is ditions generally are within
required in the test load, i.e., the mix of hardest. the desired norms (e.g.,
demands to be placed on the system • If a system or feature is radically product catalog searches
under test (SUT). Often these are spec- innovative, there is no precedent to average 50% faster than our
ified in rather general terms. E.g., serve as a live data source. 5 fastest competitors; “gen-
“Copy some live data and download it to • Something completely new is rare, erally” implies that in some
the test lab.” though, so at least a modicum of instances the response
Step 2: Capture or extract data with comparable history is likely. times are not superior.
the desired characteristics, or re-use exist- Testing a new system tends to be easier. None of these inferior
ing extracts if compatibility and data • If a new system replaces an exist- response times can be for
aging are not problems. ing one, it is unlikely that the high $ value transactions for
Step 3: Run the performance or load database structure, transactions our premium customers;
test using the extract(s). Sometimes the and data flows exactly mirror the realistic level of background
same test run fulfills both functional test existing ones. noise is assumed.)
and performance / load test objectives. Testing a new version of an existing system • The number of concurrent-
Step 4: Review the output test results and its infrastructure usually is easiest. ly active users supported
for anomalies. Often the anomalies are • If live data does not already exist, it and the throughput are
not pre-defined nor well understood. A can be generated using a prior ver- acceptable (e.g., at least
common approach: “If there’s a glitch, I sion of the system being tested. 1,000 active users; at least 1
will know it when I see it.” • Similar data from other parallel situ- task per user completed by
Step 5: Provide feedback. ations can be captured and may be 90% or more of these users
Step 6: Take corrective actions if usable with little conversion. in ever y on-line minute;

MARCH 2009 www.stpcollaborative.com • 23


LIVE DATA II

“user task” needs to be needs. The relationship can be reversed Atomic Data to Harvest
defined). – the data availability influences the test If a characteristic or metric is not
• Response times under occa- goals, sometimes inappropriately. ready to gather, we may be able to cal-
sional peak loads do not culate it from more fundamental data
degrade beyond an accept- Characteristics to Monitor or Measure — if that data is available. A depend-
able threshold (e.g., test We test to evaluate whether a system’s ent variable is one which is derived
peak is set to the maximum performance goals have been met satis- from one or more independent vari-
expected weekly load of factorily. Effective goals are expressed ables. An independent one by defini-
2,500 users; in this mode, in terms of the desired values of per- tion is atomic. We calculate perform-
the average degraded formance characteristics, averages to be ance characteristics from atomic data.
response time is no more met, ratios and thresholds not to Whether atomic or not, data of inter-
than 25% slower than the exceeded, etc. Observing or calculating est would include user, work and event
norm). the values of the characteristics is vital data and counts, and resource utiliza-
Goals need to be quantified, for to this evaluation. tion stats.
objective comparisons between actual Characteristics of interest can be static Sometimes the atomic data is not
values and the targets. I have not both- (e.g., the rated bandwidth of a network available, but derivatives are. The low-
ered to quantify all the goals above, to link, which does not change until the est-level dependent data that we can
highlight how vague goals can be with- infrastructure is reconfigured or can access effectively becomes our basis
out numeric targets, and because it react to changing demands), but are for calculation. Examples:
introduces another layer of distracting more likely to be dynamic, The values of Timings
questions. Equally important, the con- many dynamic performance characteris- • Expected cause-and-effect rela-
text — i.e., the specific conditions in tics depend on (a) the loads and (b) the tionships among incidents.
which we expect the system to meet resources deployed. Measuring perform- • Duration of an event.
the goals — must be spelled out. ance is pointless without knowing the • Elapsed time interval between a
load on the system and the resources uti- pair of incidents.
Performance Testing Goals lized at the time of measurement. • Synchronization of devices.
Within the framework of the SUT (sys- Static characteristics include the allo- Rates of change
tem under test) and its performance cated capacities (unless the system and • Number of user log-ons during an
goals, the testing goals can vary con- infrastructure are self-tuning): memory interval.
siderably. For example, if the test capacity, for each type of storage and at • Number of log-offs in the same
objective is to verify that capacity fore- each storage location, processing capaci- interval.
casting works versus let’s say predict- ty, e.g., their rated speeds, and network
ing a breakpoint, different though capacity (rated bandwidths of links). Fighting the Last War
related metrics need to be tracked. In Other static characteristics include Using live data is like driving an automo-
both cases, the on / off availability of pertinent fea- bile by looking in the rear vision mirror.
the metrics tures like load balancing, firewalls, server The data reflects the past. For example, if
are compli- cluster failover and failback, and topolo- the growth rate at a new website exceeds
cated by non- gy (i.e., hub vs. spoke architecture) 50 percent a week, a two-week-old copy of
linearity. Dynamic characteristics include: live data understates current demands by
Capacity fore- • Response times, point-to- more than 75 percent.
casting seeks to point or end-to-end delays, The growth rate is the rate of change
predict what addi- wait times and latencies. from when the live data was captured to
tional resources • Throughput, e.g., units of when the test is run. Growth rates are
are needed, and work completed per unit both positive and negative. Negative
when and where of time, such as transac- growth, of course, is a decline.
they need to be tions per second. The volumes of some types of data
added, to main- • Availability of sys- may grow while others decline, as the
tain an acceptable tem features and mix rotates. If they cancel each other
level of service, let’s say capabilities to users. out, the net growth is zero. We cannot
in compliance with an SLA • Number of concur- work with change in the aggregate
(service level agreement). rently active users. unless we are confident the conse-
Predicting a breakpoint, • Error rates, e.g., by type of trans- quences are irrelevant, but must sepa-
by contrast, involves testing with action, by level of severity. rately consider the change for each
increasing load, monitoring how met- • Resource utilization and spare main type of work.
rics like response time and throughput capacity, queue lengths, number The boundary values, e.g., a growth
change with the increasing load, and and frequency of buffer overflows. rate of +15%, are not fixed by scientif-
extrapolating the trends (hopefully • Ability to meet service level agree- ic laws but are approximate.
not with a straight line), until the ments. Over time, you'll accumulate experi-
response time approaches infinity or • Business-oriented metrics like $ ence and data from your own projects.
the throughput approaches zero or revenue per transaction, and the And when you're confident in the accu-
both. cost overheads allocated to users. racy of your growth rates, replace the
Testing goals influence the data approximate values with your own. ý

24 • Software Test & Performance MARCH 2009


Don’t
Miss
Out

Test & QA Report


On Another Issue of The

eNewsletter!
Each FREE biweekly issue includes original articles that
interview top thought leaders in software testing and
quality trends, best practices and Test/QA methodologies.
Get must-read articles that appear
only in this eNewsletter!

Subscribe today at www.stpmag.com/tqa


To advertise in the Test & QA Report
eNewsletter, please call +631-393-6054
dkarp@stpcollaborative.com
automate performance tests
Bowl Over Competitor Web Sites
With Techniques From The Real World

By Sergei Baranov

he successful development
T of scalable Web ser vices
requires thorough perfor mance
performance in a complex, multi-lay-
ered, rapidly-changing Web services
environment. Because of the complex-
ity of Web services applications and an
and resolved. In order to satisfy these
requirements, Web services perform-
ance tests have to be automated.
Applying a well-designed, consistent
testing. The traditional performance increasing variety of ways they can be approach to performance testing
testing approach—where one or more used and misused, an effective Web automation throughout the develop-
load tests are run near the end of the services performance testing solution ment lifecycle is key to satisfying a
Photographs by Joe Sterbenc

application development cycle—can- will have to run a number of tests to Web services application’s perform-
not guarantee the appropriate level of cover multiple use case scenarios that ance requirements.
the application may encounter. These This article describes strategies for
Sergei Baranov is principle software engineer tests need to run regularly as the appli- successful automation of Web services
for SOA solutions at test tools maker
cation is evolving so that performance performance testing and provides a
Parasoft Corp.
problems can be quickly identified methodology for creating test scenar-

MARCH 2009 www.stpcollaborative.com • 27


SPARE PERFORMANCE

ios that reflect tendencies of the real- question of whether the application
world environment. To help you apply meets its performance requirements
these strategies, it introduces best unanswered for most of the develop-
practices for organizing and executing ment cycle. Unaware of the applica-
automated load tests and suggests how tion’s current performance profile,
these practices fit into a Web services developers are at risk of making wrong
application’s development life cycle. design and architecture decisions that
could be too significant to correct at
Choosing a Performance the later stages of application develop-
Testing Approach ment. The more complex the applica-
Performance testing approaches can be tion, the greater the risk of such design
generally divided into three categories: mistakes, and the higher the cost of
the “traditional” or “leave it ’till later” straightening things out. Significant
approach, the “test early test often” performance problems discovered
approach, and the “test automation” close to release time usually result in
approach. The order in which they are panic of various degrees of intensity,
listed is usually the order in which they followed by hiring application perform-
are implemented in organizations. It is ance consultants, last-minute purchase
also the order in which they emerged of extra hardware (which has to be
historically. shipped overnight, of course), as well as
The “traditional” or “leave it ’till performance analysis software.
later” approach. Traditionally, compre- The resolution of a performance
hensive performance testing is left to problem is often a patchwork of fixes to
the later stages of the application devel- make things run by the deadline. The
opment cycle, with the possible excep- realization of the problems with the
tion of some spontaneous performance “leave it till later” load testing practice
evaluations by the development team. led to the emergence of the “test early,
Usually, a performance testing team test often” slogan.
works with a development team only The “test early, test often” ap-
during the testing stage when both proach. This approach was an intuitive
teams work in a “find problem – fix step forward towards resolving signifi-
problem” mode. cant shortcomings of the “traditional”
Such an approach to performance approach. Its goal is reducing the the automation of application perform-
testing has a major flaw: it leaves the uncertainty of application performance ance testing.
during all stages of development by The “performance test automation”
FIG. 1: STRIKE TEAM catching performance problems before approach. The performance test auto-
they get rooted too deep into the fabric mation approach provides the means to
of the application. This approach pro- enforce regular test execution. It
Developers moted starting load testing as early as requires that performance tests should
application prototyping and continu- run automatically as a scheduled task:
New source code, ing it through the entire application most commonly as a part of the auto-
code changes
lifecycle. mated daily application “build-test”
Source Code Repository However, although this approach process.
promoted early and continuous testing, In order to take the full advantage of
Entire application it did not specify the means of enforcing automated performance testing, howev-
source code
what it was promoting. Performance er, regular test execution is not enough.
Nightly Build System testing1 still remained the process of An automated test results evaluation
manually opening a load testing appli- mechanism should be put into action to
Application cation, running tests, looking at the simplify daily report analysis and to
results and deciding whether the report bring consistency to load test results
Functional Performance table entries or peaks and valleys on the evaluation.
Regression Regression performance graphs mean that the test A properly-implemented automated
Tests Tests succeeded or failed. performance test solution can bring the
This approach is too subjective to be following benefits:
Test Reports consistently reliable: its success largely • You are constantly aware of the
depends on the personal discipline to application’s performance profile.
Reporting System run load tests consistently as well as the • Performance problems get detect-
knowledge and qualification to evaluate ed soon after they are introduced
performance test results correctly and due to regular and frequent test
Test results analyzed and
processed by the Reporting Stytem reliably. Although the “test early, test execution.
often” approach is a step forward, it falls • Test execution and result analysis
short of reaching its logical conclusion: automation makes test manage-

28 • Software Test & Performance MARCH 2009


SPARE PERFORMANCE

services load testing in your organiza- CPU utilization of the application server
tion, it is time to consider the principles was greater than 90 percent on average at
of how your performance test infra- a certain hit per second rate, the test
structure will function. should be declared as failed.
This type of decision making is not
Automating Build-Test Process applicable to automated performance
A continuous or periodic daily/night- testing. The results of each load test
ly build process is common in forward- run must be analyzed automatically
looking development organizations. If and reduced to a success or failure
you want to automate your perform- answer. If this is not done, daily analysis
ance tests, implementing such a of load test reports would become a
process is a prerequisite. Figure 1 time-consuming and tedious task.
shows the typical organization of a Eventually, it would either be ignored
development environment in terms of or become an obstacle to increasing
how source code, tests, and test results the number of tests, improving cover-
flow through the automated build-test age, and detecting problems.
infrastructure. To start the process of automating
It makes sense to schedule the load test report analysis and reducing
automated build and performance results to a success/failure answer, it is
test process to run after hours—when helpful to break down each load test
the code base is stable and when idling report analysis into sub-reports called
developer or QA machines can be uti- quality of service (QoS) metrics. Each
lized to create high-volume distrib- metric analyzes the report from a specific
uted loads. If there were failures in the perspective and provides a success or fail-
nightly performance tests, analyzing ure answer. A load test succeeds if all its
the logs of your source control reposi- metrics succeed. Consequently, the suc-
tory in the morning will help you iso- cess of the entire performance test batch
late the parts of the code that were depends on the success of every test in
changed and which likely caused the the batch:
performance degradation. It is possi- • Performance test batch succeeds if
ble that the failure was caused by some • Each performance test scenario suc-
hardware configuration changes; for ceeds if
ment very efficient. this reason, keeping a hardware main- • Each QoS metric of each scenario
Because of this efficiency gain, the tenance log in the source control succeeds
number of performance tests can be repository will help to pinpoint the It is convenient to use performance
significantly increased. This allows you problem. Periodic endurance tests test report QoS metrics because they
to: that take more than12 hours to com- have a direct analogy in the realm of Web
• Run more use case scenarios to plete could be scheduled to run dur- services requirements and policies. QoS
increase tests coverage. ing the weekend. metrics can be implemented via scripts or
• Performance test sub-systems and
components of your application in FIG. 2: QUALITY OF SERVICE METRICS
isolation to improve the diagnostic
potential of the tests.
• Automated test report analysis
makes test results more consistent.
Your performance testing solution is
less vulnerable to the personnel
changes in your organization since both
performance tests and tests success cri-
teria of the existing tests are automated.
Of course, implementing perform-
ance test automation has its costs. Use
common sense in determining which Automating Performance Test tools of a load test application of your
tests should be automated first, and Results Analysis preference and can be applied to the
which come later. In the beginning, you In a traditional manual performance test- report upon the completion of the load
may find that some tests can be too ing environment, a quality assurance test. Another advantage of QoS metrics is
time- or resource-consuming to run reg- (QA) analyst would open a load test that they can be reused. For instance, a
ularly. Hopefully, you will return to report and examine the data that was col- metric that checks the load test report for
them as you observe the benefits of per- lected during the load test run. Based on SOAP Fault errors, the average CPU uti-
formance test automation in practice. system requirements knowledge, he or lization of the server, or the average
Once you’ve made a decision to she would determine whether the test response time of a Web service can be
completely or partially automate Web succeeded or failed. For instance, if the reused in many load tests. A section of a

MARCH 2009 www.stpcollaborative.com • 29


SPARE PERFORMANCE

sample load test report that uses QoS looking software development organi- • Increase the number of tests to
metrics is shown in Figure 2. zations. The same practice could be improve performance test coverage.
successfully applied to performance
Collecting Historical Data tests as well. Improving Diagnostic Ability
Load test report analysis automation cre- The best way to build performance Of Performance Tests
ates a foundation for historical analysis of tests is to reuse the functional applica- As a rule, more generic tests have
performance reports. Historical analysis tion tests in the load test scenarios. greater coverage. However, they are also
can reveal subtle changes that might be With this approach, the virtual users of less adept at identifying the specific
unnoticeable in daily reports and pro- the load testing application run com- place in the system that is responsible
vides insight into the application’s per- plete functional test suites or parts of for a performance problem. Meta-
formance tendencies. As new functional- functional test suites based on the vir- phorically speaking, such tests have
ity is introduced every day, the change in tual user profile role. When creating greater breadth, but less depth. More
performance may be small from one day load test scenarios from functional isolated tests, on the other hand, pro-
to the next, but build up to significant dif- tests, make sure that the virtual users vide less coverage, but are better at
ferences over a long period of time. Some running functional tests do not share pointing to the exact location of a prob-
performance degradations may not be resources that they would not share in lem in the system internals. In other
big enough to trigger a QoS metric to the real world (such as TCP Sockets, words, because they concentrate on a
fail, but can be revealed in performance SSL connections, HTTP sessions, specific part of the system, they have
history reports. Figure 3 shows an exam- SAML tokens etc.). greater depth but less breadth. An effec-
tive set of performance tests would con-
FIG. 3: TEAM HANDICAP tain both generic high-level (breadth)
tests and specific low-level (depth) tests
that complement each other in improv-
ing the overall diagnostic potential of a
performance test batch.
For instance, a high-level perform-
ance test that invokes a Web service via
its HTTP access point might reveal
that the service is responding too slow-
ly. A more isolated performance test
on an EJB component or an SQL
query that is being invoked as a result
of the Web service call would more
precisely identify the part of the appli-
cation stack that is slowing down the
service. With the automated perform-
ance testing system in place, you can
ple of a QoS metric performance history Following either the traditional and easily increase the number of tests and
report. the test early, test often performance test- augment the high-level tests that
Once you have established an auto- ing approaches usually results in the invoke your Web services via their
mated testing infrastructure, it is time to creation of a small number of perform- access points with more isolated, low-
start creating load test scenarios that will ance tests that are designed to test as level tests that target the performance
evaluate the performance of your system. much as possible in as few load test of the underlying tiers, components,
runs as possible. Why? The tests are run sub-systems, internal Web services, or
Creating Performance Test and analyzed manually, and the fewer other resources your application
Scenarios–General Guidelines load tests there are, the more manage- might depend on.
Performance test scenarios should be able the testing solution is. The down- In practice, you don’t have to cre-
created in step with the development of side of this approach is that load test ate low-level isolated tests for all com-
the application functionality to ensure scenarios which try to test everything in ponents and all tiers to complement
that the application’s performance pro- a single run usually generate results the high-level tests. Depending on the
file is continuously evaluated as new that are hard to analyze. available time and resources, you can
features are added. To satisfy this If performance testing is automated, limit yourself to the most important
requirement, the QA team should work the situation is different: you can create ones and build up isolated perform-
in close coordination with the develop- a greater number of tests without the ance tests as problems arise. For exam-
ment team over the entire application risk of making the entire performance ple: while investigating a high-level
life cycle. Alternatively, the develop- testing solution unmanageable. You can Web service test failure, let's say that a
ment team can be made responsible for take advantage of this in two ways: performance problem is discovered in
performance test automation of its own • Extend high-level Web services per- an SQL query. Once the problem is
code. The practice of creating a unit formance tests with subsystem or resolved in the source code, secure
test or other appropriate functional test even component tests to help isolate this fix by adding an SQL query per-
for every feature or bug fix is becoming performance problems and improve formance test that checks for the
more and more common in forward- the diagnostic ability of the tests. regression you just fixed. This way,

30 • Software Test & Performance MARCH 2009


SPARE PERFORMANCE

your performance test librar y will FIG. 4: CATEGORY BREAKDOWN


grow “organically” in response to the
arising needs.
regular use misuse malicious use
Increasing Performance
Test Coverage
The usefulness of the performance type of use
tests is directly related to how closely WSDL requests virtual user
they emulate request streams that the Web Service
Web ser vices application will content type Load Test emulation mode
Scenario request per
encounter once it is deployed in the service requests
second
production environment. In a com-
plex Web services environment, it is of type of load
the essence to choose a systematic
approach in order to achieve adequate average load peak load stress test endurance test
performance test coverage. Such an
approach should include a wide range
of use case scenarios that your applica-
tion may encounter. Measure service performance with cally, during the test.
One such approach is to develop various sizes of client requests and When load testing stateful Web
load test categories that can describe ser ver responses. If the expected services, such as services that support
various sides of the expected stream of request sizes and their probabilities the notion of a user, make sure that
requests. Such categories can describe are known (for example, based on log you are applying appropriate intensity
request types, sequences, and intensi- analysis), then create the request mix and concurrency loads. Load intensity
ties with varying degrees of accuracy. accordingly. If such data is unavail- can be expressed in request arrival
An example of such a category break- able, test with the best-, average-, and rate; it affects system resources
down is shown in Figure 4. worst-case scenarios to cover the full required to transfer and process client
Let’s consider these categories in performance spectrum. requests, such as CPU and network
more detail. (The load type category resources.
analysis of your Web service can obvi- Emulation Mode Load concurrency, on the other
ously include other categories as well as A Web service may or may not support hand, affects system resources
extend the ones shown in the Figure 4). the notion of a user. More generically, required to keep the data associated
it may be stateful or stateless. Your with logged-in users or other stateful
Type of Use decision to use either virtual user or entities such as session object in mem-
Depending on the type of deployment, request per second emulation mode ory, open connections, or used disk
your Web services can be exposed to should be based on this criteria. For space. A concurrent load of appropri-
various types of SOAP clients. These example, the load of a stateless search ate intensity could expose synchro-
clients may produce unexpected, erro- engine exposed as a Web service is best nization errors in your Web service
neous, and even malicious requests. expressed in terms of a number of application. You can control the ratio
Your load test scenarios should requests per second because the between load intensity and concurren-
include profiles that emulate such notion of a virtual user is not well- cy by changing the virtual user think
users. The more your Web service is defined in this case. A counter exam- time in your load test tool.
exposed to the outside world (as ple of a stateful Web service is one that
opposed to being for internal con- supports customer login, such as a tick- Content Type
sumption), the greater the probability et reservation service. In this context, When load testing Web services, it is
of non-regular usage. The misuse and it makes more sense to use virtual user easy to overlook the fact that SOAP
malicious use categories may include emulation mode. clients may periodically refresh the
invalid SOAP requests as well as valid If your service is stateless and you WSDL, which describes the service, to
requests with unusual or unexpected have chosen the request per second get updates of the service parameters it
values of request sizes. approach, make sure that you select a is about to invoke. The probability of
For example, if your service uses an test tool, which supports this mode. If such updates may vary depending on
array of complex types, examine your a load test tool can sustain only the the circumstances. Some SOAP clients
WSDL and create load test scenarios scheduled number of users, the effec- refresh the WSDL every time they make
that emulate requests with expected, tive request injection rate may vary a call. The test team can analyze access
average, and maximum possible ele- substantially based on server response logs or make reasonable predictions
ment counts, as well as element counts times. Such a tool will not be able to based on the nature of the service.
that exceed the allowed maximum. accurately emulate the desired request If the WSDL access factor (the
sequence. If the number of users is probability of WSDL access per service
Emulation Mode
A Web service may or may not support the notion of a user. More generically, it may be stateful or stateless. Your decision to use either virtual user or request-per-second emulation mode should be based on this criterion. For example, the load of a stateless search engine exposed as a Web service is best expressed in terms of a number of requests per second, because the notion of a virtual user is not well defined in this case. A counterexample is a stateful Web service that supports customer login, such as a ticket reservation service. In this context, it makes more sense to use virtual user emulation mode.

If your service is stateless and you have chosen the request-per-second approach, make sure that you select a test tool that supports this mode. If a load test tool can sustain only the scheduled number of users, the effective request injection rate may vary substantially based on server response times. Such a tool will not be able to accurately emulate the desired request sequence. If the number of users is constant, the request injection rate will be inversely proportional to the server processing time, and it is likely to fluctuate, sometimes dramatically.
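For a stateless service, the difference between the two emulation modes can be sketched in a few lines of code. The fragment below is an illustration rather than any particular tool's API: a scheduler enqueues requests at a fixed rate, so the injection rate holds steady even when responses slow down, whereas a closed-loop virtual-user tool would wait for each response before sending the next request.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of request-per-second (open-loop) injection: a scheduler enqueues a
// new request at a fixed interval and hands it to a worker pool, so slow
// responses do not reduce the arrival rate the way they would with a fixed
// number of closed-loop virtual users.
public class ConstantRateInjector {
    public static void main(String[] args) throws InterruptedException {
        int requestsPerSecond = 50;                       // target injection rate
        AtomicLong started = new AtomicLong();
        ExecutorService workers = Executors.newCachedThreadPool();
        ScheduledExecutorService clock = Executors.newSingleThreadScheduledExecutor();

        clock.scheduleAtFixedRate(() -> workers.submit(() -> {
            started.incrementAndGet();
            // Placeholder for the real SOAP call; do the blocking work here.
        }), 0, 1_000_000 / requestsPerSecond, TimeUnit.MICROSECONDS);

        Thread.sleep(10_000);                             // run for 10 seconds
        clock.shutdownNow();
        workers.shutdownNow();
        System.out.println("Requests started: " + started.get());
    }
}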

Load intensity can be expressed in request arrival rate; it affects system resources required to transfer and process client requests, such as CPU and network resources. Load concurrency, on the other hand, affects system resources required to keep the data associated with logged-in users or other stateful entities, such as session objects in memory, open connections, or used disk space. A concurrent load of appropriate intensity could expose synchronization errors in your Web service application. You can control the ratio between load intensity and concurrency by changing the virtual user think time in your load test tool.
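One way to reason about that ratio is Little's Law: concurrency is roughly the request arrival rate multiplied by the time each virtual user spends per iteration (response time plus think time). The short calculation below illustrates the relationship; the numbers are invented for the example.

// Sketch: relate load intensity (requests/sec) and concurrency via Little's Law.
// concurrency ~= arrivalRate * (responseTime + thinkTime)
public class LoadConcurrency {
    public static void main(String[] args) {
        double arrivalRate = 40.0;      // requests per second the test should sustain
        double responseTime = 0.5;      // seconds the service takes to answer
        double thinkTime = 4.5;         // seconds a virtual user pauses between calls

        double concurrentUsers = arrivalRate * (responseTime + thinkTime);
        System.out.println("Approximate concurrent users needed: " + concurrentUsers);
        // With these numbers, 40 req/sec and a 5-second iteration imply about
        // 200 concurrent users; shrinking the think time lowers the concurrency
        // required for the same intensity, which is the ratio described above.
    }
}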
Content Type
When load testing Web services, it is easy to overlook the fact that SOAP clients may periodically refresh the WSDL, which describes the service, to get updates of the service parameters they are about to invoke. The probability of such updates may vary depending on the circumstances. Some SOAP clients refresh the WSDL every time they make a call. The test team can analyze access logs or make reasonable predictions based on the nature of the service.

If the WSDL access factor (the probability of WSDL access per service invocation) is high and the WSDL size is comparable to the combined average size of request and response, then network utilization will be noticeably higher in this scenario than in one without the WSDL refresh. If your Web services' WSDLs are generated dynamically, a high WSDL access factor will affect server utilization as well. On the other hand, if your WSDLs are static, you can offload your application server by moving the WSDL files to a separate Web server optimized for serving static pages. Such a move will create increased capacity for processing Web service requests.
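As a rough, hypothetical illustration of the WSDL access factor at work, the extra bytes per invocation are approximately the access probability multiplied by the WSDL size, which can be compared with the average request-plus-response size to estimate the added network utilization. The figures below are assumptions, not measurements from this article.

// Sketch: estimate extra network traffic caused by WSDL refreshes.
public class WsdlOverhead {
    public static void main(String[] args) {
        double wsdlAccessFactor = 0.5;   // probability of a WSDL fetch per invocation
        double wsdlBytes = 40_000;       // size of the WSDL document
        double requestBytes = 8_000;     // average SOAP request size
        double responseBytes = 12_000;   // average SOAP response size

        double payloadPerCall = requestBytes + responseBytes;
        double wsdlPerCall = wsdlAccessFactor * wsdlBytes;
        double overheadPercent = 100.0 * wsdlPerCall / payloadPerCall;

        // With these sample numbers the refreshes add roughly 100% more traffic,
        // which is why a high access factor noticeably raises network utilization.
        System.out.printf("WSDL refresh overhead: %.0f%% of payload traffic%n", overheadPercent);
    }
}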


Type of Load
To ensure that your Web services application can handle the challenges it will face once it is deployed in production, test its performance with various load intensities and durations. Performance requirement specifications should include metrics for both expected average and peak loads. After you run average and peak load scenarios, conduct a stress test. A stress test should reveal the Web services application's behavior under extreme circumstances that cause the application to start running out of resources, such as database connections or disk space. Your application should not crash under this stress.

It is important to keep in mind that simply pushing the load generator throttle to the floor is not enough to thoroughly stress test a Web services application. Be explicit about which part of the system you are stressing. While some parts of the system may be running out of resources, others may be comfortably underutilized. Ask yourself: When this application is deployed in the production environment, will the resource utilization profile be the same? Can you be sure that the parts of the system which were not stressed during the test will not experience resource starvation in the production environment?

For instance, suppose performance tests on the staging environment revealed that the application bottleneck was the CPU of the database server. However, you know that you have a high-performance database server cluster in production. In this case, it is likely that the production system bottleneck will be somewhere else and the system will respond differently under stress. In such a situation, it would make sense to replace parts of your database access code with code stubs that emulate access to the database. The system bottleneck will shift to some other resource, and the test will better emulate production system behavior.
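A code stub of the kind described here can be as simple as an alternate implementation of the data-access interface that returns canned results after a fixed delay. The sketch below is hypothetical (the TicketDao interface and the delay value are assumptions), but it shows how the database can be taken out of the test path so the bottleneck shifts to the components you want to stress.

import java.util.Arrays;
import java.util.List;

// Hypothetical data-access interface used by the Web service under test.
interface TicketDao {
    List<String> findTickets(String customerId);
}

// Stub that emulates database access: it returns canned data after a fixed
// delay, so the real database server is no longer part of the test path.
class StubTicketDao implements TicketDao {
    private final long emulatedLatencyMillis;

    StubTicketDao(long emulatedLatencyMillis) {
        this.emulatedLatencyMillis = emulatedLatencyMillis;
    }

    @Override
    public List<String> findTickets(String customerId) {
        try {
            Thread.sleep(emulatedLatencyMillis);   // stand-in for query time
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return Arrays.asList("TICKET-1", "TICKET-2");
    }
}

public class StubExample {
    public static void main(String[] args) {
        // Wire the stub in place of the JDBC-backed implementation during the
        // stress test; the bottleneck then shifts away from the database tier.
        TicketDao dao = new StubTicketDao(20);
        System.out.println(dao.findTickets("customer-42"));
    }
}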
Applying this code stubbing approach to other parts of the system (as described above) will allow you to shift bottlenecks to the parts of the system that you want to put under stress and thus test your application more thoroughly. Keeping a table of system behavior under stress, as shown in Table 1, will help you approach stress testing in a more systematic manner.

TABLE 1: SCORE SHEET

Test | Resource under stress | Max. resource utilization under stress | System behavior under stress | System behavior after stress load is removed
1 | Application server CPU | 98% | Response time increased to 3 sec. on average; timeouts in 10% of requests. | Returned to normal performance (success)
2 | App. server thread pool | 100% (all threads busy) | Request timeouts followed by OutOfMemoryError(s) printed in server console. | Up to 50% errors after stress load is removed (failure)
3 | App. server network connections | 100% (running out of sockets) | Connection refused in 40% of requests. | Returned to normal performance (success)

Performance degradation, even dramatic degradation, is acceptable in this context, but the application should return to normal after the load has been reduced to the average. If the application does not crash under stress, verify that the resources utilized during the stress have been released. A comprehensive performance-testing plan will also include an endurance test that verifies the application's ability to run for hours or days, and could reveal slow resource leaks that are not noticeable during regular tests. Slow memory leaks are among the most common. If they are present in a Java environment, these leaks could lead to a java.lang.OutOfMemoryError and the crash of the application server instance.

Creating a Real-World Value Mix
To better verify the robustness of your Web service, use your load test tool to generate a wide variety of values inside SOAP requests. This mix can be achieved, for example, by using multiple value data sources (such as spreadsheets or databases), or by having values in the desired range dynamically generated (scripted) and then passed to the virtual users that simulate SOAP clients. By using this approach in load tests of sufficient duration and intensity, you can test your Web service with an extended range and mix of argument values that will augment your functional testing.
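The value mix can come from a data file or be generated on the fly. The fragment below is a minimal sketch of the scripted approach, with value ranges invented for illustration: each virtual-user iteration draws a different argument value, so the same scenario exercises a broad mix of request contents.

import java.util.Random;

// Sketch: scripted generation of argument values for SOAP requests so that
// each virtual-user iteration sends a different value mix.
public class ValueMixGenerator {
    private static final Random RANDOM = new Random(42);   // fixed seed for repeatability

    // Draw a value that covers typical, boundary, and out-of-range cases.
    static int nextQuantity() {
        int bucket = RANDOM.nextInt(10);
        if (bucket < 7) {
            return 1 + RANDOM.nextInt(100);        // typical values most of the time
        } else if (bucket < 9) {
            return RANDOM.nextBoolean() ? 0 : 100; // boundary values occasionally
        }
        return 101 + RANDOM.nextInt(1000);         // out-of-range values now and then
    }

    public static void main(String[] args) {
        // A load tool script would call nextQuantity() once per request and
        // substitute the value into the SOAP payload template.
        for (int i = 0; i < 5; i++) {
            System.out.println("Iteration " + i + " uses quantity " + nextQuantity());
        }
    }
}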
Depending on the circumstances, it may be advisable to run the mixed request load test after all known concurrency issues have been resolved. If concurrency errors start occurring after the variable request mix has been introduced, inspect the error details and create functional tests using the values that caused your Web service to fail during load testing. These newly created functional tests should become part of your functional test suite.

By implementing an automated performance testing process in your software development organization, you can reduce the number and severity of potential performance problems in your Web services application and improve its overall quality.



Best Practices

Garbage Time is Not Just For Basketball
In sports, "garbage time" is when bench players are sent in for the last few minutes of a blowout, long after the final outcome has become obvious. Java is different: garbage time is essential for success. The problem is that the very design of the language often lulls developers into a sense of false comfort, says Gwyn Fisher, chief technical officer at source code analysis tools maker Klocwork. "You can get smart developers who, all of a sudden, stop thinking like developers," he says. "All of the lessons they've spent years learning, from good programming practices in C++ all the way back to assembler, get thrown out the window because now they're working in a managed environment."

Blame much of it on GC, the Java garbage collector, Fisher says.

The problem, as Fisher sees it, is that Java does such a good job in many areas that its "gotchas" tend to get glossed over. And GC is a gotcha. He says garbage collection is a myth that in reality is "just terrible in many ways" because of this false sense of security. That means the test staff must be extra vigilant when it comes to understanding what's really going on under the hood. Because the GC looks after memory, programmers tend to assume that anything associated with memory objects being cleaned up by the GC is also going to be managed by the GC. But that simply isn't the case.

As an example, Fisher cites an object that encapsulates a socket, a physical instantiation of a network endpoint. That encapsulating object gets cleaned up by the runtime when it goes out of scope, but the underlying operating system resource, the socket itself, does not get cleaned up because the GC has no idea what it is. The result over time is a growing array of things that are no longer managed by anything you can grasp in Java, because the GC has removed the objects, but which are still held onto by the underlying OS. It's not a big deal for an app that runs for 10 minutes and then has its JVM terminated, but for a Web server app designed to run uninterrupted for months or years, it can become a huge resource and performance drain.
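Fisher's socket example can be sketched in a few lines. In the hypothetical wrapper below, dropping the last reference lets the GC reclaim the wrapper object, but the operating system socket stays open until close() is called explicitly, which is why long-running server code needs the finally block rather than faith in the collector.

import java.io.IOException;
import java.net.Socket;

// Hypothetical wrapper around a network endpoint. The GC will reclaim the
// wrapper when it becomes unreachable, but it knows nothing about the OS-level
// socket handle, so that handle leaks unless close() is called.
public class EndpointWrapper {
    private final Socket socket;

    public EndpointWrapper(String host, int port) throws IOException {
        this.socket = new Socket(host, port);
    }

    public void send(byte[] payload) throws IOException {
        socket.getOutputStream().write(payload);
    }

    public void close() throws IOException {
        socket.close();   // releases the file descriptor the GC cannot see
    }

    public static void main(String[] args) throws IOException {
        EndpointWrapper endpoint = new EndpointWrapper("localhost", 8080);
        try {
            endpoint.send("ping".getBytes());
        } finally {
            endpoint.close();   // without this, long-running apps slowly run out of sockets
        }
    }
}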
Derrek Seif, a product manager at Quest Software who focuses on performance, is in the same arena, believing that inefficient code often bungles memory allocation and release with results that are positively deadly. But there's more to it, he says. Like all of us, Seif often sees applications that undergo teardown, redesign, and rebuild as business requirements change. They might work perfectly in testing, but slow to a crawl once released into the real world. Customers get unhappy very quickly.

The problem is relying on a reactive methodology to fix these issues rather than being more proactive in upfront design and understanding performance metrics. Easier said than done, says Seif, since that requires development process reengineering. "Testing, in terms of performance, often gets squeezed to the end, due to additional pressures of other aspects of the project," he says. "There's never enough time to make a fix before release, but since it does work, it's usually 'we'll get it out and fix it later.'"

One way to mitigate the performance problem is through automation. By using a profiler's automation capabilities, it's possible to perform unit tests and establish baseline performance metrics. From then on, as changes are made, historical data from that separate build is used as a comparator. This simplifies the task of zeroing in on problem areas when performance of subsequent builds becomes degraded. "With this process change, it is allowing this customer to deliver quality applications with higher performance levels than before," Seif says.

Another common mistake Seif sees is that in test-driven development, where unit tests are run for measuring performance, the percentage of code that actually gets run is unknown. "An app may appear to run fine from a performance or functional standpoint, but if you haven't exercised every line of code you can never be completely sure."

Rich Sharples, director of product management for Red Hat's JBoss Application Platforms and Developer Tools division, certainly agrees with extensive testing, but says it can be done smartly. "Running tests and doing performance tuning are big investments. To be effective with your budget, you have to understand what level of investment is right." At one end of the spectrum, fixing a problem on a satellite is a place where you can't overinvest in quality, but a static Web site phone directory or discussion forum with an occasional crash and restart, though undesirable, is not exactly critical.

Modeling the environment is key. "You can't replicate the Web tier of a Fortune 500 e-commerce site; you don't have thousands of servers sitting around," Sharples says. "The only solution is modeling, but that always involves some risk, especially when things scale up." Scaling is non-linear; at some point the capacity of the network may become the bottleneck, but if you didn't model for this you won't know. "Making the wrong assumption will cause problems later on."

Joel Shore is a 20-year industry veteran and has authored numerous books on personal computing. He owns and operates Reference Guide, a technical product reviewing and documentation consultancy in Southboro, Mass.



Future Test

Testing's Future Is In Challenges, Opportunities And the Internet
Software stakeholders will never stop guessing what the future will look like. Just as farmers monitor the weather during the rainy season, trying to benefit from opportunities and prevent disaster, software departments will continue to monitor the health of their applications.

Without some kind of time machine, the only way to see glimpses of the future is to look at what is happening in the present.

Present behavior also shapes the future of the software industry. One such factor is change imposed by the observers, software stakeholders themselves. When people pay attention to the future, one might say they can end up creating it, driving the change or getting involved when an opportunity shows. They can improve things or prevent transformation, similar perhaps to the fictional accounts of time travel, in which a small change a month ago affects many things today.

The future of the industry is not just defined by big manufacturers. Amateurs also play a role. Notable examples include the Harvard dropout who went on to create the world's biggest software company, or the two young men whose search algorithm revolutionized the Internet. The future is generally defined by those who can find the next great idea, one that might totally change the direction of an industry. When Thomas Edison invented the light bulb, the world didn't immediately replace its gas lamps. But it eventually proved to be among the most important inventions in history. The same applies to the software industry. The next great idea might not be what everyone is looking for at the moment, but once available, it becomes as indispensable as the telephone.

Also affecting the software industry's future are the problems and challenges of the present, including those of our daily lives, which some people define as opportunities. Bill Hetzel, author of "The Complete Guide to Software Testing" (Wiley, 1993), wrote that any line of code is written to solve a problem. Therefore, according to this hypothesis, wherever you find software, there is a problem that needs to be solved. So perhaps the reverse is also true: Wherever you find a problem, there could be software written to solve it. The greater the challenge, the greater the opportunity.

Another factor driving the future of software is the Internet, and the exponential growth of networked and mobile devices to be found there. Among the major challenges is the management of an ever-larger quantity of addresses, users and devices with the finite number of IP addresses available. With this problem, an opportunity exists for some clever software developer to come along and solve it.

If I could pick just a single word to describe the future of the software industry, it would be "change." While most changes can be measured only by comparing them to the past, few could have imagined 20 years ago what might have been possible by interconnecting computers and networks throughout the world. This young industry has grown incredibly fast, and just as quickly has invaded all areas of human life. Just as the Web of today is unrecognizable from that of 20 years ago, we will scarcely recognize it 20 years from now. The challenges and opportunities associated with these changes will be available for those who are ready to benefit from them.

Murtada Elfahal is a test engineer at SAP.

