WTF is my server doing?
September 21st, 2015
Introduction to Erik Benner
Erik Benner
Enterprise Architect
ebenner@mythics.com
@erik_benner
TalesFromTheDatacenter.com
Mythics.com/blog
• Published Author
• RAC Attack Ninja
• Linux since 1992
• Solaris since 1996
• DB 12c BETA user
• Prelaunch ODA “comet”
• First Version of Oracle…7 in 1994
• ZFS since “Thumper”
• OEM 12c since Product Launch
• OAUG EM for Apps SIG co-chair
• OEM12c CAB Member
• IOUG Solaris SIG Leader
What would you say if you
were asked:
How busy is that system?
A: I have no idea…
A: 10%
A: Why do you want to know?
A: I’m sorry, you don’t understand your question….
What is system performance?
• In a broader sense, system performance
refers to how well the computer resources
accomplish the work they are designed to
do. The performance of any computer
system may be defined by two criteria:
Response time
Throughput
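To make the two criteria concrete, here is a minimal sketch that derives both from a list of request start and end times. The function name `summarize` and the sample timestamps are made up for illustration; they are not measurements from any real system.

```python
from statistics import mean


def summarize(requests):
    """Summarize (start_seconds, end_seconds) pairs for completed requests.

    Response time is how long each request took; throughput is how many
    requests completed per second over the observed window.
    """
    response_times = [end - start for start, end in requests]
    window = max(end for _, end in requests) - min(start for start, _ in requests)
    return {
        "mean_response_s": mean(response_times),
        "throughput_per_s": len(requests) / window,
    }


# Hypothetical sample: four requests observed over a four-second window.
SAMPLE = [(0.0, 0.5), (1.0, 1.5), (2.0, 3.0), (3.0, 4.0)]
summary = summarize(SAMPLE)
print(summary)
```

The two numbers can move independently: a system can complete many requests per second while each individual request is still slow, which is why both are tracked.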
Performance Tuning
• Performance tuning is a process of
observing the operations of a system and
making adjustments to different
components based on those observations.
Key factors
• Hardware
• Operating system
• Application software
• Users
• Changes over time
Managing system performance
• Monitoring usage of system resources
• Selecting tools to measure system performance
• Diagnosing problems from the results of
measurement
• Tuning the operating system and application
parameters
• Upgrading the hardware resources of the
system
• Planning for the optimal performance
Bottlenecks
• A resource is a bottleneck if the size of a
request exceeds the available resource. In
other words, a bottleneck is a limit on
system performance caused by an
inadequate hardware or software
component, or by the way the system is
organized.
Two ways to solve a bottleneck
• increasing the size of available resource
• decreasing the size of the request
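The definition and the two fixes above can be sketched in a few lines. The resource names and numbers are hypothetical, chosen only to show one resource where the request exceeds what is available.

```python
def find_bottlenecks(demand, capacity):
    """Return the names of resources whose demand exceeds capacity."""
    return sorted(name for name, need in demand.items() if need > capacity[name])


# Hypothetical figures, not taken from a real system.
capacity = {"cpu_cores": 4, "ram_gb": 16, "disk_iops": 500}
demand = {"cpu_cores": 2, "ram_gb": 12, "disk_iops": 800}

current = find_bottlenecks(demand, capacity)

# Fix 1: increase the size of the available resource.
after_upgrade = find_bottlenecks(demand, dict(capacity, disk_iops=1000))

# Fix 2: decrease the size of the request.
after_tuning = find_bottlenecks(dict(demand, disk_iops=400), capacity)

print(current, after_upgrade, after_tuning)
```

Either fix clears the disk bottleneck in this toy case. On a real system, removing one bottleneck usually exposes the next one.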
Guidelines in tuning a real system
• Watch out for sd_max_throttle limiting throughput when it is set too low
• Watch out for the RAID cache being flooded on writes, which causes a sudden,
very large increase in write service time
• The main thing to understand about sar is that it reports all
activity over a period of time. So make sure that sar is enabled all
the time, not just during your lunch break and vacations.
[root@hol sa]# sar
Linux 3.8.13-98.1.2.el6uek.x86_64 (hol) 09/22/2015 _x86_64_ (1 CPU)
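One way to confirm that sar really has been collecting all the time is to look for gaps in its daily data files. The sketch below assumes the sysstat default of one file per day named `saDD` (commonly under /var/log/sa on Enterprise Linux 6). The helper name `missing_sar_days` is hypothetical, and the demo uses a temporary directory rather than a real data directory.

```python
import tempfile
from datetime import date, timedelta
from pathlib import Path


def missing_sar_days(sa_dir, today, days=7):
    """List the dates in the last `days` days that have no saDD file in sa_dir."""
    missing = []
    for offset in range(days):
        day = today - timedelta(days=offset)
        # Assumed naming: "sa" plus the two-digit day of the month.
        if not (Path(sa_dir) / f"sa{day:%d}").exists():
            missing.append(day)
    return missing


# Demo with a throwaway directory holding files for the 21st and 22nd only.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("sa21", "sa22"):
        (Path(demo_dir) / name).touch()
    gaps = missing_sar_days(demo_dir, date(2015, 9, 22), days=3)

print(gaps)
```

Because the file names only carry the day of the month, a stale file from the previous month can hide a gap; checking modification times as well would be a reasonable extension.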
DEMO!
Recipe to fix a slow system
• Essential Background Information
– What is the business function of the system?
– Who and where are the users?
– Who says there is a problem, and what is slow?
– What changed recently and what is on the way?
• What is the system configuration?
– CPU/RAM/Disk/Net/OS/Patches, what application software is in use?
• What are the busy processes on the system doing?
– use top, prstat, pea.se or /usr/ucb/ps uax | head
• Report CPU and disk utilization levels, iostat -xPncezM -T d 30
– What is making the disks busy?
• What is the network name service configuration?
– How much network activity is there? Use netstat -i 30 or nx.se 30
• Is there enough memory?
– Check free memory and the scan rate with vmstat 30
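For the memory check in the last step, the two columns of interest can be pulled out of vmstat output with a small parser. This is a sketch under assumptions: it expects the Solaris-style column header (where `free` is free memory in kilobytes and `sr` is the page scan rate), the function name `parse_vmstat` is hypothetical, and the sample lines are invented for illustration, not captured from a real host.

```python
def parse_vmstat(header, rows):
    """Extract free memory and scan rate from vmstat text lines.

    Columns are located by name in the header line, so the position of
    "free" and "sr" does not need to be hard-coded.
    """
    columns = header.split()
    free_index = columns.index("free")
    scan_index = columns.index("sr")
    samples = []
    for row in rows:
        fields = row.split()
        samples.append(
            {"free_kb": int(fields[free_index]), "scan_rate": int(fields[scan_index])}
        )
    return samples


# Invented sample in the assumed Solaris layout.
HEADER = "r b w swap free re mf pi po fr de sr s0 s1 s2 s3 in sy cs us sy id"
ROWS = [
    "0 0 0 4194304 1048576 0 5 0 0 0 0 0 0 0 0 0 400 300 200 1 1 98",
    "2 0 0 4100000 20480 30 90 10 250 400 0 850 3 0 0 0 900 700 500 20 15 65",
]

samples = parse_vmstat(HEADER, ROWS)
scanning = [s for s in samples if s["scan_rate"] > 0]
print(samples)
print(scanning)
```

A nonzero scan rate in a single sample is not conclusive on its own; the recipe uses a 30-second interval precisely so that a sustained pattern can be seen.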
Variable Clock Rate CPUs
• Laptops and other low-power devices do this all the time
– Watch CPU usage of a video application and toggle mains/battery power….
• Server CPU Power Optimization - AMD PowerNow!™
– AMD Opteron server CPU detects overall utilization and reduces clock rate
– Actual speeds vary, but for example could reduce from 2.6GHz to 1.2GHz
– Changes are not understood or reported by operating system metrics
– Speed changes can occur every few milliseconds (thermal shock issues)
– Dual core speed varies per socket, Quad core varies per core
– Quad core can dynamically stop entire cores to save power
• Possible scenario:
– You estimate 20% utilization at 2.6GHz
– You see 45% reported in practice (at 1.2GHz)
– Load doubles, reported utilization drops to 40% (at 2.6GHz)
– Actual mapping of utilization to clock rate is unknown at this point
• Note: Older and "low power" Opterons used in blades fix clock rate
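The arithmetic behind the scenario can be sketched by rescaling each reported figure to full clock speed. This assumes that work done scales linearly with clock rate, which is a simplification: as noted above, the actual mapping of utilization to clock rate is unknown. The function name `normalized_utilization` is hypothetical.

```python
def normalized_utilization(reported_pct, current_ghz, max_ghz):
    """Express reported utilization as a percentage of full-speed capacity.

    Assumes work done scales linearly with clock rate (a simplification).
    """
    if current_ghz <= 0 or max_ghz <= 0:
        raise ValueError("clock rates must be positive")
    return reported_pct * current_ghz / max_ghz


# Figures from the scenario above.
before = normalized_utilization(45.0, current_ghz=1.2, max_ghz=2.6)
after = normalized_utilization(40.0, current_ghz=2.6, max_ghz=2.6)

print(f"before: {before:.1f}% of full-speed capacity")
print(f"after:  {after:.1f}% of full-speed capacity")
print(f"load ratio: {after / before:.2f}x")
```

Under that assumption, the 45% reported at 1.2GHz is roughly the 20% originally estimated at 2.6GHz, and the later 40% at full speed is about twice the real load even though the reported number went down.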