Running
High Performance Computing
Workloads on
Red Hat Enterprise Linux
Imed Chihi
Senior Technical Account Manager
Red Hat Global Support Services
21 January 2014
Salaam and good morning. My name is Imed Chihi and I am a Senior Technical Account Manager at Red
Hat. I am part of the Support and Engineering organisation within the company.
KAUST WEP 2014 - Imed Chihi
Agenda
Process placement
Memory placement
Interrupt placement
The purpose of today's presentation is to talk about the use of Red Hat Enterprise Linux in HPC
environments and the common tuning areas to which the HPC user and administrator need to pay
attention.
The case of HPC on commodity platforms
HPC on commodity hardware
Processor affinity
Process migrations
taskset
sched_setaffinity()
Process priority
nice
[Figure: NUMA nodes]
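The affinity and priority calls listed on this slide can be tried from a script. The following is a minimal sketch, assuming a Linux host and Python's os module, which wraps sched_setaffinity() and nice; the helper names pin_to_cpu and lower_priority are made up for this illustration and are not part of the presentation.

```python
import os


def pin_to_cpu(cpu):
    """Restrict the calling process to one CPU and return the resulting mask.

    Pid 0 means "the calling process", as with sched_setaffinity(2).
    """
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)


def lower_priority(increment=5):
    """Raise the niceness (lower the priority) and return the new value.

    An unprivileged process may only increase its niceness.
    """
    return os.nice(increment)


if __name__ == "__main__":
    allowed = sorted(os.sched_getaffinity(0))
    print("CPUs allowed before pinning:", allowed)
    print("CPUs allowed after pinning:", sorted(pin_to_cpu(allowed[0])))
    print("niceness after lowering priority:", lower_priority())
```

From the shell, taskset and nice give the same effect for a whole command without touching its code.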
Memory management is often the most intricate part of an operating system. It is very difficult to implement a virtual memory
manager which works properly both on a single CPU with tens of megabytes of RAM and on tens of CPUs with terabytes of RAM.
The traditional PC architecture uses linear memory hardware which can be accessed from all CPUs at the same cost: accessing
a given memory location takes the same time regardless of which CPU the access comes from.
However, this architecture model does not scale to match the requirements of modern platforms, which tend to have tens of
CPUs and hundreds of gigabytes of memory. Therefore, modern servers are built around a Non Uniform Memory Access model
where the system is made up of multiple groupings of memory modules and CPUs: those groupings are called "NUMA nodes".
On those models, access to a given memory location takes much less time from a CPU in the same node than from a CPU in
another node. The NUMA architecture can be viewed with numactl as in:
# numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5 24 25 26 27 28 29
node 0 size: 65521 MB
node 0 free: 62106 MB
node 1 cpus: 6 7 8 9 10 11 30 31 32 33 34 35
node 1 size: 65536 MB
node 1 free: 62977 MB
node 2 cpus: 12 13 14 15 16 17 36 37 38 39 40 41
node 2 size: 65536 MB
node 2 free: 63453 MB
node 3 cpus: 18 19 20 21 22 23 42 43 44 45 46 47
node 3 size: 65536 MB
node 3 free: 63028 MB
node distances:
node   0   1   2   3
  0:  10  21  21  21
  1:  21  10  21  21
  2:  21  21  10  21
  3:  21  21  21  10
Just like process migration threads exist to move a process to a different CPU, recent kernels implement NUMA page migration in
order to "move" memory allocated to a process to a NUMA node "closer" to where the process is running.
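The node distances printed by numactl can be read as relative access costs, where a node's distance to itself (10 by convention) is the local cost. The sketch below works on a hard-coded copy of the distance table from the output above rather than calling numactl, and the function names are invented for this illustration.

```python
# Distance table copied from the numactl output shown in these notes.
SAMPLE = """\
node distances:
node   0   1   2   3
  0:  10  21  21  21
  1:  21  10  21  21
  2:  21  21  10  21
  3:  21  21  21  10
"""


def parse_distances(text):
    """Return {source_node: [distance to node 0, node 1, ...]}."""
    table = {}
    for line in text.splitlines():
        head, sep, rest = line.partition(":")
        # Only rows of the form "<node>: d0 d1 ..." carry distances.
        if sep and head.strip().isdigit():
            table[int(head)] = [int(value) for value in rest.split()]
    return table


def remote_to_local_ratio(table, src, dst):
    """Cost of reaching dst's memory from src, relative to src's own memory."""
    return table[src][dst] / table[src][src]


if __name__ == "__main__":
    distances = parse_distances(SAMPLE)
    print("node 0 -> node 1 relative cost:", remote_to_local_ratio(distances, 0, 1))
```

On this machine every remote node is reported at the same distance, so placement only needs to separate "local" from "remote".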
Memory management
Huge pages
Overcommit
IRQ-to-CPU affinity
irqbalance
[ksoftirqd/X] kernel threads
Multi-queue networking
Zero-copy IO
Offloading engines
Interrupts are asynchronous events which need to be processed by CPUs. Those are asynchronous
because they are not initiated by the user and their timing cannot be controlled. Interrupts are the main
method of communicating with external devices, namely networking and storage.
With high speed network interfaces at 10GbE and 40GbE, or fibre channel links running at multiple
gigabits per second per port, the number of interrupts could require a huge amount of processing power
from CPUs. Therefore, the assignment of interrupts to CPUs could be tuned for optimal processing. The
irqbalance service could be used to distribute the interrupt load among processors. However, this may
not be the optimal choice for certain workloads.
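When irqbalance is not the right fit, an interrupt can be assigned to chosen CPUs by writing a hexadecimal CPU bitmask to /proc/irq/<number>/smp_affinity. The sketch below only computes that mask and writes nothing; the function name is made up for this illustration, and the IRQ number in the usage note is hypothetical.

```python
def cpus_to_irq_mask(cpus):
    """Build the hexadecimal bitmask format used by /proc/irq/<n>/smp_affinity.

    Bit i is set when CPU i may service the interrupt. Masks wider than
    32 bits are written as comma-separated groups of 8 hex digits.
    """
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    digits = f"{mask:x}"
    groups = []
    while digits:
        groups.insert(0, digits[-8:])
        digits = digits[:-8]
    return ",".join(groups)


if __name__ == "__main__":
    # CPUs 0-2 give the mask 7; CPU 33 needs a second 32-bit group.
    print(cpus_to_irq_mask([0, 1, 2]))
    print(cpus_to_irq_mask([33]))
```

As root, the resulting string would be written to the affinity file of the interrupt in question, for example /proc/irq/42/smp_affinity for a hypothetical IRQ 42, with irqbalance stopped so that it does not overwrite the setting.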
Interrupt handling is actually done in two phases. In a first, synchronous phase the CPU receives the
interrupt, acknowledges it, then schedules the remainder of the processing to be completed later on.
This needs to be done the moment the interrupt is raised and without delay, otherwise packet loss could
occur. A second phase is processed asynchronously by kernel threads called [ksoftirqd/X]. Those threads
are scheduled just like any other process as they are not under time constraints.
Modern network devices and device drivers are capable of delivering incoming packets to multiple receive
queues. Because a receive queue can only be accessed under a CPU lock, having several queues allows
multiple processors to pick up and process packets in parallel.
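To make the multi-queue idea concrete, here is a toy sketch of how packets might be spread over receive queues so that every packet of a given connection lands on the same queue. It is an assumption-laden stand-in: real network cards do this in hardware, typically with a Toeplitz hash, whereas this example uses CRC32, and the function name and addresses are invented.

```python
import zlib


def select_rx_queue(src_ip, src_port, dst_ip, dst_port, n_queues):
    """Map a connection's address/port tuple to a receive queue index.

    CRC32 is a simplified stand-in for the hash a real NIC would compute.
    Hashing the whole tuple keeps one connection on one queue while
    spreading different connections across queues.
    """
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % n_queues


if __name__ == "__main__":
    for port in range(40000, 40004):
        queue = select_rx_queue("10.0.0.1", port, "10.0.0.2", 80, n_queues=4)
        print(f"source port {port} -> queue {queue}")
```

Keeping a connection on a single queue avoids reordering its packets, while the per-queue lock is only ever contended by the CPU serving that queue.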
Another, more common optimisation is offloading engines, which process network traffic in hardware.
Actions like packet re-assembly or checksum calculation, which are usually done by the CPU, are
offloaded to the network interface.
The Unix programming model expects that transmission and reception of data from network or storage
interfaces is done with two copies: the kernel copies the data from user space to a buffer in kernel space,
then from this kernel buffer to the transmission device. This double copy has plagued the performance of
highly demanding applications on Linux/Unix, especially as CPU speed and network speed have grown much
faster than memory speed, which has mostly stagnated over recent decades. Zero copy is a mechanism
which permits transmission from user buffers directly to the hardware, which improves performance.
However, there are still no standard and common interfaces to do this and it still requires some hacking
to be implemented.
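One narrow case where a copy through user space can already be avoided on Linux is sending a file over a socket with sendfile(2). This is not the user-buffer-to-hardware path described above, only a related technique, and the sketch below assumes a Linux host and uses Python's os.sendfile wrapper; the function name and the demo payload are made up.

```python
import os
import socket
import tempfile


def send_file(sock, path):
    """Send a whole file over a socket with sendfile(2); return bytes sent.

    The kernel moves the data from the page cache to the socket, so the
    payload is never copied into a buffer of this process.
    """
    sent = 0
    with open(path, "rb") as source:
        size = os.fstat(source.fileno()).st_size
        while sent < size:
            count = os.sendfile(sock.fileno(), source.fileno(), sent, size - sent)
            if count == 0:
                break
            sent += count
    return sent


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 4096)
    sender, receiver = socket.socketpair()
    try:
        print("bytes sent:", send_file(sender, tmp.name))
        sender.close()
        received = 0
        while True:
            chunk = receiver.recv(65536)
            if not chunk:
                break
            received += len(chunk)
        print("bytes received:", received)
    finally:
        receiver.close()
        os.unlink(tmp.name)
```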
Imed Chihi
http://people.redhat.com/ichihi/p/
ichihi@redhat.com
Red Hat Global Support Services
1. http://www.redhat.com/training/courses/rh442/
2. Red Hat Enterprise Linux 4 - Performance Tuning Guide
3. Red Hat Summit 2013: Performance Analysis and Tuning of Red Hat Enterprise Linux