The ARM Instruction Set Architecture
Mark McDermott, with help from our good friends at ARM
Fall 2008
8/22/2008

Main features of the ARM Instruction Set

All instructions are 32 bits long.
Most instructions execute in a single cycle.
Most instructions can be conditionally executed.
A load/store architecture:
  Data processing instructions act only on registers.
    Three-operand format.
    Combined ALU and shifter for high-speed bit manipulation.
  Specific memory access instructions with powerful auto-indexing addressing modes.
    32-bit and 8-bit data types,
    and also 16-bit data types on ARM Architecture v4.
    Flexible multiple-register load and store instructions.
Instruction set extension via coprocessors.
Very dense 16-bit compressed instruction set (Thumb).

Coprocessors

Up to 16 coprocessors can be defined.
Expands the ARM instruction set.
Each coprocessor can have up to 16 private registers of any reasonable size.
Load-store architecture.

Thumb

Thumb is a 16-bit instruction set.
  Optimized for code density from C code.
  Improved performance from narrow memory.
  Subset of the functionality of the ARM instruction set.
Core has two execution states: ARM and Thumb.
  Switch between them using the BX instruction.
Thumb has characteristic features:
  Most Thumb instructions are executed unconditionally.
  Many Thumb data processing instructions use a 2-address format.
  Thumb instruction formats are less regular than ARM instruction formats, as a result of the dense encoding.

Processor Modes

The ARM has six operating modes:
  User (unprivileged mode under which most tasks run)
  FIQ (entered when a high priority (fast) interrupt is raised)
  IRQ (entered when a low priority (normal) interrupt is raised)
  Supervisor (entered on reset and when a Software Interrupt instruction is executed)
  Abort (used to handle memory access violations)
  Undef (used to handle undefined instructions)
ARM Architecture Version 4 adds a seventh mode:
  System (privileged mode using the same registers as user mode)

The Registers

ARM has 37 registers in total, all of which are 32 bits long.
  1 dedicated program counter
  1 dedicated current program status register
  5 dedicated saved program status registers
  30 general purpose registers
And privileged modes (except System) can also access
  a particular spsr (saved program status register)

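The ARM Register Set

Figure: the current visible registers for each mode (User, FIQ, IRQ, SVC, Undef, Abort). User mode sees r0-r12, r13 (sp), r14 (lr), r15 (pc) and cpsr. Each exception mode replaces some of these with its own banked copies and adds an spsr.

Register Organization Summary

Mode    Banked (private) registers          Shared with User mode
User    (none)                              r0-r12, r13 (sp), r14 (lr), r15 (pc), cpsr
FIQ     r8-r12, r13 (sp), r14 (lr), spsr    r0-r7, r15 (pc), cpsr
IRQ     r13 (sp), r14 (lr), spsr            r0-r12, r15 (pc), cpsr
SVC     r13 (sp), r14 (lr), spsr            r0-r12, r15 (pc), cpsr
Undef   r13 (sp), r14 (lr), spsr            r0-r12, r15 (pc), cpsr
Abort   r13 (sp), r14 (lr), spsr            r0-r12, r15 (pc), cpsr
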
Accessing Registers using ARM Instructions

No breakdown of currently accessible registers.
  All instructions can access r0-r14 directly.
  Most instructions also allow use of the PC.

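The Program Status Registers (CPSR and SPSRs)

Bits    Field
31-28   N Z C V   condition code flags
7       I         IRQ disable
6       F         FIQ disable
5       T         Thumb state
4-0     Mode      processor mode

N, Z, C, V: copies of the ALU status flags (latched if the instruction has the "S" bit set).
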
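The field positions above can be checked with a small C model. This is only an illustrative sketch: the names `psr_fields` and `psr_decode` are hypothetical and are not part of any ARM toolchain.

```c
#include <stdint.h>

/* Hypothetical helper type: one member per PSR field described above. */
typedef struct {
    unsigned n, z, c, v;     /* condition flags, bits 31..28 */
    unsigned irq_disabled;   /* I, bit 7 */
    unsigned fiq_disabled;   /* F, bit 6 */
    unsigned thumb;          /* T, bit 5 */
    unsigned mode;           /* mode field, bits 4..0 */
} psr_fields;

/* Split a 32-bit CPSR/SPSR value into its fields. */
psr_fields psr_decode(uint32_t psr)
{
    psr_fields f;
    f.n = (psr >> 31) & 1u;
    f.z = (psr >> 30) & 1u;
    f.c = (psr >> 29) & 1u;
    f.v = (psr >> 28) & 1u;
    f.irq_disabled = (psr >> 7) & 1u;
    f.fiq_disabled = (psr >> 6) & 1u;
    f.thumb = (psr >> 5) & 1u;
    f.mode = psr & 0x1Fu;
    return f;
}
```

For example, `psr_decode(0x600000D3u)` reports Z and C set, both interrupt disable bits set, ARM state, and mode field 0x13.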
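Condition Flags

Flag             Logical Instruction                                 Arithmetic Instruction
Negative (N=1)   No meaning                                          Bit 31 of the result has been set; indicates a negative number in signed operations
Zero (Z=1)       Result is all zeroes                                Result of operation was zero
Carry (C=1)      After shift operation, '1' was left in carry flag   Result was greater than 32 bits
oVerflow (V=1)   No meaning                                          Result was greater than 31 bits; indicates a possible corruption of the sign bit in signed numbers
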
The Program Counter (R15)

When the processor is executing in ARM state:
  All instructions are 32 bits in length.
  All instructions must be word aligned.
  Therefore the PC value is stored in bits [31:2] with bits [1:0] equal to zero (as an instruction cannot be halfword or byte aligned).
or
    MOV pc,lr

Exception Handling and the Vector Table

When an exception occurs, the core:
  Copies CPSR into SPSR_<mode>
  Sets appropriate CPSR bits:
    If the core implements ARM Architecture 4T and is currently in Thumb state, then ARM state is entered.
    Mode field bits
    Interrupt disable flags if appropriate
  Stores the return address in LR_<mode>
  Sets PC to the vector address
To return, the exception handler needs to:
  Restore CPSR from SPSR_<mode>
  Restore PC from LR_<mode>

The Original Instruction Pipeline

The ARM uses a pipeline in order to increase the speed of the flow of instructions to the processor.
  Allows several operations to be undertaken simultaneously, rather than serially.

Stage     Instruction address   Operation
FETCH     PC                    Instruction fetched from memory
DECODE    PC - 4
EXECUTE   PC - 8                Register(s) read from register bank; shift and ALU operation; write register(s) back to register bank

Rather than pointing to the instruction being executed, the PC points to the instruction being fetched.

Pipeline changes for ARM9TDMI

ARM7TDMI
  FETCH:    Instruction fetch
  DECODE:   Thumb to ARM decompress, ARM decode, register select
  EXECUTE:  Register read, shift, ALU, register write
ARM9TDMI
  FETCH:    Instruction fetch
  DECODE:   ARM or Thumb instruction decode, register decode, register read
  EXECUTE:  Shift + ALU
  MEMORY:   Memory access
  WRITE:    Register write

Pipeline changes for ARM10 vs. ARM11 Pipelines

ARM10
  FETCH:    Branch prediction, instruction fetch
  ISSUE:    ARM or Thumb instruction decode
  DECODE:   Register read
  EXECUTE:  Shift + ALU, multiply
  MEMORY:   Memory access, multiply add
  WRITE:    Register write
ARM11
  Fetch 1, Fetch 2, Decode, Issue, then Shift / ALU / Saturate (ALU path) or MAC 1 / MAC 2 (multiply path), Write back

ARM Instruction Set Format

Instruction type                  31-28   27-20             19-16   15-12   11-8    7-4       3-0
Data processing                   Cond    0 0 I Opcode S    Rn      Rd      Operand2 (bits 11-0)
Multiply                          Cond    0 0 0 0 0 0 A S   Rd      Rn      Rs      1 0 0 1   Rm
Long multiply                     Cond    0 0 0 0 1 U A S   RdHigh  RdLow   Rs      1 0 0 1   Rm
Swap                              Cond    0 0 0 1 0 B 0 0   Rn      Rd      0 0 0 0 1 0 0 1   Rm
Load/store byte/word              Cond    0 1 I P U B W L   Rn      Rd      Offset (bits 11-0)
Load/store multiple               Cond    1 0 0 P U S W L   Rn      Register list (bits 15-0)
Halfword transfer, immediate      Cond    0 0 0 P U 1 W L   Rn      Rd      Offset1 1 S H 1   Offset2
Halfword transfer, register       Cond    0 0 0 P U 0 W L   Rn      Rd      0 0 0 0 1 S H 1   Rm
Branch                            Cond    1 0 1 L (bits 27-24), Offset (bits 23-0)
Branch exchange                   Cond    0 0 0 1 0 0 1 0   1 1 1 1 1 1 1 1 1 1 1 1   0 0 0 1   Rn
Coprocessor data transfer         Cond    1 1 0 P U N W L   Rn      CRd     CPNum   Offset (bits 7-0)
Coprocessor data operation        Cond    1 1 1 0 Op1       CRn     CRd     CPNum   Op2 0     CRm
Coprocessor register transfer     Cond    1 1 1 0 Op1 L     CRn     Rd      CPNum   Op2 1     CRm
Software interrupt                Cond    1 1 1 1 (bits 27-24), SWI number (bits 23-0)

Conditional Execution

Most instruction sets only allow branches to be executed conditionally.
However, by reusing the condition evaluation hardware, ARM effectively increases the number of instructions.
  All instructions contain a condition field which determines whether the CPU will execute them.
  Non-executed instructions consume 1 cycle.
    Can't collapse the instruction like a NOP. Still have to complete the cycle so as to allow fetching and decoding of the following instructions.
This removes the need for many branches, which stall the pipeline (3 cycles to refill).
  Allows very dense in-line code, without branches.
  The time penalty of not executing several conditional instructions is frequently less than the overhead of the branch or subroutine call that would otherwise be needed.

The Condition Field

The condition field occupies bits 31-28 of every instruction.

0000 = EQ      - Z set (equal)
0001 = NE      - Z clear (not equal)
0010 = HS / CS - C set (unsigned higher or same)
0011 = LO / CC - C clear (unsigned lower)
0100 = MI      - N set (negative)
0101 = PL      - N clear (positive or zero)
0110 = VS      - V set (overflow)
0111 = VC      - V clear (no overflow)
1000 = HI      - C set and Z clear (unsigned higher)
1001 = LS      - C clear or Z set (unsigned lower or same)
1010 = GE      - N set and V set, or N clear and V clear (>=)
1011 = LT      - N set and V clear, or N clear and V set (<)
1100 = GT      - Z clear, and either N set and V set, or N clear and V clear (>)
1101 = LE      - Z set, or N set and V clear, or N clear and V set (<=)
1110 = AL      - always
1111 = NV      - reserved

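The condition table can be written as a small predicate over the four flags. The sketch below is a software model only, and the function name `cond_passes` is an assumption made for illustration. The reserved NV code is modeled here as "never executes".

```c
#include <stdbool.h>

/* Returns true if an instruction with 4-bit condition field `cond`
   would execute, given the current N, Z, C and V flags. */
bool cond_passes(unsigned cond, bool n, bool z, bool c, bool v)
{
    switch (cond & 0xFu) {
    case 0x0: return z;               /* EQ */
    case 0x1: return !z;              /* NE */
    case 0x2: return c;               /* HS / CS */
    case 0x3: return !c;              /* LO / CC */
    case 0x4: return n;               /* MI */
    case 0x5: return !n;              /* PL */
    case 0x6: return v;               /* VS */
    case 0x7: return !v;              /* VC */
    case 0x8: return c && !z;         /* HI */
    case 0x9: return !c || z;         /* LS */
    case 0xA: return n == v;          /* GE */
    case 0xB: return n != v;          /* LT */
    case 0xC: return !z && (n == v);  /* GT */
    case 0xD: return z || (n != v);   /* LE */
    case 0xE: return true;            /* AL */
    default:  return false;           /* NV: reserved, modeled as never */
    }
}
```

Note that GE and LT reduce to comparing N with V, which is why the signed conditions remain correct even when the subtraction overflows.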
Using and updating the Condition Field

To execute an instruction conditionally, simply postfix it with the appropriate condition:
  For example an add instruction takes the form:
    ADD r0,r1,r2      ; r0 = r1 + r2 (ADDAL)
  To execute this only if the zero flag is set:
    ADDEQ r0,r1,r2    ; If zero flag set then...
                      ; ... r0 = r1 + r2

Conditional Execution and Flags

ARM instructions can be made to execute conditionally by postfixing them with the appropriate condition code field.
  This improves code density and performance by reducing the number of forward branch instructions.

    With a branch:              With conditional execution:
        CMP r3,#0                   CMP   r3,#0
        BEQ skip                    ADDNE r0,r1,r2
        ADD r0,r1,r2
    skip

By default, data processing instructions do not affect the condition code flags, but the flags can be optionally set by using "S". CMP does not need "S".

    loop
        ...
        SUBS r1,r1,#1     ; decrement r1 and set flags
        BNE  loop         ; if Z flag clear then branch

Branch instructions (1)

Branch:            B{<cond>} label
Branch with Link:  BL{<cond>} sub_routine_label

Bits    Field
31-28   Condition field
27-25   1 0 1
24      Link bit (0 = Branch, 1 = Branch with link)
23-0    Offset

The offset for branch instructions is calculated by the assembler:
  By taking the difference between the branch instruction and the target address minus 8 (to allow for the pipeline).
  This gives a 26-bit offset which is right shifted 2 bits (as the bottom two bits are always zero because instructions are word aligned) and stored into the instruction encoding.
  This gives a range of +/- 32 Mbytes.

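The offset arithmetic can be sketched in C as follows. The function names are hypothetical, and the code models only the address calculation (not the condition field or the link bit).

```c
#include <stdint.h>

/* 24-bit offset field the assembler would store for a branch located at
   `branch_addr` that targets `target`. Both addresses are word aligned. */
uint32_t branch_encode_offset(uint32_t branch_addr, uint32_t target)
{
    uint32_t diff = target - (branch_addr + 8u);  /* PC reads as branch + 8 */
    return (diff >> 2) & 0x00FFFFFFu;             /* drop the two zero bits */
}

/* Target address the processor computes from the stored 24-bit field. */
uint32_t branch_target(uint32_t branch_addr, uint32_t offset24)
{
    uint32_t off = (offset24 & 0x00FFFFFFu) << 2; /* back to a 26-bit byte offset */
    if (off & 0x02000000u)                        /* bit 25 is the sign bit */
        off |= 0xFC000000u;                       /* sign extend to 32 bits */
    return branch_addr + 8u + off;
}
```

A branch to itself therefore stores 0xFFFFFE (an offset of -8 bytes, shifted right by two), and decoding that field at the same address returns the original address.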
Branch instructions (2)

When executing the instruction, the processor:
  shifts the offset left two bits, sign extends it to 32 bits, and adds it to PC.
To return from a subroutine, simply restore the PC from the LR:
    MOV pc,lr
  Again, the pipeline has to refill before execution continues.

Branch instructions (3)

The "Branch" instruction does not affect LR.
Note: Architecture 4T offers a further ARM branch instruction, BX.
  See Thumb Instruction Set Module for details.
BL <subroutine>
  Stores return address in LR.
  Returning implemented by restoring the PC from LR.
  For non-leaf functions, LR will have to be stacked.

    caller                func1                         func2
        :                     STMFD sp!,{regs,lr}           :
        :                     :                             :
        BL func1              BL func2                      :
        :                     :                             :
        :                     LDMFD sp!,{regs,pc}           MOV pc,lr

Conditional Branches

Branch   Interpretation     Normal uses
B        Unconditional      Always take this branch
BAL      Always             Always take this branch
BEQ      Equal              Comparison equal or zero result
BNE      Not equal          Comparison not equal or non-zero result
BPL      Plus               Result positive or zero
BMI      Minus              Result minus or negative
BCC      Carry clear        Arithmetic operation did not give carry-out
BLO      Lower              Unsigned comparison gave lower
BCS      Carry set          Arithmetic operation gave carry-out
BHS      Higher or same     Unsigned comparison gave higher or same
BVC      Overflow clear     Signed integer operation; no overflow occurred
BVS      Overflow set       Signed integer operation; overflow occurred
BGT      Greater than       Signed integer comparison gave greater than
BGE      Greater or equal   Signed integer comparison gave greater or equal
BLT      Less than          Signed integer comparison gave less than
BLE      Less or equal      Signed integer comparison gave less than or equal
BHI      Higher             Unsigned comparison gave higher
BLS      Lower or same      Unsigned comparison gave lower or same

Data processing Instructions

Largest family of ARM instructions, all sharing the same instruction format.
Contains:
  Arithmetic operations
  Comparisons (no results, just set condition codes)
  Logical operations
  Data movement between registers
Remember, this is a load/store architecture:
  These instructions only work on registers, NOT memory.
They each perform a specific operation on one or two operands.
  First operand is always a register: Rn
  Second operand is sent to the ALU via the barrel shifter.
We will examine the barrel shifter shortly.

Arithmetic Operations

Operations are:
    ADD   operand1 + operand2               ; Add
    ADC   operand1 + operand2 + carry       ; Add with carry
    SUB   operand1 - operand2               ; Subtract
    SBC   operand1 - operand2 + carry - 1   ; Subtract with carry
    RSB   operand2 - operand1               ; Reverse subtract
    RSC   operand2 - operand1 + carry - 1   ; Reverse subtract with carry
Syntax:
    <Operation>{<cond>}{S} Rd, Rn, Operand2
Examples:
    ADD    r0, r1, r2
    SUBGT  r3, r3, #1
    RSBLES r4, r5, #5

Comparisons

The only effect of the comparisons is to update the condition flags. Thus there is no need to set the S bit.
Operations are:
    CMP   operand1 - operand2     ; Compare
    CMN   operand1 + operand2     ; Compare negative
    TST   operand1 AND operand2   ; Test
    TEQ   operand1 EOR operand2   ; Test equivalence
Syntax:
    <Operation>{<cond>} Rn, Operand2
Examples:
    CMP   r0, r1
    TSTEQ r2, #5

Logical Operations

Operations are:
    AND   operand1 AND operand2
    EOR   operand1 EOR operand2
    ORR   operand1 OR operand2
    ORN   operand1 OR NOT operand2 (a Thumb-2 instruction; not part of the classic ARM instruction set)
    BIC   operand1 AND NOT operand2 [i.e. bit clear]
Syntax:
    <Operation>{<cond>}{S} Rd, Rn, Operand2
Examples:
    AND   r0, r1, r2
    BICEQ r2, r3, #7
    EORS  r1, r3, r0

Data Movement

Operations are:
    MOV   operand2
    MVN   NOT operand2
Note that these make no use of operand1.
Syntax:
    <Operation>{<cond>}{S} Rd, Operand2
Examples:
    MOV   r0, r1
    MOVS  r2, #10
    MVNEQ r1, #0

The Barrel Shifter

The ARM doesn't have actual shift instructions.
Instead it has a barrel shifter which provides a mechanism to carry out shifts as part of other instructions.
So what operations does the barrel shifter support?

Barrel Shifter: Left Shift

Logical Shift Left (LSL)
  Shifts left by the specified amount (multiplies by powers of two), e.g.
    LSL #5 => multiply by 32
  Figure: CF <- Destination <- 0

Barrel Shifter: Right Shifts

Logical Shift Right (LSR)
  Shifts right by the specified amount (divides by powers of two), e.g.
    LSR #5 = divide by 32
  Figure: 0 -> Destination -> CF (zero shifted in)
Arithmetic Shift Right (ASR)
  Shifts right (divides by powers of two) and preserves the sign bit, for 2's complement operations, e.g.
    ASR #5 = divide by 32
  Figure: sign bit -> Destination -> CF (sign bit shifted in)

Barrel Shifter: Rotations

Rotate Right (ROR)
  Similar to an ASR but the bits wrap around as they leave the LSB and appear as the MSB, e.g.
    ROR #5
  Note the last bit rotated is also used as the Carry Out.
  Figure: Destination rotated right, with a copy of the wrapped bit in CF
Rotate Right Extended (RRX)
  This operation uses the CPSR C flag as a 33rd bit.
  Rotates right by 1 bit. Encoded as ROR #0.
  Figure: rotate right through carry (CF -> Destination -> CF)

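The five shifter operations can be modeled in C as below. This is a sketch under stated assumptions: the names are hypothetical, the shift amount `n` is assumed to be in the range 1 to 31 (the special cases for 0 and for 32 or more are not modeled), and `carry` is the bit that would be offered as the shifter carry-out.

```c
#include <stdint.h>

/* Hypothetical result type: shifted value plus the shifter carry-out. */
typedef struct { uint32_t value; unsigned carry; } shift_result;

/* All functions taking n assume 1 <= n <= 31. */
shift_result shift_lsl(uint32_t x, unsigned n)
{
    shift_result r = { x << n, (unsigned)((x >> (32u - n)) & 1u) };
    return r;
}

shift_result shift_lsr(uint32_t x, unsigned n)
{
    shift_result r = { x >> n, (unsigned)((x >> (n - 1u)) & 1u) };
    return r;
}

shift_result shift_asr(uint32_t x, unsigned n)
{
    uint32_t fill = (x & 0x80000000u) ? ~(0xFFFFFFFFu >> n) : 0u; /* copies of the sign bit */
    shift_result r = { (x >> n) | fill, (unsigned)((x >> (n - 1u)) & 1u) };
    return r;
}

shift_result shift_ror(uint32_t x, unsigned n)
{
    uint32_t v = (x >> n) | (x << (32u - n));
    shift_result r = { v, (unsigned)(v >> 31) };  /* last bit rotated is the new MSB */
    return r;
}

/* RRX: rotate right by one through the carry flag (a 33-bit rotate). */
shift_result shift_rrx(uint32_t x, unsigned carry_in)
{
    shift_result r = { ((uint32_t)(carry_in & 1u) << 31) | (x >> 1), (unsigned)(x & 1u) };
    return r;
}
```

With these definitions `shift_lsl(1, 5).value` is 32, matching "LSL #5 => multiply by 32", and an ASR of a negative value keeps the sign bits set.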
Using the Barrel Shifter: The Second Operand

Figure: Operand 1 goes directly to the ALU; Operand 2 passes through the barrel shifter before reaching the ALU, which produces the Result.

Register, optionally with shift operation applied.
Shift value can be either:
  a 5-bit unsigned integer
  specified in the bottom byte of another register.

Second Operand: Shifted Register

The amount by which the register is to be shifted is contained in either:
  the immediate 5-bit field in the instruction
    NO OVERHEAD
    Shift is done for free: executes in a single cycle.
  the bottom byte of a register (not PC)
    Then takes an extra cycle to execute.
    ARM doesn't have enough read ports to read 3 registers at once.
    Then same as on other processors where shift is a separate instruction.
If no shift is specified then a default shift is applied: LSL #0
  i.e. the barrel shifter has no effect on the value in the register.

Second Operand: Using a Shifted Register

Using a multiplication instruction to multiply by a constant means first loading the constant into a register and then waiting a number of internal cycles for the instruction to complete.
A better solution can often be found by using some combination of MOVs, ADDs, SUBs and RSBs with shifts.
  Multiplications by a constant equal to a ((power of 2) +/- 1) can be done in one cycle.

    MOV R2, R0, LSL #2        ; Shift R0 left by 2, write to R2, (R2 = R0 x 4)
    ADD R9, R5, R5, LSL #3    ; R9 = R5 + R5 x 8 or R9 = R5 x 9
    RSB R9, R5, R5, LSL #3    ; R9 = R5 x 8 - R5 or R9 = R5 x 7
    SUB R10, R9, R8, LSR #4   ; R10 = R9 - R8 / 16
    MOV R12, R4, ROR R3       ; R12 = R4 rotated right by value of R3

Second Operand: Immediate Value (1)

There is no single instruction which will load a 32-bit immediate constant into a register without performing a data load from memory.
  All ARM instructions are 32 bits long.
  ARM instructions do not use the instruction stream as data.
The data processing instruction format has 12 bits available for operand2.
  If used directly this would only give a range of 4096.

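Second Operand: Immediate Value (2)

Instead, the 12 bits hold an 8-bit constant (range 0 - 255) and a 4-bit rotate field: the 8-bit value is rotated right through an even number of positions (ROR by 0, 2, 4, .., 30).
This gives us:

    Values                          Hex range         Step   Encoding
    0 - 255                         0x0 - 0xff        1
    256, 260, 264, .., 1020         0x100 - 0x3fc     4      0x40 - 0xff ror 30
    1024, 1040, 1056, .., 4080      0x400 - 0xff0     16     0x40 - 0xff ror 28
    4096, 4160, 4224, .., 16320     0x1000 - 0x3fc0   64     0x40 - 0xff ror 26

These can be loaded using, for example:
    MOV r0, #0x40, 26       ; => MOV r0, #0x1000 (i.e. 4096)
To make this easier, the assembler will convert to this form for us if simply given the required constant:
    MOV r0, #4096           ; => MOV r0, #0x1000 (i.e. 0x40 ror 26)
    MOV r0, #0xFFFFFFFF     ; assembles to MVN r0, #0
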
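Whether a constant fits this scheme can be tested by trying each even rotation. The following is a minimal sketch, not the algorithm of any particular assembler; `arm_encode_immediate` and `rotl32` are hypothetical names. It returns the 12-bit operand2 field (rotate field in bits 11-8, 8-bit constant in bits 7-0), or -1 if no encoding exists.

```c
#include <stdint.h>

/* Rotate left by n bits (0 <= n <= 31). */
uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x << n) | (x >> (32u - n)) : x;
}

/* Try to express `value` as an 8-bit constant rotated right by an even
   amount. Returns (rotate/2) << 8 | imm8 on success, or -1 on failure.
   The first match (smallest rotation) is returned. */
int arm_encode_immediate(uint32_t value)
{
    for (unsigned rot = 0; rot < 32u; rot += 2u) {
        uint32_t imm = rotl32(value, rot);   /* undo a rotate right by rot */
        if (imm <= 0xFFu)
            return (int)(((rot / 2u) << 8) | imm);
    }
    return -1;
}
```

A constant can have more than one valid encoding: 4096 is found here as 0x01 ror 20, which denotes the same value as the 0x40 ror 26 form shown above. 0xFFFFFFFF has no encoding, which is why the assembler falls back to MVN r0, #0.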
Loading full 32-bit constants

Although the MOV/MVN mechanism will load a large range of constants into a register, sometimes this mechanism will not generate the required constant.
Therefore, the assembler also provides a method which will load ANY 32-bit constant:
    LDR rd, =numeric constant
As this mechanism will always generate the best instruction for a given case, it is the recommended way of loading constants.

Multiplication Instructions

The basic ARM provides two multiplication instructions.
Multiply
    MUL{<cond>}{S} Rd, Rm, Rs        ; Rd = Rm * Rs
Multiply Accumulate (does addition for free)
    MLA{<cond>}{S} Rd, Rm, Rs, Rn    ; Rd = (Rm * Rs) + Rn
Restrictions on use:
  Rd and Rm cannot be the same register.
    Can be avoided by swapping Rm and Rs around. This works because multiplication is commutative.
  Cannot use PC.
  These will be picked up by the assembler if overlooked.
Operands can be considered signed or unsigned.
  Up to the user to interpret correctly.

Multiplication Implementation

The ARM makes use of Booth's Algorithm to perform integer multiplication.
On non-M ARMs this operates on 2 bits of Rs at a time.
  For each pair of bits this takes 1 cycle (plus 1 cycle to start with).
  However, when there are no more 1's left in Rs, the multiplication will early-terminate.
Example: Multiply 18 and -1: Rd = Rm * Rs

    Rm = 18, Rs = -1:   Rs = 1111 1111 1111 1111 1111 1111 1111 1111   17 cycles
    Rm = -1, Rs = 18:   Rs = 0000 0000 0000 0000 0000 0000 0001 0010   4 cycles

Note: Compiler does not use early termination criteria to decide on which order to place operands.

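The cycle counts in the example follow directly from the description: one start-up cycle, then one cycle per pair of Rs bits until no 1 bits remain. A minimal C sketch of that counting rule is shown below. It models only the rule as stated on this slide, not the documented timing of any specific core, and the name `mul_cycles_2bit` is hypothetical.

```c
#include <stdint.h>

/* Cycles for a 2-bits-per-cycle multiplier with early termination:
   1 start-up cycle, plus 1 cycle for each pair of Rs bits consumed
   (least significant pair first) until the rest of Rs is zero. */
unsigned mul_cycles_2bit(uint32_t rs)
{
    unsigned cycles = 1;      /* "plus 1 cycle to start with" */
    while (rs != 0u) {
        cycles++;             /* one cycle per pair of bits */
        rs >>= 2;
    }
    return cycles;
}
```

With Rs = 18 the loop runs three times (4 cycles in total); with Rs = -1 (all ones) it runs sixteen times (17 cycles), which is why placing the smaller-magnitude operand in Rs is faster.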
Extended Multiply Instructions

M variants of ARM cores contain extended multiplication hardware. This provides three enhancements:
  An 8-bit Booth's Algorithm is used.
    Multiplication is carried out faster (maximum for standard instructions is now 5 cycles).
  Early termination method improved so that it now completes multiplication when all remaining bit sets contain
    all zeroes (as with non-M ARMs), or
    all ones.
  64-bit results can now be produced from two 32-bit operands.

Multiply-Long & Multiply-Accumulate Long

Instructions are:
  MULL which gives RdHi,RdLo := Rm * Rs
  MLAL which gives RdHi,RdLo := (Rm * Rs) + RdHi,RdLo
However, the full 64 bits of the result now matter (lower precision multiply instructions simply throw the top 32 bits away).
  Need to specify whether operands are signed or unsigned.
Therefore the syntax of the new instructions is:
    UMULL{<cond>}{S} RdLo, RdHi, Rm, Rs
    UMLAL{<cond>}{S} RdLo, RdHi, Rm, Rs
    SMULL{<cond>}{S} RdLo, RdHi, Rm, Rs
    SMLAL{<cond>}{S} RdLo, RdHi, Rm, Rs
Not generated by the compiler.
Warning: Unpredictable on non-M ARMs.

Load/Store Instructions

The ARM is a Load/Store Architecture:
  Does not support memory to memory data processing operations.
  Must move data values into registers before using them.
This might sound inefficient, but in practice it isn't:
  Load data values from memory into registers.
  Process data in registers using a number of data processing instructions which are not slowed down by memory access.
  Store results from registers out to memory.
The ARM has three sets of instructions which interact with main memory. These are:
  Single register data transfer (LDR / STR).
  Block data transfer (LDM / STM).
  Single Data Swap (SWP).

Single register data transfer

The basic load and store instructions are:
  Load and Store Word or Byte
    LDR / STR / LDRB / STRB
ARM Architecture Version 4 also adds support for halfwords and signed data.
  Load and Store Halfword
    LDRH / STRH
  Load Signed Byte or Halfword (load value and sign extend it to 32 bits)
    LDRSB / LDRSH
All of these instructions can be conditionally executed by inserting the appropriate condition code after STR / LDR.
    e.g. LDREQB
Syntax:
    <LDR|STR>{<cond>}{<size>} Rd, <address>

Load and Store Word or Byte: Base Register

The memory location to be accessed is held in a base register.
    STR r0, [r1]    ; Store contents of r0 to location pointed to
                    ; by contents of r1.
    LDR r2, [r1]    ; Load r2 with contents of memory location
                    ; pointed to by contents of r1.

Figure: base register r1 = 0x200. r0 = 0x5 is stored to memory address 0x200 and then loaded into r2, so r2 = 0x5.

Load/Store Word or Byte: Offsets from the Base Register

As well as accessing the actual location contained in the base register, these instructions can access a location offset from the base register pointer.
This offset can be:
  An unsigned 12-bit immediate value (i.e. 0 - 4095 bytes).
  A register, optionally shifted by an immediate value.
This can be either added to or subtracted from the base register:
  Prefix the offset value or register with '+' (default) or '-'.
This offset can be applied:
  before the transfer is made: Pre-indexed addressing
    optionally auto-incrementing the base register, by postfixing the instruction with an '!'.
  after the transfer is made: Post-indexed addressing
    causing the base register to be auto-incremented.

Load/Store Word or Byte: Pre-indexed Addressing

Example: STR r0, [r1,#12]
  Figure: base register r1 = 0x200, offset 12. r0 = 0x5 is stored at address 0x20c; r1 is unchanged.

Load and Store Word or Byte: Post-indexed Addressing

Example: STR r0, [r1], #12
  Figure: r0 = 0x5 is stored at the original base address 0x200; r1 is then updated from 0x200 to 0x20c (offset 12).
To auto-increment the base register to location 0x1f4 instead use:
    STR r0, [r1], #-12
If r2 contains 3, auto-increment base register to 0x20c by multiplying this by 4:
    STR r0, [r1], r2, LSL #2

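The difference between the two modes is which address is used for the transfer and what is left in the base register afterwards. The C sketch below models just that address arithmetic; the type and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical result type: address used for the access, and the value
   left in the base register after the instruction. */
typedef struct { uint32_t access; uint32_t base_after; } addr_result;

/* Pre-indexed: [Rn, #offset], with optional write-back ('!'). */
addr_result addr_pre_indexed(uint32_t base, int32_t offset, int writeback)
{
    addr_result r;
    r.access = base + (uint32_t)offset;           /* offset applied before the transfer */
    r.base_after = writeback ? r.access : base;
    return r;
}

/* Post-indexed: [Rn], #offset. The base register is always updated. */
addr_result addr_post_indexed(uint32_t base, int32_t offset)
{
    addr_result r;
    r.access = base;                              /* transfer uses the old base */
    r.base_after = base + (uint32_t)offset;       /* offset applied afterwards */
    return r;
}
```

Using the values from the examples, a base of 0x200 with offset 12 accesses 0x20c in the pre-indexed form and 0x200 in the post-indexed form; a post-indexed offset of -12 leaves the base at 0x1f4.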
Load and Stores with User Mode Privilege

When using post-indexed addressing, there is a further form of Load/Store Word/Byte:
    <LDR|STR>{<cond>}{B}T Rd, <post_indexed_address>
When used in a privileged mode, this does the load/store with user mode privilege.
  Normally used by an exception handler that is emulating a memory access instruction that would normally execute in user mode.

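Example Usage of Addressing Modes

Imagine an array, the first element of which is pointed to by the contents of r0.
  Figure: memory holding the array, with elements 0 to 3 at offsets 0, 4, 8 and 12 from the start.
If we want to access a particular element, then we can use pre-indexed addressing:
  r1 is the element we want.
    LDR r2, [r0, r1, LSL #2]
If we want to step through every element of the array, then we can use post-indexed addressing within a loop:
  r1 is the address of the current element (initially equal to r0).
    LDR r2, [r1], #4
  Use a further register to store the address of the final element, so that the loop can be correctly terminated.
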
Offsets for Halfword and Signed Halfword / Byte Access

The Load and Store Halfword and Load Signed Byte or Halfword instructions can make use of pre- and post-indexed addressing in much the same way as the basic load and store instructions.
However, the actual offset formats are more constrained:
  The immediate value is limited to 8 bits (rather than 12 bits), giving an offset of 0 - 255 bytes.
  The register form cannot have a shift applied to it.

Effect of endianness

The ARM can be set up to access its data in either little- or big-endian format.
Little-endian:
  Least significant byte of a word is stored in bits 0 - 7 of an addressed word.
Big-endian:
  Least significant byte of a word is stored in bits 24 - 31 of an addressed word.

YA Endianness Example

r0 = 0x11223344, r1 = 0x100

r0 is stored to the word at address r1, and then a single byte is loaded back:
    LDRB r2, [r1]

Word at 0x100     bits 31-24   bits 23-16   bits 15-8   bits 7-0   Result of LDRB
Little-endian     11           22           33          44         r2 = 0x44
Big-endian        44           33           22          11         r2 = 0x11

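The same experiment can be modeled in C by storing the word byte by byte in each order and reading back the byte at the base address. This is an illustrative sketch with hypothetical function names; it does not depend on the endianness of the machine running it.

```c
#include <stdint.h>

/* Store word `w` into four consecutive bytes in the chosen byte order. */
void store_word(uint8_t mem[4], uint32_t w, int big_endian)
{
    for (unsigned i = 0; i < 4u; i++) {
        unsigned shift = big_endian ? 8u * (3u - i) : 8u * i;
        mem[i] = (uint8_t)(w >> shift);
    }
}

/* Byte that a byte load from the word's base address would return
   after the word has been stored there. */
uint8_t byte_at_base(uint32_t w, int big_endian)
{
    uint8_t mem[4];
    store_word(mem, w, big_endian);
    return mem[0];
}
```

For 0x11223344 the byte at the base address is 0x44 in the little-endian case and 0x11 in the big-endian case, matching the table.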
Block Data Transfer (1)

The Load and Store Multiple instructions (LDM / STM) allow between 1 and 16 registers to be transferred to or from memory.
The transferred registers can be either:
  Any subset of the current bank of registers (default).
  Any subset of the user mode bank of registers when in a privileged mode (postfix instruction with a '^').

Bits    Field
31-28   Cond: condition field
27-25   1 0 0
24      P: pre/post indexing bit
23      U: Up/Down bit (0 = Down; subtract offset from base, 1 = Up; add offset to base)
22      S: PSR and force user bit
21      W: write-back bit
20      L: Load/Store bit (0 = Store to memory, 1 = Load from memory)
19-16   Rn: base register
15-0    Register list

Block Data Transfer (2)

Base register used to determine where memory access should occur.
  4 different addressing modes allow increment and decrement, inclusive or exclusive of the base register location.
  Base register can be optionally updated following the transfer (by appending it with an '!').
  Lowest register number is always transferred to/from lowest memory location accessed.
These instructions are very efficient for:
  Saving and restoring context
    For this it is useful to view memory as a stack.
  Moving large blocks of data around memory
    For this it is useful to directly represent the functionality of the instructions.

Stacks

A stack is an area of memory which grows as new data is "pushed" onto the "top" of it, and shrinks as data is "popped" off the top.
Two pointers define the current limits of the stack.
  A base pointer
    used to point to the "bottom" of the stack (the first location).
  A stack pointer
    used to point to the current "top" of the stack.

Figure: after PUSH {1,2,3} the stack holds 1 (at BASE), 2 and 3, with SP pointing at 3. After POP, the result of the pop is 3 and SP points at 2.

Stack Operation

Traditionally, a stack grows down in memory, with the last "pushed" value at the lowest address. The ARM also supports ascending stacks, where the stack structure grows up through memory.
The value of the stack pointer can either:
  Point to the last occupied address (Full stack)
    and so needs pre-decrementing (i.e. before the push)
  Point to the next unoccupied address (Empty stack)
    and so needs post-decrementing (i.e. after the push)
The stack type to be used is given by the postfix to the instruction:
  STMFD / LDMFD: Full Descending stack
  STMFA / LDMFA: Full Ascending stack
  STMED / LDMED: Empty Descending stack
  STMEA / LDMEA: Empty Ascending stack
Note: ARM Compiler will always use a Full Descending stack.

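Stack Examples

Figure: memory between 0x3e8 and 0x418, showing where r0, r1, r3, r4 and r5 are stored relative to the old SP for each instruction. The lowest-numbered register is always at the lowest address.

Instruction                   Registers stored at              New SP points to
STMFD sp!, {r0,r1,r3-r5}      the 5 words below Old SP         r0 (lowest word stored)
STMED sp!, {r0,r1,r3-r5}      Old SP and the 4 words below     the word below r0
STMFA sp!, {r0,r1,r3-r5}      the 5 words above Old SP         r5 (highest word stored)
STMEA sp!, {r0,r1,r3-r5}      Old SP and the 4 words above     the word above r5
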
Stacks and Subroutines

One use of stacks is to create temporary register workspace for subroutines. Any registers that are needed can be pushed onto the stack at the start of the subroutine and popped off again at the end so as to restore them before return to the caller:

    STMFD sp!, {r0-r12, lr}    ; stack all registers
    ........                   ; and the return address
    ........
    LDMFD sp!, {r0-r12, pc}    ; load all the registers
                               ; and return automatically

Direct functionality of Block Data Transfer

When LDM / STM are not being used to implement stacks, it is clearer to specify exactly what functionality of the instruction is:
  i.e. specify whether to increment / decrement the base pointer, before or after the memory access.
In order to do this, LDM / STM support a further syntax in addition to the stack one:
  STMIA / LDMIA: Increment After
  STMIB / LDMIB: Increment Before
  STMDA / LDMDA: Decrement After
  STMDB / LDMDB: Decrement Before

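The four suffixes differ only in where the block of words sits relative to the base register. The C sketch below computes the lowest address accessed and the written-back base for a transfer of `nregs` registers. The type and function names are hypothetical, and only the address arithmetic is modeled.

```c
#include <stdint.h>

typedef enum { BT_IA, BT_IB, BT_DA, BT_DB } bt_mode;   /* hypothetical names */

/* Lowest address accessed, and the base value written back when '!' is used. */
typedef struct { uint32_t lowest; uint32_t base_after; } bt_result;

bt_result block_transfer(bt_mode mode, uint32_t base, unsigned nregs)
{
    uint32_t bytes = 4u * nregs;
    bt_result r;
    switch (mode) {
    case BT_IA: r.lowest = base;              r.base_after = base + bytes; break;
    case BT_IB: r.lowest = base + 4u;         r.base_after = base + bytes; break;
    case BT_DA: r.lowest = base - bytes + 4u; r.base_after = base - bytes; break;
    default:    r.lowest = base - bytes;      r.base_after = base - bytes; break; /* BT_DB */
    }
    return r;
}
```

For stores, the stack names map onto these as STMFD = STMDB, STMED = STMDA, STMFA = STMIB and STMEA = STMIA. For example, a Decrement Before store of 5 registers from a base of 0x400 occupies 0x3ec to 0x3ff and leaves the base at 0x3ec.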
Example: Block Copy

Copy a block of memory, which is an exact multiple of 12 words long, from the location pointed to by r12 to the location pointed to by r13. r14 points to the end of the block to be copied.

    ; r12 points to the start of the source data
    ; r14 points to the end of the source data
    ; r13 points to the start of the destination data
    loop    LDMIA r12!, {r0-r11}    ; load 48 bytes
            STMIA r13!, {r0-r11}    ; and store them
            CMP   r12, r14          ; check for the end
            BNE   loop              ; and loop until done

Figure: r12 and r14 mark the start and end of the source block, and r13 the start of the destination, with memory addresses increasing upwards.

This loop transfers 48 bytes in 31 cycles.
Over 50 Mbytes/sec at 33 MHz (48 bytes x 33 MHz / 31 cycles is about 51 Mbytes/sec).

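Swap and Swap Byte Instructions

Atomic operation of a memory read followed by a memory write which moves byte or word quantities between registers and memory.
Syntax:
    SWP{<cond>}{B} Rd, Rm, [Rn]

Figure: (1) the memory location addressed by Rn is read into a temporary; (2) Rm is written to that memory location; (3) the temporary is written to Rd.

To implement an actual swap of contents make Rd = Rm.
The compiler cannot produce this instruction.
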
Software Interrupt (SWI)

Bits    Field
31-28   Condition field
27-24   1 1 1 1
23-0    SWI number (comment field)

In effect, a SWI is a user-defined instruction.
It causes an exception trap to the SWI hardware vector (thus causing a change to supervisor mode, plus the associated state saving), thus causing the SWI exception handler to be called.
The handler can then examine the comment field of the instruction to decide what operation has been requested.
By making use of the SWI mechanism, an operating system can implement a set of privileged operations which applications running in user mode can request.
See Exception Handling Module for further details.

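Examining the comment field amounts to masking off the low 24 bits of the instruction word. A minimal C sketch of that step follows; the function names are hypothetical, and how a handler obtains the instruction word (for example by reading the word before the return address) is outside this sketch.

```c
#include <stdint.h>

/* True if bits 27-24 of an ARM instruction word are 1111 (a SWI). */
int is_swi(uint32_t instruction)
{
    return ((instruction >> 24) & 0xFu) == 0xFu;
}

/* The 24-bit SWI number (comment field) of a SWI instruction word. */
uint32_t swi_number(uint32_t instruction)
{
    return instruction & 0x00FFFFFFu;
}
```

For the instruction word 0xEF000011 (condition AL), `swi_number` returns 0x11.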
Backup

Assembler: Pseudo-ops

AREA -> chunks of data ($data) or code ($code)
ADR -> load address into a register
    ADR R0, BUFFER
ALIGN -> adjust location counter to word boundary, usually after a storage directive
END -> no more to assemble
DCD -> defined word value storage area
    BOW DCD 1024, 2055, 9051
DCB -> defined byte value storage area
    BOB DCB 10, 12, 15
% -> zeroed out byte storage area
    BLBYTE % 30
IMPORT -> name of routine to import for use in this routine
    IMPORT _printf    ; C print routine
EXPORT -> name of routine to export for use in other routines
    EXPORT add2       ; add2 routine
EQU -> symbol replacement
    loopcnt EQU 5

Assembly Line Format

    label <whitespace> instruction <whitespace> ; comment

label: created by programmer, alphanumeric
whitespace: space(s) or tab character(s)
instruction: op-code mnemonic or pseudo-op with required fields
comment: preceded by ';', ignored by assembler but useful to the programmer for documentation
NOTE: All fields are optional.

Example: C assignments

C:
    x = (a + b) - c;

Assembler:
    ADR r4,a        ; get address for a
    LDR r0,[r4]     ; get value of a
    ADR r4,b        ; get address for b, reusing r4
    LDR r1,[r4]     ; get value of b
    ADD r3,r0,r1    ; compute a+b
    ADR r4,c        ; get address for c
    LDR r2,[r4]     ; get value of c
    SUB r3,r3,r2    ; complete computation of x
    ADR r4,x        ; get address for x
    STR r3,[r4]     ; store value of x

2008 Wayne Wolf, Computers as Components 2nd ed.

Example: C assignment

C:
    y = a*(b+c);

Assembler:
    ADR r4,b        ; get address for b
    LDR r0,[r4]     ; get value of b
    ADR r4,c        ; get address for c
    LDR r1,[r4]     ; get value of c
    ADD r2,r0,r1    ; compute partial result
    ADR r4,a        ; get address for a
    LDR r0,[r4]     ; get value of a
    MUL r2,r0,r2    ; compute final value for y (Rd and Rm must differ)
    ADR r4,y        ; get address for y
    STR r2,[r4]     ; store y

Example: C assignment

C:
    z = (a << 2) | (b & 15);

Assembler:
    ADR r4,a            ; get address for a
    LDR r0,[r4]         ; get value of a
    MOV r0,r0,LSL #2    ; perform shift
    ADR r4,b            ; get address for b
    LDR r1,[r4]         ; get value of b
    AND r1,r1,#15       ; perform AND
    ORR r1,r0,r1        ; perform OR
    ADR r4,z            ; get address for z
    STR r1,[r4]         ; store value for z

Example: if statement

C:
    if (a > b) { x = 5; y = c + d; } else x = c - d;

Assembler:
    ; compute and test condition
            ADR r4,a        ; get address for a
            LDR r0,[r4]     ; get value of a
            ADR r4,b        ; get address for b
            LDR r1,[r4]     ; get value for b
            CMP r0,r1       ; compare a with b
            BLE fblock      ; if a <= b, branch to false block
    ; true block
            MOV r0,#5       ; generate value for x
            ADR r4,x        ; get address for x
            STR r0,[r4]     ; store x
            ADR r4,c        ; get address for c
            LDR r0,[r4]     ; get value of c
            ADR r4,d        ; get address for d
            LDR r1,[r4]     ; get value of d
            ADD r0,r0,r1    ; compute y
            ADR r4,y        ; get address for y
            STR r0,[r4]     ; store y
            B after         ; branch around false block
    ; false block
    fblock  ADR r4,c        ; get address for c
            LDR r0,[r4]     ; get value of c
            ADR r4,d        ; get address for d
            LDR r1,[r4]     ; get value for d
            SUB r0,r0,r1    ; compute c-d
            ADR r4,x        ; get address for x
            STR r0,[r4]     ; store value of x
    after   ...

Example: Conditional instruction implementation

The same if statement (a > b), using conditional execution after CMP r0,r1 instead of branches:

    ; true block (executes if a > b)
            MOVGT r0,#5       ; generate value for x
            ADRGT r4,x        ; get address for x
            STRGT r0,[r4]     ; store x
            ADRGT r4,c        ; get address for c
            LDRGT r0,[r4]     ; get value of c
            ADRGT r4,d        ; get address for d
            LDRGT r1,[r4]     ; get value of d
            ADDGT r0,r0,r1    ; compute y
            ADRGT r4,y        ; get address for y
            STRGT r0,[r4]     ; store y
    ; false block (executes if a <= b)
            ADRLE r4,c        ; get address for c
            LDRLE r0,[r4]     ; get value of c
            ADRLE r4,d        ; get address for d
            LDRLE r1,[r4]     ; get value for d
            SUBLE r0,r0,r1    ; compute c-d
            ADRLE r4,x        ; get address for x
            STRLE r0,[r4]     ; store value of x

Example: switch statement

C:
    switch (test) { case 0: ... break; case 1: ... }

Assembler:
            ADR r2,test             ; get address for test
            LDR r0,[r2]             ; load value for test
            ADR r1,switchtab        ; load address for switch table
            LDR r1,[r1,r0,LSL #2]   ; index switch table
    switchtab DCD case0
            DCD case1
            ...

Example: FIR filter

C:
    for (i=0, f=0; i<N; i++)
        f = f + c[i]*x[i];

Assembler:
    ; loop initiation code
            MOV r0,#0         ; use r0 for i
            MOV r8,#0         ; use separate index for arrays
            ADR r2,N          ; get address for N
            LDR r1,[r2]       ; get value of N
            MOV r2,#0         ; use r2 for f
            ADR r3,c          ; load r3 with base of c
            ADR r5,x          ; load r5 with base of x
    ; loop body
    loop    LDR r4,[r3,r8]    ; get c[i]
            LDR r6,[r5,r8]    ; get x[i]
            MUL r4,r6,r4      ; compute c[i]*x[i] (Rd and Rm must differ)
            ADD r2,r2,r4      ; add into running sum
            ADD r8,r8,#4      ; add one word offset to array index
            ADD r0,r0,#1      ; add 1 to i
            CMP r0,r1         ; exit?
            BLT loop          ; if i < N, continue

ARM Instruction Set Summary (1/4 to 4/4)

Figure: four slides of instruction set summary tables.