
32-Bit Microcontroller Code Size Analysis

Draft 1.2.4. Joseph Yiu, Andrew Frame

Overview
Microcontroller application program code size can directly affect the cost and power consumption of products, so it is almost always viewed as an important factor in the selection of a microcontroller for embedded projects. Since the release and availability of 32-bit processors such as the ARM Cortex-M3, more and more microcontroller users have discovered the benefits of switching to 32-bit products: lower power, greater energy efficiency, smaller code size and much better performance. Whilst most of the benefits of using 32-bit microcontrollers are widely known, the code size advantage of 32-bit microcontrollers is less obvious.

In this article we will explain why 32-bit microcontrollers can reduce application code size whilst still achieving high system performance and ease of use.

Typical myths of program size
Myth #1: 8-bit and 16-bit microcontrollers have smaller code size
There is a common misconception that switching from an 8-bit microcontroller to a 32-bit microcontroller will result in much bigger code size. Why? Many people have the impression that 8-bit microcontrollers use 8-bit instructions and 32-bit microcontrollers use 32-bit instructions. This impression is often reinforced by slightly misleading marketing from the 8-bit and 16-bit microcontroller vendors.

In reality, many instructions in 8-bit microcontrollers are 16 bits, 24 bits or other sizes larger than 8 bits. For example, the PIC18 instruction size is 16 bits and, with the 8051 architecture, although some instructions are 1 byte long, many others are 2 or 3 bytes long.

So would code size be better moving to a 16-bit microcontroller? Not necessarily. Taking the MSP430 as an example, a single-operand instruction can take 4 bytes (32 bits) and a double-operand instruction can take 6 bytes (48 bits). In the worst case, an extended immediate/index instruction in MSP430X can take 8 bytes (64 bits).

So how about the code size for ARM Cortex microcontrollers? The ARM Cortex-M3 and Cortex-M0 processors are based on Thumb-2 technology, which provides excellent code density. Thumb-2 microcontrollers have 16-bit instructions as well as 32-bit instructions, with the 32-bit instruction functionality a superset of the 16-bit version. In most cases a C compiler will use the 16-bit version of the instruction. The 32-bit version would only be used when the operation cannot be performed with a 16-bit instruction. As a result, most of the instructions in an ARM Cortex microcontroller program are 16 bits. That's even smaller than some of the instructions in 8-bit microcontrollers.
[Figure 1: Size of a single instruction in various processors. Vertical axis: instruction size in number of bits (16, 32, 48, 64); minimum and maximum shown for 8051, PIC18, PIC24, MSP430 / MSP430X and ARM.]

Within a compiled program for Cortex-M processors, the number of 32-bit instructions can be only a small portion of the total instruction count. For example, the proportion of 32-bit instructions in the Dhrystone program image is only 15.8% of the total instruction count (average instruction size is 18.53 bits) when compiled for the Cortex-M3. For the Cortex-M0 the ratio of 32-bit instructions is even lower at 5.4% (average instruction size 16.9 bits).
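The average instruction sizes quoted for Dhrystone follow directly from the mix of 16-bit and 32-bit instructions. As a quick check, here is a minimal sketch in C that computes the weighted average, assuming every instruction is either 16 or 32 bits long. The function name `average_instruction_bits` is ours for illustration and is not part of any toolchain.

```c
#include <assert.h>

/* Average instruction size in bits for a program image in which the given
 * fraction (0.0 to 1.0) of instructions are 32-bit and the rest are 16-bit.
 * Hypothetical helper, used only to check the figures in the text. */
double average_instruction_bits(double fraction_32bit)
{
    assert(fraction_32bit >= 0.0 && fraction_32bit <= 1.0);
    return 16.0 * (1.0 - fraction_32bit) + 32.0 * fraction_32bit;
}
```

With a 32-bit fraction of 0.158 (Cortex-M3) this gives about 18.53 bits, and with 0.054 (Cortex-M0) about 16.86 bits, in line with the rounded figures above.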

Myth #2: My application only processes 8-bit data and 16-bit data
Many embedded developers think that if their application only processes 8-bit data then there is no benefit in switching to a 32-bit microcontroller. However, looking carefully at the output from the C compiler, in most cases the humble "integer" data type is actually 16 bits. So when you have a for loop with an integer as loop index, compare a value to an integer value, or use a C library function that uses an integer (e.g. memcpy()), you are actually using 16-bit or larger data. This can affect code size and performance in various ways:

- For each integer computation, an 8-bit processor will need multiple instructions to carry out the operations. This directly increases the code size and the clock cycle count.
- If the integer value has to be saved into memory, or if you need to load an immediate value from program ROM to this integer, it will take multiple instructions and multiple clock cycles.
- Since an integer can take up two 8-bit registers, more registers are required to hold the same number of integer variables. When there are an insufficient number of registers in the register bank to hold local variables, some have to be stored in memory. Thus an 8-bit microcontroller might result in more memory accesses, which increases code size and reduces performance and power efficiency. The same issue applies to the processing of 32-bit data on 16-bit microcontrollers.
- Since more registers are required to hold an integer in an 8-bit microcontroller, when passing variables to a function via the stack, or saving register contents during context switching or interrupt servicing, the number of stack operations required is more than that of 32-bit microcontrollers. This increases the program size, and can also affect interrupt latency because an Interrupt Service Routine (ISR) must make sure that all registers used are saved at ISR entry and restored at ISR exit. The same issue applies to the processing of 32-bit data on 16-bit microcontrollers.

There is even more bad news for 8-bit microcontroller users: memory address pointers take multiple bytes, so data processing involving the use of pointers can be extremely inefficient.
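To make the point concrete, the sketch below processes only 8-bit data, yet the loop counter, the pointer and the running total are all wider than 8 bits. The function names are hypothetical and chosen for illustration; the only assumption about the platform is the C standard's guarantee that `int` is at least 16 bits wide.

```c
#include <limits.h>
#include <stddef.h>

/* Sum a buffer of 8-bit samples. Although the data is 8-bit, the index,
 * the pointer and the accumulator all need more than 8 bits, so an 8-bit
 * core has to handle each of them with multi-instruction sequences. */
unsigned int sum_bytes(const unsigned char *data, size_t count)
{
    unsigned int total = 0;
    for (size_t i = 0; i < count; i++) {
        total += data[i];
    }
    return total;
}

/* Width of the plain int type in bits on the current target. */
unsigned int int_width_bits(void)
{
    return (unsigned int)(sizeof(int) * CHAR_BIT);
}
```

On a typical 8-bit or 16-bit compiler `int_width_bits()` would be expected to return 16, and on an ARM compiler 32.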

Myth #3: A 32-bit processor is not efficient at handling 8-bit and 16-bit data
Most 32-bit processors are actually very efficient at handling 8-bit and 16-bit data. Compact memory access instructions for signed and unsigned 8-bit, 16-bit and 32-bit data are all available. There are also a number of instructions specially included for data type conversions. Overall the handling of 8-bit and 16-bit data in 32-bit processors such as the ARM Cortex microcontrollers is just as easy and efficient as handling 32-bit data.
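As an illustration of the data type conversions mentioned above, here is a small sketch in C. The function names are ours. The comments name the Thumb sign-extend and zero-extend instructions (SXTB, UXTH) that a compiler could plausibly select for these conversions; the actual code generated depends on the compiler and options.

```c
#include <stdint.h>

/* Widen a signed 8-bit value to 32 bits, preserving its sign.
 * On a Cortex-M target this can map to a single sign-extend (e.g. SXTB)
 * or be folded into a signed byte load. */
int32_t widen_signed_byte(int8_t value)
{
    return (int32_t)value;
}

/* Widen an unsigned 16-bit value to 32 bits, filling with zeros.
 * On a Cortex-M target this can map to a single zero-extend (e.g. UXTH)
 * or be folded into a halfword load. */
uint32_t widen_unsigned_half(uint16_t value)
{
    return (uint32_t)value;
}
```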

Myth #4: C libraries for ARM processors are too big
There are various C library options for ARM processors. For microcontroller applications, a number of compiler vendors have developed C libraries with a much smaller footprint. For example, the ARM development tools have a smaller version of the C library called MicroLIB. These C libraries are especially designed for microcontrollers and allow application code size to be small and efficient.

Myth #5: Interrupt handling on ARM microcontrollers is more complex
On the ARM Cortex microcontrollers the interrupt service routines are just normal C subroutines. Vectored or nested interrupts are supported by the Nested Vectored Interrupt Controller (NVIC) with no need for software intervention. In fact the setup process and processing of an interrupt request is much simpler than on 8-bit and 16-bit microcontrollers, as generally you only need to program the priority level of an interrupt and then enable it.

The interrupt vectors are stored in a vector table at the beginning of the memory, normally within the flash, without the need for any software programming steps. When an interrupt request takes place the processor automatically fetches the corresponding interrupt vector and starts to execute the ISR. Some of the registers are pushed to the stack by a hardware sequence and restored automatically when the interrupt handler exits. The other registers that are not covered by the hardware stacking sequence are pushed onto the stack by C compiler generated code only if the register is used and modified within the ISR.
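The idea of "an ISR is just a normal C function whose address sits in a vector table" can be sketched as follows. This is a host-side simulation under stated assumptions, not real startup code: the handler names, the two-entry table and `dispatch_irq` are all hypothetical, and `dispatch_irq` stands in for the vector fetch that the processor hardware performs on its own. A real Cortex-M vector table also holds the initial stack pointer and the system exception vectors, which are omitted here.

```c
#include <stddef.h>

typedef void (*isr_t)(void);

static volatile unsigned int tick_count;

/* An interrupt handler written as an ordinary C function:
 * no special keyword or assembly wrapper is assumed. */
void Timer_Handler(void)
{
    tick_count++;
}

/* Fallback for interrupts that have no dedicated handler. */
void Default_Handler(void)
{
}

/* Simplified, hypothetical vector table: one entry per interrupt number. */
static const isr_t vector_table[] = { Default_Handler, Timer_Handler };

/* Stand-in for the hardware: look up the vector and call the handler. */
void dispatch_irq(size_t irq_number)
{
    if (irq_number < sizeof(vector_table) / sizeof(vector_table[0])) {
        vector_table[irq_number]();
    }
}

unsigned int get_tick_count(void)
{
    return tick_count;
}
```

Calling `dispatch_irq(1)` twice leaves `get_tick_count()` at 2; on real hardware the equivalent lookup and call happen without any software dispatch code.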


What about moving to 16-bit microcontrollers?
16-bit microcontrollers can be efficient in handling 16-bit integers and 8-bit data (e.g. strings); however, the code size is still not as good as with 32-bit processors:

- Handling of 32-bit data: if the application requires handling of any long integer (32-bit) or floating point types, then the efficiency of 16-bit processors is greatly reduced because multiple instructions are required for each processing operation, as well as for data transfers between the processor and the memory.
- Register usage: when processing 32-bit data, 16-bit processors require two registers to hold each 32-bit variable. This reduces the number of variables that can be held in the register bank, hence reducing processing speed as well as increasing stack operations and memory accesses.
- Memory addressing mode: many 16-bit architectures provide only basic addressing modes similar to 8-bit architectures. As a result, the code density is poor when they are used in applications that require processing of complex data sets.
- 64 Kbyte limitation: many 16-bit processors are limited to 64 Kbytes of addressable memory, reducing the functionality of the application. Some 16-bit architectures have extensions to allow more than 64 Kbytes of memory to be accessed; however, these extensions have an instruction code and clock cycle overhead. For example, a memory pointer would be larger than 16 bits and might require multiple instructions and multiple registers to process it.


Instruction Set efficiency
When customers port their applications from an 8-bit architecture to ARM Cortex microcontrollers, they very often find that the total code size has decreased dramatically. For example, when Melfas (a leading company in capacitive sensing touch screen controllers) evaluated the Cortex-M0 processor, they found that the Cortex-M0 program size was less than half of that of the 8051 and, at the same time, delivered five times more performance at the same clock frequency. This, for example, could enable them to run the application at 1/5 of the clock speed of the equivalent 8051 product, reducing the power consumption, and lowering product cost at the same time due to a smaller program flash size requirement.

So how does the ARM architecture provide such big advantages? The key factor is Thumb-2 technology, which provides a highly efficient unified instruction set.

Powerful Addressing mode
The ARM Cortex microcontrollers support a number of addressing modes for memory transfer instructions. For example:

- Immediate offset (Address = Register value + offset)
- Register offset (Address = Register value 1 + shifted(Register value 2))
- PC related (Address = Current PC value + offset)
- Stack pointer related (Address = SP + offset)
- Multiple register load and store, with optional automatic base address update
- PUSH/POP instructions with multiple registers

As a result of these various addressing modes, data transfer between registers and memory can be handled with fewer instructions. Since the PUSH and POP instructions support multiple registers, in most cases, saving and restoring of registers in a function call will only need one PUSH at the beginning of the function and one POP at the end of the function. The POP can even be combined with the return instruction at the end of the function to further reduce the instruction count.

Conditional branches
Almost all processors provide conditional branch instructions; however, ARM processors provide improved conditional branching by having separate branch conditions for signed and unsigned data operation results, and by providing a good branch range.

For example, when comparing the conditional branches of the Cortex-M0 and MSP430, the Cortex-M0 has more branch conditions available, making it possible to generate more compact code no matter whether the data being processed is signed or unsigned. The MSP430 conditional branches might require multiple instructions to get the same operations.

Generally the same situation applies to many 8-bit or 16-bit microcontrollers: when dealing with signed data, additional steps might also be required in the conditional branch.

In addition to the branch instructions available in the Cortex-M0, the Cortex-M3 processor also supports compare-and-branch instructions (CBZ and CBNZ). This further simplifies some of the conditional branch instruction sequences.

Conditional Execution
Another area that allows the ARM Cortex-M3 microcontrollers to have more compact code is the conditional execution feature. The Cortex-M3 supports an instruction called IT (IF-THEN). This instruction allows up to 4 subsequent instructions to be conditionally executed, reducing the need for additional branches. For example,

    if (xpos1 < xpos2) {
        x1 = xpos1;
        x2 = xpos2;
    } else {
        x1 = xpos2;
        x2 = xpos1;
    }

This can be converted to the following assembly code (needs 12 bytes in the Cortex-M3):

    CMP   R0, R1
    ITTEE CC        ; if unsigned <
    MOVCC R2, R0
    MOVCC R3, R1
    MOVCS R3, R0
    MOVCS R2, R1

Other architectures might need an additional branch (e.g. needs 14 bytes in MSP430):

    CMP.W R14, R13
    JGE   Label1    ; if unsigned <
    MOV.W R11, R14
    MOV.W R12, R13
    JMP   Label2
    Label1
    MOV.W R11, R13
    MOV.W R12, R14
    Label2

This results in an extra two bytes for the MSP430 when compared to the Cortex-M3.
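For readers who want to inspect the generated code with their own compiler, the if/else fragment from this section can be wrapped in a self-contained function. This is a sketch: the function name `order_pair` and the use of `unsigned int` operands with pointer outputs are our assumptions. Whether a given compiler actually emits an IT block for it depends on the compiler, target and optimization settings.

```c
/* Store the two positions through x1 and x2 using the same if/else
 * as the example in the text (unsigned comparison). */
void order_pair(unsigned int xpos1, unsigned int xpos2,
                unsigned int *x1, unsigned int *x2)
{
    if (xpos1 < xpos2) {
        *x1 = xpos1;
        *x2 = xpos2;
    } else {
        *x1 = xpos2;
        *x2 = xpos1;
    }
}
```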

Multiply and Divide
Both the Cortex-M0 and Cortex-M3 processors support single-cycle multiply operations. The Cortex-M3 also has multiply and multiply-accumulate instructions for 32-bit or 64-bit results. These instructions greatly reduce the code size required when handling multiplication of large variables.

Most other 8-bit and 16-bit microcontrollers also have multiply instructions; however, the limitation of the register size often means that the multiplication requires multiple steps if the result needs to be more than 8 or 16 bits.

The MSP430 does not have a multiply instruction (MSP430 document SLAA329, reference 1). To carry out multiplication, either a memory-mapped hardware multiplier is used, or the multiply operation has to be handled by software using add and shift. Even if a hardware multiplier is present, the memory-mapped nature of the multiplier results in the additional overhead of transferring data to and from the external hardware. In addition, using the multiplier within an interrupt handler could cause existing data in the multiplier to be lost. As a result, interrupts are usually disabled before a multiply operation and re-enabled after the multiplication is completed. This adds additional software overhead and affects interrupt latency and determinism.

The Cortex-M3 processor also has unsigned and signed integer divide instructions. This reduces the code size required in applications that need to perform integer division because there is no need for the C library to include a function for handling divide operations.
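The C operations this section refers to look like the sketch below. The function names are ours. The comments mention the Cortex-M3 instructions (SMULL for a 32 x 32 to 64-bit signed multiply, UDIV for unsigned divide) that a compiler could plausibly use; on a core without such instructions the same C source would typically expand to a multi-step sequence or a library call. The zero-divisor guard is a choice made for this example, since division by zero is undefined in C.

```c
#include <stdint.h>

/* 32-bit x 32-bit signed multiply with a full 64-bit result.
 * On a Cortex-M3 this can be a single long-multiply (e.g. SMULL). */
int64_t multiply_wide(int32_t a, int32_t b)
{
    return (int64_t)a * (int64_t)b;
}

/* Unsigned 32-bit division. On a Cortex-M3 this can be a single UDIV
 * rather than a call into a C library division routine.
 * Returns 0 for a zero divisor (a convention chosen for this sketch). */
uint32_t divide_unsigned(uint32_t numerator, uint32_t denominator)
{
    if (denominator == 0u) {
        return 0u;
    }
    return numerator / denominator;
}
```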

Powerful instruction set
In addition to the standard data processing, memory access and program control instructions, the Cortex microcontrollers also support a number of other instructions to help data type conversion. The Cortex-M3 processor also supports a number of bit field operations, reducing the software overhead in, for example, peripheral control and communication data processing.
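Bit field handling in C usually comes down to mask-and-shift code like the sketch below. The helper names are hypothetical. On a Cortex-M3, a compiler may be able to reduce such an extract or insert to a single bit field instruction (for example UBFX or BFI) when the position and width are compile-time constants; on cores without these instructions each helper needs several shift, AND and OR instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering 'width' bits starting at bit 'lsb' (lsb + width <= 32). */
static uint32_t field_mask(unsigned int lsb, unsigned int width)
{
    assert(width >= 1u && lsb < 32u && width <= 32u - lsb);
    uint32_t low = (width == 32u) ? 0xFFFFFFFFu
                                  : (((uint32_t)1 << width) - 1u);
    return low << lsb;
}

/* Read an unsigned bit field out of a 32-bit word. */
uint32_t extract_field(uint32_t word, unsigned int lsb, unsigned int width)
{
    return (word & field_mask(lsb, width)) >> lsb;
}

/* Return 'word' with the bit field replaced by the low bits of 'value'. */
uint32_t insert_field(uint32_t word, uint32_t value,
                      unsigned int lsb, unsigned int width)
{
    uint32_t mask = field_mask(lsb, width);
    return (word & ~mask) | ((value << lsb) & mask);
}
```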


Breaking the 64 Kbyte memory barrier
As already mentioned, many 8-bit and 16-bit microcontrollers are limited to 64 Kbytes of addressable memory. Due to the nature of 8-bit and 16-bit microcontroller architecture, the coding efficiency of these microcontrollers often decreases dramatically when the application exceeds the 64 Kbyte memory barrier. In 8-bit and 16-bit microcontrollers (e.g. 8051, PIC24, C166) this is often handled by memory bank switching or memory segmentation, with the switching code generated automatically by the C compilers. Every time a function or data in a different memory page is required, bank switching code is needed, which further increases the program size.

[Figure 2: Increased code size overhead of memory bank switching or segmentation in 8-bit and 16-bit systems]

Memory bank switching not only creates larger code but also greatly reduces the performance of a system. This is especially the case if the data being processed is in a different memory bank (e.g. copying a block of data from one page to another page can be very costly in terms of performance). This is particularly inefficient for 8-bit microcontrollers like the 8051 because the MCS-51 architecture does not have proper support for such a memory bank switching feature. Therefore memory switching has to be carried out by saving and updating the memory bank control (for example, I/O port registers). In addition, the memory page switching code usually has to be carried out in a congested shared memory space with limited size. At the same time some of the memory pages might not be fully utilized and memory space is wasted.


For the 8-bit and 16-bit microcontrollers that support memory of over 64K, this often comes at a price. The MSP430X design overcomes the 64 Kbyte memory barrier by increasing the Program Counter (PC) and register width to 20 bits. Despite no memory paging being involved, the sizes of some MSP430X instructions are considerably larger than in the original MSP430. For example, when the large memory model is used, a double-operand formatted instruction can take 8 bytes rather than 6 (a 33% increase):

[Figure 3: Support of larger memory system increases the size of some instructions in MSP430X. The MSP430 double-operand instruction consists of one 16-bit word with the fields Op-code, Rsrc, Ad, B/W, As and Rdst, followed by the words "Source or destination 15:0" and "Destination 15:0". The MSP430X double-operand instruction carries an additional word holding "Source 19:16" and "Destination 19:16" together with A/L and reserved bits.]

Apart from the size of the instruction itself, the use of 20-bit addressing also increases the number of stack operations required. Since the memory is only 16 bits wide, the saving of a 20-bit address pointer will need two stack push operations, resulting in extra instructions and poor utilization of the stack memory.

[Figure 4: Use of large memory data model in MSP430X increases code size]

As a result, an MSP430X application has a lower code density when the large memory model is used, which is required when the address range exceeds the 64K range.

In ARM Cortex microcontrollers, 32-bit linear addressing is used to provide 4 GB of memory space for embedded applications. Therefore there is no paging overhead and the programming model is easy to use.


Examples
To demonstrate the code size compared to 8-bit and 16-bit processors, a number of test cases are compiled and illustrated here. The tests are based on the MSP430 Competitive Benchmark document from Texas Instruments (SLAA205C, reference 2). The results listed here show total program memory size in bytes.

MSP430 results:
The tests listed are compiled using IAR Embedded Workbench 4.20.1 with the hardware multiplier enabled and the optimization level set to High with Size optimization. Unless specified, the Small data model is used and type double is 32-bit. The results are obtained from the linker output report (CODE + CONST).

ARM Cortex processor results:
The tests listed are compiled using RealView Development Suite 4.0 SP2. Optimization level is 3 for size, with a minimal vector table, and MicroLIB is used. The results are obtained from the linker output report (VECTORS + CODE).

| Test | Generic MSP430 | MSP430F5438 | MSP430F5438 large data model | Cortex-M3 |
|---|---|---|---|---|
| Math8bit | 198 | 198 | 202 | 144 |
| Math16bit | 144 | 144 | 144 | 144 |
| Math32bit | 256 | 244 | 256 | 120 |
| MathFloat | 1122 | 1122 | 1162 | 600 |
| Matrix2dim8bit | 180 | 178 | 196 | 184 |
| Matrix2dim16bit | 268 | 246 | 290 | 256 |
| Matrixmult | 276 | 228 | (linker error) | 228 |
| Switch8bit | 200 | 218 | 218 | 160 |
| Switch16bit | 198 | 218 | 218 | 160 |
| Firfilter (Note 1) | 1202 | 1170 | 1222 | 716 (820 without modification) |
| Dhry | 923 | 893 | 1079 | 900 |
| Whet (Note 2) | 6434 | 6308 | 6614 | 4384 (8496 without modification) |

Note 1: The constant data array in the Firfilter test is modified to use a 16-bit data type on the Cortex-M processor (const unsigned short int INPUT[]).

Note 2: When certain math functions are used (sin, cos, atan, sqrt, exp, log), the ARM C standard libraries use the double precision versions by default. This can result in significantly larger program size unless adjustments are made. In order to achieve an equivalent comparison, the program code is edited so that single precision versions are used (sinf, cosf, atanf, sqrtf, expf, logf). Also, some of the constant definitions have been adjusted to single precision (e.g. 1.0 becomes 1.0F).

[Figure 5: Code size comparison for basic operations]

The total sizes for the simple tests (integer math, matrix and switch tests) are:

| Summary for simple tests | Generic MSP430 | MSP430F5438 | Cortex-M3 |
|---|---|---|---|
| Total size (bytes) | 1720 | 1674 | 1396 |
| Advantage (% smaller) | - | 2.6% | 18.8% |

For applications using floating point, there is a significant advantage for Cortex microcontrollers, whereas Dhrystone program size is closer.

[Figure 6: Code size comparison for floating point operations and benchmark suites]

The total sizes for the benchmark and floating point tests (Dhrystone, Whetstone, Firfilter and MathFloat) are:

| Summary for benchmark and floating point tests | Generic MSP430 | MSP430F5438 | Cortex-M3 |
|---|---|---|---|
| Total size (bytes) | 9681 | 9493 | 6600 |
| Advantage (% smaller) | - | 1.9% | 31.8% |

Observations:
1. From the results, we can see that the Cortex microcontrollers have better code density compared to MSP430 in most cases. The remaining tests show similar code density when compared to MSP430.
2. One of the tests (Firfilter) uses an integer data type for a constant array. Since an integer is 32-bit in the ARM processor and is 16-bit on MSP430, the program has been modified to allow a direct comparison.
3. When the large data memory model is used with MSP430, the code size increases by up to 20% (Dhrystone).
4. We are unable to reproduce all of the claimed results in the Texas Instruments document. This may be because the storage of constant data in ROM might have been omitted from their code size calculations.
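The "% smaller" figures for the benchmark and floating point totals can be reproduced from the byte counts, taking the generic MSP430 total as the baseline. The helper below is a minimal sketch; the name `percent_smaller` is ours.

```c
#include <assert.h>

/* Percentage by which size_bytes is smaller than baseline_bytes. */
double percent_smaller(unsigned int baseline_bytes, unsigned int size_bytes)
{
    assert(baseline_bytes > 0u);
    return 100.0 * ((double)baseline_bytes - (double)size_bytes)
                 / (double)baseline_bytes;
}
```

For example, `percent_smaller(9681, 6600)` evaluates to roughly 31.8 and `percent_smaller(9681, 9493)` to roughly 1.9, matching the Cortex-M3 and MSP430F5438 figures for the benchmark and floating point tests.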

Additional investigation on floating point
When analysing the results of the Whetstone benchmark it became apparent that the MSP430 C compiler only generated single precision floating point operations, while the ARM C compiler generated double precision operations for some of the math functions used.

After changing the code to use only single precision floating point, the code size reduced dramatically and resulted in a much smaller code size than the MSP430 code size.

The IAR MSP430 compiler has an option to define floating point: "Size of type double", which is by default set to 32-bit (single precision). If it is set to 64-bit (as in the ARM C compiler), the code size increases significantly.

| Program size | Generic MSP430 | MSP430F5438 |
|---|---|---|
| Type double is 32-bit | 6434 | 6308 |
| Type double is 64-bit | 11510 | 11798 |

These results match those seen for the ARM Cortex-M3 processor.

| Program size | Cortex-M3 |
|---|---|
| Whetstone modified to use single precision only | 4384 |
| Out-of-box compile for Whetstone (uses double precision for math functions) | 8496 |

The option of setting type double to 32-bit is quite sensible for small microcontroller applications where the C code might only need to process source data generated from a 12-bit/14-bit ADC. Benchmarking using different default types can make a very big difference and not show accurate comparative results.


Recommendations on how to get the smallest code size with Cortex-M microcontrollers
Use MicroLIB
In the ARM development tools there is an option to use the area-optimized MicroLIB rather than the standard C libraries. MicroLIB is suitable for most embedded applications and has a much smaller code size when compared to the standard C library.

Ensure the use of area optimizations
The performance of Cortex-M microcontrollers is much higher than that of 16-bit and 8-bit microcontrollers, so when porting applications from these microcontrollers you can generally select the highest area optimization rather than selecting optimizations for speed. The resulting performance will still be much higher than that of a 16-bit or 8-bit system running at the same clock frequency.

Use the right data type
When porting applications from 8-bit or 16-bit microcontrollers, you might need to modify the data type for constant arrays to achieve the smallest program size. For example, an integer is normally 16 bits in 8-bit and 16-bit microcontrollers, while in ARM microcontrollers integers are 32 bits.

| Type | Number of bits in 8051 | Number of bits in MSP430 | Number of bits in ARM |
|---|---|---|---|
| char, unsigned char | 8 | 8 | 8 |
| enum | 8/16 | 16 | 8/16/32 (smallest is chosen) |
| short, unsigned short | 16 | 16 | 16 |
| int, unsigned int | 16 | 16 | 32 |
| long, unsigned long | 32 | 32 | 32 |
| float | 32 | 32 | 32 |
| double | 32 | 32 | 64 |

When porting a constant array of integers from an 8-bit or 16-bit architecture, you should modify the data type from int to short int to make sure the constant array remains the same size. For example,

    const int mydata[] = {1234, 5678,};

This should be changed to:

    const short int mydata[] = {1234, 5678,};

For an array of integer variables (non-constant data), changing from an integer to a short integer might also prevent an increase in memory usage during software porting. Most other data (e.g. variables) does not require modification.

Floating point functions
Some floating point functions are defined as single precision in 8-bit or 16-bit microcontrollers and are by default defined as double precision in ARM microcontrollers, as we have found out with the Whetstone test analysis. When porting application code from 8-bit or 16-bit microcontrollers to an ARM microcontroller, you might have to adjust math functions to single precision versions and modify constant definitions to ensure that the program behaves in the same way. For example, in the Whetstone program code, a section of code uses some math functions that are double precision in ARM compilers:

    X=T*atan(T2*sin(X)*cos(X)/(cos(X+Y)+cos(X-Y)-1.0));
    Y=T*atan(T2*sin(Y)*cos(Y)/(cos(X+Y)+cos(X-Y)-1.0));

If we want to use single precision only, the program code has to be changed to

    X=T*atanf(T2*sinf(X)*cosf(X)/(cosf(X+Y)+cosf(X-Y)-1.0F));
    Y=T*atanf(T2*sinf(Y)*cosf(Y)/(cosf(X+Y)+cosf(X-Y)-1.0F));

Other constant definitions such as:

    /* Module 7: Procedure calls */
    X=1.0; Y=1.0; Z=1.0;

should be changed to the following for single precision representation:

    /* Module 7: Procedure calls */
    X=1.0F; Y=1.0F; Z=1.0F;
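The reason the constant suffix matters is that, in C, an unsuffixed constant such as 1.0 has type double, so an expression mixing it with a float is carried out in double precision. The sketch below makes this visible using `sizeof` on the expression type; the function names are hypothetical and no math library is needed.

```c
#include <stddef.h>

/* Size in bytes of the type of (float * 1.0): the unsuffixed constant is a
 * double, so the float operand is converted and the result type is double. */
size_t size_with_double_constant(void)
{
    float x = 2.0F;
    return sizeof(x * 1.0);
}

/* Size in bytes of the type of (float * 1.0F): both operands are float,
 * so the result type stays float. */
size_t size_with_float_constant(void)
{
    float x = 2.0F;
    return sizeof(x * 1.0F);
}
```

On a compiler where double is 64-bit, the first function would return 8 and the second 4, which is why a single stray 1.0 can pull double precision arithmetic into otherwise single precision code.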

Define peripherals as data structure
You can also reduce program size by defining registers in peripherals as a data structure. For example, instead of representing the SysTick timer registers as

    #define SYSTICK_CTRL  (*((volatile unsigned long *)(0xE000E010)))
    #define SYSTICK_LOAD  (*((volatile unsigned long *)(0xE000E014)))
    #define SYSTICK_VAL   (*((volatile unsigned long *)(0xE000E018)))
    #define SYSTICK_CALIB (*((volatile unsigned long *)(0xE000E01C)))

you can define the SysTick registers as:

    typedef struct
    {
        volatile unsigned int CTRL;
        volatile unsigned int LOAD;
        volatile unsigned int VAL;
        unsigned int CALIB;
    } SysTick_Type;
    #define SysTick ((SysTick_Type *) 0xE000E010)

By doing this, you only need one address constant to be stored in the program ROM. The register accesses will use this address constant with different address offsets for different registers. If a sequence of hardware register accesses is required for a peripheral, using a data structure can reduce code size as well as improve performance. Most 8-bit microcontrollers do not have the same addressing mode feature, which can result in a much larger code size for the same task.
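The sketch below shows how a sequence of register accesses goes through a single base pointer plus offsets. It is written so that it can be tried on a host machine: `systick_start` takes a pointer to any `SysTick_Type` instance, so a plain variable can stand in for the hardware block. The helper names, the use of `uint32_t` in place of `unsigned int`, and the `SYSTICK_CTRL_*` macro names are assumptions made for this example; the bit positions (ENABLE at bit 0, CLKSOURCE at bit 2) follow the SysTick control register layout in ARM's Cortex-M documentation.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct
{
    volatile uint32_t CTRL;   /* offset 0x0 */
    volatile uint32_t LOAD;   /* offset 0x4 */
    volatile uint32_t VAL;    /* offset 0x8 */
    uint32_t CALIB;           /* offset 0xC */
} SysTick_Type;

#define SYSTICK_BASE           0xE000E010u
#define SYSTICK_CTRL_ENABLE    (1u << 0)  /* counter enable */
#define SYSTICK_CTRL_CLKSOURCE (1u << 2)  /* use the processor clock */

/* Address of a register, given its offset inside the structure. */
uint32_t systick_register_address(size_t offset)
{
    return SYSTICK_BASE + (uint32_t)offset;
}

/* Three register writes through one base pointer: the compiler only needs
 * a single address constant and uses small offsets for each access. */
void systick_start(SysTick_Type *st, uint32_t reload)
{
    st->LOAD = reload;
    st->VAL  = 0u;
    st->CTRL = SYSTICK_CTRL_ENABLE | SYSTICK_CTRL_CLKSOURCE;
}
```

On target hardware the call would be `systick_start((SysTick_Type *)SYSTICK_BASE, reload)`; on a host, passing the address of a local `SysTick_Type` variable lets the register sequence be checked without touching real hardware.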

Conclusions
32-bit processors provide equal or, more often, better code size than 8-bit and 16-bit architectures whilst at the same time delivering much better performance.

For users of 8-bit microcontrollers, moving to a 16-bit architecture can solve some of the inherent problems with 8-bit architectures; however, the overall benefit of migrating from 8-bit to 16-bit is much less than that achieved by migrating to the 32-bit Cortex processors.

As the power consumption and cost of 32-bit microcontrollers have reduced dramatically over the last few years, 32-bit processors have become the best choice for many embedded projects.

References
The following articles on MSP430 are referenced:
1. Efficient Multiplication and Division Using MSP430 (SLAA329), http://focus.ti.com/lit/an/slaa329/slaa329.pdf
2. MSP430 Competitive Benchmarking (SLAA205C), http://focus.ti.com/lit/an/slaa205c/slaa205c.pdf
