
Oracle Exadata as a Research Platform

John C. Hax, Oracle Corporation, Member IEEE

Science: A Product of Data Analysis
"Science does not result from the launch of a mission or the collection of data. Rather, science only occurs through the analysis and understanding of that data."
Philosophy of the NASA Science Mission Directorate (SMD)

Oracle's R&D Presence
National Ignition Facility: Fusion and Laser Research
Database, SecureFiles, Orchestration and Middleware, Virtualization, Data Guard, Grid Control, Storage Management, Partitioning

CERN / Large Hadron Collider
Database, Streams, Data Guard, Grid Control, Storage Management, Partitioning

Max Planck Institute
Database, SecureFiles, Data Guard, Grid Control, Storage Management, Partitioning

NBII.gov: National Biological Information Infrastructure
Middleware, Portal, Spatial http://www.nbii.gov/portal/server.p

Jet Propulsion Lab
Database, Grid Control, Partitioning, Storage Management

Future of Scientific Computing and Analysis

Data Intensive + Collaborative

Data-Intensive Collaborative Science
Drivers: Cost, Knowledge Base, Complexity, Interdependence
Collaboration
Enablers: Network Capacity, Standards (JSR/JCR), Web 2.0, Virtualization/Grid Technologies, Moore's Law, Oracle

Data Challenges for Science
Stewardship: the long-term preservation of data so as to ensure its continued value for both anticipated and unanticipated uses
Integrity/Provenance: data is complete, accurate, verifiable, and if possible reproducible
Accessibility: availability of research data to researchers other than those who generated the data, when the data is needed
Privacy: ensuring data is accessed in an appropriate and verifiable manner by the appropriate people or resources

Use Cases for Data Sharing
Reanalysis
New or existing data for same problem

Secondary Analysis
Reuse of same data for different problem

Replication
Different data to study same problem

Verification
Third-party reanalysis using existing initial data

Collaborators
Initial Investigators
Subsequent Analysts
Scientific Community
Funding Agencies and Foundations

Obstacles to Data Sharing
Human
  Lack of Foresight
  Fear of Conflicting Conclusions
  Breach of Confidentiality
  Greater Influence
  Compromising of Potential Profits
Systematic
  Project-Level Funding
  Origination Rules
  Lack of Guidelines
  Lack of Standards: Classifying, Archiving, Documenting, Metadata

Technical Obstacles to Collaboration
Stovepiped/Desktop Systems
Lack of Institutional IT Support
Informal Data Sharing Mechanisms
Lack of Expertise

Data Challenges to Collaboration
Physical Limitations
  I/O intensive: limitations on maximum IOPS
  Network speeds/cost: time and cost to ship data to compute nodes
Multiple Data Silos
  Governance issues
    Pedigree of the data
    Multiple access policies to get to the data
  Duplicate data stored in each silo
  Need to scale disparate systems as data grows
  Increased effort required for Scientists, Developers, Administrators
    Correlating the data across data silos
    Coordinated backup and recovery plan
Multiple Data Aggregation Efforts

Research organizations need to efficiently store, analyze, and manage all data

Structured, Semi-Structured, XML, PDF, Unstructured

Database

Filesystem

The simplicity and performance of filesystems make it attractive to store file data in filesystems while keeping relational data in the database.

Problem with File Systems (bfiles)
The Split Architecture: a step in the wrong direction
Many applications manipulate both files and relational data
  Rich user experience, compliance, business integration
This split compromises the value of the data.
  Difficulty merging data
  Inability to perform Federated Searches
  Legacy of Stove-Piped Data
  Disjoint security and auditing models
  Changes cannot be made atomically
  Backup and recovery are fragmented
  Search across relational data and files is difficult
  Space management is complicated
  Separate interfaces and protocols
  Application architecture more complex

Integrating Unstructured Data
New in Oracle Database 11g
RFID, DICOM, 3D, Binary XML, Images, SecureFiles, DBFS

Disparate Data Types
Dataset Category         Examples                         Data Type
Optics Metrology         Optics Measurements              XML, Other
Production checklists    LRU manufacturing checklist      XLS
Calibration              EngNodeSensitivity, CalATP       XML, Other
OI Inspection            DMS, IMS, CIM, VIDAR labs        Images (jpeg, GIF)
OI Inspection Online     FODI, PODI, LOIS                 Images (jpeg, GIF)
Auto Alignment           AA Samples                       Images
Target Diagnostic Raw    SXI, Dante, FABS                 HDF5, Other
Laser Diagnostics Raw    EnergyNode, ISPCal               HDF5, Other
Shot Analysis Results    Analyzed data                    HDF5, Other
Operations               Environmental                    Scalar

Database Filesystems
Bridge the Gap between Filesystems and Relational Database Systems
  Maintain Filesystem Performance
  Leverage multiple access methods
  Single Security Mechanism
  Unified Administrative Tools
  Data Pedigree
  Unified Architecture and Skillsets
  Leverage Institutional Resources for IT
  Enabling Collaboration around Data
  Optimized for Data Access

Filesystems

Databases

Database Filesystems
DBFS is a filesystem in the database; it uses the database for storage and brings all of database technology to filesystems
FUSE Client
DBFS implements the filesystem interfaces:
  2 methods (getpath, list) for a read-only filesystem
  5 methods for a filesystem with read and write support
  15 methods for a fully functional POSIX filesystem
DBFS interface is extensible for easily defining special-purpose implementations (providers)
DBFS can surface one or more DB tables as a filesystem, or a single table through multiple filesystems
Example: a CheckImages table can have 2 filesystems on it:
  /CheckImages_by_customer/CustomerName/check.jpg
  /CheckImages_by_date/2008/September/check.jpg
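
The idea of one table surfaced through two filesystems can be sketched in a few lines of Python. This is only an illustration of the concept, not the DBFS provider API: the `CheckImage` record, the `ReadOnlyView` class, and the sample rows are all hypothetical, and only the two method names (getpath, list) and the two path layouts come from the slide.

```python
from dataclasses import dataclass
from datetime import date

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


@dataclass(frozen=True)
class CheckImage:
    """One row of a hypothetical CheckImages table."""
    customer: str
    scanned: date
    filename: str
    content: bytes


class ReadOnlyView:
    """Hypothetical read-only filesystem view exposing only getpath and list."""

    def __init__(self, rows, path_of):
        # path_of decides the directory layout; the rows themselves are shared.
        self._files = {path_of(r): r for r in rows}

    def getpath(self, path):
        """Return the file content stored at an exact path."""
        return self._files[path].content

    def list(self, directory):
        """Return the sorted entry names directly under a directory."""
        prefix = directory.rstrip("/") + "/"
        names = set()
        for p in self._files:
            if p.startswith(prefix):
                names.add(p[len(prefix):].split("/", 1)[0])
        return sorted(names)


# Illustrative sample data, not taken from the source.
rows = [
    CheckImage("Alice", date(2008, 9, 3), "check.jpg", b"img-1"),
    CheckImage("Bob", date(2008, 10, 7), "check.jpg", b"img-2"),
]

# Two layouts over the same rows, mirroring the two example paths.
by_customer = ReadOnlyView(
    rows, lambda r: f"/CheckImages_by_customer/{r.customer}/{r.filename}")
by_date = ReadOnlyView(
    rows,
    lambda r: f"/CheckImages_by_date/{r.scanned.year}/"
              f"{MONTHS[r.scanned.month - 1]}/{r.filename}")

print(by_customer.list("/CheckImages_by_customer"))
print(by_date.list("/CheckImages_by_date/2008"))
```

Both views read the same underlying rows, so the same image is reachable by customer or by date without being stored twice.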

Database Filesystems Built on SecureFiles Technology
A new database feature designed to break the performance barrier keeping file data out of databases
Similar to LOBs but much faster, and with more capabilities
  Transparent encryption (with Advanced Security Option)
  Compression, deduplication (with Advanced Compression Option)
Preserves the security, reliability, and scalability of the database
Superset of LOB interfaces allows easy migration from LOBs
Enables consolidation of file data with associated relational data
  Single security model
  Single view of data
  Single management of data
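
To make the compression and deduplication bullets concrete, here is a toy content-addressed store in Python. It is a sketch under stated assumptions, not a description of how SecureFiles works internally: the `DedupStore` class, the choice of SHA-256 for detecting identical content, and the use of zlib are all illustrative.

```python
import hashlib
import zlib


class DedupStore:
    """Toy store: identical payloads are kept once, in compressed form."""

    def __init__(self):
        self._blobs = {}   # content digest -> compressed bytes
        self._names = {}   # file name -> content digest

    def put(self, name, data):
        """Store data under name, reusing an existing blob if content matches."""
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self._blobs:
            self._blobs[digest] = zlib.compress(data)
        self._names[name] = digest

    def get(self, name):
        """Return the original bytes stored under name."""
        return zlib.decompress(self._blobs[self._names[name]])

    def stored_blobs(self):
        """Number of distinct payloads physically kept."""
        return len(self._blobs)


store = DedupStore()
payload = b"shot-analysis-results" * 100   # illustrative data only
store.put("run1/results.h5", payload)
store.put("run1-copy/results.h5", payload)  # same content, second name
store.put("run2/results.h5", b"different content")

print(store.stored_blobs())
```

Three names are stored but only two distinct payloads are kept, which is the space saving that deduplication aims for.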

SecureFiles Detail
Base Table: Oracle table holding metadata plus locator columns, similar to a b-file pointer.
Components: Delta Update Management, Write Gather Cache, Encryption, Compression, De-duplication, Inode Management, IO Management, Space Management

Pedigree with a database filesystem

3/19/2010


Goals of Research Platform
Optimized for Collaboration
Optimized for Active Archive
Minimize Costs
Extensible Compute Framework
  Institutional Cloud and External Cloud
Implements Best Practices
  Metadata, Standards, Institutional

Oracle Exadata
Oracle Exadata provides a midrange capacity computing platform that can meet the needs of many data-intensive scientific programs at a cost much lower than traditional scientific platforms. When combined with additional compute nodes, Exadata can scale to meet both compute-intensive and I/O-intensive scientific program requirements.

Definitions
Capacity Computing: Using smaller and less expensive clusters of systems to run parallel problems requiring modest computational power
Capability Computing: Using the most powerful supercomputers to solve the largest and most demanding problems with the intent to minimize time to solution

Modern databases have much to offer in the realm of data analysis
RDF/OWL can allow semantic searching of data
Predictive Analytics
Spatial Data Analysis
Text Mining of Unstructured Content

Some of the native data mining techniques and algorithms available
Technique               Algorithms
Classification          Logistic Regression, Naive Bayes, Support Vector Machine, Decision Tree
Regression              Multiple Regression
Attribute Importance    Minimum Description Length
Anomaly Detection       One-Class Support Vector Machine
Clustering              Enhanced K-Means, Orthogonal Partitioning Clustering
Association             Apriori
Feature Extraction      Non-negative Matrix Factorization

Sun Oracle Database Machine Hardware
Complete, Preconfigured, Tested for Performance
  Database Servers
  Exadata Storage Servers
  InfiniBand Switches
  Ethernet Switch
  Pre-cabled
  Keyboard, Video, Mouse (KVM) hardware
  Power Distribution Units (PDUs)

Ready to Deploy
  Plug in power
  Connect to Network
  Ready to Run Database

Sun Oracle Database Machine Full Rack
8 Sun Fire X4170 Oracle Database servers
14 Exadata Storage Servers (all SAS or all SATA)
3 Sun Datacenter InfiniBand Switch 36
  36-port Managed QDR (40 Gb/s) switch
1 "Admin" Cisco Ethernet switch
Keyboard, Video, Mouse (KVM) hardware
Redundant Power Distribution Units (PDUs)
Single Point of Support from Oracle

Sun Fire X4170 Database Reference Server
Processors           2 Quad-Core Intel Xeon E5540 Processors (2.53 GHz)
Memory               72 GB
Local Disks          4 x 146 GB 10K RPM SAS Disks
Disk Controller      Disk Controller HBA with 512 MB Battery Backed Cache
Network              2 InfiniBand 4X QDR (40 Gb/s) Ports (Dual-port HCA)
                     4 Embedded Gigabit Ethernet Ports
Remote Management    1 Ethernet port (ILOM)
Power Supplies       Redundant

Sun Oracle Exadata Storage Servers
Processors           2 Quad-Core Intel Xeon E5540 Processors (2.53 GHz)
Memory               24 GB
Disks                12 x 600 GB 15K RPM SAS or 12 x 2 TB 7.2K RPM SATA
Flash                4 x 96 GB Sun Flash Accelerator F20 PCIe Cards
Disk Controller      Disk Controller HBA with 512 MB Battery Backed Cache
Network              2 InfiniBand 4X QDR (40 Gb/s) Ports (Dual-port HCA)
                     4 Embedded Gigabit Ethernet Ports
Remote Management    1 Ethernet port (ILOM)
Power Supplies       Redundant
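
The per-server figures can be multiplied out to get a rough picture of a Full Rack. The short Python sketch below does only that arithmetic, using the server counts and component sizes given in the slides. The totals are raw figures in decimal units; they assume nothing about mirroring, formatting overhead, or usable capacity, none of which the slides state.

```python
# Counts and sizes as listed in the Full Rack and server specification slides.
DB_SERVERS = 8
STORAGE_SERVERS = 14
CORES_PER_SERVER = 2 * 4        # two quad-core Xeon E5540 processors
DB_MEMORY_GB = 72
DISKS_PER_STORAGE_SERVER = 12
SAS_DISK_GB = 600
SATA_DISK_GB = 2000             # 2 TB, decimal
FLASH_CARDS_PER_STORAGE_SERVER = 4
FLASH_CARD_GB = 96

db_cores = DB_SERVERS * CORES_PER_SERVER
db_memory_gb = DB_SERVERS * DB_MEMORY_GB
total_disks = STORAGE_SERVERS * DISKS_PER_STORAGE_SERVER
raw_sas_tb = total_disks * SAS_DISK_GB / 1000
raw_sata_tb = total_disks * SATA_DISK_GB / 1000
flash_gb = STORAGE_SERVERS * FLASH_CARDS_PER_STORAGE_SERVER * FLASH_CARD_GB

print(f"Database cores:   {db_cores}")
print(f"Database memory:  {db_memory_gb} GB")
print(f"Raw SAS storage:  {raw_sas_tb} TB")
print(f"Raw SATA storage: {raw_sata_tb} TB")
print(f"Flash:            {flash_gb} GB")
```

Under these inputs the rack totals come to 64 database cores, 576 GB of database server memory, 100.8 TB raw SAS or 336 TB raw SATA, and 5,376 GB of flash.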

InfiniBand Network
Unified InfiniBand Network
  Storage Network
  RAC Interconnect
  External Connectivity (optional)

High Performance, Low Latency Network
  80 Gb/s bandwidth per link (40 Gb/s each direction)
  SAN-like Efficiency (Zero copy, buffer reservation)
  Simple manageability like IP network

Protocols
  Zero-copy Zero-loss Datagram Protocol (ZDP, RDSv3)
    Linux Open Source, Low CPU overhead (Transfer 3 GB/s with 2% CPU usage)
  Internet Protocol over InfiniBand (IPoIB)
    Looks like normal Ethernet to host software (tcp/ip, udp, http, ssh, ...)
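
The link and transfer figures use different units (Gb/s and GB/s), so a quick conversion helps put them side by side. The sketch below assumes the 40 Gb/s figure is the QDR signalling rate and applies the 8b/10b line encoding that QDR InfiniBand uses; comparing the quoted 3 GB/s transfer rate against a single direction of a single link is an assumption made here for illustration, not something the slides state.

```python
SIGNALLING_GBPS = 40    # per direction, from the slide
ENCODED_BITS = 10       # 8b/10b: every 8 data bits travel as 10 line bits
DATA_BITS = 8
BITS_PER_BYTE = 8
QUOTED_TRANSFER_GBYTES = 3   # "Transfer 3 GB/s with 2% CPU usage"

# Data rate after removing line-encoding overhead, per direction.
data_gbps = SIGNALLING_GBPS * DATA_BITS / ENCODED_BITS
data_gbytes = data_gbps / BITS_PER_BYTE

# Share of one direction's data rate represented by the quoted transfer
# (assumes the quoted figure refers to one direction of one link).
utilisation = QUOTED_TRANSFER_GBYTES / data_gbytes

print(f"Data rate per direction: {data_gbps} Gb/s = {data_gbytes} GB/s")
print(f"Quoted transfer as share of that rate: {utilisation:.0%}")
```

On these assumptions, one direction carries about 4 GB/s of data, so the quoted 3 GB/s corresponds to roughly three quarters of it.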

InfiniBand Network
Uses Sun Datacenter 36-port Managed QDR (40 Gb/s) InfiniBand switches
  Runs subnet manager and automatically discovers network topology
  Only one subnet manager active at a time
  2 leaf switches to connect individual server IB ports
  1 spine switch in Full Rack for scaling out to additional Racks

Database Server and Exadata Servers
  Each server has Dual-port QDR (40 Gb/s) IB HCA
  Active-Passive Bonding, Assign Single IP address
    Performance is limited by PCIe bus, so active-active not needed
  Connect one port from the HCA to one leaf switch and the other port to the second leaf switch for redundancy
  Connections pre-wired in the Factory

Scaling Out to Multiple Full Racks
Single InfiniBand Network
Switch to a Fat Tree Topology
  Valid up to 8 Racks
  Every leaf node interconnected with every spine switch
  Leaf switches not connected with other leaf switches
  Spine switches not connected with other spine switches
  Database and Exadata Server cabling unchanged
  Inter-rack cabling done at installation time

Up to 3 Racks
  Extra cables already included with each DB Machine

Greater than 3 Racks
  Longer cables need to be purchased
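
The fat-tree rules above translate directly into a small enumeration. The Python sketch below lists which switch pairs must be interconnected for a given number of racks, assuming two leaf switches and one spine switch per rack as described for the Full Rack. The function name and switch labels are hypothetical, and the sketch counts switch pairs only: the slides do not say how many physical cables run between each pair.

```python
from itertools import product

MAX_RACKS = 8          # "Valid up to 8 Racks"
LEAVES_PER_RACK = 2    # two leaf switches per rack
SPINES_PER_RACK = 1    # one spine switch per Full Rack


def fat_tree_links(racks):
    """Return every (leaf, spine) switch pair that must be interconnected."""
    if not 2 <= racks <= MAX_RACKS:
        raise ValueError(f"expected 2..{MAX_RACKS} racks, got {racks}")
    leaves = [f"rack{r}-leaf{i}"
              for r in range(1, racks + 1)
              for i in range(1, LEAVES_PER_RACK + 1)]
    spines = [f"rack{r}-spine{i}"
              for r in range(1, racks + 1)
              for i in range(1, SPINES_PER_RACK + 1)]
    # Leaves pair only with spines: no leaf-leaf or spine-spine links.
    return list(product(leaves, spines))


links = fat_tree_links(3)
print(len(links))  # 6 leaves x 3 spines = 18 switch pairs
for leaf, spine in links[:3]:
    print(leaf, "<->", spine)
```

The number of leaf-spine pairs grows as twice the square of the rack count under these assumptions, which is consistent with longer and more numerous cables being needed as racks are added.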

InfiniBand Network External Connectivity
External connectivity ports for
  Connecting to more Exadata servers for on-disk backup
  Connecting to media servers for tape backup
  Data Loading
  Client/Application Access

Validated InfiniBand cable lengths
  Up to 5 m Passive Copper 4X QDR QSFP cables
  Up to 50 m Fiber Optic 4X QDR QSFP cables (more expensive)

Use available ports on the two Leaf switches
  12 in the Full Rack (6 per leaf switch)
  36 in the Half Rack (18 per leaf switch)
  48 in the Quarter Rack (24 per leaf switch)
  32 in the Single Server Configuration

External Connectivity: Ethernet
Per Database Machine Admin Access
  1 port from Admin Ethernet switch
  1 port from KVM Switch
  Note: For the Database Machine Basic System, there is no KVM or Ethernet switch provided, and the ILOM and management ports are connected to the datacenter network directly

Database/Client/Application Access
  Minimum 1 port per X4170
  2 more Ethernet ports per X4170 available
    Can use them for bonded client/application access or for additional connectivity

Conclusion
The ultimate goal of science is to create new knowledge and new discoveries.
Oracle has a number of features which can benefit the scientific community and ease the burden of pedigree, data management, and analysis.
Using a database filesystem will enable data-intensive collaborative science.
As new discoveries are made and data volumes increase, it is imperative to have a robust database system that is not only capable of managing the pedigree of that data, but can also serve as a knowledge repository for the future.
Exadata provides an ideal platform for program consolidation and scientific collaboration.

For More Information
Search for "Exadata" at http://search.oracle.com
or visit http://www.oracle.com/
