Science: A Product of Data Analysis
Science does not result from the launch of a mission or the collection of data. Rather, science only occurs through the analysis and understanding of that data.
Philosophy of the NASA Science Mission Directorate (SMD)
Oracle's R&D Presence
National Ignition Facility: Fusion and Laser Research
Database, SecureFiles, Orchestration and Middleware, Virtualization, Data Guard, Grid Control, Storage Management, Partitioning
CERN / Large Hadron Collider
Database, Streams, Data Guard, Grid Control, Storage Management, Partitioning
Max Planck Institute
Database, SecureFiles, Data Guard, Grid Control, Storage Management, Partitioning
NBII.gov: National Biological Information Infrastructure
Middleware, Portal, Spatial: http://www.nbii.gov/portal/server.p
Jet Propulsion Lab
Database, Grid Control, Partitioning, Storage Management
Future of Scientific Computing and Analysis
Data Intensive + Collaborative
Data-Intensive Collaborative Science
Drivers: Cost, Knowledge Base, Complexity
Interdependence and Collaboration
Enablers: Network Capacity, Standards (JSR/JCR), Web 2.0, Moore's Law, Oracle
Data Challenges for Science
Stewardship: the long-term preservation of data so as to ensure its continued value for both anticipated and unanticipated uses
Integrity/Provenance: data is complete, accurate, verifiable, and if possible reproducible
Accessibility: availability of research data, when it is needed, to researchers other than those who generated it
Privacy: ensuring data is accessed in an appropriate and verifiable manner by the appropriate people or resources
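One common way to make integrity and provenance verifiable is to store a cryptographic digest of the data alongside a record of who produced it and how. The following Python sketch assumes a simple in-memory record; `ProvenanceRecord`, `make_record`, and `verify` are hypothetical names for illustration, not part of any Oracle product.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical provenance entry: what the data is, who made it, how, and its digest."""
    dataset: str
    producer: str
    process: str
    sha256: str


def make_record(dataset: str, producer: str, process: str, payload: bytes) -> ProvenanceRecord:
    # Digest the exact bytes so that any later change to the data is detectable.
    return ProvenanceRecord(dataset, producer, process, hashlib.sha256(payload).hexdigest())


def verify(record: ProvenanceRecord, payload: bytes) -> bool:
    # Recompute the digest and compare it with the value recorded at creation time.
    return hashlib.sha256(payload).hexdigest() == record.sha256


if __name__ == "__main__":
    data = b"sample measurements\n"  # placeholder bytes, not real data
    rec = make_record("example.csv", "instrument-team", "calibration step", data)
    print(verify(rec, data))          # True
    print(verify(rec, data + b"x"))   # False: the bytes changed
```

A digest only proves the bytes are unchanged; the other fields still have to be captured honestly when the record is created.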
Use Cases for Data Sharing
Reanalysis: new or existing data for the same problem
Secondary Analysis: reuse of the same data for a different problem
Replication: different data to study the same problem
Verification: third-party reanalysis using the existing initial data
Collaborators
Initial Investigators, Subsequent Analysts, Scientific Community, Funding Agencies and Foundations
Obstacles to Data Sharing
Human
Lack of Foresight, Fear of Conflicting Conclusions, Breach of Confidentiality, Greater Influence, Compromising of Potential Profits
Systematic
Project-Level Funding, Origination Rules, Lack of Guidelines, Lack of Standards
Classifying, Archiving, Documenting, Metadata
Technical Obstacles to Collaboration
Stovepiped/Desktop Systems, Lack of Institutional IT Support, Informal Data-Sharing Mechanisms, Lack of Expertise
Data Challenges to Collaboration
Physical limitations:
I/O intensive: limitations on max IOPS
Network speeds/cost: time and cost to ship data to compute nodes
Multiple data silos:
Governance issues
Pedigree of the data
Multiple access policies to get to the data
Duplicate data stored in each silo
Need to scale disparate systems as data grows
Increased effort required for scientists, developers, and administrators
Correlating the data across data silos
Coordinated backup and recovery plan
Multiple data aggregation efforts
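The network cost of shipping data to compute nodes can be estimated with simple arithmetic: transfer time is the payload size in bits divided by the link rate. This Python sketch assumes an ideal link with no protocol or encoding overhead, so real transfers will take longer; the link speeds are example values only.

```python
def transfer_seconds(data_bytes: float, link_bits_per_second: float) -> float:
    """Ideal transfer time in seconds: payload bits divided by link rate.

    Ignores protocol overhead, line encoding, and contention, so this is a lower bound.
    """
    return data_bytes * 8 / link_bits_per_second


TB = 10**12  # one decimal terabyte, in bytes

for label, rate in [("1 Gb/s", 1e9), ("10 Gb/s", 10e9), ("40 Gb/s", 40e9)]:
    print(f"1 TB over {label}: {transfer_seconds(TB, rate):,.0f} s")
# 1 TB over 1 Gb/s: 8,000 s
# 1 TB over 10 Gb/s: 800 s
# 1 TB over 40 Gb/s: 200 s
```

Even in this best case, moving a terabyte over a 1 Gb/s link takes more than two hours, which is why keeping data close to the compute matters.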
Database
Filesystem
Problem with File Systems (BFILEs)
The Split Architecture: A Step in the Wrong Direction
Many applications manipulate both files and relational data (rich user experience, compliance, business integration).
This split compromises the value of the data:
Difficulty merging data
Inability to perform federated searches
Legacy of stovepiped data
Disjoint security and auditing models
Changes cannot be made atomically
Backup and recovery are fragmented
Search across relational data and files is difficult
Space management is complicated
Separate interfaces and protocols
Application architecture is more complex
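The atomicity problem goes away when file bytes and their metadata live in the same transactional store. The Python sketch below uses the standard-library `sqlite3` module purely as a stand-in for a database that holds file content (it does not use SecureFiles or DBFS); the table names and the `store` function are hypothetical.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (name TEXT PRIMARY KEY, content BLOB NOT NULL)")
conn.execute("CREATE TABLE metadata (name TEXT PRIMARY KEY, instrument TEXT NOT NULL)")


def store(name: str, content: bytes, instrument: Optional[str]) -> bool:
    """Write file bytes and metadata in one transaction: both commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if any statement raises
            conn.execute("INSERT INTO files VALUES (?, ?)", (name, content))
            conn.execute("INSERT INTO metadata VALUES (?, ?)", (name, instrument))
        return True
    except sqlite3.IntegrityError:
        # The metadata insert failed, so the file insert above was rolled back too.
        return False


print(store("shot_a.h5", b"raw bytes", "SXI"))  # True
print(store("shot_b.h5", b"raw bytes", None))   # False: NOT NULL violated, nothing kept
print(conn.execute("SELECT COUNT(*) FROM files").fetchone()[0])  # 1
```

With a file on disk plus a row in a database, the same failure would leave an orphaned file behind; a single transaction rules that out.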
Integrating Unstructured Data
New in Oracle Database 11g
RFID, DICOM, 3D, Binary XML, Images, SecureFiles, DBFS
Disparate Data Types
Dataset Category | Examples | Data Type
Optics Metrology | Optics Measurements | XML, Other
Production Checklists | LRU manufacturing checklist | XLS
Calibration | Eng Node Sensitivity, Cal ATP | XML, Other
OI Inspection | DMS, IMS, CIM, VIDAR labs | Images (JPEG, GIF)
OI Inspection Online | FODI, PODI, LOIS | Images (JPEG, GIF)
Auto Alignment | AA Samples | Images
Target Diagnostic Raw | SXI, Dante, FABS | HDF5, Other
Laser Diagnostics Raw | Energy Node, ISP Cal | HDF5, Other
Shot Analysis Results | Analyzed data | HDF5, Other
Operations | Environmental | Scalar
Database Filesystems
Bridge the Gap between Filesystems and Relational Database Systems
Maintain filesystem performance
Leverage multiple access methods
Single security mechanism
Unified administrative tools
Data pedigree
Unified architecture and skill sets
Leverage institutional resources for IT
Enable collaboration around data
Optimized for data access
Filesystems
Databases
Database Filesystems
DBFS is a filesystem in the database: it uses the database for storage and brings all of database technology to filesystems.
FUSE client
DBFS implements the filesystem interfaces:
2 methods (getpath, list) for a read-only filesystem
5 methods for a filesystem with read and write support
15 methods for a fully functional POSIX filesystem
The DBFS interface is extensible, making it easy to define special-purpose implementations (providers).
DBFS can surface one or more database tables as a filesystem, or a single table through multiple filesystems.
Example: a CheckImages table can have 2 filesystems on it:
/CheckImages_by_customer/CustomerName/check.jpg
/CheckImages_by_date/2008/September/check.jpg
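The idea of one table surfaced through two filesystems, served by a provider that only needs `getpath` and `list`, can be sketched in Python. This is not the DBFS provider API: the `CheckImage` row type, the `ReadOnlyCheckImagesProvider` class, and the in-memory rows are hypothetical, and only the two operation names and the path layouts come from the slide.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class CheckImage:
    """One row of a hypothetical CheckImages table."""
    customer: str
    year: int
    month: str
    filename: str
    content: bytes


class ReadOnlyCheckImagesProvider:
    """Hypothetical read-only provider: one table surfaced through two path hierarchies."""

    def __init__(self, rows: List[CheckImage]):
        self.rows = rows

    def _paths(self) -> Iterator[Tuple[str, CheckImage]]:
        # Each row appears under both a by-customer path and a by-date path.
        for r in self.rows:
            yield f"/CheckImages_by_customer/{r.customer}/{r.filename}", r
            yield f"/CheckImages_by_date/{r.year}/{r.month}/{r.filename}", r

    def getpath(self, path: str) -> bytes:
        """Return the content stored at a full path, or raise FileNotFoundError."""
        for p, r in self._paths():
            if p == path:
                return r.content
        raise FileNotFoundError(path)

    def list(self, directory: str) -> List[str]:
        """Return the sorted names directly under a directory path."""
        prefix = directory.rstrip("/") + "/"
        children = {p[len(prefix):].split("/", 1)[0]
                    for p, _ in self._paths() if p.startswith(prefix)}
        return sorted(children)


rows = [CheckImage("CustomerName", 2008, "September", "check.jpg", b"jpeg bytes")]
fs = ReadOnlyCheckImagesProvider(rows)
print(fs.list("/CheckImages_by_date/2008"))  # ['September']
print(fs.getpath("/CheckImages_by_customer/CustomerName/check.jpg")
      == fs.getpath("/CheckImages_by_date/2008/September/check.jpg"))  # True
```

Both paths resolve to the same stored row, so there is one copy of the image and two ways to navigate to it.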
Database Filesystems Built on SecureFiles Technology
A new database feature designed to break the performance barrier that kept file data out of databases
Similar to LOBs, but much faster and with more capabilities
Transparent encryption (with Advanced Security Option)
Compression and deduplication (with Advanced Compression Option)
Preserves the security, reliability, and scalability of the database
A superset of the LOB interfaces allows easy migration from LOBs
Enables consolidation of file data with associated relational data:
Single security model
Single view of data
Single management of data
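To illustrate the general idea behind deduplication, here is a toy content-addressed store in Python: identical payloads are kept once and shared by every name that refers to them. This is only a sketch of the concept; it is not how the Advanced Compression Option implements SecureFiles deduplication, and `DedupStore` is a hypothetical class.

```python
import hashlib
from typing import Dict


class DedupStore:
    """Toy content-addressed store: each distinct payload is kept only once."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}  # SHA-256 digest -> payload, stored once
        self.refs: Dict[str, str] = {}     # logical file name -> digest

    def put(self, name: str, payload: bytes) -> None:
        digest = hashlib.sha256(payload).hexdigest()
        self.blobs.setdefault(digest, payload)  # a duplicate payload is not stored again
        self.refs[name] = digest                # overwriting a name does not free the old blob

    def get(self, name: str) -> bytes:
        return self.blobs[self.refs[name]]

    def physical_bytes(self) -> int:
        """Bytes actually held after deduplication."""
        return sum(len(b) for b in self.blobs.values())


store = DedupStore()
store.put("run1/config.xml", b"<cfg/>")
store.put("run2/config.xml", b"<cfg/>")  # same content as run1
print(store.physical_bytes())  # 6, not 12
```

A production system would also need reference counting or garbage collection so that blobs no longer referenced by any name can be reclaimed.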
SecureFiles Detail
Base table: an Oracle table holding metadata plus locator columns, similar to a BFILE pointer
Delta Update Management
Write Gather Cache
Encryption
I/O Management
Space Management
3/19/2010
Goals of Research Platform
Optimized for Collaboration
Optimized for Active Archive
Minimize Costs
Extensible Compute Framework
Institutional Cloud and External Cloud
Implements Best Practices
Metadata, Standards, Institutional
OracleExadata
Oracle Exadata provides a midrange capacity computing platform that can meet the needs of many data-intensive scientific programs at a cost much lower than traditional scientific platforms. When combined with additional compute nodes, Exadata can scale to meet the requirements of both compute-intensive and I/O-intensive scientific programs.
Definitions
Capacity Computing: using smaller and less expensive clusters of systems to run parallel problems requiring modest computational power
Capability Computing: using the most powerful supercomputers to solve the largest and most demanding problems, with the intent to minimize time to solution
Modern databases have much to offer in the realm of data analysis
RDF/OWL can allow semantic searching of data
Predictive analytics
Spatial data analysis
Text mining of unstructured content
Some of the native data mining techniques and algorithms available
Technique | Algorithms
Classification | Logistic Regression, Naive Bayes, Support Vector Machine, Decision Tree
Regression | Multiple Regression
Attribute Importance | Minimum Description Length
Anomaly Detection | One-Class Support Vector Machine
Clustering | Enhanced K-Means, Orthogonal Partitioning Clustering
Association | Apriori
Feature Extraction | Non-negative Matrix Factorization
Sun Oracle Database Machine Hardware
Complete, Preconfigured, Tested for Performance
Database servers, Exadata Storage Servers, InfiniBand switches, Ethernet switch, pre-cabled keyboard, video, mouse (KVM) hardware, Power Distribution Units (PDUs)
Ready to Deploy
Plug in power, connect to the network, and it is ready to run the database
Sun Oracle Database Machine Full Rack
8 Sun Fire X4170 Oracle Database servers
14 Exadata Storage Servers (all SAS or all SATA)
3 Sun Datacenter InfiniBand Switch 36
36-port managed QDR (40 Gb/s) switch
Sun Oracle Exadata Storage Servers
Processors: 2 Quad-Core Intel Xeon E5540 processors (2.53 GHz)
Memory: 24 GB
Disks: 12 x 600 GB 15K RPM SAS, or 12 x 2 TB 7.2K RPM SATA
4 x 96 GB Sun Flash Accelerator F20 PCIe cards
Disk controller HBA with 512 MB battery-backed cache
2 InfiniBand 4X QDR (40 Gb/s) ports (dual-port HCA)
4 embedded Gigabit Ethernet ports
1 Ethernet port (ILOM)
Redundant
InfiniBand Network
Unified InfiniBand network: storage network, RAC interconnect, external connectivity (optional)
High-performance, low-latency network
80 Gb/s bandwidth per link (40 Gb/s in each direction)
SAN-like efficiency (zero copy, buffer reservation)
Simple manageability, like an IP network
Protocols
Zero-copy Zero-loss Datagram Protocol (ZDP, RDS v3): Linux open source, low CPU overhead (transfers 3 GB/s with 2% CPU usage)
Internet Protocol over InfiniBand (IPoIB): looks like normal Ethernet to host software (TCP/IP, UDP, HTTP, SSH, ...)
InfiniBandNetwork
Uses Sun Datacenter 36-port managed QDR (40 Gb/s) InfiniBand switches
Runs a subnet manager and automatically discovers the network topology
Database servers and Exadata servers
Each server has a dual-port QDR (40 Gb/s) IB HCA
Active-passive bonding, with a single IP address assigned
Performance is limited by the PCIe bus, so active-active bonding is not needed
Scaling Out to Multiple Full Racks
Single InfiniBand network; switch to a fat-tree topology
Valid up to 8 racks
Every leaf switch interconnected with every spine switch
Leaf switches not connected with other leaf switches
Spine switches not connected with other spine switches
Database and Exadata server cabling unchanged
Inter-rack cabling done at installation time
Up to 3 racks: extra cables already included with each Database Machine
Greater than 3 racks: longer cables need to be purchased
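The fat-tree rules above (every leaf to every spine, no leaf-to-leaf or spine-to-spine links, at most 8 racks) can be expressed as a small link generator. This Python sketch assumes each full rack contributes 2 leaf switches and 1 spine switch, as suggested by the three switches per full rack and the two leaf switches mentioned elsewhere in this deck; switch names are hypothetical, and each pair counts one logical connection, not the number of physical cables.

```python
from itertools import product


def fat_tree_links(racks: int) -> list:
    """Return (leaf, spine) pairs for a multi-rack fat-tree InfiniBand fabric.

    Assumption: each rack contributes 2 leaf switches and 1 spine switch.
    Names are hypothetical; one pair is one logical leaf-spine connection.
    """
    if not 1 <= racks <= 8:
        raise ValueError("topology described as valid for up to 8 racks")
    leaves = [f"leaf{r}{side}" for r in range(1, racks + 1) for side in "ab"]
    spines = [f"spine{r}" for r in range(1, racks + 1)]
    # Every leaf connects to every spine; no leaf-leaf or spine-spine pairs are generated.
    return list(product(leaves, spines))


if __name__ == "__main__":
    for n in (1, 3, 8):
        print(f"{n} rack(s): {len(fat_tree_links(n))} leaf-spine connections")
    # 1 rack(s): 2 leaf-spine connections
    # 3 rack(s): 18 leaf-spine connections
    # 8 rack(s): 128 leaf-spine connections
```

Under this assumption the number of leaf-spine pairs grows as 2N² for N racks, which is one reason inter-rack cabling is planned and done at installation time.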
InfiniBand Network: External Connectivity
External connectivity ports for:
Connecting to more Exadata servers for on-disk backup
Connecting to media servers for tape backup
Data loading
Client/application access
Use available ports on the two leaf switches:
12 in the Full Rack (6 per leaf switch)
36 in the Half Rack (18 per leaf switch)
48 in the Quarter Rack (24 per leaf switch)
32 in the Single Server Configuration
External Connectivity: Ethernet
Per Database Machine, admin access:
1 port from the admin Ethernet switch
1 port from the KVM switch
Note: For the Database Machine Basic System, no KVM or Ethernet switch is provided; the ILOM and management ports are connected directly to the data center network.
Database/client/application access:
Minimum 1 port per X4170; 2 more Ethernet ports per X4170 are available
These can be used for bonded client/application access or for additional connectivity
Conclusion
The ultimate goal of science is to create new knowledge and new discoveries.
Oracle has a number of features that can benefit the scientific community and ease the burden of pedigree, data management, and analysis.
Using a database filesystem will enable data-intensive collaborative science.
As new discoveries are made and data volumes increase, it is imperative to have a robust database system that is not only capable of managing the pedigree of that data, but can also serve as a knowledge repository for the future.
Exadata provides an ideal platform for program consolidation and scientific collaboration.
ForMoreInformation
Search for Exadata at http://search.oracle.com
or http://www.oracle.com/