
Unit 1

Introduction
Introduction: Data Processing Architectures, components and processes, Data Stores and Data kind, Challenges: "Big Data" and otherwise
Special Considerations in Big Data Analysis: Background, Theory in Search of Data, Data in Search of Theory, Overfitting, Bigness Bias, Too Much Data, Fixing Data, Data Subsets in Big Data: Neither Additive nor Transitive, Additional Big Data Pitfalls.
Providing Structure to Unstructured Data: Background, Machine Translation, Autocoding, Indexing and Term Extraction
Unit 2
Data and Features
Component Parts of Data Science: Data Types, Classes of Analytic Techniques, Learning Models, Execution Models, Fractal Analytic Model, Analytic Selection Process: Implementation Constraints
Feature Engineering: Feature Selection, Data Veracity, Application of Domain Knowledge, Curse of Dimensionality
Unit 3
Data and Analysis
Simple Analytic Techniques: Background, Look at the Data, Data Range, Denominator, Frequency Distributions, Mean and Standard Deviation, Estimation-Only Analyses
Deep Dive into Analysis: Background, Analytic Tasks, Clustering, Classifying, Recommending, and Modelling, Data Reduction, Normalising and Adjusting Data, Find Relationships Not Similarities
Unit 4
Applying Nuances of Data Science to Text Processing and Information Retrieval
Text Mining: Definition, Generic architecture, Text Mining Operations, Frequent Itemset Mining, Categorization, Document Representation, Clustering and categorization, Bayesian Classifier
Text Processing: Tokenization, Stem, Stop, n-Gram, categorization, information extraction
Unit 5
Big Nature of Data Case Study
MapReduce, The Paper: Programming model, Types and Examples, Implementation and Execution Architecture, Partitioning, types, Combiners, Data Locality
Hadoop: Challenges at Large Scale and the Hadoop Approach, HDFS, MapReduce in Hadoop
Reading Material: (In no particular order of precedence)
1. Principles of Big Data: Preparing, Sharing and Analyzing Complex Information, Jules J Berman, First Edition, MK Publishers, 2013.
2. The Field Guide to Data Science:
http://www.boozallen.com/media/file/TheFieldGuidetoDataScience.pdf
3. Understanding Big Data:
ftp://129.35.224.12/software/tw/Defining_Big_Data_through_3V_v.pdf
4. Ghemawat et al., Google, MapReduce: Simplified Data Processing on Large Clusters
http://static.googleusercontent.com/media/research.google.com/en//archive/mapreduceosdi04.pdf
5. Hadoop Tutorial: http://developer.yahoo.com/hadoop/tutorial/
6. Scalable SQL and NoSQL Data Stores
http://www.sigmod.org/publications/sigmodrecord/1012/pdfs/04.surveys.cattell.pdf
7. Oracle Information Architecture
http://www.oracle.com/technetwork/topics/entarch/articles/oeabigdataguide1522052.pdf
8. Challenges and opportunities with Big Data
http://www.purdue.edu/discoverypark/cyber/assets/pdfs/BigDataWhitePaper.pdf

9. Feature Engineering in Text Problems
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.36.9770&rep=rep1&type=pdf
10. Clustering Explained (Read at least till Vector Space Model and K-Means)
http://www.iula.upf.edu/materials/040701wanner.pdf
K-Means broken down
http://www.engineering.uiowa.edu/~ie_155/Lecture/Kmeans.pdf
11. K-Means explained (especially the reasoning for k-means)
http://www.croce.ggf.br/dados/K%20mean%20Clustering1.pdf
12. Naive Bayes Broken Down
http://software.ucv.ro/~cmihaescu/ro/teaching/AIR/docs/Lab4NaiveBayes.pdf

13. Text Mining Slides http://www.vscse.org/summerschool/2013/Abbott.pdf
14. Text Mining Handbook http://www.roelsbeestenboel.nl/text.pdf (Chapter 1 and 2, 4 and 5)

Other Interesting Reading Material:
1. Feature Selection for High Dimensional Data: A Pearson Redundancy Based Filter
http://kzi.polsl.pl/~jbiesiada/prace/selekcja/07Wroclaw.pdf
2. On the Role of Feature Selection in Machine Learning: Thesis on Feature Engineering
http://www.cs.huji.ac.il/labs/learning/Theses/Navot_PhD.pdf
3. Feature Selection Methods (Good Thesis)
http://www.cs.cmu.edu/~kdeng/thesis/feature.pdf
http://www.dccia.ua.es/~boyan/papers/TesisBoyan.pdf
A survey: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.101.1337&rep=rep1&type=pdf
4. Dimensionality Reduction: http://infolab.stanford.edu/~ullman/mmds/ch11.pdf
5. Bayes Explained (Slides) (Very useful)
http://www.stanford.edu/class/cs124/lec/naivebayes.pdf

6. Naive Bayes Classifiers with examples (Very Intuitive Explanation)
http://www.cs.ucr.edu/~eamonn/CE/Bayesian%20Classification%20withInsect_examples.pdf
