
Rumbo 2020

SAP FORUM

HANA & Hadoop


Javier Fernandez Leon

February 2016
Intel Inside. Powerful Solution Outside.
More information: www.descubrefujitsu.com/SAPforum

FTS INTERNAL

Powered by Intel
Xeon processor.

HANA & HADOOP
Intro

AGENDA

Challenges of distributed Big Data

What is Apache Hadoop? Features

Comparison HANA vs Hadoop

HANA & Apache Spark

HANA & Hadoop combined. Scenarios

Use Cases HANA & Hadoop

Managed Service: pay-per-use model for HANA & Hadoop

Copyright 2014 FUJITSU LIMITED

Challenges of distributed Big Data


WE ARE DROWNING IN OUR OWN DATA
Inefficient Data Processing
Real-time drill-down interaction is impossible when data is distributed across thousands
of nodes and processed in batches.
Lack of Business Alignment
Business decisions need to be aligned with changing external market conditions by processing
data in business systems together with Hadoop data lakes.
Costly Management of Big Data
Extensive amounts of data clog business systems, although much of that data can be more
efficiently archived to less expensive systems.


Gap between the Enterprise & Big Data Frameworks


WE ARE DROWNING IN OUR OWN DATA

Enterprise Core Systems and Big Data Frameworks & Tools are unable to work together
Gap: complexity, performance

Objectives: standardize, simplify and automate both worlds.

What is Apache Hadoop?


HADOOP

APACHE HADOOP is open source software that enables reliable, scalable, distributed
computing on clusters of inexpensive servers

RELIABLE: The software is fault tolerant; it expects and handles hardware and software failures

SCALABLE: Designed for massive scale of processors, memory and locally attached storage, up to petabytes
DISTRIBUTED: Handles replication and offers a massively parallel programming model, MapReduce
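
To make the MapReduce programming model mentioned above more concrete, here is a minimal single-process sketch in plain Python. It does not use the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample lines are assumptions made for illustration only. On a real cluster the map and reduce steps would run in parallel on many nodes and the framework would perform the shuffle.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key (done by the framework on a cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on a list of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input, standing in for files stored across the cluster.
sample_lines = ["big data big cluster", "data lake"]
counts = word_count(sample_lines)
print(counts)
```

Because each call to `map_phase` only looks at one line and each call to `reduce_phase` only looks at one key, both steps can be spread across machines without coordination, which is what lets the model scale out.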


Hadoop Logical Components


HADOOP


What does Hadoop bring to the Table?


HADOOP

Cost-efficient data storage and processing for large volumes of structured, semi-structured
and unstructured data such as web logs, machine data, text data, call data records, audio
and video data.
BATCH PROCESSING: Where fast response times are less critical than reliability and scalability

COMPLEX INFORMATION PROCESSING: Enables heavily recursive algorithms, machine learning and
queries that cannot be easily expressed in SQL

LOW VALUE DATA ARCHIVE: Data stays available, though access is slower. Scales up to petabytes

POST-HOC ANALYSIS: Mine raw data that is either schema-less or where the schema changes over time


Who uses Hadoop?


HADOOP

FACEBOOK
Facebook runs the world's largest Hadoop cluster. Just one of several Hadoop clusters operated by the
company spans more than 4,000 machines and houses over 100 petabytes of data.
Uses it for Facebook messaging (HBase) and to generate reports for advertisers who need to track the
effectiveness of campaigns.

YAHOO
Yahoo runs Hadoop on 42,000 servers (that's 1,200 racks) in four data centers. Its largest Hadoop
cluster was 4,000 nodes.
Uses it for indexing of web crawl results.

TWITTER
Twitter uses Hadoop for product analysis, social graph analysis, generating indices for people search,
natural language processing and many other applications.

Comparison Hadoop & HANA


HADOOP & HANA
ASPECT | HADOOP | SAP HANA
Data Architecture | Unstructured data and files on disk | Structured data in memory
Data Structures | No predefined schema | Predefined schema & models
Performance | Very slow data access (seconds to hours) | Very fast access (~<1 ms)
Scalability | Scale-out to thousands of low-cost servers | Scale-up / scale-out to many servers
Data Consistency | BASE (basic availability, soft state, eventual consistency) | ACID (atomicity, consistency, isolation, durability)
Licensing costs | Free open source or commercial distros | Many options: cloud, enterprise
OLTP | No OLTP | Excellent OLTP
OLAP | Slow OLAP | Excellent OLAP
Server Failover | Query & server failover | Server failover
Enterprise Admin Tools | Small | Excellent


Combination of HANA & Hadoop


HADOOP & HANA

SAP HANA = Instant results
HADOOP = Infinite storage + raw data
SAP & Hadoop = Instant access + infinite scale

Connection to HANA
SMART DATA ACCESS (SDA)

Benefits
Enables access to remote data just like a local table
Smart query processing, including query decomposition with predicate push-down and functional compensation
Supports data-location-agnostic development
No special syntax to access heterogeneous data sources
Not restricted to Hadoop

Heterogeneous data sources
Oracle, MS SQL, Teradata, DB2, Netezza
Hadoop Hive, vUDF, Spark
SAP HANA (BWoH, SoH)
SAP Sybase ASE, IQ, MaxDB
SAP Sybase ESP, SQLA
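
The idea of predicate push-down listed under the benefits can be illustrated with a small, purely conceptual Python sketch. It is not SAP HANA or SDA code: `RemoteSource`, the sales rows and both helper functions are hypothetical names invented for this example. The sketch only shows why sending the filter to the remote system reduces the number of rows that have to travel back.

```python
# Hypothetical rows that live in a remote system (for example a Hive table).
REMOTE_SALES = [
    {"store": "Madrid", "amount": 120},
    {"store": "Sevilla", "amount": 45},
    {"store": "Madrid", "amount": 80},
]


class RemoteSource:
    """Hypothetical stand-in for a remote data source; counts transferred rows."""

    def __init__(self, rows):
        self._rows = rows
        self.rows_transferred = 0

    def scan(self, predicate=None):
        """Return rows; if a predicate is pushed down, filter before transferring."""
        result = [row for row in self._rows if predicate is None or predicate(row)]
        self.rows_transferred += len(result)
        return result


def total_without_pushdown(source, store):
    """Fetch every row, then filter locally."""
    rows = source.scan()
    return sum(row["amount"] for row in rows if row["store"] == store)


def total_with_pushdown(source, store):
    """Send the filter to the remote source so only matching rows come back."""
    rows = source.scan(predicate=lambda row: row["store"] == store)
    return sum(row["amount"] for row in rows)


naive = RemoteSource(REMOTE_SALES)
smart = RemoteSource(REMOTE_SALES)
naive_total = total_without_pushdown(naive, "Madrid")
smart_total = total_with_pushdown(smart, "Madrid")
print(naive_total, naive.rows_transferred)
print(smart_total, smart.rows_transferred)
```

Both variants return the same total; the difference is only in how many rows cross the network, which matters when the remote table holds billions of rows.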


Example scenario for bringing both worlds together - POS


SCENARIO HADOOP - HANA


Spark
APACHE SPARK

Very fast in-memory data-processing framework: up to 100x faster than Hadoop MapReduce

Unlike Hadoop MapReduce, it supports both batch and streaming analysis --> a single framework for
batch and near-real-time use cases

Spark requires:
1) Cluster management: standalone, Hadoop YARN, Apache .
2) Distributed storage system: supports HDFS, Cassandra,
OpenStack Swift, Amazon S3

All Hadoop connectors can be leveraged in Spark

If you are going to start with Hadoop now, you should do it with Spark

SAP HANA Vora


WHAT IS INSIDE?

HANA Vora is an in-memory query engine that leverages and extends the Apache Spark
execution framework to provide enriched interactive analytics on Hadoop.

HANA Spark Adapter for improved performance between distributed systems
Compiled queries enable applications and data analysis to work more efficiently across nodes
Familiar OLAP experience on Hadoop to derive business insights from Big Data, such as drill-down into HDFS data
Integration of SAP data with data lakes
HANA connectivity on Hadoop
Enterprise analytics (hierarchies) and interactive SQL on Hadoop data
Data tiering from HANA to Hadoop for OLAP scenarios using DLM
Archiving of ERP data to Hadoop using ILM


SAP HANA Vora


USE CASE : IoT for a Turbine

Sensors stream data continuously

Sensors are typically structured in a hierarchy

Information about the hierarchy is typically stored in the ERP system

Information important for error detection: two sensors

ROLE OF HANA VORA

Providing OLAP capabilities: joining the hierarchy with IoT data

Bridges the gap between enterprise systems and the cluster: the BOM of the turbine is easily accessible

Performance of in-memory computing on both enterprise and cluster processing
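
As a rough illustration of what "joining the hierarchy with IoT data" means, the following plain-Python sketch rolls sensor readings up a component hierarchy. It does not use HANA Vora or Spark; the hierarchy (`PARENT`), the readings and the function names are hypothetical and only stand in for the ERP bill of material and the sensor stream described above.

```python
from collections import defaultdict

# Hypothetical hierarchy as it might be kept in the ERP system: child -> parent.
PARENT = {
    "sensor_a": "rotor",
    "sensor_b": "rotor",
    "sensor_c": "gearbox",
    "rotor": "turbine_1",
    "gearbox": "turbine_1",
    "turbine_1": None,
}

# Hypothetical (sensor, temperature) readings as they might arrive on the cluster.
READINGS = [
    ("sensor_a", 71.0),
    ("sensor_b", 75.0),
    ("sensor_c", 64.0),
    ("sensor_a", 73.0),
]


def ancestors(node, parent):
    """Yield every ancestor of a node, walking up to the root."""
    current = parent.get(node)
    while current is not None:
        yield current
        current = parent.get(current)


def average_by_node(readings, parent):
    """Join readings with the hierarchy and average them per component."""
    totals = defaultdict(lambda: [0.0, 0])  # node -> [sum, count]
    for sensor, value in readings:
        for node in ancestors(sensor, parent):
            totals[node][0] += value
            totals[node][1] += 1
    return {node: total / count for node, (total, count) in totals.items()}


averages = average_by_node(READINGS, PARENT)
print(averages)
```

The drill-down experience comes from the same join: a user can start at `turbine_1`, see an aggregate, and then descend to `rotor` or `gearbox` without the raw readings ever leaving the cluster.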

Key Scenarios


Key Scenarios
Example Scenarios

Flexible data store: Using Hadoop as a flexible store of data captured from multiple sources,
including SAP and non-SAP software, enterprise software, and externally sourced data

Simple database: Using Hadoop as a simple database for storing and retrieving data in very large
data sets

Processing engine: Using the computation engine in Hadoop to execute business logic or some
other process

Data analytics: Mining data held in Hadoop for business intelligence and analytics


Key Scenarios - Architecture


EXAMPLE OF USE SCENARIOS


Key Scenarios - Hadoop as Flexible Data Store


EXAMPLE OF USE SCENARIOS

SCENARIO | DESCRIPTION | SAMPLE USE CASES | COMMENT
Social Media | Real-time capture of data from social media sites, especially of unstructured text | Comments on products on Twitter, Facebook, and Amazon | Combine social media data with other data, for example CRM data or product data, in real time to gain insight
Data Stream Capture | Real-time capture of high-volume, rapidly arriving data streams | Smart meters, factory floor machines, real-time web logs, sensors in vehicles | -
Data Archive | Capture of archive logs that would otherwise be sent to off-line storage | Archive data or computer systems logs | -
OLTP Transaction Data | Long-term persistence of transactional data from historical online transaction processing (OLTP) | Call center, inventory ... | Lower costs when compared with conventional solutions


Key Scenarios - Hadoop as Flexible Data Store


EXAMPLE OF USE SCENARIOS

SCENARIO | DESCRIPTION | SAMPLE USE CASES | COMMENT
Reference Data | Copy of existing large reference data sets | Census surveys, GIS, large industry-specific data sets, weather measurement and tracking systems | Store reference data alongside other data in one place to make it easier to combine for analytic purposes
E-mail Histories | Capture logs of e-mail correspondence a company sends and receives | Fulfillment of legal requirements for e-mail persistence and for use in analytics | Combine data from e-mail with other data to support, for example, risk management
Document & Multimedia Storage | Capture of business documents generated and received by the business (BLOBs) | Healthcare, insurance and other businesses that generate or use large volumes of documents that must be kept for extended periods | Store an unlimited number of documents in Hadoop, for example, using HBase


Key Scenarios - Hadoop as Processing Engine


EXAMPLE OF USE SCENARIOS

Use Hadoop as a data processing engine for ETL rationalization to feed SAP HANA

MapReduce programs execute process logic

Pig for data analysis

Mahout for data mining and machine learning

Replicate master data to Hadoop for data processing

Feed results to SAP HANA with Data Services and merge with the conformed model


Key Scenarios - Hadoop as Processing Engine


EXAMPLE OF USE SCENARIOS

SCENARIO | DESCRIPTION | SAMPLE USE CASES | COMMENT
ETL Rationalization | Low-latency ingestion of data from operational systems | Tiered storage: high-value data loaded and transformed in HANA; in parallel, offload preprocessing to Hadoop | -
Identify Differences | Differences in large, but similar sets of data | DNA analysis | Hadoop using MapReduce
Risk Analysis | Look for known patterns in data in Hadoop that suggest risky behavior | Risk in credit cards; rogue traders | -
Data Cleansing and Enrichment | Fix data issues. Enhance with additional information | Add demographic or other data to, for example, customer web logs | -
Data Mining | Look for patterns, data clusters, and correlations in Hadoop | Analyze machine data to predict; correlate customer behaviour | Requires Mahout

Key Scenarios - Hadoop & HANA for Analytics


EXAMPLE OF USE SCENARIOS

Data volumes in Hadoop are sometimes so large that they can't be replicated into SAP HANA in a cost-effective or timely
manner

Some of the analysis must be done in Hadoop as well as in SAP HANA

Hadoop queries require longer processing times than SAP HANA queries

Analysis will likely require combining data from Hadoop, SAP HANA and other sources

Two approaches:

Two-Phase Analytics: run analysis continually on Hadoop, then send periodic updates to SAP HANA for
fast interactive query response
Federated Queries:
Split the analysis into parts and run them asynchronously on Hadoop and SAP HANA
Federate the results in SAP HANA or BI
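
The federated-query approach can be sketched in a few lines of plain Python. This is only an illustration of the pattern (split, run asynchronously, federate), not SAP or Hadoop code: `query_hadoop`, `query_hana` and the returned figures are hypothetical placeholders for the two partial analyses.

```python
from concurrent.futures import ThreadPoolExecutor


def query_hadoop():
    """Hypothetical slow part: click counts per customer mined from raw logs."""
    return {"C1": 1200, "C2": 300, "C3": 50}


def query_hana():
    """Hypothetical fast part: revenue per customer from the business system."""
    return {"C1": 900.0, "C2": 1500.0}


def federated_query():
    """Run both partial queries asynchronously, then federate the results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        clicks_future = pool.submit(query_hadoop)
        revenue_future = pool.submit(query_hana)
        clicks = clicks_future.result()
        revenue = revenue_future.result()
    # Inner join on customer id: keep only customers present in both results.
    return {
        customer: {"clicks": clicks[customer], "revenue": revenue[customer]}
        for customer in clicks.keys() & revenue.keys()
    }


combined = federated_query()
print(combined)
```

In the two-phase variant, by contrast, the Hadoop part would run continually in the background and write its output into the fast store, so the interactive query would only ever touch one system.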

Key Scenarios - Hadoop & HANA for Analytics


EXAMPLE OF USE SCENARIOS: Two-Phase Analytics


Key Scenarios - Hadoop & HANA for Analytics


EXAMPLE OF USE SCENARIOS: Federated Queries


Use Cases - Healthcare


USE CASES


Use Cases - Healthcare


EXAMPLE


Use Cases - Predictive Maintenance


EXAMPLE OF USE SCENARIOS
Business Challenges
A computer server manufacturer wants to implement effective preventative maintenance by identifying problems as
they arise and then taking prompt action to prevent the problem from occurring at other customer sites
Technical Challenges
Identifying problems by analyzing text data from call centers and customer questionnaires together with server logs
generated by their hardware
Combining results with CRM, sales and manufacturing data to predict which servers are likely to have problems in
the future
Solution
Use SAP Data Services to analyze call center data and questionnaires stored in Hadoop and identify potential
problems
Use HANA to merge results from Hadoop with server logs to identify indicators in those logs of potential problems
Combine with CRM, bill of material and production/manufacturing data to identify cases where preventative
maintenance would help


Pay-per-use Models for HANA & Hadoop


Service model defined by 5 parameters

EXAMPLE: SAP ERP 6.0 PRODUCTION system

5 standard parameters define the SAP service

Qualitative

Quantitative

Availability class: 99.5%
Managed operations: 24 x 7
Disaster-recovery class: DR, local HA, ...
Managed performance: dialog response time 90% < 1 sec.
Additional certification(s): ISAE3402 (SOX), SAS70
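
To show what two of these parameters mean in practice, here is a small Python sketch that converts the 99.5% availability class into permitted downtime and checks the "90% of dialog steps under 1 second" target against a list of measurements. The 30-day month, the sample response times and the function names are assumptions for illustration; the slide does not say how the service provider measures these values.

```python
def allowed_downtime_hours(availability_pct, period_hours):
    """Hours of downtime permitted in a period at a given availability."""
    return period_hours * (1 - availability_pct / 100)


def share_below(response_times_s, threshold_s):
    """Fraction of dialog steps that answered faster than the threshold."""
    fast = sum(1 for t in response_times_s if t < threshold_s)
    return fast / len(response_times_s)


def meets_performance_sla(response_times_s, threshold_s=1.0, required_share=0.90):
    """True if at least the required share of dialog steps beat the threshold."""
    return share_below(response_times_s, threshold_s) >= required_share


# Assumption: a 30-day month of 24 x 7 operation (720 hours).
downtime = allowed_downtime_hours(99.5, 30 * 24)

# Hypothetical dialog response times in seconds (9 of 10 are under 1 second).
sample = [0.2, 0.4, 0.3, 0.9, 0.5, 0.6, 0.7, 0.8, 0.1, 1.4]
sla_ok = meets_performance_sla(sample)

print(round(downtime, 2), sla_ok)
```

Under these assumptions, 99.5% availability leaves roughly 3.6 hours of downtime per month.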

These parameters reflect the SLAs!

These parameters reflect usage!


SLAs verifiable from within SAP

Transactions represent the actual utilization of the SAP system and are tied to the business


And what about SAP HANA?


HANA in the cloud in pay-per-use mode - vHANA


vHANA CLOUD

INCLUDED SERVICES

MONTHLY PAYMENT BASED ON THE MEMORY CONSUMED IN HANA


Hadoop in Pay Per Use based on OpenStack

Service Governance (Service Desk, Service Management)

Level 5: Hadoop Integration with SAP HANA (administration, connectivity)
Level 4: HADOOP PLATFORM Services (administration/monitoring, backup & recovery, patches, upgrades ...)
Level 3: OPENSTACK System Services (administration/monitoring, patches, upgrades ...)
Level 2: OPENSTACK FRAMEWORK (Ceph, Neutron, Nova, Heat ...)
Level 1: Data Center and Network Services (administration, monitoring, capacity management)

Hadoop in Pay Per Use based on OpenStack


HADOOP CLOUD

INCLUDED SERVICES

MONTHLY PAYMENT FOR THE MANAGED SERVICE BASED ON THE MEMORY/CPU/ CONSUMED BY HADOOP


Take Aways


Summary
TAKE AWAYS

Hadoop excels at very high scale, low cost per TB and data type flexibility

SAP HANA excels at speed and structure, and is fully integrated with Business Suite enterprise logic

Leverage the strengths of both platforms in data store, data processing and analytics scenarios

Carefully evaluate your requirements and use case against these scenarios

If you are about to start with Hadoop, use Apache Spark & Vora

Both can be deployed in a simple, pay-per-use model by Fujitsu
