DWH Roadmap Updated

chapter Existing Datawarehouse Infrastructure
[Diagram: two servers (IP 192.168.75.2 and IP 192.168.75.3) with components labelled Community Version (Open Source), holding Staging, DIM, Fact and DM layers on DELL EMC VNX 5400 storage. Labelled systems: ADCPS, MCSYS DEV, MCSYS PROD, STANDBY PROD, PROMYST DEV, PROMYST PROD, DBWH, Netbackup.]
chapter NEXT BIGDATA/Datawarehouse Infrastructure
PHASE I: Reporting, BI, Analytics & Data Discovery. Enterprise Edition. Staging, DIM, Fact, DM. Dedicated storage.
PHASE II: Reporting, BI, Analytics & Data Discovery. Staging. DWH (DIM, FACT, DM). Enterprise Edition.
PHASE III: Reporting, BI, Analytics & Data Discovery. Enterprise Edition. DWH (DIM, FACT, DM).


chapter Modernize Datawarehouse Infrastructure on Hadoop (BIGDATA PLATFORM)
FINAL: Reporting, BI, Analytics & Data Discovery. Enterprise Edition.
chapter Big Data Solution Use Cases
chapter ETL Solution Use Cases
chapter Cloudera Apache Hadoop Solution Components
chapter
chapter
chapter Cluster Architecture
The cluster environment consists of multiple software services running across multiple physical server nodes.
The implementation divides the server nodes into several roles, and each node has a configuration optimized for its role in the cluster.
The physical server configurations fall into two broad classes:
- Data Nodes, which handle the bulk of the Hadoop processing,
- Infrastructure Nodes, which support the services required for cluster operation.
A high-performance network fabric connects the cluster nodes together and separates the core data network from management functions.
chapter
chapter
The minimum supported configuration is six nodes, although at least seven nodes are recommended. The nodes have the following roles:
chapter Node Definitions
• Administration Node: provides cluster deployment and management capabilities. The Administration Node is optional in cluster deployments, depending on whether existing provisioning, monitoring, and management infrastructure will be used. This reference architecture does not specify the configuration for an Administration Node, since it is typically site-specific.
• Active Name Node: runs all the services needed to manage the HDFS data storage and YARN resource management. This is sometimes called the “master name node.” There are four primary services running on the Active Name Node:
  - Resource Manager (to support cluster resource management, including MapReduce jobs)
  - NameNode (to support HDFS data storage)
  - Journal Manager (to support high availability)
  - Zookeeper (to support coordination)
• Standby Name Node: when quorum-based HA mode is used, this node runs the standby NameNode process, a second Journal Manager, and an optional standby Resource Manager. This node also runs a second Zookeeper service.
• High Availability (HA) Node: provides the third journal node for HA; the Active Name Node and Standby Name Node provide the first and second journal nodes. It also runs a third Zookeeper service.
• Edge Node: provides an interface between the data and processing capacity available in the Hadoop cluster and a user of that capacity. An Edge Node has an additional connection to the Edge Network, and is sometimes called a “gateway node.” Edge Nodes are optional, but at least one is highly recommended. The operational databases required for Cloudera Manager and additional metastores are on the first Edge Node.
• Data Node: runs all the services required to store blocks of data on the local hard drives and execute processing tasks against that data. A minimum of four Data Nodes are required, and larger clusters are scaled primarily by adding Data Nodes. There are three types of services running on the Data Nodes:
  - DataNode Daemon (to support HDFS data storage)
  - NodeManager Daemon (to support YARN job execution)
  - Standalone daemons such as Impalad and HBase Region Server (for services that are not run under YARN)
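The node roles above can be written down as a small lookup table and checked mechanically. The following is a minimal sketch, not part of the reference architecture: the names `ROLE_SERVICES`, `count_service`, `tolerated_failures` and `check_layout` are hypothetical, the optional standby Resource Manager and the standalone daemons are left out, and the threshold of three journal and Zookeeper instances is taken from the three-node HA layout described in the definitions.

```python
# Services per role, transcribed from the node definitions above.
# The role keys and service strings are illustrative labels, not Cloudera identifiers.
ROLE_SERVICES = {
    "active_name_node": ["ResourceManager", "NameNode", "JournalManager", "ZooKeeper"],
    "standby_name_node": ["StandbyNameNode", "JournalManager", "ZooKeeper"],
    "ha_node": ["JournalManager", "ZooKeeper"],
    "edge_node": ["ClouderaManagerDatabases", "Metastores"],
    "data_node": ["DataNode", "NodeManager"],
}

MIN_DATA_NODES = 4    # stated in the Data Node definition
HA_QUORUM_MEMBERS = 3  # first, second and third journal/Zookeeper instances


def count_service(layout, service):
    """Count how many instances of a service a layout {role: node_count} runs."""
    return sum(n * ROLE_SERVICES[role].count(service) for role, n in layout.items())


def tolerated_failures(members):
    """A majority quorum of `members` keeps working after (members - 1) // 2 failures."""
    return (members - 1) // 2


def check_layout(layout):
    """Return a list of problems; an empty list means the layout passes these checks."""
    problems = []
    if layout.get("data_node", 0) < MIN_DATA_NODES:
        problems.append("fewer than 4 Data Nodes")
    for service in ("JournalManager", "ZooKeeper"):
        if count_service(layout, service) < HA_QUORUM_MEMBERS:
            problems.append(f"fewer than 3 {service} instances")
    return problems
```

For example, a layout of one Active Name Node, one Standby Name Node, one HA Node, one Edge Node and four Data Nodes passes `check_layout`, and `tolerated_failures(3)` shows that the three-member quorum survives the loss of one member.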
chapter
chapter Hadoop On Cloud vs Hadoop On Premises
On-premise
Pros
• Usually cheaper for non-elastic workloads, at present.
• Because most cloud providers use some form of network storage, performance is usually better for bare-metal deployments.
• Full control of the Hadoop hardware and software.
• Latency to and from systems you integrate with can be minimized.
• Physical data isolation and privacy.
Cons
• Smart hands are needed to install the servers and network.
• The barrier to entry is usually higher because of the “hardware friction” involved in getting new infrastructure in place.
chapter Hadoop On Cloud vs Hadoop On Premises
Off-premise (Cloud)
Pros
• Leasing cost model with easier chargeback representation.
• Elasticity.
• Software-based managed services as an option.
• Integrates well with data sources already stored in the cloud.
Cons
• No control of hardware, and limited control of software, especially if you use a cloud vendor’s distribution.
• Latency when interacting with on-premise data resources.
• Software-level privacy, rather than hardware-level.
• Lock-in.
• High network bandwidth is needed when integrating with on-premise production data.
chapter Hadoop On Bare Metal Server vs On Virtual Server
Bare metal eliminates the latency associated with virtual machines and their virtualized network and I/O. One of the strongest use cases for bare-metal cloud is workloads that require the lowest latency.
chapter Investment Options of Datawarehouse Technology Specification

HADOOP SOLUTION (CLOUDERA)
Name | Quantity | Pricelist | Investment
Hadoop (Cloudera) | 10 Nodes | Rp150,000,000 | Rp1,500,000,000
Server | 10 Servers | Rp450,000,000 | Rp4,500,000,000
Implementation Service | 1 Service | Rp150,000,000 | Rp150,000,000
RedHat Operating System (Subscription Yearly) | 10 Servers | Rp18,186,000 | Rp181,860,000
Total Investment: Rp6,331,860,000
PROS: Better performance thanks to cluster technology. Can hold both structured and unstructured data. Low cost of investment. Active archive and data warehouse features.
CONS: More servers required.

ORACLE SOLUTION
Name | Quantity | Pricelist | Investment
Oracle License | 144 Cores | $47,500 | Rp102,600,000,000
Annual Technical Support | 144 Cores | $11,875 | Rp25,650,000,000
Dedicated Storage | 1 Storage | $400,000 | Rp6,000,000,000
Total Investment: Rp134,250,000,000
PROS: Powerful for OLTP. Used in traditional data warehouses.
CONS: High cost of investment. No support for unstructured data.

POSTGRE ENTERPRISE DB SOLUTION
Name | Quantity | Pricelist | Investment
PostgreEDB License | 144 Cores | $4,500 | Rp9,720,000,000
Annual Technical Support | 144 Cores | $1,125 | Rp2,430,000,000
Dedicated Storage | 1 Storage | $400,000 | Rp6,000,000,000
Total Investment: Rp18,150,000,000
PROS: Powerful for OLTP. Used in traditional data warehouses.
CONS: High cost of investment. No support for unstructured data.

ENTERPRISE ETL TOOLS
Name | Quantity | Pricelist | Investment
Pentaho Enterprise | 1 Server | $146,520 | Rp2,197,800,000
Total Investment: Rp2,197,800,000
PROS: Security features, user management, job scheduler, Big Data analytics tools and 24x7 support.
CONS: High cost of investment.
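The totals in the investment table can be reproduced with simple arithmetic. The sketch below is only a cross-check under one assumption: the exchange rate of Rp15,000 per US dollar is not stated in the source and is inferred from the table's own figures (for example, $400,000 listed as Rp6,000,000,000). The names `USD_TO_IDR`, `OPTIONS`, `line_total_idr` and `option_total_idr` are hypothetical.

```python
# Assumption: rate inferred from the table, not stated in the source.
USD_TO_IDR = 15_000

# Each line item is (quantity, unit price, currency), copied from the table.
OPTIONS = {
    "Hadoop (Cloudera)": [
        (10, 150_000_000, "IDR"),  # Hadoop (Cloudera), per node
        (10, 450_000_000, "IDR"),  # Server
        (1, 150_000_000, "IDR"),   # Implementation Service
        (10, 18_186_000, "IDR"),   # RedHat OS, yearly subscription
    ],
    "Oracle": [
        (144, 47_500, "USD"),      # License, per core
        (144, 11_875, "USD"),      # Annual Technical Support, per core
        (1, 400_000, "USD"),       # Dedicated Storage
    ],
    "PostgreEDB": [
        (144, 4_500, "USD"),
        (144, 1_125, "USD"),
        (1, 400_000, "USD"),
    ],
    "Pentaho Enterprise": [
        (1, 146_520, "USD"),
    ],
}


def line_total_idr(quantity, unit_price, currency):
    """Convert one line item to rupiah: quantity x unit price x exchange rate."""
    rate = USD_TO_IDR if currency == "USD" else 1
    return quantity * unit_price * rate


def option_total_idr(name):
    """Sum all line items of one investment option, in rupiah."""
    return sum(line_total_idr(*item) for item in OPTIONS[name])


if __name__ == "__main__":
    for name in OPTIONS:
        print(f"{name}: Rp{option_total_idr(name):,}")
```

Under the assumed rate, the computed totals match the Total Investment figures listed for all four options.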
