[Diagram labels: ADCPS; Community Version (Open Source); MCSYS DEV; STANDBY PROD; MCSYS PROD; PROMYST PROD; Netbackup; DBWH; PROMYST DEV]
chapter NEXT BIGDATA/Datawarehouse Infrastructure
[Diagram labels: Enterprise Edition; Staging, DIM, Fact, DM; Dedicated Storage]
chapter NEXT BIGDATA/Datawarehouse Infrastructure
[Diagram labels: Phase III and Final stages, each marked "Reporting, BI, Analytics & Data Discovery"; Enterprise Edition]
chapter Big Data Solution Use Cases
chapter ETL Solution Use Cases
chapter Cloudera Apache Hadoop Solution Components
Cluster Architecture
The cluster environment consists of several software services running on multiple physical server nodes.
The implementation divides the server nodes into several roles, and each node has a configuration optimized for its role in the cluster.
The physical server configurations fall into two broad classes:
- Data Nodes, which handle the bulk of the Hadoop processing,
- Infrastructure Nodes, which support the services needed for cluster operation.
A high-performance network fabric connects the cluster nodes and separates the core data network from management functions.
The minimum supported configuration is six nodes, although at least seven nodes are recommended. The nodes have the following roles:
Node Definitions
• Administration Node: provides cluster deployment and management capabilities. The Administration Node is optional in cluster deployments, depending on whether existing provisioning, monitoring, and management infrastructure will be used. This reference architecture does not specify the configuration for an Administration Node, since it is typically site-specific.
• Active Name Node: runs all the services needed to manage HDFS data storage and YARN resource management. This is sometimes called the “master name node.” Four primary services run on the Active Name Node:
  - Resource Manager (to support cluster resource management, including MapReduce jobs)
  - NameNode (to support HDFS data storage)
  - Journal Manager (to support high availability)
  - Zookeeper (to support coordination)
• Standby Name Node: when quorum-based HA mode is used, this node runs the standby NameNode process, a second Journal Manager, and an optional standby Resource Manager. It also runs a second Zookeeper service.
• High Availability (HA) Node: provides the third journal node for HA; the Active Name Node and the Standby Name Node provide the first and second. It also runs a third Zookeeper service.
• Edge Node: provides an interface between the data and processing capacity available in the Hadoop cluster and a user of that capacity. An Edge Node has an additional connection to the Edge Network, and is sometimes called a “gateway node.” Edge Nodes are optional, but at least one is highly recommended. The operational databases required for Cloudera Manager and additional metastores are on the first Edge Node.
• Data Node: runs all the services required to store blocks of data on the local hard drives and execute processing tasks against that data. A minimum of four Data Nodes is required, and larger clusters are scaled primarily by adding Data Nodes. Three types of services run on the Data Nodes:
  - DataNode Daemon (to support HDFS data storage)
  - NodeManager Daemon (to support YARN job execution)
  - Standalone daemons such as Impalad and HBase Region Server (for services that are not run under YARN)
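The role definitions above can be read as a small data model: each role carries a fixed set of services, and a cluster layout is a count of nodes per role. The following is a minimal sketch under that reading, not a Cloudera tool. The role keys, the service labels, and the functions `service_counts` and `check_layout` are hypothetical names chosen for illustration. The rules it checks (four Data Nodes, three journal and three Zookeeper instances, at least one Edge Node recommended) are taken from the text.

```python
from collections import Counter

# Services per node role, following the node definitions above.
# Keys and service labels are illustrative, not Cloudera identifiers.
ROLE_SERVICES = {
    "active_name_node": ["ResourceManager", "NameNode", "JournalManager", "ZooKeeper"],
    "standby_name_node": ["StandbyNameNode", "JournalManager", "ZooKeeper"],
    "ha_node": ["JournalManager", "ZooKeeper"],
    "edge_node": ["ClouderaManagerDatabases", "Metastores"],
    # Standalone daemons (Impalad, HBase Region Server) are omitted for brevity.
    "data_node": ["DataNode", "NodeManager"],
}


def service_counts(layout):
    """Count service instances for a layout given as {role: number_of_nodes}."""
    counts = Counter()
    for role, node_count in layout.items():
        for service in ROLE_SERVICES[role]:
            counts[service] += node_count
    return counts


def check_layout(layout):
    """Return a list of problems measured against the rules stated in the text."""
    problems = []
    counts = service_counts(layout)
    if layout.get("data_node", 0) < 4:
        problems.append("fewer than four Data Nodes")
    # Quorum-based HA, as described above, uses three journal and three Zookeeper instances.
    for quorum_service in ("JournalManager", "ZooKeeper"):
        if counts[quorum_service] != 3:
            problems.append(
                f"expected 3 {quorum_service} instances, found {counts[quorum_service]}"
            )
    if layout.get("edge_node", 0) < 1:
        problems.append("no Edge Node (optional, but recommended)")
    return problems


if __name__ == "__main__":
    layout = {
        "active_name_node": 1,
        "standby_name_node": 1,
        "ha_node": 1,
        "edge_node": 1,
        "data_node": 4,
    }
    print(check_layout(layout))
```

Dropping the HA Node from the layout leaves only two journal and two Zookeeper instances, which the check reports as a problem.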
chapter Hadoop On Cloud vs Hadoop On Premises
On-premise
Pros
Usually cheaper for non-elastic workloads, at least at present.
Because most cloud providers use some form of network storage, performance is usually better for bare-metal deployments.
Full control of the Hadoop hardware + software.
Latency to and from systems you integrate with can be minimized.
Physical data isolation and privacy.
Cons
Skilled hands are needed on site to install servers and networking.
The barrier to entry is usually higher because of the “hardware friction” involved in obtaining space for new infrastructure.
chapter Hadoop On Cloud vs Hadoop On Premises
Off-premise (Cloud)
Pros
Leasing cost model with easier chargeback representation.
Elasticity.
Software based managed services as an option.
Integrates well with data sources already stored in the cloud.
Cons
No control of hardware, and limited control of software – especially if you use a cloud
vendor’s distribution.
Latency when interacting with on-premise data resources.
Software level privacy, rather than hardware.
Lock-in.
High network bandwidth is required when integrating with on-premise production data.
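To make the bandwidth concern concrete, a rough transfer-time estimate can help. The sketch below is illustrative only: the function `transfer_hours`, the 10 TB data volume, the 1 Gbps link, and the 80% link efficiency are all assumptions, not figures from this deck.

```python
def transfer_hours(data_tb, link_gbps, efficiency=0.8):
    """Estimate hours needed to move data_tb terabytes over a link_gbps link.

    efficiency is an assumed fraction of nominal bandwidth that is actually usable.
    Uses decimal units: 1 TB = 1e12 bytes, 1 Gbps = 1e9 bits per second.
    """
    total_bits = data_tb * 1e12 * 8
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return total_bits / usable_bits_per_second / 3600


if __name__ == "__main__":
    # Hypothetical example: a 10 TB extract over a 1 Gbps link.
    print(f"{transfer_hours(10, 1):.1f} hours")
```

Under these assumptions a single 10 TB load takes more than a day, which is why recurring integration with on-premise production data pushes the link size up.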
chapter Hadoop On Bare Metal Server vs On Virtual Server
ORACLE SOLUTION

Item                                             Quantity    Unit price (USD)   Cost (IDR)
Oracle License                                   144 cores   $47,500            Rp102,600,000,000
Annual Technical Support (yearly subscription)   144 cores   $11,875            Rp25,650,000,000
Dedicated Storage                                1 storage   $400,000           Rp6,000,000,000
Total                                                                           Rp134,250,000,000

Strengths: powerful for OLTP; used in traditional data warehouses.
Weaknesses: high cost of investment; no support for unstructured data.
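The cost figures above can be cross-checked with simple arithmetic. The sketch below assumes the dollar amounts are unit prices (per core for the license and support, per unit for storage). That reading is an inference from the numbers, not something the slide states. Under it, every line implies the same exchange rate, and the three rupiah amounts add up to the Rp134,250,000,000 figure.

```python
# Figures copied from the Oracle cost breakdown above.
# Treating the USD amounts as unit prices is an assumption.
ITEMS = [
    # (name, quantity, unit price in USD, cost in IDR)
    ("Oracle License", 144, 47_500, 102_600_000_000),
    ("Annual Technical Support", 144, 11_875, 25_650_000_000),
    ("Dedicated Storage", 1, 400_000, 6_000_000_000),
]
STATED_TOTAL_IDR = 134_250_000_000


def implied_rate(quantity, unit_usd, cost_idr):
    """Return the IDR-per-USD rate implied by one line item."""
    return cost_idr / (quantity * unit_usd)


def total_idr(items):
    """Sum the rupiah cost column."""
    return sum(cost for _, _, _, cost in items)


if __name__ == "__main__":
    for name, qty, usd, idr in ITEMS:
        print(f"{name}: {implied_rate(qty, usd, idr):,.0f} IDR per USD")
    print("Sum matches stated total:", total_idr(ITEMS) == STATED_TOTAL_IDR)
    # Yearly support as a fraction of the license unit price.
    print("Support / license:", 11_875 / 47_500)
```

On these figures the yearly support subscription is a quarter of the license price, so the recurring cost alone reaches the license cost again after four years.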