UHadoop - Managed Hadoop cluster

Product Introduction

UHadoop is a managed big data service that runs on the SCloud platform. It runs Hadoop and Spark in the cloud, so users can analyze and process their own data with these engines and with other systems in the Hadoop and Spark ecosystem, such as Apache Hive, Apache Pig, and HBase.

Product architecture diagram

HDFS

HDFS is deployed in HA mode by default. Two NameNodes run on master1 and master2 respectively, and DataNodes run on all Core nodes. Task nodes do not run DataNodes.
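To see which of the two NameNodes is currently active, one option is to query the standard Hadoop /jmx endpoint on each master. The following is a minimal sketch, assuming the NameNode web port is 50070 (dfs.namenode.http-address in the port table) and that master1 and master2 resolve from the machine running the script. The helper names jmx_url and namenode_state are made up for this example.

```python
import json

# Assumption: the NameNode web UI listens on port 50070
# (dfs.namenode.http-address in the port table).
NAMENODE_HTTP_PORT = 50070
STATUS_BEAN = "Hadoop:service=NameNode,name=NameNodeStatus"


def jmx_url(host: str) -> str:
    """Build the JMX query URL for the NameNode on one master host."""
    return f"http://{host}:{NAMENODE_HTTP_PORT}/jmx?qry={STATUS_BEAN}"


def namenode_state(jmx_body: str) -> str:
    """Extract the HA state ("active" or "standby") from a /jmx JSON body."""
    beans = json.loads(jmx_body).get("beans", [])
    for bean in beans:
        if bean.get("name") == STATUS_BEAN:
            return bean["State"]
    raise ValueError("NameNodeStatus bean not found in response")
```

To use it, fetch jmx_url("master1") and jmx_url("master2") with any HTTP client (for example urllib.request.urlopen) and pass each response body to namenode_state.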

YARN

YARN is also deployed in HA mode by default. Two ResourceManagers run on master1 and master2 respectively, and NodeManagers run on all Core and Task nodes.
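A client that needs the active ResourceManager can ask each master through the ResourceManager REST API, whose cluster information resource reports an haState field. The sketch below assumes the web application port 23188 from the port table. The fetch parameter is a hypothetical hook that takes a URL and returns the response body, so the logic can be tried without a live cluster.

```python
import json
from typing import Callable, Iterable, Optional

# Assumption: the ResourceManager web apps listen on port 23188
# (yarn.resourcemanager.webapp.address.rm1 / rm2 in the port table).
RM_WEBAPP_PORT = 23188


def cluster_info_url(host: str) -> str:
    """URL of the cluster information resource on one ResourceManager."""
    return f"http://{host}:{RM_WEBAPP_PORT}/ws/v1/cluster/info"


def find_active_rm(
    fetch: Callable[[str], str],
    hosts: Iterable[str] = ("master1", "master2"),
) -> Optional[str]:
    """Return the first host whose ResourceManager reports haState ACTIVE.

    Hosts that cannot be reached or that return an unexpected body are skipped.
    Returns None if no host reports ACTIVE.
    """
    for host in hosts:
        try:
            info = json.loads(fetch(cluster_info_url(host)))
            if info["clusterInfo"]["haState"] == "ACTIVE":
                return host
        except (OSError, ValueError, KeyError, TypeError):
            continue
    return None
```

In practice, fetch could wrap urllib.request.urlopen with a short timeout. Passing it in as a parameter keeps the failover logic separate from the HTTP client.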

Hive

Hive currently supports only the on-YARN mode. Two Hive Metastore instances run on master1 and master2 and connect to local MySQL, so that the failure of a single master node does not take down the Hive service.

You can connect to the Hive service through the Hive CLI or Beeline.
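As an illustration of the Beeline route, the sketch below builds a Beeline command line from a JDBC URL. It assumes HiveServer2 is reachable on master1 at port 10000 (hive.server2.thrift.port in the port table) and that the target database is "default". The host, the database, and the user name "hadoop" in the usage note are assumptions, not values confirmed by this document.

```python
from typing import List


def hive_jdbc_url(host: str = "master1", port: int = 10000,
                  database: str = "default") -> str:
    """Build a HiveServer2 JDBC URL. The default host and database are assumptions."""
    return f"jdbc:hive2://{host}:{port}/{database}"


def beeline_command(query: str, user: str, url: str = "") -> List[str]:
    """Build a Beeline argv list.

    -u is the JDBC URL, -n is the user name, -e is the query to run.
    """
    return ["beeline", "-u", url or hive_jdbc_url(), "-n", user, "-e", query]
```

For example, beeline_command("SHOW TABLES;", "hadoop") yields an argument list that can be passed to subprocess.run on a node where Beeline is installed.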

HBase

HBase is deployed in HA mode by default. Two HMasters run on master1 and master2 respectively, and HRegionServers run on all Core nodes.
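The daemon placement described in the HDFS, YARN, Hive, and HBase sections above can be summarized as a small lookup table. This is only a sketch of the layout as stated in this document. The role labels "master", "core", and "task" and the helper roles_running are names chosen for the example.

```python
from typing import Dict, List

# Daemon placement per node role, as described in the sections above.
# "master" stands for master1 and master2.
DAEMONS_BY_ROLE: Dict[str, List[str]] = {
    "master": ["NameNode", "ResourceManager", "Hive Metastore", "HMaster"],
    "core": ["DataNode", "NodeManager", "HRegionServer"],
    "task": ["NodeManager"],
}


def roles_running(daemon: str) -> List[str]:
    """Return the node roles on which the given daemon is deployed."""
    return [role for role, daemons in DAEMONS_BY_ROLE.items() if daemon in daemons]
```

The table shows that Task nodes run only NodeManagers. They provide compute capacity and hold no HDFS or HBase data.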

Spark

Spark runs in on-YARN mode. Refer to the Spark Development Guide for details.
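Because Spark runs on YARN, jobs are submitted with --master yarn. The sketch below assembles a spark-submit command line. The jar path and main class in the usage note are placeholders, and the Spark Development Guide remains the authoritative reference for cluster-specific settings.

```python
from typing import List, Sequence


def spark_submit_command(app_jar: str, main_class: str,
                         app_args: Sequence[str] = (),
                         deploy_mode: str = "cluster") -> List[str]:
    """Build a spark-submit argv list for a job that runs on YARN."""
    if deploy_mode not in ("cluster", "client"):
        raise ValueError(f"unsupported deploy mode: {deploy_mode}")
    return [
        "spark-submit",
        "--master", "yarn",            # submit to the YARN ResourceManager
        "--deploy-mode", deploy_mode,  # where the driver runs
        "--class", main_class,
        app_jar,
        *app_args,
    ]
```

For example, spark_submit_command("spark-examples.jar", "org.apache.spark.examples.SparkPi", ["10"]) builds the arguments for the SparkPi example. Here "spark-examples.jar" is a hypothetical path.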

Product features and advantages

Convenient

Create a cluster in minutes without worrying about node allocation, deployment, or tuning. Rich examples and scenario tutorials help you get started quickly and reach your business goals.

Easy to use

Clusters are deployed automatically according to the selected hardware configuration (CPU, memory, disk) and the selected software components and versions.
Users can choose where their cluster is deployed based on their own geographic location or that of the data source. UHadoop currently supports regions such as North China, Guangzhou, and Hong Kong, and will be released to all regions supported by SCloud in the future.

Elasticity

Clusters can be any size and can be scaled up or down dynamically, which avoids wasting resources. Compute and storage can be separated.

Open

UHadoop is fully compatible with the open-source community versions of Hadoop and Spark. Customers can write jobs against the standard open-source APIs and migrate to the cloud without modification.

Secure

User clusters are located in a dedicated virtual private network to achieve complete isolation of resources.

Stable

Key cluster components such as Hadoop, Spark, and HBase support high availability, which keeps services available.

Port configuration

Configuration name	UHadoop default configuration
yarn.resourcemanager.zk-address	localhost:2181
yarn.resourcemanager.address.rm1	master1:23140
yarn.resourcemanager.address.rm2	master2:23140
yarn.resourcemanager.scheduler.address.rm1	master1:23130
yarn.resourcemanager.scheduler.address.rm2	master2:23130
yarn.resourcemanager.webapp.https.address.rm1	master1:23189
yarn.resourcemanager.webapp.https.address.rm2	master2:23189
yarn.resourcemanager.webapp.address.rm1	master1:23188
yarn.resourcemanager.webapp.address.rm2	master2:23188
yarn.resourcemanager.admin.address.rm1	master1:23141
yarn.resourcemanager.admin.address.rm2	master2:23141
yarn.resourcemanager.resource-tracker.address.rm1	master1:23125
yarn.resourcemanager.resource-tracker.address.rm2	master2:23125
yarn.nodemanager.localizer.address	0.0.0.0:23344
NM Webapp address	0.0.0.0:23999
zeppelin.server.port	master1:29090
Presto coordinator http-server.http.port	master1:28080
Presto worker http-server.http.port (core, task node)	28080
mapreduce.shuffle.port	23080
mapreduce.jobhistory.address	10020
dfs.datanode.address	50010
dfs.datanode.http.address	50075
dfs.datanode.https.address	50475
dfs.datanode.ipc.address	50020
fs.defaultFS	8020
dfs.namenode.servicerpc-address	8022
dfs.namenode.http-address	50070
dfs.namenode.https-address	50470
dfs.namenode.secondary.http-address	50090
dfs.secondary.https.address	50495
dfs.namenode.shared.edits.dir	8485
dfs.journalnode.http-address	8480
dfs.journalnode.https-address	8481
DFSZKFailoverController	8019
hbase.master.port	60000
hbase.master.info.port	60010
hbase.regionserver.port	60020
hbase.zookeeper.property.clientPort	2181
hbase.zookeeper.peerport	2888
hbase.zookeeper.leaderport	3888
hbase.rest.port	60050
hive.server2.thrift.port	10000
Hive metastore	9083
Zookeeper ClientPort	2181
Zookeeper Peer	2888, 3888
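When checking firewall rules or debugging connectivity, it can help to turn the values in the table above into host and port pairs and probe them. The sketch below handles the three value shapes used in the table: "host:port", a bare port, and a comma-separated port list. The function names are made up for this example. An address of 0.0.0.0 is a bind address, so substitute a real node name before probing it.

```python
import socket
from typing import List, Optional, Tuple


def parse_endpoint(value: str) -> Tuple[Optional[str], List[int]]:
    """Split a table value into (host, ports).

    Accepts "master1:23140", "28080", or "2888, 3888".
    The host is None when the table lists only port numbers.
    """
    host: Optional[str] = None
    if ":" in value:
        host_part, _, value = value.partition(":")
        host = host_part.strip()
    ports = [int(p) for p in value.split(",")]
    return host, ports


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For rows that list only a port, supply the host yourself, for example a Core node name for dfs.datanode.address.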