
UHadoop - Managed Hadoop cluster

Product Introduction

UHadoop is a hosted big data processing service running on the SCloud platform. By running Hadoop and Spark on the cloud platform, users can easily analyze and process their own data together with other systems in the Hadoop and Spark ecosystems (such as Apache Hive, Apache Pig, and Apache HBase).

Product architecture diagram

HDFS

HDFS is deployed in HA mode by default: two NameNodes run on master1 and master2 respectively, DataNodes are distributed across all Core nodes, and Task nodes do not run DataNodes.
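
Because only one NameNode is active at a time, it is sometimes useful to check which master currently holds the active role. A minimal sketch in Python, assuming the NameNode web UI is reachable on its default port 50070 (listed in the port table below) and that the standard NameNodeStatus JMX bean is exposed:

```python
import requests

# The NameNode publishes its HA state (active/standby) via the
# NameNodeStatus JMX bean on the web UI port (50070, per the port table).
JMX = "/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus"

for host in ("master1", "master2"):
    beans = requests.get(f"http://{host}:50070{JMX}", timeout=5).json()["beans"]
    print(host, beans[0]["State"])  # e.g. "master1 active", "master2 standby"
```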

YARN

YARN also uses an HA deployment by default: two ResourceManagers run on master1 and master2 respectively, and NodeManagers are distributed across all Core and Task nodes.
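
With two ResourceManagers, clients normally rely on the YARN client configuration to fail over automatically. To check manually which ResourceManager is active, the RM REST API can be queried on the webapp port from the port table below (23188); a minimal sketch, assuming the REST endpoint is reachable from the client host:

```python
import requests

# /ws/v1/cluster/info reports the ResourceManager's HA state.
# Port 23188 is yarn.resourcemanager.webapp.address (see the port table).
for host in ("master1", "master2"):
    info = requests.get(f"http://{host}:23188/ws/v1/cluster/info",
                        timeout=5).json()
    print(host, info["clusterInfo"]["haState"])  # ACTIVE or STANDBY
```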

Hive

Hive currently supports only the on-YARN mode. Two Hive Metastore instances are deployed on master1 and master2 and connect to a local MySQL database, which prevents a Hive service outage when a single master node goes down.

You can connect to the Hive service through the Hive CLI or Beeline.
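
For example, Beeline connects with `beeline -u jdbc:hive2://master1:10000`. From Python, a minimal sketch using the third-party PyHive package; the hostname, the port (10000, hive.server2.thrift.port in the port table below), and the database name are assumptions to adapt to your cluster:

```python
from pyhive import hive  # third-party package: pip install pyhive

# Connect to HiveServer2 on its Thrift port (hive.server2.thrift.port,
# 10000 per the port table). Host and database are assumptions.
conn = hive.Connection(host="master1", port=10000, database="default")
cursor = conn.cursor()
cursor.execute("SHOW TABLES")
print(cursor.fetchall())
conn.close()
```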

HBase

HBase is deployed in HA mode by default, with two HMasters deployed on master1 and master2 respectively and HRegionServers distributed across all Core nodes.
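
If the HBase REST gateway is enabled (port 60050, hbase.rest.port in the port table below; whether the service is running is an assumption for your cluster), a quick liveness check from Python looks like this sketch:

```python
import requests

# /version/cluster on the HBase REST gateway returns the cluster's
# HBase version; a cheap way to verify the service is responding.
resp = requests.get("http://master1:60050/version/cluster",
                    headers={"Accept": "text/plain"}, timeout=5)
print(resp.text)  # e.g. the HBase version string
```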

Spark

Spark runs in the on-YARN mode; please refer to the Spark Development Guide for details.
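
In on-YARN mode, jobs are submitted with `spark-submit --master yarn` or, from PySpark, by building a session against the yarn master. A minimal sketch; the application name is arbitrary, and it assumes the Hadoop client configuration is present on the submitting node:

```python
from pyspark.sql import SparkSession

# "yarn" as master hands scheduling to the cluster's ResourceManagers;
# the active RM is resolved from the local Hadoop configuration.
spark = (SparkSession.builder
         .master("yarn")
         .appName("uhadoop-example")  # arbitrary name
         .getOrCreate())

print(spark.range(1000).count())  # tiny job to confirm executors start
spark.stop()
```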

Product features and advantages

Convenient

Create a cluster in minutes without worrying about node allocation, deployment, or tuning. Rich examples and scenario tutorials help you get started quickly and reach your business goals.

Easy to use

  • Automated deployment according to the selected hardware model (CPU, memory, disk) and the selected software combination and versions;
  • Users can choose where their cluster is deployed based on their own geographic location or that of the data source. UHadoop currently supports regions such as North China, Guangzhou, and Hong Kong, and will be rolled out to all regions supported by SCloud in the future.

Elastic

Clusters can be scaled up or down dynamically, which effectively avoids wasting resources. Separation of compute and storage is supported.

Open

Fully compatible with the open-source community versions of Hadoop and Spark: customers can write jobs against the standard open-source APIs and migrate them to the cloud without any modification.

Secure

User clusters are located in a dedicated virtual private network, providing complete resource isolation.

Stable

Key cluster components such as Hadoop, Spark, and HBase support high availability to keep services running.

Port configuration

| Configuration name | UHadoop default configuration |
|---|---|
| yarn.resourcemanager.zk-address | localhost:2181 |
| yarn.resourcemanager.address.rm1 | master1:23140 |
| yarn.resourcemanager.address.rm2 | master2:23140 |
| yarn.resourcemanager.scheduler.address.rm1 | master1:23130 |
| yarn.resourcemanager.scheduler.address.rm2 | master2:23130 |
| yarn.resourcemanager.webapp.https.address.rm1 | master1:23189 |
| yarn.resourcemanager.webapp.https.address.rm2 | master2:23189 |
| yarn.resourcemanager.webapp.address.rm1 | master1:23188 |
| yarn.resourcemanager.webapp.address.rm2 | master2:23188 |
| yarn.resourcemanager.admin.address.rm1 | master1:23141 |
| yarn.resourcemanager.admin.address.rm2 | master2:23141 |
| yarn.resourcemanager.resource-tracker.address.rm1 | master1:23125 |
| yarn.resourcemanager.resource-tracker.address.rm2 | master2:23125 |
| yarn.nodemanager.localizer.address | 0.0.0.0:23344 |
| NodeManager webapp address | 0.0.0.0:23999 |
| zeppelin.server.port | master1:29090 |
| Presto coordinator http-server.http.port | master1:28080 |
| Presto worker http-server.http.port (Core, Task nodes) | 28080 |
| mapreduce.shuffle.port | 23080 |
| mapreduce.jobhistory.address | 10020 |
| dfs.datanode.address | 50010 |
| dfs.datanode.http.address | 50075 |
| dfs.datanode.https.address | 50475 |
| dfs.datanode.ipc.address | 50020 |
| fs.defaultFS | 8020 |
| dfs.namenode.servicerpc-address | 8022 |
| dfs.namenode.http-address | 50070 |
| dfs.namenode.https-address | 50470 |
| dfs.namenode.secondary.http-address | 50090 |
| dfs.secondary.https.address | 50495 |
| dfs.namenode.shared.edits.dir | 8485 |
| dfs.journalnode.http-address | 8480 |
| dfs.journalnode.https-address | 8481 |
| DFSZKFailoverController | 8019 |
| hbase.master.port | 60000 |
| hbase.master.info.port | 60010 |
| hbase.regionserver.port | 60020 |
| hbase.zookeeper.property.clientPort | 2181 |
| hbase.zookeeper.peerport | 2888 |
| hbase.zookeeper.leaderport | 3888 |
| hbase.rest.port | 60050 |
| hive.server2.thrift.port | 10000 |
| Hive Metastore | 9083 |
| ZooKeeper clientPort | 2181 |
| ZooKeeper peer ports | 2888, 3888 |
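
As an example of reading the table: fs.defaultFS means HDFS clients reach the active NameNode's RPC service on port 8020, i.e. hdfs://master1:8020 or hdfs://master2:8020. A minimal sketch that checks from a client host whether a few of these ports are reachable (the hostnames are assumptions matching the cluster layout above):

```python
import socket

# A few service ports from the table above; extend as needed.
CHECKS = [
    ("master1", 8020),   # fs.defaultFS (NameNode RPC)
    ("master1", 10000),  # hive.server2.thrift.port (HiveServer2)
    ("master1", 2181),   # ZooKeeper client port
]

for host, port in CHECKS:
    with socket.socket() as s:
        s.settimeout(3)
        try:
            s.connect((host, port))
            print(f"{host}:{port} reachable")
        except OSError as exc:
            print(f"{host}:{port} unreachable ({exc})")
```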