Atlas 2.1.0 in Practice: Installing Atlas and Integrating Hive

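Guiding questions:
1. How do you install Atlas?
2. How should the Hive model be understood?
3. How do you configure the Hive hook?
4. How do you import Hive metadata into Atlas?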

Previous article: Atlas 2.1.0 in Practice: Compiling Atlas

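Installing Atlas

Once Atlas has been compiled, it can be installed. Installing Atlas mainly means installing the Atlas server, which also provides the Atlas admin UI, and making sure Atlas is integrated with Kafka, HBase, Solr and the other components.

The Atlas system architecture is shown in the figure below. Once the underlying storage and the UI are confirmed to be working, you can move on to integrating and debugging with Hive and other components.

[Figure: Atlas system architecture]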

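I. Preparing the environment

Before installing, have the following ready:

JDK 1.8
ZooKeeper
Kafka
HBase
Solr

Atlas is configured with the addresses of these components when it starts, so make sure all of them are running properly.

Because the build can optionally bundle embedded components, Atlas can ship with these itself, but the JDK must always be installed.

When installing Atlas against an existing Solr, the collections must be created in Solr beforehand: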

bin/solr create -c vertex_index -shards 3 -replicationFactor 2
bin/solr create -c edge_index -shards 3 -replicationFactor 2
bin/solr create -c fulltext_index -shards 3 -replicationFactor 2

Then verify in Solr that the collections were created successfully.
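One way to do this check without opening the Solr admin page is to query the Solr Collections API (`/solr/admin/collections?action=LIST&wt=json`) and compare the result with the three names created above. The sketch below covers only the comparison step and takes the JSON response body as a string. The function name `missing_collections` and the sample response are illustrative assumptions, not part of Atlas or Solr.

```python
import json

# Collection names Atlas expects, as created with bin/solr above.
REQUIRED_COLLECTIONS = ("vertex_index", "edge_index", "fulltext_index")


def missing_collections(list_response: str) -> list:
    """Return the required collections absent from a Collections API LIST response."""
    existing = set(json.loads(list_response).get("collections", []))
    return [name for name in REQUIRED_COLLECTIONS if name not in existing]


# Hypothetical response body; in practice, fetch it from
# http://<solr-host>:<port>/solr/admin/collections?action=LIST&wt=json
sample = '{"responseHeader": {"status": 0}, "collections": ["vertex_index", "edge_index"]}'
print(missing_collections(sample))  # → ['fulltext_index']
```

An empty list means all three collections exist. Anything else names the collections that still have to be created.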

II. Installing Atlas

Go to the directory that holds the build output, apache-atlas-sources-2.1.0/distro/target.

Copy the generated package, apache-atlas-2.1.0-server.tar.gz, to the target path.

Extract it:

tar -zxvf apache-atlas-2.1.0-server.tar.gz

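III. Editing the configuration

Go to the conf directory:

vi atlas-env.sh

Set JAVA_HOME here and choose whether to start the embedded components:

export JAVA_HOME=/opt/jdk1.8.0_191/
export MANAGE_LOCAL_HBASE=true
export MANAGE_LOCAL_SOLR=true

If you use the embedded components, configuration is complete and you can go straight to starting Atlas.

Most of the time, however, you will want to integrate with components you already run, so set these to false.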

export JAVA_HOME=/opt/jdk1.8.0_191/
export MANAGE_LOCAL_HBASE=false
export MANAGE_LOCAL_SOLR=false
# Note: set this to the HBase configuration directory
export HBASE_CONF_DIR=/opt/hbase/conf

Edit the remaining configuration:

vim atlas-application.properties

This is where the HBase, Solr and other settings go:

# HBase address: the ZooKeeper quorum configured for HBase
atlas.graph.storage.hostname=slave01:2181,slave02:2181,slave03:2181

atlas.audit.hbase.zookeeper.quorum=slave01:2181,slave02:2181,slave03:2181

# Solr server address
atlas.graph.index.search.solr.http-urls=http://slave01:8984/solr

# Kafka addresses
atlas.notification.embedded=false
atlas.kafka.zookeeper.connect=slave01:2181,slave02:2181,slave03:2181
atlas.kafka.bootstrap.servers=slave01:9092,slave02:9092,slave03:9092

# Atlas address
atlas.rest.address=http://slave01:21000
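Since a missing or empty key here usually only shows up as a startup failure, a quick pre-flight check of the file can save a restart cycle. The following is a minimal sketch that assumes a simple `key=value` file with `#` comments. It does not handle line continuations or the other separators that Java properties files allow. The helper names and the choice of "required" keys (taken from the settings listed above) are assumptions for illustration.

```python
# Keys taken from the settings discussed above; adjust to your deployment.
REQUIRED_KEYS = (
    "atlas.graph.storage.hostname",
    "atlas.graph.index.search.solr.http-urls",
    "atlas.kafka.zookeeper.connect",
    "atlas.kafka.bootstrap.servers",
    "atlas.rest.address",
)


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def missing_keys(text: str, required=REQUIRED_KEYS) -> list:
    """Return required keys that are absent or have an empty value."""
    props = parse_properties(text)
    return [key for key in required if not props.get(key)]


sample = """
# HBase address
atlas.graph.storage.hostname=slave01:2181,slave02:2181,slave03:2181
atlas.rest.address=http://slave01:21000
"""
print(missing_keys(sample))
```

In practice the text would come from reading conf/atlas-application.properties. The sample string stands in for that file here.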

IV. Starting Atlas

bin/atlas_start.py

After a successful start, open:

http://slave01:21000

Log in with admin/admin.

Success!
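The same check can be scripted instead of done in a browser. The sketch below builds an HTTP Basic-auth request against the Atlas admin status endpoint, assumed here to be `/api/atlas/admin/status`; confirm the path against your Atlas version. The host and the admin/admin credentials are the ones used above, and the function names are hypothetical.

```python
import base64
import json
import urllib.request


def build_status_request(base_url: str, user: str, password: str) -> urllib.request.Request:
    """Build a Basic-auth GET request for the (assumed) Atlas admin status endpoint."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(base_url.rstrip("/") + "/api/atlas/admin/status")
    request.add_header("Authorization", "Basic " + token)
    return request


def fetch_status(base_url: str, user: str, password: str, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON body; raises on HTTP or network errors."""
    request = build_status_request(base_url, user, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage against a running server (not executed here):
#   print(fetch_status("http://slave01:21000", "admin", "admin"))
```

Keeping request construction separate from the network call makes the URL and header logic easy to check without a live server.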

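Pitfalls and fixes

HBase: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/hbaseid

In my case this HBase error occurred because Atlas had not been pointed at the HBase configuration files (HBASE_CONF_DIR).

The zookeeper.znode.parent set in the HBase configuration was not /hbase, so the node /hbase/hbaseid did not exist.

could not instantiate implementation: org.janusgraph.diskstorage.solr.Solr6Index
cannot connect to cluster at ... cluster not found /not ready

Atlas cannot find Solr. When setting atlas.graph.index.search.solr.zookeeper-url, the znode has to be appended, for example 2181/solr.

Look up the actual znode in the Solr configuration files or on the Solr admin page.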

could not register new index field with index backend

Something is wrong with Solr; check that Solr has started correctly.

Can not find the specified config set: vertex_index

The three indexes vertex_index, edge_index and fulltext_index must be created in Solr first.

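Integrating Atlas with Hive

After Atlas is installed, it still has to be connected to other components before it becomes useful.

The most commonly used of these is Hive.

Given the Atlas architecture, once the Hive hook is configured, each Hive operation is written to Kafka and then received by Atlas,

which displays it as a graph.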

Hive Model

Which Hive operations are recorded? Atlas defines a Hive model for this.

It contains the following:
1. Entity types:

hive_db

Super type: Asset

Attributes: qualifiedName, name, description, owner, clusterName, location, parameters, ownerName

hive_table

Super type: DataSet

Attributes: qualifiedName, name, description, owner, db, createTime, lastAccessTime, comment, retention, sd, partitionKeys, columns, aliases, parameters, viewOriginalText, viewExpandedText, tableType, temporary

hive_column

Super type: DataSet

Attributes: qualifiedName, name, description, owner, type, comment, table

hive_storagedesc

Super type: Referenceable

Attributes: qualifiedName, table, location, inputFormat, outputFormat, compressed, numBuckets, serdeInfo, bucketCols, sortCols, parameters, storedAsSubDirectories

hive_process

Super type: Process

Attributes: qualifiedName, name, description, owner, inputs, outputs, startTime, endTime, userName, operationType, queryText, queryPlan, queryId, clusterName

hive_column_lineage

Super type: Process

Attributes: qualifiedName, name, description, owner, inputs, outputs, query, depenendencyType, expression

2. Enum types:

hive_principal_type values: USER, ROLE, GROUP

3. Struct types:

hive_order attributes: col, order

hive_serde attributes: name, serializationLib, parameters

Naming structure of Hive entities:

hive_db.qualifiedName:     <dbName>@<clusterName>
hive_table.qualifiedName:  <dbName>.<tableName>@<clusterName>
hive_column.qualifiedName: <dbName>.<tableName>.<columnName>@<clusterName>
hive_process.queryString:  trimmed query string in lower case
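These qualifiedName patterns are what you would use to look up a specific Hive entity in Atlas, so it can help to generate them rather than type them by hand. The helpers below are a small sketch that only applies the three patterns listed above. The function names and the `sales`/`orders`/`amount` examples are made up for illustration, and `primary` is the cluster name used in the hook configuration in this article.

```python
def db_qualified_name(db: str, cluster: str) -> str:
    """<dbName>@<clusterName>"""
    return f"{db}@{cluster}"


def table_qualified_name(db: str, table: str, cluster: str) -> str:
    """<dbName>.<tableName>@<clusterName>"""
    return f"{db}.{table}@{cluster}"


def column_qualified_name(db: str, table: str, column: str, cluster: str) -> str:
    """<dbName>.<tableName>.<columnName>@<clusterName>"""
    return f"{db}.{table}.{column}@{cluster}"


print(db_qualified_name("sales", "primary"))                          # → sales@primary
print(table_qualified_name("sales", "orders", "primary"))             # → sales.orders@primary
print(column_qualified_name("sales", "orders", "amount", "primary"))  # → sales.orders.amount@primary
```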

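Configuring the Hive hook

The Hive hook listens for Hive create/update/delete operations. The configuration steps are as follows:

1. Edit hive-env.sh (to point at the hook jars)

export HIVE_AUX_JARS_PATH=/opt/apps/apache-atlas-2.1.0/hook/hive

2. Edit hive-site.xml (Hive must be restarted after this change)

<property>
    <name>hive.exec.post.hooks</name>
    <value>org.apache.atlas.hive.hook.HiveHook</value>
</property>

Note that this registers a post-execution hook; Hive also supports hooks at other stages, such as before execution.

3. Sync the configuration: copy the Atlas configuration file atlas-application.properties into the Hive configuration directory and add the following settings: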

atlas.hook.hive.synchronous=false
atlas.hook.hive.numRetries=3
atlas.hook.hive.queueSize=10000
atlas.cluster.name=primary
atlas.rest.address=http://doit33:21000
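Before restarting Hive, it can be worth confirming that hive-site.xml really registers the hook, since a typo in the class name only surfaces later as a class-not-found error. The sketch below parses the XML with the Python standard library and checks the comma-separated value of `hive.exec.post.hooks`. The function names and the sample document are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

ATLAS_HOOK = "org.apache.atlas.hive.hook.HiveHook"


def post_hooks(hive_site_xml: str) -> list:
    """Return the classes listed in hive.exec.post.hooks (comma-separated), or []."""
    root = ET.fromstring(hive_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == "hive.exec.post.hooks":
            value = prop.findtext("value", default="")
            return [hook.strip() for hook in value.split(",") if hook.strip()]
    return []


def has_atlas_hook(hive_site_xml: str) -> bool:
    """True if the Atlas Hive hook is among the configured post-execution hooks."""
    return ATLAS_HOOK in post_hooks(hive_site_xml)


# Hypothetical minimal hive-site.xml content.
sample = """
<configuration>
  <property>
    <name>hive.exec.post.hooks</name>
    <value>org.apache.atlas.hive.hook.HiveHook</value>
  </property>
</configuration>
"""
print(has_atlas_hook(sample))  # → True
```

The comparison is case-sensitive on purpose: the class name must match exactly for Hive to load it.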

Importing Hive metadata into Atlas

bin/import-hive.sh

Using Hive configuration directory [/opt/module/hive/conf]

Log file for import is /opt/module/atlas/logs/import-hive.log

log4j:WARN No such property [maxFileSize] in org.apache.log4j.PatternLayout.

log4j:WARN No such property [maxBackupIndex] in org.apache.log4j.PatternLayout.

Enter the username admin and the password admin:

Enter username for atlas :- admin

Enter password for atlas :-

Hive Meta Data import was successful!!!

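Pitfalls and fixes
I. Class not found: org.apache.atlas.hive.hook.HiveHook

The third-party jars were not added to Hive.

Tip: in the Hive shell, run set to check whether the jars were picked up; it prints the list of configuration variables overridden by the user or by Hive.

Taking elasticsearch-hadoop-2.1.2.jar as an example, here are several ways to add a third-party jar to Hive.

1. Add it in the Hive shell

hive> add jar /home/hadoop/elasticsearch-hadoop-hive-2.1.2.jar;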

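2. Put the jar in ${HIVE_HOME}/auxlib

Create a folder named auxlib under ${HIVE_HOME} and put the custom jar files in it. This method does not require restarting Hive and is fairly convenient.

3. HIVE_AUX_JARS_PATH and hive.aux.jars.path

The HIVE_AUX_JARS_PATH setting in hive-env.sh and the hive.aux.jars.path setting in hive-site.xml have no effect on the server; they apply only to the current Hive shell. Different Hive shells do not affect each other, and each one must be configured. Both settings support local files only, and each can point to either individual files or a directory.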

II. Hive error: Failing because I am unlikely to write too

HIVE_AUX_JARS_PATH is configured incorrectly.

The hive-env.sh script contains this section:

# Folder containing extra libraries required for hive compilation/execution can be controlled by:
if [ "${HIVE_AUX_JARS_PATH}" != "" ]; then
  export HIVE_AUX_JARS_PATH=${HIVE_AUX_JARS_PATH}
elif [ -d "/usr/hdp/current/hive-webhcat/share/hcatalog" ]; then
  export HIVE_AUX_JARS_PATH=/usr/hdp/current/hive-webhcat/share/hcatalog
fi

If HIVE_AUX_JARS_PATH is given a value, /usr/hdp/current/hive-webhcat/share/hcatalog is ignored.

Hive reads only a single HIVE_AUX_JARS_PATH.

So keep the shared jars in one central place, then create the corresponding symlinks under /usr/hdp/current/hive-webhcat/share/hcatalog:

sudo -u hive ln -s /usr/lib/share-lib/elasticsearch-hadoop-2.1.0.Beta4.jar /usr/hdp/current/hive-webhcat/share/hcatalog/elasticsearch-hadoop-2.1.0.Beta4.jar

Author: 独孤风
Source: https://mp.weixin.qq.com/s/1I-NpCQfw5XhrDGRzwweUQ
