about云 Log Analysis Project Preparation 9-1: Installing and Using Flume 1.7 to Handle a Growing Number of Log Files and Appended Data

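Guiding questions
1. Which Flume property can be used for files that are continuously appended to?
2. Which Flume property can be used when files are continuously appended to and the number of files also changes?
3. How should Flume be configured to collect website logs?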

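Background:
While collecting logs, existing log files keep being appended to and the number of log files keeps growing. In Flume 1.6, tail -f can handle a file that is continuously appended to, but the number of log files changes over time and there is never just one file, so tail -f alone can no longer solve the log collection problem.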

Requirement:
We need to monitor files that are continuously appended to, while the number of files also keeps changing.

Solution:
This is where Flume 1.7 comes in: its TAILDIR source solves the problem well. TAILDIR can monitor the files in a directory.

Official documentation: http://flume.apache.org/FlumeUserGuide.html

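Official documentation screenshot: (screenshot of the TAILDIR property table)

In that table, the properties in bold are the required ones.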

Here we use only the following two properties:
a1.sources.source1.filegroups.f1 = /data/aboutyunlog/.*log.*
a1.sources.source1.type = TAILDIR
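
To get a feel for which file names the filegroup pattern above would pick up, the pattern can be tried out in the shell before starting Flume. This is only a rough sketch: it assumes that the directory part (/data/aboutyunlog) is taken literally and that the regular expression must match the whole file name, and it uses grep -E as a stand-in for Java's regex engine. The function name matches_filegroup and the sample file names are made up for illustration.

```shell
# Approximate check: does a file name match the filegroup regex ".*log.*"?
# Assumption: the regex applies to the file name only (not the directory)
# and must match the whole name, so it is anchored with ^ and $ here.
matches_filegroup() {
  printf '%s\n' "$1" | grep -Eq '^.*log.*$'
}

# Hypothetical file names; only names containing "log" should match.
for name in access.log access.log.2017-01-01 weblog_01.txt data.txt; do
  if matches_filegroup "$name"; then
    echo "$name: tailed"
  else
    echo "$name: ignored"
  fi
done
```

Since the pattern only requires "log" somewhere in the name, rotated files such as access.log.2017-01-01 are covered as well, which is what makes it suitable for a changing number of log files.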

Flume 1.7 installation package
Link: http://pan.baidu.com/s/1c1Pzo9i Password: fxa4

I. Installing Flume

1. Extract the installation package

tar -zxvf ~/jar/apache-flume-1.7.0-bin.tar.gz -C /data
mv /data/apache-flume-1.7.0-bin/ /data/flume-1.7.0 # rename

2. Configure environment variables

echo -e "export FLUME_HOME=/data/flume-1.7.0\nexport PATH=\$FLUME_HOME/bin:\$PATH" >> ~/.bashrc
source ~/.bashrc

3. Configure Flume

cd $FLUME_HOME/conf
cp flume-env.sh.template flume-env.sh

Then edit flume-env.sh and set JAVA_HOME:

export JAVA_HOME=/data/jdk1.8.0_111

4. Verify the installation

flume-ng version

II. Using Flume

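An agent consists of a source, a channel, and a sink. Here we use the Taildir Source (in place of the Spooling Directory Source, which is left commented out in the configuration), a File Channel, and a Kafka Sink.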

1. Single-node agent
1) Add a configuration file

cd $FLUME_HOME/conf
vim single_agent.conf

Copy the following content into it:

# the agent is named a1
a1.sources = source1
a1.channels = channel1
a1.sinks = sink1
# set source
#a1.sources.source1.type = spooldir
a1.sources.source1.type = TAILDIR
a1.sources.source1.filegroups = f1
a1.sources.source1.filegroups.f1 = /data/aboutyunlog/.*log.*
#a1.sources.source1.spoolDir=/data/aboutyunlog
a1.sources.source1.fileHeader = false
# set sink
a1.sinks.sink1.type = org.apache.flume.sink.kafka.KafkaSink
#a1.sinks.sink1.kafka.bootstrap.servers = master:9092,slave1:9092,slave2:9092
a1.sinks.sink1.brokerList= master:9092,slave1:9092,slave2:9092
a1.sinks.sink1.topic= aboutyunlog
a1.sinks.sink1.kafka.flumeBatchSize = 20
a1.sinks.sink1.kafka.producer.acks = 1
a1.sinks.sink1.kafka.producer.linger.ms = 1
a1.sinks.sink1.kafka.producer.compression.type = snappy
# set channel
a1.channels.channel1.type = file
a1.channels.channel1.checkpointDir = /data/flume_data/checkpoint
a1.channels.channel1.dataDirs= /data/flume_data/data
# bind
a1.sources.source1.channels = channel1
a1.sinks.sink1.channel = channel1
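
A key typed without the dot after the agent name (for example a1sources.source1.fileHeader instead of a1.sources.source1.fileHeader) is easy to miss in a file like this. As far as I understand, Flume takes the text before the first dot as the agent name, so such a line would not be applied to a1 and no error would be shown. The sketch below lists such lines before the agent is started. check_agent_prefix is a hypothetical helper written for this post, not part of Flume.

```shell
# List every non-comment, non-blank line of a Flume properties file whose key
# does not start with "<agent>." (hypothetical helper, not part of Flume).
# Returns 0 when every key has the expected prefix, 1 when a mismatch is found.
check_agent_prefix() {
  conf_file=$1
  agent=$2
  awk -v prefix="$agent." '
    /^[ \t]*#/ { next }                  # skip comment lines
    /^[ \t]*$/ { next }                  # skip blank lines
    {
      line = $0
      sub(/^[ \t]+/, "", line)           # ignore leading whitespace
      if (index(line, prefix) != 1) {    # key does not begin with "<agent>."
        print FILENAME ":" FNR ": " $0
        bad = 1
      }
    }
    END { exit (bad ? 1 : 0) }
  ' "$conf_file"
}

# Example call (assumes the file created in this step):
#   check_agent_prefix "$FLUME_HOME/conf/single_agent.conf" a1
```

The check only looks at the key prefix; it does not validate property names or values, so a misspelled value such as flase would still need to be caught by eye.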

2. Create the required directories

mkdir -p /data/aboutyunlog
mkdir -p /data/flume_data/checkpoint
mkdir -p /data/flume_data/data

3. List the existing Kafka topics

kafka-topics.sh --zookeeper master:2181,slave1:2181,slave2:2181 --list

4. Create a topic named aboutyunlog in Kafka

kafka-topics.sh --zookeeper master:2181,slave1:2181,slave2:2181 --create --topic aboutyunlog --replication-factor 1 --partitions 3

5. Start Flume

flume-ng agent --conf-file /data/flume-1.7.0/conf/single_agent.conf --name a1 -Dflume.root.logger=INFO,console

During startup, the console prints a large amount of log output.

6. Create a Kafka consumer

kafka-console-consumer.sh --zookeeper master:2181,slave1:2181,slave2:2181 --topic aboutyunlog --from-beginning

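This command creates a consumer for the aboutyunlog topic that starts consuming from the earliest message.

The consumer starts successfully. Because no new files have been added to the local /data/aboutyunlog directory yet, nothing is written to the aboutyunlog topic, so the consumer has not received any messages.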

7. Add files to the Flume source directory