1. Flume Overview
Flume is a log collection system from Cloudera. It supports plugging customizable data senders into a logging system to collect data; it can also perform simple processing on that data and write it to a variety of (customizable) data receivers. Flume is a distributed, reliable, and highly available system for collecting, aggregating, and transporting large volumes of log data.
2. Installation and Usage
2.1 Installation
Next, extract the tarball; below, $flume stands for the extraction path.
d. Install ZooKeeper:
yum install hadoop-zookeeper -y
yum install hadoop-zookeeper-server -y
Rename /zookeeper-3.3.1/conf/zoo_sample.cfg to zoo.cfg, then run the following commands:
export ZOOKEEPER_HOME=/home/hadoop/zookeeper-3.3.1
export FLUME_HOME=/home/hadoop/flume-0.9.0+1
export PATH=.:$FLUME_HOME/bin:$ZOOKEEPER_HOME/bin:$PATH
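These exports only apply to the current shell session. A minimal sketch of making them persistent, assuming a bash login shell and the paths above:

# append the variables to ~/.bashrc so new sessions pick them up
cat >> ~/.bashrc <<'EOF'
export ZOOKEEPER_HOME=/home/hadoop/zookeeper-3.3.1
export FLUME_HOME=/home/hadoop/flume-0.9.0+1
export PATH=.:$FLUME_HOME/bin:$ZOOKEEPER_HOME/bin:$PATH
EOF
source ~/.bashrc
flume version   # should print the Flume build version if PATH is set correctly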
2.2 Usage
Run flume with no arguments; it prints the following usage:

usage: flume command [args...]
commands include:
  dump            Takes a specified source and dumps to console
  node            Start a Flume node/agent (with watchdog)
  master          Start a Flume Master server (with watchdog)
  version         Dump flume build version information
  node_nowatch    Start a flume node/agent (no watchdog)
  master_nowatch  Start a Flume Master server (no watchdog)
  class <class>   Run specified fully qualified class using Flume environment (no watchdog)
                  ex: flume com.cloudera.flume.agent.FlumeNode
  classpath       Dump the classpath used by the java executables
  shell           Start the flume shell

To start the Flume master node, run: bin/flume master

Tailing a file through Flume
Enter the command:
$ flume dump 'tail("/home/hadoop/log/bb.txt")'
The output is printed to the console, one event per line.
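To watch events flow, append lines to the tailed file from a second terminal; the path is the one from the example above:

$ echo "hello flume" >> /home/hadoop/log/bb.txt
# the terminal running flume dump should print the new line as an event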
Importing a file into HDFS through Flume
In the master web UI opened at the URL above, open the config tab, enter the node configuration as sketched below, and click the submit button. The source is the data source (many kinds of input are available) and the sink is the receiver; once the node is started, the file is written to HDFS.
Start the configured node: bin/flume node -n master

Reading syslog-ng through Flume
Start the host node and the collector node separately; a sketch follows below.
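A minimal sketch of what the config tab entry could look like, in Flume 0.9's "node : source | sink ;" syntax. The namenode URI and output path are illustrative assumptions, not values from the original:

master : tail("/home/hadoop/log/bb.txt") | dfs("hdfs://namenode/flume/bb.seq");

For the syslog-ng case, a hedged two-node sketch, assuming syslog-ng forwards to TCP port 5140, the collector machine is reachable under the hostname collector, and the collector listens on Flume's default collector port 35853:

host : syslogTcp(5140) | agentSink("collector", 35853);
collector : collectorSource(35853) | console;

Each node is then started under its configured name:

$ bin/flume node -n host
$ bin/flume node -n collector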
3. Appendix
Flume Event Sources

| Source | Description |
| console | Stdin console. |
| text("filename") | One-shot text file source. One line is one event. |
| tail("filename") | Similar to Unix's tail -F. One line is one event. Stays open for more data and follows the filename if the file is rotated. |
| multitail("file1"[, "file2"[, …]]) | Similar to the tail source, but follows multiple files. |
| asciisynth(msg_count, msg_size) | A source that synthetically generates msg_count random messages of size msg_size. This converts all characters into printable ASCII characters. |
| syslogUdp(port) | Syslog over UDP port port. This is syslog compatible. |
| syslogTcp(port) | Syslog over TCP port port. This is syslog-ng compatible. |

Flume Event Sinks

| Sink | Description |
| null | Null sink. Events are dropped. |
| console[("format")] | Console sink. Display to console's stdout. The "format" argument is optional and defaults to the "debug" output format. |
| text("txtfile"[,"format"]) | Textfile sink. Write the events to text file txtfile using output format "format". The default format writes raw event bodies with no metadata. |
| dfs("hdfspath") | DFS seqfile sink. Write serialized Flume events to a dfs path such as hdfs://namenode/file or file:///file in Hadoop's seqfile format. Note that because of HDFS write semantics, no data written by this sink is visible until the sink is closed. |
| syslogTcp("host",port) | Syslog TCP sink. Forward events to host on TCP port port in syslog wire format (syslog-ng compatible), or to other Flume nodes set up to listen for syslogTcp. |
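Any source in the table can be tried directly with flume dump; for example, the synthetic source makes a convenient smoke test (the message count and size here are arbitrary):

$ flume dump 'asciisynth(10, 100)'
# generates 10 random 100-byte messages and prints them to stdout as events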
The default ports are as follows (TCP ports are used in all situations):

| Function | Property | Default |
| master heartbeat port | flume.master.heartbeat.port | 35872 |
| master ZooKeeper client port | flume.master.zk.client.port | 3181 |
| ZooKeeper server quorum port | flume.master.zk.server.quorum.port | 3182 |
| ZooKeeper server election port | flume.master.zk.server.election.port | 3183 |
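Once bin/flume master is running, which of these ports are actually bound can be checked with a standard tool; the grep pattern simply matches the defaults listed above:

$ netstat -tln | grep -E '3587|318[123]'
# lines in LISTEN state show the master and ZooKeeper ports currently open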