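1. Write a Morphline configuration file that parses <software info>, which has the form <software name>:<platform type> <version>, into those three fields (for example, grep:amd64 3.1-2 means name grep, platform amd64, version 3.1-2). Send the parsed data to the Kafka channel in Avro format, with a schema attached.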
2. Create an Avro schema for the log containing <date>, <time>, <operation stage>, <stage status>, <software name>, <platform type> and <version>.
3. Store the data from the Kafka channel in HDFS in Avro format.
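4. Use Hive to read the Avro data on HDFS.

Below is the log I was given. (I'm really at a loss here. I've only got as far as configuring the Flume agent, and I don't understand Morphline at all. Any help from the experts would be much appreciated.)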
2018-08-04 12:16:58 startup,archives,install
2018-08-04 12:16:58 install,base-passwd:amd64,<none> 3.5.44
2018-08-04 12:16:58 status,half-installed,base-passwd:amd64 3.5.44
2018-08-04 12:16:58 status,unpacked,base-passwd:amd64 3.5.44
2018-08-04 12:16:58 status,unpacked,base-passwd:amd64 3.5.44
2018-08-04 12:16:58 configure,base-passwd:amd64,3.5.44 3.5.44
2018-08-04 12:16:58 status,unpacked,base-passwd:amd64 3.5.44
2018-08-04 12:16:58 status,half-configured,base-passwd:amd64 3.5.44
2018-08-04 12:16:58 status,installed,base-passwd:amd64 3.5.44
2018-08-04 12:16:58 startup,archives,install
2018-08-04 12:16:58 install,base-files:amd64,<none> 10.1ubuntu2
2018-08-04 12:16:58 status,half-installed,base-files:amd64 10.1ubuntu2
2018-08-04 12:16:58 status,unpacked,base-files:amd64 10.1ubuntu2
2018-08-04 12:16:58 status,unpacked,base-files:amd64 10.1ubuntu2
2018-08-04 12:16:58 configure,base-files:amd64,10.1ubuntu2 10.1ubuntu2
2018-08-04 12:16:58 status,unpacked,base-files:amd64 10.1ubuntu2
2018-08-04 12:16:58 status,unpacked,base-files:amd64 10.1ubuntu2
2018-08-04 12:16:58 status,unpacked,base-files:amd64 10.1ubuntu2
2018-08-04 12:16:58 status,unpacked,base-files:amd64 10.1ubuntu2
2018-08-04 12:16:58 status,unpacked,base-files:amd64 10.1ubuntu2
2018-08-04 12:16:58 status,unpacked,base-files:amd64 10.1ubuntu2
2018-08-04 12:16:58 status,unpacked,base-files:amd64 10.1ubuntu2
2018-08-04 12:16:59 status,unpacked,base-files:amd64 10.1ubuntu2
2018-08-04 12:16:59 status,unpacked,base-files:amd64 10.1ubuntu2
2018-08-04 12:16:59 status,unpacked,base-files:amd64 10.1ubuntu2
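Before writing the morphline for item 1, it may help to pin down the extraction logic against the sample log above. The following is a minimal plain-Python sketch, not Morphline syntax, that splits one line into the seven fields listed in item 2. The field names (`date`, `time`, `stage`, `status`, `name`, `platform`, `version`) are hypothetical, and it assumes that only `status` lines carry a stage status; in an actual morphline the same regular expressions would have to be expressed with Morphline's own commands (for example `grok`).

```python
import re

# Leading "<date> <time> <stage>," part of every line, e.g.
# "2018-08-04 12:16:58 status,half-installed,base-passwd:amd64 3.5.44"
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<stage>[^,]+),(?P<rest>.*)$"
)
# "<software name>:<platform type>", e.g. "base-passwd:amd64"
SOFTWARE_RE = re.compile(r"(?P<name>[^,:\s]+):(?P<platform>[^,\s]+)")


def parse_line(line):
    """Split one log line into a dict of seven fields (hypothetical names).

    Returns None if the line does not start with a date, a time and a stage.
    """
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rest = m.group("rest")
    record = {
        "date": m.group("date"),
        "time": m.group("time"),
        "stage": m.group("stage"),
        # Assumption: only "status" lines carry an explicit stage status.
        "status": rest.split(",")[0] if m.group("stage") == "status" else None,
        "name": None,
        "platform": None,
        "version": None,
    }
    software = SOFTWARE_RE.search(rest)
    if software is not None:
        record["name"] = software.group("name")
        record["platform"] = software.group("platform")
        # The (new) version is the last space-separated token on the line.
        record["version"] = rest.split()[-1]
    return record


if __name__ == "__main__":
    for sample in [
        "2018-08-04 12:16:58 startup,archives,install",
        "2018-08-04 12:16:58 install,base-passwd:amd64,<none> 3.5.44",
        "2018-08-04 12:16:58 status,half-installed,base-passwd:amd64 3.5.44",
    ]:
        print(parse_line(sample))
```

Lines such as `startup,archives,install` have no software info, so their name, platform and version come back as `None`; the schema then needs to allow nulls for those fields.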
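For item 2, one possible Avro schema is shown below as a Python dict that can be dumped to an `.avsc` file. The record name, namespace and field names are hypothetical; all fields are strings, kept in the order item 2 lists them, and the four fields that `startup` lines lack are nullable unions with a `null` default. `matches_schema` is only a rough shape check on a parsed record, not real Avro validation.

```python
import json

# Hypothetical field names, in the order listed in item 2.
REQUIRED = ["date", "time", "stage"]
# Fields missing on some lines (e.g. "startup" lines), so made nullable.
OPTIONAL = ["status", "name", "platform", "version"]

SCHEMA = {
    "type": "record",
    "name": "DpkgLogEntry",       # hypothetical record name
    "namespace": "example.log",   # hypothetical namespace
    "fields": [{"name": f, "type": "string"} for f in REQUIRED]
    + [{"name": f, "type": ["null", "string"], "default": None} for f in OPTIONAL],
}


def matches_schema(record):
    """Rough shape check (not Avro validation): same keys, string values,
    and None allowed only for the nullable fields."""
    names = [f["name"] for f in SCHEMA["fields"]]
    if sorted(record) != sorted(names):
        return False
    for field in SCHEMA["fields"]:
        value = record[field["name"]]
        nullable = isinstance(field["type"], list) and "null" in field["type"]
        if value is None and not nullable:
            return False
        if value is not None and not isinstance(value, str):
            return False
    return True


if __name__ == "__main__":
    # JSON text suitable for saving as an .avsc file.
    print(json.dumps(SCHEMA, indent=2))
```

In an Avro union the default must match the first branch, which is why `"null"` comes before `"string"` and the default is `null`.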
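For item 4, Hive can read Avro files directly. This small sketch builds a `CREATE EXTERNAL TABLE ... STORED AS AVRO` statement that points at the HDFS directory the sink writes to and takes its columns from the schema file through the `avro.schema.url` table property. The table name and both HDFS paths are placeholders, and it assumes Hive 0.14 or later (where `STORED AS AVRO` is available) and that the `.avsc` file has already been uploaded to HDFS.

```python
def hive_avro_ddl(table, data_dir, schema_url):
    """Build a Hive DDL statement for Avro files under data_dir.

    With STORED AS AVRO and 'avro.schema.url', Hive derives the columns
    from the .avsc file, so no explicit column list is written here.
    """
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table}\n"
        f"STORED AS AVRO\n"
        f"LOCATION '{data_dir}'\n"
        f"TBLPROPERTIES ('avro.schema.url'='{schema_url}');"
    )


if __name__ == "__main__":
    # Placeholder table name and HDFS paths; replace with your own.
    print(hive_avro_ddl("dpkg_log",
                        "hdfs:///flume/dpkg_log",
                        "hdfs:///schemas/dpkg_log.avsc"))
```

Using an external table means dropping it in Hive leaves the Avro files written by Flume in place.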