Problem description:
A local log file on a Linux machine continuously receives JSON data in the following format:
```
{"id":"14943445328940974601","uid":"840717325115457536","lat":"53.530598","lnt":"-2.5620373","hots":0,"title":"0","status":"1","topicId":"0","end_time":"1494344570","watch_num":0,"share_num":"1","replay_url":null,"replay_num":0,"start_time":"1494344544","timestamp":1494344571,"type":"video_info"}
{"uid":"861848974414839801","nickname":"mick","usign":"","sex":1,"birthday":"","face":"","big_face":"","email":"abc@qq.com","mobile":"","reg_type":"102","last_login_time":"1494344580","reg_time":"1494344580","last_update_time":"1494344580","status":"5","is_verified":"0","verified_info":"","is_seller":"0","level":1,"exp":0,"anchor_level":0,"anchor_exp":0,"os":"android","timestamp":1494344580,"type":"user_info"}
{"send_id":"834688818270961664","good_id":"223","video_id":"14943443045138661356","gold":"10","timestamp":1494344574,"type":"gift_record"}
```
Note: the data comes in three formats, distinguished by the type field in the JSON.
The requirement is to store the data in separate directories based on the type field, and to transform the value of the type field along the way.
The processed data must end up under the parent directory hdfs://bigdata01:9000/moreInfoRes.
For example:
Data with type video_info must be stored in the videoInfo directory.
Data with type user_info must be stored in the userInfo directory.
Data with type gift_record must be stored in the giftRecord directory.
Expected result:
The final layout should look like this:
Data with type video_info is stored in the hdfs://bigdata01:9000/moreInfoRes/videoInfo subdirectory.
Data with type user_info is stored in the hdfs://bigdata01:9000/moreInfoRes/userInfo subdirectory.
Data with type gift_record is stored in the hdfs://bigdata01:9000/moreInfoRes/giftRecord subdirectory.
Task requirements:
1: Do not use the built-in search_replace interceptor to modify the type field value; that approach was covered in the course and is relatively inefficient.
Hints and analysis:
1: Implement a custom interceptor, e.g. MySearchAndReplaceInterceptor.
2: For how to write a custom interceptor, refer to the Flume documentation and to the search_replace implementation in the Flume source code.
3: Use the custom MySearchAndReplaceInterceptor together with the built-in regex_extractor interceptor to achieve the required behavior (a sketch of both pieces follows this list).
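
Below is a minimal sketch of what such a custom interceptor could look like. The package name com.example.flume and the configuration property kv_pairs (holding original:replacement pairs such as video_info:videoInfo) are illustrative assumptions, not part of the original task, so adjust them to your own conventions. The interceptor only rewrites the type value inside the JSON body and leaves header extraction to regex_extractor.
```
package com.example.flume;   // hypothetical package name

import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

public class MySearchAndReplaceInterceptor implements Interceptor {

    // Mapping from the original type value (e.g. video_info)
    // to the rewritten value (e.g. videoInfo).
    private final Map<String, String> replacements;

    private MySearchAndReplaceInterceptor(Map<String, String> replacements) {
        this.replacements = replacements;
    }

    @Override
    public void initialize() { }

    @Override
    public Event intercept(Event event) {
        String body = new String(event.getBody(), StandardCharsets.UTF_8);
        // Rewrite the raw type value to its camelCase form inside the JSON body.
        for (Map.Entry<String, String> e : replacements.entrySet()) {
            String search  = "\"type\":\"" + e.getKey() + "\"";
            String replace = "\"type\":\"" + e.getValue() + "\"";
            if (body.contains(search)) {
                body = body.replace(search, replace);
                break;
            }
        }
        event.setBody(body.getBytes(StandardCharsets.UTF_8));
        return event;
    }

    @Override
    public List<Event> intercept(List<Event> events) {
        for (Event event : events) {
            intercept(event);
        }
        return events;
    }

    @Override
    public void close() { }

    /** Builder required by Flume to instantiate and configure the interceptor. */
    public static class Builder implements Interceptor.Builder {
        private final Map<String, String> replacements = new HashMap<>();

        @Override
        public void configure(Context context) {
            // Hypothetical property, e.g.
            // "video_info:videoInfo,user_info:userInfo,gift_record:giftRecord"
            String pairs = context.getString("kv_pairs", "");
            for (String pair : pairs.split(",")) {
                String[] kv = pair.split(":");
                if (kv.length == 2) {
                    replacements.put(kv[0].trim(), kv[1].trim());
                }
            }
        }

        @Override
        public Interceptor build() {
            return new MySearchAndReplaceInterceptor(replacements);
        }
    }
}
```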
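
And a hedged sketch of an agent configuration that wires the pieces together. The agent name a1, the exec source command, and the log file path /data/log/moreInfo.log are illustrative assumptions; only the HDFS parent path comes from the task description. The custom interceptor runs first and rewrites the type value in the body, regex_extractor then copies that value into a header named type, and the HDFS sink uses the header in its path to split the data into subdirectories.
```
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Exec source tailing the local log file (hypothetical path)
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /data/log/moreInfo.log
a1.sources.r1.channels = c1

# 1) Custom interceptor rewrites the type value inside the JSON body
a1.sources.r1.interceptors = i1 i2
a1.sources.r1.interceptors.i1.type = com.example.flume.MySearchAndReplaceInterceptor$Builder
a1.sources.r1.interceptors.i1.kv_pairs = video_info:videoInfo,user_info:userInfo,gift_record:giftRecord

# 2) regex_extractor copies the (rewritten) type value into a header named "type"
a1.sources.r1.interceptors.i2.type = regex_extractor
a1.sources.r1.interceptors.i2.regex = "type":"(\\w+)"
a1.sources.r1.interceptors.i2.serializers = s1
a1.sources.r1.interceptors.i2.serializers.s1.name = type

a1.channels.c1.type = memory

# 3) HDFS sink uses the header in the path to split data into subdirectories
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://bigdata01:9000/moreInfoRes/%{type}
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.channel = c1
```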