Discussion about Hadoop logs

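How do I open the part-r-00000 file written by context.write? It is very large, several hundred MB.
Neither cat nor -text works well, and opening it directly in Notepad just shows a blank window. It won't open because it is too big.
Does all of this log output really need to be written? In a production environment, what are the requirements for logs?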
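Notepad is not built for files of several hundred MB, so it is usually easier to look at only the first lines. On the cluster, `hadoop fs -cat <path> | head` does this without copying the whole file. For a local copy (for example one fetched with `hadoop fs -get`), here is a minimal sketch that streams only the first N lines instead of loading the file. It assumes the file is UTF-8, which is what TextOutputFormat writes. The class name `PartFileHead` is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: shows the first N lines of a large local part-r-00000 copy
// without reading the whole file into memory.
class PartFileHead {

    // Reads at most maxLines lines, decoding as UTF-8 (the encoding TextOutputFormat writes).
    static List<String> head(Path file, int maxLines) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            // Stop as soon as enough lines are collected; the rest of the file is never read
            while (lines.size() < maxLines && (line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PartFileHead part-r-00000 [lines]
        int n = args.length > 1 ? Integer.parseInt(args[1]) : 20;
        for (String line : head(Paths.get(args[0]), n)) {
            System.out.println(line);
        }
    }
}
```

Because only `maxLines` lines are ever decoded, memory use stays small regardless of the file size.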

howtodown · posted 2014-6-25 22:00:18

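Last edited by howtodown at 2014-6-25 22:02
Using -text directly works fine.
But after copying the file to Windows 7, Notepad has problems. Reading the file line by line still gives garbled characters, and even when I output only the numbers, the file contains lots of # characters.

I tried http://ip:50070/dfshealth.jsp, but it didn't help.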
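A likely reason the Windows 7 copy looks wrong: TextOutputFormat ends every record with a bare `\n`, while the Notepad shipped with Windows 7 only starts a new line on `\r\n`, so all records run together. Reading with the platform default charset (typically GBK on a Chinese Windows) instead of UTF-8 can also garble any non-ASCII text. If `hadoop fs -text` shows the file correctly but a plain copy does not, it is also worth checking whether the output is compressed or a SequenceFile, because `-text` decodes those and a raw copy does not. Below is a minimal sketch that rewrites a plain-text local copy with CRLF line endings, decoding and encoding as UTF-8. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: copies a Hadoop text output file (LF line endings, UTF-8)
// into a Windows-friendly copy with CRLF line endings, streaming line by line.
class ToWindowsLineEndings {

    // Returns the number of lines written.
    static long convert(Path in, Path out) throws IOException {
        long count = 0;
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line);
                writer.write("\r\n"); // Windows 7 Notepad needs CR+LF to start a new line
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ToWindowsLineEndings part-r-00000 part-r-00000.txt
        long n = convert(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(n + " lines written");
    }
}
```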

nettman · posted 2014-6-25 22:03:24

Notice: the author has been banned or the content deleted, so this post is automatically hidden.

howtodown · posted 2014-6-25 22:17:03

I'm using this format:

context.write(Text,PageRankNode)

Here PageRankNode is an object, but the output contains all kinds of # and other special characters, as below:

########### 13084322982
####999#999 13198417390
##379595568 13035650470
##89097#64# 13060108148
#03603363#2 15680916657
#0801#09500 13281177726
#1066666688 13037709702 13208149513 13183977086 13219142985
#1283104973 13228318562
#1348229078 13198357723
#1834962085 15608154552
#36#9969160 15680448118
#6##9#92#36 13208268187
#6601#92683 15680947136
#894286176# 13079192980
#9232225555 13086583238
*#*0*77058* 13183407198
*****#*#*#* 15681217948
**999999999 13198650755
*0#12345678 15583697011
*008271#93* 15520393479
*2562546987 13036657511
*30852*7474 13038107748
*7*76#01111 13258402429
*8752541475
*9*9*9*9*9*
*97*0#7*41#
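About the context.write(Text, PageRankNode) format: the default TextOutputFormat writes a value that is not a `Text` using its `toString()` result. PageRankNode therefore needs a `toString()` that produces the intended text, because the inherited `Object.toString()` only gives `ClassName@hashcode`. Below is a minimal sketch of such a `toString()`, assuming hypothetical fields `rank` and `neighbors`. The `write`/`readFields` methods that Hadoop's `Writable` interface also requires are omitted here because they need the Hadoop jars.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for the poster's PageRankNode; the real class would also
// implement org.apache.hadoop.io.Writable (write/readFields), omitted here.
class PageRankNode {
    private final double rank;             // assumed field: current PageRank value
    private final List<String> neighbors;  // assumed field: outgoing phone numbers

    PageRankNode(double rank, List<String> neighbors) {
        this.rank = rank;
        this.neighbors = new ArrayList<>(neighbors);
    }

    // TextOutputFormat writes value.toString(), so this method defines the line format:
    // the rank with six decimals, then the neighbours, all tab-separated,
    // e.g. "0.250000\t13198417390\t13035650470".
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(String.format(Locale.ROOT, "%.6f", rank));
        for (String n : neighbors) {
            sb.append('\t').append(n);
        }
        return sb.toString();
    }
}
```

Using `Locale.ROOT` keeps the decimal point as `.` regardless of the machine's locale, so a later job can parse the rank back reliably.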

nettman · posted 2014-6-25 22:18:47

Notice: the author has been banned or the content deleted, so this post is automatically hidden.

howtodown · posted 2014-6-25 22:23:27

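import java.io.IOException;
import java.util.LinkedHashSet;
import java.util.Set;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// The classes below (and main) are nested inside the AdjGraph class, whose declaration the post omits.
public static class AdjGraphMapper extends Mapper<Object, Text, Text, Text> {
    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        // Input lines are tab-separated: caller id, then callee id
        String[] arr = value.toString().split("\t");
        String from = arr[0].trim();
        if (from.length() != 11) {
            return; // only the length is checked, so ids such as "###########" still pass
        }
        // arr.length guard: a line without a tab would otherwise throw ArrayIndexOutOfBoundsException
        if (arr.length > 1 && arr[1].trim().length() == 11) {
            context.write(new Text(from), new Text(arr[1].trim()));
        } else {
            context.write(new Text(from), new Text()); // vertex with no valid neighbour
        }
    }
}

public static class AdjGraphReducer extends Reducer<Text, Text, Text, Text> {
    private final Text result = new Text();
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        Set<String> toIdSet = new LinkedHashSet<String>();
        for (Text val : values) {
            // Split on tabs so ids already joined by the combiner are de-duplicated as well
            for (String id : val.toString().split("\t")) {
                if (!id.isEmpty()) {
                    toIdSet.add(id);
                }
            }
        }
        // Join with tabs directly instead of stripping brackets from HashSet.toString()
        result.set(String.join("\t", toIdSet));
        context.write(key, result);
        // Per-record System.out.println removed: it copies every record into the task's stdout log
    }
}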
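The mapper only checks that the first column is 11 characters long, so keys such as `###########` or `*8752541475` in the listing above pass the filter. The `#` and `*` are therefore most likely present in the input data itself rather than caused by an encoding problem. To test the parsing without a cluster, it can be pulled into a plain method. Below is a minimal sketch with the `arr.length` guard. It also swaps the length check for an 11-digit check, which is an assumption about what counts as a valid id. The class `EdgeParser` and its return type are hypothetical.

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical plain-Java version of AdjGraphMapper's parsing, testable without Hadoop.
class EdgeParser {
    // Assumption: a valid id is exactly 11 ASCII digits (the original mapper only checks length).
    private static final Pattern PHONE = Pattern.compile("[0-9]{11}");

    static boolean isPhone(String s) {
        return PHONE.matcher(s).matches();
    }

    // Returns (from, to) for a usable line, with to == "" when the second column is
    // missing or invalid; returns null when the whole line should be skipped.
    static Map.Entry<String, String> parse(String line) {
        String[] arr = line.split("\t");
        String from = arr[0].trim();
        if (!isPhone(from)) {
            return null; // drops ids such as "###########" or "*8752541475"
        }
        // Guard: a line without a tab has arr.length == 1, so arr[1] would throw
        String to = (arr.length > 1 && isPhone(arr[1].trim())) ? arr[1].trim() : "";
        return new AbstractMap.SimpleImmutableEntry<>(from, to);
    }
}
```

Inside the real mapper, a `null` result would mean "write nothing", and an empty `to` corresponds to the existing `context.write(key, new Text())` branch.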

// Also nested inside AdjGraph.
public static void main(String[] args) throws Exception {
    String input = "hdfs://192.168.0.106:9000/user/wsc/data/SM20140623.txt";
    // The output directory must not exist yet, otherwise the job fails at submission
    String output = "hdfs://192.168.0.106:9000/user/wsc/data/result2";

    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "adj graph"); // new Job(conf, name) is deprecated
    job.setJarByClass(AdjGraph.class);
    job.setMapperClass(AdjGraphMapper.class);
    // Reusing the reducer as a combiner is only correct if reduce() can re-read its own tab-joined output
    job.setCombinerClass(AdjGraphReducer.class);
    job.setReducerClass(AdjGraphReducer.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(input));
    FileOutputFormat.setOutputPath(job, new Path(output));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
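`setCombinerClass(AdjGraphReducer.class)` runs the reduce logic on partial groups of map output. Each combined result, a tab-joined string, then reaches the reducer as a single value. With the original `HashSet` logic, an id that appears in two partial groups is not recognised as a duplicate, because the reducer compares whole joined strings. Below is a plain-Java simulation that compares the original merge with one that splits incoming values on tabs before de-duplicating. The class and method names are hypothetical. Hadoop may run the combiner zero, one, or more times, and the simulation models just one combiner pass per partial group.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simulation of AdjGraphReducer used as both combiner and reducer.
class CombinerCheck {

    // Original logic: each incoming value is one set element, then the set's
    // toString() is rewritten into a tab-separated list.
    static String mergeOriginal(List<String> values) {
        Set<String> ids = new HashSet<>(values);
        return ids.toString().replaceAll("[\\[\\]]", "").replaceAll(", ", "\t");
    }

    // Combiner-safe logic: split every incoming value on tabs before de-duplicating,
    // so a value that is already a tab-joined list is merged id by id.
    static String mergeSafe(List<String> values) {
        Set<String> ids = new LinkedHashSet<>();
        for (String v : values) {
            for (String id : v.split("\t")) {
                if (!id.isEmpty()) {
                    ids.add(id);
                }
            }
        }
        return String.join("\t", ids);
    }

    // Number of tab-separated ids in a merged value.
    static int idCount(String merged) {
        return merged.isEmpty() ? 0 : merged.split("\t").length;
    }

    public static void main(String[] args) {
        List<String> part1 = Arrays.asList("13198417390", "13035650470"); // values seen by combiner 1
        List<String> part2 = Arrays.asList("13035650470", "13060108148"); // values seen by combiner 2
        String orig = mergeOriginal(Arrays.asList(mergeOriginal(part1), mergeOriginal(part2)));
        String safe = mergeSafe(Arrays.asList(mergeSafe(part1), mergeSafe(part2)));
        // prints "original: 4 ids, safe: 3 ids"
        System.out.println("original: " + idCount(orig) + " ids, safe: " + idCount(safe) + " ids");
    }
}
```

The original version reports 13035650470 twice. The split-based version gives the same result whether or not the combiner runs, which is what a combiner must guarantee.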