A question about COUNT in Pig
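grunt> A = LOAD 'c.txt' AS (col1:chararray, col2:int, col3:int, col4:int, col5:double, col6:double, col7:int);
grunt> B = GROUP A BY (col2, col3, col4);
grunt> C = FOREACH B {D = DISTINCT A.col7; GENERATE group, COUNT(D);};
grunt> DUMP C;
My goal is to group the rows that share the same values in the specified fields and then count them, but every output value is 1. What I ultimately need is the overall total: how many days there are across all the fields.
Which approach should I use? The approach I have in mind is:
D = GROUP C ALL;
E = FOREACH D GENERATE COUNT(C);
Last edited by howtodown on 2014-10-21 11:25
Please provide some sample data from the table and the result you want to achieve.
Last edited by Joker on 2014-10-21 21:08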
testFile
1 2
3 4
1 2
3 4
5 6
test.pig
A = LOAD 'testFile.txt' AS (x:int, y:int);
B = GROUP A BY y;
C = FOREACH B {D = DISTINCT A.y; GENERATE COUNT(D) AS k;};
D = GROUP C ALL;
E = FOREACH D GENERATE COUNT(C.k);
DUMP E;
Rows that share the same y value count as one, so the final result should be 3.
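The script above can be traced by hand on the five sample rows. The following is a minimal Python sketch (not Pig) that mimics the GROUP BY y step, the per-group DISTINCT/COUNT, and the final GROUP ALL/COUNT. The function name `count_distinct_y` and the in-memory `ROWS` list are assumptions made for illustration only.

```python
from collections import defaultdict

# Sample rows from testFile, as (x, y) pairs.
ROWS = [(1, 2), (3, 4), (1, 2), (3, 4), (5, 6)]


def count_distinct_y(rows):
    """Mimic: GROUP A BY y -> COUNT(DISTINCT A.y) per group -> GROUP ALL -> COUNT."""
    # B = GROUP A BY y;
    groups = defaultdict(list)
    for x, y in rows:
        groups[y].append((x, y))

    # C = FOREACH B { D = DISTINCT A.y; GENERATE COUNT(D) AS k; };
    # Every row in a group shares the same y, so k is 1 for each group.
    per_group_k = [len({y for _, y in bag}) for bag in groups.values()]

    # D = GROUP C ALL; E = FOREACH D GENERATE COUNT(C.k);
    # Counting the rows of C is the same as counting the groups.
    return per_group_k, len(per_group_k)


if __name__ == "__main__":
    ks, total = count_distinct_y(ROWS)
    print(ks)     # [1, 1, 1]
    print(total)  # 3
```

This suggests why the intermediate relation shows only 1s: the per-group count is 1 by construction, and the figure of interest appears only after the final count over all groups.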
Last edited by howtodown on 2014-10-21 14:31
Try the code below: it deduplicates on col2 and then outputs the result.
A = LOAD '1.txt' AS (col1: chararray, col2: chararray);
B = GROUP A BY (col2);
C = FOREACH B {
    D = LIMIT A 1;
    GENERATE FLATTEN(D);
};
DUMP C;
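To illustrate what the GROUP / LIMIT 1 / FLATTEN pattern above produces, here is a rough Python sketch (not Pig) that keeps one row per distinct col2 value. The helper name `dedupe_by_col2` and the sample rows are hypothetical. The sketch keeps the first row it sees for each key, which is an assumption: Pig's LIMIT without an ORDER does not promise which row of the group is retained.

```python
def dedupe_by_col2(rows):
    """Keep one row per distinct col2 value (here: the first one seen)."""
    kept = {}
    for col1, col2 in rows:
        # setdefault stores a row only the first time a col2 value appears.
        kept.setdefault(col2, (col1, col2))
    return list(kept.values())


if __name__ == "__main__":
    # Hypothetical sample rows; not taken from the thread.
    sample = [("a", "x"), ("b", "y"), ("c", "x")]
    print(dedupe_by_col2(sample))  # [('a', 'x'), ('b', 'y')]
```

Note that this yields the deduplicated rows themselves, not a count; counting them would still need a separate step over the whole result.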
Last edited by sstutu on 2014-10-21 15:12
grunt> A = LOAD 'c.txt' AS (col1:chararray, col2:int, col3:int, col4:int, col5:double, col6:double, col7:int);
grunt> B = GROUP A BY (col2, col3, col4);
grunt> C = FOREACH B {D = DISTINCT A.col7; GENERATE group, COUNT(D);};
grunt> DUMP C;
Take the code above and change it as follows:
B = GROUP A BY (col2, col3, col4, col7);
C = FOREACH B {D = DISTINCT A.col7; GENERATE group, COUNT(D);};
so that it reads:
grunt> A = LOAD 'c.txt' AS (col1:chararray, col2:int, col3:int, col4:int, col5:double, col6:double, col7:int);
grunt> B = GROUP A BY (col2, col3, col4,col7);
grunt> C = FOREACH B {D = DISTINCT A.col7; GENERATE group, COUNT(D);};
grunt> DUMP C;
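One consequence of putting col7 into the grouping key is that each group can contain only a single col7 value, so the per-group distinct count is always 1; an overall figure would then come from counting the groups, as in the GROUP ... ALL step from the original question. A rough Python sketch of that behaviour (not Pig; the function name `group_with_col7` and the sample rows are hypothetical):

```python
from collections import defaultdict


def group_with_col7(rows):
    """Mimic GROUP A BY (col2, col3, col4, col7) plus COUNT(DISTINCT col7)."""
    groups = defaultdict(set)
    for col2, col3, col4, col7 in rows:
        # col7 is part of the key, so each group can only ever hold one col7.
        groups[(col2, col3, col4, col7)].add(col7)
    return {key: len(col7_values) for key, col7_values in groups.items()}


if __name__ == "__main__":
    # Hypothetical rows as (col2, col3, col4, col7); not taken from c.txt.
    sample = [(1, 1, 1, 10), (1, 1, 1, 10), (1, 1, 1, 11), (2, 1, 1, 10)]
    counts = group_with_col7(sample)
    print(counts)       # every value is 1
    print(len(counts))  # 3 distinct (col2, col3, col4, col7) combinations
```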
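howtodown posted on 2014-10-21 14:30
Last edited by howtodown on 2014-10-21 14:31
Try the code below: it deduplicates on col2 and then outputs the result.
Yes, I tried this code before, but the output was all 1s. It is the same issue I was asking about this morning: what I need is the final overall count.
Thanks anyway, moderator.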
sstutu posted on 2014-10-21 15:07
Take the code above and change it as follows:
B = GROUP A BY (col2, col3, col4,col7);
...
Thanks a lot, I learned something.