Help: when using a Hive UDF to load query results into MySQL, why are there always two extra records?

Last edited by NIITYZU on 2015-4-23 10:27

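I am using a Hive UDF to load Hive query results into MySQL. The Hive CLI clearly shows 10 records, but MySQL ends up with two more, and no error is reported. Has anyone run into this?
My UDF class is as follows: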

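import org.apache.hadoop.hive.ql.exec.UDF;

public class AnalyzeStatistics extends UDF {

        // Called once for every row the UDF is evaluated on.
        public String evaluate(String clxxbh, String hphm) {

                String sql = "insert into jtxx2 values(?,?)";
                // Call the database insert method
                if (DBSqlHelper.addBatch(sql, clxxbh, hphm)) {
                        return clxxbh + "  SUCCESS  " + hphm;
                } else {
                        return clxxbh + "  FAILED  " + hphm;
                }
        }
}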

The MySQL insert method:

   // conn and ps are fields of DBSqlHelper (declarations not shown in the post).
   public static boolean addBatch(String sql, String clxxbh, String hphm) {
                  boolean flag = false;
                  try {
                          conn = DBSqlHelper.getConn();
                          //conn.setAutoCommit(flag);

                          ps = (PreparedStatement) conn.prepareStatement(sql);

                          ps.setString(1, clxxbh);
                          ps.setString(2, hphm);

                          //ps.executeBatch();
                          //conn.commit();
                          // Inserts exactly one row per call.
                          ps.executeUpdate();
                          flag = true;

                  } catch (Exception e) {
                          e.printStackTrace();
                  } finally {
                          // ps is still null if getConn() or prepareStatement() threw.
                          if (ps != null) {
                                  try {
                                          ps.close();
                                  } catch (SQLException e) {
                                          e.printStackTrace();
                                  }
                          }
                  }

                  return flag;
          }

Results:
Output in the Hive CLI (analyze is a temporary function I created that points to the UDF above; the output clearly shows 10 rows):

hive> select analyze(clxxbh,hphm) from transjtxx_hbase limit 10;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1428394594787_0034, Tracking URL = http://secondmgt:8088/proxy/application_1428394594787_0034/
Kill Command = /home/hadoopUser/cloud/hadoop/programs/hadoop-2.2.0/bin/hadoop job  -kill job_1428394594787_0034
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-04-23 10:15:34,355 Stage-1 map = 0%,  reduce = 0%
2015-04-23 10:15:51,032 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 7.14 sec
MapReduce Total cumulative CPU time: 7 seconds 140 msec
Ended Job = job_1428394594787_0034
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 7.14 sec   HDFS Read: 256 HDFS Write: 532 SUCCESS
Total MapReduce CPU Time Spent: 7 seconds 140 msec
OK
32100017000000000220140317000015  SUCCESS  鲁Q58182
32100017000000000220140317000016  SUCCESS  鲁QV4662
32100017000000000220140317000019  SUCCESS  苏LL8128
32100017000000000220140317000020  SUCCESS  苏CAH367
32100017000000000220140317000023  SUCCESS  鲁Q7899W
32100017000000000220140317000029  SUCCESS  苏HN3819
32100017000000000220140317000038  SUCCESS  鲁C01576
32100017000000000220140317000044  SUCCESS  苏DT9178
32100017000000000220140317000049  SUCCESS  苏LZ1112
32100017000000000220140317000052  SUCCESS  苏K9795警
Time taken: 35.815 seconds, Fetched: 10 row(s)

The corresponding MySQL table, however, contains 12 rows:

mysql> select * from jtxx2;
+----------------------------------+-------------+
| cllxbh                           | hphm        |
+----------------------------------+-------------+
| 32100017000000000220140317000015 | 鲁Q58182    |
| 32100017000000000220140317000016 | 鲁QV4662    |
| 32100017000000000220140317000019 | 苏LL8128    |
| 32100017000000000220140317000020 | 苏CAH367    |
| 32100017000000000220140317000023 | 鲁Q7899W    |
| 32100017000000000220140317000029 | 苏HN3819    |
| 32100017000000000220140317000038 | 鲁C01576    |
| 32100017000000000220140317000044 | 苏DT9178    |
| 32100017000000000220140317000049 | 苏LZ1112    |
| 32100017000000000220140317000052 | 苏K9795警   |
| 32100017000000000220140317000056 | 黑BF7222    |
| 32100017000000000220140317000108 | 辽H39290    |
+----------------------------------+-------------+
12 rows in set (0.00 sec)

The last two rows are extra. Is something wrong with my code, or did something else go wrong?

jixianqiuxue · posted on 2015-4-23 11:33:51

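I don't see any problem with the code.
I suggest adding print statements to addBatch to see what it is actually doing.
It would also be best to mark the rows that were extracted through Hive, so they are not confused with data that was already in the table.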
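
The two suggestions above could be combined roughly as follows. This is only a sketch: InsertAudit, its methods, and the batchTag value are hypothetical names that do not appear in the original post, and storing the tag in MySQL would need an extra column in jtxx2 that the table shown above does not have. The idea is to call record(...) at the top of addBatch, so that every invocation is counted and written to stderr, which should end up in the map task's log. If the count exceeds the 10 rows shown by the CLI, the UDF was evaluated on more rows than were returned. One possibility worth checking is that LIMIT restricts the rows returned to the client, not necessarily the rows a map task evaluates before it stops.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper (not from the original post): counts and logs every
// insert attempt so the number of UDF calls can be compared with the CLI output.
class InsertAudit {
    private static final AtomicInteger CALLS = new AtomicInteger();
    private static final List<String> LOG = new ArrayList<>();

    // Records one call and returns the running call number.
    // batchTag is an assumed per-run marker, e.g. "hive-run-<timestamp>".
    static synchronized int record(String batchTag, String clxxbh, String hphm) {
        int n = CALLS.incrementAndGet();
        String line = "addBatch call #" + n + " tag=" + batchTag
                + " clxxbh=" + clxxbh + " hphm=" + hphm;
        LOG.add(line);
        System.err.println(line);
        return n;
    }

    // Total number of calls recorded in this JVM.
    static int callCount() {
        return CALLS.get();
    }

    // Copy of the recorded log lines, oldest first.
    static synchronized List<String> lines() {
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        String tag = "hive-run-" + System.currentTimeMillis();
        record(tag, "32100017000000000220140317000015", "鲁Q58182");
        record(tag, "32100017000000000220140317000016", "鲁QV4662");
        System.out.println("calls = " + callCount());
    }
}
```

Comparing the logged keys with the 12 rows in jtxx2 would show whether the two extra rows came from this run or were already in the table.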
