Many people run into the following problem: querying the same Parquet data, Impala returns timestamps that are 8 hours behind, while Spark and Hive return values 8 hours ahead.
Here is an English explanation of the cause:
Based on this discussion it seems that when support for saving timestamps in Parquet was added to Hive, the primary goal was to be compatible with Impala's implementation, which probably predates the addition of the timestamp_millis type to the Parquet specification.
Impala's timestamp representation maps to the int96 Parquet type (4 bytes for the date, 8 bytes for the time, details in the linked discussion).
So no, storing a Hive timestamp in Parquet does not use the timestamp_millis type, but Impala's int96 timestamp representation instead. In other words, the discrepancy is likely caused by the two systems using different timestamp representations.
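To make the int96 layout described above concrete, here is a minimal Python sketch that decodes a 12-byte int96 value. It assumes the commonly documented Impala layout: the first 8 bytes are nanoseconds since midnight (little-endian) and the last 4 bytes are the Julian day number (little-endian); the helper name is my own.

```python
import struct
from datetime import datetime, timedelta

# Julian day number of 1970-01-01 (the Unix epoch).
JULIAN_EPOCH_DAY = 2440588

def int96_to_datetime(raw: bytes) -> datetime:
    """Decode a 12-byte Parquet int96 timestamp (assumed Impala layout):
    bytes 0-7: nanoseconds since midnight, little-endian signed;
    bytes 8-11: Julian day number, little-endian signed."""
    nanos_of_day, julian_day = struct.unpack('<qi', raw)
    days_since_epoch = julian_day - JULIAN_EPOCH_DAY
    return datetime(1970, 1, 1) + timedelta(days=days_since_epoch,
                                            microseconds=nanos_of_day // 1000)
```

Note that the decoded value carries no timezone at all; whether it is interpreted as UTC or local time is up to the reading engine, which is exactly where the 8-hour difference comes from.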
Once we are aware of this behavior, the values can be corrected with a conversion function.
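In Hive and Spark SQL the usual fix is `from_utc_timestamp(col, 'Asia/Shanghai')` (or the reverse, `to_utc_timestamp`, depending on which side is off). As a sketch of what that conversion does, here is a hypothetical Python helper that treats a naive timestamp as UTC and shifts it into the target zone:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

def from_utc_timestamp(ts: datetime, tz_name: str) -> datetime:
    """Mimic Hive/Spark's from_utc_timestamp(): interpret a naive
    timestamp as UTC, convert it to tz_name, and drop the tzinfo."""
    return (ts.replace(tzinfo=timezone.utc)
              .astimezone(ZoneInfo(tz_name))
              .replace(tzinfo=None))

# A value that reads 8 hours early in Asia/Shanghai (UTC+8) is shifted back:
corrected = from_utc_timestamp(datetime(2017, 5, 24, 0, 0), 'Asia/Shanghai')
```

Which direction to convert depends on which engine wrote the data; apply the function on the side whose values are 8 hours off and verify against a known row.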
Reference:
http://blog.csdn.net/bsf5521/article/details/72682996