This post was last edited by ltz on 2016-1-13 18:28
Python code:[mw_shl_code=python,true]
selectSQL = """
SELECT
utm_aid,
regexp_extract(ga_source,'([\\w+|\\(\\-|\\.\\w+|\\)]{0,})') track_code
FROM tbl_analytics
where analytic_type = 0
and year = '%s'
and month = '%s'
and day = '%s'
and length(ga_source)>26
""" % (year, month, day)
hiveDB = HiveDB()
results = hiveDB.select(selectSQL)
endtime = datetime.datetime.now()
print "[Site-wide visitor count] [Hive] Total time cost: %ds" % (endtime - starttime).seconds
for rows in results:
    print rows[/mw_shl_code]
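When I run it, the Python script connects to Hive over Thrift, but regexp_extract(ga_source,'([\\w+|\\(\\-|\\.\\w+|\\)]{0,})') does not extract the expected substring from ga_source.
If I take the SQL out and run it directly in Hive, the extraction works as expected. What could be causing this?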
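One thing worth checking (a guess, not a confirmed diagnosis): in a normal Python string literal, `\\` is an escape sequence for a single backslash, so the SQL text that reaches Hive may contain `\w` where the Hive CLI received `\\w`. As far as I know, Hive unescapes string literals once more, which would leave the regex without its backslashes. The sketch below only compares the two Python spellings; the variable names `normal` and `raw` are made up for illustration and nothing here talks to Hive.

```python
# Compare what Python actually stores for the same pattern text,
# written as a normal string and as a raw string.

# Normal triple-quoted string: every "\\" collapses to one backslash.
normal = """regexp_extract(ga_source,'([\\w+|\\(\\-|\\.\\w+|\\)]{0,})')"""

# Raw string (r prefix): backslashes are kept exactly as typed,
# so this matches the text that was pasted into the Hive CLI.
raw = r"""regexp_extract(ga_source,'([\\w+|\\(\\-|\\.\\w+|\\)]{0,})')"""

print(normal)
print(raw)

# Count the backslashes that would be sent to Hive in each case.
print(normal.count("\\"), raw.count("\\"))
```

If this turns out to be the cause, prefixing the query string with `r` (or writing four backslashes for each intended pair) should make the script send the same text as the Hive CLI. The `% (year, month, day)` substitution still works on a raw string.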