
Building a Job Recommendation Engine with Mahout

Posted by 52Pig on 2014-11-8 22:40:31
Last edited by 52Pig on 2014-11-8 22:39
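Reading guide:
1. How do you design the metrics for a job recommendation engine?
2. What system architecture does a job recommendation engine need?
3. How do you compare recommendation results manually?
4. Which data should a job recommendation engine exclude?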



1. Overview of the Mahout recommender framework

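The Mahout framework provides a complete recommender engine: standardized data structures, a variety of algorithm implementations, and a simple development workflow. Mahout's recommender engine is modular and consists of 5 main parts: the data model, similarity measures, neighborhood algorithms, recommender algorithms, and evaluators.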
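To make the five parts concrete, here is a toy, plain-Java analogue of that pipeline on boolean (viewed / not viewed) data. It is a sketch under stated assumptions, not Mahout's API: the class and method names (`ToyRecommenderPipeline`, `dataModel`, `similarity`, `neighborhood`, `recommend`, `precision`) are hypothetical, Tanimoto is used as the similarity, and a simple precision score stands in for the evaluator.

```java
import java.util.*;
import java.util.stream.Collectors;

// Toy analogue of Mahout's five recommender parts (hypothetical names, not Mahout's API).
final class ToyRecommenderPipeline {

    // 1. Data model: userID -> set of viewed itemIDs (boolean data, no preference values)
    static Map<Long, Set<Long>> dataModel(long[][] userItemPairs) {
        Map<Long, Set<Long>> model = new TreeMap<>();
        for (long[] p : userItemPairs) {
            model.computeIfAbsent(p[0], k -> new TreeSet<>()).add(p[1]);
        }
        return model;
    }

    // 2. Similarity: Tanimoto coefficient |A ∩ B| / |A ∪ B|
    static double similarity(Set<Long> a, Set<Long> b) {
        Set<Long> common = new HashSet<>(a);
        common.retainAll(b);
        int union = a.size() + b.size() - common.size();
        return union == 0 ? 0.0 : (double) common.size() / union;
    }

    // 3. Neighborhood: the n other users most similar to `user` (ties broken by ID)
    static List<Long> neighborhood(long user, Map<Long, Set<Long>> model, int n) {
        Set<Long> mine = model.get(user);
        return model.keySet().stream()
                .filter(u -> u != user)
                .sorted((u, v) -> {
                    int c = Double.compare(similarity(mine, model.get(v)), similarity(mine, model.get(u)));
                    return c != 0 ? c : Long.compare(u, v);
                })
                .limit(n)
                .collect(Collectors.toList());
    }

    // 4. Recommender: score unseen items by summing the similarity of neighbors who viewed them
    static List<Long> recommend(long user, Map<Long, Set<Long>> model, int n, int howMany) {
        Set<Long> mine = model.get(user);
        Map<Long, Double> scores = new HashMap<>();
        for (long neighbor : neighborhood(user, model, n)) {
            double sim = similarity(mine, model.get(neighbor));
            for (long item : model.get(neighbor)) {
                if (!mine.contains(item)) scores.merge(item, sim, Double::sum);
            }
        }
        return scores.entrySet().stream()
                .sorted((x, y) -> {
                    int c = Double.compare(y.getValue(), x.getValue());
                    return c != 0 ? c : Long.compare(x.getKey(), y.getKey());
                })
                .limit(howMany)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // 5. Evaluator: precision = share of recommended items found in a held-out set
    static double precision(List<Long> recommended, Set<Long> heldOut) {
        if (recommended.isEmpty()) return 0.0;
        long hits = recommended.stream().filter(heldOut::contains).count();
        return (double) hits / recommended.size();
    }
}
```

In Mahout the same roles are played by the `DataModel`, `UserSimilarity`, `UserNeighborhood`, `Recommender` and `RecommenderEvaluator` types, which the rest of this article wires together.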

For a more detailed introduction, see the article "Analyzing the Mahout Recommender Engine from Its Source Code".

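2. Requirements analysis: designing metrics for the job recommendation engine

Below we use a company case study to explain, step by step, how to design the metrics for a job recommendation engine.

Case study:
A professional social networking site offers products including personal résumé pages, professional networks (connections), microblogging and link sharing, job postings, job applications, and education and training.

After registering, users are expected to complete their profiles, including education, work experience, project experience, and skills. Then they tell the site whether they are looking for a job. If a user chooses "Yes" (job hunting), the site recommends jobs from its database that the user may be interested in.

From this short description we can roughly see the site's positioning and core business. There are 2 key points:
  • Users: store as many valid, complete user profiles as possible
  • Services: help users find jobs, and help headhunters and companies find employees

The job recommendation engine will therefore be a core feature of this site.
KPI design
  • Job views driven by recommendations: page views (PV) of job pages
  • Job applications driven by recommendations: effective conversions on job pages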
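The two KPIs can be computed from an event log once each view and application is tagged with whether it came from a recommendation. The sketch below assumes a hypothetical `Event` record with a `fromRecommendation` flag, and assumes "effective conversion" means applications divided by recommended views; the original does not specify either detail.

```java
import java.util.*;

// Sketch of the two KPIs over a hypothetical event log. The Event fields and the
// fromRecommendation flag are assumptions about how the site would tag its traffic.
final class KpiSketch {
    enum Type { VIEW, APPLY }

    static final class Event {
        final long userId;
        final long jobId;
        final Type type;
        final boolean fromRecommendation;

        Event(long userId, long jobId, Type type, boolean fromRecommendation) {
            this.userId = userId;
            this.jobId = jobId;
            this.type = type;
            this.fromRecommendation = fromRecommendation;
        }
    }

    // KPI 1: job page views (PV) that came from a recommendation
    static long recommendedViews(List<Event> log) {
        return log.stream().filter(e -> e.fromRecommendation && e.type == Type.VIEW).count();
    }

    // KPI 2: job applications that came from a recommendation
    static long recommendedApplications(List<Event> log) {
        return log.stream().filter(e -> e.fromRecommendation && e.type == Type.APPLY).count();
    }

    // Assumed conversion definition: applications per recommended view (0 if no views)
    static double conversionRate(List<Event> log) {
        long views = recommendedViews(log);
        return views == 0 ? 0.0 : (double) recommendedApplications(log) / views;
    }
}
```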

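3. Algorithm model: recommendation algorithms

Two test datasets:
  • pv.csv: job view records, containing user ID and job ID
  • job.csv: basic job information, containing job ID, posting date, and salary

1). pv.csv
  • 2 columns: user ID, job ID (userid, jobid)
  • View records: 2500
  • Users: 1000, with user IDs 1-1000
  • Jobs: 200, with job IDs 1-200
Sample data: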
    1,11
    2,136
    2,187
    3,165
    3,1
    3,24
    4,8
    4,199
    5,32
    5,100
    6,14
    7,59
    7,147
    8,92
    9,165
    9,80
    9,171
    10,45
    10,31
    10,1
    10,152
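Because pv.csv only records that a user viewed a job, it is boolean data with no preference values. The author loads it through `RecommendFactory.buildDataModelNoPref`, a helper from the referenced article; the plain-Java sketch below only illustrates the resulting shape (user ID to the set of viewed job IDs). The class name `PvLoader` is hypothetical.

```java
import java.io.*;
import java.util.*;

// Hypothetical loader: reads "userid,jobid" lines into userID -> viewed job IDs.
final class PvLoader {
    static Map<Long, Set<Long>> load(Reader in) {
        Map<Long, Set<Long>> views = new TreeMap<>();
        try (BufferedReader br = new BufferedReader(in)) {
            String line;
            while ((line = br.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) continue; // skip blank lines
                String[] cols = line.split(",");
                views.computeIfAbsent(Long.parseLong(cols[0].trim()), k -> new TreeSet<>())
                     .add(Long.parseLong(cols[1].trim()));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return views;
    }

    // Number of distinct jobs that appear in the view data
    static int distinctJobs(Map<Long, Set<Long>> views) {
        Set<Long> jobs = new HashSet<>();
        views.values().forEach(jobs::addAll);
        return jobs.size();
    }
}
```

With the full file, `PvLoader.load(new FileReader("datafile/job/pv.csv"))` should yield 1000 users if every user has at least one view.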
2). job.csv
  • 3 columns: job ID, posting date, salary (jobid, create_date, salary)
  • Jobs: 200, with job IDs 1-200

Sample data:
    1,2013-01-24,5600
    2,2011-03-02,5400
    3,2011-03-14,8100
    4,2012-10-05,2200
    5,2011-09-03,14100
    6,2011-03-05,6500
    7,2012-06-06,37000
    8,2013-02-18,5500
    9,2010-07-05,7500
    10,2010-01-23,6700
    11,2011-09-19,5200
    12,2010-01-19,29700
    13,2013-09-28,6000
    14,2013-10-23,3300
    15,2010-10-09,2700
    16,2010-07-14,5100
    17,2010-05-13,29000
    18,2010-01-16,21800
    19,2013-05-23,5700
    20,2011-04-24,5900
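The two filters later in the article need each job's posting date and salary, so job.csv is naturally loaded into a map keyed by job ID. The sketch below is one way to do it with `java.time.LocalDate`; the `JobLoader` and `Job` names are hypothetical.

```java
import java.io.*;
import java.time.LocalDate;
import java.util.*;

// Hypothetical loader for "jobid,create_date,salary" lines, e.g. "1,2013-01-24,5600".
final class JobLoader {
    static final class Job {
        final long id;
        final LocalDate createDate;
        final int salary;

        Job(long id, LocalDate createDate, int salary) {
            this.id = id;
            this.createDate = createDate;
            this.salary = salary;
        }
    }

    static Map<Long, Job> load(Reader in) {
        Map<Long, Job> jobs = new HashMap<>();
        try (BufferedReader br = new BufferedReader(in)) {
            String line;
            while ((line = br.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) continue; // skip blank lines
                String[] c = line.split(",");
                long id = Long.parseLong(c[0]);
                // create_date is ISO yyyy-MM-dd, which LocalDate.parse accepts directly
                jobs.put(id, new Job(id, LocalDate.parse(c[1]), Integer.parseInt(c[2])));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return jobs;
    }
}
```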
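To meet these KPIs, we restate the problem in "technical" terms: we need to make the job recommendations more accurate, and thereby increase user clicks.
  • 1. Combine recommendation algorithms and pick the ones that score highest with the recommender evaluator
  • 2. Verify the recommendation results manually
  • 3. Jobs are time-sensitive: recommended jobs should have been posted within the last six months
  • 4. Salary: a recommended job should pay at least 80% of the average salary of the jobs the user has viewed

We pick 3 recommendation approaches, UserCF, ItemCF, and SlopeOne, and test 7 combinations of them:
  • userCF1: LogLikelihoodSimilarity + NearestNUserNeighborhood + GenericBooleanPrefUserBasedRecommender
  • userCF2: CityBlockSimilarity + NearestNUserNeighborhood + GenericBooleanPrefUserBasedRecommender
  • userCF3: TanimotoCoefficientSimilarity + NearestNUserNeighborhood + GenericBooleanPrefUserBasedRecommender
  • itemCF1: LogLikelihoodSimilarity + GenericBooleanPrefItemBasedRecommender
  • itemCF2: CityBlockSimilarity + GenericBooleanPrefItemBasedRecommender
  • itemCF3: TanimotoCoefficientSimilarity + GenericBooleanPrefItemBasedRecommender
  • slopeOne: SlopeOneRecommender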
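On boolean data the three similarity measures only depend on set sizes: how many items each user viewed and how many they share. The plain-Java sketch below follows the formulas as I understand Mahout's `TanimotoCoefficientSimilarity`, `CityBlockSimilarity` and `LogLikelihoodSimilarity` to use; it is an illustration, not Mahout's code, and edge cases (for example no overlap at all) may be handled differently there. The class `BooleanSimilarities` is hypothetical.

```java
import java.util.*;

// Illustrative boolean-data similarities (hypothetical class, not Mahout's implementation).
final class BooleanSimilarities {
    static int intersection(Set<Long> a, Set<Long> b) {
        int n = 0;
        for (Long x : a) if (b.contains(x)) n++;
        return n;
    }

    // Tanimoto (Jaccard): |A ∩ B| / |A ∪ B|
    static double tanimoto(Set<Long> a, Set<Long> b) {
        int common = intersection(a, b);
        int union = a.size() + b.size() - common;
        return union == 0 ? 0.0 : (double) common / union;
    }

    // City block on boolean data: distance = |A| + |B| - 2|A ∩ B|, similarity = 1 / (1 + distance)
    static double cityBlock(Set<Long> a, Set<Long> b) {
        int distance = a.size() + b.size() - 2 * intersection(a, b);
        return 1.0 / (1.0 + distance);
    }

    // Log-likelihood similarity: 1 - 1 / (1 + LLR) over the 2x2 co-occurrence table
    static double logLikelihood(Set<Long> a, Set<Long> b, long numItems) {
        long k11 = intersection(a, b);                   // viewed by both
        long k12 = b.size() - k11;                       // viewed only by b
        long k21 = a.size() - k11;                       // viewed only by a
        long k22 = numItems - a.size() - b.size() + k11; // viewed by neither
        return 1.0 - 1.0 / (1.0 + llr(k11, k12, k21, k22));
    }

    static double llr(long k11, long k12, long k21, long k22) {
        double row = entropy(k11 + k12, k21 + k22);
        double col = entropy(k11 + k21, k12 + k22);
        double mat = entropy(k11, k12, k21, k22);
        if (row + col < mat) return 0.0; // guard against tiny negative rounding error
        return 2.0 * (row + col - mat);
    }

    // Unnormalized entropy: xLogX(sum) - sum of xLogX(count)
    static double entropy(long... counts) {
        long sum = 0;
        double parts = 0.0;
        for (long c : counts) {
            sum += c;
            parts += xLogX(c);
        }
        return xLogX(sum) - parts;
    }

    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }
}
```

For two users who viewed {1, 2} and {1, 2, 3} out of 10 items, Tanimoto gives 2/3, city block gives 1/(1+1) = 0.5, and log-likelihood gives about 0.861.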

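For a detailed introduction to the recommendation algorithms, see the article "Mahout Recommendation Algorithm APIs Explained".

For a detailed introduction to how the algorithms are combined, see the article "Analyzing the Mahout Recommender Engine from Its Source Code".

4. Architecture design: system architecture of the job recommendation engine
[Figure: mahout-recommend-job-architect.png, system architecture]
In the figure, the application (business system) is on the left, Mahout is on the right, and the Hadoop cluster is at the bottom.
  • 1. When the data volume is not too large but the algorithm is complex, use Mahout directly to read the CSV or database data and compute in memory on a single machine. Mahout is a multi-threaded application and uses all of the machine's resources in parallel.
  • 2. When the data volume is large, choose a parallelized algorithm (ItemCF): first import the business system's data into Hadoop HDFS, then run the Mahout algorithm against HDFS. The algorithm's performance then depends on the whole Hadoop cluster.
  • 3. Save the computed results to a database so they are easy to query.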
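Steps 1 and 3 together amount to a batch job: compute every user's top-N list on one machine, using all cores, then flatten the lists into rows for the database. The sketch below shows that shape with a plain-Java parallel stream; it is not how Mahout parallelizes internally, the `recommend` function is a stand-in for any recommender, and the `recommendation(user_id, job_id, rank, score)` table layout is a hypothetical example.

```java
import java.util.*;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical batch step: per-user top-N lists -> rows for a results table.
final class BatchRecommendations {
    // One row of a hypothetical recommendation(user_id, job_id, rank, score) table
    static final class Row {
        final long userId;
        final long jobId;
        final int rank;
        final double score;

        Row(long userId, long jobId, int rank, double score) {
            this.userId = userId;
            this.jobId = jobId;
            this.rank = rank;
            this.score = score;
        }

        @Override
        public String toString() {
            return userId + "," + jobId + "," + rank + "," + score;
        }
    }

    // Computes each user's ranked list on a parallel stream (common ForkJoinPool) and
    // flattens the results; the encounter order of userIds is preserved in the output.
    static List<Row> computeAll(List<Long> userIds,
                                Function<Long, List<Map.Entry<Long, Double>>> recommend) {
        return userIds.parallelStream()
                .flatMap(uid -> {
                    List<Row> rows = new ArrayList<>();
                    int rank = 1;
                    for (Map.Entry<Long, Double> e : recommend.apply(uid)) {
                        rows.add(new Row(uid, e.getKey(), rank++, e.getValue()));
                    }
                    return rows.stream();
                })
                .collect(Collectors.toList());
    }
}
```

The resulting rows could then be written with a JDBC batch insert, so the application only has to run a simple `SELECT` per user at request time.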

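5. Program development: implementing the recommendation algorithms with Mahout

The development environment uses Mahout 0.8. To set it up, see the article "Building a Mahout Project with Maven".

New Java classes:
  • RecommenderEvaluator.java: picks the algorithms that score highest with the recommender evaluator
  • RecommenderResult.java: compares a given number of results manually
  • RecommenderFilterOutdateResult.java: excludes outdated jobs
  • RecommenderFilterSalaryResult.java: excludes jobs whose salary is too low
1). RecommenderEvaluator.java: picking the algorithms that score highest with the recommender evaluator
Source code: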
    import java.io.IOException;

    import org.apache.mahout.cf.taste.common.TasteException;
    import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
    import org.apache.mahout.cf.taste.impl.recommender.ClusterSimilarity;
    import org.apache.mahout.cf.taste.impl.recommender.knn.NonNegativeQuadraticOptimizer;
    import org.apache.mahout.cf.taste.impl.recommender.svd.ALSWRFactorizer;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
    import org.apache.mahout.cf.taste.similarity.ItemSimilarity;
    import org.apache.mahout.cf.taste.similarity.UserSimilarity;

    // RecommendFactory is the author's own helper class (from the articles referenced above),
    // not part of Mahout; it is assumed to live in the same package.
    public class RecommenderEvaluator {
        final static int NEIGHBORHOOD_NUM = 2;
        final static int RECOMMENDER_NUM = 3;

        public static void main(String[] args) throws TasteException, IOException {
            String file = "datafile/job/pv.csv";
            // pv.csv has no preference values, so a boolean ("no pref") data model is used
            DataModel dataModel = RecommendFactory.buildDataModelNoPref(file);
            userLoglikelihood(dataModel);
            userCityBlock(dataModel);
            userTanimoto(dataModel);
            itemLoglikelihood(dataModel);
            itemCityBlock(dataModel);
            itemTanimoto(dataModel);
            slopeOne(dataModel);
        }

        // Each method below builds one algorithm combination, prints its
        // AVERAGE_ABSOLUTE_DIFFERENCE score and IR precision/recall, and returns the builder
        public static RecommenderBuilder userLoglikelihood(DataModel dataModel) throws TasteException, IOException {
            System.out.println("userLoglikelihood");
            UserSimilarity userSimilarity = RecommendFactory.userSimilarity(RecommendFactory.SIMILARITY.LOGLIKELIHOOD, dataModel);
            UserNeighborhood userNeighborhood = RecommendFactory.userNeighborhood(RecommendFactory.NEIGHBORHOOD.NEAREST, userSimilarity, dataModel, NEIGHBORHOOD_NUM);
            RecommenderBuilder recommenderBuilder = RecommendFactory.userRecommender(userSimilarity, userNeighborhood, false);
            RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
            RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
            return recommenderBuilder;
        }

        public static RecommenderBuilder userCityBlock(DataModel dataModel) throws TasteException, IOException {
            System.out.println("userCityBlock");
            UserSimilarity userSimilarity = RecommendFactory.userSimilarity(RecommendFactory.SIMILARITY.CITYBLOCK, dataModel);
            UserNeighborhood userNeighborhood = RecommendFactory.userNeighborhood(RecommendFactory.NEIGHBORHOOD.NEAREST, userSimilarity, dataModel, NEIGHBORHOOD_NUM);
            RecommenderBuilder recommenderBuilder = RecommendFactory.userRecommender(userSimilarity, userNeighborhood, false);
            RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
            RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
            return recommenderBuilder;
        }

        public static RecommenderBuilder userTanimoto(DataModel dataModel) throws TasteException, IOException {
            System.out.println("userTanimoto");
            UserSimilarity userSimilarity = RecommendFactory.userSimilarity(RecommendFactory.SIMILARITY.TANIMOTO, dataModel);
            UserNeighborhood userNeighborhood = RecommendFactory.userNeighborhood(RecommendFactory.NEIGHBORHOOD.NEAREST, userSimilarity, dataModel, NEIGHBORHOOD_NUM);
            RecommenderBuilder recommenderBuilder = RecommendFactory.userRecommender(userSimilarity, userNeighborhood, false);
            RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
            RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
            return recommenderBuilder;
        }

        public static RecommenderBuilder itemLoglikelihood(DataModel dataModel) throws TasteException, IOException {
            System.out.println("itemLoglikelihood");
            ItemSimilarity itemSimilarity = RecommendFactory.itemSimilarity(RecommendFactory.SIMILARITY.LOGLIKELIHOOD, dataModel);
            RecommenderBuilder recommenderBuilder = RecommendFactory.itemRecommender(itemSimilarity, false);
            RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
            RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
            return recommenderBuilder;
        }

        public static RecommenderBuilder itemCityBlock(DataModel dataModel) throws TasteException, IOException {
            System.out.println("itemCityBlock");
            ItemSimilarity itemSimilarity = RecommendFactory.itemSimilarity(RecommendFactory.SIMILARITY.CITYBLOCK, dataModel);
            RecommenderBuilder recommenderBuilder = RecommendFactory.itemRecommender(itemSimilarity, false);
            RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
            RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
            return recommenderBuilder;
        }

        public static RecommenderBuilder itemTanimoto(DataModel dataModel) throws TasteException, IOException {
            System.out.println("itemTanimoto");
            ItemSimilarity itemSimilarity = RecommendFactory.itemSimilarity(RecommendFactory.SIMILARITY.TANIMOTO, dataModel);
            RecommenderBuilder recommenderBuilder = RecommendFactory.itemRecommender(itemSimilarity, false);
            RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
            RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
            return recommenderBuilder;
        }

        public static RecommenderBuilder slopeOne(DataModel dataModel) throws TasteException, IOException {
            System.out.println("slopeOne");
            RecommenderBuilder recommenderBuilder = RecommendFactory.slopeOneRecommender();
            RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
            RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
            return recommenderBuilder;
        }

        // The knn*, svd and treeCluster* combinations below are defined but not called from main()
        public static RecommenderBuilder knnLoglikelihood(DataModel dataModel) throws TasteException, IOException {
            System.out.println("knnLoglikelihood");
            ItemSimilarity itemSimilarity = RecommendFactory.itemSimilarity(RecommendFactory.SIMILARITY.LOGLIKELIHOOD, dataModel);
            RecommenderBuilder recommenderBuilder = RecommendFactory.itemKNNRecommender(itemSimilarity, new NonNegativeQuadraticOptimizer(), 10);
            RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
            RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
            return recommenderBuilder;
        }

        public static RecommenderBuilder knnTanimoto(DataModel dataModel) throws TasteException, IOException {
            System.out.println("knnTanimoto");
            ItemSimilarity itemSimilarity = RecommendFactory.itemSimilarity(RecommendFactory.SIMILARITY.TANIMOTO, dataModel);
            RecommenderBuilder recommenderBuilder = RecommendFactory.itemKNNRecommender(itemSimilarity, new NonNegativeQuadraticOptimizer(), 10);
            RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
            RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
            return recommenderBuilder;
        }

        public static RecommenderBuilder knnCityBlock(DataModel dataModel) throws TasteException, IOException {
            System.out.println("knnCityBlock");
            ItemSimilarity itemSimilarity = RecommendFactory.itemSimilarity(RecommendFactory.SIMILARITY.CITYBLOCK, dataModel);
            RecommenderBuilder recommenderBuilder = RecommendFactory.itemKNNRecommender(itemSimilarity, new NonNegativeQuadraticOptimizer(), 10);
            RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
            RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
            return recommenderBuilder;
        }

        public static RecommenderBuilder svd(DataModel dataModel) throws TasteException, IOException {
            System.out.println("svd");
            RecommenderBuilder recommenderBuilder = RecommendFactory.svdRecommender(new ALSWRFactorizer(dataModel, 5, 0.05, 10));
            RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
            RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
            return recommenderBuilder;
        }

        public static RecommenderBuilder treeClusterLoglikelihood(DataModel dataModel) throws TasteException, IOException {
            System.out.println("treeClusterLoglikelihood");
            UserSimilarity userSimilarity = RecommendFactory.userSimilarity(RecommendFactory.SIMILARITY.LOGLIKELIHOOD, dataModel);
            ClusterSimilarity clusterSimilarity = RecommendFactory.clusterSimilarity(RecommendFactory.SIMILARITY.FARTHEST_NEIGHBOR_CLUSTER, userSimilarity);
            RecommenderBuilder recommenderBuilder = RecommendFactory.treeClusterRecommender(clusterSimilarity, 3);
            RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
            RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);
            return recommenderBuilder;
        }
    }
Console output of the run:
    userLoglikelihood
    AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:0.2741487771272658
    Recommender IR Evaluator: [Precision:0.6424242424242422,Recall:0.4098360655737705]
    userCityBlock
    AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:0.575306732961736
    Recommender IR Evaluator: [Precision:0.919580419580419,Recall:0.4371584699453552]
    userTanimoto
    AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:0.5546485136181523
    Recommender IR Evaluator: [Precision:0.6625766871165644,Recall:0.41803278688524603]
    itemLoglikelihood
    AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:0.5398332608612343
    Recommender IR Evaluator: [Precision:0.26229508196721296,Recall:0.26229508196721296]
    itemCityBlock
    AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:0.9251437840891661
    Recommender IR Evaluator: [Precision:0.02185792349726776,Recall:0.02185792349726776]
    itemTanimoto
    AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:0.9176432856689655
    Recommender IR Evaluator: [Precision:0.26229508196721296,Recall:0.26229508196721296]
    slopeOne
    AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:0.0
    Recommender IR Evaluator: [Precision:0.01912568306010929,Recall:0.01912568306010929]
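Visualization of the recommender evaluator output:
[Figure: difference.png]
[Figure: evaluator.png]
On precision and recall, userCityBlock scores best (precision 0.92, recall 0.44); every UserCF combination beats every ItemCF combination, and SlopeOne scores almost nothing (precision and recall of about 0.02). Note that for AVERAGE_ABSOLUTE_DIFFERENCE lower is better, and SlopeOne's 0.0 is not a sign of accuracy: most likely, because every preference in this boolean data is 1, its estimates trivially match.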
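The two numbers printed per algorithm come from the author's `RecommendFactory` wrappers around Mahout's evaluators. The sketch below shows, in simplified form, what each number measures; it does not reproduce Mahout's exact procedure (how it splits training and test data or chooses the relevant items), and the `EvalMetrics` class is hypothetical.

```java
import java.util.*;

// Simplified versions of the two evaluation scores (hypothetical class, not Mahout's evaluators).
final class EvalMetrics {
    // Average absolute difference between estimated and actual preference values (lower is better)
    static double averageAbsoluteDifference(double[] estimated, double[] actual) {
        if (estimated.length == 0 || estimated.length != actual.length) {
            throw new IllegalArgumentException("need two non-empty arrays of equal length");
        }
        double total = 0.0;
        for (int i = 0; i < estimated.length; i++) {
            total += Math.abs(estimated[i] - actual[i]);
        }
        return total / estimated.length;
    }

    // Precision@N: fraction of the recommended items that are in the held-out relevant set
    static double precision(List<Long> recommended, Set<Long> relevant) {
        return recommended.isEmpty() ? 0.0 : (double) hits(recommended, relevant) / recommended.size();
    }

    // Recall@N: fraction of the held-out relevant items that were recommended
    static double recall(List<Long> recommended, Set<Long> relevant) {
        return relevant.isEmpty() ? 0.0 : (double) hits(recommended, relevant) / relevant.size();
    }

    private static long hits(List<Long> recommended, Set<Long> relevant) {
        return recommended.stream().distinct().filter(relevant::contains).count();
    }
}
```

On boolean data every actual value is 1.0, so an estimator that always outputs 1.0 gets an average absolute difference of exactly 0.0 while telling us nothing about ranking; precision and recall are the more informative scores here.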

2). RecommenderResult.java: comparing a given number of results manually
To get contrasting results, we take userCityBlock and itemLoglikelihood and compare their recommendations by hand.

Source code:
    import java.io.IOException;
    import java.util.List;

    import org.apache.mahout.cf.taste.common.TasteException;
    import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.recommender.RecommendedItem;
    import org.apache.mahout.cf.taste.recommender.Recommender;

    public class RecommenderResult {
        final static int RECOMMENDER_NUM = 3;

        public static void main(String[] args) throws TasteException, IOException {
            String file = "datafile/job/pv.csv";
            DataModel dataModel = RecommendFactory.buildDataModelNoPref(file);
            // Build each recommender once and reuse it for every user
            Recommender r1 = RecommenderEvaluator.userCityBlock(dataModel).buildRecommender(dataModel);
            Recommender r2 = RecommenderEvaluator.itemLoglikelihood(dataModel).buildRecommender(dataModel);
            LongPrimitiveIterator iter = dataModel.getUserIDs();
            while (iter.hasNext()) {
                long uid = iter.nextLong();
                System.out.print("userCityBlock    =>");
                result(uid, r1);
                System.out.print("itemLoglikelihood=>");
                result(uid, r2);
            }
        }

        // Print the top RECOMMENDER_NUM jobs for one user
        public static void result(long uid, Recommender recommender) throws TasteException {
            List<RecommendedItem> list = recommender.recommend(uid, RECOMMENDER_NUM);
            RecommendFactory.showItems(uid, list, false);
        }
    }
Console output (partial):
    ...
    userCityBlock    =>uid:968,(61,0.333333)
    itemLoglikelihood=>uid:968,(121,1.429362)(153,1.239939)(198,1.207726)
    userCityBlock    =>uid:969,
    itemLoglikelihood=>uid:969,(75,1.326499)(30,0.873100)(85,0.763344)
    userCityBlock    =>uid:970,
    itemLoglikelihood=>uid:970,(13,0.748417)(156,0.748417)(122,0.748417)
    userCityBlock    =>uid:971,
    itemLoglikelihood=>uid:971,(38,2.060951)(104,1.951208)(83,1.941735)
    userCityBlock    =>uid:972,
    itemLoglikelihood=>uid:972,(131,1.378395)(4,1.349386)(87,0.881816)
    userCityBlock    =>uid:973,
    itemLoglikelihood=>uid:973,(196,1.432040)(140,1.398066)(130,1.380335)
    userCityBlock    =>uid:974,(19,0.200000)
    itemLoglikelihood=>uid:974,(145,1.994049)(121,1.794289)(98,1.738027)
    ...
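When comparing the two recommenders by hand, two quick numbers help: how much their lists overlap per user, and how many users get any recommendation at all. In the partial output above, userCityBlock returns nothing for uids 969 to 973, and for uid 974 its single job (19) shares nothing with itemLoglikelihood's three. The helper names below (`ListOverlap`, `jaccard`, `coverage`) are hypothetical.

```java
import java.util.*;

// Hypothetical helpers for side-by-side comparison of two recommenders' top-N lists.
final class ListOverlap {
    // Jaccard overlap of two lists as sets: 1.0 = same jobs, 0.0 = no job in common.
    // Two empty lists are treated as identical.
    static double jaccard(List<Long> a, List<Long> b) {
        Set<Long> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) return 1.0;
        Set<Long> common = new HashSet<>(a);
        common.retainAll(new HashSet<>(b));
        return (double) common.size() / union.size();
    }

    // Coverage: fraction of users who received at least one recommendation
    static double coverage(List<List<Long>> perUserLists) {
        if (perUserLists.isEmpty()) return 0.0;
        long nonEmpty = perUserLists.stream().filter(l -> !l.isEmpty()).count();
        return (double) nonEmpty / perUserLists.size();
    }
}
```

A high precision combined with low coverage, as userCityBlock seems to show here, means the algorithm is accurate for the users it can serve but leaves many users without any recommendation.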
Let's look at the recommendations for user uid=974.
Searching pv.csv (in R):
    > pv[which(pv$userid==974),]
         userid jobid
    2426    974   106
    2427    974   173
    2428    974    82
    2429    974   188
    2430    974    78
Searching job.csv:
    > job[job$jobid %in% c(145,121,98,19),]
        jobid create_date salary
    19     19  2013-05-23   5700
    98     98  2010-01-15   2900
    121   121  2010-06-19   5300
    145   145  2013-08-02   6800
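Some of these recommendations, such as jobs 121 and 98 from itemLoglikelihood, were posted in 2010, which is not a good result for a job seeker. Next we exclude outdated jobs and keep only jobs posted in 2013.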
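Requirement 3 asks for jobs posted within the last six months, while the code below uses a fixed cutoff of 2013-01-01. A rolling window can be expressed with `java.time`; in the sketch below the reference date (for example, the day the recommendations are generated) is an assumption, and the `OutdatedJobs` class is hypothetical.

```java
import java.time.LocalDate;
import java.util.*;

// Hypothetical helper: IDs of jobs posted before (reference date - months).
final class OutdatedJobs {
    static Set<Long> outdated(Map<Long, LocalDate> postedOn, LocalDate reference, int months) {
        LocalDate cutoff = reference.minusMonths(months);
        Set<Long> ids = new TreeSet<>();
        for (Map.Entry<Long, LocalDate> e : postedOn.entrySet()) {
            if (e.getValue().isBefore(cutoff)) ids.add(e.getKey());
        }
        return ids;
    }
}
```

With a reference date of 2013-07-01 and a six-month window the cutoff is 2013-01-01, which matches the fixed rule used below; with a reference date of 2013-12-31 the cutoff moves to 2013-06-30, and job 19 (posted 2013-05-23) would be excluded as well.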
3). RecommenderFilterOutdateResult.java: excluding outdated jobs
Source code:
    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;
    import java.text.ParseException;
    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    import org.apache.mahout.cf.taste.common.TasteException;
    import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.recommender.IDRescorer;
    import org.apache.mahout.cf.taste.recommender.RecommendedItem;
    import org.apache.mahout.cf.taste.recommender.Recommender;

    public class RecommenderFilterOutdateResult {
        final static int RECOMMENDER_NUM = 3;

        public static void main(String[] args) throws TasteException, IOException {
            String file = "datafile/job/pv.csv";
            DataModel dataModel = RecommendFactory.buildDataModelNoPref(file);
            Recommender r1 = RecommenderEvaluator.userCityBlock(dataModel).buildRecommender(dataModel);
            Recommender r2 = RecommenderEvaluator.itemLoglikelihood(dataModel).buildRecommender(dataModel);
            // Read job.csv once and share the same rescorer across all users
            IDRescorer rescorer = new JobRescorer(getOutdateJobID("datafile/job/job.csv"));
            LongPrimitiveIterator iter = dataModel.getUserIDs();
            while (iter.hasNext()) {
                long uid = iter.nextLong();
                System.out.print("userCityBlock    =>");
                filterOutdate(uid, r1, rescorer);
                System.out.print("itemLoglikelihood=>");
                filterOutdate(uid, r2, rescorer);
            }
        }

        public static void filterOutdate(long uid, Recommender recommender, IDRescorer rescorer) throws TasteException {
            List<RecommendedItem> list = recommender.recommend(uid, RECOMMENDER_NUM, rescorer);
            RecommendFactory.showItems(uid, list, true);
        }

        // Collect IDs of jobs posted before 2013-01-01; these are treated as outdated
        public static Set<Long> getOutdateJobID(String file) throws IOException {
            SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd");
            Set<Long> jobids = new HashSet<Long>();
            try (BufferedReader br = new BufferedReader(new FileReader(new File(file)))) {
                String s;
                while ((s = br.readLine()) != null) {
                    String[] cols = s.split(",");
                    try {
                        Date date = df.parse(cols[1]);
                        if (date.before(df.parse("2013-01-01"))) {
                            jobids.add(Long.parseLong(cols[0]));
                        }
                    } catch (ParseException e) {
                        e.printStackTrace(); // skip rows with an unparseable date
                    }
                }
            }
            return jobids;
        }
    }

    class JobRescorer implements IDRescorer {
        private final Set<Long> jobids;

        public JobRescorer(Set<Long> jobs) {
            this.jobids = jobs;
        }

        @Override
        public double rescore(long id, double originalScore) {
            // NaN tells Mahout to drop the item from the results
            return isFiltered(id) ? Double.NaN : originalScore;
        }

        @Override
        public boolean isFiltered(long id) {
            return jobids.contains(id);
        }
    }
Console output (partial):
    ...
    itemLoglikelihood=>uid:965,(200,0.829600)(122,0.748417)(170,0.736340)
    userCityBlock    =>uid:966,(114,0.250000)
    itemLoglikelihood=>uid:966,(114,1.516898)(101,0.864536)(99,0.856057)
    userCityBlock    =>uid:967,
    itemLoglikelihood=>uid:967,(105,0.873100)(114,0.725016)(168,0.707119)
    userCityBlock    =>uid:968,
    itemLoglikelihood=>uid:968,(174,0.735004)(39,0.696716)(185,0.696171)
    userCityBlock    =>uid:969,
    itemLoglikelihood=>uid:969,(197,0.723203)(81,0.710230)(167,0.668358)
    userCityBlock    =>uid:970,
    itemLoglikelihood=>uid:970,(13,0.748417)(122,0.748417)(28,0.736340)
    userCityBlock    =>uid:971,
    itemLoglikelihood=>uid:971,(28,1.540753)(174,1.511881)(39,1.435575)
    userCityBlock    =>uid:972,
    itemLoglikelihood=>uid:972,(14,0.800605)(60,0.794088)(163,0.710230)
    userCityBlock    =>uid:973,
    itemLoglikelihood=>uid:973,(56,0.795529)(13,0.712680)(120,0.701026)
    userCityBlock    =>uid:974,(19,0.200000)
    itemLoglikelihood=>uid:974,(145,1.994049)(89,1.578694)(19,1.435193)
    ...
Let's look at the recommendations for user uid=974 again.
Searching pv.csv:
    > pv[which(pv$userid==974),]
         userid jobid
    2426    974   106
    2427    974   173
    2428    974    82
    2429    974   188
    2430    974    78
Searching job.csv:
    > job[job$jobid %in% c(19,145,89),]
        jobid create_date salary
    19     19  2013-05-23   5700
    89     89  2013-06-15   8400
    145   145  2013-08-02   6800
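After excluding outdated jobs, userCityBlock still recommends only job 19, while itemLoglikelihood's 2nd and 3rd results (jobs 121 and 98, both posted in 2010) have been replaced by the lower-scoring jobs 89 and 19.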
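The replacement effect follows from how a filtering rescorer interacts with top-N selection: filtered candidates are dropped before ranking, so the next-best scores move up. The plain-Java sketch below reproduces it for uid=974 using only the five candidate scores visible in the two console outputs; the real recommender scores every unseen job. `LocalIDRescorer` is a local stand-in with the same two methods as Mahout's `IDRescorer`, and `TopN` is hypothetical.

```java
import java.util.*;
import java.util.stream.Collectors;

// Local stand-in with the same two methods as Mahout's IDRescorer (not Mahout itself).
interface LocalIDRescorer {
    double rescore(long id, double originalScore);
    boolean isFiltered(long id);
}

final class TopN {
    // Like JobRescorer: drop the given IDs, keep every other score unchanged
    static LocalIDRescorer excluding(Set<Long> ids) {
        return new LocalIDRescorer() {
            @Override
            public double rescore(long id, double originalScore) {
                return isFiltered(id) ? Double.NaN : originalScore;
            }

            @Override
            public boolean isFiltered(long id) {
                return ids.contains(id);
            }
        };
    }

    // Highest-scoring n candidates after filtering and rescoring; NaN scores are dropped
    static List<Long> topN(Map<Long, Double> candidates, int n, LocalIDRescorer rescorer) {
        return candidates.entrySet().stream()
                .filter(e -> rescorer == null || !rescorer.isFiltered(e.getKey()))
                .map(e -> Map.entry(e.getKey(),
                        rescorer == null ? e.getValue() : rescorer.rescore(e.getKey(), e.getValue())))
                .filter(e -> !Double.isNaN(e.getValue()))
                .sorted((x, y) -> Double.compare(y.getValue(), x.getValue()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

Without a rescorer the top 3 are 145, 121, 98; excluding the 2010 jobs 121 and 98 yields 145, 89, 19, matching the filtered output above.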
4). RecommenderFilterSalaryResult.java: excluding jobs with salaries that are too low
Let's look at the jobs that user uid=974 has viewed:
    > job[job$jobid %in% c(106,173,82,188,78),]
        jobid create_date salary
    78     78  2012-01-29   6800
    82     82  2010-07-05   7500
    106   106  2011-04-25   5200
    173   173  2013-09-13   5200
    188   188  2010-07-14   6000
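The average salary is 6140. We assume that a user browsing job listings generally won't look at jobs paying less than their current salary, so the algorithm excludes jobs paying less than 80% of that average, i.e. recommended jobs with a salary below 4912 (6140 × 0.8 = 4912).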

You can implement this yourself by following RecommenderFilterOutdateResult.java above.
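Here is a minimal sketch of that rule in plain Java, assuming the salary map comes from job.csv and the viewed job IDs from pv.csv. `SalaryRescorer` is a hypothetical name; its `rescore` and `isFiltered` methods have the same signatures as Mahout's `IDRescorer`, so in a Mahout program the class could implement that interface and be passed to `recommend(uid, RECOMMENDER_NUM, rescorer)` like `JobRescorer`. Because the threshold depends on the user's history, one instance is built per user.

```java
import java.util.*;

// Hypothetical per-user salary filter: excludes jobs paying less than
// ratio × (average salary of the jobs this user has viewed).
final class SalaryRescorer {
    private final Map<Long, Integer> salaryByJob;
    private final double minSalary;

    SalaryRescorer(Map<Long, Integer> salaryByJob, Collection<Long> viewedJobs, double ratio) {
        this.salaryByJob = salaryByJob;
        this.minSalary = ratio * averageSalary(salaryByJob, viewedJobs);
    }

    // Average salary over the viewed jobs that have a known salary (0 if none)
    static double averageSalary(Map<Long, Integer> salaryByJob, Collection<Long> viewedJobs) {
        return viewedJobs.stream()
                .filter(salaryByJob::containsKey)
                .mapToInt(id -> salaryByJob.get(id))
                .average()
                .orElse(0.0);
    }

    double minSalary() {
        return minSalary;
    }

    // Same signature as IDRescorer.isFiltered: true for jobs below the threshold
    public boolean isFiltered(long jobId) {
        Integer salary = salaryByJob.get(jobId);
        return salary != null && salary < minSalary;
    }

    // Same signature as IDRescorer.rescore: NaN removes filtered jobs
    public double rescore(long jobId, double originalScore) {
        return isFiltered(jobId) ? Double.NaN : originalScore;
    }
}
```

For uid=974 this gives an average of 6140 and a threshold of about 4912, so job 98 (salary 2900) would be excluded while jobs 19, 89 and 145 are kept.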

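With this, we have completed the algorithms for a job recommendation engine built with Mahout. Without Mahout, writing such an engine ourselves would probably take close to half a year; making good use of open-source technology helps us grow fast!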





anyhuayong posted on 2014-11-10 08:41:17
Great resource, bookmarked. Thanks for the effort!

net211211 posted on 2015-3-18 11:19:09
Good stuff, I'll study it.