RnD_Alex posted on 2013-10-25 10:42:52

Nutch 1.4 cannot continue the crawl job

# bin/nutch crawl hdfs://192.168.19.141:9000/user/root/urls -dir crawl -depth 200 -threads 20 -topN 100
Warning: $HADOOP_HOME is deprecated.
12/04/11 19:29:32 WARN crawl.Crawl: solrUrl is not set, indexing will be skipped...
12/04/11 19:29:32 INFO crawl.Crawl: crawl started in: crawl
12/04/11 19:29:32 INFO crawl.Crawl: rootUrlDir = hdfs://192.168.19.141:9000/user/root/urls
12/04/11 19:29:32 INFO crawl.Crawl: threads = 20
12/04/11 19:29:32 INFO crawl.Crawl: depth = 200
12/04/11 19:29:32 INFO crawl.Crawl: solrUrl=null
12/04/11 19:29:32 INFO crawl.Crawl: topN = 100
12/04/11 19:29:32 INFO crawl.Injector: Injector: starting at 2012-04-11 19:29:32
12/04/11 19:29:32 INFO crawl.Injector: Injector: crawlDb: crawl/crawldb
12/04/11 19:29:32 INFO crawl.Injector: Injector: urlDir: hdfs://192.168.19.141:9000/user/root/urls
12/04/11 19:29:32 INFO crawl.Injector: Injector: Converting injected urls to crawl db entries.
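Execution stalls at the step that builds the to-fetch URL database from the injected seed list. The last log line is the Injector's "Converting injected urls to crawl db entries"; after that no further crawl output appears, and the crawl folder is never created.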
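
A few checks that might help narrow this down, written as a rough sketch rather than a known fix. It assumes the job is submitted to the Hadoop cluster (the $HADOOP_HOME warning suggests the hadoop launcher is involved), that the user is root, and that the default filesystem is HDFS. In that case a relative -dir such as crawl would resolve under the user's HDFS home directory, not on local disk. The hdfs_home_path helper is hypothetical and only illustrates that path.

```shell
#!/bin/sh
# Diagnostic sketch. Only the seed URL and the "crawl" directory name come
# from the post; the user name "root" and the helper are assumptions.

SEEDS="hdfs://192.168.19.141:9000/user/root/urls"   # seed dir from the crawl command
CRAWL_DIR="crawl"                                   # value passed to -dir

# Hypothetical helper: where a relative path lands on HDFS for a given user.
hdfs_home_path() {
    printf '/user/%s/%s\n' "$1" "$2"
}

echo "A relative -dir would resolve on HDFS to: $(hdfs_home_path root "$CRAWL_DIR")"

if command -v hadoop >/dev/null 2>&1; then
    # 1. Is the seed directory readable and non-empty?
    hadoop fs -ls "$SEEDS" || echo "seed directory not readable"
    # 2. Did the crawl directory appear on HDFS rather than on local disk?
    hadoop fs -ls "$CRAWL_DIR" || echo "no crawl directory on HDFS yet"
    # 3. Is the inject job still queued or running on the JobTracker?
    hadoop job -list || echo "could not query the JobTracker"
else
    echo "hadoop not found on PATH; skipping cluster checks"
fi
```

If the third check shows the inject job sitting at 0% for a long time, the stall would be on the cluster side (for example, no TaskTracker available) and not in Nutch itself, though that is only one possibility.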