Nutch 1.4 crawl job stalls and cannot continue
# bin/nutch crawl hdfs://192.168.19.141:9000/user/root/urls -dir crawl -depth 200 -threads 20 -topN 100
Warning: $HADOOP_HOME is deprecated.
12/04/11 19:29:32 WARN crawl.Crawl: solrUrl is not set, indexing will be skipped...
12/04/11 19:29:32 INFO crawl.Crawl: crawl started in: crawl
12/04/11 19:29:32 INFO crawl.Crawl: rootUrlDir = hdfs://192.168.19.141:9000/user/root/urls
12/04/11 19:29:32 INFO crawl.Crawl: threads = 20
12/04/11 19:29:32 INFO crawl.Crawl: depth = 200
12/04/11 19:29:32 INFO crawl.Crawl: solrUrl=null
12/04/11 19:29:32 INFO crawl.Crawl: topN = 100
12/04/11 19:29:32 INFO crawl.Injector: Injector: starting at 2012-04-11 19:29:32
12/04/11 19:29:32 INFO crawl.Injector: Injector: crawlDb: crawl/crawldb
12/04/11 19:29:32 INFO crawl.Injector: Injector: urlDir: hdfs://192.168.19.141:9000/user/root/urls
12/04/11 19:29:32 INFO crawl.Injector: Injector: Converting injected urls to crawl db entries.
When execution reaches the step of generating the fetch list from the injected URL list, the crawl hangs and makes no further progress, and the crawl output directory is never created.