How can I run the Stanford Segmenter in a distributed job? My main problem is that when the framework loads the segmenter, it automatically prepends the current project path, so I cannot point it at an HDFS path.
The code is as follows:
import java.io.PrintStream;
import java.util.Properties;

import edu.stanford.nlp.ie.crf.CRFClassifier;
import edu.stanford.nlp.ling.CoreLabel;

public class SegmenterDemo {

    private static final String basedir = System.getProperty("CRFUtils", "data");
    private static String[] files = {"test/test.simp.utf8"};

    public static void main(String[] args) throws Exception {

        System.setOut(new PrintStream(System.out, true, "utf-8"));

        Properties props = new Properties();
        props.setProperty("sighanCorporaDict", basedir);
        props.setProperty("serDictionary", basedir + "/dict-chris6.ser.gz");
        if (files.length > 0) {
            props.setProperty("testFile", files[0]);
        }
        props.setProperty("inputEncoding", "UTF-8");
        props.setProperty("sighanPostProcessing", "true");

        CRFClassifier<CoreLabel> segmenter = new CRFClassifier<CoreLabel>(props);
        segmenter.loadClassifierNoExceptions(basedir + "/ctb.gz", props); // this call prepends the current project path

        // process the input files
        for (String filename : files) {
            segmenter.classifyAndWriteAnswers(filename);
        }
    }
}
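One workaround I have been considering is to bypass the path-based loader entirely: open the model from HDFS myself and hand an InputStream to the classifier, so no local project path gets prepended. Below is a rough sketch of that idea. It assumes CRFClassifier inherits a loadClassifier(InputStream, Properties) overload from AbstractSequenceClassifier, and that the stream-based loader does not decompress the .gz model on its own; I have not verified either against my CoreNLP version, and the class and method names I added here (HdfsSegmenterLoader, loadFromHdfs) are just placeholders.

import java.io.InputStream;
import java.util.Properties;
import java.util.zip.GZIPInputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import edu.stanford.nlp.ie.crf.CRFClassifier;
import edu.stanford.nlp.ling.CoreLabel;

public class HdfsSegmenterLoader {

    // Sketch only: load the segmenter model straight from HDFS instead of a local path.
    public static CRFClassifier<CoreLabel> loadFromHdfs(String hdfsModelPath, Properties props) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        CRFClassifier<CoreLabel> segmenter = new CRFClassifier<CoreLabel>(props);
        // fs.open() returns a plain InputStream; the .gz model is decompressed manually here
        // because the stream-based loader (unlike the path-based one) may not do it automatically.
        try (InputStream in = new GZIPInputStream(fs.open(new Path(hdfsModelPath)))) {
            // Assumed overload from AbstractSequenceClassifier -- not verified for every CoreNLP version.
            segmenter.loadClassifier(in, props);
        }
        return segmenter;
    }
}

Even if this works, serDictionary and sighanCorporaDict would still be local paths, so I would probably ship those resources to each worker (for example through the Hadoop distributed cache) rather than try to load them from HDFS as well.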
Has anyone here worked on a project like this before? Any pointers would be greatly appreciated.