How to generate Parquet files with Java or Scala

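九剑问天 posted on 2016-8-12 16:58:16
I recently needed to work with the Parquet file format, but I could not find how to generate such files from Java or Scala. Everything I found goes through Spark, Hadoop, or Hive. Is there a way to write Parquet files directly from Java/Scala?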

1 comment

arsenduan posted on 2016-8-12 18:12:51
There are plenty of examples. Here are some Java code samples that use parquet.hadoop.ParquetWriter:

Example 1:
[mw_shl_code=java,true]// Closes every open ParquetWriter held in data.parquetWriters.
// Returns true only if no writer threw while closing
// (the original snippet never set retval and so always returned false).
private boolean closeFile() {
  boolean retval = true;

  if ( data.parquetWriters != null ) {
    Iterator<ParquetWriter> openFiles = data.parquetWriters.iterator();
    while ( openFiles.hasNext() ) {
      ParquetWriter writer = openFiles.next();
      if ( writer != null ) {
        try {
          writer.close();
        } catch ( Exception e ) {
          // Keep going so the remaining writers still get closed.
          retval = false;
          logBasic( "Error trying to close file.  This may not be a problem." );
          logDetailed( "Stack trace from error trying to close file:", e );
        }
      }
    }

    if ( log.isDebug() ) {
      logDebug( "Closed all open parquet writers." );
    }
  }

  return retval;
}
[/mw_shl_code]
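
The close-all loop in Example 1 does not depend on Parquet itself, so it can be sketched against `java.io.Closeable`. This is a minimal sketch, assuming the writers implement `Closeable`. The class name `WriterCloser` and the method `closeAll` are hypothetical names for illustration, not part of any Parquet API.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.List;

class WriterCloser {
  // Closes every non-null writer in the list.
  // Returns true only if none of the close() calls threw an IOException.
  static boolean closeAll(List<? extends Closeable> writers) {
    boolean allClosed = true;
    for (Closeable writer : writers) {
      if (writer == null) {
        continue; // skip empty slots, as Example 1 does
      }
      try {
        writer.close();
      } catch (IOException e) {
        // Record the failure but keep closing the remaining writers.
        allClosed = false;
        System.err.println("Error trying to close writer: " + e.getMessage());
      }
    }
    return allClosed;
  }
}
```

One failing writer does not stop the others from being closed, and the return value tells the caller whether anything went wrong.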


Example 2:


[mw_shl_code=java,true]public ParquetOutputData() {
  super();

  // Date formatting helpers
  daf = new SimpleDateFormat();
  dafs = new DateFormatSymbols();

  defaultDateFormat = new SimpleDateFormat();
  defaultDateFormatSymbols = new DateFormatSymbols();

  // Track the open file names and their ParquetWriter instances
  openFiles = new ArrayList<String>();
  parquetWriters = new ArrayList<ParquetWriter>();
}
[/mw_shl_code]

Example 3:
[mw_shl_code=java,true]@Override
public void open() {
  Preconditions.checkState(state.equals(ReaderWriterState.NEW),
    "Unable to open a writer from state:%s", state);

  logger.debug(
    "Opening data file with pathTmp:{} (final path will be path:{})",
    pathTmp, path);

  try {
    CompressionCodecName codecName = CompressionCodecName.UNCOMPRESSED;
    if (enableCompression) {
       if (SnappyCodec.isNativeCodeLoaded()) {
         codecName = CompressionCodecName.SNAPPY;
       } else {
         logger.warn("Compression enabled, but Snappy native code not loaded. " +
             "Parquet file will not be compressed.");
       }
    }
    avroParquetWriter = new AvroParquetWriter<E>(fileSystem.makeQualified(pathTmp),
        schema, codecName, DEFAULT_BLOCK_SIZE,
        ParquetWriter.DEFAULT_PAGE_SIZE);
  } catch (IOException e) {
    throw new DatasetWriterException("Unable to create writer to path:" + pathTmp, e);
  }

  state = ReaderWriterState.OPEN;
}[/mw_shl_code]
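
The log message in Example 3 suggests that data is written to a temporary path (`pathTmp`) and only later moved to its final `path`. As a rough sketch of that write-then-rename idea, the code below uses `java.nio.file` on the local filesystem rather than Hadoop's `FileSystem`. The class `TmpThenRename`, the method `write`, and the `.tmp` naming scheme are assumptions made for illustration and do not come from the original code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class TmpThenRename {
  // Writes content to a hidden temp file next to the target,
  // then moves it into place so readers never see a half-written file.
  static Path write(Path target, String content) throws IOException {
    // Hypothetical naming scheme: ".<name>.tmp" in the same directory.
    Path tmp = target.resolveSibling("." + target.getFileName() + ".tmp");
    Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
    // Replace any existing target; returns the path of the moved file.
    return Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
  }
}
```

With a real Parquet writer, the `Files.write` call would be replaced by writing and closing the writer on the temporary path, and the move would happen only after a successful close.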


More:
http://www.aboutyun.com/home.php ... do=blog&id=3078