Spark Broadcast Variables (Broadcast): Source Code Analysis - Spark - About云 - 梭伦科技

desehawk posted on 2015-2-28 20:55:33

Guiding questions

1. How is the creation of a broadcast variable implemented?
2. How is the value of a broadcast variable read?

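Overview
Work has been extremely busy lately... I actually read through the broadcast variable code a while ago but never got around to posting my notes.
This article is based on the Spark 1.0 source code and covers the initialization, creation, reading and cleanup of broadcast variables.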

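Class relationships
The BroadcastManager class holds a reference to a BroadcastFactory object. Most operations are implemented by calling methods on BroadcastFactory.
BroadcastFactory is a trait with two direct subclasses, TorrentBroadcastFactory and HttpBroadcastFactory. These two subclasses wrap TorrentBroadcast and HttpBroadcast respectively, and both of those in turn extend the abstract class Broadcast.
As for a diagram... I will skip drawing one.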

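Initialization of BroadcastManager
When SparkContext is initialized it creates a SparkEnv object, env. During this process the BroadcastManager constructor is called and the resulting object is stored as a member of env:

val broadcastManager = new BroadcastManager(isDriver, conf, securityManager)
Constructing a BroadcastManager calls its initialize method, which initializes the broadcastFactory member according to the configuration and then calls that factory's initialize method.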
val broadcastFactoryClass =
      conf.get("spark.broadcast.factory", "org.apache.spark.broadcast.HttpBroadcastFactory")

   broadcastFactory =
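      Class.forName(broadcastFactoryClass).newInstance.asInstanceOf[BroadcastFactory]

   // Initialize appropriate BroadcastFactory and BroadcastObject
   broadcastFactory.initialize(isDriver, conf, securityManager)
The initialize method of each factory class simply calls the initialize method of its corresponding implementation class; the two are examined separately below.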

The initialize method of HttpBroadcast
def initialize(isDriver: Boolean, conf: SparkConf, securityMgr: SecurityManager) {
synchronized {
   if (!initialized) {
   bufferSize = conf.getInt("spark.buffer.size", 65536)
   compress = conf.getBoolean("spark.broadcast.compress", true)
   securityManager = securityMgr
   if (isDriver) {
      createServer(conf)
      conf.set("spark.httpBroadcast.uri",serverUri)
   }
   serverUri = conf.get("spark.httpBroadcast.uri")
   cleaner = new MetadataCleaner(MetadataCleanerType.HTTP_BROADCAST, cleanup, conf)
   compressionCodec = CompressionCodec.createCodec(conf)
   initialized = true
   }
}
}
Apart from initializing a few variables, it does two main things: it calls createServer (only on the driver), and it creates a MetadataCleaner object.

createServer
private def createServer(conf: SparkConf) {
broadcastDir = Utils.createTempDir(Utils.getLocalDir(conf))
server = new HttpServer(broadcastDir, securityManager)
server.start()
serverUri = server.uri
logInfo("Broadcast server started at " + serverUri)
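}
It first creates a directory for storing broadcast variables. By default this is
conf.get("spark.local.dir",System.getProperty("java.io.tmpdir")).split(',')(0)
It then initializes and starts an HttpServer object (a wrapper around Jetty). Startup includes loading the resource files and opening a port and threads to listen for requests. The details are in the org.apache.spark.HttpServer class and are not expanded on here.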

Creating the MetadataCleaner object
A MetadataCleaner object wraps a scheduled Timer that runs a callback at a fixed interval. The callback passed in here is cleanup:
private def cleanup(cleanupTime: Long) {
val iterator = files.internalMap.entrySet().iterator()
while(iterator.hasNext) {
   val entry = iterator.next()
   val (file, time) = (entry.getKey, entry.getValue)
   if (time < cleanupTime) {
   iterator.remove()
   deleteBroadcastFile(file)
   }
}
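}
That is, it removes broadcast files that have existed for longer than a given duration. When the duration is not set (the default), nothing is removed: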
if (delaySeconds > 0) {
logDebug(
   "Starting metadata cleaner for " + name + " with delay of " + delaySeconds + " seconds " +
   "and period of " + periodSeconds + " secs")
timer.schedule(task, periodSeconds * 1000, periodSeconds * 1000)
}
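
To make the timestamp-based cleanup concrete, here is a minimal standalone sketch of the same iterate-and-remove pattern. It is not Spark code: the CleanupSketch object and its files map are hypothetical stand-ins for HttpBroadcast.files, and the file deletion step is replaced by simply collecting the removed names.

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.collection.mutable.ListBuffer

object CleanupSketch {
  // Hypothetical stand-in for HttpBroadcast.files: file name -> creation timestamp (ms).
  val files = new ConcurrentHashMap[String, Long]()

  // Remove every entry whose timestamp is older than cleanupTime and
  // return the names that were removed (instead of deleting real files).
  def cleanup(cleanupTime: Long): List[String] = {
    val removed = ListBuffer.empty[String]
    val iterator = files.entrySet().iterator()
    while (iterator.hasNext) {
      val entry = iterator.next()
      if (entry.getValue < cleanupTime) {
        iterator.remove()
        removed += entry.getKey
      }
    }
    removed.toList
  }
}
```

Removing through the iterator, as the quoted cleanup method does, avoids modifying the map behind the iterator's back while it is being traversed.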

The initialize method of TorrentBroadcast
def initialize(_isDriver: Boolean, conf: SparkConf) {
TorrentBroadcast.conf = conf // TODO: we might have to fix it in tests
synchronized {
   if (!initialized) {
   initialized = true
   }
}
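}
Torrent does almost nothing here, which already shows how it differs from Http: Torrent works in a decentralized, P2P fashion, whereas Http is a centralized service that has to start a server to accept requests.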

Creating a broadcast variable
A broadcast variable is initialized by calling SparkContext's def broadcast[T: ClassTag](value: T): Broadcast[T] method, implemented as follows:
def broadcast[T: ClassTag](value: T): Broadcast[T] = {
  val bc = env.broadcastManager.newBroadcast[T](value, isLocal)
  cleaner.foreach(_.registerBroadcastForCleanup(bc))
  bc
}
This calls broadcastManager's newBroadcast method:
def newBroadcast[T: ClassTag](value_ : T, isLocal: Boolean) = {
  broadcastFactory.newBroadcast[T](value_, isLocal, nextBroadcastId.getAndIncrement())
}
which in turn calls the factory class's newBroadcast method. What comes back is a Broadcast object.

newBroadcast in HttpBroadcastFactory
def newBroadcast[T: ClassTag](value_ : T, isLocal: Boolean, id: Long) =
  new HttpBroadcast[T](value_, isLocal, id)
That is, it creates a new HttpBroadcast object and returns it.
Constructing the object does two main things:
HttpBroadcast.synchronized {
SparkEnv.get.blockManager.putSingle(
   blockId, value_, StorageLevel.MEMORY_AND_DISK, tellMaster = false)
}

if (!isLocal) {
HttpBroadcast.write(id, value_)
}
1. It puts the variable's id and value into blockManager, without notifying the master.
2. It calls the companion object's write method:
def write(id: Long, value: Any) {
val file = getFile(id)
val out: OutputStream = {
   if (compress) {
   compressionCodec.compressedOutputStream(new FileOutputStream(file))
   } else {
   new BufferedOutputStream(new FileOutputStream(file), bufferSize)
   }
}
val ser = SparkEnv.get.serializer.newInstance()
val serOut = ser.serializeStream(out)
serOut.writeObject(value)
serOut.close()
files += file
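}
The write method writes the value to the designated file using the configured compression and serialization. The file lives in the HttpServer's resource directory, and the mapping between file name and id is: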

case class BroadcastBlockId(broadcastId: Long, field: String = "") extends BlockId {
def name = "broadcast_" + broadcastId + (if (field == "") "" else "_" + field)
}
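
As a quick illustration of this naming rule, the sketch below re-declares the case class without its BlockId parent (an assumption made only so that it compiles on its own) and evaluates a few names. The BlockIdSketch wrapper and the id 3 are made up for the example.

```scala
object BlockIdSketch {
  // Same naming rule as the quoted case class, minus the BlockId parent.
  case class BroadcastBlockId(broadcastId: Long, field: String = "") {
    def name: String = "broadcast_" + broadcastId + (if (field == "") "" else "_" + field)
  }

  // Names for a whole-value block, a Torrent metadata block and a Torrent piece.
  val examples: List[String] = List(
    BroadcastBlockId(3).name,
    BroadcastBlockId(3, "meta").name,
    BroadcastBlockId(3, "piece0").name
  )
}
```

Here examples evaluates to broadcast_3, broadcast_3_meta and broadcast_3_piece0; the "meta" and "piece" suffixes are the ones used by the Torrent code further down.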
The newBroadcast method of TorrentBroadcastFactory
def newBroadcast[T: ClassTag](value_ : T, isLocal: Boolean, id: Long) =
  new TorrentBroadcast[T](value_, isLocal, id)
Likewise, it creates a TorrentBroadcast object and returns it.

TorrentBroadcast.synchronized {
SparkEnv.get.blockManager.putSingle(
   broadcastId, value_, StorageLevel.MEMORY_AND_DISK, tellMaster = false)
}

if (!isLocal) {
sendBroadcast()
}
It does two things. The first step is the same as for Http; the second is:
def sendBroadcast() {
val tInfo = TorrentBroadcast.blockifyObject(value_)
totalBlocks = tInfo.totalBlocks
totalBytes = tInfo.totalBytes
hasBlocks = tInfo.totalBlocks

// Store meta-info
val metaId = BroadcastBlockId(id, "meta")
val metaInfo = TorrentInfo(null, totalBlocks, totalBytes)
TorrentBroadcast.synchronized {
   SparkEnv.get.blockManager.putSingle(
   metaId, metaInfo, StorageLevel.MEMORY_AND_DISK, tellMaster = true)
}

// Store individual pieces
for (i <- 0 until totalBlocks) {
   val pieceId = BroadcastBlockId(id, "piece" + i)
   TorrentBroadcast.synchronized {
   SparkEnv.get.blockManager.putSingle(
      pieceId, tInfo.arrayOfBlocks(i), StorageLevel.MEMORY_AND_DISK, tellMaster = true)
   }
}
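}
As shown, the metadata is cached in blockManager first, followed by the data blocks. At the very start there is a blockifying step, which calls the companion object's blockifyObject method:
def blockifyObject[T: ClassTag](obj: T): TorrentInfo
This method splits the object obj into blocks (the default block size is 4 MB) and returns a TorrentInfo object that holds an array of TorrentBlock objects (each containing a blockID and the block's byte array), the number of blocks, and the total length of obj's byte stream.
For the metadata, the blockId is the broadcast variable id plus a suffix, and the value holds the total number of blocks and the total number of bytes.
The data itself is cached block by block: each block's id is the broadcast variable id plus a suffix and the block number, and the data is a TorrentBlock object.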
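
The blockifying idea can be sketched in a few lines of plain Scala. This is only an illustration under simplifying assumptions, not the Spark implementation: it works on an already serialized byte array, and the BlockifySketch object and its method signatures are hypothetical.

```scala
object BlockifySketch {
  // Split a serialized payload into blocks of at most blockSize bytes.
  def blockify(bytes: Array[Byte], blockSize: Int): Array[Array[Byte]] =
    bytes.grouped(blockSize).toArray

  // Concatenate the blocks, in order, back into the original byte stream.
  def unBlockify(blocks: Array[Array[Byte]]): Array[Byte] =
    blocks.flatten
}
```

With a 10-byte payload and a block size of 4, blockify yields blocks of 4, 4 and 2 bytes, and unBlockify restores the original bytes. The receiver needs the total block count up front to know when it has everything, which is why the metadata block is stored separately.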

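Reading the value of a broadcast variable
The value is obtained by calling bc.value; the main implementation lives in the deserialization method readObject.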

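Deserialization of HttpBroadcast
It first checks whether blockManager already holds the value. If so, the value is used directly; otherwise the companion object's read method is called to fetch it:
def read[T: ClassTag](id: Long): T = {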
logDebug("broadcast read server: " +serverUri + " id: broadcast-" + id)
val url = serverUri + "/" + BroadcastBlockId(id).name

var uc: URLConnection = null
if (securityManager.isAuthenticationEnabled()) {
   logDebug("broadcast security enabled")
   val newuri = Utils.constructURIForAuthentication(new URI(url), securityManager)
   uc = newuri.toURL.openConnection()
   uc.setAllowUserInteraction(false)
} else {
   logDebug("broadcast not using security")
   uc = new URL(url).openConnection()
}

val in = {
   uc.setReadTimeout(httpReadTimeout)
   val inputStream = uc.getInputStream
   if (compress) {
   compressionCodec.compressedInputStream(inputStream)
   } else {
   new BufferedInputStream(inputStream, bufferSize)
   }
}
val ser = SparkEnv.get.serializer.newInstance()
val serIn = ser.deserializeStream(in)
val obj = serIn.readObject[T]()
serIn.close()
obj
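}
Using serverUri and the file name that corresponds to the block id, it opens an HTTP connection directly, fetches the data from the central server, and decompresses and deserializes it with the configured compression codec and serializer.
This shows that every executor that needs the broadcast variable's value has to pull its contents from the driver.
Once fetched, the value is cached in blockManager for later use.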
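
The write and read paths mirror each other: a serialization stream wrapped around an optionally compressed byte stream. Below is a minimal round-trip sketch of that layering. It makes several assumptions: java.util.zip GZIP and plain Java serialization stand in for Spark's CompressionCodec and serializer, an in-memory byte array stands in for the file and the HTTP connection, and the SerDeSketch object is hypothetical.

```scala
import java.io.{BufferedInputStream, BufferedOutputStream, ByteArrayInputStream,
  ByteArrayOutputStream, InputStream, ObjectInputStream, ObjectOutputStream, OutputStream}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

object SerDeSketch {
  // Serialize value, optionally through a compressing stream, into a byte array.
  def write(value: AnyRef, compress: Boolean): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out: OutputStream =
      if (compress) new GZIPOutputStream(bytes) else new BufferedOutputStream(bytes)
    val serOut = new ObjectOutputStream(out)
    serOut.writeObject(value)
    serOut.close() // also flushes and closes the wrapped stream
    bytes.toByteArray
  }

  // Reverse of write: the reader must use the same compress setting as the writer.
  def read(data: Array[Byte], compress: Boolean): AnyRef = {
    val raw = new ByteArrayInputStream(data)
    val in: InputStream =
      if (compress) new GZIPInputStream(raw) else new BufferedInputStream(raw)
    val serIn = new ObjectInputStream(in)
    try serIn.readObject() finally serIn.close()
  }
}
```

Both sides must agree on the compression setting, which is why the quoted code reads compress and compressionCodec from shared configuration during initialize.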

Deserialization of TorrentBroadcast
private def readObject(in: ObjectInputStream) {
in.defaultReadObject()
TorrentBroadcast.synchronized {
   SparkEnv.get.blockManager.getSingle(broadcastId) match {
   case Some(x) =>
      value_ = x.asInstanceOf[T]

   case None =>
      val start = System.nanoTime
      logInfo("Started reading broadcast variable " + id)

      // Initialize @transient variables that will receive garbage values from the master.
      resetWorkerVariables()

      if (receiveBroadcast()) {
         value_ = TorrentBroadcast.unBlockifyObject(arrayOfBlocks, totalBytes, totalBlocks)

         /* Store the merged copy in cache so that the next worker doesn't need to rebuild it.
         * This creates a trade-off between memory usage and latency. Storing copy doubles
         * the memory footprint; not storing doubles deserialization cost. Also,
         * this does not need to be reported to BlockManagerMaster since other executors
         * does not need to access this block (they only need to fetch the chunks,
         * which are reported).
         */
         SparkEnv.get.blockManager.putSingle(
         broadcastId, value_, StorageLevel.MEMORY_AND_DISK, tellMaster = false)

         // Remove arrayOfBlocks from memory once value_ is on local cache
         resetWorkerVariables()
      } else {
         logError("Reading broadcast variable " + id + " failed")
      }

      val time = (System.nanoTime - start) / 1e9
      logInfo("Reading broadcast variable " + id + " took " + time + " s")
   }
}
}
As with Http, it first checks whether the value is already cached in blockManager. If not, it calls the receiveBroadcast method:
def receiveBroadcast(): Boolean = {
// Receive meta-info about the size of broadcast data,
// the number of chunks it is divided into, etc.
val metaId = BroadcastBlockId(id, "meta")
var attemptId = 10
while (attemptId > 0 && totalBlocks == -1) {
   TorrentBroadcast.synchronized {
   SparkEnv.get.blockManager.getSingle(metaId) match {
      case Some(x) =>
         val tInfo = x.asInstanceOf[TorrentInfo]
         totalBlocks = tInfo.totalBlocks
         totalBytes = tInfo.totalBytes
         arrayOfBlocks = new Array[TorrentBlock](totalBlocks)
         hasBlocks = 0

      case None =>
         Thread.sleep(500)
   }
   }
   attemptId -= 1
}
if (totalBlocks == -1) {
   return false
}

/*
* Fetch actual chunks of data. Note that all these chunks are stored in
* the BlockManager and reported to the master, so that other executors
* can find out and pull the chunks from this executor.
*/
val recvOrder = new Random().shuffle(Array.iterate(0, totalBlocks)(_ + 1).toList)
for (pid <- recvOrder) {
   val pieceId = BroadcastBlockId(id, "piece" + pid)
   TorrentBroadcast.synchronized {
   SparkEnv.get.blockManager.getSingle(pieceId) match {
      case Some(x) =>
         arrayOfBlocks(pid) = x.asInstanceOf[TorrentBlock]
         hasBlocks += 1
         SparkEnv.get.blockManager.putSingle(
         pieceId, arrayOfBlocks(pid), StorageLevel.MEMORY_AND_DISK, tellMaster = true)

      case None =>
         throw new SparkException("Failed to get " + pieceId + " of " + broadcastId)
   }
   }
}

hasBlocks == totalBlocks
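}
As with writing, reading has two parts: first fetch the metadata, then read the actual blocks based on that metadata. Note that everything here is read from blockManager, so here is a short look at blockManager.getSingle.
The call stack eventually reaches the BlockManager.doGetRemote method, which contains this statement:
val locations = Random.shuffle(master.getLocations(blockId))
That is, it randomly shuffles the list of nodes that hold this block, and then fetches the block with:

val data = BlockManagerWorker.syncGetBlock(
   GetBlock(blockId), ConnectionManagerId(loc.host, loc.port))
From this we can see that the Torrent approach first splits the broadcast data into blocks and stores them in BlockManager. When a node needs to read the broadcast variable, it reads block by block: for each block it looks up the block's locations and then gets it from a randomly chosen node that holds it. After reading, each node reports the blocks it now holds to BlockManagerMaster, so the local node also becomes a peer in this broadcast network.
In sharp contrast to the Http approach, this is a decentralized network that only needs to maintain a tracker, which is the core idea of P2P.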
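
The peer behavior described above can be modeled with a toy tracker. This is a sketch under heavy simplification, not the BlockManager protocol: the TrackerSketch object, the string executor ids and the in-memory locations map are all hypothetical, and no data is actually transferred.

```scala
import scala.collection.mutable
import scala.util.Random

object TrackerSketch {
  // Hypothetical tracker state: piece id -> executors that currently hold that piece.
  val locations = mutable.Map.empty[String, mutable.Set[String]]

  // Record that executor now holds a copy of pieceId.
  def report(pieceId: String, executor: String): Unit =
    locations.getOrElseUpdate(pieceId, mutable.Set.empty[String]) += executor

  // Fetch one piece for executor: pick a random current holder as the source,
  // then report the new copy so later readers can pull from this executor too.
  def fetch(pieceId: String, executor: String, rng: Random): String = {
    val holders = locations.getOrElse(pieceId, mutable.Set.empty[String]).toVector
    require(holders.nonEmpty, s"no location known for $pieceId")
    val source = holders(rng.nextInt(holders.size))
    report(pieceId, executor)
    source
  }
}
```

After the first executor has fetched every piece from the driver, each piece has two holders, so the next executor's requests are spread across both instead of all landing on the driver.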

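Clearing broadcast variables
Right after a broadcast variable is created, there is this line of code:

cleaner.foreach(_.registerBroadcastForCleanup(bc))
cleaner is a ContextCleaner object, and the newly created broadcast variable is registered with it. The call stack is:
def registerBroadcastForCleanup[T](broadcast: Broadcast[T]) {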
registerForCleanup(broadcast, CleanBroadcast(broadcast.id))
}
private def registerForCleanup(objectForCleanup: AnyRef, task: CleanupTask) {
referenceBuffer += new CleanupTaskWeakReference(task, objectForCleanup, referenceQueue)
}
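Once a broadcast variable is only weakly reachable (for background on weak references, see: http://blog.csdn.net/lyfi01/article/details/6415726), its reference is placed on referenceQueue, where it is picked up by the cleaning thread. That thread is started with: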
cleaner.foreach(_.start())
The start method runs keepCleaning, which takes the registered cleanup tasks (RDD, shuffle and broadcast) off the queue and processes them one by one:
private def keepCleaning(): Unit = Utils.logUncaughtExceptions {
while (!stopped) {
   try {
   val reference = Option(referenceQueue.remove(ContextCleaner.REF_QUEUE_POLL_TIMEOUT))
      .map(_.asInstanceOf[CleanupTaskWeakReference])
   reference.map(_.task).foreach { task =>
      logDebug("Got cleaning task " + task)
      referenceBuffer -= reference.get
      task match {
         case CleanRDD(rddId) =>
         doCleanupRDD(rddId, blocking = blockOnCleanupTasks)
         case CleanShuffle(shuffleId) =>
         doCleanupShuffle(shuffleId, blocking = blockOnCleanupTasks)
         case CleanBroadcast(broadcastId) =>
         doCleanupBroadcast(broadcastId, blocking = blockOnCleanupTasks)
      }
   }
   } catch {
   case e: Exception => logError("Error in cleaning thread", e)
   }
}
}
doCleanupBroadcast calls the following statement:
broadcastManager.unbroadcast(broadcastId, true, blocking)
which in turn calls:
def unbroadcast(id: Long, removeFromDriver: Boolean, blocking: Boolean) {
  broadcastFactory.unbroadcast(id, removeFromDriver, blocking)
}
Each factory class then calls the unpersist method on the companion object of its corresponding implementation class.
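
The registration and polling mechanism above rests on java.lang.ref.WeakReference and ReferenceQueue. The following standalone sketch shows the same pattern in miniature. The names WeakCleanupSketch, CleanupRef, register and drain are hypothetical, and because garbage collection timing is not deterministic, the usage note simulates collection by calling enqueue() on the reference by hand.

```scala
import java.lang.ref.{ReferenceQueue, WeakReference}
import scala.collection.mutable

object WeakCleanupSketch {
  // Hypothetical task type, loosely modeled on CleanBroadcast(broadcastId).
  case class CleanBroadcast(broadcastId: Long)

  // A weak reference that remembers which cleanup task belongs to its referent.
  class CleanupRef(val task: CleanBroadcast, referent: AnyRef, queue: ReferenceQueue[AnyRef])
    extends WeakReference[AnyRef](referent, queue)

  val queue = new ReferenceQueue[AnyRef]
  // Keeps the reference objects themselves strongly reachable until processed.
  val pending = mutable.Set.empty[CleanupRef]
  // Ids whose cleanup has run, in processing order.
  val cleaned = mutable.ListBuffer.empty[Long]

  def register(obj: AnyRef, id: Long): CleanupRef = {
    val ref = new CleanupRef(CleanBroadcast(id), obj, queue)
    pending += ref
    ref
  }

  // Drain whatever is currently on the queue and run the matching cleanup.
  def drain(): Unit = {
    var next = queue.poll()
    while (next != null) {
      val ref = next.asInstanceOf[CleanupRef]
      pending -= ref
      cleaned += ref.task.broadcastId
      next = queue.poll()
    }
  }
}
```

After register(payload, 5L), calling drain() does nothing while payload is still strongly referenced. Calling enqueue() on the returned reference (standing in for the garbage collector) and then drain() records id 5 in cleaned and empties pending. The pending set plays the role of referenceBuffer: without it the weak reference objects themselves could be collected before they are ever enqueued.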

Clearing variables in HttpBroadcast
def unpersist(id: Long, removeFromDriver: Boolean, blocking: Boolean) = synchronized {
SparkEnv.get.blockManager.master.removeBroadcast(id, removeFromDriver, blocking)
if (removeFromDriver) {
   val file = getFile(id)
   files.remove(file)
   deleteBroadcastFile(file)
}
}
This does two things: first it removes the cached copies from blockManager, and second it deletes the locally persisted file.

Clearing variables in TorrentBroadcast
def unpersist(id: Long, removeFromDriver: Boolean, blocking: Boolean) = synchronized {
SparkEnv.get.blockManager.master.removeBroadcast(id, removeFromDriver, blocking)
}

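Summary
Broadcast is suited to scenarios where executors use the same piece of data many times (a dictionary, for example). The Http and Torrent implementations correspond to the traditional client-server and P2P access patterns. When the broadcast variable is large or used frequently, the latter reduces the load on the driver.
BlockManager plays the role of the P2P tracker here. It is not covered in detail in this post; a later dedicated post will discuss it.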

Source: http://blog.csdn.net/asongoficeandfire/article/details/37584643

马一博 posted on 2016-8-12 17:05:31

In 1.6, TorrentBroadcast no longer has an initialize method. Only TorrentBroadcastFactory has an initialize method, and it does nothing:
override def initialize(isDriver: Boolean, conf: SparkConf, securityMgr: SecurityManager) { }
