The Spark 3.0 preview is available for download at https://archive.apache.org/dist/spark/spark-3.0.0-preview/, and it can also be pulled in as a Maven dependency in pom.xml, as follows:
<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>3.8.1</version>
        <scope>test</scope>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.12</artifactId>
        <version>3.0.0-preview</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-launcher -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-launcher_2.12</artifactId>
        <version>3.0.0-preview</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.12</artifactId>
        <version>3.0.0-preview</version>
    </dependency>
</dependencies>
Note: as of November 10, 2019, the Aliyun mirror is still missing some of these artifacts (spark-launcher_2.12 fails to download), so use a foreign repository such as Maven Central instead.
Test code:
import org.apache.spark.sql.SparkSession

import scala.math.random

object SparkPi {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .appName("Spark Pi")
      .master("local[2]")
      .config("spark.driver.resource.gpu.discoveryScript", "D:\\gpu.bat")
      .config("spark.worker.resource.gpu.discoveryScript", "D:\\gpu.bat")
      .config("spark.driver.resource.gpu.amount", 1)
      .config("spark.executor.resource.gpu.amount", 1)
      .config("spark.worker.resource.gpu.amount", 1)
      .getOrCreate()
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = math.min(100000L * slices, Int.MaxValue).toInt // avoid overflow
    // Monte Carlo estimate: sample points uniformly in [-1, 1] x [-1, 1];
    // the fraction landing inside the unit circle approximates Pi / 4.
    val count = spark.sparkContext.parallelize(1 until n, slices).map { i =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x * x + y * y <= 1) 1 else 0
    }.reduce(_ + _)
    println(s"Pi is roughly ${4.0 * count / (n - 1)}")
    spark.stop()
  }
}
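For completeness: once resources really are scheduled (on YARN or standalone; as it turns out below, not in local mode), a task can read the GPU addresses assigned to it through the TaskContext API. A minimal sketch, assuming the SparkSession above (the foreach job is purely illustrative):

import org.apache.spark.TaskContext
// Inside a running task, resources() returns a Map[String, ResourceInformation]
// describing the resources assigned to this task.
spark.sparkContext.parallelize(1 to 2, 2).foreach { _ =>
  TaskContext.get().resources().get("gpu").foreach { info =>
    println(s"assigned GPU addresses: ${info.addresses.mkString(",")}")
  }
}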
The discovery script D:\gpu.bat simply echoes a JSON description of a single GPU:
@echo off
echo {"name": "gpu", "addresses": ["0"]}
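Before pointing Spark at the script, it is worth sanity-checking its output, since Spark expects a single JSON object with "name" and "addresses" fields. A minimal sketch (assumes Windows, hence the cmd /c invocation):

import scala.sys.process._
// Run the discovery script the same way Spark would and print what it emits.
// Expected output: {"name": "gpu", "addresses": ["0"]}
val output = Seq("cmd", "/c", "D:\\gpu.bat").!!
println(output.trim)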
The run log is as follows:
2019-11-10 00:39:33,429 [main] INFO [org.apache.spark.SparkContext] - Running Spark version 3.0.0-preview
2019-11-10 00:39:34,915 [main] INFO [org.apache.spark.resource.ResourceUtils] - ==============================================================
2019-11-10 00:39:34,918 [main] INFO [org.apache.spark.resource.ResourceUtils] - Resources for spark.driver:
gpu -> [name: gpu, addresses: 0]
2019-11-10 00:39:34,919 [main] INFO [org.apache.spark.resource.ResourceUtils] - ==============================================================
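The log shows the driver did register the GPU. This can also be confirmed programmatically: Spark 3.0 exposes the driver's discovered resources via SparkContext.resources, e.g.:

// Driver-side view of discovered resources; each value is a
// ResourceInformation carrying a name and an array of addresses.
spark.sparkContext.resources.foreach { case (name, info) =>
  println(s"$name -> addresses: ${info.addresses.mkString(", ")}")
}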
I assumed the GPU was now being used, but the GPU view in Task Manager showed no activity. I finally searched the source code, and in "spark-3.0.0-preview\core\src\main\scala\org\apache\spark\scheduler\local\LocalSchedulerBackend.scala" (85,56) found the following:
def reviveOffers(): Unit = {
  // local mode doesn't support extra resources like GPUs right now
  val offers = IndexedSeq(new WorkerOffer(localExecutorId, localExecutorHostname, freeCores,
    Some(rpcEnv.address.hostPort)))
  for (task <- scheduler.resourceOffers(offers).flatten) {
    freeCores -= scheduler.CPUS_PER_TASK
    executor.launchTask(executorBackend, task)
  }
}
Note the comment: "local mode doesn't support extra resources like GPUs right now" — in other words, local mode does not support GPUs.
My heart sank. I had planned to set up standalone mode next, but it turns out that cannot be done on Windows, and a Linux setup would need a virtual machine; with my limited skills and resources, I will leave that for another day.
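For anyone with a Linux machine handy, the rough shape of the standalone setup would be: configure spark.worker.resource.gpu.amount and the discovery script on the worker side when starting the worker (not in the application, where the spark.worker.* settings above have no effect), then request GPUs from the application. A sketch under those assumptions (spark://host:7077 is a placeholder master URL):

import org.apache.spark.sql.SparkSession

// Application-side GPU request on a standalone cluster; the worker-side
// discovery script and GPU count must be configured at worker startup.
val spark = SparkSession
  .builder
  .appName("Spark Pi on GPU")
  .master("spark://host:7077") // placeholder standalone master URL
  .config("spark.executor.resource.gpu.amount", "1")
  .config("spark.task.resource.gpu.amount", "1")
  .getOrCreate()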