SparkContext Source Code Analysis (with YouTube videos)
YouTube videos
YouTube video (Spark internals, illustrated): https://youtu.be/euIuutjAB4I
YouTube video (Spark source code walkthrough): https://youtu.be/tUH7QnCcwgg
Documentation notes
Original (SparkContext ScalaDoc)
Main entry point for Spark functionality.
A SparkContext represents the connection to a Spark cluster,
and can be used to create RDDs, accumulators and broadcast variables on that cluster.
Only one SparkContext may be active per JVM. You must stop() the active SparkContext before creating a new one.
This limitation may eventually be removed; see SPARK-2243 for more details.
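A minimal Spark 1.x sketch of the points above; the app name, master URL, and data are hypothetical. It creates the single active SparkContext for the JVM, uses it to build an RDD, an accumulator, and a broadcast variable, and calls stop() before the JVM could create another one.

import org.apache.spark.{SparkConf, SparkContext}

object SparkContextDemo {
  def main(args: Array[String]): Unit = {
    // Only one SparkContext may be active per JVM
    val conf = new SparkConf().setAppName("sparkcontext-demo").setMaster("local[2]")
    val sc = new SparkContext(conf)

    val rdd = sc.parallelize(1 to 100)                       // RDD created on this cluster connection
    val counter = sc.accumulator(0, "counter")               // accumulator (Spark 1.x API)
    val lookup = sc.broadcast(Map("even" -> 0, "odd" -> 1))  // broadcast variable

    rdd.foreach { x => if (x % 2 == lookup.value("even")) counter += 1 }
    println(s"even numbers seen: ${counter.value}")

    sc.stop() // must stop() before creating a new SparkContext in this JVM
  }
}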
SparkContext schematic
XMind file download
https://github.com/opensourceteams/spark-scala-maven/blob/master/md/images/spark/SparkContext.xmind
Configuration
Configurable settings
spark.jars = paths to the jar files to ship with the application (iterable, comma-separated); a combined sketch follows the event-log settings below
spark.files = paths to the files to distribute to every node
spark.eventLog.dir=/tmp/spark-events // event log directory
spark.eventLog.compress=false // whether to compress the event log
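A hedged SparkConf sketch for the settings above; every path and file name is hypothetical, and spark.eventLog.enabled must also be set to true for the event-log directory and compression settings to take effect.

import org.apache.spark.SparkConf

// Hypothetical paths; adjust to your environment.
val conf = new SparkConf()
  .set("spark.jars", "/path/to/app.jar,/path/to/dep.jar") // comma-separated jars shipped to executors
  .set("spark.files", "/path/to/lookup.txt")              // files copied to every worker node
  .set("spark.eventLog.enabled", "true")                  // turn event logging on
  .set("spark.eventLog.dir", "/tmp/spark-events")         // where event logs are written
  .set("spark.eventLog.compress", "false")                // whether to compress event logs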
spark.shuffle.manager=sort // selects the shuffle manager ("hash", "sort", or "tungsten-sort")
// Let the user specify short names for shuffle managers
val shortShuffleMgrNames = Map(
"hash" -> "org.apache.spark.shuffle.hash.HashShuffleManager",
"sort" -> "org.apache.spark.shuffle.sort.SortShuffleManager",
"tungsten-sort" -> "org.apache.spark.shuffle.sort.SortShuffleManager")
val shuffleMgrName = conf.get("spark.shuffle.manager", "sort")
val shuffleMgrClass = shortShuffleMgrNames.getOrElse(shuffleMgrName.toLowerCase, shuffleMgrName)
val shuffleManager = instantiateClass[ShuffleManager](shuffleMgrClass)
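The map above lets users give a short name ("hash", "sort", "tungsten-sort") instead of a fully qualified class name, while unknown values fall through getOrElse and are treated as custom ShuffleManager class names. A standalone sketch of that lookup (plain Scala, no Spark dependency; the custom class name is hypothetical):

object ShuffleMgrLookup {
  // Same short-name table as in the snippet above
  val shortShuffleMgrNames = Map(
    "hash" -> "org.apache.spark.shuffle.hash.HashShuffleManager",
    "sort" -> "org.apache.spark.shuffle.sort.SortShuffleManager",
    "tungsten-sort" -> "org.apache.spark.shuffle.sort.SortShuffleManager")

  def resolve(name: String): String =
    shortShuffleMgrNames.getOrElse(name.toLowerCase, name)

  def main(args: Array[String]): Unit = {
    println(resolve("SORT"))  // org.apache.spark.shuffle.sort.SortShuffleManager
    println(resolve("hash"))  // org.apache.spark.shuffle.hash.HashShuffleManager
    println(resolve("com.example.MyShuffleManager")) // unknown names pass through unchanged
  }
}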
spark.memory.useLegacyMode=true // selects the memory manager (true = legacy StaticMemoryManager, false = UnifiedMemoryManager)
val useLegacyMemoryManager = conf.getBoolean("spark.memory.useLegacyMode", false)
val memoryManager: MemoryManager =
  if (useLegacyMemoryManager) {
    new StaticMemoryManager(conf, numUsableCores)
  } else {
    UnifiedMemoryManager(conf, numUsableCores)
  }
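A hedged sketch of turning on the legacy (static) memory manager in a Spark 1.x application; spark.storage.memoryFraction and spark.shuffle.memoryFraction are the legacy knobs read in legacy mode, and the fraction values here are purely illustrative.

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("memory-manager-demo")           // hypothetical app name
  .set("spark.memory.useLegacyMode", "true")   // choose StaticMemoryManager instead of UnifiedMemoryManager
  .set("spark.storage.memoryFraction", "0.5")  // legacy storage fraction (only read in legacy mode)
  .set("spark.shuffle.memoryFraction", "0.3")  // legacy shuffle fraction (only read in legacy mode)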
spark.cores.max=2 // maximum total number of CPU cores the application may request across the cluster (standalone / Mesos coarse-grained mode)
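A minimal sketch of capping the application's total cores on a standalone cluster; the master URL is hypothetical.

import org.apache.spark.{SparkConf, SparkContext}

// spark.cores.max caps the total number of cores this application may claim across the cluster.
val conf = new SparkConf()
  .setAppName("cores-max-demo")
  .setMaster("spark://master-host:7077") // hypothetical standalone master
  .set("spark.cores.max", "2")
val sc = new SparkContext(conf)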