standalone mode installation

spark-1.6.0-cdh5.15.0 standalone mode installation

Prerequisites

  • jdk (1.8.0_181) installed

  • scala (2.10.7) installed

  • hadoop (hadoop-2.6.0-cdh5.15.0) installed, with HDFS started

  • third-party jars

Spark dependency jars, placed under $SPARK_HOME/lib/:

parquet-hadoop-1.4.3.jar
jackson-databind-2.4.4.jar
jackson-annotations-2.4.4.jar
jackson-core-2.4.4.jar
jackson-module-scala_2.10-2.4.4.jar


    <dependency>
      <groupId>com.twitter</groupId>
      <artifactId>parquet-hadoop</artifactId>
      <version>1.4.3</version>
    </dependency>
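Staging the jars listed above into Spark's lib directory can be done with a short loop. This is only a sketch: the source path of the jars is not given in this document, so the `cp` command is left as a comment with a placeholder path you must fill in yourself.

```shell
# SPARK_HOME default taken from the environment-variable section below; adjust to your layout
SPARK_HOME=${SPARK_HOME:-/opt/module/bigdata/spark-1.6.0-cdh5.15.0}
JARS="parquet-hadoop-1.4.3.jar jackson-databind-2.4.4.jar \
jackson-annotations-2.4.4.jar jackson-core-2.4.4.jar \
jackson-module-scala_2.10-2.4.4.jar"

staged=0
for jar in $JARS; do
  # Replace the echo with the real copy, e.g.:
  #   cp /path/to/your/jars/"$jar" "$SPARK_HOME/lib/"
  echo "stage $jar -> $SPARK_HOME/lib/"
  staged=$((staged+1))
done
```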

Configuration

Environment variables

export JAVA_HOME=/opt/module/jdk/jdk1.8.0_191
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export SCALA_HOME=/opt/module/scala/scala-2.10.7
export HADOOP_HOME=/opt/module/bigdata/hadoop-2.6.0-cdh5.15.0
export SPARK_HOME=/opt/module/bigdata/spark-1.6.0-cdh5.15.0

export PATH=$JAVA_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
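After sourcing the variables above (e.g. from ~/.bashrc), it is worth confirming that Spark's bin and sbin directories actually landed on PATH. The snippet below is a minimal sketch using the SPARK_HOME path from this document; adjust it if your layout differs:

```shell
# Paths from the configuration above
export SPARK_HOME=/opt/module/bigdata/spark-1.6.0-cdh5.15.0
export PATH=$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH

# Verify both directories are on PATH
ok=0
for d in "$SPARK_HOME/bin" "$SPARK_HOME/sbin"; do
  case ":$PATH:" in
    *":$d:"*) echo "on PATH: $d"; ok=$((ok+1)) ;;
    *)        echo "missing: $d" ;;
  esac
done
```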

hosts configuration

# /etc/hosts: map the IP address to the hostname
192.168.88.200  standalone.com   standalone
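The mapping can be sanity-checked by parsing the file. The snippet below works on a temporary copy purely for illustration; in practice you append the line to the real /etc/hosts as root:

```shell
# Demo on a temp file; on a real host, append the line to /etc/hosts as root
hosts_file=$(mktemp)
echo '192.168.88.200  standalone.com   standalone' >> "$hosts_file"

# Look up which IP the hostname maps to in the file
ip=$(awk '$2 == "standalone.com" { print $1 }' "$hosts_file")
echo "standalone.com -> $ip"
rm -f "$hosts_file"
```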

spark-defaults.conf configuration

spark.master=spark://standalone.com:7077
spark.eventLog.enabled=true
spark.eventLog.dir=hdfs://standalone.com:9000/spark/log/eventLog
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.driver.memory=2g


# History server
# Directory of application event logs to load; it must exist in HDFS before
# the history server starts, and should normally point at the same location
# as spark.eventLog.dir so that finished applications actually show up.
spark.history.fs.logDirectory=hdfs://standalone.com:9000/spark/log/historyEventLog
# The port to which the web interface of the history server binds.
spark.history.ui.port=18080
# The period at which information displayed by this history server is updated.
# Each update checks for any changes made to the event logs in persisted storage.
spark.history.fs.update.interval=10s
# The number of application UIs to retain. If this cap is exceeded,
# the oldest applications will be removed.
spark.history.retainedApplications=50
spark.history.fs.cleaner.enabled=false
spark.history.fs.cleaner.interval=1d
spark.history.fs.cleaner.maxAge=7d
spark.history.ui.acls.enable=false

spark-env.sh configuration

export SPARK_DIST_CLASSPATH=${SPARK_HOME}/lib/*:$(${HADOOP_HOME}/bin/hadoop classpath)
SPARK_MASTER_IP=standalone.com
SPARK_MASTER_PORT=7077
SPARK_MASTER_WEBUI_PORT=8080
SPARK_WORKER_MEMORY=2g

SPARK_PRINT_LAUNCH_COMMAND=true  # print the full java command used to launch the main class

slaves configuration

standalone.com

Startup commands

Start master

start-master.sh

Stop master

stop-master.sh

Start worker

start-slave.sh spark://standalone.com:7077

Stop worker

stop-slave.sh 

Start history server

start-history-server.sh

Stop history server

stop-history-server.sh 

Start spark-shell

spark-shell --master spark://standalone.com:7077

Submitting applications

spark-submit command

# default deploy mode (client)
spark-submit \
  --class com.opensource.bigdata.spark.standalone.RunTextFileMkString2 \
  --master spark://standalone:7077 \
  --executor-memory 1G \
  --total-executor-cores 100 \
  /root/temp/spark-scala-maven-1.0-SNAPSHOT.jar

# same submission with --deploy-mode given explicitly
spark-submit \
  --class com.opensource.bigdata.spark.standalone.RunTextFileMkString2 \
  --master spark://standalone:7077 \
  --deploy-mode client \
  --executor-memory 1G \
  --total-executor-cores 100 \
  /root/temp/spark-scala-maven-1.0-SNAPSHOT.jar
Last updated 6 years ago