A Brief Look at Spark's Task-Submission Scripts (Using Spark 1.5.0 as an Example)

Main topics of this section:

  1. spark-shell
  2. spark-sql
  3. spark-submit
  4. spark-class
  5. Summary

1. spark-shell

The spark-shell script contains the following:

#!/usr/bin/env bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

#
# Shell script for starting the Spark Shell REPL

cygwin=false
case "`uname`" in
  CYGWIN*) cygwin=true;;
esac

# Enter posix mode for bash
set -o posix

export FWDIR="$(cd "`dirname "$0"`"/..; pwd)"
export _SPARK_CMD_USAGE="Usage: ./bin/spark-shell [options]"

# SPARK-4161: scala does not assume use of the java classpath,
# so we need to add the "-Dscala.usejavacp=true" flag manually. We
# do this specifically for the Spark shell because the scala REPL
# has its own class loader, and any additional classpath specified
# through spark.driver.extraClassPath is not automatically propagated.
SPARK_SUBMIT_OPTS="$SPARK_SUBMIT_OPTS -Dscala.usejavacp=true"

function main() {
  if $cygwin; then
    # Workaround for issue involving JLine and Cygwin
    # (see http://sourceforge.net/p/jline/bugs/40/).
    # If you're using the Mintty terminal emulator in Cygwin, may need to set the
    # "Backspace sends ^H" setting in "Keys" section of the Mintty options
    # (see https://github.com/sbt/sbt/issues/562).
    stty -icanon min 1 -echo > /dev/null 2>&1
    export SPARK_SUBMIT_OPTS="$SPARK_SUBMIT_OPTS -Djline.terminal=unix"
    # Invoke the spark-submit script, passing the class org.apache.spark.repl.Main,
    # the application name "Spark shell", and every argument given to this script
    "$FWDIR"/bin/spark-submit --class org.apache.spark.repl.Main --name "Spark shell" "$@"
    stty icanon echo > /dev/null 2>&1
  else
    export SPARK_SUBMIT_OPTS
    "$FWDIR"/bin/spark-submit --class org.apache.spark.repl.Main --name "Spark shell" "$@"
  fi
}

# Copy restore-TTY-on-exit functions from Scala script so spark-shell exits properly even in
# binary distribution of Spark where Scala is not installed
exit_status=127
saved_stty=""

# restore stty settings (echo in particular)
function restoreSttySettings() {
  stty $saved_stty
  saved_stty=""
}

function onExit() {
  if [[ "$saved_stty" != "" ]]; then
    restoreSttySettings
  fi
  exit $exit_status
}

# to reenable echo if we are interrupted before completing.
trap onExit INT

# save terminal settings
saved_stty=$(stty -g 2>/dev/null)
# clear on error so we don't later try to restore them
if [[ ! $? ]]; then
  saved_stty=""
fi

main "$@"

# record the exit status lest it be overwritten:
# then reenable echo and propagate the code.
exit_status=$?
onExit
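
To make the hand-off concrete, here is a hypothetical invocation and the spark-submit command that main() effectively runs for it (the options shown are illustrative, not required):

# What the user types:
./bin/spark-shell --master local[2] --driver-memory 2g

# What main() actually executes ("$@" lands after the fixed --class/--name options):
./bin/spark-submit --class org.apache.spark.repl.Main --name "Spark shell" \
    --master local[2] --driver-memory 2g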

2. spark-sql

The spark-sql script contains the following:

#!/usr/bin/env bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

export FWDIR="$(cd "`dirname "$0"`"/..; pwd)"
export _SPARK_CMD_USAGE="Usage: ./bin/spark-sql [options] [cli option]"
# Likewise, the task is submitted through the spark-submit script;
# the only difference is that the class passed in is SparkSQLCLIDriver
exec "$FWDIR"/bin/spark-submit --class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver "$@"

3. spark-submit

The spark-submit script contains the following:

#!/usr/bin/env bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"

# disable randomized hash for string in Python 3.3+
export PYTHONHASHSEED=0

# spark-submit ultimately calls the spark-class script,
# passing in the class org.apache.spark.deploy.SparkSubmit
# along with all the other arguments it received
exec "$SPARK_HOME"/bin/spark-class org.apache.spark.deploy.SparkSubmit "$@"

4. spark-class

The spark-class script contains the following:

#!/usr/bin/env bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Figure out where Spark is installed
# Locate the SPARK_HOME directory
export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"

# Source load-spark-env.sh to load runtime environment settings,
# e.g. spark-env.sh under the conf directory
. "$SPARK_HOME"/bin/load-spark-env.sh

# Find the java binary
# Locate the java binary via the JAVA_HOME directory
if [ -n "${JAVA_HOME}" ]; then
  RUNNER="${JAVA_HOME}/bin/java"
else
  if [ `command -v java` ]; then
    RUNNER="java"
  else
    echo "JAVA_HOME is not set" >&2
    exit 1
  fi
fi

# Find assembly jar
# Locate the spark-assembly-1.5.0-hadoop2.4.0.jar file (for Spark 1.5.0);
# this means the assembly JAR does not have to be bundled into the application when submitting a task
SPARK_ASSEMBLY_JAR=
if [ -f "$SPARK_HOME/RELEASE" ]; then
  ASSEMBLY_DIR="$SPARK_HOME/lib"
else
  ASSEMBLY_DIR="$SPARK_HOME/assembly/target/scala-$SPARK_SCALA_VERSION"
fi

num_jars="$(ls -1 "$ASSEMBLY_DIR" | grep "^spark-assembly.*hadoop.*\.jar$" | wc -l)"
if [ "$num_jars" -eq "0" -a -z "$SPARK_ASSEMBLY_JAR" ]; then
  echo "Failed to find Spark assembly in $ASSEMBLY_DIR." 1>&2
  echo "You need to build Spark before running this program." 1>&2
  exit 1
fi
ASSEMBLY_JARS="$(ls -1 "$ASSEMBLY_DIR" | grep "^spark-assembly.*hadoop.*\.jar$" || true)"
if [ "$num_jars" -gt "1" ]; then
  echo "Found multiple Spark assembly jars in $ASSEMBLY_DIR:" 1>&2
  echo "$ASSEMBLY_JARS" 1>&2
  echo "Please remove all but one jar." 1>&2
  exit 1
fi

SPARK_ASSEMBLY_JAR="${ASSEMBLY_DIR}/${ASSEMBLY_JARS}"

LAUNCH_CLASSPATH="$SPARK_ASSEMBLY_JAR"

# Add the launcher build dir to the classpath if requested.
if [ -n "$SPARK_PREPEND_CLASSES" ]; then
  LAUNCH_CLASSPATH="$SPARK_HOME/launcher/target/scala-$SPARK_SCALA_VERSION/classes:$LAUNCH_CLASSPATH"
fi

export _SPARK_ASSEMBLY="$SPARK_ASSEMBLY_JAR"

# The launcher library will print arguments separated by a NULL character, to allow arguments with
# characters that would be otherwise interpreted by the shell. Read that in a while loop, populating
# an array that will be used to exec the final command.
# Execute org.apache.spark.launcher.Main as the main entry point of the Spark application
CMD=()
while IFS= read -d '' -r ARG; do
  CMD+=("$ARG")
done < <("$RUNNER" -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@")
exec "${CMD[@]}"

5. Summary

Reading the source of the scripts above shows that spark-shell and spark-sql are both implemented by calling the spark-submit script, and spark-submit is in turn implemented via the spark-class script. spark-class finally executes org.apache.spark.launcher.Main, which serves as the main entry point of the entire Spark program.
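
Put together, the chain of calls looks like this:

spark-shell / spark-sql
  -> spark-submit   (adds --class org.apache.spark.repl.Main or SparkSQLCLIDriver)
    -> spark-class  (adds org.apache.spark.deploy.SparkSubmit)
      -> java -cp <assembly jar> org.apache.spark.launcher.Main org.apache.spark.deploy.SparkSubmit ...
         (prints the final command as NUL-separated arguments)
      -> exec of that final command, i.e. java ... org.apache.spark.deploy.SparkSubmit ...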
