SeaTunnel Setup

Standalone mode setup

SeaTunnel standalone mode setup.

Spark

Choose a Spark version to download from the Apache Spark downloads page (currently SeaTunnel only supports Spark 2.x; the example below uses 2.4.8):

wget https://downloads.apache.org/spark/spark-2.4.8/spark-2.4.8-bin-hadoop2.7.tgz

Next, extract the saved archive using tar:

tar xvf spark-*

Then move it into place with mv:

sudo mv spark-* /opt/spark

Now configure the Spark environment by appending a few exports to ~/.profile. Note the single quotes: without them, $PATH and $SPARK_HOME would be expanded by the current shell (where SPARK_HOME is not set yet) rather than at login:

echo 'export SPARK_HOME=/opt/spark' >> ~/.profile
echo 'export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin' >> ~/.profile
echo 'export PYSPARK_PYTHON=/usr/bin/python3' >> ~/.profile

and load it:

source ~/.profile
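
To confirm the variables took effect, a quick sanity check:

echo "$SPARK_HOME"    # expect: /opt/spark
which spark-submit    # expect: /opt/spark/bin/spark-submit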

We can now start the standalone Spark master server:

cd /opt/spark/sbin
./start-master.sh

The Spark web UI should now be reachable at http://localhost:8080/.

Next, start a worker (slave) and register it with the master:

./start-slave.sh spark://localhost:7077
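
To confirm both daemons are up, jps (shipped with the JDK) should list a Master and a Worker process:

jps
# expected output (PIDs will differ):
# 1234 Master
# 5678 Worker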

To test, open the Spark shell:

spark-shell

Type :q to exit the Scala REPL:

:q
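
For an end-to-end check of the standalone cluster, we can also submit the bundled SparkPi example with spark-submit. The examples jar name below assumes the 2.4.8/Hadoop 2.7 build; adjust it to whatever actually ships under /opt/spark/examples/jars:

/opt/spark/bin/spark-submit \
    --master spark://localhost:7077 \
    --class org.apache.spark.examples.SparkPi \
    /opt/spark/examples/jars/spark-examples_2.11-2.4.8.jar 100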

Other basic scripts in /opt/spark/sbin:

  • start-master.sh

  • stop-master.sh

  • start-slave.sh

  • stop-slave.sh

  • start-all.sh

  • stop-all.sh

IMPORTANT!

If we're running Ubuntu on WSL, we may see localhost: ssh: connect to host localhost port 22: Connection refused while trying to start a Spark worker (the start-all.sh/start-slaves.sh scripts reach each worker host over SSH, even localhost). In that case, set up SSH access to localhost as follows.

If openssh-server is not installed:

sudo apt-get update
sudo apt-get upgrade
sudo apt-get install openssh-server
sudo service ssh start

Take the following steps to enable ssh for localhost:

cd ~/.ssh
ssh-keygen                          # generate a public/private rsa key pair; use the default options
cat id_rsa.pub >> authorized_keys   # to append the key to the authorized_keys file
chmod 640 authorized_keys           # to set restricted permissions
sudo service ssh restart            # to pick up recent changes
ssh localhost
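
With passwordless SSH to localhost in place, the cluster scripts should now run cleanly, e.g. start-all.sh, which starts the master plus every worker listed in conf/slaves (defaulting to localhost):

/opt/spark/sbin/start-all.sh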

SeaTunnel

Download SeaTunnel:

export version="2.1.2"
wget "https://archive.apache.org/dist/incubator/seatunnel/${version}/apache-seatunnel-incubating-${version}-bin.tar.gz"
tar -xzvf "apache-seatunnel-incubating-${version}-bin.tar.gz"

Run the demo job (note that spark.streaming.conf.template only works on a Spark cluster, so we use the batch template here):

cd "apache-seatunnel-incubating-${version}"
./bin/start-seatunnel-spark.sh \
    --master local[4] \
    --deploy-mode client \
    --config ./config/spark.batch.conf.template
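
For reference, a SeaTunnel config is a HOCON file with env, source, transform, and sink blocks. The shipped template is roughly of the following shape (a sketch, not the exact file): it reads rows from the built-in Fake test source and prints them with the Console sink:

env {
  # Spark settings for the job
  spark.app.name = "SeaTunnel"
  spark.executor.instances = 2
  spark.executor.cores = 1
  spark.executor.memory = "1g"
}

source {
  # built-in source that generates fake rows for testing
  Fake {
    result_table_name = "my_dataset"
  }
}

transform {
  # the demo applies no transforms
}

sink {
  # print the resulting rows to stdout
  Console {}
}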

Docker

A simple Docker demo

Use case

WIP
