Spark UI

Kasim Ali
3 min read · Dec 27, 2022


In this post, you will learn how to use the Spark UI to understand and speed up your Spark SQL queries. The focus is on stage boundaries.


To see stage boundaries clearly, first disable Adaptive Query Execution (AQE), which can otherwise change the query plan at runtime.
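As a minimal sketch, AQE can be switched off for the current session with the `spark.sql.adaptive.enabled` setting. This assumes you are running the statement in a Spark SQL session or notebook cell:

```sql
-- Turn off Adaptive Query Execution for the current session only
SET spark.sql.adaptive.enabled=false
```

Setting the value back to `true` re-enables AQE once you are done experimenting.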

Let's run a simple COUNT query on your data:
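A query of this kind might look like the sketch below. The table name `sales` is hypothetical; substitute any table you have available:

```sql
-- Count every row in a (hypothetical) table named sales
SELECT COUNT(*) AS total_rows
FROM sales
```

After running it, open the job in the Spark UI to see how it was split into stages.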

You will notice that there are two stages. You might be wondering why the query is broken into two steps and why it can't be done in one.

Spark is a bulk synchronous processing system. Each executor slot first counts the records in its own partitions locally. Those local counts then have to be combined into a single total, which means moving them to one task through a shuffle. The stage boundary marks that hand-off: the first stage computes the local counts, and the second stage reduces, or aggregates, them.

A simpler way to understand this is to imagine asking 8 of your friends to each count their own pile of sunflower seeds. Then one friend, picked at random, adds up the 8 counts to get the total. This is what the stage boundaries show.

Stage 1: Each friend (slot) counts their own share of the sunflower seeds.

Stage 2: One randomly assigned friend adds up everyone's counts to get the total.
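The same two steps can be seen in the query plan. The sketch below again assumes a hypothetical table named `sales` and uses `EXPLAIN FORMATTED`, which is available in Spark 3.0 and later:

```sql
-- Show the physical plan for the count instead of running it
EXPLAIN FORMATTED
SELECT COUNT(*) AS total_rows
FROM sales
```

For a plain count, the plan typically contains a partial aggregation, an Exchange (the shuffle), and a final aggregation. The Exchange is where Spark places the stage boundary. The exact operator names can vary by Spark version.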

Another important point is that one very fast executor does not help much, but one very slow executor hurts. Because Spark is a bulk synchronous processing system, every task in the first stage must finish before the second stage (aggregating the counts) can begin, so a stage is only as fast as its slowest task.

Your executors may also share the load unevenly in some situations, especially if you have too many shuffle partitions. You can counter this by setting the shuffle partition parameter, spark.sql.shuffle.partitions.

The statement below sets the number of shuffle partitions to a high value (200, which is also Spark's default). You can use the same statement to set it to something more reasonable for a small cluster, such as 8.

SET spark.sql.shuffle.partitions=200
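As a sketch of the lower setting, assuming a small cluster with about 8 slots, the same statement with a different value looks like this. Run each statement separately:

```sql
-- Use 8 shuffle partitions for the current session
SET spark.sql.shuffle.partitions=8

-- Display the value currently in effect
SET spark.sql.shuffle.partitions
```

Matching the partition count to the number of available slots is only a starting point; the right value depends on your data volume.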

You can observe this imbalance in the Spark UI when the number of shuffle partitions is high.

Do let me know what you think of this post. I am still a learner myself and I would love to hear your thoughts. You are more than welcome to message me on LinkedIn or Twitter.
