Apache Spark: Tackling common issues

If you use Apache Spark, chances are that at least one of these issues has troubled you. Here are three of the most common problems that Spark admins complain about, and how you can tackle them.

 

Remember, the best way to identify and fix errors is to understand your Spark setup thoroughly and to analyse its logs periodically.

 

1. Memory: This is the most obvious problem. You have probably been at the receiving end of driver errors or executor errors. These usually occur when the driver or an executor has been configured with a memory limit that is too low for the application to run. Memory issues such as these are common when moving from standalone cluster mode to YARN or Mesos. On a side note, remember to distribute the work across the cluster instead of running everything on the local node.
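One possible reason a limit that worked before stops being sufficient is that a cluster manager such as YARN sizes each container as the executor memory plus an off-heap overhead. The sketch below works through that arithmetic. It assumes the commonly documented default overhead of 10% of executor memory with a 384 MiB floor, which may differ in your Spark version, and the function names are hypothetical helpers rather than part of Spark.

```python
# Assumed defaults for the executor memory overhead; check the
# documentation of your Spark version before relying on them.
MIN_OVERHEAD_MIB = 384
OVERHEAD_FACTOR = 0.10


def container_request_mib(executor_memory_mib, overhead_mib=None):
    """Approximate memory the cluster manager must grant for one executor."""
    if overhead_mib is None:
        # No explicit overhead configured: fall back to the assumed default.
        overhead_mib = max(MIN_OVERHEAD_MIB,
                           int(executor_memory_mib * OVERHEAD_FACTOR))
    return executor_memory_mib + overhead_mib


def fits(executor_memory_mib, container_limit_mib, overhead_mib=None):
    """True if the executor plus its overhead fits in the container limit."""
    request = container_request_mib(executor_memory_mib, overhead_mib)
    return request <= container_limit_mib


if __name__ == "__main__":
    # An 8 GiB executor does not fit into an 8 GiB container once the
    # overhead is added, while a 7 GiB executor does.
    print(container_request_mib(8192), fits(8192, 8192))
    print(container_request_mib(7168), fits(7168, 8192))
```

If the request does not fit, either lower the executor memory or raise the container limit on the cluster manager.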




2. Unreliable and slow programs: If your Spark programs take a long time to run and behave unpredictably, the cause is quite often excessive data transfer and shuffling. A few tips to address this are:

  1. Using a broadcast variable to facilitate better joins between RDDs of different sizes.
  2. Using an Accumulator to update variable values in parallel while executing.
  3. Avoiding operations that trigger shuffling.
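The first two tips can be sketched roughly as follows with the PySpark RDD API. The data, the function names and the `run` entry point are made up for illustration. `sc` is assumed to be an existing `SparkContext`. The small lookup table is broadcast so that the larger RDD is joined on the map side without a shuffle, and an accumulator counts the records that found no match.

```python
def join_with_lookup(order, countries):
    """Map-side join of one (user_id, amount) record against a plain dict."""
    user_id, amount = order
    return (user_id, amount, countries.get(user_id))


def run(sc):
    """Hypothetical driver code; `sc` is an existing pyspark.SparkContext."""
    # Stand-in for a large RDD of (user_id, amount) records.
    orders = sc.parallelize([("u1", 30.0), ("u2", 12.5), ("u9", 7.0)])

    # Ship the small table to every executor once instead of shuffling
    # the large RDD for a regular join.
    countries = sc.broadcast({"u1": "DE", "u2": "IN"})

    # Counter that tasks add to and that only the driver reads.
    unmatched = sc.accumulator(0)

    joined = orders.map(
        lambda order: join_with_lookup(order, countries.value)
    ).cache()

    def count_unmatched(row):
        if row[2] is None:
            unmatched.add(1)

    # Update the accumulator inside an action (foreach) rather than a
    # transformation, so re-executed tasks do not inflate the count.
    joined.foreach(count_unmatched)
    return joined.collect(), unmatched.value
```

To try it locally, create a context with something like `SparkContext("local[2]", "demo")` and pass it to `run`. The broadcast approach only makes sense when the smaller data set fits comfortably in the memory of each executor.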

 

3. Unstable streaming: A streaming pipeline that operates at scale 24/7 is a challenging target to achieve. However, try these steps to begin with and you might just see a difference.

  1. Fix RPC time-out exceptions: Nodes can be busy at times because of disk or CPU spikes, which causes RPC calls to them to time out. Raising the RPC time-out in the configuration might improve streaming performance.
  2. Increase driver and executor memory.
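As a rough sketch, both steps can be applied when the job is submitted. The article does not name the exact settings, so the example assumes `spark.network.timeout` and `spark.rpc.askTimeout` are the relevant time-outs. The values, the application name `stream_job.py` and the helper function itself are placeholders, not recommendations.

```python
def spark_submit_args(app, driver_memory="4g", executor_memory="8g",
                      timeout="600s"):
    """Build a spark-submit command line with larger time-outs and memory."""
    # Assumed to be the settings behind RPC time-out exceptions.
    conf = {
        "spark.network.timeout": timeout,
        "spark.rpc.askTimeout": timeout,
    }
    args = [
        "spark-submit",
        "--driver-memory", driver_memory,
        "--executor-memory", executor_memory,
    ]
    # Sort the keys so the generated command is stable across runs.
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(" ".join(spark_submit_args("stream_job.py")))
```

Raising a time-out hides slow nodes rather than fixing them, so it is worth checking the logs for the disk or CPU spikes behind the exceptions as well.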

 

Although Spark is fast at big data processing and easy to program, getting it to perform optimally, and being well equipped to address issues, requires thorough knowledge of the whole stack, from the implementation language right down to the kernel. After all, fixes for any server usually emerge only from a comprehensive understanding of its environment.
