You will master the essential skills of the open-source Apache Spark framework and the Scala programming language. Scala offers many advanced programming features, but you don't need any of them to write Spark code. Performance is comparable when Python code merely makes calls into Spark's libraries, but when significant processing happens in Python itself, the code becomes much slower than the equivalent Scala. You can use the basic programming features of Scala with the IntelliJ IDE and get useful features such as type hints and compile-time checks for free.
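As a minimal sketch of what type inference and compile-time checking buy you (plain Scala, no Spark required; the values are illustrative):

```scala
// The compiler infers every type here: nums is List[Int],
// doubled is List[Int], total is Int -- no annotations needed.
val nums    = List(1, 2, 3)
val doubled = nums.map(_ * 2)
val total   = doubled.sum

// A type error such as `doubled.map(_.toUpperCase)` would be rejected
// at compile time, before a job is ever submitted to a cluster.
println(total) // prints 12
```

The same mistake in Python would surface only at runtime, potentially minutes into a cluster job.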
Scala allows general programming patterns to be expressed in a very concise and effective form, minimising the number of lines of code. Data scientists often learn both Scala and Python for Spark; Python is frequently the second language picked up, since Scala support came first. Choosing a programming language for Apache Spark is partly a subjective matter, because the reasons one data scientist or data analyst prefers Python or Scala may not apply to others. In the context of Spark, the two languages are equally expressive, so either can achieve the desired functionality.
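To illustrate that conciseness, here is a hypothetical word count on a local Scala collection; it is the same map/reduce shape that Spark's RDD and Dataset APIs expose over distributed data:

```scala
// Count word occurrences with ordinary collection operations.
val words  = List("spark", "scala", "spark")
val counts = words.groupBy(identity).map { case (w, ws) => (w, ws.size) }

// Sort for a deterministic display order.
println(counts.toList.sorted) // prints List((scala,1), (spark,2))
```

Because Spark's APIs mirror the standard collections, fluency with `map`, `filter`, and `groupBy` on local data transfers directly to distributed code.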
Scala is also ideal for low-level Spark programming, and it makes it easy to navigate directly into the framework's underlying source code. Demand for Scala developers is growing because big data companies value developers who have mastered a productive and robust language for data analysis and processing in Apache Spark. When there is significant processing logic, performance is an important factor, and Scala clearly outperforms Python for that kind of Spark code. Scala also lets developers write efficient, readable and maintainable services without entangling the code in an unreadable web of callbacks.
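One way Scala avoids callback tangles is the for-comprehension over `Future`s, which chains asynchronous steps linearly. A minimal sketch, with two hypothetical fetch functions standing in for real service calls:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical async steps; each returns a Future instead of taking a callback.
def fetchUserId(name: String): Future[Int]      = Future(name.length)
def fetchOrders(userId: Int): Future[List[Int]] = Future(List.fill(userId)(1))

// The for-comprehension sequences the two asynchronous calls
// top-to-bottom, with no nested callbacks.
val totalOrders: Future[Int] =
  for {
    id     <- fetchUserId("alice")
    orders <- fetchOrders(id)
  } yield orders.sum

println(Await.result(totalOrders, 5.seconds)) // prints 5
```

The same logic written with explicit callbacks would nest one handler inside another; the comprehension keeps the control flow flat and readable.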
Using Scala for Spark provides access to the latest features of the Spark framework, as they are available in Scala first and ported to Python later. I'm working on a project called bebe that I hope will provide the community with a type-safe, high-performance Scala programming interface. The essential Scala skills for Spark include: immutable values, type inference, pattern matching, the Scala collections and common operations on them (the basis of Spark's RDD API), genuinely useful Scala types such as case classes, tuples and Option, effective use of the Spark shell (the Scala interpreter), and common mistakes and how to avoid them. Scala is also the better choice for Spark Streaming, because the Python support there is not as advanced and mature as the Scala support.
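Several of those skills fit in a few lines. A minimal sketch, using a hypothetical `Person` record, of case classes, `Option`, and pattern matching working together:

```scala
// Case classes give you equality, toString, copy, and
// pattern-matching support for free.
case class Person(name: String, age: Option[Int])

val people = List(Person("Ada", Some(36)), Person("Bob", None))

// Pattern matching destructures each record; Option makes the
// missing-age case explicit instead of relying on null.
val descriptions = people.map {
  case Person(name, Some(age)) => s"$name is $age"
  case Person(name, None)      => s"$name, age unknown"
}

println(descriptions) // prints List(Ada is 36, Bob, age unknown)
```

These same types appear constantly in Spark code: case classes define Dataset schemas, and `Option` models nullable columns safely.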