PySpark code runs more slowly than Pandas
My current project involves converting Pandas code to PySpark. The code uses a number of transformations, including joins and group-bys. However, I've run into a problem: my PySpark code is much slower than the equivalent Pandas code. I have spent the past three months learning PySpark, which I am…