Resources
People often ask me "How did you learn how to hack?" The answer: by reading. This page is a collection of the blog posts and other articles that I have accumulated over the years of my journey. Enjoy!
Google Dataproc is a managed service that runs Apache Spark and Hadoop clusters for data analytics workloads. When creating an instance, the default allows for no internet access but computers in the same VPC can access the the service completely.
The Dataproc cluster contains a YARN Resource Manager on port 8088 and HDFS on port 9870. Neither of these require any authentication on them.
If an attacker has access to a vulnerable compute instance via an RCE bug, they can then access the Dataproc clusters. If they access the HDFS endpoint, they can browse through a file system to obtain sensitive data.
Their key takeaway of using an OSS project and hosting it without considering the security consequences is a good callout though. To me, the issue is on Google for using this incorrectly. To fix this, I'd personally add a better default network permissions in order to prevent this from happening. The authors are right - shells happen and is the public instance doesn't need access to it then it shouldn't have network access to it.