1,658 questions

83 votes · 7 answers · 81k views
What is the difference between Google Cloud Dataflow and Google Cloud Dataproc?
I am using Google Cloud Dataflow to implement an ETL data warehouse solution.
Looking into the Google Cloud offerings, it seems Dataproc can also do the same thing.
It also seems Dataproc is a little bit ...
54 votes · 7 answers · 57k views
Google Cloud Platform: how to monitor memory usage of VM instances
I have recently performed a migration to Google Cloud Platform, and I really like it.
However, I can't find a way to monitor the memory usage of the Dataproc VM instances. As you can see on the ...
23 votes · 4 answers · 3k views
job failing with ERROR: gcloud crashed (AttributeError): 'bool' object has no attribute 'lower' [closed]
We noticed our jobs are failing with the below error on the dataproc cluster.
ERROR: gcloud crashed (AttributeError): 'bool' object has no attribute 'lower'
If you would like to report this issue, ...
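For context, this AttributeError is an ordinary Python failure mode: a boolean reaches code that expects a string and calls `.lower()` on it. A minimal sketch (using a hypothetical `normalize_flag` helper, not gcloud's actual code) reproduces the same message:

```python
# Hypothetical flag parser: assumes the value is a string like "True" or "false".
def normalize_flag(value):
    return value.lower() == "true"

print(normalize_flag("True"))   # works as intended: True

try:
    # A real bool sneaks in, e.g. from a parsed YAML/JSON config value.
    normalize_flag(True)
except AttributeError as e:
    print(e)                    # 'bool' object has no attribute 'lower'
```

The usual workaround on the gcloud side is to quote the offending value so it stays a string.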
17 votes · 2 answers · 19k views
Error: permission denied on resource project when launching Dataproc cluster
I was successfully able to launch a dataproc cluster by manually creating one via gcloud dataproc clusters create.... However, when I try to launch one through a script (that automatically provisions ...
16 votes · 4 answers · 14k views
Where is the Spark UI on Google Dataproc?
What port should I use to access the Spark UI on Google Dataproc?
I tried ports 4040 and 7077, as well as a bunch of other ports I found using netstat -pln.
Firewall is properly configured.
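One common approach here: on Dataproc, Spark runs on YARN, so the Spark UI is reached through the YARN ResourceManager on the master node (port 8088) rather than standalone port 4040/7077, and external access is typically blocked, so an SSH tunnel is used. A sketch with hypothetical cluster and zone names:

```shell
# Open a SOCKS proxy to the cluster's master node (name is <cluster>-m):
gcloud compute ssh my-cluster-m --zone=us-central1-a -- -D 1080 -N

# Then point a browser through the proxy, e.g.:
#   chrome --proxy-server="socks5://localhost:1080" http://my-cluster-m:8088
# and follow the ApplicationMaster link for the running Spark job.
```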
16 votes · 2 answers · 5k views
Output from Dataproc Spark job in Google Cloud Logging
Is there a way to have the output from Dataproc Spark jobs sent to Google Cloud Logging? As explained in the Dataproc docs, the output from the job driver (the master for a Spark job) is available ...
13 votes · 3 answers · 10k views
When submitting a job with pyspark, how do I access static files uploaded with the --files argument?
For example, I have a folder:
/
- test.py
- test.yml
and the job is submitted to the Spark cluster with:
gcloud beta dataproc jobs submit pyspark --files=test.yml "test.py"
In test.py, I want ...
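A hedged sketch of how a staged file like this is usually reached (file names taken from the question; cluster name is hypothetical): files passed via --files are copied into the job's working directory on the driver, and `SparkFiles.get` resolves them on executors.

```shell
# Submit with the YAML staged alongside the driver:
gcloud dataproc jobs submit pyspark --cluster=my-cluster --files=test.yml test.py

# Inside test.py:
#   - on the driver, a relative open typically works:  open("test.yml")
#   - on executors, resolve via:  from pyspark import SparkFiles
#                                 SparkFiles.get("test.yml")
```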
13 votes · 4 answers · 22k views
"No Filesystem for Scheme: gs" when running spark job locally
I am running a Spark job (version 1.2.0), and the input is a folder inside a Google Cloud Storage bucket (i.e. gs://mybucket/folder)
When running the job locally on my Mac machine, I am getting the ...
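The "No FileSystem for scheme: gs" error usually means the local Hadoop installation has no handler registered for gs:// URLs. A sketch of the usual fix, assuming the GCS connector jar is already on the classpath (the keyfile path is hypothetical):

```xml
<!-- core-site.xml: register the GCS connector's filesystem implementations -->
<property>
  <name>fs.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
</property>
<property>
  <name>fs.AbstractFileSystem.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
</property>
<property>
  <name>google.cloud.auth.service.account.json.keyfile</name>
  <value>/path/to/keyfile.json</value>
</property>
```

On Dataproc itself this is preconfigured, which is why the job only fails when run locally.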
12 votes · 1 answer · 14k views
How to install Python packages in a Google Dataproc cluster
Is it possible to install Python packages in a Google Dataproc cluster after the cluster is created and running?
I tried to use "pip install xxxxxxx" in the master command line but it does not seem ...
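Two approaches are common here. On an already-running cluster, the package must be installed on every node (SSH to each and run pip), because a pip install on the master alone does not reach the workers. For new clusters, an initialization action installs packages on all nodes at creation time. A sketch with hypothetical bucket and script names:

```shell
# install-packages.sh (uploaded to GCS) might contain:
#   #!/bin/bash
#   pip install pandas numpy

gcloud dataproc clusters create my-cluster \
    --initialization-actions=gs://my-bucket/install-packages.sh
```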
12 votes · 2 answers · 10k views
Which HBase connector for Spark 2.0 should I use? [closed]
Our stack is composed of Google Dataproc (Spark 2.0) and Google Cloud Bigtable (HBase 1.2.0), and I am looking for a connector working with these versions.
The Spark 2.0 and the new DataSet API support is ...
12 votes · 1 answer · 48k views
org.apache.spark.SparkException: Job aborted due to stage failure: Task 98 in stage 11.0 failed 4 times
I am using Google Cloud Dataproc to run a Spark job, and my editor is Zeppelin. I was trying to write JSON data into a GCS bucket. It succeeded when I tried a 10 MB file, but failed with a 10 GB file. My ...
11 votes · 4 answers · 27k views
spark.sql.crossJoin.enabled for Spark 2.x
I am using the 'preview' Google Dataproc Image 1.1 with Spark 2.0.0. To complete one of my operations I have to compute a Cartesian product. Since version 2.0.0 there has been a Spark configuration ...
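The configuration in question, `spark.sql.crossJoin.enabled`, can be set either per session or at job submission. A sketch (job file name is hypothetical):

```shell
# Enable Cartesian products at submit time:
gcloud dataproc jobs submit pyspark my_job.py \
    --properties=spark.sql.crossJoin.enabled=true

# Or from inside a Spark 2.x session:
#   spark.conf.set("spark.sql.crossJoin.enabled", "true")
```

In Spark 2.1+, an explicit `crossJoin()` call on the DataFrame avoids the flag entirely.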
11 votes · 1 answer · 13k views
Incorrect memory allocation for Yarn/Spark after automatic setup of Dataproc Cluster
I'm trying to run Spark jobs on a Dataproc cluster, but Spark will not start due to Yarn being misconfigured.
I receive the following error when running "spark-shell" from the shell (locally on the ...
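When Dataproc's generated YARN/Spark memory settings don't fit, they can be overridden at launch rather than by editing the cluster's config files. A sketch (the values are illustrative, not recommendations):

```shell
# Override the autogenerated memory settings for one session:
spark-shell \
  --conf spark.executor.memory=4g \
  --conf spark.executor.cores=2 \
  --conf spark.yarn.executor.memoryOverhead=512
```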
11 votes · 2 answers · 4k views
Dataproc + BigQuery examples - any available?
According to the Dataproc docs, it has "native and automatic integrations with BigQuery".
I have a table in BigQuery. I want to read that table and perform some analysis on it using the Dataproc ...
11 votes · 1 answer · 2k views
BigQuery connector for pyspark via Hadoop Input Format example
I have a large dataset stored in a BigQuery table and I would like to load it into a pyspark RDD for ETL data processing.
I realized that BigQuery supports the Hadoop Input / Output format
https://...