
All Questions

1 vote
0 answers
50 views

Bigtable Read and Write using DataProc with compute engine results in Key not found

I am experimenting with reading and writing data in Cloud BigTable using the DataProc compute engine and a PySpark job with the spark-bigtable-connector. I got an example from the spark-bigtable repo and ...
Suga Raj • 541
2 votes
0 answers
26 views

How to measure the total execution and connection time in BigTable from Spark in Scala

I am attempting to measure the total execution time from Spark to BigTable. However, when I wrap the following code around the BigTable-related function, it consistently shows only 0.5 seconds, ...
Kuengaer
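A frequent cause of a constant, near-zero measurement here is Spark's lazy evaluation: if the timer wraps only a transformation, the BigTable work has not actually run inside the timed region, so only job-submission overhead is measured. A minimal sketch of timing around an action (in Python for consistency with the other examples on this page; in Scala the same idea uses `System.nanoTime` around an action such as `count()` or the write itself):

```python
import time

def timed(label, fn, *args, **kwargs):
    """Run fn and report wall-clock time using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f}s")
    return result, elapsed

# In Spark, time an *action* (the write/save or a count), not the lazy
# transformation that precedes it, e.g. (names hypothetical):
#   timed("bigtable write", lambda: df.write.format("bigtable").save())

# Standalone demonstration with a stand-in workload:
_, secs = timed("demo", time.sleep, 0.2)
```

Because `time.sleep` blocks for at least the requested duration, `secs` comes out at 0.2 seconds or slightly more; a lazily-deferred Spark write timed the same way would report near zero, which matches the symptom described.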
1 vote
1 answer
39 views

Is there any way to access Hive (Dataproc) through Node.js or Python Flask?

I'm trying to build a server that can access data from GCP services such as Hive (Dataproc), BigQuery, and Bigtable. What is the procedure to connect to and access the data from these DBs through Node.js or ...
nandhabalan marimuthu
2 votes
2 answers
181 views

Dataproc serverless writing to Bigtable: org.apache.spark.SparkException: Task failed while writing rows

How do I find the root cause? (I'm reading from Cassandra and writing to Bigtable.) I've tried: looking through the Cassandra logs, eliminating columns in case it was a data issue, reducing spark.cassandra....
Adam C. Scott
1 vote
1 answer
914 views

Flink-BigTable - Any connector?

I would like to use BigTable as a sink for a Flink job: Is there an out-of-the-box connector? Can I use the DataStream API? How can I optimally pass a sparse object (99% sparsity), i.e. ensure no key/...
py-r • 439
2 votes
1 answer
359 views

Spark HBase/BigTable - Wide/sparse dataframe persistence

I want to persist to BigTable a very wide Spark DataFrame (>100,000 columns) that is sparsely populated (>99% of values are null) while keeping only non-null values (to avoid storage cost). Is ...
py-r • 439
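Worth noting for this question: HBase and Bigtable store cells sparsely, so absent cells cost nothing; one approach is to drop the nulls client-side and emit only populated (family, qualifier, value) cells per row. A minimal sketch of that filtering step (column names and the `cf` family are placeholders):

```python
def row_to_cells(row, family="cf"):
    """Keep only the non-null values of a wide, sparse row as
    (family, qualifier, value) cells.

    Since HBase/Bigtable rows only store the cells actually written,
    dropping nulls before the write avoids persisting ~99% empty columns.
    """
    return [(family, col, val) for col, val in row.items() if val is not None]

# A wide row where most columns are null:
wide_row = {"c1": "a", "c2": None, "c3": None, "c4": 7}
cells = row_to_cells(wide_row)
print(cells)  # only the two populated columns survive
```

In a Spark job the same filter would run per row (e.g. inside a `flatMap` over `df.rdd`) before handing cells to whichever HBase/Bigtable writer is in use.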
3 votes
1 answer
470 views

Spark-BigTable - HBase client not closed in Pyspark?

I'm trying to execute a PySpark statement that writes to BigTable within a Python for loop, which leads to the following error (job submitted using Dataproc): Any client not properly closed (as ...
py-r • 439
2 votes
1 answer
318 views

Getting an issue while writing to BigTable using the bulkPut API after upgrading the Spark and Scala versions

I'm writing into BigTable using the JavaHBaseContext bulkPut API. This was working fine with the Spark and Scala versions below: <spark.version>2.3.4</spark.version> <scala.version>2.11.8</...
arunK • 408
0 votes
1 answer
235 views

How do I pass in the google cloud project to the SHC BigTable connector at runtime?

I'm trying to access BigTable from Spark (Dataproc). I tried several different methods and SHC seems to be the cleanest for what I am trying to do and performs well. https://github.com/...
Constantijn Visinescu
0 votes
1 answer
990 views

Spark HBase to Google Dataproc and Bigtable migration

I have an HBase Spark job running on an AWS EMR cluster. Recently we moved to GCP. I transferred all the HBase data to BigTable. Now I am running the same Spark Java/Scala job in Dataproc. The Spark job is failing as it ...
nxverma
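For context on this migration pattern: the Cloud Bigtable HBase client lets an existing HBase job talk to Bigtable by swapping the connection implementation in `hbase-site.xml` (or the job's Hadoop `Configuration`), with the `bigtable-hbase-2.x-hadoop` artifact on the classpath. A hedged sketch, assuming HBase 2.x and placeholder project/instance names:

```xml
<configuration>
  <!-- Route HBase API calls to Bigtable instead of an HBase cluster -->
  <property>
    <name>hbase.client.connection.impl</name>
    <value>com.google.cloud.bigtable.hbase2_x.BigtableConnection</value>
  </property>
  <property>
    <name>google.bigtable.project.id</name>
    <value>my-project</value>
  </property>
  <property>
    <name>google.bigtable.instance.id</name>
    <value>my-instance</value>
  </property>
</configuration>
```

With this in place, `Table`/`Put`/`Scan` code written against the HBase API can generally run unchanged, which is usually the first thing to verify when an EMR-to-Dataproc port starts failing.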
4 votes
0 answers
994 views

How to write data in Google Cloud Bigtable in PySpark application on dataproc?

I am using Spark on a Google Cloud Dataproc cluster and I would like to write to Bigtable in a PySpark job. As a Google connector for this is not available, I am simply using the google-cloud-bigtable ...
MANISH ZOPE • 1,181
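One common pattern for this situation is `foreachPartition` with the google-cloud-bigtable client, constructing the client inside the function so it is created on each executor rather than serialized from the driver. A minimal sketch, assuming placeholder project/instance/table names and rows with `id` and `value` fields:

```python
def encode_row_key(*parts, sep=b"#"):
    """Build a Bigtable row key from components (bytes, '#'-delimited)."""
    return sep.join(str(p).encode("utf-8") for p in parts)

def write_partition(rows, project="my-project", instance_id="my-instance",
                    table_id="my-table"):
    """Write one Spark partition with the google-cloud-bigtable client.

    The import and client construction happen inside the function, i.e.
    on the executor, once per partition.
    """
    from google.cloud import bigtable  # imported lazily, on the executor

    client = bigtable.Client(project=project)
    table = client.instance(instance_id).table(table_id)
    mutations = []
    for r in rows:  # r is assumed to expose "id" and "value" fields
        row = table.direct_row(encode_row_key(r["id"]))
        row.set_cell("cf", b"value", str(r["value"]).encode("utf-8"))
        mutations.append(row)
    table.mutate_rows(mutations)

# Usage from the PySpark driver (hypothetical DataFrame df):
#   df.rdd.foreachPartition(write_partition)

print(encode_row_key("user", 42))  # b'user#42'
```

For large partitions, the mutation list would normally be flushed in batches rather than accumulated whole, but the client-per-partition structure is the part that matters for serialization.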
0 votes
1 answer
856 views

Dependency issues for Cloud DataProc + Spark + Cloud BigTable with Java

I need to create an application to run on Cloud DataProc and process large BigTable writes, scans, and deletes in a massively parallel fashion using Spark. This could be in Java (or Python if it's ...
VS_FF • 2,363
2 votes
1 answer
795 views

Connection to Google Bigtable in Google Dataproc using HBase odbc driver

Has anyone already made a connection to Google Bigtable in Google Cloud Dataproc using any available HBase ODBC driver? If yes, can you tell which ODBC driver you used? Thanks
Red • 1,301
1 vote
1 answer
1k views

Submitting Pig job from Google Cloud Dataproc does not add custom jars to Pig classpath

I'm trying to submit a Pig job via Google Cloud Dataproc and include a custom jar that implements a custom load function I use in the Pig script, but I can't figure out how to do that. Adding my custom ...
EduBoom • 147
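For context on this question: Dataproc's `gcloud dataproc jobs submit pig` command has a `--jars` flag that uploads JARs and puts them on the job's classpath, which is the usual way to attach a custom load function. A hedged sketch, with cluster, region, bucket, and file names as placeholders:

```shell
# Submit a Pig job with a custom UDF/LoadFunc jar attached (names are placeholders)
gcloud dataproc jobs submit pig \
  --cluster=my-cluster \
  --region=us-central1 \
  --jars=gs://my-bucket/my-loadfunc.jar \
  --file=gs://my-bucket/script.pig
```

Inside the script, the load function can then be referenced by its fully qualified class name (a `REGISTER` statement is another option when the jar is staged where the job can read it).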
1 vote
2 answers
363 views

Can't connect to Bigtable to scan HTable data due to hardcoded managed=true in hbase client jars

I'm working on a custom load function to load data from Bigtable using Pig on Dataproc. I compile my Java code using the following list of JAR files I grabbed from Dataproc. When I run the following ...
EduBoom • 147
