
All Questions

1 vote
0 answers
50 views

Bigtable Read and Write using DataProc with compute engine results in Key not found

I am experimenting with reading and writing data in Cloud BigTable using the DataProc compute engine and a PySpark job with the spark-bigtable-connector. I got an example from the spark-bigtable repo and ...
Suga Raj • 541
2 votes
0 answers
26 views

How to measure the total execution and connection time in BigTable from Spark in Scala

I am attempting to measure the total execution time from Spark to BigTable. However, when I wrap the following code around the BigTable-related function, it consistently shows only 0.5 seconds, ...
Kuengaer
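A frequent cause of a constant, near-zero measurement here is Spark's lazy evaluation: if the timer wraps only a transformation, the BigTable work has not actually run inside the timed region, so only job-submission overhead is measured. A minimal sketch of timing around an action (in Python for consistency with the other examples on this page; in Scala the same idea uses `System.nanoTime` around an action such as `count()` or the write itself):

```python
import time

def timed(label, fn, *args, **kwargs):
    """Run fn and report wall-clock time using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f}s")
    return result, elapsed

# In Spark, time an *action* (the write/save or a count), not the lazy
# transformation that precedes it, e.g. (names hypothetical):
#   timed("bigtable write", lambda: df.write.format("bigtable").save())

# Standalone demonstration with a stand-in workload:
_, secs = timed("demo", time.sleep, 0.2)
```

Because `time.sleep` blocks for at least the requested duration, `secs` comes out at 0.2 seconds or slightly more; a lazily-deferred Spark write timed the same way would report near zero, which matches the symptom described.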
1 vote
1 answer
39 views

Is there any way to access Hive (Dataproc) through Node.js or Python Flask?

I'm trying to build a server that can access data from GCP services such as Hive (Dataproc), BigQuery, and Bigtable. What is the procedure to connect to and access the data from these DBs through Node.js or ...
nandhabalan marimuthu
2 votes
2 answers
181 views

Dataproc serverless writing to Bigtable: org.apache.spark.SparkException: Task failed while writing rows

How do I find the root cause? (I'm reading from Cassandra and writing to Bigtable.) I've tried: looking through the Cassandra logs, eliminating columns in case it was a data issue, reducing spark.cassandra....
Adam C. Scott
1 vote
1 answer
914 views

Flink-BigTable - Any connector?

I would like to use BigTable as a sink for a Flink job: Is there an out-of-the-box connector? Can I use the DataStream API? How can I optimally pass a sparse object (99% sparsity), i.e. ensure no key/...
py-r • 439
2 votes
1 answer
359 views

Spark HBase/BigTable - Wide/sparse dataframe persistence

I want to persist to BigTable a very wide Spark DataFrame (>100,000 columns) that is sparsely populated (>99% of values are null) while keeping only non-null values (to avoid storage cost). Is ...
py-r • 439
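Worth noting for this question: HBase and Bigtable store cells sparsely, so absent cells cost nothing; one approach is to drop the nulls client-side and emit only populated (family, qualifier, value) cells per row. A minimal sketch of that filtering step (column names and the `cf` family are placeholders):

```python
def row_to_cells(row, family="cf"):
    """Keep only the non-null values of a wide, sparse row as
    (family, qualifier, value) cells.

    Since HBase/Bigtable rows only store the cells actually written,
    dropping nulls before the write avoids persisting ~99% empty columns.
    """
    return [(family, col, val) for col, val in row.items() if val is not None]

# A wide row where most columns are null:
wide_row = {"c1": "a", "c2": None, "c3": None, "c4": 7}
cells = row_to_cells(wide_row)
print(cells)  # only the two populated columns survive
```

In a Spark job the same filter would run per row (e.g. inside a `flatMap` over `df.rdd`) before handing cells to whichever HBase/Bigtable writer is in use.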
3 votes
1 answer
470 views

Spark-BigTable - HBase client not closed in Pyspark?

I'm trying to execute a PySpark statement that writes to BigTable within a Python for loop, which leads to the following error (job submitted using Dataproc): Any client not properly closed (as ...
py-r • 439
2 votes
1 answer
318 views

Getting an issue while writing to BigTable using the bulkPut API after upgrading the Spark and Scala versions

I'm writing into BigTable using the JavaHBaseContext bulkPut API. This was working fine with the Spark and Scala versions below: <spark.version>2.3.4</spark.version> <scala.version>2.11.8</...
arunK • 408
0 votes
1 answer
235 views

How do I pass in the google cloud project to the SHC BigTable connector at runtime?

I'm trying to access BigTable from Spark (Dataproc). I tried several different methods and SHC seems to be the cleanest for what I am trying to do and performs well. https://github.com/...
Constantijn Visinescu
0 votes
1 answer
990 views

Spark HBase to Google Dataproc and Bigtable migration

I have an HBase Spark job running on an AWS EMR cluster. Recently we moved to GCP. I transferred all the HBase data to BigTable. Now I am running the same Spark Java/Scala job in Dataproc. The Spark job is failing as it ...
nxverma
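For context on this migration pattern: the Cloud Bigtable HBase client lets an existing HBase job talk to Bigtable by swapping the connection implementation in `hbase-site.xml` (or the job's Hadoop `Configuration`), with the `bigtable-hbase-2.x-hadoop` artifact on the classpath. A hedged sketch, assuming HBase 2.x and placeholder project/instance names:

```xml
<configuration>
  <!-- Route HBase API calls to Bigtable instead of an HBase cluster -->
  <property>
    <name>hbase.client.connection.impl</name>
    <value>com.google.cloud.bigtable.hbase2_x.BigtableConnection</value>
  </property>
  <property>
    <name>google.bigtable.project.id</name>
    <value>my-project</value>
  </property>
  <property>
    <name>google.bigtable.instance.id</name>
    <value>my-instance</value>
  </property>
</configuration>
```

With this in place, `Table`/`Put`/`Scan` code written against the HBase API can generally run unchanged, which is usually the first thing to verify when an EMR-to-Dataproc port starts failing.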
4 votes
0 answers
994 views

How to write data in Google Cloud Bigtable in PySpark application on dataproc?

I am using Spark on a Google Cloud Dataproc cluster and I would like to write to Bigtable in a PySpark job. As a Google connector for this is not available, I am simply using the google-cloud-bigtable ...
MANISH ZOPE • 1,181
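One common pattern for this situation is `foreachPartition` with the google-cloud-bigtable client, constructing the client inside the function so it is created on each executor rather than serialized from the driver. A minimal sketch, assuming placeholder project/instance/table names and rows with `id` and `value` fields:

```python
def encode_row_key(*parts, sep=b"#"):
    """Build a Bigtable row key from components (bytes, '#'-delimited)."""
    return sep.join(str(p).encode("utf-8") for p in parts)

def write_partition(rows, project="my-project", instance_id="my-instance",
                    table_id="my-table"):
    """Write one Spark partition with the google-cloud-bigtable client.

    The import and client construction happen inside the function, i.e.
    on the executor, once per partition.
    """
    from google.cloud import bigtable  # imported lazily, on the executor

    client = bigtable.Client(project=project)
    table = client.instance(instance_id).table(table_id)
    mutations = []
    for r in rows:  # r is assumed to expose "id" and "value" fields
        row = table.direct_row(encode_row_key(r["id"]))
        row.set_cell("cf", b"value", str(r["value"]).encode("utf-8"))
        mutations.append(row)
    table.mutate_rows(mutations)

# Usage from the PySpark driver (hypothetical DataFrame df):
#   df.rdd.foreachPartition(write_partition)

print(encode_row_key("user", 42))  # b'user#42'
```

For large partitions, the mutation list would normally be flushed in batches rather than accumulated whole, but the client-per-partition structure is the part that matters for serialization.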
0 votes
1 answer
856 views

Dependency issues for Cloud DataProc + Spark + Cloud BigTable with Java

I need to create an application to run on Cloud DataProc and process large BigTable writes, scans, and deletes in a massively parallel fashion using Spark. This could be in Java (or Python if it's ...
VS_FF • 2,363
2 votes
1 answer
795 views

Connection to Google Bigtable in Google Dataproc using HBase odbc driver

Has anyone already made a connection to Google Bigtable in Google Cloud Dataproc using any available HBase ODBC driver? If yes, can you tell which ODBC driver you used? Thanks
Red • 1,301
1 vote
1 answer
1k views

Submitting Pig job from Google Cloud Dataproc does not add custom jars to Pig classpath

I'm trying to submit a Pig job via Google Cloud Dataproc and include a custom jar that implements a custom load function I use in the Pig script, but I can't figure out how to do that. Adding my custom ...
EduBoom • 147
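For context on this question: Dataproc's `gcloud dataproc jobs submit pig` command has a `--jars` flag that uploads JARs and puts them on the job's classpath, which is the usual way to attach a custom load function. A hedged sketch, with cluster, region, bucket, and file names as placeholders:

```shell
# Submit a Pig job with a custom UDF/LoadFunc jar attached (names are placeholders)
gcloud dataproc jobs submit pig \
  --cluster=my-cluster \
  --region=us-central1 \
  --jars=gs://my-bucket/my-loadfunc.jar \
  --file=gs://my-bucket/script.pig
```

Inside the script, the load function can then be referenced by its fully qualified class name (a `REGISTER` statement is another option when the jar is staged where the job can read it).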
1 vote
2 answers
363 views

Can't connect to Bigtable to scan HTable data due to hardcoded managed=true in hbase client jars

I'm working on a custom load function to load data from Bigtable using Pig on Dataproc. I compile my Java code using the following list of JAR files I grabbed from Dataproc. When I run the following ...
EduBoom • 147
