All Questions
Tagged with google-cloud-dataproc google-cloud-bigtable
21 questions
1 vote · 0 answers · 50 views
Bigtable read and write using Dataproc with Compute Engine results in "Key not found"
I am experimenting with reading and writing data in Cloud Bigtable from a Dataproc Compute Engine cluster, using a PySpark job and the spark-bigtable-connector. I got an example from the spark-bigtable repo and ...
2 votes · 0 answers · 26 views
How to measure total BigTable execution/connection time from Spark in Scala
I am attempting to measure the total execution time from Spark to BigTable. However, when I wrap the following code around the BigTable-related function, it consistently shows only 0.5 seconds, ...
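A common reason a timer around Spark-to-BigTable code reports a fraction of a second is that Spark transformations are lazy: the work only happens when an action runs. A minimal sketch of timing the action itself (plain Python for illustration; the same pattern applies in Scala, and all names here are illustrative):

```python
import time

def timed(label, action):
    """Run `action` (a zero-arg callable); return its result and the
    elapsed wall-clock time in seconds.

    In Spark, transformations (map, filter, withColumn, ...) are lazy,
    so wrapping a timer around them measures almost nothing. Wrap the
    *action* that actually triggers the BigTable read or write, e.g.
    df.count() or df.write.save(), or the timer reports near-zero time.
    """
    start = time.perf_counter()
    result = action()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f}s")
    return result, elapsed
```

Usage would look like `timed("bigtable-write", lambda: df.write.format(...).save())`, where the `format(...)` call stands in for whichever connector the job uses.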
1 vote · 1 answer · 39 views
Is there any way to access Hive (Dataproc) through Node.js or Python Flask?
I'm trying to build a server that can access data from GCP services such as Hive (Dataproc), BigQuery, and Bigtable. What is the procedure to connect to and access data from these DBs through Node.js or ...
2 votes · 2 answers · 181 views
Dataproc serverless writing to Bigtable: org.apache.spark.SparkException: Task failed while writing rows
How do I find the root cause? (I'm reading from Cassandra and writing to Bigtable.)
I've tried:
- looking through Cassandra logs
- eliminating columns in case it was a data issue
- reducing spark.cassandra....
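The "Task failed while writing rows" SparkException is a wrapper: the actual failure is normally the innermost "Caused by:" entry in the executor (not driver) logs, so checking executor logs is the usual next step after the Cassandra logs. For Python-side failures, a small generic helper (a sketch, not Spark-specific) can walk an exception chain to the original error:

```python
def root_cause(exc):
    """Follow an exception's __cause__/__context__ chain to the
    original error -- the Python-side analogue of reading down
    to the last "Caused by:" entry in a JVM stack trace."""
    seen = set()  # guard against cyclic chains
    while id(exc) not in seen:
        seen.add(id(exc))
        nxt = exc.__cause__ or exc.__context__
        if nxt is None:
            break
        exc = nxt
    return exc
```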
1 vote · 1 answer · 914 views
Flink-BigTable - Any connector?
I would like to use BigTable as a sink for a Flink job:
- Is there an out-of-the-box connector?
- Can I use the DataStream API?
- How can I optimally pass a sparse object (99% sparsity), i.e. ensure no key/...
2 votes · 1 answer · 359 views
Spark HBase/BigTable - Wide/sparse dataframe persistence
I want to persist to BigTable a very wide Spark DataFrame (>100,000 columns) that is sparsely populated (>99% of values are null), keeping only the non-null values (to avoid storage cost).
Is ...
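Since Bigtable stores only the cells you actually write, one approach is to drop the null columns per row before handing rows to the writer. A minimal sketch of that filtering step (plain Python; in Spark you would apply something like this to each row before writing, and the names are illustrative):

```python
def row_to_cells(row_key, row):
    """Convert one wide, mostly-null row (a dict of column -> value)
    into (row_key, column, value) tuples for non-null columns only.
    Bigtable persists only the cells written, so skipping the >99%
    null columns avoids paying to store them."""
    return [(row_key, col, val) for col, val in row.items() if val is not None]
```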
3 votes · 1 answer · 470 views
Spark-BigTable - HBase client not closed in Pyspark?
I'm trying to execute a PySpark statement that writes to BigTable within a Python for loop, which leads to the following error (job submitted using Dataproc): Any client not properly closed (as ...
2 votes · 1 answer · 318 views
Issue while writing to BigTable using the bulkPut API after upgrading Spark and Scala versions
I'm writing into BigTable using the JavaHBaseContext bulkPut API. This works fine with the Spark and Scala versions below:
<spark.version>2.3.4</spark.version>
<scala.version>2.11.8</...
0 votes · 1 answer · 235 views
How do I pass in the google cloud project to the SHC BigTable connector at runtime?
I'm trying to access BigTable from Spark (Dataproc). I tried several different methods and SHC seems to be the cleanest for what I am trying to do and performs well.
https://github.com/...
0 votes · 1 answer · 990 views
Spark HBase to Google Dataproc and Bigtable migration
I had an HBase Spark job running on an AWS EMR cluster. Recently we moved to GCP, and I transferred all the HBase data to BigTable. Now I am running the same Spark Java/Scala job in Dataproc. The Spark job is failing as it ...
4 votes · 0 answers · 994 views
How to write data in Google Cloud Bigtable in PySpark application on dataproc?
I am using Spark on a Google Cloud Dataproc cluster and I would like to write to Bigtable in a PySpark job. As a Google connector for this is not available, I am simply using the google-cloud-bigtable ...
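One workable pattern is to call the google-cloud-bigtable client from `foreachPartition`, creating the client inside the partition function so Spark never tries to serialize it, and batching mutations through `mutate_rows`. A sketch, with hypothetical project/instance/table names:

```python
def chunk(iterable, size):
    """Yield lists of at most `size` items, to bound each mutate_rows batch."""
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def write_partition(rows):
    """Write one Spark partition of (row_key, family, qualifier, value)
    tuples to Bigtable. Import and client construction live inside the
    function so nothing unserializable is shipped to executors."""
    from google.cloud import bigtable

    client = bigtable.Client(project="my-project")            # hypothetical project id
    table = client.instance("my-instance").table("my-table")  # hypothetical names
    for batch in chunk(rows, 500):
        mutations = []
        for row_key, family, qualifier, value in batch:
            row = table.direct_row(row_key)
            row.set_cell(family, qualifier, value)
            mutations.append(row)
        table.mutate_rows(mutations)

# Usage, given an RDD of (row_key, family, qualifier, value) tuples:
# rows_rdd.foreachPartition(write_partition)
```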
0 votes · 1 answer · 856 views
Dependency Issues for Cloud DataProc + Spark + Cloud BigTable with JAVA
I need to create an application that runs on Cloud Dataproc and processes large BigTable writes, scans, and deletes in a massively parallel fashion using Spark. It could be in Java (or Python if it's ...
2 votes · 1 answer · 795 views
Connecting to Google Bigtable from Google Dataproc using an HBase ODBC driver
Has anyone already connected to Google Bigtable from Google Cloud Dataproc using any of the available HBase ODBC drivers? If yes, can you tell me which one you used? Thanks
1 vote · 1 answer · 1k views
Submitting Pig job from Google Cloud Dataproc does not add custom jars to Pig classpath
I'm trying to submit a Pig job via Google Cloud Dataproc and include a custom jar that implements a custom load function I use in the Pig script, but I can't figure out how to do that.
Adding my custom ...
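One way to get a jar onto the job's classpath is the `--jars` flag of `gcloud dataproc jobs submit pig`, which stages the jar with the job. A sketch with hypothetical bucket, cluster, and file names:

```shell
gcloud dataproc jobs submit pig \
  --cluster=my-cluster \
  --region=us-central1 \
  --jars=gs://my-bucket/my-custom-loadfunc.jar \
  --file=gs://my-bucket/script.pig
```

Depending on the Dataproc image, the Pig script may additionally need a REGISTER statement for the jar before the load function's class name resolves; treat this as a starting point rather than a complete recipe.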
1 vote · 2 answers · 363 views
Can't connect to Bigtable to scan HTable data due to hardcoded managed=true in hbase client jars
I'm working on a custom load function to load data from Bigtable using Pig on Dataproc. I compile my Java code against the following list of jar files, which I grabbed from Dataproc. When I run the following ...