Hive Bulk Export Action
The Hive Bulk Export action plugin is available in the Hub.
Plugin version: 1.9.0-1.1.0
The Hive Bulk Export action takes a SELECT query as input, runs it against a Hive table, and stores the results under the provided HDFS directory. The plugin converts the SELECT query into an INSERT OVERWRITE DIRECTORY Hive statement. When that statement runs, Hive starts a MapReduce job that writes the results to the provided directory, so the directory can end up containing multiple files.
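The conversion described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the function name `build_export_statement` and its parameters are hypothetical, and the exact statement the plugin generates may differ. The sketch relies only on Hive's documented `INSERT OVERWRITE DIRECTORY ... ROW FORMAT DELIMITED FIELDS TERMINATED BY ...` syntax.

```python
def build_export_statement(select_query: str, output_dir: str, separator: str = ",") -> str:
    """Wrap a SELECT query in a Hive INSERT OVERWRITE DIRECTORY statement.

    Hypothetical helper for illustration; the plugin's real logic may differ.
    """
    # Drop surrounding whitespace and a trailing semicolon, if any.
    query = select_query.strip().rstrip(";").strip()

    # Assumption: inputs contain no single quotes, so no escaping is attempted.
    for name, value in (("output_dir", output_dir), ("separator", separator)):
        if "'" in value:
            raise ValueError(f"{name} must not contain a single quote")

    return (
        f"INSERT OVERWRITE DIRECTORY '{output_dir}' "
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{separator}' "
        f"{query}"
    )


if __name__ == "__main__":
    # Table name "employees" is made up for the example.
    print(build_export_statement("SELECT * FROM employees;", "/tmp/hive"))
```

Because Hive runs the resulting statement as a job with potentially several tasks, the output directory may contain several result files rather than a single one.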
Important: Hive Export works with Hive 2.3.3.
If anything other than a valid SELECT query is provided, publishing the pipeline fails. CDAP parses the query with Apache Calcite to verify that it is a SELECT query and not some other SQL statement.
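To give a feel for what this publish-time check guards against, here is a deliberately simplified stand-in. It is not Apache Calcite and does not parse SQL: it only checks that the input is a single statement whose first keyword is SELECT. The function name `looks_like_select` is hypothetical.

```python
import re


def looks_like_select(query: str) -> bool:
    """Rough check: one statement whose first keyword is SELECT.

    Simplified stand-in for a real SQL parser such as Apache Calcite.
    """
    # Ignore surrounding whitespace and one or more trailing semicolons.
    text = query.strip().rstrip(";").strip()

    # Any remaining semicolon is treated as a second statement. This is
    # overly strict: it also rejects semicolons inside string literals.
    if not text or ";" in text:
        return False

    # The first word must be exactly SELECT (case-insensitive).
    return re.match(r"select\b", text, re.IGNORECASE) is not None
```

A real parser also rejects malformed SELECT queries, which this keyword check cannot do.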
If the Overwrite Output Directory property is set to no, publishing the pipeline fails when the output directory already exists. In that case, either remove the directory or allow it to be overwritten by setting the Overwrite Output Directory property to yes.
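The rule for an existing output directory can be summarized in a few lines. This is only a sketch of the decision logic under stated assumptions: the function name `check_output_directory` is hypothetical, and the caller is assumed to have already determined whether the HDFS directory exists.

```python
def check_output_directory(directory_exists: bool, overwrite: str) -> None:
    """Raise if the directory exists and overwriting is not allowed.

    Hypothetical helper; `overwrite` mirrors the yes/no values of the
    Overwrite Output Directory property.
    """
    if overwrite not in ("yes", "no"):
        raise ValueError("overwrite must be 'yes' or 'no'")

    # Only one combination is an error: the directory is already there
    # and the property forbids overwriting it.
    if directory_exists and overwrite == "no":
        raise FileExistsError(
            "Output directory already exists; remove it or set "
            "Overwrite Output Directory to yes"
        )
```

A missing directory is never an error, since it is created when the export runs.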
You might use the Hive Bulk Export action to execute a SELECT query on one or more Hive tables and write the results to a provided directory in CSV format.
Configuration
Property | Macro Enabled? | Description |
---|---|---|
Hive Metastore Username | Yes | User identity for connecting to the specified Hive database. Required for databases that need authentication. Optional for databases that do not require authentication. |
Hive Metastore Password | Yes | Password to use to connect to the specified database. Required for databases that need authentication. Optional for databases that do not require authentication. |
JDBC Connection String | Yes | Required. JDBC connection string including database name. Use |
Select Statement | Yes | Required. Select command to select values from Hive table(s). |
Output Directory | Yes | Required. HDFS directory path where the exported data will be written. If the directory does not exist, it is created. If it already exists, it is either overwritten or the pipeline fails at publish time, depending on the Overwrite Output Directory property. |
Overwrite Output Directory | Yes | If Default is |
Column Separator | | Delimiter in the exported file. Column values are separated by this delimiter when writing to the output file. Default is comma (,). |
Example
This example connects to a Hive database using the specified JDBC Connection String, which points to the ‘mydb’ database of a Hive instance running on ‘localhost’, and runs the SELECT query as an ‘INSERT OVERWRITE DIRECTORY’ statement. It writes the data to file(s) in the /tmp/hive directory, using a comma as the delimiter.
Property | Value |
---|---|
Hive Metastore Username | |
Hive Metastore Password | |
JDBC Connection String | |
Select Statement | |
Output Directory | /tmp/hive |
Overwrite Output Directory | yes |
Column Separator | , |
Created in 2020 by Google Inc.