Apache Hive Metastore support

6th April, 2022
Applies to: Dataedo 10.x versions, Article available also for: 24.x (current), 23.x
You are looking at documentation for an older release.
Switch to the documentation for Dataedo 24.x (current).

Hive Metastore is a relational database that is part of Apache Hive and stores all the metadata information on tables, columns, data types, etc. of HDFS clusters. It can be hosted on Apache Derby, MySQL, MariaDB, Oracle, SQL Server or AzureSQL, PostgreSQL. Dataedo provdes a native connector to Hive Metastore database that allows to extract metadata about tables, views, and other database objects.

Not only Apache Hive can hold metadata in Hive Mestastore, other platforms can do it too. Therefore you can use Hive Metastore connector to document:

Connector

Supported Versions

Hive Metastore 3.x

Supported Metadata

  • Tables
    • Table type:
      • Managed Table
      • Index Table
      • External Table
    • Columns:
      • Data type
      • Nullable
      • Default value
      • Check constraint
      • Unqiue constraint
    • Primary keys
      • Columns
    • Foreign keys
      • Columns
    • Relations
  • Views
    • View type:
      • Materialized view
      • Virtual view
    • Columns (see Tables)

Data profiling

Datedo does not support data profiling in Hive Metastore.

Connect to Hive Metastore

Get connection details of Hive Metastore database

You can find all required details in hive-site.xml file which contains a configuration of Hive Metastore database connection. File is located in Hive installation directory in conf catalog. In most cases it will be: /etc/hive/conf/hive-site.xml. In this file you need to look for properties:

  • javax.jdo.option.ConnectionURL
  • javax.jdo.option.ConnectionUserName
  • javax.jdo.option.ConnectionPassword

You should find all of them in configuration element of hive-site.xml file.

<configuration>
....
....
....
<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://10.10.10.10:3306/hive_metastore?createDatabaseIfNotExist=true</value>
</property>

<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>Hive</value>
</property>

<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>verySecurePassword</value>
</property>
....
....
....
</configuration>

Once you have all the properties, you need to extract details:

  • javax.jdo.option.ConnectionUserName - Dataedo username.
  • javax.jdo.option.ConnectionPassword - Dataedo password.
  • javax.jdo.option.ConnectionURL - here you need to extract infromation from jdbc connection string which has format: jdbc:[database_engine]://[database_address]:[database_port]/[metastore_database]?others_options. Previous example of configuration uses MySQL jdbc connector, although others database engines follow convention.

In our example, collected data is:

  • database: mysql
  • host: 10.10.10.10
  • port: 3306
  • user: Hive
  • password: verySecurePassword
  • metastore_database: hive_metastore

Add new connection

To connect to Hive Metastore and create new documentation by clicking Add documentation and choosing Database connection.

Connection to SQL Server

On the connection screen choose Apache Hive Metastore as DBMS.

Hive Metastore selection

You need to select database engine hosting Hive Metastore:

Hive Metastore database

Connection details

Then you need to provide connection details accordingly for selected database:

Following connection example will be based on MySQL, although others databases will have similar fields required for connection:

  • Host - hostname or IP address of server on which database is available,
  • Port - port under which database is available,
  • User - username,
  • Password - password,
  • Metastore database - name of MySQL database hosting the metastore. You can expand list of databases by clicking ... button.
  • Database - name of Hive database. You can expand list of databases by clicking ... button. You will get a table of Hive databases with Database Name, Description, Catalog Name and Catalog description columns:

Hive_dbs

Obtaining all details is described in chapter How to get connection details of Hive Metastore database.

Hive MySQL conn details

Importing metadata

When connection was successful Dataedo will read objects and show a list of objects found. You can choose which objects to import. You can also use advanced filter to narrow down list of objects.

Hive Objects

Confirm list of objects to import by clicking Next.

Next screen allow you to change default name of the documentation under which it will be visible in Dataedo repository.

Hive documentation name

Click Import to start the import. Once import is done, close import window with Finish button.

Outcome

Your database has been imported to new documentation in the repository.

Hive documentation