Apache Spark SQL support (ODBC)

Applies to: Dataedo 24.x (current) versions. Article also available for: 10.x, 23.x

Apache Spark SQL is currently not officially supported. You can connect to it using a generic ODBC driver. The metadata returned depends on the driver version and provider.
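To check outside Dataedo that a generic ODBC connection to Spark SQL works at all, you could run a small script against the same data source. The sketch below is not part of Dataedo. It assumes the third-party Python package pyodbc, an ODBC DSN already configured for the Hive ODBC driver, and a hypothetical DSN name such as `SparkSQL`. `DSN`, `UID` and `PWD` are standard ODBC connection-string keywords; driver-specific options are deliberately left out. The helper names are hypothetical.

```python
from typing import Optional


def quote_odbc_value(value: str) -> str:
    """Brace-quote a connection-string value that contains ';', '{' or '}'
    or leading/trailing spaces; a '}' inside the value is doubled."""
    if any(ch in value for ch in ";{}") or value != value.strip():
        return "{" + value.replace("}", "}}") + "}"
    return value


def build_connection_string(dsn: str, user: Optional[str] = None,
                            password: Optional[str] = None) -> str:
    """Build a DSN-based ODBC connection string from standard keywords."""
    parts = [f"DSN={quote_odbc_value(dsn)}"]
    if user is not None:
        parts.append(f"UID={quote_odbc_value(user)}")
    if password is not None:
        parts.append(f"PWD={quote_odbc_value(password)}")
    return ";".join(parts)


def can_connect(connection_string: str, login_timeout: int = 10) -> bool:
    """Try to log in and run a trivial query; requires pyodbc."""
    import pyodbc  # third-party package: pip install pyodbc

    try:
        # Assumption: Hive-based drivers lack transaction support, so
        # autocommit is enabled instead of pyodbc's manual-commit default.
        conn = pyodbc.connect(connection_string, autocommit=True,
                              timeout=login_timeout)
    except pyodbc.Error:
        return False
    try:
        conn.cursor().execute("SELECT 1").fetchone()
        return True
    except pyodbc.Error:
        return False
    finally:
        conn.close()  # always release the connection
```

For example, `can_connect(build_connection_string("SparkSQL", "alice", "secret"))` returns `True` only if the driver can log in and run `SELECT 1`; if it fails here, Dataedo is unlikely to connect through the same DSN either.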

We have successfully connected to Apache Spark SQL and imported its metadata using the ODBC drivers listed below. Other drivers are likely to work as well.

Tested ODBC drivers and environments

We have successfully connected to Apache Spark SQL and imported its metadata in the following distributions:

  • Cloudera

  • Hortonworks

Hive ODBC driver version: 2.6.1 (64-bit)

Supported schema elements and metadata

Dataedo reads the following metadata from Apache Spark SQL:

  • Tables
    • Columns
      • Data type with length
      • Nullable
  • Views (displayed as a table)
    • Columns
      • Data type with length
      • Nullable
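The column properties listed above correspond to fields returned by the standard ODBC catalog functions `SQLTables` and `SQLColumns` (`TYPE_NAME`, `COLUMN_SIZE`, `DECIMAL_DIGITS`, `NULLABLE`). The Python sketch below shows one way to read them through pyodbc's `Cursor.tables()` and `Cursor.columns()`. It is an illustration, not Dataedo's import logic: which type names carry a length, the `TYPE(length)` format, and the helper names `describe_column` and `read_metadata` are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# NULLABLE values defined by the ODBC specification for SQLColumns.
SQL_NO_NULLS = 0
SQL_NULLABLE = 1  # 2 = SQL_NULLABLE_UNKNOWN

# Assumption: which type names are shown with a length or precision/scale.
LENGTH_TYPES = {"CHAR", "VARCHAR"}
PRECISION_TYPES = {"DECIMAL", "NUMERIC"}


@dataclass
class ColumnInfo:
    name: str
    data_type: str            # type name with length, e.g. "VARCHAR(50)"
    nullable: Optional[bool]  # None when the driver reports "unknown"


def describe_column(name: str, type_name: str, column_size: Optional[int],
                    decimal_digits: Optional[int], nullable: int) -> ColumnInfo:
    """Map one SQLColumns row to name, data type with length, and nullable."""
    type_upper = type_name.upper()
    if type_upper in LENGTH_TYPES and column_size is not None:
        data_type = f"{type_upper}({column_size})"
    elif type_upper in PRECISION_TYPES and column_size is not None:
        data_type = f"{type_upper}({column_size},{decimal_digits or 0})"
    else:
        data_type = type_upper
    nullable_flag = {SQL_NO_NULLS: False, SQL_NULLABLE: True}.get(nullable)
    return ColumnInfo(name, data_type, nullable_flag)


def read_metadata(connection, schema: Optional[str] = None) -> dict:
    """Return {(schema, table): [ColumnInfo, ...]} for tables and views."""
    cursor = connection.cursor()
    # Materialize the table list before reusing the cursor for column queries.
    tables = [(t.table_schem, t.table_name)
              for t in cursor.tables(schema=schema)
              if t.table_type in ("TABLE", "VIEW")]
    result = {}
    for schem, table in tables:
        # Catalog arguments are search patterns ('_' matches any character),
        # so keep only exact matches.
        rows = [r for r in cursor.columns(table=table, schema=schem)
                if r.table_name == table and r.table_schem == schem]
        result[(schem, table)] = [
            describe_column(r.column_name, r.type_name, r.column_size,
                            r.decimal_digits, r.nullable)
            for r in rows
        ]
    return result
```

With pyodbc installed, `read_metadata(pyodbc.connect("DSN=SparkSQL", autocommit=True))` would return a dictionary keyed by schema and table name (`SparkSQL` is again a hypothetical DSN). Keeping `describe_column` free of any driver dependency makes the mapping easy to check on its own.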

Data Profiling

Dataedo does not support profiling in Apache Spark SQL.

Data Lineage

Dataedo does not support data lineage in Apache Spark SQL.

ODBC Driver

Download Hive ODBC driver

ODBC configuration

Learn more

Connect to Apache Spark SQL