Apache Avro support

Applies to: Dataedo 23.x versions, Article available also for: 24.x (current), 10.x

Dataedo 9.3 added support for Avro files. Dataedo scans Avro files (or an Avro schema provided in an avsc file) and builds a separate structure for each record. Each structure contains fields (schemas) that can be:

  • Primitive types
  • Unions
  • Nested records
  • Enums
  • Arrays
  • Maps
  • Fixed data type

If the file contains only one schema definition, a structure containing only this definition is created.
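As an illustration of the field kinds listed above, the sketch below parses a small avsc document and labels each field by kind. The schema (Customer, Address, Status, MD5) and the helper kind_of are hypothetical and only show what each kind looks like in Avro JSON; this is not how Dataedo classifies fields internally.

```python
import json

# Hypothetical schema, for illustration only.
AVSC = """
{
  "type": "record",
  "name": "Customer",
  "namespace": "com.example",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "email", "type": ["null", "string"], "default": null},
    {"name": "address", "type": {"type": "record", "name": "Address",
                                 "fields": [{"name": "city", "type": "string"}]}},
    {"name": "status", "type": {"type": "enum", "name": "Status",
                                "symbols": ["ACTIVE", "CLOSED"]}},
    {"name": "tags", "type": {"type": "array", "items": "string"}},
    {"name": "attributes", "type": {"type": "map", "values": "string"}},
    {"name": "checksum", "type": {"type": "fixed", "name": "MD5", "size": 16}}
  ]
}
"""

# The eight primitive type names defined by the Avro specification.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def kind_of(avro_type):
    """Return a coarse label for an Avro type expression."""
    if isinstance(avro_type, list):      # a JSON array is a union
        return "union"
    if isinstance(avro_type, dict):      # complex types carry a "type" attribute
        return kind_of(avro_type["type"])
    # A plain string is a primitive, a complex kind, or a reference to a named type.
    return "primitive" if avro_type in PRIMITIVES else avro_type


schema = json.loads(AVSC)
kinds = {field["name"]: kind_of(field["type"]) for field in schema["fields"]}

for name, kind in kinds.items():
    print(f"{name}: {kind}")
```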

Supported Metadata

Metadata

Dataedo reads the following metadata for each field (schema) definition:

  • Name
  • Namespace (if exists)
  • Data Type
  • Nullability
  • Description

The namespace for complex data types is included in the Data type column.
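A rough sketch of where these attributes live in an Avro field definition is shown below. It assumes that nullability corresponds to a union containing "null", that the description comes from the Avro "doc" attribute, and that the namespace is folded into the data type label for named types. The function names and the label format (for example "union<null, string>") are hypothetical and may differ from what Dataedo displays.

```python
def type_label(avro_type, namespace=None):
    """Build a readable data type label, qualifying named types with their namespace."""
    if isinstance(avro_type, str):
        return avro_type
    if isinstance(avro_type, list):
        return "union<" + ", ".join(type_label(t, namespace) for t in avro_type) + ">"
    kind = avro_type["type"]
    if kind in ("record", "enum", "fixed"):
        # A named type inherits the enclosing namespace unless it declares its own.
        ns = avro_type.get("namespace", namespace)
        name = avro_type["name"]
        # A name that already contains a dot is a full name and is used as-is.
        return f"{ns}.{name}" if ns and "." not in name else name
    if kind == "array":
        return "array<" + type_label(avro_type["items"], namespace) + ">"
    if kind == "map":
        return "map<" + type_label(avro_type["values"], namespace) + ">"
    return kind


def describe_field(field, namespace=None):
    """Collect name, data type, nullability and description for one field."""
    avro_type = field["type"]
    return {
        "name": field["name"],
        "data_type": type_label(avro_type, namespace),
        "nullable": isinstance(avro_type, list) and "null" in avro_type,
        "description": field.get("doc", ""),
    }


# Hypothetical field definitions, for illustration only.
email = {"name": "email", "type": ["null", "string"], "doc": "Contact e-mail"}
address = {"name": "address", "type": {"type": "record", "name": "Address", "fields": []}}

print(describe_field(email))
print(describe_field(address, namespace="com.example"))
```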

Data profiling

Dataedo does not support profiling Avro files.

Importing file in Dataedo

To add an Avro file:

  • right-click any database or the Structures folder, choose Add Object, then Add/Import Structure, or
  • on the main ribbon select Add Object, then Structure/File, or
  • select the Structures folder and on the main ribbon select Add Structure/File.

Select Paste Document to paste an Avro schema in JSON format, or Import from File to select a file saved locally on your PC. Then choose Avro from the available formats and either paste the schema or point to a binary Avro file or an avsc schema file. Click Next to scan the provided schema or file.

If the provided file or pasted structure contains a definition for only one record or schema, Dataedo creates one structure. Otherwise (if there is more than one record), Dataedo creates a structure listing all records in the file, plus one structure for each Avro record.
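The rule above can be approximated with a short sketch that lists the structures one might expect for a given schema. It assumes that a file with several records is a top-level JSON array of record definitions and counts only top-level records; how nested records are treated is not covered here. The names top_level_records and expected_structures, and the placeholders "<file>" and "<schema>", are hypothetical.

```python
import json


def top_level_records(schema):
    """Return the names of record definitions found at the top level of a schema."""
    candidates = schema if isinstance(schema, list) else [schema]
    return [s["name"] for s in candidates
            if isinstance(s, dict) and s.get("type") == "record"]


def expected_structures(schema):
    """List the structures expected under the rule described above (an approximation)."""
    records = top_level_records(schema)
    if len(records) > 1:
        # One structure listing all records, plus one structure per record.
        return ["<file>"] + records
    # A single record or a single non-record schema yields one structure.
    return records or ["<schema>"]


# Hypothetical schemas, for illustration only.
two_records = json.loads("""
[
  {"type": "record", "name": "Order", "fields": [{"name": "id", "type": "long"}]},
  {"type": "record", "name": "Customer", "fields": [{"name": "name", "type": "string"}]}
]
""")
one_record = two_records[0]

print(expected_structures(two_records))
print(expected_structures(one_record))
```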

In the following example, an avsc file containing definitions for two records was scanned. Dataedo produces the following structures:

  • Structure of file
  • Structure of first record
  • Structure of second record

If anything is wrong with the file or pasted structure, Dataedo reports an error with details of what is wrong. For example:

Structure error
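To catch obvious problems before pasting a schema, a local pre-check along the following lines may help. This is a minimal sketch that only verifies that the text is valid JSON and that top-level record definitions carry the "name" and "fields" attributes required by the Avro specification. It is not Dataedo's validation, and the function name precheck and its messages are hypothetical.

```python
import json


def precheck(text):
    """Return None if no obvious problem is found, otherwise a short message."""
    try:
        schema = json.loads(text)
    except json.JSONDecodeError as exc:
        return f"Invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"

    definitions = schema if isinstance(schema, list) else [schema]
    for definition in definitions:
        if isinstance(definition, dict) and definition.get("type") == "record":
            for key in ("name", "fields"):
                if key not in definition:
                    return f"Record definition is missing required attribute '{key}'"
    return None


print(precheck('{"type": "record", "name": "A", "fields": []}'))
print(precheck('{"type": "record", "name": "A"}'))
```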

Guide: Adding files to the catalog