Azure Data Lake Storage

19th July, 2022
Applies to: Dataedo 10.x versions, Article available also for: 24.x (current), 23.x

Azure Data Lake Storage is a service designed for storing and analyzing Big Data files. One of the key features distinguishing it from Azure Blob Storage is the hierarchical namespace, which organizes objects/files into a hierarchy of directories for efficient data access.

Dataedo provides a native Azure Data Lake Storage connector, which allows you to document objects/files stored in this service.

Authentication

To document Azure Data Lake Storage, you need to authenticate. Dataedo supports the following authentication options:

  • Access Key
  • Azure Active Directory - Interactive
  • Connection string
  • Public container (no authentication)
  • Shared Access Signature - Account
  • Shared Access Signature - Directory
  • Shared Access Signature - URL

Of all these options, we recommend Azure Active Directory - Interactive authentication, as it provides the best access control in the Azure cloud.

How to find storage account name in Azure

The storage account name is a unique identifier of your account within the whole Azure cloud. You can find it in the following way (we show only the Azure Portal method, as the others are more advanced):

  1. Sign in to the Azure Portal.
  2. Search for storage account and select Storage accounts: storage_account_search

  3. You will see a list of Storage Accounts, with the name in the first column. Copy the name of the account you want to document:

storage_accounts_list
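Before pasting the copied name into Dataedo, it can help to check that it looks like a storage account name. The sketch below is a minimal illustration, not part of Dataedo: it assumes the standard Azure naming rule (3 to 24 characters, lowercase letters and digits only) and the public Azure cloud endpoint suffix core.windows.net. The function names are hypothetical.

```python
import re

# Assumption: Azure storage account names are 3-24 characters long and
# contain only lowercase letters and digits.
_ACCOUNT_NAME_RE = re.compile(r"[a-z0-9]{3,24}")


def is_valid_account_name(name: str) -> bool:
    """Return True if the name matches the storage account naming rule."""
    return _ACCOUNT_NAME_RE.fullmatch(name) is not None


def dfs_endpoint(name: str) -> str:
    """Build the Data Lake Storage (DFS) endpoint for an account.

    Assumes the public Azure cloud suffix; sovereign clouds use other suffixes.
    """
    if not is_valid_account_name(name):
        raise ValueError(f"Not a valid storage account name: {name!r}")
    return f"https://{name}.dfs.core.windows.net"
```

For example, dfs_endpoint("examplelake") builds the endpoint URL for a hypothetical account named examplelake, while a name such as "My-Account" is rejected.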

Access key

When you create an Azure Storage Account, it receives automatically generated keys that can be used to authenticate to Data Lake Storage in Dataedo.

How to find Access Key in Azure Portal

  1. Search for Storage Accounts, and open the one you would like to document.
  2. On the side tab, look for Access keys and open the page: access_keys

  3. Click the Show button to reveal the Key and copy it to Dataedo. You can use either of the keys. You can also copy the Storage account name from this page: access_keys_copy

  4. Paste the values into the Dataedo connector window:

dataedo_azure_storage_access_keys

Azure Active Directory - Interactive

You can use your Azure AD credentials to access the Storage Account. All you need to provide is the Account Name. How to find it is described in the How to find storage account name in Azure section of this article. Once you click Connect in the Dataedo window, you will have to sign in interactively with your Azure account.

azure_ad_interactive

Azure Active Directory - Interactive authentication cannot be used to automate imports with dataedocmd, as it requires you to sign in manually before the import can begin.

Connection string

An Azure Storage connection string contains all the information required to connect to the Storage Account, so the Dataedo connector has only one field for this type of authentication.

How to find Connection string in Azure Portal

  1. Search for Storage Accounts, and open the one you would like to document.
  2. On the side tab, look for Access keys and open the page: access_keys

  3. Click the Show button to reveal the Connection string and copy it to Dataedo. You can use either of the connection strings. You can also copy the Storage account name from this page:

account_connection_string

  4. Paste the connection string into the Dataedo connector window:

azure_storage_conn_string_dataedo
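To see why a single field is enough, it helps to look at what a connection string contains. The sketch below is only an illustration and not how Dataedo processes the value: it assumes the usual Azure Storage layout of semicolon-separated Key=Value pairs, and the account name and key in the example are placeholders.

```python
from typing import Dict


def parse_connection_string(conn_str: str) -> Dict[str, str]:
    """Split a connection string into its Key=Value parts."""
    parts: Dict[str, str] = {}
    for segment in conn_str.split(";"):
        segment = segment.strip()
        if not segment:
            continue  # tolerate a trailing semicolon
        # Split on the first "=" only: account keys are Base64 and may
        # themselves end with "=" padding.
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"Malformed segment: {segment!r}")
        parts[key] = value
    return parts


# Placeholder values, not real credentials.
EXAMPLE = (
    "DefaultEndpointsProtocol=https;"
    "AccountName=examplelake;"
    "AccountKey=ZmFrZS1rZXk=;"
    "EndpointSuffix=core.windows.net"
)

if __name__ == "__main__":
    for key, value in parse_connection_string(EXAMPLE).items():
        # Avoid printing secrets in real use.
        print(key, "=", "***" if key == "AccountKey" else value)
```

The parsed result carries the account name, the key, and the endpoint suffix together, which is why no separate Account Name field is needed for this authentication type.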

Public container

Some containers can be accessed publicly, without any authentication. In that case, Dataedo requires you to provide the storage account name and the container name. Obtaining the first one is described in the How to find storage account name in Azure section. The public container name can be found in the following way:

  1. Search for Storage Accounts, and open the one you would like to document.
  2. On the side tab, look for Containers and open that page: containers_azure_portal

  3. On the Containers page you will find a list of the containers available in the Storage Account. Make sure the selected container has the Container access level.

containers_list_azure_portal
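One way to confirm that a container is really public is to request its blob listing without credentials. The sketch below only builds the URL for such a request. It is an illustration, not a Dataedo feature, and it assumes the public Azure cloud blob endpoint and the List Blobs REST operation (restype=container, comp=list). The account and container names are placeholders.

```python
from urllib.parse import quote, urlencode


def list_blobs_url(account_name: str, container_name: str) -> str:
    """Build the anonymous List Blobs URL for a container.

    Assumes the public Azure cloud suffix (core.windows.net).
    """
    query = urlencode({"restype": "container", "comp": "list"})
    return (
        f"https://{account_name}.blob.core.windows.net/"
        f"{quote(container_name)}?{query}"
    )


if __name__ == "__main__":
    # Placeholder names; opening the printed URL in a browser returns an XML
    # listing only if the container allows anonymous listing.
    print(list_blobs_url("examplelake", "public-data"))
```

If the request is rejected, the container most likely has a more restrictive access level (Private or Blob) and one of the authenticated options is needed instead.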

Connecting to Azure Data Lake Storage

Connect

Once you have filled in the required connection details, click the Connect button.

dataedo_connect

Once connected, set the documentation title.

Select format (optional)

On the next screen, you can select the format of the files to be documented. This step is optional: if you select files in different formats, Dataedo will automatically detect each format and use the appropriate connector.

format_selection

Select files

On the next screen, select the files to be documented. As stated earlier, the files can be in various formats. In the next step, Dataedo will try to document the files based on their extensions.

files_select

Once you have selected the files, click the Import button.
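The idea of picking a format from the file extension can be sketched as follows. This is a hypothetical illustration only: the extension-to-format mapping and the function names are assumptions made for the example and do not describe Dataedo's actual detection logic or its list of supported formats.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, List, Optional

# Illustrative mapping only; not Dataedo's real list of formats.
EXTENSION_TO_FORMAT = {
    ".csv": "CSV",
    ".json": "JSON",
    ".parquet": "Parquet",
    ".avro": "Avro",
    ".orc": "ORC",
}


def guess_format(path: str) -> Optional[str]:
    """Return a format name based on the file extension, or None if unknown."""
    suffix = PurePosixPath(path).suffix.lower()
    return EXTENSION_TO_FORMAT.get(suffix)


def group_by_format(paths: List[str]) -> Dict[str, List[str]]:
    """Group file paths by guessed format; unrecognized files go to 'unknown'."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        groups[guess_format(path) or "unknown"].append(path)
    return dict(groups)
```

Files whose extension is missing or unrecognized end up in the "unknown" group. In Dataedo, the detected format can be corrected in the Choose Objects to import step.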

Choose Objects to import

During this step, you can check/uncheck objects and their columns/fields for import. To verify the columns/fields, select a file from the list. You can also change the format that Dataedo recognized by expanding the list in the Type column.

choose_objects

Import

Once you're sure everything is correct, click the Import button to import the selected files.

Outcome

The metadata of the selected files is loaded into a new documentation.

data_lake_storage_outcome