Enterprises must know which data sources they have in-house before they can use that data for analytics and reporting. Data within an enterprise should add value, be of good quality, be traceable, and be accessible to those who need it. All of this comes down to the need for data governance, a key component of which is the data catalog.
I have worked for a couple of years with data catalogs from Informatica, Alteryx and Microsoft, and I have learned that development teams face some important challenges when implementing data catalogs. I want to share them with you in this article.
1. Building a team with the right knowledge and relevant skillset
The team that builds a data catalog has to be a balanced mix of technical and business experts. Its most essential members are quick learners who have a background in data services, know data cataloging tools, and understand the business value that data cataloging can add to the organization. Such people can be hard to find.
2. Cost of data cataloging tool
There are very few open-source data cataloging tools on the market at this point, so avoiding license costs is often not an option. Even if an open-source tool is chosen, it might not provide user-friendly features out of the box.
3. Gathering the information on source systems across the organization
This can be a very challenging and time-consuming task for the team, as it involves getting cross-functional teams on the same page as the data cataloging team and convincing them to invest time in sharing information on the data sources available in their respective departments.
4. Lack of knowledge on best practices to implement a data catalog
Teams often neglect to follow a proper process when implementing a data catalog and end up with a messy result after investing a huge amount of time and money. As data cataloging is not a day-to-day activity, it must be planned very well before implementation. Small details, such as incorrect naming conventions for the source systems, can prove very costly because they can mislead users who are trying to understand the catalog metadata.
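As an illustration of catching naming problems early, here is a minimal sketch that checks source system names against an agreed convention before they are registered in the catalog. The convention itself (`<department>_<system>_<environment>`), the pattern, and the function name `find_naming_violations` are all assumptions made up for this example; your team would substitute its own rules.

```python
import re

# Hypothetical convention (an assumption for this example):
# <department>_<system>_<environment>, all lowercase, e.g. "finance_sap_prod".
NAME_PATTERN = re.compile(r"[a-z]+_[a-z0-9]+_(dev|test|prod)")


def find_naming_violations(source_names):
    """Return the names that do not follow the agreed convention."""
    return [name for name in source_names if not NAME_PATTERN.fullmatch(name)]


if __name__ == "__main__":
    names = ["finance_sap_prod", "HR-Workday", "sales_crm_test", "legacy db 2"]
    for bad_name in find_naming_violations(names):
        print(f"Rename before cataloging: {bad_name}")
```

A check like this could run as part of the planning phase, so that inconsistent names are fixed before any metadata is loaded rather than after users have already seen them.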
5. Scalability of data cataloging tool
Not all tools support a huge amount of catalog metadata. Some tools break down due to performance issues when loaded with very large amounts of metadata, and some can only accommodate a limited amount of metadata on data assets. An evaluation by the technical experts in the data cataloging team before implementation is very important, as a wrong choice could prove very costly in the later stages of the project.
6. Tool support for data sources
The data cataloging team should list all the available data sources in the enterprise and evaluate whether the chosen tool supports extracting metadata from all the source systems out of the box or needs third-party plugins. If this evaluation is not performed before implementation, there is a risk of ending up with an incomplete catalog at the end of the project.
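The evaluation described above can be sketched as a simple comparison between the source inventory and the connector types a tool offers. Everything in this example is hypothetical: the source names, the source types, and the sets of native and plugin connectors are placeholders, not the capabilities of any real product. In practice you would fill them in from the vendor's documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoverageReport:
    native: list       # sources supported out of the box
    plugin: list       # sources that need a third-party plugin
    unsupported: list  # sources the tool cannot extract metadata from


def evaluate_coverage(inventory, native_types, plugin_types):
    """Classify each source in the inventory by how the tool supports its type.

    inventory maps a source name to its source type (both hypothetical here).
    """
    native, plugin, unsupported = [], [], []
    for name, source_type in sorted(inventory.items()):
        if source_type in native_types:
            native.append(name)
        elif source_type in plugin_types:
            plugin.append(name)
        else:
            unsupported.append(name)
    return CoverageReport(native, plugin, unsupported)


if __name__ == "__main__":
    # Placeholder inventory and connector lists, for illustration only.
    inventory = {
        "finance_sap_prod": "sap",
        "sales_crm_test": "postgresql",
        "hr_files": "csv",
        "plant_historian": "custom_scada",
    }
    report = evaluate_coverage(
        inventory,
        native_types={"postgresql", "csv"},
        plugin_types={"sap"},
    )
    print("Out of the box:", report.native)
    print("Needs plugin:  ", report.plugin)
    print("Not supported: ", report.unsupported)
```

Any source that lands in the unsupported list is a gap the team should resolve before committing to the tool, since it would otherwise be missing from the finished catalog.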
Data cataloging projects usually consume a lot of time and resources, and no team wants to end up with a data catalog that cannot be used or maintained. I advise considering the points above before and throughout the implementation of a data catalog.
Do’s for the data cataloging team
Here is my advice for developers on what to do before choosing the tool and beginning the implementation:
- List as many of the data sources available in the enterprise as possible before implementation.
- Check whether the data cataloging tool has extensive support for your types of source systems.
- Check whether the data cataloging tool needs any third-party plugins to extract the metadata, and evaluate how easy they are to use.
- Stick to data cataloging best practices to avoid issues in the later stages of the project.
- Check the scalability of the data cataloging tool – this can be useful in the hypercare/maintenance phase of the project.
- Document each phase of the implementation in detail to avoid a knowledge gap when the project is handed over to other teams in the future.
- Share knowledge and explain the importance of data cataloging to all team members involved in the project; this can prove to be a very important motivating factor throughout the project.
I hope this gives you a good start for kicking off your data catalog project.