
Embarking on a Data Catalog Project

In my role as Data Success Manager at Dataedo, I often ask customers what led them to begin working on their data governance catalog. A frequent response is something like, “We have lots of data in various systems, but no one really knows what’s there; we need to understand our data better.” This seems to be a common situation and frankly, I’ve been there myself.

As organizations, we collect a lot of data. And it’s increasing. Yet we commonly have a “less than ideal” understanding of that data. Perhaps you have older (unscalable) data systems. Perhaps you have redundant data systems. Perhaps you have undocumented data systems. Perhaps you have poorly designed (and unsustainable) data systems. Whatever the situation, the fact that you’re reading this makes me think that you recognize the need to take control and bring order to the chaos.

My goal here isn’t to give you all the answers; I don’t have them all. Rather, my goal is to provide some pointers that hopefully will help, but more importantly, start you thinking about ways to make a difference. Each step you take toward an effective data governance catalog (i.e., toward providing a better understanding of your data systems) will open your eyes to other ways to add even more value.

Don’t try to “boil the ocean”

Trying to “do it all” and “get it done immediately” may sound good when presenting your project to management, but I suggest you resist the urge to make promises that will be very difficult to keep. You know your systems are confusing. It will take time to understand them.

My approach is to start by documenting one, perhaps two, critical data systems. Restricting your work to a portion of the data systems keeps you from being overwhelmed, and concentrating on that portion lets you show value and get a more immediate return on your investment (time, effort, and funding).

Identify your audience

Who are the primary consumers of your data governance catalog? If you know who the audience is, you can focus your efforts on generating content of immediate value to them. For example, you will (or ought to) approach your project differently for a business audience than you would for a more technical group of data scientists. They have different needs and different expectations. When you have identified the audience, you can focus your attention on providing the information that group needs. Then expand the content later, as appropriate. In other words, knowing your audience helps you decide where to focus your effort, as I discuss in the next section.

Focus on the essentials

Along with knowing your audience, it is vital to put your effort into aspects of the data catalog that are most important. Analyze the situation—what will help your users understand the data better, a business glossary or the core of a data dictionary (descriptions of tables and columns)? Are users in desperate need of privacy classifications or can that wait while you concentrate on something else? Are there other things you can identify to quickly provide value to your audience?

As you prioritize the parts of the data governance catalog your audience needs most, you build structure into your project that will guide your efforts. The idea here is to create clarity in your mind:

  1. What is the most important data source?
  2. Who needs the data catalog first?
  3. What is the most important aspect of the data catalog for that group?

Get help if you can

When you have the clarity I described above, you can establish an actual work plan. You may have limited understanding of the data system selected so it is a good idea to rely on others to help. Of course, this means you must find those who have the expertise needed and entice them to help you. Demonstrate the value of the project to them in terms they can appreciate. The data governance catalog will save them time and money in the long run. Help them to see the big picture.

The best approach is to enable others to help you. In other words, make it easy for them to help. If you’re working on descriptions of columns, for example, you might investigate the data yourself and propose descriptions that the subject matter expert can simply review and accept. This requires time on your part, but isn’t that effort worthwhile if it greatly increases the likelihood of getting the subject matter expert to help? If you send them a long list of columns with a request to provide descriptions, you might get a reply—after a very long time—or you might not get anything at all. You’re busy; so are they. The difference is that you’re already committed (or have been assigned) to creating the data catalog. They may not be committed (yet) so making it easy for them to participate increases the chances of success.
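As one way to picture the "investigate the data yourself" step, here is a minimal sketch that profiles each column of a table and collects the facts you would want in front of you when drafting a description for an expert to review. It assumes a SQLite database purely as a stand-in for your own system; the function name `draft_review_sheet`, the `customers` table, and its columns are hypothetical and are not part of Dataedo or any other product.

```python
import sqlite3


def draft_review_sheet(conn, table):
    """Profile each column of `table` and return one review row per column.

    Hypothetical helper for illustration only. `table` is assumed to be a
    trusted identifier, since it is interpolated into the SQL text.
    """
    cur = conn.cursor()
    total = cur.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    columns = cur.execute(f'PRAGMA table_info("{table}")').fetchall()
    sheet = []
    for _cid, name, col_type, _notnull, _default, pk in columns:
        nulls, distinct = cur.execute(
            f'SELECT SUM("{name}" IS NULL), COUNT(DISTINCT "{name}") '
            f'FROM "{table}"'
        ).fetchone()
        # A few example values help the expert recognize the column quickly.
        samples = [
            row[0]
            for row in cur.execute(
                f'SELECT DISTINCT "{name}" FROM "{table}" '
                f'WHERE "{name}" IS NOT NULL ORDER BY 1 LIMIT 3'
            )
        ]
        sheet.append({
            "column": name,
            "type": col_type,
            "is_key": bool(pk),
            "row_count": total,
            "null_count": nulls or 0,  # SUM is NULL when the table is empty
            "distinct_count": distinct,
            "samples": samples,
            "proposed_description": "",  # you fill this in before sending
            "status": "needs review",    # the expert only accepts or corrects
        })
    return sheet


if __name__ == "__main__":
    # Tiny in-memory example with made-up data.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers "
        "(id INTEGER PRIMARY KEY, country TEXT, signup_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "PL", "2023-01-05"), (2, "US", None), (3, "PL", "2023-02-11")],
    )
    for row in draft_review_sheet(conn, "customers"):
        print(row)
```

The point of the design is that the expert receives a short sheet with evidence and a proposed description already attached, so their task shrinks to accepting or correcting each row rather than writing from scratch.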

Remember that you have limits

Be kind to yourself. You’re probably doing the project in addition to your normal duties so you may not have the luxury of devoting a large amount of time to creating your data catalog all at once. However, if you continue to devote consistent effort to the project, you’ll soon see progress.

If your efforts aren’t consistent, you may lose momentum and feel lost, overwhelmed, or discouraged when you come back to the project after time spent working on other things. Spending as little as an hour or two a week will help you maintain focus and you’ll soon see the effects of your efforts. Of course, more time is better, but the key is consistency.

Conclusion

I applaud your desire to increase understanding of your data systems and I sympathize with how overwhelming this may feel. But you’re here! You know the value an effective data catalog can bring to your organization and the people working there.

Whatever the current state of order or disorder in your data systems, you can succeed. Time is required. Effort is required. As you plan your project, you can reduce stress by taking a “divide and conquer” approach. Be methodical and be consistent. Get help. Most of all, be kind to yourself and don’t get discouraged: the outcome is worth the work!

Richard Monk

Richard Monk is a seasoned professional with over 20 years of experience in data management, databases, and knowledge management. With a career spanning diverse industries and roles, he has amassed a deep understanding of the challenges and opportunities inherent in these domains. As an instructor in our Dataedo Bootcamp, he brings real-world expertise to help you navigate technical and cultural challenges in data cataloging and management.
