Data Quality Cartoons

Real World vs Data #3

Data is (or is supposed to be) a representation of the real world. But is it complete?

United States and the U.S.

If this seems relevant, you might need Reference Data Management: simply agreeing on the sets of allowed values (or codes) and their meaning.
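As a minimal sketch of what an agreed set of allowed values can look like in practice, the snippet below maps different spellings of a country to one code. The `COUNTRY_REFERENCE` table, the list of variants, and the `normalize_country` helper are hypothetical names chosen for illustration, not part of any particular tool.

```python
# Hypothetical reference data: one agreed code mapped to the spellings we accept for it.
COUNTRY_REFERENCE = {
    "US": {"united states", "u.s.", "us", "usa", "united states of america"},
}


def normalize_country(raw):
    """Return the agreed code for a raw value, or None if it is not in the reference set."""
    key = raw.strip().lower()
    for code, variants in COUNTRY_REFERENCE.items():
        if key in variants:
            return code
    return None  # unknown value: flag it for review rather than guessing


raw_values = ["United States", "U.S.", "usa ", "Untied States"]
normalized = [normalize_country(value) for value in raw_values]
print(normalized)
```

Values that do not match the reference set come back as `None`, so they can be reviewed instead of silently becoming a new "country".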

Do We Trust This Data?

Have you ever heard of GIGO (Garbage In, Garbage Out)? Don't expect useful insights and direction from your dashboards if you feed them garbage data.

Random Data

Poor quality data can be useless or even dangerous.

Importance of a Where Clause

It's really important to remember all the conditions in your WHERE clause. Otherwise you might get flooded with results.
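To illustrate how quickly a result set grows when a condition is dropped, here is a small sketch using Python's built-in `sqlite3` module. The `orders` table and its columns are made up for this example.

```python
import sqlite3

# In-memory database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO orders (customer, status) VALUES (?, ?)",
    [("acme", "open"), ("acme", "closed"), ("globex", "open"), ("globex", "closed")],
)

# Intended query: open orders for a single customer.
full = conn.execute(
    "SELECT id FROM orders WHERE customer = ? AND status = ?", ("acme", "open")
).fetchall()

# One condition forgotten: every order of that customer comes back.
partial = conn.execute(
    "SELECT id FROM orders WHERE customer = ?", ("acme",)
).fetchall()

# No WHERE clause at all: the whole table comes back.
everything = conn.execute("SELECT id FROM orders").fetchall()

print(len(full), len(partial), len(everything))
```

With four rows the difference is harmless; on a production table the same omission can return millions of rows.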

Data Cleansing

Data cleansing sounds like something you could outsource, right? Mostly that is not the case. Someone from outside can help you, but they don't understand your business, processes, etc. That's why organizations assign data owners who are accountable for cleansing, and owners recruit and appoint data stewards from within the organization.

Best Selling Product

That is a simple report... Select from orders, group by product category, add a bar chart. Looking good. Send.
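The caption does not say what goes wrong with the "simple" report. One common trap, assumed here purely for illustration, is that the `orders` table also contains rows that should not count as sales, such as cancelled orders. The table, columns, and status values below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (category TEXT, amount INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("toys", 100, "completed"),
        ("toys", 900, "cancelled"),  # assumed: cancelled orders stay in the table
        ("books", 300, "completed"),
        ("books", 200, "completed"),
    ],
)

# The "simple report": select from orders, group by product category.
naive = conn.execute(
    "SELECT category, SUM(amount) AS total FROM orders "
    "GROUP BY category ORDER BY total DESC"
).fetchall()

# Same report, counting only orders that were actually completed.
checked = conn.execute(
    "SELECT category, SUM(amount) AS total FROM orders "
    "WHERE status = 'completed' GROUP BY category ORDER BY total DESC"
).fetchall()

print("naive best seller:", naive[0][0])
print("checked best seller:", checked[0][0])
```

Under these assumptions the two queries disagree on the best-selling category, which is the kind of surprise a quick look at the underlying data would catch before sending.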

Confidence in Data

To trust data you should know its quality.

Data Confidence in Healthcare

If it's about our health, they better get those numbers right.

Quality Over Quantity

Having a lot of data doesn't mean you are able to make many data-driven decisions. Maybe you'll just drown in data.

Never made an error

Sometimes reports seem too good to be true... How much trust do you have in your data?

Data Swamp

A Data Lake is a repository of data stored in raw format (CSV, JSON, XML, text, binary, documents, etc.), in contrast to a traditional (relational) database that enforces a strict, predefined schema (where data is arranged in tables and columns).

A Data Swamp is a Data Lake that is so messy that it is unusable and does not allow you to find or get value from your data.
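The contrast between raw storage and a predefined schema can be sketched in a few lines. This is only an illustration: the JSON records and the `customers` table are invented, and the list named `lake` stands in for files dropped into a Data Lake.

```python
import json
import sqlite3

# Raw storage: every record is accepted as-is, whatever its shape.
raw_lines = [
    '{"id": 1, "country": "US"}',
    '{"id": "two"}',
    '{"ID": 3, "cntry": "U.S."}',
]
lake = [json.loads(line) for line in raw_lines]

# Relational table: the schema is declared up front and enforced on insert.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER NOT NULL CHECK (typeof(id) = 'integer'), "
    "country TEXT NOT NULL)"
)

accepted, rejected = 0, 0
for record in lake:
    try:
        conn.execute(
            "INSERT INTO customers (id, country) VALUES (?, ?)",
            (record.get("id"), record.get("country")),
        )
        accepted += 1
    except sqlite3.IntegrityError:
        rejected += 1  # the record violates the declared schema

print(len(lake), accepted, rejected)
```

The raw store keeps all three records, while the table accepts only the one that matches the schema. Neither approach is wrong, but without any checks or documentation the raw store is where a swamp starts.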

Sum(id)

If the numbers look too good, check the SQL.
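A minimal sketch of the mistake in the title, using a hypothetical `sales` table: summing the identifier column instead of the amount column still returns a number, just a meaningless one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales (id, amount) VALUES (?, ?)",
    [(1001, 20), (1002, 35), (1003, 15)],
)

# Wrong column: adds up the identifiers, which carry no business meaning.
wrong_total = conn.execute("SELECT SUM(id) FROM sales").fetchone()[0]

# Intended column: adds up the sale amounts.
right_total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]

print(wrong_total, right_total)
```

The query runs without any error in both cases, so only a review of the SQL (or a sanity check against a known figure) reveals the problem.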

'Do not use' client

Without a good understanding of the database schema, it is easy to make a mistake when joining tables, which can cause serious errors in the results.
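One way such a mistake can look, sketched with invented `clients` and `orders` tables: joining on `orders.id` instead of `orders.client_id` still produces rows, but attributes revenue to the wrong clients, including a placeholder record that should never appear in a report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, client_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO clients VALUES (?, ?)",
    [(1, "Do not use"), (2, "Acme"), (3, "Globex")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 2, 50), (2, 3, 70), (3, 2, 30)],
)

# Wrong join key: the order id happens to match client ids, so the query "works".
wrong = conn.execute(
    "SELECT c.name, SUM(o.amount) FROM orders o "
    "JOIN clients c ON o.id = c.id GROUP BY c.name ORDER BY c.name"
).fetchall()

# Correct join key: the foreign key column that points at clients.
right = conn.execute(
    "SELECT c.name, SUM(o.amount) FROM orders o "
    "JOIN clients c ON o.client_id = c.id GROUP BY c.name ORDER BY c.name"
).fetchall()

print(wrong)
print(right)
```

Both queries return plausible-looking totals, which is what makes this kind of error hard to spot without knowing how the tables relate.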
