Data Quality 101: Building Trust in Your Pipelines

Author: Eficsy Team
Published: December 10, 2024
Read time: 11 min
Tags: Data Quality, Testing, Great Expectations, Observability

The Hidden Cost of Bad Data

Imagine your CEO making a strategic decision based on a dashboard that shows 20% growth, when in reality, a duplicate data bug masked a 10% decline. Data trust is hard to gain and easy to lose.
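
To make the arithmetic concrete, here is a minimal sketch with made-up numbers: 90 real orders this month against a prior month of 100,000 in revenue, plus a faulty load that re-inserts 30 rows. The order IDs, amounts, and the `growth` helper are all hypothetical and chosen only to reproduce the 20% versus -10% gap described above.

```python
# Illustrative figures only; nothing here comes from a real dataset.
last_month_revenue = 100_000.0

# 90 genuine orders worth 1,000 each (a real decline versus last month)...
this_month_orders = [(f"ORD-{i}", 1_000.0) for i in range(90)]
# ...but a faulty load re-inserted the first 30 of them.
this_month_loaded = this_month_orders + this_month_orders[:30]


def growth(current: float, previous: float) -> float:
    """Period-over-period growth as a fraction."""
    return (current - previous) / previous


# What the dashboard sums: every loaded row, duplicates included.
reported_revenue = sum(amount for _, amount in this_month_loaded)

# What actually happened: one row per order_id.
deduplicated = {order_id: amount for order_id, amount in this_month_loaded}
actual_revenue = sum(deduplicated.values())

reported_growth = growth(reported_revenue, last_month_revenue)
actual_growth = growth(actual_revenue, last_month_revenue)

print(f"reported: {reported_growth:+.0%}, actual: {actual_growth:+.0%}")
```

A single uniqueness check on the order ID would have surfaced the 30 extra rows before they reached the dashboard.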

The 6 Dimensions of Data Quality

1. Accuracy

Does the data reflect reality? (e.g., Is the customer's age actually 25?)

2. Completeness

Is all the required data present? (e.g., No missing timestamps)

3. Consistency

Does data match across systems? (e.g., The CRM and the ERP both report $100 for the same invoice)

4. Timeliness

Is the data available when needed? (e.g., A real-time feed vs. a batch job that arrives hours late)

5. Uniqueness

Are there duplicates? (e.g., Same order ID appearing twice)

6. Validity

Does data follow the rules? (e.g., Email must have @ symbol)
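
Most of these dimensions can be expressed as simple counts of violating rows. The sketch below is one way to do that in plain Python, assuming a hypothetical order schema (`order_id`, `amount`, `email`, `created_at`) and a hypothetical `erp_amounts` lookup standing in for a second system. Accuracy is left out on purpose: checking it needs a trusted source of truth that code alone cannot supply.

```python
from datetime import datetime, timedelta


def quality_report(orders, erp_amounts, now, max_age=timedelta(hours=24)):
    """Count violations per dimension for a list of order dicts (hypothetical schema)."""
    seen, duplicates = set(), 0
    for order in orders:
        if order["order_id"] in seen:
            duplicates += 1
        seen.add(order["order_id"])

    return {
        # Completeness: required timestamp is missing.
        "completeness": sum(1 for o in orders if o["created_at"] is None),
        # Consistency: amount disagrees with the second system.
        "consistency": sum(
            1 for o in orders
            if o["order_id"] in erp_amounts and erp_amounts[o["order_id"]] != o["amount"]
        ),
        # Timeliness: record is older than the allowed age.
        "timeliness": sum(
            1 for o in orders
            if o["created_at"] is not None and now - o["created_at"] > max_age
        ),
        # Uniqueness: order_id was already seen.
        "uniqueness": duplicates,
        # Validity: email lacks an @ symbol.
        "validity": sum(1 for o in orders if "@" not in (o["email"] or "")),
    }


sample_orders = [
    {"order_id": "A1", "amount": 100, "email": "a@example.com",
     "created_at": datetime(2024, 12, 10, 9, 0)},
    {"order_id": "A1", "amount": 100, "email": "a@example.com",
     "created_at": datetime(2024, 12, 10, 9, 0)},  # duplicate row
    {"order_id": "A2", "amount": 250, "email": "not-an-email",
     "created_at": None},  # missing timestamp, invalid email
]
report = quality_report(
    sample_orders,
    erp_amounts={"A1": 100, "A2": 200},
    now=datetime(2024, 12, 10, 12, 0),
)
print(report)
```

In practice these checks would run inside the warehouse rather than in application code, but the logic is the same: each dimension becomes a query that should return zero rows.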

Implementing Automated Tests

Don't rely on hope. Use tools like Great Expectations or dbt tests to enforce quality.

Example: dbt Schema Tests

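version: 2

models:
  - name: orders
    columns:
      # Primary key: every order has exactly one row and an ID.
      - name: order_id
        tests:
          - unique
          - not_null

      # Status must be one of the known lifecycle states.
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'returned']

      # Requires the dbt_utils package. At column level the expression
      # is appended to the column name, i.e. total_amount >= 0.
      - name: total_amount
        tests:
          - dbt_utils.expression_is_true:
              expression: ">= 0"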
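
Each of these tests passes when a search for violating rows comes back empty. The following is a rough Python rendering of the four assertions, meant only to show what they check; it is not how dbt implements them (dbt compiles tests to SQL and runs them in the warehouse). The `failing_rows` function and the sample rows are hypothetical.

```python
from collections import Counter

ACCEPTED_STATUSES = {"placed", "shipped", "completed", "returned"}


def failing_rows(orders):
    """Return the rows that violate each test, keyed by test name."""
    id_counts = Counter(o["order_id"] for o in orders if o["order_id"] is not None)
    return {
        "unique": [
            o for o in orders
            if o["order_id"] is not None and id_counts[o["order_id"]] > 1
        ],
        "not_null": [o for o in orders if o["order_id"] is None],
        # Nulls are skipped here, mirroring how SQL comparisons treat NULL.
        "accepted_values": [
            o for o in orders
            if o["status"] is not None and o["status"] not in ACCEPTED_STATUSES
        ],
        "expression_is_true": [
            o for o in orders
            if o["total_amount"] is not None and not (o["total_amount"] >= 0)
        ],
    }


sample = [
    {"order_id": 1, "status": "placed", "total_amount": 40.0},
    {"order_id": 1, "status": "shipped", "total_amount": 40.0},   # duplicate ID
    {"order_id": None, "status": "completed", "total_amount": 15.0},  # null ID
    {"order_id": 2, "status": "lost", "total_amount": -5.0},  # bad status, negative total
]
failures = failing_rows(sample)
for test_name, rows in failures.items():
    print(test_name, "FAIL" if rows else "PASS", len(rows))
```

Returning the offending rows, rather than a bare pass/fail flag, makes it much easier to debug a failure when one appears.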

Data Observability: The Next Frontier

Tests catch known unknowns: the failures you thought to write a rule for. But what about unknown unknowns, such as order volume suddenly dropping by 50%?

This is where Data Observability tools come in. They monitor:

  • Volume: Row counts over time.
  • Freshness: Time since last update.
  • Schema Changes: Did a column disappear?
  • Distribution: Did the average order value jump from $50 to $5000?
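
The four monitors above can be sketched with nothing more than the standard library. This is a simplified illustration, not how any particular observability product works: the function names, the 3-standard-deviation threshold, and the 2x mean-shift ratio are all assumptions picked for the example, and real tools typically account for seasonality and trends as well.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev


def volume_anomaly(history, today, threshold=3.0):
    """Volume: flag a row count far from the historical mean (in std devs)."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > threshold


def is_stale(last_updated, now, max_age):
    """Freshness: flag a table whose last update is older than max_age."""
    return now - last_updated > max_age


def schema_diff(expected, actual):
    """Schema changes: list columns that disappeared or appeared."""
    return {
        "missing": sorted(set(expected) - set(actual)),
        "added": sorted(set(actual) - set(expected)),
    }


def mean_shift(baseline, current, max_ratio=2.0):
    """Distribution: flag a mean that moved by more than max_ratio either way.

    Assumes strictly positive values, such as order amounts.
    """
    ratio = mean(current) / mean(baseline)
    return ratio > max_ratio or ratio < 1 / max_ratio


daily_row_counts = [1000, 1020, 980, 1010, 990]
print(volume_anomaly(daily_row_counts, today=500))   # half the usual volume
print(is_stale(datetime(2024, 12, 10, 0, 0),
               now=datetime(2024, 12, 10, 7, 0),
               max_age=timedelta(hours=6)))
print(schema_diff(["order_id", "status", "total_amount"],
                  ["order_id", "status"]))
print(mean_shift([50, 48, 52], [5000, 4900, 5100]))
```

Unlike the schema tests earlier, none of these checks encodes a business rule. They compare today's data against its own history, which is what lets them catch problems nobody anticipated.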

Conclusion

Data quality is not a one-time fix; it's a continuous process. Start small by adding unique and not_null tests to your primary keys, then expand to business logic tests, and finally implement observability.
