
Defining Superior Data Quality:

Data is graded much like olive oil, flour, gasoline, beef, and cotton: it is classified according to its nature and characteristics. Those characteristics include whether it is structured or unstructured, how often it has been duplicated, how mission-critical it is, how quickly it moves within a network, and how much truth it holds.

All of these factors help determine whether we are dealing with high-quality data. But why does good data matter when even low-grade ingredients can fill you up? Business analysts argue that refined information streams open up a world of strategic business decisions, and that better customer experiences flow from verified, well-curated information sources. Better data also helps firms meet regulatory compliance targets and reduce operational costs.

So, how do we make sure our data stays high quality? According to Ken Stott, Field CTO at Hasura, the secret lies in creating high-velocity feedback loops between data producers and consumers. These loops enable continuous communication, which prevents problems from arising and lays the groundwork for ongoing innovation and improvement.
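What such a loop looks like in practice is easy to sketch. The snippet below is a minimal illustration, not anything from Stott or Hasura: it assumes a hypothetical QualityFeedback record and a send_feedback stub, and shows how a consumer could report a problem at the point of delivery in a structured form the producing team can act on.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json

# Hypothetical sketch of structured feedback from a data consumer back to a
# producing team. The record shape and names are illustrative assumptions.
@dataclass
class QualityFeedback:
    dataset: str        # data product being consumed
    source_domain: str  # producing team/domain that owns the data
    rule: str           # machine-readable identifier of the failed rule
    detail: str         # human-readable description for the source team
    severity: str = "warning"
    observed_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def send_feedback(event: QualityFeedback) -> None:
    # In practice this might publish to a queue or ticketing system the
    # producing team already monitors; printing stands in for that here.
    print(json.dumps(asdict(event), indent=2))

send_feedback(QualityFeedback(
    dataset="regional_margin_report",
    source_domain="sales",
    rule="revenue.currency_consistent",
    detail="Revenue rows mix EUR and USD; composed margins are unreliable.",
))
```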

Stott explains that traditional data quality approaches rely on incident reporting systems and source checks. These work well within a single domain, but they fail at the intersection points where data is combined across boundaries. For instance, the figures feeding a margin calculation might each be accurate within its own domain, yet problems emerge when they are combined. When downstream teams discover these issues, their quality assessments often remain isolated, rarely flowing back upstream in a form the source teams can understand or act upon.
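To make the intersection-point problem concrete, here is a small invented example (the figures and checks are assumptions, not Stott's): revenue and cost each pass their own domain's checks, yet the margin composed from them is unreliable until a check at the point of composition compares the two domains.

```python
# Each domain's own checks pass, but the composed margin is wrong because
# the two domains report in different currencies. Figures are invented.
revenue_domain = {"order_id": 1001, "revenue": 1200.0, "currency": "USD"}
cost_domain = {"order_id": 1001, "cost": 950.0, "currency": "EUR"}

# Per-domain checks: both pass in isolation.
assert revenue_domain["revenue"] > 0
assert cost_domain["cost"] > 0

# Cross-domain composition: numerically plausible, semantically wrong.
margin = (revenue_domain["revenue"] - cost_domain["cost"]) / revenue_domain["revenue"]
print(f"margin = {margin:.1%}")  # ~20.8%, but the units don't match

# A check that only exists at the point of composition catches it.
if revenue_domain["currency"] != cost_domain["currency"]:
    print("Composed margin is invalid: mixed currencies across domains.")
```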

To improve data quality, Stott recommends establishing rapid feedback loops. By doing so, businesses can identify issues at the point of delivery and provide structured feedback for swift resolution. Understanding the typical organizational structure concerning data responsibilities is also crucial. Drawing from experience, the Hasura tech leader divides data responsibility into three teams:

  1. Data domain owners: consisting of data scientists, database administrators, network architects, and software application developers who manage data models, define quality rules, and ensure data provisioning.
  2. Federated data teams: overseeing metadata standards, offering data discovery tools, and managing data provisioning.
  3. Cross-domain data teams: creating data products, building reports, developing applications and models, and facing unique challenges due to combining data across domains.

"Cross-domain data users often create the critical datasets that reach executive leadership and regulatory bodies," detailed Stott. "Their need to compose and validate data across domains demands a modern approach: real-time validation with standardized metadata-driven feedback, centralized rule standards to maintain flexibility for local needs, integrated observability across the data lifecycle, and self-service composition capabilities for cross-domain teams."

Success requires both organizational and technical foundations. The modern approach becomes a reality through lightweight additions to existing architectures: an extensible metadata-driven data access layer, automatable data quality rules, and data-quality-as-a-service to assess composited datasets at the point of use. These enhancements create feedback loops without disrupting established data flows or requiring massive reorganization.
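What automatable rules and data-quality-as-a-service might look like can also be sketched. The rule format and check_dataset helper below are illustrative assumptions rather than an actual Hasura interface: rules are declared as metadata and evaluated against the composited dataset at the point of use, producing structured results that both producers and consumers can read.

```python
# Hypothetical sketch: quality rules declared as metadata and evaluated
# against a composited dataset at the point of use. The rule schema and
# check_dataset helper are illustrative assumptions, not a real service API.
RULES = [
    {"id": "margin.not_null", "column": "margin", "check": "not_null"},
    {"id": "margin.in_range", "column": "margin", "check": "between", "args": [-1.0, 1.0]},
]

def check_dataset(rows, rules):
    """Return structured results the consuming and producing teams can share."""
    results = []
    for rule in rules:
        values = [row.get(rule["column"]) for row in rows]
        if rule["check"] == "not_null":
            passed = all(v is not None for v in values)
        elif rule["check"] == "between":
            lo, hi = rule["args"]
            passed = all(v is not None and lo <= v <= hi for v in values)
        else:
            passed = False  # unknown rule types fail loudly
        results.append({"rule": rule["id"], "passed": passed})
    return results

composited = [{"region": "EMEA", "margin": 0.21}, {"region": "APAC", "margin": None}]
for result in check_dataset(composited, RULES):
    print(result)
```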

"By adopting this approach," concluded Hasura's Stott, "organizations can incrementally evolve their practices. Teams maintain their current workflows while gaining new collaboration capabilities. Clear ownership and accountability of data naturally emerge as feedback loops connect data producers and consumers. This foundation also paves the way for advanced capabilities like anomaly detection and sophisticated data profiling."

In summary, by implementing automated feedback loops at the point of data delivery, organizations can reduce the time and effort needed to identify and resolve data quality issues while preserving existing investments in data architecture and team structures. High-quality data is essential for making informed decisions, achieving operational efficiency, gaining a competitive advantage, and maintaining reputation and trust. To sustain it, organizations should adopt data quality management practices, establish a data governance framework, prioritize continuous improvement, train employees, and use suitable technology and tools.

  1. To refine and maintain the quality of data, Ken Stott, Field CTO at Hasura, recommends adopting a modern approach that involves creating high-velocity feedback loops between data producers and consumers, enhancing existing architectures with automated feedback loops at the point of data delivery, and using advanced technology and tools for data quality management.
  2. Cross-domain data users, such as those who create critical datasets reaching executive leadership and regulatory bodies, demand a modern approach for validating data across domains, including real-time validation with standardized metadata-driven feedback, centralized rule standards to maintain flexibility for local needs, integrated observability across the data lifecycle, and self-service composition capabilities for cross-domain teams.
  3. Establishing rapid feedback loops and understanding the typical organizational structure concerning data responsibilities are essential to improving data quality. This structure comprises three teams: data domain owners, federated data teams, and cross-domain data teams, each with specific roles and responsibilities in managing, maintaining, and improving data quality.
