DataOps-driven CI/CD for analytics repositories
Dmytro Valiaiev
- 发表年份
- 2025
- 访问权限
- 开放获取
摘要
The proliferation of SQL for data processing has often occurred without the rigor of traditional software development, leading to siloed efforts, logic replication, and increased risk. This ad-hoc approach hampers data governance and makes validation nearly impossible. Organizations are adopting DataOps, a methodology combining Agile, Lean, and DevOps principles to address these challenges to treat analytics pipelines as production systems. However, a standardized framework for implementing DataOps is lacking. This perspective proposes a qualitative design for a DataOps-aligned validation framework. It introduces a DataOps Controls Scorecard, derived from a multivocal literature review, which distills key concepts into twelve testable controls. These controls are then mapped to a modular, extensible CI/CD pipeline framework designed to govern a single source of truth (SOT) SQL repository. The framework consists of five stages: Lint, Optimize, Parse, Validate, and Observe, each containing specific, automated checks. A Requirements Traceability Matrix (RTM) demonstrates how each high-level control is enforced by concrete pipeline checks, ensuring qualitative completeness. This approach provides a structured mechanism for enhancing data quality, governance, and collaboration, allowing teams to scale analytics development with transparency and control.
关键词
相关论文
Statistical Learning Theory
Yuhai Wu, Vladimir Vapnik
1999
Fractional Differential Equations
Igor Podlubný
2025
Applied Nonlinear Control
Jean-Jacques Slotine, Weiping Li
1991
Genetic Programming: On the Programming of Computers by Means of Natural Selection
John R. Koza
1992