Continuous Delivery for Machine Learning
3.1 Key Insight: Continuous Delivery for Machine Learning requires treating code, data, and models as three distinct but interconnected axes of change, each needing specialized versioning, testing, and deployment strategies while maintaining the core CD principle that software should always be in a releasable state.
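To make the three axes concrete, here is a minimal Python sketch. It assumes each release candidate is identified by a code commit plus content hashes of its training data and its serialized model. The `ReleaseManifest` record and the `build_manifest` and `content_hash` helpers are hypothetical illustrations. They are not part of DVC, MLflow, or any other tool named in the article. Content hashing is one way to version data and model artifacts, similar in spirit to how tools like DVC track files.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


def content_hash(data: bytes) -> str:
    """Return a SHA-256 hex digest that identifies these exact bytes."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class ReleaseManifest:
    """Hypothetical record pinning one version of each axis of change."""

    code_commit: str  # e.g. a git commit SHA (assumed to come from version control)
    data_hash: str    # content hash of the training data snapshot
    model_hash: str   # content hash of the serialized model artifact

    def to_json(self) -> str:
        # Sorted keys give a stable serialization for storage or comparison.
        return json.dumps(asdict(self), sort_keys=True)


def build_manifest(code_commit: str, training_data: bytes, model_bytes: bytes) -> ReleaseManifest:
    """Pin code, data, and model together so a candidate can be traced and reproduced."""
    return ReleaseManifest(
        code_commit=code_commit,
        data_hash=content_hash(training_data),
        model_hash=content_hash(model_bytes),
    )
```

Because each field changes independently, a new data snapshot with unchanged code yields a manifest that differs only in `data_hash`. This is the kind of traceability the three-axis view calls for.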
This article extends Continuous Delivery principles to Machine Learning systems, addressing the unique challenges of managing code, data, and models as three axes of change. The authors define CD4ML as a software engineering approach where cross-functional teams produce ML applications in small, safe increments that can be reproduced and reliably released at any time. Using a sales forecasting application as an example, they walk through technical components including data versioning, reproducible model training with DVC, model serving patterns, testing strategies, experiment tracking with MLflow, deployment orchestration with GoCD, and production monitoring. The article emphasizes that while ML model outputs may be non-deterministic, the release process itself should be reliable and automated, creating feedback loops for continuous improvement.
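The testing and deployment stages mentioned in the summary can be illustrated with an automated quality gate. The following minimal Python sketch makes two assumptions: a naive mean forecaster stands in for the real sales-forecasting model, and mean absolute error (MAE) on a holdout set is the acceptance metric. The names `run_pipeline` and `train_mean_model`, and the `MAX_MAE` threshold, are hypothetical. None of them is a GoCD, DVC, or MLflow API.

```python
from statistics import mean
from typing import Callable, Dict, Sequence

# Hypothetical acceptance threshold; a real team would set this from business needs.
MAX_MAE = 10.0


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between observed and forecast values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have equal length")
    return mean([abs(a - p) for a, p in zip(actual, predicted)])


def train_mean_model(train_sales: Sequence[float]) -> Callable[[int], float]:
    """Toy 'model': forecasts the training mean for every future day (a naive baseline)."""
    level = mean(train_sales)
    return lambda _day: level


def run_pipeline(
    train_sales: Sequence[float],
    holdout_sales: Sequence[float],
    max_mae: float = MAX_MAE,
) -> Dict[str, object]:
    """Train, evaluate on the holdout set, and decide whether the candidate may be promoted."""
    model = train_mean_model(train_sales)
    predictions = [model(day) for day in range(len(holdout_sales))]
    mae = mean_absolute_error(holdout_sales, predictions)
    # The gate: only candidates that meet the threshold are marked releasable.
    return {"mae": mae, "promote": mae <= max_mae}
```

In an orchestrated pipeline, a `promote` value of `False` would stop the candidate before the deployment stage. Only models that meet the agreed threshold would then become releasable.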
A common symptom is having models that only work in a lab environment and never leave the proof-of-concept phase. Or, if they do make it to production in a manual, ad-hoc way, they bec…
This makes the decision about when to release it a business decision rather than a technical one.
While the model outputs can be non-deterministic and hard to reproduce, the process of releasing ML software into production is reliable and reproducible, leveraging automation as …
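One common tactic for narrowing the gap between non-deterministic training and a reproducible process is to pass an explicit random seed into every stochastic step. This minimal Python sketch assumes a toy linear model fitted by stochastic gradient descent. `train_linear` and its parameters are hypothetical, and the technique is not claimed to come from the article.

```python
import random
from typing import List, Tuple


def train_linear(
    xs: List[float],
    ys: List[float],
    seed: int,
    epochs: int = 50,
    lr: float = 0.01,
) -> Tuple[float, float]:
    """Fit y ~ w*x + b with SGD; initialization and shuffling use a seeded RNG."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have equal length")
    rng = random.Random(seed)  # private generator, independent of global random state
    w, b = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # sample order is stochastic but repeatable for a given seed
        for i in indices:
            err = (w * xs[i] + b) - ys[i]
            w -= lr * err * xs[i]
            b -= lr * err
    return w, b
```

Seeding a private `random.Random` instance instead of the global generator keeps the run independent of other code. In real ML frameworks, seeding may not remove every source of nondeterminism (some parallel or GPU operations, for example). That is why recording the seed alongside code and data versions still matters.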