```mermaid
%%{init: {'theme': 'neutral', 'look': 'handDrawn', 'themeVariables': { 'fontFamily': 'monospace', "fontSize":"18px"}}}%%
sequenceDiagram
participant Dev as Dev
participant Git as Git
participant CI as CI/CD<br>Pipeline
participant Test as Test
participant Prod as Prod
Dev->>Dev: Write code & test locally
Dev->>Git: Commit & push changes
Git->>CI: Trigger automation
CI->>CI: Run tests & checks
CI->>Test: Deploy to Test environment
Test->>Test: Integration testing
Test->>Test: User acceptance testing
Note over Test,CI: Manual approval or<br/>automated promotion
Test->>Git: Tag release (Git Promote)
Git->>CI: Trigger production deployment
CI->>CI: Build & validate
CI->>Prod: Deploy to Production
Prod->>Prod: Monitor & validate
Note over Prod: Rollback<br>if needed
```
Lab 5: Deployments and Code Promotion
Comprehension questions
The questions below come from the Deployments and Code Promotion chapter.
Data Science Environments
Write down a mental map of the relationship between the three environments for data science. Include the following terms: Git Promote, CI/CD, automation, deployment, dev, test, prod.
The figures from the chapter do an excellent job of illustrating these environments, but I’ve added a sequence diagram:
Dev Environments
The development environment (Dev) is the primary location where data scientists write and test their code. It’s typically used for iteration and experimentation, and it’s connected to version control (Git).
Test Environments
The testing environment (Test) mirrors the production environment and uses realistic data (often anonymized). Code is integrated and validated against other systems here, and user acceptance testing (UAT) also happens in Test.
Production Environments
The ‘live’ environment that serves the end users is the production environment (Prod). Prod has strictly controlled access with a focus on stability and monitoring.
```mermaid
%%{init: {'theme': 'neutral', 'look': 'handDrawn', 'themeVariables': { 'fontFamily': 'monospace', "fontSize":"18px"}}}%%
flowchart TB
Dev[Development Environment]
Git[Git Repository]
CI[CI: Continuous Integration]
Test[Test Environment]
Promote[Git Promote]
CD[CD: Continuous Deployment]
Prod[Production Environment]
Dev -->|commit & push| Git
Git -->|webhook triggers| CI
CI -->|tests pass| Test
Test -->|validation complete| Promote
Promote -->|create tag v1.0| Git
Git -->|tag triggers| CD
CD -->|deploy| Prod
Prod -.->|rollback needed| Git
Git -.->|deploy previous tag| CD
style Dev fill:#e3f2fd,stroke:#1976d2,stroke-width:3px
style Test fill:#fff3e0,stroke:#f57c00,stroke-width:3px
style Prod fill:#e8f5e9,stroke:#388e3c,stroke-width:3px
style Git fill:#fce4ec,stroke:#c2185b,stroke-width:2px
style CI fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
style CD fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
style Promote fill:#fff9c4,stroke:#f9a825,stroke-width:3px
```
Git promotion is the process of tagging a commit as ready for production. The tag creates an immutable reference point and triggers the production deployment pipeline. The CI/CD pipeline is the automated process that builds, tests, and deploys the code.
This ensures consistency across environments and reduces human error in deployment. CI/CD automates testing at each stage, deployment between environments, rollback on failures, and monitoring and alerting.
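The stage-by-stage automation described above can be sketched in a few lines of Python. This is a minimal, hypothetical sketch, not how GitHub Actions or any other CI service works internally: the stage names, `run_pipeline`, and the `rollback` callback are made up for illustration. Stages run in order, the first failure stops the pipeline (so nothing downstream of a failing check gets deployed), and a failed deploy stage triggers the rollback hook.

```python
from typing import Callable

# A stage is a (name, action) pair; the action returns True on success.
# All stage names and actions here are hypothetical placeholders.
Stage = tuple[str, Callable[[], bool]]


def run_pipeline(stages: list[Stage], rollback: Callable[[], None]) -> list[str]:
    """Run stages in order, stop at the first failure, and roll back
    if the failing stage was a deploy. Returns a log of what happened."""
    log: list[str] = []
    for name, action in stages:
        ok = action()
        log.append(f"{name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            if name.startswith("deploy"):
                rollback()  # undo a half-finished deployment
                log.append("rollback: done")
            break  # later stages never run after a failure
    return log


if __name__ == "__main__":
    stages: list[Stage] = [
        ("unit-tests", lambda: True),
        ("lint", lambda: True),
        ("deploy-test", lambda: True),
        ("integration-tests", lambda: False),  # simulate a failing check
        ("deploy-prod", lambda: True),
    ]
    for line in run_pipeline(stages, rollback=lambda: None):
        print(line)
```

With the sample stages, the failing `integration-tests` check means `deploy-prod` never runs, which is exactly the guarantee a promotion pipeline is meant to provide.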
Git & Data Science
Why is Git so important to a good code promotion strategy?
There is a canonical DevOps rule of thumb:
If a change isn’t represented as a Git commit, it didn’t happen.
In mature CI/CD systems, Git defines the promotion workflow, not the pipeline. For example, consider the following Git branching model:
- feature: develop features and write unit tests
- test: write integration tests
- main: production deployment

Changes are committed to feature* branches, merged into test, and then merged into main (feature* → test → main).
```mermaid
%%{init: {'theme': 'neutral', 'look': 'handDrawn', 'themeVariables': { 'fontFamily': 'monospace', "fontSize":"18px"}}}%%
gitGraph
commit id: "Initial commit"
branch test
checkout test
commit id: "Set up tests"
branch feature
checkout feature
commit id: "Develop feature"
commit id: "Unit tests"
checkout test
merge feature id: "Merge feature"
commit id: "Integration tests"
checkout main
merge test id: "Release"
```
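The feature* → test → main model can also be stated as a merge policy: a branch may only merge one step forward. The Python sketch below is hypothetical. The function names, and the rule that any branch whose name starts with `feature` counts as a feature branch, are assumptions, not a Git or GitHub feature. In practice a check like this might run as a required status check on pull requests, with the Git host's branch protection handling approvals.

```python
# Promotion order for the branch model above (an illustrative assumption).
PROMOTION_ORDER = ["feature", "test", "main"]


def branch_stage(branch: str) -> int:
    """Map a branch name to its position in the promotion order."""
    if branch.startswith("feature"):  # feature, feature-login, feature/login, ...
        return 0
    if branch in PROMOTION_ORDER:
        return PROMOTION_ORDER.index(branch)
    raise ValueError(f"unknown branch: {branch}")


def is_allowed_merge(source: str, target: str) -> bool:
    """Allow a merge only one step forward: feature* -> test -> main."""
    return branch_stage(target) == branch_stage(source) + 1


if __name__ == "__main__":
    print(is_allowed_merge("feature/login", "test"))  # one step forward
    print(is_allowed_merge("feature/login", "main"))  # skips test, rejected
```

Skipping a stage (feature straight to main) or merging backwards (main into test) is rejected, so the promotion path lives in Git policy rather than in pipeline scripts.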
The pipeline simply reacts to Git events. The code promotion logic lives in branches, tags, and merge policies, not in brittle pipeline scripts.
A core CI/CD principle:
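Because the pipeline only reacts to Git events, its trigger logic can be as small as a lookup from the pushed ref to an action. Git does name refs `refs/heads/<branch>` and `refs/tags/<tag>`; everything else here is an assumed example rather than a real CI configuration: the action names, the `vMAJOR.MINOR.PATCH` tag pattern, and the choice to deploy to prod only from release tags (as in the tag-triggered flowchart earlier).

```python
import re

# Hypothetical release-tag format, e.g. v1.4.3.
RELEASE_TAG = re.compile(r"^v\d+\.\d+\.\d+$")


def route_git_event(ref: str) -> str:
    """Decide what the pipeline should do for a pushed Git ref (illustrative mapping)."""
    if ref.startswith("refs/tags/"):
        tag = ref.removeprefix("refs/tags/")
        # Only well-formed release tags reach production.
        return "deploy-prod" if RELEASE_TAG.match(tag) else "ignore"
    if ref.startswith("refs/heads/"):
        branch = ref.removeprefix("refs/heads/")
        if branch == "test":
            return "deploy-test"
        if branch == "main" or branch.startswith("feature"):
            return "run-tests"
    return "ignore"
```

All of the promotion decisions come from Git state (which branch or tag moved); the function itself holds no environment-specific logic.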
Build once, promote many times.
In this model, all commits are signed (so we know who made each change), and the main branch is protected (merging requires approval). Artifact immutability means each build is tied to a specific commit SHA or release tag (like v1.4.3).
The same code is promoted across environments, so we can be sure untested code never reaches production.
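"Build once, promote many times" can be sketched as a registry that builds each commit exactly once and then only moves environment pointers to that same artifact. This is a hypothetical in-memory stand-in for a real artifact store; `Registry`, `Artifact`, and the sample SHA are illustrative names. The SHA-256 content digest (from Python's `hashlib`) shows that test and prod receive byte-identical builds.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Artifact:
    commit_sha: str  # the Git commit the artifact was built from
    digest: str      # sha256 of the built bytes; fixed at build time


@dataclass
class Registry:
    """Hypothetical artifact registry: build once, then only move pointers."""
    artifacts: dict[str, Artifact] = field(default_factory=dict)
    deployed: dict[str, str] = field(default_factory=dict)  # env -> commit SHA

    def build(self, commit_sha: str, content: bytes) -> Artifact:
        if commit_sha in self.artifacts:
            raise ValueError(f"{commit_sha} was already built; promote it instead")
        art = Artifact(commit_sha, hashlib.sha256(content).hexdigest())
        self.artifacts[commit_sha] = art
        return art

    def promote(self, commit_sha: str, env: str) -> Artifact:
        # Promotion never rebuilds: the environment points at the existing artifact.
        art = self.artifacts[commit_sha]
        self.deployed[env] = commit_sha
        return art
```

Calling `build` a second time for the same SHA raises an error, which is the point: promoting to test and then prod should never trigger a rebuild.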
Without Git, the code would need to be rebuilt in each environment. The tested code wouldn’t necessarily be the deployed code, and environment drift would be inevitable.
Git is the gatekeeper for CI/CD policies and integrates directly with enforcement mechanisms such as protected branches and required reviews.
If it’s not enforced in Git, it will be bypassed.
Git enables environment parity. In DevOps, environment drift is the enemy: dev, test, and prod should differ only by data and credentials, and changes are promoted, not re-created.
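The parity rule (environments differ only by data and credentials) is easy to check mechanically. The sketch below assumes each environment's settings are available as a plain dictionary; the key names (`database_url`, `api_token`, `pandas_version`, ...) are hypothetical. Any other key whose value differs between environments, or is missing from one of them, is reported as drift.

```python
# Keys that are allowed to differ between environments (hypothetical names).
ALLOWED_TO_DIFFER = {"database_url", "api_token"}


def parity_violations(envs: dict[str, dict[str, str]]) -> list[str]:
    """Return the non-data, non-credential keys whose values differ across environments."""
    all_keys = set().union(*(cfg.keys() for cfg in envs.values()))
    violations = []
    for key in sorted(all_keys - ALLOWED_TO_DIFFER):
        values = {cfg.get(key) for cfg in envs.values()}  # missing key -> None
        if len(values) > 1:
            violations.append(key)
    return violations
```

For example, three environments that agree on everything except `database_url`, `api_token`, and prod's `pandas_version` would produce `["pandas_version"]`.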
From a DevOps audit perspective, Git provides the history of code changes and approvals, pipeline results are tied to individual commits, and every release is tagged. This makes it easy to answer the inevitable question:
“Which commit introduced this behavior in production?”
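Git's built-in answer to this question is `git bisect`, which binary-searches history for the first commit where a check starts failing. The Python sketch below mirrors that idea over a plain list rather than calling Git; the commit names and the `is_bad` check are placeholders, and it assumes linear history with a single good-to-bad transition.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary search for the first bad commit, in the spirit of `git bisect`.

    Assumes commits are ordered oldest -> newest, the oldest is good,
    the newest is bad, and once a commit is bad every later one is bad too.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the first bad commit is at or before mid
        else:
            lo = mid  # the first bad commit is after mid
    return commits[hi]
```

With eight commits, the search needs about three checks instead of testing every commit, which matters when each check means rebuilding and re-running a model.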
When production inevitably breaks, we can use Git to roll back by redeploying previous commits/tags. This avoids making ad-hoc changes in a production server.
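Picking the rollback target means finding the newest release tag older than the one currently in production. A plain string sort gets this wrong (`"v1.10.0"` sorts before `"v1.9.0"`), so the hypothetical sketch below parses `vMAJOR.MINOR.PATCH` tags into integer tuples first. The tag format is an assumption based on the v1.4.3 example above; tags that don't match it (like `nightly`) are ignored.

```python
import re

# Assumed release-tag format: v<major>.<minor>.<patch>
SEMVER_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def parse_tag(tag: str) -> tuple[int, int, int]:
    """Turn 'v1.4.3' into (1, 4, 3) so tags compare numerically."""
    m = SEMVER_TAG.match(tag)
    if m is None:
        raise ValueError(f"not a release tag: {tag}")
    major, minor, patch = (int(part) for part in m.groups())
    return major, minor, patch


def rollback_target(tags: list[str], current: str) -> str:
    """Return the newest release tag that is older than `current`."""
    releases = sorted((t for t in tags if SEMVER_TAG.match(t)), key=parse_tag)
    older = [t for t in releases if parse_tag(t) < parse_tag(current)]
    if not older:
        raise ValueError(f"no release older than {current} to roll back to")
    return older[-1]
```

Because the target is an existing tag, the rollback redeploys an artifact that was already tested, instead of patching the production server by hand.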
Because Git promotion mechanics are standardized, pipelines become reusable templates. This turns CI/CD into a system, not a script.
```mermaid
%%{init: {'theme': 'neutral', 'look': 'handDrawn', 'themeVariables': { 'fontFamily': 'monospace', "fontSize":"18px"}}}%%
flowchart TB
Feat("<code>feature/*</code>") -->|<em>Pull request</em>| Dev("<code>develop</code>")
Dev -->|<em>Release pull request</em>| Main("<code>main</code>")
Main -->|<em>Deploy</em>| Prod["<strong>production</strong>"]
Main -->|<em>Promote</em>| St("<code>staging</code>")
St -->|<em>Approve</em>| Prod
```
Git is like a state machine, controlling the flow of changes.
With Git-centered promotion:
- Git = intent
- Pipeline = execution
- Artifact = immutable
- Promotion = merge or tag
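The four bullets above can be combined into one small state machine built from the `feature/*` → `develop` → `main` flowchart: a Git action (pull request, release pull request, promote, approve, deploy) is the only way to move a release forward, and the commit SHA it carries never changes. This is an illustrative Python sketch; `Release`, `advance`, and the transition table are hypothetical names, and the action labels are simply copied from the flowchart edges.

```python
from dataclasses import dataclass, replace

# (current state, Git action) -> next state, taken from the flowchart edges.
TRANSITIONS = {
    ("feature", "pull request"): "develop",
    ("develop", "release pull request"): "main",
    ("main", "deploy"): "production",
    ("main", "promote"): "staging",
    ("staging", "approve"): "production",
}


@dataclass(frozen=True)
class Release:
    commit_sha: str          # immutable: the same commit moves through every state
    state: str = "feature"


def advance(release: Release, action: str) -> Release:
    """Apply one Git action (the intent); reject anything the table does not allow."""
    key = (release.state, action)
    if key not in TRANSITIONS:
        raise ValueError(f"'{action}' is not allowed from '{release.state}'")
    # Return a new Release with the same SHA; only the state changes.
    return replace(release, state=TRANSITIONS[key])
```

Trying to `approve` a release that is still on a feature branch raises an error, which is the "Git = intent, pipeline = execution" split in miniature.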
Can you have a code promotion strategy without Git?
Git & CI/CD
What is the relationship between Git and CI/CD?
What’s the benefit of using Git and CI/CD together?
Git, GitHub, CI/CD, Actions, & Version Control
Write out a mental map of the relationship of the following terms: Git, GitHub, CI/CD, GitHub Actions, Version Control.