%%{init: {'theme': 'neutral', 'look': 'handDrawn', 'themeVariables': { 'fontFamily': 'monospace', "fontSize":"18px"}}}%%
flowchart TD
Git["Git"]
Git --> Track["<strong>Traceability</strong>:<br>Every change is logged with author, timestamp,<br>and purpose"]
Git --> Branch["<strong>Isolation</strong>:<br>Branches let developers experiment without<br>disturbing stable code"]
Git --> Gate["<strong>Gating</strong>:<br>Pull requests act as enforced review checkpoints<br>between environments"]
Track --> Audit["Supports auditing<br>& rollback"]
Branch --> Parallel["Enables parallel<br>development"]
Gate --> Quality["Maintains code<br>quality standards"]
Lab 5: Deployments and Code Promotion
Comprehension questions
The questions below come from the Deployments and Code Promotion chapter.
Data Science Environments
Write down a mental map of the relationship between the three environments for data science. Include the following terms: Git Promote, CI/CD, automation, deployment, dev, test, prod.
The three environments for data science are development, test, and production (not to be confused with the three layers of data science environments¹).
DEV Environments
The development environment (dev) is the primary location where data scientists write new code. It’s typically used for iteration and experimentation, and it’s connected to version control (Git).
Test Environments
The testing environment (test) is a mirror of the production environment with realistic data (often anonymized). Test environments are integrated with other systems and validated against them, and user acceptance testing occurs here.
Production Environments
The ‘live’ environment that serves the end users is the production environment (prod). Prod has strictly controlled access with a focus on stability and monitoring.
The figures from the chapter do an excellent job illustrating these environments, but I’ve added a sequence diagram:
%%{init: {'theme': 'neutral', 'look': 'handDrawn', 'themeVariables': { 'fontFamily': 'monospace', "fontSize":"18px"}}}%%
sequenceDiagram
participant Dev as Dev
participant Git as Git
participant CI as CI/CD<br>Pipeline
participant Test as Test
participant Prod as Prod
Dev->>Dev: Write code & test locally
Dev->>Git: Commit & push changes
Git->>CI: Trigger automation
CI->>CI: Run tests & checks
CI->>Test: Deploy to Test environment
Test->>Test: Integration testing
Test->>Test: User acceptance testing
Note over Test,CI: Manual approval or<br/>automated promotion
Test->>Git: Tag release (Git Promote)
Git->>CI: Trigger production deployment
CI->>CI: Build & validate
CI->>Prod: Deploy to Production
Prod->>Prod: Monitor & validate
Note over Prod: Rollback<br>if needed
%%{init: {'theme': 'neutral', 'look': 'handDrawn', 'themeVariables': { 'fontFamily': 'monospace', "fontSize":"18px"}}}%%
flowchart TD
subgraph DevEnv["🔧 Development"]
DevCode("Code Writing<br>& Iteration")
Git("Version Control (Git)")
DevCode <--> Git
end
subgraph TestEnv["🧪 Testing"]
AnonData("Realistic<br>(Anonymized) Data")
SysVal("System <br>Validation")
UAT("User Acceptance<br>Testing (UAT)")
AnonData --> SysVal --> UAT
end
subgraph ProdEnv["🚀 Production"]
LiveApp("Live Application")
Monitor("Monitoring & Stability")
EndUser("End Users")
LiveApp --> Monitor
LiveApp --> EndUser
end
Git -->|"promote"| TestEnv
UAT -->|"approve & promote"| ProdEnv
style DevEnv fill:#e3f2fd,stroke:#1976d2,stroke-width:3px
style TestEnv fill:#fff3e0,stroke:#f57c00,stroke-width:3px
style ProdEnv fill:#e8f5e9,stroke:#388e3c,stroke-width:3px
Git promotion is the process of tagging a commit as ready for production. The tag creates an immutable reference point and triggers the production deployment pipeline. A CI/CD pipeline is an automated process that builds, tests, and deploys the code.
Code promotion ensures consistency across all three environments and reduces human error in deployment.
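As a rough sketch of the tagging idea, the Python below models a release tag as a write-once pointer to a commit SHA and checks whether a pushed ref should start a production deployment. The `v1.2.3` tag pattern, the `TagStore` class, and the function names are illustrative assumptions, not something the chapter prescribes.

```python
import re

# Assumed convention: release tags look like v<major>.<minor>.<patch>.
RELEASE_TAG = re.compile(r"^v\d+\.\d+\.\d+$")


class TagStore:
    """Hypothetical in-memory stand-in for a repository's tags."""

    def __init__(self):
        self._tags = {}

    def promote(self, tag, commit_sha):
        """Tag a commit as ready for production; an existing tag is never moved."""
        if not RELEASE_TAG.match(tag):
            raise ValueError(f"not a release tag: {tag}")
        if tag in self._tags:
            raise ValueError(f"{tag} already points at {self._tags[tag]}")
        self._tags[tag] = commit_sha
        return commit_sha

    def resolve(self, tag):
        """Return the commit SHA a tag points at."""
        return self._tags[tag]


def triggers_production_deploy(ref):
    """Return True when a pushed ref is a release tag."""
    prefix = "refs/tags/"
    return ref.startswith(prefix) and bool(RELEASE_TAG.match(ref[len(prefix):]))
```

Because `promote` refuses to overwrite a tag, whatever the tag resolves to at approval time is what the production pipeline deploys later.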
Git & Data Science
Why is Git so important to a good code promotion strategy?
There is a canonical DevOps rule of thumb,
If a change isn’t represented as a Git commit, it didn’t happen.
In mature CI-CD systems, Git, not the pipeline, defines the promotion workflow. For example, consider the following Git branching model:
feature: new capability (e.g., new model, new preprocessing step, new data connector). Branch off develop or main, open a pull request early, iterate.
develop: integration branch for ongoing work. feature branches merge here first; periodically promoted to main (release).
staging: mirrors production (main), but for final validation. main → staging (or vice versa) depending on release process. Used for end-to-end pipeline runs on near-prod data/infra.
main: production-ready code and (often) reproducible pipeline definitions. Only merge via pull request; every commit should pass tests, lint, data checks. Often corresponds to what’s deployed (or what can be deployed).
Changes are committed to feature/*, merged to develop, and then merged to main:
%%{init: {'theme': 'neutral', 'look': 'handDrawn', 'themeVariables': { 'fontFamily': 'monospace', "fontSize":"18px"}}}%%
flowchart TB
F("<code>feature/*</code>") -->|"pull request"| D("<code>develop</code>")
D -->|release pull request| M("<code>main</code>")
M -->|deploy| P["<strong>production</strong>"]
M -->|promote| S("<code>staging</code>")
S -->|approve| P
The pipeline simply reacts to Git events. The code promotion logic lives in branches, tags, and merge policies, not in brittle pipeline scripts.
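One way to picture this: the promotion rules are plain data about branches, and the pipeline only looks them up. The sketch below assumes the branching model described above and uses hypothetical names (`ALLOWED_MERGES`, `DEPLOY_TARGETS`, `can_merge`, `on_push`). A real setup would express the same rules as branch protection settings and pipeline triggers.

```python
from fnmatch import fnmatch

# Assumed merge policy: source branch pattern -> branch it may be merged into.
ALLOWED_MERGES = {
    "feature/*": "develop",
    "develop": "main",
    "main": "staging",
}

# Assumed mapping from a branch to the environment a push to it deploys to.
DEPLOY_TARGETS = {"develop": "dev", "staging": "test", "main": "prod"}


def can_merge(source, target):
    """Return True if the policy allows merging source into target."""
    return any(
        fnmatch(source, pattern) and target == allowed
        for pattern, allowed in ALLOWED_MERGES.items()
    )


def on_push(branch):
    """Return the environment to deploy to, or None if no deploy is triggered."""
    return DEPLOY_TARGETS.get(branch)
```

Changing how code is promoted then means editing the policy tables, while `on_push` stays the same.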
A core CI-CD principle:
Build once, promote many times.
All commits are signed, so we know who made the changes, and the main branch requires approval to merge (i.e., it’s protected). Artifact immutability comes from tying each build to a specific commit SHA or to a tagged release (like v1.4.1).
%%{init: {'theme': 'neutral', 'look': 'handDrawn', 'themeVariables': { 'fontFamily': 'monospace', "fontSize":"18px"}}}%%
gitGraph
commit tag: "Initial commit" type: HIGHLIGHT
branch test
checkout test
commit tag: "test-setup"
branch feature
checkout feature
commit tag: "develop-feature"
commit
commit
commit tag: "test-feature"
checkout test
merge feature tag: "merge"
commit
commit tag: "integration-tests"
checkout main
merge test id: "v1.4.1" tag: "Release" type: REVERSE
The same code is promoted across environments so we can be sure untested code never reaches production.
Without Git, the code would need to be rebuilt in each environment. The tested code wouldn’t necessarily be the deployed code, and environment drift would be inevitable.
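To make "build once, promote many times" concrete, here is a minimal sketch in which an artifact is built a single time from a commit and every later environment receives that same record. The `build` and `promote` functions and the dictionary layout are assumptions made for illustration.

```python
import hashlib


def build(commit_sha, source_bytes):
    """Build once: produce an artifact record tied to a commit SHA."""
    return {
        "commit": commit_sha,
        "digest": hashlib.sha256(source_bytes).hexdigest(),
    }


def promote(environments, artifact, source_env, target_env):
    """Promote: point target_env at the artifact already running in source_env."""
    if environments.get(source_env) != artifact:
        raise ValueError(f"artifact was never deployed to {source_env}")
    environments[target_env] = artifact
    return environments


# Usage: the artifact is built in dev, then promoted unchanged to test and prod.
envs = {}
artifact = build("abc123", b"model training code")
envs["dev"] = artifact
promote(envs, artifact, "dev", "test")
promote(envs, artifact, "test", "prod")
```

Since `promote` never calls `build`, the digest in prod is guaranteed to match the one that was tested.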
Git is the gatekeeper for CI-CD policies and integrates directly with enforcement mechanisms.
If it’s not enforced in Git, it will be bypassed.
Git enables environment parity. In DevOps, environment drift is the enemy. Dev, test, and prod should differ only in data and credentials, and changes are promoted, not re-created.
From a DevOps audit perspective, Git provides the history of code changes and approvals, along with the pipeline results tied to commits. Each release is also tagged. This makes it easy to answer the inevitable question,
“Which commit introduced this behavior in production?”
When production inevitably breaks, we can use Git to roll back by redeploying a previous commit or tag. This avoids making ad-hoc changes on a production server.
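A rollback of this kind can be sketched as redeploying the release tag that was live before the current one. The `Environment` class below is a hypothetical in-memory model, not a real deployment API, and the tag names are examples.

```python
class Environment:
    """Hypothetical record of which release tags were deployed, oldest first."""

    def __init__(self, name):
        self.name = name
        self.history = []

    def deploy(self, tag):
        """Record a deployment of the given release tag."""
        self.history.append(tag)

    @property
    def current(self):
        """The tag that is live right now, or None if nothing was deployed."""
        return self.history[-1] if self.history else None

    def rollback(self):
        """Redeploy the release that was live before the current one."""
        if len(self.history) < 2:
            raise RuntimeError("no earlier release to roll back to")
        previous = self.history[-2]
        self.deploy(previous)  # a rollback is just another deploy, so it is logged too
        return previous


prod = Environment("prod")
prod.deploy("v1.4.0")
prod.deploy("v1.4.1")
rolled_back_to = prod.rollback()
```

Treating the rollback as an ordinary deployment keeps the audit trail intact: the history shows both the bad release and the return to the earlier one.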
Git promotion mechanics are standardized, so pipelines become reusable templates. This turns CI-CD into a system, not a script.
Git is like a state machine, controlling the flow of changes.
With Git-centered promotion:
- Git = intent
- Pipeline = execution
- Artifact = immutable
- Promotion = merge or tag
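The state machine comparison can be written down directly. In this sketch the stages and the two Git events (`"merge"` and `"tag"`) are assumed for illustration; the point is only that a change moves forward when a recognized Git event occurs and is rejected otherwise.

```python
from enum import Enum


class Stage(Enum):
    DEV = "dev"
    TEST = "test"
    PROD = "prod"


# Assumed transitions: (current stage, Git event) -> next stage.
TRANSITIONS = {
    (Stage.DEV, "merge"): Stage.TEST,
    (Stage.TEST, "tag"): Stage.PROD,
}


def advance(stage, event):
    """Return the next stage for a Git event, or raise if the move is not allowed."""
    try:
        return TRANSITIONS[(stage, event)]
    except KeyError:
        raise ValueError(f"{event!r} does not promote from {stage.value}") from None
```

There is no transition from `Stage.DEV` straight to `Stage.PROD`, so skipping the test environment is not something the pipeline can be asked to do.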
Can you have a code promotion strategy without Git?
Yes, you can do code promotion without Git, as long as you still have three things:
- Some way of creating ‘versioned artifacts’ (i.e., something immutable that can be promoted).
- Gates (required tests and approvals) that must be passed before promotion.
- Some kind of promotion mechanism to copy/move/retag the versioned artifact between environments.
Without Git you still need substitutes for:
- Diff / review: code review tooling in your platform, or document-style review on “release bundles”
- Rollback: keep old artifacts and switch the environment pointer back
- Traceability: manifests + checksums + approvals + build logs
A good minimum is: artifact hash + signed manifest + CI logs + approval record.
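The minimum above can be sketched with the Python standard library. The field names, the `make_manifest` and `verify` helpers, and the use of an HMAC with a shared key as the "signature" are all assumptions made to keep the example short; a real system would more likely use asymmetric signing.

```python
import hashlib
import hmac
import json


def make_manifest(artifact_bytes, version, approver, ci_log_id, signing_key):
    """Create a signed manifest describing one immutable artifact."""
    body = {
        "version": version,
        "sha256": hashlib.sha256(artifact_bytes).hexdigest(),
        "approved_by": approver,
        "ci_log": ci_log_id,
    }
    # Serialize with sorted keys so the signed bytes are reproducible.
    payload = json.dumps(body, sort_keys=True).encode()
    signature = hmac.new(signing_key, payload, hashlib.sha256).hexdigest()
    return {"body": body, "signature": signature}


def verify(manifest, artifact_bytes, signing_key):
    """Check the signature and that the artifact matches the recorded hash."""
    payload = json.dumps(manifest["body"], sort_keys=True).encode()
    expected = hmac.new(signing_key, payload, hashlib.sha256).hexdigest()
    return (
        hmac.compare_digest(expected, manifest["signature"])
        and hashlib.sha256(artifact_bytes).hexdigest() == manifest["body"]["sha256"]
    )
```

A promotion step would call `verify` before switching an environment's pointer to the artifact, and rollback would switch the pointer back to an older verified manifest.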
Git & CI/CD
What is the relationship between Git and CI/CD?
What’s the benefit of using Git and CI/CD together?
Git, GitHub, CI/CD, Actions, & Version Control
Write out a mental map of the relationship of the following terms: Git, GitHub, CI/CD, GitHub Actions, Version Control.
¹ These three layers are Packages (Python + R Packages), System (Python + R Language Versions, Other System Libraries, and Operating System), and Hardware (Virtual Hardware and Physical Hardware).