%%{init: {'look': 'handDrawn', 'themeVariables': { 'fontFamily': 'monospace', "fontSize":"18px"}}}%%
mindmap
root((Code<br>Promotion))
Dev
Code Writing
Experimentation
Git Promote
Push to feature<br>branch
Pull request<br>triggers CI/CD
Test
Automation
Unit tests
Integration<br>tests
UAT
CI/CD
Validates<br>build
Runs test<br>suite
Prod
Deployment
Automated<br>release
Version<br>tagged
Monitoring
Stability<br>checks
Access control
Lab 5: Deployments and Code Promotion
Comprehension questions
The questions below come from the Deployments and Code Promotion chapter.
Data Science Environments
Write down a mental map of the relationship between the three environments for data science. Include the following terms: Git Promote, CI/CD, automation, deployment, dev, test, prod.
The three environments for data science are: development, test, and production (not to be confused with the three layers of data science environments[1]).
The figures from the chapter do an excellent job illustrating these environments, but I’ve added a mindmap:
Development Environments
The development environment (dev) is the primary location where data scientists write new code. It’s typically used for iteration and experimentation, and it’s connected to version control (Git).
Test Environments
The testing environment (test) is a mirror of the production environment with realistic data (often anonymized). Test environments are designed to check integration and validation with other systems. The test environment also typically includes user acceptance testing and/or system tests.
Production Environments
The ‘live’ environment that serves the end users is the production environment (prod). The production branch has strictly controlled access with a focus on stability and monitoring.
Git & Data Science
Why is Git so important to a good code promotion strategy?
Code promotion is the process of tagging Git commits as ‘ready for production.’ This creates a reference point, which can trigger tests (i.e., in the test branch) or a production deployment pipeline (in the prod branch). With version control, if changes aren’t represented as Git commits, they literally didn’t happen. Together, Git and code promotion ensure consistency across all three environments (dev, test, and prod) and reduce human error in deployment.
Artifact immutability
With Git, every change is identified by a specific commit SHA or a tagged release. Before any changes can be merged into a prod branch, approval is required (i.e., the branch is protected). This artifact immutability links ‘builds’ as they are promoted across environments, so we can be sure untested or unreviewed code never reaches production.
Environment drift
Environment drift is when the production code differs from the tested code. Without a code promotion strategy and version control, the code would need to be rebuilt in each environment, so the tested code wouldn’t necessarily be the code deployed into production. In DevOps, environment drift is the enemy because it can lead to unpredictable behavior and makes debugging difficult.
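One way to make drift concrete is to compare content hashes of what each environment actually runs. The sketch below is illustrative only: the function names `artifact_digest` and `drifted_environments`, and the choice of the test environment as the reference, are assumptions made for this example rather than anything prescribed by the chapter.

```python
import hashlib


def artifact_digest(content: bytes) -> str:
    """Identify a build artifact by the SHA-256 hash of its bytes."""
    return hashlib.sha256(content).hexdigest()


def drifted_environments(artifacts: dict, reference: str = "test") -> list:
    """Return the environments whose artifact differs from the reference one.

    `artifacts` maps an environment name to the raw bytes deployed there
    (a hypothetical stand-in for a real build output).
    """
    expected = artifact_digest(artifacts[reference])
    return sorted(
        env
        for env, content in artifacts.items()
        if artifact_digest(content) != expected
    )


if __name__ == "__main__":
    deployed = {
        "dev": b"model-v3",
        "test": b"model-v3",
        "prod": b"model-v3 plus an ad-hoc hotfix",  # rebuilt by hand, so it drifted
    }
    print(drifted_environments(deployed))
```

If the same artifact is promoted rather than rebuilt, every environment reports the same digest and the returned list is empty.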
Environment parity
Git enables environment parity. When used properly, our three environments (dev, test, and prod) differ only by data and credentials. Changes are promoted, not re-created. When production code inevitably breaks, developers can use Git to roll back and redeploy previous versions (via commit SHAs or tagged releases). This avoids making ad-hoc changes in a prod environment.
Can you have a code promotion strategy without Git?
Sure. Code promotion can be done without Git, but we’d need to find a way to provide:
- Some kind of ‘immutable versioned artifact’ (i.e., some way of tracking ‘who did what’, and preferably ‘when’)
- Blocks/gates requiring tests/approvals that must be passed before code is promoted
- A promotion mechanism to copy/move/retag each immutable versioned artifact between environments
We’d also need substitutes for:
- A way to view/review ‘diffs’, through either code review or document-style review
- The ability to keep previous artifacts and switch between them in each environment
- Some means of traceability that connects the artifacts, manifests, approvals, and build logs
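To get a feel for how much Git provides for free, here is a minimal in-memory sketch of a few of these pieces: immutable artifacts, gates, promotion, rollback, and a traceability log. Everything in it is hypothetical (the `Registry` class, the gate names in `REQUIRED_GATES`, the log format), and it does not attempt the ‘diff’ review requirement at all.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical gates per environment; a real team would define its own.
REQUIRED_GATES = {"test": {"unit-tests"}, "prod": {"unit-tests", "uat", "review"}}


@dataclass(frozen=True)
class Artifact:
    """Immutable record of who built what, and when."""
    digest: str
    author: str
    created_at: str


@dataclass
class Registry:
    artifacts: dict = field(default_factory=dict)  # digest -> Artifact
    approvals: dict = field(default_factory=dict)  # digest -> set of passed gates
    history: dict = field(default_factory=dict)    # environment -> list of digests
    log: list = field(default_factory=list)        # traceability trail

    def publish(self, content: bytes, author: str) -> str:
        """Store an artifact under its content hash and return that hash."""
        digest = hashlib.sha256(content).hexdigest()
        stamp = datetime.now(timezone.utc).isoformat()
        self.artifacts.setdefault(digest, Artifact(digest, author, stamp))
        self.approvals.setdefault(digest, set())
        self.log.append(f"publish {digest[:8]} by {author}")
        return digest

    def approve(self, digest: str, gate: str) -> None:
        """Record that an artifact passed a named gate."""
        self.approvals[digest].add(gate)
        self.log.append(f"approve {digest[:8]} gate={gate}")

    def promote(self, digest: str, env: str) -> None:
        """Point an environment at an artifact, but only if all gates passed."""
        missing = REQUIRED_GATES.get(env, set()) - self.approvals[digest]
        if missing:
            raise PermissionError(f"cannot promote to {env}; missing {sorted(missing)}")
        self.history.setdefault(env, []).append(digest)
        self.log.append(f"promote {digest[:8]} -> {env}")

    def rollback(self, env: str) -> str:
        """Switch an environment back to the artifact it ran previously."""
        if len(self.history.get(env, [])) < 2:
            raise RuntimeError(f"no earlier artifact to roll back to in {env}")
        self.history[env].pop()
        previous = self.history[env][-1]
        self.log.append(f"rollback {env} -> {previous[:8]}")
        return previous
```

Promotion here only moves a pointer to an existing digest; nothing is rebuilt, which is the property the list above is really asking for.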
Git & CI/CD
What is the relationship between Git and CI/CD?
A CI/CD pipeline is an automated process that builds, tests, and deploys code. In CI/CD systems, Git ‘events’ (not the pipeline) define the code promotion workflow; the pipeline simply reacts to those events. The code promotion logic lives in the branches, tags, and merge policies. This enables a “build once, promote many times” strategy, where the same tested artifact is promoted across environments.
Git provides the code change/approval history, and the pipeline results are tied to commits. Each release is also tagged, making it easy to trace which commits and changes a release introduced. This traceability is crucial for debugging and accountability. Git serves as the gatekeeper for CI/CD policies and integrates directly with enforcement mechanisms.
Using Git turns CI/CD into a system, not a script.
%%{init: {'theme': 'neutral', 'look': 'handDrawn', 'themeVariables': { 'fontFamily': 'monospace', "fontSize":"18px"}}}%%
flowchart LR
GitEv(["Git Event<br>(push, PR, merge)"]) -->|"triggers"| CICD["CI/CD Pipeline"]
CICD --> CI("Continuous<br>Integration (CI)<br><em>Build & Test</em>")
CICD --> CD("Continuous<br>Delivery (CD)<br><em>Deploy</em>")
style CI fill:#F9E79F,stroke:#D4AC0D,color:#000
style CD fill:#D4E6F1,stroke:#2E86C1,color:#000
style GitEv fill:#D5F5E3,stroke:#1E8449,color:#000
style CICD fill:#FADBD8,stroke:#C0392B,color:#000
Think of Git as the source of truth and CI/CD as an automated response to changes in that truth. Git events — pushes, pull requests, merges — can fire a CI/CD pipeline.
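The “pipeline reacts to Git events” idea can be sketched as a small lookup from (event, branch) to the steps that should run. This is a toy model, not how any particular CI/CD product is configured: the rule table `PIPELINE_RULES`, the step functions, and `handle_event` are all names invented for illustration.

```python
def build_and_test(commit_sha: str) -> str:
    """Stand-in for the CI half: build the code and run the test suite."""
    return f"CI: built and tested {commit_sha}"


def deploy(commit_sha: str) -> str:
    """Stand-in for the CD half: release the already-tested build."""
    return f"CD: deployed {commit_sha}"


# Hypothetical policy: which Git event on which branch fires which steps.
PIPELINE_RULES = {
    ("pull_request", "main"): [build_and_test],
    ("push", "main"): [build_and_test, deploy],
}


def handle_event(event_type: str, branch: str, commit_sha: str) -> list:
    """Run every step registered for this event; unknown events run nothing."""
    steps = PIPELINE_RULES.get((event_type, branch), [])
    return [step(commit_sha) for step in steps]


if __name__ == "__main__":
    for line in handle_event("push", "main", "abc123"):
        print(line)
```

Note that every result is keyed by the commit SHA, which is what ties pipeline output back to the Git history.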
Key distinctions between Git and CI/CD:
| | Git | CI/CD |
|---|---|---|
| Role | Tracks & versions code | Automates build, test, deploy |
| Trigger | Human action (commit, merge) | Git events |
| Output | A versioned history | A tested, deployed artifact |
What’s the benefit of using Git and CI/CD together?
Without Git, CI/CD has no reliable source to build from. Without CI/CD, Git changes require manual promotion between environments — which is slow and error-prone.
%%{init: {'theme': 'neutral', 'themeVariables': { 'fontFamily': 'monospace', "fontSize":"14px"}}}%%
gitGraph
commit id: "Initial codebase"
branch dev
checkout dev
commit id: "Experiment: model v1"
commit id: "Iterate: model v2"
commit id: "Iterate: model v3 ✅"
branch test
checkout test
commit id: "Deploy to Test env"
commit id: "Integration checks ✅"
commit id: "System tests ✅"
commit id: "UAT approved ✅"
checkout main
merge test id: "Merge to main"
commit id: "Tag release v1.0"
branch prod
checkout prod
commit id: "Deploy to Prod 🚀"
commit id: "Monitoring active 📊"
In the gitGraph above:
- `dev` is where all experimentation happens; it never touches `main` directly
- `test` branches from `dev` once code is stable enough to validate, keeping `prod`-like conditions isolated
- `main` only receives code that has passed full UAT and system testing; it acts as the gatekeeper
- `prod` branches from `main` after a release is tagged, representing exactly what is live at any given time, which makes rollbacks straightforward
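The branch flow in the gitGraph can also be written down as a simple rule: code may only move one step forward along the path `dev`, `test`, `main`, `prod`. The snippet below is a sketch of that rule under the assumption that the four branches form a strict linear path; `PROMOTION_PATH` and `can_promote` are hypothetical names, not part of Git.

```python
# Assumed linear promotion path, taken from the gitGraph in this section.
PROMOTION_PATH = ["dev", "test", "main", "prod"]


def can_promote(source: str, target: str) -> bool:
    """Allow a promotion only to the stage immediately after the source."""
    if source not in PROMOTION_PATH or target not in PROMOTION_PATH:
        return False
    return PROMOTION_PATH.index(target) == PROMOTION_PATH.index(source) + 1


if __name__ == "__main__":
    print(can_promote("dev", "test"))  # one step forward
    print(can_promote("dev", "main"))  # skips the test stage
```

In practice this kind of rule would be enforced by branch protection and merge policies rather than by application code.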
Git, GitHub, CI/CD, Actions, & Version Control
Write out a mental map of the relationship of the following terms: Git, GitHub, CI/CD, GitHub Actions, Version Control.
%%{init: {'theme': 'default', 'themeVariables': { 'fontFamily': 'monospace', "fontSize":"18px"}}}%%
mindmap
root((Code<br>Management))
<strong>Version Control</strong>
<em>Tracks changes<br>over time</em>
<em>Enables<br>rollback</em>
<em>Supports<br>collaboration</em>
<strong>Git</strong>
<em>Local<br>repositories</em>
<em>Branching<br>and merging</em>
<em>Commit history</em>
<strong>GitHub</strong>
<em>Remote<br>repositories</em>
<em>Pull<br>requests</em>
<em>Code<br>review</em>
<strong>GitHub Actions</strong>
<em>Workflow<br>files</em>
<em>Triggered by<br>Git events</em>
<strong>CI/CD</strong>
<em>Continuous<br>Integration</em>
<em>Automated<br>testing</em>
<em>Build<br>validation</em>
<em>Continuous<br>Delivery</em>
<em>Automated<br>deployment</em>
<em>Environment<br>promotion</em>
[1] These three layers are Packages (Python + R Packages), System (Python + R Packages, Other System Libraries, and Operating System), and Hardware (Virtual Hardware and Physical Hardware).