Lab 5: Deployments and Code Promotion

Published

2026-05-22

Comprehension questions

The questions below come from the Deployments and Code Promotion chapter.

Data Science Environments

Write down a mental map of the relationship between the three environments for data science. Include the following terms: Git Promote, CI/CD, automation, deployment, dev, test, prod.

The three environments for data science are: development, test, and production (not to be confused with the three layers of data science environments [1]).

The figures from the chapter do an excellent job illustrating these environments, but I’ve added a mindmap:

%%{init: {'look': 'handDrawn', 'themeVariables': { 'fontFamily': 'monospace', "fontSize":"18px"}}}%%
mindmap
  root((Code<br>Promotion))
    Dev
      Code Writing
      Experimentation
      Git Promote
        Push to feature<br>branch
        Pull request<br>triggers CI/CD
    Test
      Automation
        Unit tests
        Integration<br>tests
        UAT
      CI/CD
        Validates<br>build
        Runs test<br>suite
    Prod
      Deployment
        Automated<br>release
        Version<br>tagged
      Monitoring
        Stability<br>checks
        Access control

Data Science Environments

Development Environments

The development environment (dev) is the primary location where data scientists write new code. It’s typically used for iteration and experimentation, and it’s connected to version control (Git).

Test Environments

The testing environment (test) mirrors the production environment and uses realistic (often anonymized) data. It is designed to validate integration with other systems, and it typically hosts user acceptance testing and/or system tests.

Production Environments

The production environment (prod) is the ‘live’ environment that serves end users. Access to it is strictly controlled, with a focus on stability and monitoring.

Git & Data Science

Why is Git so important to a good code promotion strategy?

Code promotion is the process of tagging Git commits as ‘ready for production.’ Each tag creates a reference point that can trigger tests (e.g., on the test branch) or a production deployment pipeline (on the prod branch). With version control, if changes aren’t represented as Git commits, they literally didn’t happen. Together, Git and code promotion ensure consistency across all three environments (dev, test, and prod) and reduce human error in deployment.

Artifact immutability

With Git, every change is pinned to a specific commit SHA or a tagged release. Before any changes can be merged into a prod branch, approval is required (i.e., the branch is protected). This artifact immutability links ‘builds’ as they are promoted across environments, so we can be sure untested or unreviewed code never reaches production.
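The idea of an immutable artifact can be sketched outside of Git with a plain content hash. This is only an analogy under stated assumptions: the `artifact_digest` and `verify_promotion` helpers and the sample payload are hypothetical, and SHA-256 over raw bytes stands in for whatever identifier a real build system would use.

```python
import hashlib


def artifact_digest(payload: bytes) -> str:
    """Return a SHA-256 hex digest that identifies this exact build."""
    return hashlib.sha256(payload).hexdigest()


def verify_promotion(expected_digest: str, payload: bytes) -> bool:
    """True only if the artifact arriving in the next environment is byte-identical."""
    return artifact_digest(payload) == expected_digest


# Hypothetical build output; in practice this might be a package or bundle.
build = b"model-v3 bundle contents"
digest_at_test = artifact_digest(build)

print(verify_promotion(digest_at_test, build))               # True: same bytes promoted
print(verify_promotion(digest_at_test, build + b" hotfix"))  # False: changed after testing
```

Any edit made after testing changes the digest, so the check fails before the altered artifact can be promoted.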

Environment drift

Environment drift is when the production code differs from the tested code. Without a code promotion strategy and version control, the code would need to be rebuilt in each environment, so the tested code wouldn’t necessarily be the code deployed into production. In DevOps, environment drift is the enemy because it can lead to unpredictable behavior and makes debugging difficult.
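A drift check can be as small as comparing what is deployed against what was tested. The sketch below assumes we already have a set of commit SHAs that passed the test environment; the `detect_drift` function and the short SHAs are made up for illustration.

```python
from __future__ import annotations


def detect_drift(tested_shas: set[str], deployed_sha: str) -> bool:
    """Return True when the deployed commit never passed through test."""
    return deployed_sha not in tested_shas


# Hypothetical short SHAs for commits that passed the test environment.
tested = {"a1b2c3d", "e4f5a6b"}

print(detect_drift(tested, "e4f5a6b"))  # False: prod runs a tested commit
print(detect_drift(tested, "9f8e7d6"))  # True: prod runs something test never saw
```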

Environment parity

Git enables environment parity. When used properly, our three environments (dev, test, and prod) differ only by data and credentials. Changes are promoted, not re-created. When production code inevitably breaks, developers can use Git to roll back and redeploy a previous version (via commit SHAs or tagged releases). This avoids making ad-hoc changes in a prod environment.
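Rolling back to a tagged release amounts to picking an earlier entry from the release history rather than editing production by hand. A minimal sketch, assuming releases are kept in the order they were tagged; the `rollback_target` helper and the tag names are hypothetical.

```python
from __future__ import annotations


def rollback_target(releases: list[str], current: str) -> str:
    """Return the release tagged immediately before the current one."""
    position = releases.index(current)  # raises ValueError if the tag is unknown
    if position == 0:
        raise ValueError("no earlier release to roll back to")
    return releases[position - 1]


# Hypothetical release tags, oldest first.
history = ["v1.0", "v1.1", "v1.2"]

print(rollback_target(history, "v1.2"))  # v1.1
```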


Can you have a code promotion strategy without Git?

Sure, code promotion can be done without Git, but we’d need to find a way to provide:

  • An ‘immutable versioned artifact’ (i.e., some way of tracking ‘who did what’, and preferably ‘when’)
  • Blocks/gates that require tests and approvals to pass before code is promoted
  • A promotion mechanism to copy/move/retag each immutable versioned artifact between environments

We’d also need substitutes for:

  1. A way to view and review ‘diffs’, through code review or document-style review
  2. The ability to keep previous artifacts and switch each environment back to one of them
  3. Some means of traceability that connects the artifacts, manifests, approvals, and build logs
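To make the requirements above concrete, here is a rough in-memory sketch of a Git-free promotion registry. Everything in it is an assumption for illustration: the `Registry` class, its method names, and the fixed dev, test, prod ordering are invented here, and a real system would need persistent storage and a diff/review tool on top.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone

ENV_ORDER = ["dev", "test", "prod"]


@dataclass(frozen=True)
class Artifact:
    """Immutable record of who built what, and when."""
    digest: str
    author: str
    created_at: str


@dataclass
class Registry:
    artifacts: dict[str, Artifact] = field(default_factory=dict)
    approvals: dict[str, set[str]] = field(default_factory=dict)
    deployed: dict[str, list[str]] = field(
        default_factory=lambda: {env: [] for env in ENV_ORDER}
    )
    log: list[str] = field(default_factory=list)  # traceability trail

    def register(self, payload: bytes, author: str) -> str:
        """Store a new artifact in dev, identified by its content hash."""
        digest = hashlib.sha256(payload).hexdigest()
        stamp = datetime.now(timezone.utc).isoformat()
        self.artifacts.setdefault(digest, Artifact(digest, author, stamp))
        self.deployed["dev"].append(digest)
        self.log.append(f"{author} registered {digest[:8]} in dev")
        return digest

    def approve(self, digest: str, target_env: str, approver: str) -> None:
        """Record that a gate (tests, review) passed for the target environment."""
        self.approvals.setdefault(digest, set()).add(target_env)
        self.log.append(f"{approver} approved {digest[:8]} for {target_env}")

    def promote(self, digest: str, target_env: str) -> None:
        """Move the same artifact one environment forward, if the gate allows it."""
        position = ENV_ORDER.index(target_env)
        if position == 0:
            raise ValueError("artifacts enter dev through register()")
        source_env = ENV_ORDER[position - 1]
        if digest not in self.deployed[source_env]:
            raise ValueError(f"{digest[:8]} has not been in {source_env}")
        if target_env not in self.approvals.get(digest, set()):
            raise PermissionError(f"{digest[:8]} is not approved for {target_env}")
        self.deployed[target_env].append(digest)
        self.log.append(f"promoted {digest[:8]} from {source_env} to {target_env}")
```

A typical flow would be `register`, then `approve` and `promote` for test, then the same pair for prod. Keeping every digest in `deployed` covers the "previous artifacts" requirement, and `log` is the stand-in for commit history.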

Git & CI/CD

What is the relationship between Git and CI/CD?

A CI/CD pipeline is an automated process that builds, tests, and deploys code. In CI/CD systems, Git ‘events’, not the pipeline, define the code promotion workflow. The pipeline simply reacts to those events. The code promotion logic lives in the branches, tags, merge policies, etc. This enables a “build once, promote many times” strategy, where the same tested artifact is promoted across environments.

Git provides the code change/approval history, and the pipeline results are tied to commits. Each release is also tagged, making it easy to trace which commits introduced which changes. This traceability is crucial for debugging and accountability. Git serves as the gatekeeper for CI/CD policies and integrates directly with enforcement mechanisms.

Using Git turns CI/CD into a system, not a script.
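One way to picture "the pipeline reacts to Git events" is a lookup table keyed by event and branch. The sketch below is a toy policy, not any CI provider's API: the event labels, branch names, and stage names are all assumptions chosen to mirror the dev, test, and prod flow in this lab.

```python
from __future__ import annotations

# Hypothetical promotion policy: (Git event, branch) -> pipeline stages to run.
PROMOTION_POLICY: dict[tuple[str, str], list[str]] = {
    ("push", "dev"): ["build", "unit-tests"],
    ("pull_request", "test"): ["build", "unit-tests", "integration-tests"],
    ("merge", "main"): ["tag-release", "deploy-to-prod"],
}


def stages_for_event(event: str, branch: str) -> list[str]:
    """Return the stages the pipeline should run; unknown events run nothing."""
    return PROMOTION_POLICY.get((event, branch), [])


print(stages_for_event("pull_request", "test"))
print(stages_for_event("push", "scratch"))  # no policy entry, so no stages
```

The pipeline code never decides what to promote. Changing the workflow means changing the policy attached to branches and events, which is the sense in which the promotion logic lives in Git.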

%%{init: {'theme': 'neutral', 'look': 'handDrawn', 'themeVariables': { 'fontFamily': 'monospace', "fontSize":"18px"}}}%%
flowchart LR
    GitEv(["Git Event<br>(push, PR, merge)"]) -->|"triggers"| CICD["CI/CD Pipeline"]
    CICD --> CI("Continuous<br>Integration (CI)<br><em>Build & Test</em>")
    CICD --> CD("Continuous<br>Delivery (CD)<br><em>Deploy</em>")

    style CI fill:#F9E79F,stroke:#D4AC0D,color:#000
    style CD fill:#D4E6F1,stroke:#2E86C1,color:#000
    style GitEv fill:#D5F5E3,stroke:#1E8449,color:#000
    style CICD fill:#FADBD8,stroke:#C0392B,color:#000

Git & CI/CD

Think of Git as the source of truth and CI/CD as an automated response to changes in that truth. Git events — pushes, pull requests, merges — can fire a CI/CD pipeline.

Key distinctions between Git and CI/CD:

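|         | Git                          | CI/CD                         |
|---------|------------------------------|-------------------------------|
| Role    | Tracks & versions code       | Automates build, test, deploy |
| Trigger | Human action (commit, merge) | Git events                    |
| Output  | A versioned history          | A tested, deployed artifact   |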

What’s the benefit of using Git and CI/CD together?

Without Git, CI/CD has no reliable source to build from. Without CI/CD, Git changes require manual promotion between environments — which is slow and error-prone.

%%{init: {'theme': 'neutral', 'themeVariables': { 'fontFamily': 'monospace', "fontSize":"14px"}}}%%

gitGraph
   commit id: "Initial codebase"

   branch dev
   checkout dev
   commit id: "Experiment: model v1"
   commit id: "Iterate: model v2"
   commit id: "Iterate: model v3 ✅"

   branch test
   checkout test
   commit id: "Deploy to Test env"
   commit id: "Integration checks ✅"
   commit id: "System tests ✅"
   commit id: "UAT approved ✅"

   checkout main
   merge test id: "Merge to main"
   commit id: "Tag release v1.0"

   branch prod
   checkout prod
   commit id: "Deploy to Prod 🚀"
   commit id: "Monitoring active 📊"

Git & CI/CD

In the gitGraph above:

  • dev is where all experimentation happens — it never touches main directly
  • test branches from dev once code is stable enough to validate, keeping prod-like conditions isolated
  • main only receives code that has passed full UAT and system testing — it acts as the gatekeeper
  • prod branches from main after a release is tagged, representing exactly what is live at any given time, making rollbacks straightforward

Git, GitHub, CI/CD, Actions, & Version Control

Write out a mental map of the relationship of the following terms: Git, GitHub, CI/CD, GitHub Actions, Version Control.

%%{init: {'theme': 'default', 'themeVariables': { 'fontFamily': 'monospace', "fontSize":"18px"}}}%%

mindmap
  root((Code<br>Management))
    <strong>Version Control</strong>
      <em>Tracks changes<br>over time</em>
      <em>Enables<br>rollback</em>
      <em>Supports<br>collaboration</em>
        <strong>Git</strong>
          <em>Local<br>repositories</em>
          <em>Branching<br>and merging</em>
          <em>Commit history</em>
            <strong>GitHub</strong>
              <em>Remote<br>repositories</em>
              <em>Pull<br>requests</em>
              <em>Code<br>review</em>
                <strong>GitHub Actions</strong>
                  <em>Workflow<br>files</em>
                  <em>Triggered by<br>Git events</em>
                    <strong>CI/CD</strong>
                      <em>Continuous<br>Integration</em>
                        <em>Automated<br>testing</em>
                        <em>Build<br>validation</em>
                      <em>Continuous<br>Delivery</em>
                        <em>Automated<br>deployment</em>
                        <em>Environment<br>promotion</em>

Git, GitHub, CI/CD, Actions, & Version Control


  1. These three layers are Packages (Python + R Packages), System (Python + R Language Versions, Other System Libraries, and Operating System), and Hardware (Virtual Hardware and Physical Hardware).