Lab 5: Deployments and Code Promotion

Published

2026-03-15

Caution

This section is being revised. Thank you for your patience.

Comprehension questions

The questions below come from the Deployments and Code Promotion chapter.

Data Science Environments

Write down a mental map of the relationship between the three environments for data science. Include the following terms: Git Promote, CI/CD, automation, deployment, dev, test, prod.

The three environments for data science are development, test, and production (not to be confused with the three layers of data science environments [1]).

%%{init: {'theme': 'neutral', 'look': 'handDrawn', 'themeVariables': { 'fontFamily': 'monospace', "fontSize":"18px"}}}%%
flowchart TD
    Git["Git"]

    Git --> Track["<strong>Traceability</strong>:<br>Every change is logged with author, timestamp,<br>and purpose"]
    Git --> Branch["<strong>Isolation</strong>:<br>Branches let developers experiment without<br>disturbing stable code"]
    Git --> Gate["<strong>Gating</strong>:<br>Pull requests act as enforced review checkpoints<br>between environments"]

    Track --> Audit["Supports auditing<br>& rollback"]
    Branch --> Parallel["Enables parallel<br>development"]
    Gate --> Quality["Maintains code<br>quality standards"]

Data Science Environments

DEV Environments

The development environment (dev) is the primary location where data scientists write new code. It’s typically used for iteration and experimentation, and it’s connected to version control (Git).

Test Environments

The testing environment (test) is a mirror of the production environment with realistic data (often anonymized). Code in the test environment is integrated with, and validated against, other systems, and user acceptance testing occurs here.

Production Environments

The ‘live’ environment that serves the end users is the production environment (prod). Prod has strictly controlled access with a focus on stability and monitoring.

The figures from the chapter do an excellent job illustrating these environments, but I’ve added a sequence diagram:

%%{init: {'theme': 'neutral', 'look': 'handDrawn', 'themeVariables': { 'fontFamily': 'monospace', "fontSize":"18px"}}}%%


sequenceDiagram
    participant Dev as Dev
    participant Git as Git
    participant CI as CI/CD<br>Pipeline
    participant Test as Test
    participant Prod as Prod
    
    Dev->>Dev: Write code & test locally
    Dev->>Git: Commit & push changes
    Git->>CI: Trigger automation
    CI->>CI: Run tests & checks
    CI->>Test: Deploy to Test environment
    Test->>Test: Integration testing
    Test->>Test: User acceptance testing
    
    Note over Test,CI: Manual approval or<br/>automated promotion
    
    Test->>Git: Tag release (Git Promote)
    Git->>CI: Trigger production deployment
    CI->>CI: Build & validate
    CI->>Prod: Deploy to Production
    Prod->>Prod: Monitor & validate
    
    Note over Prod: Rollback<br>if needed

Data Science Environments

%%{init: {'theme': 'neutral', 'look': 'handDrawn', 'themeVariables': { 'fontFamily': 'monospace', "fontSize":"18px"}}}%%


flowchart TD
    subgraph DevEnv["🔧 Development"]
        DevCode("Code Writing<br>& Iteration")
        Git("Version Control (Git)")
        DevCode <--> Git
    end

    subgraph TestEnv["🧪 Testing"]
        AnonData("Realistic<br>(Anonymized) Data")
        SysVal("System <br>Validation")
        UAT("User Acceptance<br>Testing (UAT)")
        AnonData --> SysVal --> UAT
    end

    subgraph ProdEnv["🚀 Production"]
        LiveApp("Live Application")
        Monitor("Monitoring & Stability")
        EndUser("End Users")
        LiveApp --> Monitor
        LiveApp --> EndUser
    end

    Git -->|"promote"| TestEnv
    UAT -->|"approve & promote"| ProdEnv

    style DevEnv fill:#e3f2fd,stroke:#1976d2,stroke-width:3px
    style TestEnv fill:#fff3e0,stroke:#f57c00,stroke-width:3px
    style ProdEnv fill:#e8f5e9,stroke:#388e3c,stroke-width:3px
    

Data Science Environments

Git promotion is the process of tagging a commit as ready for production. This creates an immutable reference point and triggers a production deployment pipeline. A CI/CD Pipeline is an automated process that builds, tests, and deploys the code.

Code promotion ensures consistency across all three environments and reduces human error in deployment.
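The dev → test → prod flow described above can be sketched as a small gate: a commit only moves to the next environment when its checks pass. This is a minimal illustration, not how any particular CI/CD tool works; the `Release` class, the `promote` function, and the placeholder SHA are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import List

# Promotion order taken from the text: dev -> test -> prod.
ENVIRONMENTS = ("dev", "test", "prod")


@dataclass
class Release:
    """Hypothetical record of one commit moving through the environments."""
    commit_sha: str
    environment: str = "dev"
    history: List[str] = field(default_factory=list)


def promote(release: Release, checks_passed: bool) -> Release:
    """Advance a release one environment, but only if its checks passed."""
    if not checks_passed:
        raise ValueError(f"checks failed in {release.environment}; not promoting")
    position = ENVIRONMENTS.index(release.environment)
    if position == len(ENVIRONMENTS) - 1:
        raise ValueError("already in prod; nothing left to promote to")
    release.history.append(release.environment)
    release.environment = ENVIRONMENTS[position + 1]
    return release


release = Release(commit_sha="abc1234")  # placeholder SHA, not a real commit
promote(release, checks_passed=True)     # dev -> test
promote(release, checks_passed=True)     # test -> prod
print(release.environment, release.history)
```

The commit SHA never changes along the way. Only the environment it is deployed to changes, which is the sense in which code is promoted rather than re-created.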

Git & Data Science

Why is Git so important to a good code promotion strategy?

There is a canonical DevOps rule of thumb,

If a change isn’t represented as a Git commit, it didn’t happen.

In mature CI-CD systems, Git, rather than the pipeline, defines the promotion workflow. For example, consider the following Git branching model:

  • feature: new capability (e.g., new model, new preprocessing step, new data connector). Branch off develop or main, open pull request early, iterate.
  • develop: integration branch for ongoing work. feature branches merge here first; periodically promoted to main (release).
  • staging: mirrors production (main), but for final validation. Promoted main → staging (or vice versa), depending on the release process. Used for end-to-end pipeline runs on near-prod data/infra.
  • main: production-ready code and (often) reproducible pipeline definitions. Only merge via pull request; every commit should pass tests, lint, data checks. Often corresponds to what’s deployed (or what can be deployed).

Changes are committed to feature/* → merged to develop → merged to main:

%%{init: {'theme': 'neutral', 'look': 'handDrawn', 'themeVariables': { 'fontFamily': 'monospace', "fontSize":"18px"}}}%%

flowchart TB
  F("<code>feature/*</code>") -->|"pull request"| D("<code>develop</code>")
  D -->|release pull request| M("<code>main</code>")
  M -->|deploy| P["<strong>production</strong>"]
  M -->|promote| S("<code>staging</code>")
  S -->|approve| P
    

Typical CI-CD promotion pattern

The pipeline simply reacts to Git events. The code promotion logic lives in the branches, tags, and merge policies, not in brittle pipeline scripts.
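One way to picture "the pipeline reacts to Git events" is a pure function from an event to an action. The sketch below is an assumption-laden illustration: the event names (`push`, `pull_request`), the action labels, and the rule that only `vMAJOR.MINOR.PATCH` tags reach production are hypothetical choices, not the behavior of a specific CI/CD product. The `refs/heads/` and `refs/tags/` prefixes are Git's standard ref namespaces.

```python
import re

# Assumed release-tag convention, matching tags such as v1.4.1.
RELEASE_TAG = re.compile(r"^v\d+\.\d+\.\d+$")
TAG_PREFIX = "refs/tags/"


def pipeline_action(event: str, ref: str) -> str:
    """Map a Git event to a (hypothetical) pipeline action label."""
    if event == "pull_request":
        return "run-tests"           # review gate: checks run before merge
    if event == "push" and ref == "refs/heads/develop":
        return "deploy-test"         # integration branch feeds the test environment
    if event == "push" and ref == "refs/heads/staging":
        return "deploy-staging"      # final validation on near-prod infrastructure
    if event == "push" and ref.startswith(TAG_PREFIX):
        tag = ref[len(TAG_PREFIX):]
        if RELEASE_TAG.match(tag):
            return "deploy-prod"     # only release tags trigger production
    return "no-op"


print(pipeline_action("push", "refs/tags/v1.4.1"))
print(pipeline_action("push", "refs/heads/feature/new-model"))
```

The function holds no state and no environment-specific logic. Changing what gets promoted means changing branches, tags, or merge policies in Git, not editing the pipeline.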

A core CI-CD principle:

Build once, promote many times.

All commits are signed, so we know who made each change, and the main branch is protected (i.e., merging requires approval). Artifact immutability comes from tying each build to a specific commit SHA or to a tagged release (like v1.4.1).

%%{init: {'theme': 'neutral', 'look': 'handDrawn', 'themeVariables': { 'fontFamily': 'monospace', "fontSize":"18px"}}}%%
gitGraph
    commit tag: "Initial commit" type: HIGHLIGHT
    branch test
    checkout test
    commit tag: "test-setup"

    branch feature
    checkout feature
    commit tag: "develop-feature"
    commit
    commit
    commit tag: "test-feature"

    checkout test
    merge feature tag: "merge"
    commit
    commit tag: "integration-tests"

    checkout main
    merge test id: "v1.4.1" tag: "Release" type: REVERSE

Git artifact immutability

The same code is promoted across environments so we can be sure untested code never reaches production.
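"Build once, promote many times" can be sketched as follows: the artifact is built a single time and identified by a content digest, and each environment simply points at that digest. This is a toy model under stated assumptions; `build_artifact`, `deploy`, the `deployed` mapping, and the placeholder SHA are hypothetical names, and a real system would store artifacts in a registry rather than a dictionary.

```python
import hashlib
from typing import Dict, Optional


def build_artifact(commit_sha: str, source: bytes) -> Dict[str, str]:
    """Build once: record the commit and a SHA-256 digest of the built bundle."""
    return {"commit": commit_sha, "digest": hashlib.sha256(source).hexdigest()}


# Each environment only points at a digest; it never rebuilds the code.
deployed: Dict[str, Optional[str]] = {"dev": None, "test": None, "prod": None}


def deploy(artifact: Dict[str, str], environment: str) -> None:
    """Promote by pointing an environment at an already-built artifact."""
    if environment not in deployed:
        raise KeyError(f"unknown environment: {environment}")
    deployed[environment] = artifact["digest"]


artifact = build_artifact("abc1234", b"model code and pipeline definition")
for environment in ("dev", "test", "prod"):
    deploy(artifact, environment)

# Every environment now references the identical bytes that were tested.
print(deployed["test"] == deployed["prod"])
```

Because the digest is derived from the artifact's contents, any rebuild that changed even one byte would produce a different digest and would be visible as drift.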

Without Git, the code would need to be rebuilt in each environment. The tested code wouldn’t necessarily be the deployed code, and environment drift would be inevitable.

Git is the gatekeeper for CI-CD policies and integrates directly with enforcement mechanisms.

If it’s not enforced in Git, it will be bypassed.

Git enables environment parity. In DevOps, environment drift is the enemy. Ideally, dev, test, and prod differ only in data and credentials, and changes are promoted rather than re-created.

From a DevOps audit perspective, Git provides the history of code changes and approvals, along with the pipeline results tied to each commit. Each release is also tagged. This makes it easy to answer the inevitable question,

“Which commit introduced this behavior in production?”

When production inevitably breaks, we can use Git to roll back by redeploying a previous commit or tag. This avoids making ad-hoc changes on a production server.
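Rolling back by redeploying a previous tag can be sketched as an append-only deployment log: a rollback is just another deployment of a tag that was already released. The `ProductionLog` class is a hypothetical illustration, and `v1.4.0` is an invented earlier release used only for the example (`v1.4.1` is the tag mentioned above).

```python
from typing import List, Optional


class ProductionLog:
    """Hypothetical append-only record of which tag is live in production."""

    def __init__(self) -> None:
        self.history: List[str] = []  # oldest deployment first

    @property
    def current(self) -> Optional[str]:
        """The tag currently serving users, or None before the first deploy."""
        return self.history[-1] if self.history else None

    def deploy(self, tag: str) -> None:
        self.history.append(tag)

    def rollback(self, tag: str) -> None:
        """Redeploy a previously released tag instead of patching the server."""
        if tag not in self.history:
            raise ValueError(f"{tag} was never deployed; refusing to roll back to it")
        self.history.append(tag)  # the rollback itself stays in the audit trail


prod = ProductionLog()
prod.deploy("v1.4.0")    # invented earlier release
prod.deploy("v1.4.1")    # release that turns out to be broken
prod.rollback("v1.4.0")  # back to the last known-good tag
print(prod.current, prod.history)
```

Keeping the rollback in the history, rather than deleting the bad release, preserves the answer to "what was running in production, and when?"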

Git promotion mechanics are standardized, so pipelines become reusable templates. This turns CI-CD into a system, not a script.

Git is like a state machine, controlling the flow of changes.

With Git-centered promotion:

  • Git = intent
  • Pipeline = execution
  • Artifact = immutable
  • Promotion = merge or tag

Can you have a code promotion strategy without Git?

Yes. You can do code promotion without Git, as long as you still have three things:

  • Some way of creating ‘versioned artifacts’ (i.e., something immutable that can be promoted)
  • Blocks/gates requiring tests/approvals that must be passed before promotion.
  • Some kind of promotion mechanism to copy/move/retag the versioned artifact between environments.

Without Git you still need substitutes for:

  1. Diff / review: code review tooling in your platform, or document-style review on “release bundles”
  2. Rollback: keep old artifacts and switch the environment pointer back
  3. Traceability: manifests + checksums + approvals + build logs
    A good minimum is: artifact hash + signed manifest + CI logs + approval record.
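The "artifact hash + signed manifest + approval record" minimum can be sketched with the Python standard library. This is a simplified illustration under stated assumptions: the manifest fields, the function names, and the use of a shared-secret HMAC as the "signature" are choices made here for brevity. A real setup would more likely use asymmetric signatures and would also link to the CI logs.

```python
import hashlib
import hmac
import json
from typing import Dict


def _signature(body: Dict[str, str], key: bytes) -> str:
    """HMAC-SHA256 over a canonical (sorted-key) JSON encoding of the manifest."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def make_manifest(artifact: bytes, version: str, approved_by: str, key: bytes) -> Dict[str, str]:
    """Record what was built, who approved it, and sign the record."""
    body = {
        "version": version,
        "sha256": hashlib.sha256(artifact).hexdigest(),
        "approved_by": approved_by,
    }
    return {**body, "signature": _signature(body, key)}


def verify_manifest(manifest: Dict[str, str], artifact: bytes, key: bytes) -> bool:
    """Check the signature and that the artifact still matches its recorded hash."""
    body = {k: v for k, v in manifest.items() if k != "signature"}
    signature_ok = hmac.compare_digest(_signature(body, key), manifest.get("signature", ""))
    hash_ok = body.get("sha256") == hashlib.sha256(artifact).hexdigest()
    return signature_ok and hash_ok


key = b"demo-only-secret"    # placeholder; never hard-code real keys
bundle = b"release bundle"   # stands in for the versioned artifact
manifest = make_manifest(bundle, "v1.4.1", "release-manager", key)
print(verify_manifest(manifest, bundle, key))
```

Promotion between environments then means copying the artifact together with its manifest and refusing to deploy anything whose manifest fails verification.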

Git & CI/CD

What is the relationship between Git and CI/CD?

What’s the benefit of using Git and CI/CD together?

Git, GitHub, CI/CD, Actions, & Version Control

Write out a mental map of the relationship of the following terms: Git, GitHub, CI/CD, GitHub Actions, Version Control.


  1. These three layers are Packages (Python + R Packages), System (Python + R Language Versions, Other System Libraries, and Operating System), and Hardware (Virtual Hardware and Physical Hardware).