├── app-api.py
├── app-api.R
├── eda-db.qmd
├── model-db.qmd
├── py-data-load.py
└── r-data-load.R- 1
- Python Shiny app
- 2
- R Shiny app
- 3
- EDA files (quarto)
- 4
-
Database connection via R
- 5
- Database connection via Python
The original files in _labs/lab03 are below:
├── app-api.py
├── app-api.R
├── eda-db.qmd
├── model-db.qmd
├── py-data-load.py
└── r-data-load.RIn the previous labs, we build a Python model API with vetiver (I also repeated this lab in R). This lab continues with accessing databases and using APIs.
1. Draw two mental maps for connecting to a database, one using a database driver (in a Python or R package) vs. an ODBC or JDBC driver. You should (at a minimum) include the nodes: database package, DBI (R only), driver, system driver, ODBC, JDBC, and database.
In Python, there is a large database ecosystem (i.e., many driver options). ORM support with SQLAlchemy and it’s flexible (i.e., it can bypass standards if needed).
%%{init: {'theme': 'base', 'themeVariables': {'fontFamily': 'monospace'}}}%%
graph TB
subgraph PyApp["Python App"]
PyCode("<strong>app-api.py</strong>")
end
subgraph PyPkg["<strong>Python Packages</strong>"]
DBPkg("DB Package<br><strong>psycopg2</strong>, <strong>pymysql</strong>, <strong>sqlite3</strong>")
ODBCPkg("ODBC Package<br><strong>pyodbc</strong>")
JDBCPkg("JDBC Package<br><strong>jaydebeapi</strong>")
end
%%% Tints of #35605a
subgraph Driver["<strong>Driver Layer</strong>"]
NatDrv("Native Driver<br>libpq, MySQL C API")
ODBCDrv("ODBC Driver<br>Vendor-specific")
JDBCDrv("JDBC Driver<br>.jar files")
end
%% Tints of #6b818c
subgraph System["<strong>System Layer</strong>"]
ODBCSys("ODBC Manager<br>System ODBC")
JAVASys("Java Runtime<br>JVM/JRE")
end
%% Tints of #00120b
subgraph DB["Database"]
DBServ("<strong>Database Server</strong>")
end
%% Direct pathway (bypasses system layer)
PyCode --> DBPkg
DBPkg --> NatDrv
NatDrv --"`Bypass <strong>System Layer</strong>`"--> DBServ
%% ODBC pathway
PyCode --> ODBCPkg
ODBCPkg --> ODBCDrv
ODBCDrv --> ODBCSys
ODBCSys --> DBServ
%% JDBC pathway
PyCode --> JDBCPkg
JDBCPkg --> JDBCDrv
JDBCDrv --> JAVASys
JAVASys --> DBServ
style PyApp fill:#fbf7ec,stroke:#5B8C5A,color:#1B2A41
style PyPkg fill:#fbf7ec,stroke:#5B8C5A,color:#1B2A41
style Driver fill:#fbf7ec,stroke:#2A6F77,color:#1B2A41
style System fill:#fbf7ec,stroke:#2A6F77,color:#1B2A41
style DB fill:#fbf7ec,stroke:#2A6F77,color:#1B2A41
style PyCode fill:#5B8C5A,stroke:#000000,stroke-width:1px,color:#ffffff
style DBPkg fill:#D2562B,stroke:#000000,stroke-width:1px,color:#ffffff
style ODBCPkg fill:#D2562B,stroke:#000000,stroke-width:1px,color:#ffffff
style JDBCPkg fill:#D2562B,stroke:#000000,stroke-width:1px,color:#ffffff
style NatDrv fill:#485466,stroke:#000000,stroke-width:1px,color:#ffffff
style ODBCDrv fill:#485466,stroke:#000000,stroke-width:1px,color:#ffffff
style JDBCDrv fill:#485466,stroke:#000000,stroke-width:1px,color:#ffffff
style ODBCSys fill:#485466,stroke:#000000,stroke-width:1px,color:#ffffff
style JAVASys fill:#485466,stroke:#000000,stroke-width:1px,color:#ffffff
style DBServ fill:#2A6F77,stroke:#000000,stroke-width:1px,color:#ffffff
In R, the DBI package provides a universal interface (same code works across drivers). This is a simpler model because all paths converge on DBI package. There is also better integration with R’s data structures (i.e., data.frames).
%%{init: {'theme': 'base', 'themeVariables': {'fontFamily': 'monospace'}}}%%
graph TB
subgraph RApp["<strong>R App</strong>"]
A("<strong>app-api.R</strong>")
end
subgraph RPkgs["<strong>R Packages</strong>"]
B("DB Packages<br><strong>RPostgres</strong>, <strong>RMySQL</strong>, <strong>RSQLite</strong>")
C("ODBC Package<br><strong>odbc</strong>")
D("JDBC Package<br><strong>RJDBC</strong>")
end
%%% Tints of #35605a
subgraph DBI["<strong>R Standard Interface</strong>"]
E("DBI<br>Database Interface")
end
%% Tints of #6b818c
subgraph Driver["<strong>Driver Layer</strong>"]
F("Native Driver<br>libpq, MySQL C API")
G("ODBC Driver<br>Vendor-specific")
H("JDBC Driver<br>.jar files")
end
%% Tints of #6b818c
subgraph System["<strong>System Layer</strong>"]
I("ODBC Manager<br>System ODBC")
J("Java Runtime<br>JVM/JRE")
end
subgraph DB["<strong>Database</strong>"]
K("<strong>Database Server</strong>")
end
%% All paths go through DBI
A --> B
A --> C
A --> D
B --> E
C --> E
D --> E
%% Direct pathway (bypasses system layer)
E --> F
F --"bypasses system layer"--> K
%% ODBC pathway
E --> G
G --> I
I --> K
%% JDBC pathway
E --> H
H --> J
J --> K
style RApp fill:#fbf7ec,stroke:#5B8C5A,color:#1B2A41
style RPkgs fill:#fbf7ec,stroke:#5B8C5A,color:#1B2A41
style DBI fill:#fbf7ec,stroke:#5B8C5A,color:#1B2A41
style Driver fill:#fbf7ec,stroke:#2A6F77,color:#1B2A41
style System fill:#fbf7ec,stroke:#2A6F77,color:#1B2A41
style DB fill:#fbf7ec,stroke:#2A6F77,color:#1B2A41
style A fill:#5B8C5A,stroke:#000000,stroke-width:1px,color:#ffffff
style B fill:#D2562B,stroke:#000000,stroke-width:1px,color:#ffffff
style C fill:#D2562B,stroke:#000000,stroke-width:1px,color:#ffffff
style D fill:#D2562B,stroke:#000000,stroke-width:1px,color:#ffffff
style E fill:#D2562B,stroke:#000000,stroke-width:1px,color:#ffffff
style F fill:#485466,stroke:#000000,stroke-width:1px,color:#ffffff
style G fill:#485466,stroke:#000000,stroke-width:1px,color:#ffffff
style H fill:#485466,stroke:#000000,stroke-width:1px,color:#ffffff
style I fill:#485466,stroke:#000000,stroke-width:1px,color:#ffffff
style J fill:#485466,stroke:#000000,stroke-width:1px,color:#ffffff
style K fill:#2A6F77,stroke:#000000,stroke-width:1px,color:#ffffff
Both R and Python native database drivers bypass the system layer and connect directly to databases. The system layer is only involved when using ODBC or JDBC connections.
%%{init: {'theme': 'base', 'themeVariables': {'fontFamily': 'monospace'}}}%%
graph LR
subgraph RPhil["R Philosophy"]
A("DBI as Universal<br>Interface") --> B("All paths<br>converge to<br>DBI")
B --> C("Consistent API<br>regardless of<br>driver")
end
subgraph PyPhil["Python Philosophy"]
D("DB-API 2.0 as<br>Standard") --> E("Multiple<br>competing<br>approaches")
E --> F("Choose best<br>tool for job")
end
style RPhil fill:#fbf7ec,stroke:#5B8C5A,color:#1B2A41
style PyPhil fill:#fbf7ec,stroke:#5B8C5A,color:#1B2A41
style A fill:#5B8C5A,stroke:#000000,stroke-width:1px,color:#ffffff
style B fill:#D2562B,stroke:#000000,stroke-width:1px,color:#ffffff
style C fill:#2A6F77,stroke:#000000,stroke-width:1px,color:#ffffff
style D fill:#5B8C5A,stroke:#000000,stroke-width:1px,color:#ffffff
style E fill:#D2562B,stroke:#000000,stroke-width:1px,color:#ffffff
style F fill:#2A6F77,stroke:#000000,stroke-width:1px,color:#ffffff
2. Draw a mental map for using an API from R and Python. You should (at a minimum) include nodes for
{requests}/{httr2}, request, HTTP verb/request method, headers, query parameters, body, JSON, response, and response code.
In general, APIs are like the digital ‘plumbing’ that makes connections between modern software ecosystems possible.
%%{init: {'theme': 'base', 'themeVariables': {'fontFamily': 'monospace'}}}%%
graph LR
DS("Data Scientist") --> Req("API Request")
Req --> ApiServ("API Server")
ApiServ --> DataProcMod("Data Processing/Model")
DataProcMod --> Res("Results")
Res --> ApiServ
ApiServ --> APIResp("API Response")
APIResp --> DS
style DS fill:#5B8C5A,stroke:#000000,stroke-width:1px,color:#ffffff
style Req fill:#D2562B,stroke:#000000,stroke-width:1px,color:#ffffff
style ApiServ fill:#2A6F77,stroke:#000000,stroke-width:1px,color:#ffffff
style DataProcMod fill:#D2562B,stroke:#000000,stroke-width:1px,color:#ffffff
style Res fill:#2A6F77,stroke:#000000,stroke-width:1px,color:#ffffff
style APIResp fill:#2A6F77,stroke:#000000,stroke-width:1px,color:#ffffff
APIs have standardized language that allows different software systems to communicate and share functionality (without needing to understand how the internal implementation works).
What do they communicate?
%%{init: {'theme': 'base', 'themeVariables': {'fontFamily': 'monospace'}}}%%
graph LR
subgraph "Python APIs"
A2("<strong>Python Code</strong>") --> B2("<strong>requests</strong> package")
B2 --> C2("<strong>requests.method</strong> call")
C2 --> D2("<strong>method</strong> parameter<br>GET/POST/PUT/DELETE")
C2 --> E2("<strong>headers</strong> parameter<br>dict of headers")
C2 --> F2("<strong>params</strong> parameter<br>URL query parameters")
C2 --> G2("<strong>json</strong> parameter<br>Request payload")
D2 --> H2("HTTP Request")
E2 --> H2
F2 --> H2
G2 --> H2
H2 --> I2("API Server")
I2 --> J2("HTTP Response")
J2 --> K2("<strong>response.status_code</strong><br>200, 404, 500, etc.")
J2 --> L2("<strong>response.headers</strong><br>Response metadata")
J2 --> M2("<strong>response.json</strong><br>JSON parsing")
K2 --> N2("<strong>Python Response Object</strong>")
L2 --> N2
M2 --> N2
end
style A2 fill:#5B8C5A,stroke:#000000,stroke-width:1px,color:#ffffff
style B2 fill:#5B8C5A,stroke:#000000,stroke-width:1px,color:#ffffff
style C2 fill:#D2562B,stroke:#000000,stroke-width:1px,color:#ffffff
style D2 fill:#D2562B,stroke:#000000,stroke-width:1px,color:#ffffff
style E2 fill:#D2562B,stroke:#000000,stroke-width:1px,color:#ffffff
style F2 fill:#D2562B,stroke:#000000,stroke-width:1px,color:#ffffff
style G2 fill:#D2562B,stroke:#000000,stroke-width:1px,color:#ffffff
style H2 fill:#485466,stroke:#000000,stroke-width:1px,color:#ffffff
style I2 fill:#2A6F77,stroke:#000000,stroke-width:1px,color:#ffffff
style J2 fill:#2A6F77,stroke:#000000,stroke-width:1px,color:#ffffff
style K2 fill:#485466,stroke:#000000,stroke-width:1px,color:#ffffff
style L2 fill:#485466,stroke:#000000,stroke-width:1px,color:#ffffff
style M2 fill:#485466,stroke:#000000,stroke-width:1px,color:#ffffff
style N2 fill:#2A6F77,stroke:#000000,stroke-width:1px,color:#ffffff
%%{init: {'theme': 'base', 'themeVariables': {'fontFamily': 'monospace'}}}%%
graph LR
subgraph RApi["R APIs"]
A1("R Code") --> B1("<strong>httr2</strong><br>package")
B1 --> C1("<strong>request()</strong> object")
C1 --> D1("<strong>req_method()</strong><br>GET/POST/PUT/DELETE")
C1 --> E1("<strong>req_headers()</strong><br>Content-Type, Auth, etc.")
C1 --> F1("<strong>req_url_query()</strong><br>?param=value&key=data")
C1 --> G1("<strong>req_body_json()</strong><br>Request payload")
D1 --> H1("HTTP Request")
E1 --> H1
F1 --> H1
G1 --> H1
H1 --> I1("API Server")
I1 --> J1("HTTP Response")
J1 --> K1("<strong>resp_status()</strong><br>200, 404, 500, etc.")
J1 --> L1("<strong>resp_headers()</strong><br>Response metadata")
J1 --> M1("<strong>resp_body_json()</strong><br>JSON parsing")
K1 --> N1("R Response Object")
L1 --> N1
M1 --> N1
end
style RApi fill:#fbf7ec,stroke:#5B8C5A,color:#1B2A41
style A1 fill:#5B8C5A,stroke:#000000,stroke-width:1px,color:#ffffff
style B1 fill:#5B8C5A,stroke:#000000,stroke-width:1px,color:#ffffff
style C1 fill:#D2562B,stroke:#000000,stroke-width:1px,color:#ffffff
style D1 fill:#D2562B,stroke:#000000,stroke-width:1px,color:#ffffff
style E1 fill:#D2562B,stroke:#000000,stroke-width:1px,color:#ffffff
style F1 fill:#D2562B,stroke:#000000,stroke-width:1px,color:#ffffff
style G1 fill:#D2562B,stroke:#000000,stroke-width:1px,color:#ffffff
style H1 fill:#485466,stroke:#000000,stroke-width:1px,color:#ffffff
style I1 fill:#2A6F77,stroke:#000000,stroke-width:1px,color:#ffffff
style J1 fill:#2A6F77,stroke:#000000,stroke-width:1px,color:#ffffff
style K1 fill:#485466,stroke:#000000,stroke-width:1px,color:#ffffff
style L1 fill:#485466,stroke:#000000,stroke-width:1px,color:#ffffff
style M1 fill:#485466,stroke:#000000,stroke-width:1px,color:#ffffff
style N1 fill:#2A6F77,stroke:#000000,stroke-width:1px,color:#ffffff
3. How can environment variables be used to keep secrets secure in your code?
We previously turned our Python scikit-learn model into a VetiverModel(), saved it to a pins board, and then created a VetiverAPI().1
lm() and fastDummies), converted it to a vetiver_model(), put it in a board_folder(), ‘pinned’ it with vetiver_pin_write(), then converted it into an API with plumber::pr() and vetiver_api().2The sections below cover how to create and communicate with APIs. I’ve split each lab into separate Python/ and R/ folders (and included RStudio project files).
The original Python App files can be found in the _labs/lab03/ folder.
_labs/lab03/
├── app-api.py
├── model-db.qmd
└── py-data-load.pyThe model-db.qmd file contains the EDA from lab 1.
import duckdb
from pandas import get_dummies
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn import preprocessingcon = duckdb.connect('my-db.duckdb')
df = con.execute("SELECT * FROM penguins").fetchdf().dropna()
con.close()
df.head(3)X = get_dummies(df[['bill_length_mm', 'species', 'sex']], drop_first = True)
y = df['body_mass_g']
model = LinearRegression().fit(X, y)print(f"R^2 {model.score(X,y)}")
print(f"Intercept {model.intercept_}")
print(f"Columns {X.columns}")
print(f"Coefficients {model.coef_}")app-api.py creates an Python Shiny application that calls the vetiver API.
from shiny import App, render, ui, reactive
import requests
api_url = 'http://127.0.0.1:8080/predict'
app_ui = ui.page_fluid(
ui.panel_title("Penguin Mass Predictor"),
ui.layout_sidebar(
ui.panel_sidebar(
[ui.input_slider("bill_length", "Bill Length (mm)", 30, 60, 45, step = 0.1),
ui.input_select("sex", "Sex", ["Male", "Female"]),
ui.input_select("species", "Species", ["Adelie", "Chinstrap", "Gentoo"]),
ui.input_action_button("predict", "Predict")]
),
ui.panel_main(
ui.h2("Penguin Parameters"),
ui.output_text_verbatim("vals_out"),
ui.h2("Predicted Penguin Mass (g)"),
ui.output_text("pred_out")
)
)
)def server(input, output, session):
@reactive.Calc
def vals():
d = {
"bill_length_mm" : input.bill_length(),
"sex_Male" : input.sex() == "Male",
"species_Gentoo" : input.species() == "Gentoo",
"species_Chinstrap" : input.species() == "Chinstrap"
}
return d
@reactive.Calc
@reactive.event(input.predict)
def pred():
r = requests.post(api_url, json = vals())
return r.json().get('predict')[0]
@output
@render.text
def vals_out():
return f"{vals()}"
@output
@render.text
def pred_out():
return f"{round(pred())}"app = App(app_ui, server)import duckdb
from palmerpenguins import penguins
con = duckdb.connect('my-db.duckdb')
df = penguins.load_penguins()
con.execute('CREATE TABLE penguins AS SELECT * FROM df')
con.close()The R App API files can be found in the _labs/lab03/ folder.
_labs/lab03/
├── app-api.R
├── eda-db.qmd
└── r-data-load.RThe eda-db.qmd file has the EDA R code from lab 1.
It begins by loading the libraries:
library(DBI) # add this
library(duckdb) # add this
library(palmerpenguins) # add this
library(dplyr)
library(ggplot2)
library(dbplyr)The code below creates a connection to the database, but needs to be changed slightly so the penguins data is registered with duckdb (before the queries can be run).
# load penguins data first
penguins_data <- palmerpenguins::penguins
# then connect
con <- DBI::dbConnect(
duckdb::duckdb(),
dbdir = "my-db.duckdb"
)
# register table
duckdb_register(con, "penguins_data", penguins_data)
# create table
dbExecute(con, "CREATE OR REPLACE TABLE penguins AS SELECT * FROM penguins_data")
# create df
df <- dbGetQuery(con, "SELECT * FROM penguins") |> na.omit()The table below displays the means of the measurements (body mass, flipper length, and bill length/depth) by species and sex:
df %>%
group_by(species, sex) %>%
summarise(
across(
ends_with("mm") | ends_with("g"),
\(x) mean(x, na.rm = TRUE)
)
) %>%
dplyr::collect() %>%
knitr::kable()The graph displays the best-fit linear relationship between bill_length_mm and body_mass_g by species:
df %>%
ggplot(
aes(
x = bill_length_mm,
y = body_mass_g,
color = species)
) +
geom_point() +
geom_smooth(method = "lm")And we finish by closing the connection to the database.
DBI::dbDisconnect(con)app-api.R creates an R Shiny application that calls the vetiver API. The top of the file creates an api_url:
api_url <- "http://127.0.0.1:8080/predict"The UI definition is below:
ui <- fluidPage(
titlePanel("Penguin Mass Predictor"),
# Model input values
sidebarLayout(
sidebarPanel(
sliderInput(
"bill_length",
"Bill Length (mm)",
min = 30,
max = 60,
value = 45,
step = 0.1
),
selectInput(
"sex",
"Sex",
c("Male", "Female")
),
selectInput(
"species",
"Species",
c("Adelie", "Chinstrap", "Gentoo")
),
# Get model predictions
actionButton(
"predict",
"Predict"
)
),
mainPanel(
h2("Penguin Parameters"),
verbatimTextOutput("vals"),
h2("Predicted Penguin Mass (g)"),
textOutput("pred")
)
)
)The server definition is below:
server <- function(input, output) {
# Input params
vals <- reactive(
list(
bill_length_mm = input$bill_length,
species_Chinstrap = input$species == "Chinstrap",
species_Gentoo = input$species == "Gentoo",
sex_male = input$sex == "Male"
)
)
# Fetch prediction from API
pred <- eventReactive(
input$predict,
httr2::request(api_url) |>
httr2::req_body_json(vals()) |>
httr2::req_perform() |>
httr2::resp_body_json(),
ignoreInit = TRUE
)
# Render to UI
output$pred <- renderText(pred()$predict[[1]])
output$vals <- renderPrint(vals())
}shinyApp(ui = ui, server = server)con <- DBI::dbConnect(duckdb::duckdb(), dbdir = "my-db.duckdb")
DBI::dbWriteTable(con, "penguins", palmerpenguins::penguins)
DBI::dbDisconnect(con)This can be found in /_labs/lab02/model-vetiver/model-vetiver.qmd↩︎
This can be found in _labs/lab02/R/model-vetiver-r.qmd.↩︎