In the previous labs, we build a Python model API with vetiver (I also repeated this lab in R). This lab continues with accessing databases and using APIs.
Comprehension questions
Databases
1. Draw two mental maps for connecting to a database, one using a database driver (in a Python or R package) vs. an ODBC or JDBC driver. You should (at a minimum) include the nodes: database package, DBI (R only), driver, system driver, ODBC, JDBC, and database.
Python and databases
In Python, there is a large database ecosystem (i.e., many driver options). ORM support with SQLAlchemy and it’s flexible (i.e., it can bypass standards if needed).
%%{init: {'theme': 'neutral', 'look': 'handDrawn', 'themeVariables': { 'fontFamily': 'monospace'}}}%%
graph TB
subgraph PyApp["Python App"]
PyCode(["<strong>app-api.py</strong>"])
end
style PyCode fill:#d8e4ff,color:#000
subgraph PyPkg["<strong>Python Packages</strong>"]
DBPkg("DB Package<br/><strong>psycopg2</strong>, <strong>pymysql</strong>, <strong>sqlite3</strong>")
ODBCPkg("ODBC Package<br/><strong>pyodbc</strong>")
JDBCPkg("JDBC Package<br/><strong>jaydebeapi</strong>")
end
%%% Tints of #35605a
style PyPkg fill:#c2cfcd,color:#000
subgraph Driver["<strong>Driver Layer</strong>"]
NatDrv("Native Driver<br/>libpq, MySQL C API")
ODBCDrv("ODBC Driver<br/>Vendor-specific")
JDBCDrv("JDBC Driver<br/>.jar files")
end
%% Tints of #6b818c
style Driver fill:#d2d9dc,color:#000
subgraph System["<strong>System Layer</strong>"]
ODBCSys("ODBC Manager<br/>System ODBC")
JAVASys("Java Runtime<br/>JVM/JRE")
end
%% Tints of #00120b
style System fill:#cccfce
subgraph DB["Database"]
DBServ[("<strong>Database Server</strong>")]
end
style DBServ fill:#31e981
%% Direct pathway (bypasses system layer)
PyCode --> DBPkg
DBPkg --> NatDrv
NatDrv --"`Bypass <strong>System Layer</strong>`"--> DBServ
%% ODBC pathway
PyCode --> ODBCPkg
ODBCPkg --> ODBCDrv
ODBCDrv --> ODBCSys
ODBCSys --> DBServ
%% JDBC pathway
PyCode --> JDBCPkg
JDBCPkg --> JDBCDrv
JDBCDrv --> JAVASys
JAVASys --> DBServ
Python DB Connections
R and databases
In R, the DBI package provides a universal interface (same code works across drivers). This is a simpler model because all paths converge on DBI package. There is also better integration with R’s data structures (i.e., data.frames).
%%{init: {'theme': 'neutral', 'look': 'handDrawn', 'themeVariables': { 'fontFamily': 'monospace'}}}%%
graph TB
subgraph RApp["<strong>R App</strong>"]
A(["<strong>app-api.R</strong>"])
end
style A fill:#d8e4ff,color:#000
subgraph RPkgs["<strong>R Packages</strong>"]
B("DB Packages<br/><strong>RPostgres</strong>, <strong>RMySQL</strong>, <strong>RSQLite</strong>")
C("ODBC Package<br/><strong>odbc</strong>")
D("JDBC Package<br/><strong>RJDBC</strong>")
end
%%% Tints of #35605a
style RPkgs fill:#c2cfcd,color:#000
subgraph DBI["<strong>R Standard Interface</strong>"]
E("DBI<br/>Database Interface")
end
%% Tints of #6b818c
subgraph Driver["<strong>Driver Layer</strong>"]
F("Native Driver<br/>libpq, MySQL C API")
G("ODBC Driver<br/>Vendor-specific")
H("JDBC Driver<br/>.jar files")
end
%% Tints of #6b818c
style Driver fill:#d2d9dc,color:#000
subgraph System["<strong>System Layer</strong>"]
I("ODBC Manager<br/>System ODBC")
J("Java Runtime<br/>JVM/JRE")
end
style System fill:#cccfce
subgraph DB["<strong>Database</strong>"]
K[("<strong>Database Server</strong>")]
end
style K fill:#31e981
%% All paths go through DBI
A --> B
A --> C
A --> D
B --> E
C --> E
D --> E
%% Direct pathway (bypasses system layer)
E --> F
F --"bypasses system layer"--> K
%% ODBC pathway
E --> G
G --> I
I --> K
%% JDBC pathway
E --> H
H --> J
J --> K
R Database Connection Pathways
Both R and Python native database drivers bypass the system layer and connect directly to databases. The system layer is only involved when using ODBC or JDBC connections.
%%{init: {'theme': 'neutral', 'look': 'handDrawn', 'themeVariables': { 'fontFamily': 'monospace', "fontSize":"16px"}}}%%
graph TD
subgraph RPhil["R Philosophy"]
A(["DBI as Universal Interface"]) --> B("All paths converge to DBI")
B --> C(["Consistent API regardless of driver"])
end
subgraph PyPhil["Python Philosophy"]
D(["DB-API 2.0 as Standard"]) --> E("Multiple competing approaches")
E --> F(["Choose best tool for job"])
end
style A fill:#d8e4ff
style D fill:#d8e4ff
style C fill:#31e981
style F fill:#31e981
DB Connection Mental Model Differences
APIs
2. Draw a mental map for using an API from R and Python. You should (at a minimum) include nodes for {requests}/{httr2}, request, HTTP verb/request method, headers, query parameters, body, JSON, response, and response code.
In general, APIs are like the digital ‘plumbing’ that makes connections between modern software ecosystems possible.
APIs have standardized language that allows different software systems to communicate and share functionality (without needing to understand how the internal implementation works).
What do they communicate?
What you can request (endpoints)
How to make the request (HTTP methods, parameters)
3. How can environment variables be used to keep secrets secure in your code?
In lab 2
We previously turned our Python scikit-learn model into a VetiverModel(), saved it to a pins board, and then created a VetiverAPI().1
In R, we created a linear model (with lm() and fastDummies), converted it to a vetiver_model(), put it in a board_folder(), ‘pinned’ it with vetiver_pin_write(), then converted it into an API with plumber::pr() and vetiver_api().2
The sections below cover how to create and communicate with APIs. I’ve split each lab into separate Python/ and R/ folders (and included RStudio project files).
---
title: "Model"
format:
html:
code-fold: true
---
```{python}
import duckdb
from pandas import get_dummies
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn import preprocessing
```
## Get Data
```{python}
con = duckdb.connect('my-db.duckdb')
df = con.execute("SELECT * FROM penguins").fetchdf().dropna()
con.close()
df.head(3)
```
## Define Model and Fit
```{python}
X = get_dummies(df[['bill_length_mm', 'species', 'sex']], drop_first = True)
y = df['body_mass_g']
model = LinearRegression().fit(X, y)
```
## Get some information
```{python}
print(f"R^2 {model.score(X,y)}")
print(f"Intercept {model.intercept_}")
print(f"Columns {X.columns}")
print(f"Coefficients {model.coef_}")
```
Original app file
app-api.py creates an Python Shiny application that calls the vetiver API.
The code below creates a connection to the database, but needs to be changed slightly so the penguins data is registered with duckdb (before the queries can be run).
# load penguins data firstpenguins_data <- palmerpenguins::penguins# then connectcon <- DBI::dbConnect( duckdb::duckdb(), dbdir ="my-db.duckdb")# register tableduckdb_register(con, "penguins_data", penguins_data)# create tabledbExecute(con, "CREATE OR REPLACE TABLE penguins AS SELECT * FROM penguins_data")# create df df <-dbGetQuery(con, "SELECT * FROM penguins") |>na.omit()
The table below displays the means of the measurements (body mass, flipper length, and bill length/depth) by species and sex: