Modeling in Python

Published 2025-09-01

The `model-basic-py.qmd` file fits a linear regression model of penguin body mass using Python’s palmerpenguins, numpy, pandas, and scikit-learn libraries.

Contents of `model-basic-py.qmd`:


````markdown
---
title: "Model"
format:
  html:
    code-fold: true
---

```{python}
from palmerpenguins import penguins
from pandas import get_dummies
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn import preprocessing
```

## Get Data

```{python}
df = penguins.load_penguins().dropna()

df.head(3)
```

## Define Model and Fit

```{python}
X = get_dummies(df[['bill_length_mm', 'species', 'sex']], drop_first = True)
y = df['body_mass_g']

model = LinearRegression().fit(X, y)
```

## Get some information

```{python}
print(f"R^2 {model.score(X,y)}")
print(f"Intercept {model.intercept_}")
print(f"Columns {X.columns}")
print(f"Coefficients {model.coef_}")
```
````

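To run this Python code in Quarto through R, I’ve added the reticulate code below. It sets up a virtual environment named `myenv` with the required packages and tells reticulate to use it.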

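```r
library(reticulate)
# Create the "myenv" virtual environment on first run
if (!virtualenv_exists("myenv")) virtualenv_create("myenv")

# Install the Python packages the model needs into "myenv"
virtualenv_install(
  envname = "myenv",
  packages = c("palmerpenguins", "pandas", "numpy",
               "scikit-learn", "jupyter"))

# Point reticulate at "myenv" before any Python chunk runs
use_virtualenv("myenv", required = TRUE)
```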
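If a Python chunk fails to import one of these libraries, it can help to check from Python which interpreter reticulate actually started. The sketch below is one way to do that; the helper names `in_virtualenv` and `missing_modules` are hypothetical, and it assumes the environment was created with the standard `venv` module or a recent `virtualenv`, both of which typically make `sys.prefix` differ from `sys.base_prefix`.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """Return True when this interpreter appears to run inside a virtual environment.

    Assumption: venv and recent virtualenv releases set sys.prefix to the
    environment directory while sys.base_prefix points at the base Python.
    """
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def missing_modules(names):
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names of the modeling packages (scikit-learn is imported as sklearn)
required = ["palmerpenguins", "pandas", "numpy", "sklearn"]

print(f"Python executable: {sys.executable}")
print(f"Inside a virtual environment: {in_virtualenv()}")
print(f"Missing modules: {missing_modules(required)}")
```

If `sys.executable` does not point inside `myenv`, reticulate most likely picked a different Python before `use_virtualenv()` ran.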
```python
from palmerpenguins import penguins
from pandas import get_dummies
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn import preprocessing
```
```python
df = penguins.load_penguins().dropna()

df.head(3)
```

```
  species     island  bill_length_mm  ...  body_mass_g     sex  year
0  Adelie  Torgersen            39.1  ...       3750.0    male  2007
1  Adelie  Torgersen            39.5  ...       3800.0  female  2007
2  Adelie  Torgersen            40.3  ...       3250.0  female  2007

[3 rows x 8 columns]
```
```python
X = get_dummies(df[['bill_length_mm', 'species', 'sex']], drop_first = True)
y = df['body_mass_g']

model = LinearRegression().fit(X, y)
```
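`LinearRegression` fits ordinary least squares: it chooses the intercept and coefficients that minimise the sum of squared residuals. As a sketch of what that means, the code below fits a model of the same shape (one numeric feature plus two 0/1 dummies) on a small synthetic data set, made up for illustration rather than taken from the penguins data, and checks that the result matches NumPy's least-squares solver once a column of ones is added for the intercept. The `_demo` names are chosen so the sketch does not overwrite `X`, `y`, or `model`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data for illustration only: one numeric feature plus two 0/1 dummies
rng = np.random.default_rng(0)
n = 50
X_demo = np.column_stack([
    rng.normal(45, 5, n),    # numeric feature, e.g. a length in mm
    rng.integers(0, 2, n),   # first dummy variable
    rng.integers(0, 2, n),   # second dummy variable
]).astype(float)
y_demo = 2000.0 + X_demo @ np.array([30.0, -300.0, 1000.0]) + rng.normal(0, 50, n)

# scikit-learn's least-squares fit
demo_model = LinearRegression().fit(X_demo, y_demo)

# The same problem solved directly: prepend a column of ones so that
# the first entry of the solution plays the role of the intercept
X1 = np.column_stack([np.ones(n), X_demo])
beta, *_ = np.linalg.lstsq(X1, y_demo, rcond=None)

print("sklearn:", demo_model.intercept_, demo_model.coef_)
print("lstsq:  ", beta[0], beta[1:])
```

Both approaches solve the same minimisation problem, so the two printed lines should agree up to floating-point noise.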
```python
print(f"R^2 {model.score(X,y)}")
```

```
R^2 0.8555368759537614
```
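For a regressor, `score()` returns the coefficient of determination, R^2 = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of the target from its mean. An in-sample value of about 0.86 therefore means the model accounts for roughly 86% of the variance in body mass on the same rows it was fitted to. The sketch below, on made-up data, computes the formula by hand and compares it with `score()` and `sklearn.metrics.r2_score`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Small synthetic data set, for illustration only
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(40, 2))
y_demo = 3.0 + X_demo @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=40)

demo_model = LinearRegression().fit(X_demo, y_demo)
y_hat = demo_model.predict(X_demo)

# R^2 = 1 - (residual sum of squares) / (total sum of squares around the mean)
ss_res = np.sum((y_demo - y_hat) ** 2)
ss_tot = np.sum((y_demo - y_demo.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, demo_model.score(X_demo, y_demo), r2_score(y_demo, y_hat))
```

Because R^2 here is measured on the training data, it says how well the model fits those rows, not how well it would predict new penguins.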
```python
print(f"Intercept {model.intercept_}")
print(f"Columns {X.columns}")
```

```
Intercept 2169.2697209393973
Columns Index(['bill_length_mm', 'species_Chinstrap', 'species_Gentoo', 'sex_male'], dtype='object')
```
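The column names show how `get_dummies(..., drop_first = True)` encoded the categorical predictors: each categorical column gets one 0/1 column per level except its first level in sorted order, which becomes the baseline. Here that baseline is an Adelie female, so there is no `species_Adelie` or `sex_female` column, and those levels are absorbed into the intercept. The toy frame below (three made-up rows, not from the data set) reproduces the encoding; `dtype=int` is added only so the printout shows 0/1 rather than True/False.

```python
import pandas as pd

# Three made-up rows with the same column names as the penguins data
toy = pd.DataFrame({
    "bill_length_mm": [39.1, 46.5, 50.0],
    "species": ["Adelie", "Chinstrap", "Gentoo"],
    "sex": ["male", "female", "male"],
})

# drop_first=True drops the first sorted level of each categorical column
# (Adelie and female here); numeric columns are passed through unchanged
dummies = pd.get_dummies(toy, drop_first=True, dtype=int)

print(list(dummies.columns))
print(dummies)
```

The Adelie row has 0 in both species columns, which is exactly how the model represents the baseline species.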
```python
print(f"Coefficients {model.coef_}")
```

```
Coefficients [  32.53688677 -298.76553447 1094.86739145  547.36692408]
```
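The coefficients line up with the columns in the same order, so, holding the other predictors fixed, each extra millimetre of bill length adds about 32.5 g, Chinstrap penguins are about 299 g lighter than Adelie, Gentoo penguins are about 1095 g heavier than Adelie, and males are about 547 g heavier than females. A prediction is the intercept plus each coefficient times its column value. The sketch below plugs the printed numbers into that formula for a hypothetical male Gentoo with a 47 mm bill (an example input, not a row from the data).

```python
import pandas as pd

# Intercept and coefficients copied from the printed output above
intercept = 2169.2697209393973
coef = pd.Series(
    [32.53688677, -298.76553447, 1094.86739145, 547.36692408],
    index=["bill_length_mm", "species_Chinstrap", "species_Gentoo", "sex_male"],
)

# Hypothetical penguin: male Gentoo with a 47 mm bill (Adelie and female are the baselines)
penguin = pd.Series(
    {"bill_length_mm": 47.0, "species_Chinstrap": 0, "species_Gentoo": 1, "sex_male": 1}
)

# Linear model: intercept + sum of (coefficient * feature value), aligned by column name
predicted_mass = intercept + (coef * penguin).sum()
print(f"Predicted body mass: {predicted_mass:.1f} g")  # → Predicted body mass: 5340.7 g
```

With the fitted `model` available, calling `model.predict()` on a one-row DataFrame with the same four columns should give essentially the same value; small differences come only from the rounding of the printed coefficients.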