---
title: "Model"
format:
  html:
    code-fold: true
---
```{python}
from palmerpenguins import penguins
from pandas import get_dummies
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn import preprocessing
```
## Get Data
```{python}
df = penguins.load_penguins().dropna()
df.head(3)
```
## Define Model and Fit
```{python}
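# One-hot encode the categorical predictors; drop_first leaves one baseline level per column
X = get_dummies(df[['bill_length_mm', 'species', 'sex']], drop_first = True)
# Target: body mass in grams
y = df['body_mass_g']
model = LinearRegression().fit(X, y)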
```
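To see what `drop_first = True` does to the design matrix, the sketch below runs `get_dummies` on a tiny hand-made frame. The three rows and the names `toy` and `X_toy` are illustrative assumptions, not rows from the penguins data. The numeric column passes through unchanged, and the alphabetically first level of each categorical column (`Adelie`, `female`) is dropped and becomes the baseline, so a row with zeros in every indicator column represents a female Adelie penguin.

```python
import pandas as pd

# Hypothetical rows for illustration only (not taken from the dataset).
toy = pd.DataFrame({
    "bill_length_mm": [39.0, 48.5, 46.0],
    "species": ["Adelie", "Chinstrap", "Gentoo"],
    "sex": ["female", "male", "female"],
})

# Each categorical column becomes one indicator per level,
# minus the first level, which serves as the baseline.
X_toy = pd.get_dummies(toy, drop_first=True)
print(list(X_toy.columns))
```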
## Get some information
```{python}
print(f"R^2 {model.score(X,y)}")
print(f"Intercept {model.intercept_}")
print(f"Columns {X.columns}")
print(f"Coefficients {model.coef_}")
```
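For a regressor, `model.score(X, y)` returns the coefficient of determination, $R^2 = 1 - SS_{res} / SS_{tot}$. A minimal sketch of that formula in plain Python follows. The function name `r_squared` and the four-point example are assumptions for illustration and are unrelated to the penguins data. A value of 1 means perfect predictions, and predicting the mean of `y` for every row gives 0.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: squared prediction errors.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared deviations from the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical observations and predictions.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(round(r_squared(y_true, y_pred), 4))  # 0.9486
```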
## Modeling in Python

The `model-basic-py.qmd` file fits a linear regression model using the Python libraries palmerpenguins, pandas, numpy, and scikit-learn.

I've added the reticulate code below so that Quarto can run this Python code.
```r
library(reticulate)

# Bind reticulate to the "myenv" virtual environment
use_virtualenv("myenv", required = TRUE)

# Install the Python packages the model needs into "myenv"
virtualenv_install(
  envname = "myenv",
  packages = c("palmerpenguins", "pandas", "numpy",
               "scikit-learn", "jupyter"))
```

Output of `df.head(3)` in the rendered document:

```
  species     island  bill_length_mm  ...  body_mass_g     sex  year
0  Adelie  Torgersen            39.1  ...       3750.0    male  2007
1  Adelie  Torgersen            39.5  ...       3800.0  female  2007
2  Adelie  Torgersen            40.3  ...       3250.0  female  2007

[3 rows x 8 columns]
```

Output of the final chunk, which reports the fit:

```
R^2 0.8555368759537614
Intercept 2169.2697209393973
Columns Index(['bill_length_mm', 'species_Chinstrap', 'species_Gentoo', 'sex_male'], dtype='object')
Coefficients [ 32.53688677 -298.76553447 1094.86739145 547.36692408]
```
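The intercept and coefficients above define the fitted equation, so a prediction can be reproduced by hand. The sketch below does that for a hypothetical male Gentoo penguin with a 45 mm bill (an assumed input, not a row from the data), using the printed values. Because the predictors were encoded with `drop_first = True`, Adelie and female are the baseline levels and contribute nothing beyond the intercept.

```python
# Values copied from the printed model output.
intercept = 2169.2697209393973
coefs = {
    "bill_length_mm": 32.53688677,
    "species_Chinstrap": -298.76553447,
    "species_Gentoo": 1094.86739145,
    "sex_male": 547.36692408,
}

# Hypothetical penguin: male Gentoo with a 45 mm bill.
penguin = {
    "bill_length_mm": 45.0,
    "species_Chinstrap": 0,
    "species_Gentoo": 1,
    "sex_male": 1,
}

# Linear model: intercept plus the sum of coefficient * feature value.
predicted_mass_g = intercept + sum(coefs[k] * penguin[k] for k in coefs)
print(round(predicted_mass_g, 1))  # 5275.7
```

Calling `model.predict` on a one-row frame with the same four columns should give the same number, up to the rounding of the printed coefficients.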