A quick reference to the Python concepts used throughout this book, aligned with the R Reference so you can map ideas between the two languages.
Functions perform operations (calculate, transform, model, graph) on objects that contain information (blood pressure measurements, monthly sales, political party affiliation, etc.).
def percent_of_income(expense, income):
    return (expense / income) * 100

percent_of_income(expense=1500, income=5000)
#> 30.0

The pattern is def name(inputs): followed by an indented body. Values are returned explicitly with return.
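Because R returns the last evaluated expression implicitly, the explicit return is the part most likely to trip up readers coming from R. The sketch below is illustrative only: the two function names are hypothetical and not from the book. It shows what happens when return is left out.

```python
# Hypothetical names, used only to contrast the two cases.
def percent_with_return(expense, income):
    return (expense / income) * 100


def percent_without_return(expense, income):
    (expense / income) * 100  # computed, then discarded: nothing is returned


with_return = percent_with_return(expense=1500, income=5000)
without_return = percent_without_return(expense=1500, income=5000)

print(with_return)     # 30.0
print(without_return)  # None
```

A Python function that reaches the end of its body without a return statement hands back None.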
Python’s data containers come from two places: the language and its standard library (scalars, list, dict, datetime), and the third-party libraries numpy (arrays) and polars (Series, DataFrame, Categorical, Enum). The library types, together with the standard-library datetime module, fill R’s “typed vector” role.
The closest equivalent to R’s typed one-dimensional vector is a numpy array (a single dtype).
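To make "a single dtype" concrete, here is a minimal sketch of how numpy coerces mixed inputs to one common type, much as R's c() does. The variable names are illustrative and not from the book.

```python
import numpy as np

ints = np.array([1, 2, 3])          # all integers -> an integer dtype
mixed = np.array([1, 2.5, 3])       # integer + float -> upcast to float64
with_text = np.array([1, "apple"])  # number + string -> everything becomes a string

print(ints.dtype.kind)       # i  (integer)
print(mixed.dtype)           # float64
print(with_text.dtype.kind)  # U  (unicode string)
```

The dtype.kind codes are used here because the exact integer and string widths can vary by platform and input.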
The four atomic types: logical (bool), integer, double (float), and character (str).
import numpy as np

np.array([True, False, True]) # logical (bool)
#> array([ True, False,  True])
np.array([1, 2, 3], dtype="int64") # integer
#> array([1, 2, 3])
np.array([1.5, 2.0, 3.14]) # double (float64)
#> array([1.5 , 2.  , 3.14])
np.array(["apple", "banana", "cherry"]) # character (string)
#> array(['apple', 'banana', 'cherry'], dtype='<U6')

Python’s standard-library datetime module supplies the scalar analogues of R’s date types (dates, date-times, durations), and polars provides the categorical/enum types for R’s factor:
from datetime import date, datetime, timedelta
import polars as pl

date(2026, 1, 15) # date
#> datetime.date(2026, 1, 15)
datetime(2026, 1, 15, 9, 30, 0) # date-time
#> datetime.datetime(2026, 1, 15, 9, 30)
timedelta(hours=2) # duration
#> datetime.timedelta(seconds=7200)
pl.Series(["low", "med", "high"],
          dtype=pl.Enum(["low", "med", "high"])) # factor

| enum |
|---|
| "low" |
| "med" |
| "high" |
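The date, date-time, and duration types above combine through ordinary arithmetic. The following is a small sketch using only the standard library; the variable names are illustrative and not from the book.

```python
from datetime import date, datetime, timedelta

start = date(2026, 1, 15)
meeting = datetime(2026, 1, 15, 9, 30, 0)

# Adding a duration to a date-time gives a new date-time.
end_of_meeting = meeting + timedelta(hours=2)
print(end_of_meeting)  # 2026-01-15 11:30:00

# Subtracting one date from another gives a duration (timedelta).
gap = date(2026, 2, 1) - start
print(gap.days)  # 17
```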
Two-dimensional numpy arrays (the counterpart of R’s matrix).

np.arange(1, 7).reshape(2, 3)
#> array([[1, 2, 3],
#>        [4, 5, 6]])

Multidimensional numpy arrays (ndarray), the counterpart of R’s array.
np.arange(1, 13).reshape(2, 3, 2)
#> array([[[ 1,  2],
#>         [ 3,  4],
#>         [ 5,  6]],
#>
#>        [[ 7,  8],
#>         [ 9, 10],
#>         [11, 12]]])

Rectangular objects: polars.DataFrame is the equivalent of R’s data frame / tibble. Its expression API (filter, select, with_columns, group_by().agg()) mirrors dplyr closely.
pl.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age": [34, 28, 41],
    "paid": [True, False, True],
})

| name | age | paid |
|---|---|---|
| str | i64 | bool |
| "Alice" | 34 | true |
| "Bob" | 28 | false |
| "Carol" | 41 | true |
Recursive objects: Python’s list is the closest match to R’s list, but its elements are unnamed; dict is the closer fit when elements have names.
[1, 2, 3, "Alice", [True, False]]
#> [1, 2, 3, 'Alice', [True, False]]

{
    "numbers": [1, 2, 3],
    "name": "Alice",
    "flags": [True, False],
    "nested": {"a": 1, "b": "two"},
}
#> {'numbers': [1, 2, 3], 'name': 'Alice', 'flags': [True, False], 'nested': {'a': 1, 'b': 'two'}}
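The practical difference between the two containers shows up when you pull elements back out: a list is indexed by position, counting from 0 where R counts from 1, while a dict is indexed by key. Here is a brief sketch; the variable names are illustrative and not from the book.

```python
record_list = [1, 2, 3, "Alice", [True, False]]
record_dict = {
    "numbers": [1, 2, 3],
    "name": "Alice",
    "flags": [True, False],
    "nested": {"a": 1, "b": "two"},
}

# Lists: access by zero-based position, chaining brackets for nested elements.
print(record_list[3])     # Alice
print(record_list[4][0])  # True

# Dicts: access by key, chaining brackets for nested dicts.
print(record_dict["name"])         # Alice
print(record_dict["nested"]["b"])  # two
```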