Python Reference

A quick reference to the Python concepts used throughout this book, aligned with the R Reference so you can map ideas between the two languages.

Functions in Python

Functions perform operations (calculate, transform, model, graph) on objects that hold information (blood pressure measurements, monthly sales, political party affiliation, etc.).

def percent_of_income(expense, income):
    return (expense / income) * 100

percent_of_income(expense=1500, income=5000)
#> 30.0

The pattern is def name(inputs): followed by an indented body. Values are returned explicitly with return.
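As a small illustration of that pattern, the sketch below calls the same function by position, by keyword, and with a default argument. The `income=5000` default and the `log_expense` helper are hypothetical, added only for this example. One difference from R is worth noting: a Python function that never reaches `return` gives back `None`, not the value of its last expression.

```python
def percent_of_income(expense, income=5000):
    """Return expense as a percentage of income (the default is illustrative)."""
    return (expense / income) * 100


def log_expense(expense):
    """Hypothetical helper with no return statement."""
    expense * 2  # computed, then discarded


by_position = percent_of_income(1500, 5000)                 # matched by order
by_keyword = percent_of_income(income=5000, expense=1500)   # matched by name
with_default = percent_of_income(1500)                      # income falls back to 5000
no_return = log_expense(1500)                               # None, not 3000

print(by_position, by_keyword, with_default, no_return)
```

Keyword calls make the argument order irrelevant, which is why the second call gives the same result as the first.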

Types of objects in Python

Python’s data containers come from two places: built-in and standard-library types (scalars, list, dict, datetime) and third-party library types from numpy (arrays) and polars (Series, DataFrame, Categorical, Enum). The numpy and polars types (together with Python’s datetime module) fill R’s “typed vector” role.

Vectors

The closest equivalent to R’s typed one-dimensional vector is a numpy array (a single dtype).

Atomic vectors

R’s four most common atomic types and their Python counterparts: logical (bool), integer, double (float), and character (str).

import numpy as np

np.array([True, False, True])              # logical (bool)
#> array([ True, False,  True])
np.array([1, 2, 3], dtype="int64")         # integer
#> array([1, 2, 3])
np.array([1.5, 2.0, 3.14])                 # double (float64)
#> array([1.5 , 2.  , 3.14])
np.array(["apple", "banana", "cherry"])    # character (string)
#> array(['apple', 'banana', 'cherry'], dtype='<U6')
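Because a numpy array holds a single dtype, mixed inputs are coerced to one common type, much as R’s `c()` does. A minimal sketch of that behavior, with illustrative variable names:

```python
import numpy as np

mixed_numbers = np.array([1, 2.5, 3])   # integers are promoted to float64
mixed_text = np.array([1, "a"])         # every element becomes a string

print(mixed_numbers.dtype)       # float64
print(mixed_text.dtype.kind)     # 'U' (unicode string)

# Arithmetic is elementwise, as with R vectors
doubled = mixed_numbers * 2
print(doubled)
```

Checking `dtype.kind` rather than the full dtype avoids depending on the exact string width numpy chooses.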

Typed objects

Python’s standard-library datetime module supplies the scalar analogues of R’s date types (dates, date-times, durations), and polars provides the Categorical and Enum types that play the role of R’s factor:

from datetime import date, datetime, timedelta
import polars as pl

date(2026, 1, 15)                                              # date
#> datetime.date(2026, 1, 15)
datetime(2026, 1, 15, 9, 30, 0)                                # date-time
#> datetime.datetime(2026, 1, 15, 9, 30)
timedelta(hours=2)                                             # duration
#> datetime.timedelta(seconds=7200)
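These three types also combine arithmetically, which is where the duration type earns its keep. The sketch below is illustrative only; the variable names and the 30-day window are assumptions, not values from this book.

```python
from datetime import date, datetime, timedelta

start = date(2026, 1, 15)
due = start + timedelta(days=30)            # date + duration -> date

opened = datetime(2026, 1, 15, 9, 30, 0)
closed = datetime(2026, 1, 15, 11, 30, 0)
elapsed = closed - opened                   # date-time difference -> duration

print(due)                             # 2026-02-14
print(elapsed)                         # 2:00:00
print(elapsed == timedelta(hours=2))   # True
```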

pl.Series(["low", "med", "high"],
          dtype=pl.Enum(["low", "med", "high"]))               # factor
#> shape: (3,)
#> Series: '' [enum]
#> [
#> 	"low"
#> 	"med"
#> 	"high"
#> ]

Matrices

Two-dimensional numpy arrays.

np.arange(1, 7).reshape(2, 3)
#> array([[1, 2, 3],
#>        [4, 5, 6]])
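One point to watch when mapping from R: numpy fills a reshaped array row by row by default, whereas R’s `matrix()` fills column by column. The sketch below, with illustrative variable names, shows both fill orders along with zero-based indexing and transposition.

```python
import numpy as np

row_major = np.arange(1, 7).reshape(2, 3)             # numpy default (C order)
col_major = np.arange(1, 7).reshape(2, 3, order="F")  # fills down columns, like R

print(row_major)
print(col_major)
print(row_major.shape)     # (2, 3)
print(row_major[0, 1])     # 2: zero-based, row index then column index
print(row_major.T.shape)   # (3, 2)
```

Passing `order="F"` reproduces the layout of R’s `matrix(1:6, nrow = 2)`.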

Arrays

Multidimensional numpy arrays (ndarray).

np.arange(1, 13).reshape(2, 3, 2)
#> array([[[ 1,  2],
#>         [ 3,  4],
#>         [ 5,  6]],
#> 
#>        [[ 7,  8],
#>         [ 9, 10],
#>         [11, 12]]])

Data frames

Rectangular objects: polars.DataFrame is the equivalent of R’s data frame / tibble. Its expression API (filter, select, with_columns, group_by().agg()) mirrors dplyr closely.

pl.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age":  [34, 28, 41],
    "paid": [True, False, True],
})
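#> shape: (3, 3)
#> ┌───────┬─────┬───────┐
#> │ name  ┆ age ┆ paid  │
#> │ ---   ┆ --- ┆ ---   │
#> │ str   ┆ i64 ┆ bool  │
#> ╞═══════╪═════╪═══════╡
#> │ Alice ┆ 34  ┆ true  │
#> │ Bob   ┆ 28  ┆ false │
#> │ Carol ┆ 41  ┆ true  │
#> └───────┴─────┴───────┘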

Lists & dicts

Recursive objects: Python’s list is the closest match to R’s list, but its elements have no names; dict is the closer fit when elements have names.

[1, 2, 3, "Alice", [True, False]]
#> [1, 2, 3, 'Alice', [True, False]]
{
    "numbers": [1, 2, 3],
    "name":    "Alice",
    "flags":   [True, False],
    "nested":  {"a": 1, "b": "two"},
}
#> {'numbers': [1, 2, 3], 'name': 'Alice', 'flags': [True, False], 'nested': {'a': 1, 'b': 'two'}}
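The practical difference between the two containers shows up when pulling elements back out: lists are indexed by position, dicts by key. A short sketch using the same values as above, with illustrative variable names:

```python
record_list = [1, 2, 3, "Alice", [True, False]]
record_dict = {
    "numbers": [1, 2, 3],
    "name": "Alice",
    "flags": [True, False],
    "nested": {"a": 1, "b": "two"},
}

print(record_list[3])               # 'Alice' (positions start at 0)
print(record_list[4][0])            # True: index into the nested list
print(record_dict["name"])          # 'Alice', like x$name or x[["name"]] in R
print(record_dict["nested"]["b"])   # 'two'
print(list(record_dict.keys()))     # comparable to names(x) in R
```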