Python: Reference

A quick reference to the Python concepts used throughout this book, aligned with the R Reference so you can map ideas between the two languages.

Functions in Python

Functions perform operations (calculate, transform, model, graph) on objects that hold information (blood pressure measurements, monthly sales, political party affiliation, etc.).

def percent_of_income(expense, income):
    return (expense / income) * 100

percent_of_income(expense=1500, income=5000)
#> 30.0

The pattern is def name(inputs): followed by an indented body. Values are returned explicitly with return.
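As a small extension of that pattern, the sketch below adds a default argument and shows what a call evaluates to when `return` is omitted. The names `percent_of_income_rounded`, `digits`, and `log_expense` are hypothetical, introduced only for illustration.

```python
def percent_of_income_rounded(expense, income, digits=1):
    """Return expense as a percentage of income, rounded to `digits` places."""
    return round((expense / income) * 100, digits)


def log_expense(expense):
    """Print a message; with no return statement the call evaluates to None."""
    print(f"Recorded expense: {expense}")


# Arguments can be passed by position, by keyword, or a mix of both.
by_position = percent_of_income_rounded(1500, 5000)
by_keyword = percent_of_income_rounded(income=5000, expense=1500, digits=0)
nothing = log_expense(1500)

print(by_position)   # 30.0
print(by_keyword)    # 30.0
print(nothing)       # None
```

Unlike R, where the last evaluated expression is returned implicitly, a Python function that never reaches a `return` statement yields `None`.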

Types of objects in Python

Python’s data containers come from two places: the language itself (built-in scalars, list, and dict, plus the standard-library datetime module) and library types from numpy (arrays) and polars (Series, DataFrame, Categorical, Enum). The library types, together with the datetime module, fill R’s “typed vector” role.

Vectors

The closest equivalent to R’s typed one-dimensional vector is a numpy array (a single dtype).

Atomic vectors

The four most common atomic types: logical (bool), integer (int), double (float), and character (str).

import numpy as np

np.array([True, False, True])              # logical (bool)
#> array([ True, False,  True])
np.array([1, 2, 3], dtype="int64")         # integer
#> array([1, 2, 3])
np.array([1.5, 2.0, 3.14])                 # double (float64)
#> array([1.5 , 2.  , 3.14])
np.array(["apple", "banana", "cherry"])    # character (string)
#> array(['apple', 'banana', 'cherry'], dtype='<U6')
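Because a numpy array holds a single dtype, mixed inputs are coerced to one common type, much as R’s `c()` coerces its arguments. The following is a minimal sketch of that behavior; the variable names are illustrative only.

```python
import numpy as np

# Mixed inputs are coerced to one common dtype.
ints_and_floats = np.array([1, 2.5, 3])       # integers are upcast to float64
bools_and_ints = np.array([True, 2, 3])       # booleans are upcast to integers
numbers_and_text = np.array([1, "apple"])     # everything becomes a string

print(ints_and_floats.dtype)         # float64
print(bools_and_ints.dtype.kind)     # i  (integer; exact width depends on platform)
print(numbers_and_text.dtype.kind)   # U  (Unicode string)

# Arithmetic is vectorized: it applies element by element.
doubled = ints_and_floats * 2
print(doubled.tolist())              # [2.0, 5.0, 6.0]
```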

Typed objects

Python’s built-in datetime module supplies the scalar analogues of R’s date types (dates, date-times, durations), and polars provides the categorical/enum types for R’s factor:

from datetime import date, datetime, timedelta
import polars as pl

date(2026, 1, 15)                                              # date
#> datetime.date(2026, 1, 15)
datetime(2026, 1, 15, 9, 30, 0)                                # date-time
#> datetime.datetime(2026, 1, 15, 9, 30)
timedelta(hours=2)                                             # duration
#> datetime.timedelta(seconds=7200)

pl.Series(["low", "med", "high"],
          dtype=pl.Enum(["low", "med", "high"]))               # factor

#> shape: (3,)
#> Series: '' [enum]
#> [
#> 	"low"
#> 	"med"
#> 	"high"
#> ]

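The date, date-time, and duration types combine through ordinary arithmetic. The sketch below uses only the standard-library datetime module; the variable names and the second date are chosen purely for illustration.

```python
from datetime import date, datetime, timedelta

start = datetime(2026, 1, 15, 9, 30)
end = start + timedelta(hours=2)     # date-time + duration -> date-time
elapsed = end - start                # date-time - date-time -> duration

days_apart = date(2026, 3, 1) - date(2026, 1, 15)   # date - date -> duration

print(end)               # 2026-01-15 11:30:00
print(elapsed)           # 2:00:00
print(days_apart.days)   # 45
```
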
Matrices

Two-dimensional numpy arrays.

np.arange(1, 7).reshape(2, 3)
#> array([[1, 2, 3],
#>        [4, 5, 6]])
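One difference worth keeping in mind when mapping from R: numpy fills a reshaped array row by row by default, whereas R’s `matrix()` fills column by column. The sketch below shows both fill orders along with basic shape and indexing; `m` and `m_by_col` are illustrative names.

```python
import numpy as np

m = np.arange(1, 7).reshape(2, 3)                     # filled row by row (C order)
m_by_col = np.arange(1, 7).reshape(2, 3, order="F")   # filled column by column

print(m.shape)      # (2, 3)
print(m[0, 1])      # 2  (row 0, column 1; indices start at 0)
print(m_by_col)
# [[1 3 5]
#  [2 4 6]]
print(m.T.shape)    # (3, 2)  (transpose swaps rows and columns)
```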

Arrays

Multidimensional numpy arrays (ndarray).

np.arange(1, 13).reshape(2, 3, 2)
#> array([[[ 1,  2],
#>         [ 3,  4],
#>         [ 5,  6]],
#> 
#>        [[ 7,  8],
#>         [ 9, 10],
#>         [11, 12]]])
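A short sketch of how the same array can be inspected, indexed, and reduced along one axis. The variable names are illustrative only.

```python
import numpy as np

a = np.arange(1, 13).reshape(2, 3, 2)

print(a.ndim)        # 3
print(a.shape)       # (2, 3, 2)
print(a[1, 2, 0])    # 11  (second block, third row, first column)

# Summing over axis 0 collapses the two blocks into one 3 x 2 array.
totals = a.sum(axis=0)
print(totals.tolist())   # [[8, 10], [12, 14], [16, 18]]
```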

Data frames

Rectangular objects: polars.DataFrame is the equivalent of R’s data frame / tibble. Its expression API (filter, select, with_columns, group_by().agg()) mirrors dplyr closely.

pl.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age":  [34, 28, 41],
    "paid": [True, False, True],
})

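#> shape: (3, 3)
#> ┌───────┬─────┬───────┐
#> │ name  ┆ age ┆ paid  │
#> │ ---   ┆ --- ┆ ---   │
#> │ str   ┆ i64 ┆ bool  │
#> ╞═══════╪═════╪═══════╡
#> │ Alice ┆ 34  ┆ true  │
#> │ Bob   ┆ 28  ┆ false │
#> │ Carol ┆ 41  ┆ true  │
#> └───────┴─────┴───────┘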

Lists & dicts

Recursive objects: Python’s list is the closest match to an unnamed R list, since its elements are accessed by position only; dict is the closer fit when elements have names.

[1, 2, 3, "Alice", [True, False]]
#> [1, 2, 3, 'Alice', [True, False]]

{
    "numbers": [1, 2, 3],
    "name":    "Alice",
    "flags":   [True, False],
    "nested":  {"a": 1, "b": "two"},
}
#> {'numbers': [1, 2, 3], 'name': 'Alice', 'flags': [True, False], 'nested': {'a': 1, 'b': 'two'}}
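The practical difference between the two containers shows up in how elements are retrieved: a list by position, a dict by key. A brief sketch, reusing the values above with the illustrative names `items` and `record`:

```python
items = [1, 2, 3, "Alice", [True, False]]
record = {
    "numbers": [1, 2, 3],
    "name": "Alice",
    "flags": [True, False],
    "nested": {"a": 1, "b": "two"},
}

print(items[3])                # Alice  (positions start at 0, not 1 as in R)
print(items[4][0])             # True   (indexing into the nested list)
print(record["name"])          # Alice
print(record["nested"]["b"])   # two
print(record.get("missing"))   # None   (.get avoids a KeyError for absent keys)
```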