Appendix B — Data files

Published

2024-09-19

Alan J. Perlis’ Epigrams

The ajperlis_epigrams.txt file contains a collection of witty programming and systems design epigrams by Alan J. Perlis.1 These statements cover various topics related to computer programming, software architecture, and philosophy. The epigrams offer nuggets of reflective and forward-thinking wisdom that provoke thought on the complexities of technology, the art of programming, and the interplay between human cognition and computational logic. This collection is invaluable for programmers and computer scientists interested in the cultural and intellectual history of computing.

Music Videos

The file music_vids.tsv is a tab-separated values (TSV) data file that details some of the most expensive music videos ever produced.2 Key information captured in the file includes:

Variable Description
rank The music video’s position based on the production cost.
title The name of the music video.
artists The artist(s) who performed the song.
director The director of the music video.
year The year the music video was released.
cost_nominal The production cost at the release time, presented in U.S. dollars.
cost_adj The production cost adjusted to current values for inflation, also presented in U.S. dollars.

Passwords

pwrds.csv is a CSV (comma-separated values) file containing a comprehensive list of commonly used passwords and attributes.3 The dataset contains information about passwords’ strength, popularity rank, and resilience against online attacks. The dataset includes the following variables:

Variable Description
password The actual password text.
rank Numerical ranking based on the frequency or commonness of the password.
strength A numerical value representing the estimated password strength, where higher numbers indicate stronger passwords.
online_crack An estimate of how long it would take to crack this password using online attack methods, expressed in units from seconds to years.

Roxanne

The file roxanne.txt contains the lyrics to the song “Roxanne” by The Police.4 The structure of the lyrics emphasizes the repeated refrain, “You don’t have to put on the red light,” which is a metaphor for not having to engage in prostitution.

Trees

The file trees.csv is a comma-separated value (CSV) document that catalogs some of the tallest trees in the world, providing detailed information on each tree listed.5 This file is structured to provide a quick reference to some of the most significant trees around the world, highlighting their impressive heights and the diverse locations they inhabit. Each entry in the dataset includes the following fields:

Variable Description
tree Identifier or common name used for the tree.
species Scientific name of the tree species.
class Biological classification (e.g., Conifer, Flowering plant).
ht_meters Height of the tree in meters.
ht_feet Height of the tree in feet.
location Specific location where the tree is found.
continent Continent on which the tree is located.
name The colloquial name given to the tree, if applicable.

Video Game Hall of Fame

The file vg_hof.csv is a comma-separated values (CSV) document that contains a list of video games inducted into a hall of fame over several years, from 2015 to 2024.6 Each record in the dataset includes the year of induction, the game’s name, the developer, and the year the game was initially released.

The fields detailed in the dataset are:

Variable Description
year The year the game was inducted into the hall of fame.
game The name of the video game.
developer The company or individual who developed the game.
year_released The original year of the game’s release.

World Health Organization Tuberculosis

who_tb_data.csv and who_tb_data.tsv are comma and tab-separated values (CSV and TSV) files that provides tuberculosis (TB) case data alongside population figures for selected countries over specific years, as reported by the World Health Organization (WHO).7

The dataset includes the following fields:

Variable Description
country The name of the country where the data was recorded.
year The year in which the data was collected.
type A descriptor of the data type, which can be either ‘cases’ indicating the number of tuberculosis cases reported, or ‘population’ representing the total population of the country in that year.
count The numerical value corresponding to the type, either the number of TB cases or the population size.

Wu-Tang Clan

wu_tang.csv is a comma-separated data file that provides the stage names and real names of the members of the Wu-Tang Clan, a highly influential hip-hop group.8 Each entry in the dataset correlates a member’s popularly known stage name with their legal name, offering insight into the identities behind the personas. The data structure includes two main fields:

Variable Description
Member The stage name of the Wu-Tang Clan member.
Name The real name of the member.

  1. Read the Wikipedia or download the original PDF.↩︎

  2. From the List of most expensive music videos on Wikipedia↩︎

  3. The original data comes from Information is Beautiful↩︎

  4. Read more about Roxanne (The Police song) on the Wikipedia page.↩︎

  5. From the List of tallest trees on Wikipedia.↩︎

  6. From the World Video Game Hall of Fame Wikipedia page.↩︎

  7. From the WHO global tuberculosis programme↩︎

  8. From the Wu-Tang Clan Wikipedia page.↩︎