ggplot2
NOTE: This set of exercises is a little different from previous weeks. Instead of filling in each solution, I will work through an example, then provide a template for you to use on your own (the code chunks aren't hashtag-specific, so you can swap out the hashtag in the search_tweets() function and all the code will run with a new hashtag).
View the slides for this section here.
This lesson will map Twitter data (tweets) based on a search term. We're going to use the rtweet package, which requires you to have a Twitter account. If you're going to be using this application a lot to collect data (which I hope you are!), follow the instructions in the authentication vignette.
rtweet
We're going to be accessing Twitter data for these exercises. If you'd like to follow along with your own data, check out the rtweet package for installation and setup instructions.
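The examples below assume a TweetsRaw object collected with search_tweets(). As a sketch (the hashtag and n are placeholders, and the call requires Twitter authentication), the collection step might look like:

```r
library(rtweet)

# search recent tweets for a hashtag -- swap in any hashtag you like;
# n is the maximum number of tweets to return, include_rts keeps retweets
TweetsRaw <- search_tweets(q = "#NFL", n = 18000, include_rts = TRUE)
```

This call needs a live API connection, so run it interactively after completing the authentication setup.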
The output from rtweet is rather large (90 columns):
colnames(TweetsRaw)
## [1] "user_id" "status_id"
## [3] "created_at" "screen_name"
## [5] "text" "source"
## [7] "display_text_width" "reply_to_status_id"
## [9] "reply_to_user_id" "reply_to_screen_name"
## [11] "is_quote" "is_retweet"
## [13] "favorite_count" "retweet_count"
## [15] "quote_count" "reply_count"
## [17] "hashtags" "symbols"
## [19] "urls_url" "urls_t.co"
## [21] "urls_expanded_url" "media_url"
## [23] "media_t.co" "media_expanded_url"
## [25] "media_type" "ext_media_url"
## [27] "ext_media_t.co" "ext_media_expanded_url"
## [29] "ext_media_type" "mentions_user_id"
## [31] "mentions_screen_name" "lang"
## [33] "quoted_status_id" "quoted_text"
## [35] "quoted_created_at" "quoted_source"
## [37] "quoted_favorite_count" "quoted_retweet_count"
## [39] "quoted_user_id" "quoted_screen_name"
## [41] "quoted_name" "quoted_followers_count"
## [43] "quoted_friends_count" "quoted_statuses_count"
## [45] "quoted_location" "quoted_description"
## [47] "quoted_verified" "retweet_status_id"
## [49] "retweet_text" "retweet_created_at"
## [51] "retweet_source" "retweet_favorite_count"
## [53] "retweet_retweet_count" "retweet_user_id"
## [55] "retweet_screen_name" "retweet_name"
## [57] "retweet_followers_count" "retweet_friends_count"
## [59] "retweet_statuses_count" "retweet_location"
## [61] "retweet_description" "retweet_verified"
## [63] "place_url" "place_name"
## [65] "place_full_name" "place_type"
## [67] "country" "country_code"
## [69] "geo_coords" "coords_coords"
## [71] "bbox_coords" "status_url"
## [73] "name" "location"
## [75] "description" "url"
## [77] "protected" "followers_count"
## [79] "friends_count" "listed_count"
## [81] "statuses_count" "favourites_count"
## [83] "account_created_at" "verified"
## [85] "profile_url" "profile_expanded_url"
## [87] "account_lang" "profile_banner_url"
## [89] "profile_background_url" "profile_image_url"
We only need a subset of these columns to build our map, and fortunately the rtweet package comes with some handy functions for reducing this output to a more manageable dataset.
It's always a good idea to export the raw Twitter data you've collected, because these data are always subject to change. For example, this tutorial uses data for the #NFL hashtag, which was collected on a Sunday. We're not likely to see the same data if we collected it on the following Monday (or Tuesday, for that matter).
data_path <- "../data/wk11-01_intro-to-maps/raw/"
fs::dir_create(data_path)
# make sure to use a date (or time) stamp!
rtweet::write_as_csv(x = TweetsRaw, paste0(data_path, "2021-11-07-NFL-TweetsRaw.csv"))
fs::dir_tree(data_path)
## ../data/wk11-01_intro-to-maps/raw/
## └── 2021-11-07-NFL-TweetsRaw.csv
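If you need to read an exported file back in later, rtweet ships a matching reader. A sketch, assuming the file written above:

```r
# read the archived tweets back in; read_twitter_csv() restores
# rtweet's column types better than a generic CSV reader would
TweetsRaw <- rtweet::read_twitter_csv(
  "../data/wk11-01_intro-to-maps/raw/2021-11-07-NFL-TweetsRaw.csv"
)
```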
The rtweet::users_data() function separates the 'users' variables from the 'tweet' variables.
users_data() columns
Below I combine the base::intersect() and base::names() functions to see what variables from TweetsRaw will end up in the results from rtweet::users_data() (I added tibble::as_tibble() so the variables print nicely to the screen):
tibble::as_tibble(
  base::intersect(x = base::names(rtweet::users_data(TweetsRaw)),
                  y = base::names(TweetsRaw))
)
Looks like there will be 20 variables in the output from users_data().
TweetsUsers data
We will separate the user data from the raw data and store this in TweetsUsers.
TweetsUsers <- rtweet::users_data(TweetsRaw)
glimpse(TweetsUsers)
## Rows: 7,974
## Columns: 20
## $ user_id <chr> "1274085294934032385", "127…
## $ screen_name <chr> "TFFPhilip", "TFFPhilip", "…
## $ name <chr> "ThrillsFantasyFootball", "…
## $ location <chr> "", "", "", "", "", "", "",…
## $ description <chr> "🗓Year-Round Content 🏈Data…
## $ url <chr> "https://t.co/Vk9smO8q78", …
## $ protected <lgl> FALSE, FALSE, FALSE, FALSE,…
## $ followers_count <int> 1771, 1771, 1771, 1771, 177…
## $ friends_count <int> 1601, 1601, 1601, 1601, 160…
## $ listed_count <int> 5, 5, 5, 5, 5, 5, 5, 5, 5, …
## $ statuses_count <int> 7899, 7899, 7899, 7899, 789…
## $ favourites_count <int> 8799, 8799, 8799, 8799, 879…
## $ account_created_at <dttm> 2020-06-19 21:04:25, 2020-…
## $ verified <lgl> FALSE, FALSE, FALSE, FALSE,…
## $ profile_url <chr> "https://t.co/Vk9smO8q78", …
## $ profile_expanded_url <chr> "http://instagram.com/thril…
## $ account_lang <lgl> NA, NA, NA, NA, NA, NA, NA,…
## $ profile_banner_url <chr> "https://pbs.twimg.com/prof…
## $ profile_background_url <chr> NA, NA, NA, NA, NA, NA, NA,…
## $ profile_image_url <chr> "http://pbs.twimg.com/profi…
The rtweet::tweets_data() function separates the "tweets data from users data object."
tweets_data() columns
We repeat the process from above to get a look at the columns we'll get back from the rtweet::tweets_data() function:
tibble::as_tibble(base::intersect(x = base::names(rtweet::tweets_data(TweetsRaw)),
                                  y = base::names(TweetsRaw)))
This dataset will have 68 columns.
TweetsData
We will store the output from tweets_data() in the TweetsData object.
TweetsData <- rtweet::tweets_data(TweetsRaw)
glimpse(TweetsData)
## Rows: 7,974
## Columns: 68
## $ user_id <chr> "1274085294934032385", "12…
## $ status_id <chr> "1457583138553663490", "14…
## $ created_at <dttm> 2021-11-08 05:37:55, 2021…
## $ screen_name <chr> "TFFPhilip", "TFFPhilip", …
## $ text <chr> "This week has been a toug…
## $ source <chr> "Twitter for iPhone", "Twi…
## $ display_text_width <dbl> 140, 140, 140, 140, 140, 1…
## $ reply_to_status_id <chr> NA, NA, NA, NA, NA, NA, NA…
## $ reply_to_user_id <chr> NA, NA, NA, NA, NA, NA, NA…
## $ reply_to_screen_name <chr> NA, NA, NA, NA, NA, NA, NA…
## $ is_quote <lgl> FALSE, FALSE, FALSE, FALSE…
## $ is_retweet <lgl> TRUE, TRUE, TRUE, TRUE, TR…
## $ favorite_count <int> 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ retweet_count <int> 47, 33, 59, 39, 20, 64, 34…
## $ hashtags <list> "NFL", <"WeStillDemBoyz",…
## $ symbols <list> NA, NA, NA, NA, NA, NA, N…
## $ urls_url <list> NA, NA, NA, NA, NA, NA, N…
## $ urls_t.co <list> NA, NA, NA, NA, NA, NA, N…
## $ urls_expanded_url <list> NA, NA, NA, NA, NA, NA, N…
## $ media_url <list> NA, NA, NA, NA, NA, NA, "…
## $ media_t.co <list> NA, NA, NA, NA, NA, NA, "…
## $ media_expanded_url <list> NA, NA, NA, NA, NA, NA, "…
## $ media_type <list> NA, NA, NA, NA, NA, NA, "…
## $ ext_media_url <list> NA, NA, NA, NA, NA, NA, "…
## $ ext_media_t.co <list> NA, NA, NA, NA, NA, NA, "…
## $ ext_media_expanded_url <list> NA, NA, NA, NA, NA, NA, "…
## $ ext_media_type <chr> NA, NA, NA, NA, NA, NA, NA…
## $ mentions_user_id <list> "1345765795469668355", "1…
## $ mentions_screen_name <list> "NothinBtAirtime", "JNfor…
## $ lang <chr> "en", "en", "en", "en", "e…
## $ quoted_status_id <chr> NA, NA, NA, NA, NA, NA, NA…
## $ quoted_text <chr> NA, NA, NA, NA, NA, NA, NA…
## $ quoted_created_at <dttm> NA, NA, NA, NA, NA, NA, N…
## $ quoted_source <chr> NA, NA, NA, NA, NA, NA, NA…
## $ quoted_favorite_count <int> NA, NA, NA, NA, NA, NA, NA…
## $ quoted_retweet_count <int> NA, NA, NA, NA, NA, NA, NA…
## $ quoted_user_id <chr> NA, NA, NA, NA, NA, NA, NA…
## $ quoted_screen_name <chr> NA, NA, NA, NA, NA, NA, NA…
## $ quoted_name <chr> NA, NA, NA, NA, NA, NA, NA…
## $ quoted_followers_count <int> NA, NA, NA, NA, NA, NA, NA…
## $ quoted_friends_count <int> NA, NA, NA, NA, NA, NA, NA…
## $ quoted_statuses_count <int> NA, NA, NA, NA, NA, NA, NA…
## $ quoted_location <chr> NA, NA, NA, NA, NA, NA, NA…
## $ quoted_description <chr> NA, NA, NA, NA, NA, NA, NA…
## $ quoted_verified <lgl> NA, NA, NA, NA, NA, NA, NA…
## $ retweet_status_id <chr> "1457391892673466368", "14…
## $ retweet_text <chr> "This week has been a toug…
## $ retweet_created_at <dttm> 2021-11-07 16:57:58, 2021…
## $ retweet_source <chr> "Twitter Web App", "Twitte…
## $ retweet_favorite_count <int> 46, 36, 55, 34, 21, 66, 45…
## $ retweet_user_id <chr> "1345765795469668355", "11…
## $ retweet_screen_name <chr> "NothinBtAirtime", "JNfors…
## $ retweet_name <chr> "Nothin’ But Airtime", "Jo…
## $ retweet_followers_count <int> 2731, 10809, 8528, 8528, 8…
## $ retweet_friends_count <int> 2570, 11421, 8339, 8339, 9…
## $ retweet_statuses_count <int> 8733, 15348, 29341, 29341,…
## $ retweet_location <chr> "Milwaukee, WI", "New York…
## $ retweet_description <chr> "🏀 Hosted by @craines38, …
## $ retweet_verified <lgl> FALSE, FALSE, FALSE, FALSE…
## $ place_url <chr> NA, NA, NA, NA, NA, NA, NA…
## $ place_name <chr> NA, NA, NA, NA, NA, NA, NA…
## $ place_full_name <chr> NA, NA, NA, NA, NA, NA, NA…
## $ place_type <chr> NA, NA, NA, NA, NA, NA, NA…
## $ country <chr> NA, NA, NA, NA, NA, NA, NA…
## $ country_code <chr> NA, NA, NA, NA, NA, NA, NA…
## $ geo_coords <list> <NA, NA>, <NA, NA>, <NA, …
## $ coords_coords <list> <NA, NA>, <NA, NA>, <NA, …
## $ bbox_coords <list> <NA, NA, NA, NA, NA, NA, …
You may have noticed these data don't have latitude or longitude values; we will add these variables below.
lat_lng() columns
If we look at the help info on the rtweet::lat_lng() function, we can see that it will only add two columns to TweetsRaw: lat and lng (for latitude and longitude).
This would still result in quite a few variables, but we don't need all the variables from TweetsRaw.
Fortunately, we now know how to use dplyr's select() function to get only the variables we want from TweetsRaw: user_id, created_at, screen_name, text, retweet_count, favorite_count, country, location, country_code, friends_count, and the new lat and lng variables.
TweetsLatLng <- rtweet::lat_lng(TweetsRaw) %>%
select(user_id, created_at, screen_name, text,
retweet_count, favorite_count, country,
location, country_code, friends_count,
lat, lng)
glimpse(TweetsLatLng)
## Rows: 7,974
## Columns: 12
## $ user_id <chr> "1274085294934032385", "12740852949…
## $ created_at <dttm> 2021-11-08 05:37:55, 2021-11-08 05…
## $ screen_name <chr> "TFFPhilip", "TFFPhilip", "TFFPhili…
## $ text <chr> "This week has been a tough one. St…
## $ retweet_count <int> 47, 33, 59, 39, 20, 64, 34, 31, 39,…
## $ favorite_count <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0,…
## $ country <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ location <chr> "", "", "", "", "", "", "", "", "",…
## $ country_code <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ friends_count <int> 1601, 1601, 1601, 1601, 1601, 1601,…
## $ lat <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ lng <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
lat & lng observations
It's always good to check how many valid observations we have for the lat and lng columns, because not every Twitter user allows these data to be collected. We can check this with some help from dplyr::distinct():
dplyr::distinct(.data = TweetsLatLng, lat, lng)
We can see this is a small fraction of the overall Twitter data, but it's enough for us to build a map!
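To put a number on that fraction, one quick check (a sketch using dplyr; TweetsLatLng comes from the step above) is to count the rows with non-missing coordinates:

```r
library(dplyr)

# count tweets with usable coordinates vs. the total number of tweets
TweetsLatLng %>%
  summarise(
    valid_coords = sum(!is.na(lat) & !is.na(lng)),
    total_tweets = n()
  )
```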
ggplot2
To build a map with ggplot2, we need to have a canvas (i.e. data points) to plot with. We can create one with the ggplot2::map_data() function.
map_data("world")
The map_data("world") call returns a dataset from the maps package that is "suitable for plotting with ggplot2."
World <- ggplot2::map_data("world")
World %>% glimpse(78)
## Rows: 99,338
## Columns: 6
## $ long <dbl> -69.89912, -69.89571, -69.94219, -70.00415, -70.06612, -70…
## $ lat <dbl> 12.45200, 12.42300, 12.43853, 12.50049, 12.54697, 12.59707…
## $ group <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
## $ order <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 17, 18,…
## $ region <chr> "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", "Aru…
## $ subregion <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
The World data is a dataset with latitude (lat), longitude (long), and group (group) values across the entire world. We're going to use these values to 'sketch' a map outline using geom_point() and geom_polygon() below.
coord_quickmap() with points
Just like with other plots, we want to build the labels first (we'll store some information about the data in the labels so it's clear where it came from).
labs_geom_point <- ggplot2::labs(
title = "Basic World Map (geom_point)",
subtitle = "map_data('world')")
We will start by creating a map using a geom_point() layer, but add the coord_quickmap() function (which projects a portion of the earth, which is approximately spherical, onto a flat 2-D plane):

- Pipe the World data to the ggplot() function to initialize the plot
- Add a geom_point() layer, specifying the x as long and y as lat
World %>%
ggplot() + # initializes graph
geom_point(aes(x = long, y = lat), show.legend = FALSE) +
coord_quickmap() +
labs_geom_point
What we've done here is plot data points that outline each continent (and fit the spherical location of the long and lat to a 2-D plane). But the points make the continent outlines sloppy; we should be using lines.
coord_quickmap() with polygons
We want to convert the point outline into lines, which we can do using geom_polygon() (read more about this here). The steps are very similar:

- Pipe the World data to the ggplot() function to initialize the plot
- Add a geom_polygon() layer, specifying the x as long, y as lat, and group as group
labs_geom_polygon <- ggplot2::labs(
title = "Basic World Map (geom_polygon)",
subtitle = "map_data('world')")
World %>%
ggplot() +
geom_polygon(aes(x = long, y = lat, group = group)) +
coord_quickmap() +
labs_geom_polygon
That looks much better, but we should clean it up a bit by removing the x and y axes, and reducing some of the color contrast.

- Use the fill, color, and alpha arguments to lighten the color of the continents
- Add a ggplot2::theme_void() layer (specifically designed to remove excess chart junk and give it a 'minimal' look)

We will save this map as ggp_word_map:
ggp_word_map <- World %>%
ggplot() +
geom_polygon(aes(x = long, y = lat, group = group),
# these are outside the aes() function!
fill = "grey75", color = "white", alpha = 0.8) +
coord_quickmap() +
# add theme
ggplot2::theme_void() +
# don't forget the labels
labs_geom_polygon
ggp_word_map
This looks much better! Now we're ready to add our Twitter data.
The default map in geom_polygon() uses what's referred to as the Mercator projection. The Mercator projection works well for navigation because the meridians are equally spaced (the grid lines that run north and south), but the parallels (the lines that run east/west) are not equally spaced.
This causes a distortion in the land masses at both poles. The map above makes it look like Greenland is roughly 1/2 or 2/3 the size of Africa, when in reality Africa is about 14x larger. These are well-known limitations of this projection, so there's nothing wrong with using it (but it's good information to know!)
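If you want to see the projection choice spelled out, ggplot2's coord_map() (the slower, exact counterpart to coord_quickmap()) accepts a projection by name. A sketch, reusing the World data from above:

```r
library(ggplot2)

World <- map_data("world")

# same polygon layer, but with an explicitly named Mercator projection
# (coord_map() requires the mapproj package to be installed)
ggplot(World) +
  geom_polygon(aes(x = long, y = lat, group = group)) +
  coord_map("mercator")
```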
ggplot2 has a handy function for creating a map quickly (appropriately called coord_quickmap()), and the Mercator projection is the default setting. If we want to add our data in TweetsLatLng to the existing graph, we need to include these data in a new geom_point() layer.
Recall that the existing map is using the World data (printed below):
glimpse(World)
## Rows: 99,338
## Columns: 6
## $ long <dbl> -69.89912, -69.89571, -69.94219, -70.004…
## $ lat <dbl> 12.45200, 12.42300, 12.43853, 12.50049, …
## $ group <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2…
## $ order <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 1…
## $ region <chr> "Aruba", "Aruba", "Aruba", "Aruba", "Aru…
## $ subregion <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
We need to rename lng to long (so the variable names match those in the World data), and remove the rows with missing latitude and longitude. Store these new data in TweetsMap.
TweetsMap <- TweetsLatLng %>%
# rename to match
dplyr::rename(long = lng) %>%
# remove missing
filter(!is.na(long) & !is.na(lat))
glimpse(TweetsMap)
## Rows: 197
## Columns: 12
## $ user_id <chr> "1391401897035243527", "1485612379"…
## $ created_at <dttm> 2021-11-08 05:37:26, 2021-11-08 05…
## $ screen_name <chr> "BarstoolMKE1", "amandaheger613", "…
## $ text <chr> "BREAKING: Sports media taking Mond…
## $ retweet_count <int> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,…
## $ favorite_count <int> 0, 0, 0, 0, 2, 2, 1, 0, 0, 0, 1, 3,…
## $ country <chr> "United States", "United States", "…
## $ location <chr> "Milwaukee, WI", "Goodyear, AZ", "G…
## $ country_code <chr> "US", "US", "US", "US", "US", "US",…
## $ friends_count <int> 278, 40, 40, 3156, 1424, 3282, 2613…
## $ lat <dbl> 43.06754, 33.41287, 33.41287, 32.81…
## $ long <dbl> -88.02554, -112.42498, -112.42498, …
We will update the labels:
labs_rtweet_coord_quickmap <- ggplot2::labs(
title = " #NFL hashtags = World Map (labs_coord_quickmap())",
subtitle = " rtweet data")
And add a ggplot2::geom_point() layer to include our tweets on the map:
ggp_word_map +
ggplot2::geom_point(data = TweetsMap,
aes(x = long, y = lat)) +
# add titles/labels
labs_rtweet_coord_quickmap
Now we can see the tweets have been added as data points to the existing map projection! We can see most of these data are limited to the US, so we will build a US map below.
We can use the ggplot2::map_data("usa") function to create a US map dataset (USmap).
USmap <- ggplot2::map_data("usa")
USmap %>% glimpse(78)
## Rows: 7,243
## Columns: 6
## $ long <dbl> -101.4078, -101.3906, -101.3620, -101.3505, -101.3219, -10…
## $ lat <dbl> 29.74224, 29.74224, 29.65056, 29.63911, 29.63338, 29.64484…
## $ group <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ order <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
## $ region <chr> "main", "main", "main", "main", "main", "main", "main", "m…
## $ subregion <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
We can see these data are very similar to World, and have the same long, lat, and group variables.
coord_quickmap()
We can plot USmap the same way we did above (with geom_polygon() and coord_quickmap()), after creating a new set of labels:

- Pipe the USmap data to the ggplot() function to initialize the plot
- Add a geom_polygon() layer, specifying the x as long, y as lat, and group as group
labs_geom_polygon_usa <- ggplot2::labs(
title = " US Map (geom_polygon)",
subtitle = " map_data('usa')")
USmap %>%
ggplot2::ggplot() +
ggplot2::geom_polygon(aes(x = long,
y = lat,
group = group)) +
ggplot2::coord_quickmap() +
labs_geom_polygon_usa
Once again we see this map of the US needs some customizing, so we include the fill, color, and alpha arguments.

- Use the fill, color, and alpha arguments to lighten the color of the map
- Add a ggplot2::theme_void() layer (specifically designed to remove excess chart junk and give it a 'minimal' look)

Save this as ggp_us_map:
ggp_us_map <- USmap %>%
ggplot2::ggplot() +
ggplot2::geom_polygon(aes(x = long,
y = lat,
group = group),
fill = "grey70", color = "white", alpha = 0.8) +
ggplot2::coord_quickmap() +
ggplot2::theme_void() +
labs_geom_polygon_usa
ggp_us_map
Now we have a canvas to work with. Let's filter the TweetsMap data to only those tweets from the US using the country_code variable (first we count this variable to see what the codes are).
TweetsMap %>%
count(country_code, sort = TRUE)
It looks like we have 150 tweets from the US; we will store these data in UsTweets:
UsTweets <- TweetsMap %>% filter(country_code == "US")
glimpse(UsTweets)
## Rows: 150
## Columns: 12
## $ user_id <chr> "1391401897035243527", "1485612379"…
## $ created_at <dttm> 2021-11-08 05:37:26, 2021-11-08 05…
## $ screen_name <chr> "BarstoolMKE1", "amandaheger613", "…
## $ text <chr> "BREAKING: Sports media taking Mond…
## $ retweet_count <int> 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0,…
## $ favorite_count <int> 0, 0, 0, 0, 2, 2, 1, 1, 2, 0, 0, 90…
## $ country <chr> "United States", "United States", "…
## $ location <chr> "Milwaukee, WI", "Goodyear, AZ", "G…
## $ country_code <chr> "US", "US", "US", "US", "US", "US",…
## $ friends_count <int> 278, 40, 40, 3156, 1424, 3282, 1989…
## $ lat <dbl> 43.06754, 33.41287, 33.41287, 32.81…
## $ long <dbl> -88.02554, -112.42498, -112.42498, …
We will update the labels for the US map.
geom_polygon()

- Add a geom_point() layer to the existing ggp_us_map
- Use the data = UsTweets argument at this layer, and specify the x = long and y = lat
# new labels
labs_coord_quickmap_tweets_usa <- ggplot2::labs(
title = " #NFL tweets = Basic US Map (coord_quickmap)",
subtitle = " map_data('usa')")
ggp_us_map +
# twitter data layer
ggplot2::geom_point(data = UsTweets,
aes(x = long, y = lat)) +
ggplot2::theme_void() +
labs_coord_quickmap_tweets_usa
We can see this map output is including the tweets from Hawaii (which is skewing the map projection), so we will combine ggplot2's layers and dplyr's data manipulation functions to remove these points without changing the data in UsTweets.
We can use str_view_all() to take a look at the location variable and see if we can find the Hawaii locations:
# search for ", HI" pattern
str_view_all(string = UsTweets$location, pattern = ", HI", match = TRUE)
We can see two tweets with location of Honolulu, HI, so we will filter() these data inside the geom_point() layer via the data argument.

- Use !str_detect() to remove any observations that match the Honolulu, HI pattern
- Set the size of the points to 0.9 and the color of the points to "firebrick"
labs_coord_quickmap_no_hi <- ggplot2::labs(
title = "#NFL tweets = Basic US Map (coord_quickmap)",
subtitle = "map_data('usa')",
caption = "Tweets from Honolulu, HI have been removed")
ggp_us_map +
ggplot2::geom_point(
data = filter(UsTweets,
!str_detect(location, "Honolulu, HI")),
aes(x = long, y = lat),
size = 0.9, # reduce size of points
color = "firebrick") +
ggplot2::theme_void() +
labs_coord_quickmap_no_hi
This is looking better, but we should recall we have a few additional variables on the tweets in UsTweets. Let's use what we've learned in previous lessons/exercises to view the distribution of favorite_count across the various locations in UsTweets.
favorite_count

- Pipe the UsTweets data to filter() and remove the tweets from Hawaii
- Pipe to ggplot() and assign favorite_count to the x aesthetic
- Add a geom_density() layer
- Add facet_wrap() and facet the plots by location
- Add theme_minimal() to reduce the chart elements

labs_facet_wrap_favorite_count <- ggplot2::labs(
title = "Favorite counts by location (US Twitter data)",
x = "Favorite count",
y = "count"
)
UsTweets %>%
filter(!str_detect(location, "Honolulu, HI")) %>%
ggplot(aes(x = favorite_count)) +
geom_density() +
facet_wrap(. ~ location) +
theme_minimal() +
labs_facet_wrap_favorite_count
One of the drawbacks of the density plot is that the y axis is hard to interpret, but that's OK in this case, because this graph is telling us all we need to know: some of these aren't like the others.
favorite_count to US Map
If we want to add the favorite_count variable to the plot, we can do this with the size argument in geom_point(), which will make the size of each point relative to the favorite_count at each long and lat.
- Use paste0() and mean() to get the average created_at time for the tweets
- Add a geom_point() layer to ggp_us_map, filtering out the Hawaii locations
- Map x and y to long and lat inside the aes() function, and map size to favorite_count (also inside aes())
- Set color to "firebrick" again (outside aes())
- Add a theme() layer and move the legend using legend.position = 'bottom'
labs_coord_quickmap_favs <- ggplot2::labs(
title = " #NFL Tweets",
subtitle = paste0(" Tweets collected around ",
mean(UsTweets$created_at, na.rm = TRUE)),
size = "Favorites")
ggp_us_map +
ggplot2::geom_point(
data = filter(UsTweets,
!str_detect(location, "Honolulu, HI")),
aes(x = long,
y = lat,
size = favorite_count),
color = "firebrick") +
theme(legend.position = 'bottom') +
labs_coord_quickmap_favs
This has been a very short introduction to maps with ggplot2. In the next lesson, we will introduce how to make maps with leaflet and plotly, two popular packages for creating interactive maps.
Be sure to check out the resources below for building maps: