1 Outline

  1. Import Twitter Data
  2. Wrangle Data
  3. Latitude and Longitude Data
  4. Maps with ggplot2

NOTE: this set of exercises is a little different from the previous weeks. Instead of filling in each solution, I will work through an example, then provide a template for you to use on your own (the code chunks aren’t specific, so you can just swap out the hashtag in the search_tweets() function, and all the code will run with a new hashtag).

2 Materials

View the slides for this section here.

3 Twitter data

This lesson will map Twitter data (tweets) based on a search term. We’re going to use the rtweet package, which requires you to have a Twitter account. If you’re going to be using this application a lot to collect data (which I hope you are!), follow the instructions in the authentication vignette.

3.1 rtweet

We’re going to access Twitter data for these exercises. If you’d like to follow along with your own data, check out the rtweet package for installation and setup instructions.

3.1.1 Searching hashtags

We’re going to demonstrate how to use the rtweet::search_tweets() function.

library(rtweet)
TweetsRaw <- search_tweets("#NFL", n = 10000)

You should see the following message when the data collection is complete. If you’re having trouble gathering the Twitter data, you should check the retryonratelimit argument.

Searching for users...
Finished collecting users!
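
Here is a minimal sketch of how the retryonratelimit argument could be used. It's wrapped in a small helper so the hashtag is easy to swap out. The helper name collect_hashtag() is made up for this example (it isn't part of rtweet), and calling it requires the rtweet package and an authenticated Twitter account.

```r
# Hypothetical helper (not part of rtweet): wraps search_tweets() so the
# hashtag can be changed in one place.
collect_hashtag <- function(hashtag, n = 10000) {
  rtweet::search_tweets(
    q = hashtag,
    n = n,
    # wait and resume if the rate limit is reached before n tweets are collected
    retryonratelimit = TRUE
  )
}

# Usage (needs Twitter credentials, so it is commented out here):
# TweetsRaw <- collect_hashtag("#NFL")
```

Setting retryonratelimit = TRUE can make the call run for a long time, so it's probably best saved for searches that keep stopping short of n.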

Now let’s review what we got!

3.1.2 Raw Twitter data

The output from rtweet is rather large (90 columns):

colnames(TweetsRaw)
##  [1] "user_id"                 "status_id"              
##  [3] "created_at"              "screen_name"            
##  [5] "text"                    "source"                 
##  [7] "display_text_width"      "reply_to_status_id"     
##  [9] "reply_to_user_id"        "reply_to_screen_name"   
## [11] "is_quote"                "is_retweet"             
## [13] "favorite_count"          "retweet_count"          
## [15] "quote_count"             "reply_count"            
## [17] "hashtags"                "symbols"                
## [19] "urls_url"                "urls_t.co"              
## [21] "urls_expanded_url"       "media_url"              
## [23] "media_t.co"              "media_expanded_url"     
## [25] "media_type"              "ext_media_url"          
## [27] "ext_media_t.co"          "ext_media_expanded_url" 
## [29] "ext_media_type"          "mentions_user_id"       
## [31] "mentions_screen_name"    "lang"                   
## [33] "quoted_status_id"        "quoted_text"            
## [35] "quoted_created_at"       "quoted_source"          
## [37] "quoted_favorite_count"   "quoted_retweet_count"   
## [39] "quoted_user_id"          "quoted_screen_name"     
## [41] "quoted_name"             "quoted_followers_count" 
## [43] "quoted_friends_count"    "quoted_statuses_count"  
## [45] "quoted_location"         "quoted_description"     
## [47] "quoted_verified"         "retweet_status_id"      
## [49] "retweet_text"            "retweet_created_at"     
## [51] "retweet_source"          "retweet_favorite_count" 
## [53] "retweet_retweet_count"   "retweet_user_id"        
## [55] "retweet_screen_name"     "retweet_name"           
## [57] "retweet_followers_count" "retweet_friends_count"  
## [59] "retweet_statuses_count"  "retweet_location"       
## [61] "retweet_description"     "retweet_verified"       
## [63] "place_url"               "place_name"             
## [65] "place_full_name"         "place_type"             
## [67] "country"                 "country_code"           
## [69] "geo_coords"              "coords_coords"          
## [71] "bbox_coords"             "status_url"             
## [73] "name"                    "location"               
## [75] "description"             "url"                    
## [77] "protected"               "followers_count"        
## [79] "friends_count"           "listed_count"           
## [81] "statuses_count"          "favourites_count"       
## [83] "account_created_at"      "verified"               
## [85] "profile_url"             "profile_expanded_url"   
## [87] "account_lang"            "profile_banner_url"     
## [89] "profile_background_url"  "profile_image_url"

We only need a subset of these columns to build our map, and fortunately the rtweet package comes with some handy functions for reducing this output to a more manageable dataset.

3.1.3 Export Raw Data!

It’s always a good idea to export the raw Twitter data you’ve collected, because these data are always subject to change. For example, this tutorial uses data for the #NFL hashtag, which was collected on a Sunday. We’re not likely to see the same data if we collected data on the following Monday (or Tuesday, for that matter).

data_path <- "../data/wk11-01_intro-to-maps/raw/"
fs::dir_create(data_path)
# make sure to use a date (or time) stamp!
rtweet::write_as_csv(x = TweetsRaw, paste0(data_path, "2021-11-07-NFL-TweetsRaw.csv"))
fs::dir_tree(data_path)
## ../data/wk11-01_intro-to-maps/raw/
## └── 2021-11-07-NFL-TweetsRaw.csv
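
If you'd rather not type the date stamp by hand, one option is to build the file name from the date. This is only a sketch using base R. The stamped_file() helper is a made-up name for this example, and the label is whatever you want to call the export.

```r
# Build a date-stamped file name for the raw export.
# `stamped_file()` is a hypothetical helper written for this sketch.
stamped_file <- function(path, label, date = Sys.Date()) {
  # format() turns the date into a sortable YYYY-MM-DD prefix
  file.path(path, paste0(format(date, "%Y-%m-%d"), "-", label, ".csv"))
}

# Fixed date used here so the result matches the file above
stamped_file("../data/wk11-01_intro-to-maps/raw", "NFL-TweetsRaw",
             date = as.Date("2021-11-07"))
## [1] "../data/wk11-01_intro-to-maps/raw/2021-11-07-NFL-TweetsRaw.csv"
```

Leaving out the date argument uses today's date, so each new collection gets its own file.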

3.2 User data

The rtweet::users_data() function separates the ‘users’ variables from the ‘tweet’ variables.

3.2.1 users_data() columns

Below I combine the base::intersect() and base::names() functions to see what variables from TweetsRaw will end up in the results from rtweet::users_data() (I added tibble::as_tibble() so the variables print nicely to the screen).

tibble::as_tibble(
base::intersect(x = base::names(rtweet::users_data(TweetsRaw)) ,
                y = base::names(TweetsRaw))
)

Looks like there will be 20 variables in the output from users_data().

3.2.2 TweetsUsers data

We will separate the user data from the raw data and store this in TweetsUsers.

TweetsUsers <- rtweet::users_data(TweetsRaw)
glimpse(TweetsUsers)
## Rows: 7,974
## Columns: 20
## $ user_id                <chr> "1274085294934032385", "127…
## $ screen_name            <chr> "TFFPhilip", "TFFPhilip", "…
## $ name                   <chr> "ThrillsFantasyFootball", "…
## $ location               <chr> "", "", "", "", "", "", "",…
## $ description            <chr> "🗓Year-Round Content 🏈Data…
## $ url                    <chr> "https://t.co/Vk9smO8q78", …
## $ protected              <lgl> FALSE, FALSE, FALSE, FALSE,…
## $ followers_count        <int> 1771, 1771, 1771, 1771, 177…
## $ friends_count          <int> 1601, 1601, 1601, 1601, 160…
## $ listed_count           <int> 5, 5, 5, 5, 5, 5, 5, 5, 5, …
## $ statuses_count         <int> 7899, 7899, 7899, 7899, 789…
## $ favourites_count       <int> 8799, 8799, 8799, 8799, 879…
## $ account_created_at     <dttm> 2020-06-19 21:04:25, 2020-…
## $ verified               <lgl> FALSE, FALSE, FALSE, FALSE,…
## $ profile_url            <chr> "https://t.co/Vk9smO8q78", …
## $ profile_expanded_url   <chr> "http://instagram.com/thril…
## $ account_lang           <lgl> NA, NA, NA, NA, NA, NA, NA,…
## $ profile_banner_url     <chr> "https://pbs.twimg.com/prof…
## $ profile_background_url <chr> NA, NA, NA, NA, NA, NA, NA,…
## $ profile_image_url      <chr> "http://pbs.twimg.com/profi…

3.3 Tweets data

The rtweet::tweets_data() function separates the ‘tweet’ variables from the ‘users’ variables.

3.3.1 tweets_data() columns

We repeat the process from above to get a look at the columns we’ll get back from the rtweet::tweets_data() function:

tibble::as_tibble(base::intersect(x = base::names(rtweet::tweets_data(TweetsRaw)) ,
                y = base::names(TweetsRaw)))

This dataset will have 68 columns.

3.3.2 TweetsData

We will store the output from tweets_data() in the TweetsData object.

TweetsData <- rtweet::tweets_data(TweetsRaw)
glimpse(TweetsData)
## Rows: 7,974
## Columns: 68
## $ user_id                 <chr> "1274085294934032385", "12…
## $ status_id               <chr> "1457583138553663490", "14…
## $ created_at              <dttm> 2021-11-08 05:37:55, 2021…
## $ screen_name             <chr> "TFFPhilip", "TFFPhilip", …
## $ text                    <chr> "This week has been a toug…
## $ source                  <chr> "Twitter for iPhone", "Twi…
## $ display_text_width      <dbl> 140, 140, 140, 140, 140, 1…
## $ reply_to_status_id      <chr> NA, NA, NA, NA, NA, NA, NA…
## $ reply_to_user_id        <chr> NA, NA, NA, NA, NA, NA, NA…
## $ reply_to_screen_name    <chr> NA, NA, NA, NA, NA, NA, NA…
## $ is_quote                <lgl> FALSE, FALSE, FALSE, FALSE…
## $ is_retweet              <lgl> TRUE, TRUE, TRUE, TRUE, TR…
## $ favorite_count          <int> 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ retweet_count           <int> 47, 33, 59, 39, 20, 64, 34…
## $ hashtags                <list> "NFL", <"WeStillDemBoyz",…
## $ symbols                 <list> NA, NA, NA, NA, NA, NA, N…
## $ urls_url                <list> NA, NA, NA, NA, NA, NA, N…
## $ urls_t.co               <list> NA, NA, NA, NA, NA, NA, N…
## $ urls_expanded_url       <list> NA, NA, NA, NA, NA, NA, N…
## $ media_url               <list> NA, NA, NA, NA, NA, NA, "…
## $ media_t.co              <list> NA, NA, NA, NA, NA, NA, "…
## $ media_expanded_url      <list> NA, NA, NA, NA, NA, NA, "…
## $ media_type              <list> NA, NA, NA, NA, NA, NA, "…
## $ ext_media_url           <list> NA, NA, NA, NA, NA, NA, "…
## $ ext_media_t.co          <list> NA, NA, NA, NA, NA, NA, "…
## $ ext_media_expanded_url  <list> NA, NA, NA, NA, NA, NA, "…
## $ ext_media_type          <chr> NA, NA, NA, NA, NA, NA, NA…
## $ mentions_user_id        <list> "1345765795469668355", "1…
## $ mentions_screen_name    <list> "NothinBtAirtime", "JNfor…
## $ lang                    <chr> "en", "en", "en", "en", "e…
## $ quoted_status_id        <chr> NA, NA, NA, NA, NA, NA, NA…
## $ quoted_text             <chr> NA, NA, NA, NA, NA, NA, NA…
## $ quoted_created_at       <dttm> NA, NA, NA, NA, NA, NA, N…
## $ quoted_source           <chr> NA, NA, NA, NA, NA, NA, NA…
## $ quoted_favorite_count   <int> NA, NA, NA, NA, NA, NA, NA…
## $ quoted_retweet_count    <int> NA, NA, NA, NA, NA, NA, NA…
## $ quoted_user_id          <chr> NA, NA, NA, NA, NA, NA, NA…
## $ quoted_screen_name      <chr> NA, NA, NA, NA, NA, NA, NA…
## $ quoted_name             <chr> NA, NA, NA, NA, NA, NA, NA…
## $ quoted_followers_count  <int> NA, NA, NA, NA, NA, NA, NA…
## $ quoted_friends_count    <int> NA, NA, NA, NA, NA, NA, NA…
## $ quoted_statuses_count   <int> NA, NA, NA, NA, NA, NA, NA…
## $ quoted_location         <chr> NA, NA, NA, NA, NA, NA, NA…
## $ quoted_description      <chr> NA, NA, NA, NA, NA, NA, NA…
## $ quoted_verified         <lgl> NA, NA, NA, NA, NA, NA, NA…
## $ retweet_status_id       <chr> "1457391892673466368", "14…
## $ retweet_text            <chr> "This week has been a toug…
## $ retweet_created_at      <dttm> 2021-11-07 16:57:58, 2021…
## $ retweet_source          <chr> "Twitter Web App", "Twitte…
## $ retweet_favorite_count  <int> 46, 36, 55, 34, 21, 66, 45…
## $ retweet_user_id         <chr> "1345765795469668355", "11…
## $ retweet_screen_name     <chr> "NothinBtAirtime", "JNfors…
## $ retweet_name            <chr> "Nothin’ But Airtime", "Jo…
## $ retweet_followers_count <int> 2731, 10809, 8528, 8528, 8…
## $ retweet_friends_count   <int> 2570, 11421, 8339, 8339, 9…
## $ retweet_statuses_count  <int> 8733, 15348, 29341, 29341,…
## $ retweet_location        <chr> "Milwaukee, WI", "New York…
## $ retweet_description     <chr> "🏀 Hosted by @craines38, …
## $ retweet_verified        <lgl> FALSE, FALSE, FALSE, FALSE…
## $ place_url               <chr> NA, NA, NA, NA, NA, NA, NA…
## $ place_name              <chr> NA, NA, NA, NA, NA, NA, NA…
## $ place_full_name         <chr> NA, NA, NA, NA, NA, NA, NA…
## $ place_type              <chr> NA, NA, NA, NA, NA, NA, NA…
## $ country                 <chr> NA, NA, NA, NA, NA, NA, NA…
## $ country_code            <chr> NA, NA, NA, NA, NA, NA, NA…
## $ geo_coords              <list> <NA, NA>, <NA, NA>, <NA, …
## $ coords_coords           <list> <NA, NA>, <NA, NA>, <NA, …
## $ bbox_coords             <list> <NA, NA, NA, NA, NA, NA, …

3.4 Latitude & longitude data

You may have noticed these data don’t have the latitude or longitude data–we will add these variables below.

3.4.1 lat_lng() columns

If we look at the help info on the rtweet::lat_lng() function, we can see that this will only add two columns to TweetsRaw, lat and lng (for latitude and longitude).

This would result in quite a few variables, but we don’t need all the variables from TweetsRaw.

Fortunately, we now know how to use dplyr’s select() function to only get the variables we want from TweetsRaw, which include user_id, created_at, screen_name, text, retweet_count, favorite_count, country, location, country_code, friends_count, and the new lat and lng variables.

TweetsLatLng <- rtweet::lat_lng(TweetsRaw) %>% 
  select(user_id, created_at, screen_name, text, 
         retweet_count, favorite_count, country, 
         location, country_code, friends_count, 
         lat, lng)
glimpse(TweetsLatLng)
## Rows: 7,974
## Columns: 12
## $ user_id        <chr> "1274085294934032385", "12740852949…
## $ created_at     <dttm> 2021-11-08 05:37:55, 2021-11-08 05…
## $ screen_name    <chr> "TFFPhilip", "TFFPhilip", "TFFPhili…
## $ text           <chr> "This week has been a tough one. St…
## $ retweet_count  <int> 47, 33, 59, 39, 20, 64, 34, 31, 39,…
## $ favorite_count <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0,…
## $ country        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ location       <chr> "", "", "", "", "", "", "", "", "",…
## $ country_code   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ friends_count  <int> 1601, 1601, 1601, 1601, 1601, 1601,…
## $ lat            <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ lng            <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA,…

3.4.2 lat & lng observations

It’s always good to check how many valid observations we have for the lat and lng columns, because not every Twitter user allows these data to be collected. We can check this with some help from dplyr::distinct():

dplyr::distinct(.data = TweetsLatLng, lat, lng)

We can see this is a small fraction of the overall twitter data, but it’s enough for us to build a map!
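
distinct() shows the unique coordinate pairs. If we also want the number of tweets that have coordinates, a summarise() call along these lines should work. The toy_tweets data below are made up for illustration and only stand in for TweetsLatLng.

```r
library(dplyr)

# Made-up coordinates standing in for TweetsLatLng (illustrative values only)
toy_tweets <- tibble(
  lat = c(43.1, NA, 33.4, NA, 33.4),
  lng = c(-88.0, NA, -112.4, NA, -112.4)
)

coord_summary <- toy_tweets %>%
  summarise(
    total     = n(),                             # all tweets
    with_both = sum(!is.na(lat) & !is.na(lng)),  # tweets with both coordinates
    share     = with_both / total                # proportion we can map
  )
coord_summary
```

In this toy example, 3 of the 5 rows have both a lat and a lng value, so share is 0.6.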

3.5 Building maps in ggplot2

To build a map with ggplot2, we need to have a canvas (i.e. data-points) to plot with. We can do this with the ggplot2::map_data() function.

3.5.1 map_data("world")

The map_data("world") function returns a dataset from the maps package that is “suitable for plotting with ggplot2”:

World <- ggplot2::map_data("world")
World %>% glimpse(78)
## Rows: 99,338
## Columns: 6
## $ long      <dbl> -69.89912, -69.89571, -69.94219, -70.00415, -70.06612, -70…
## $ lat       <dbl> 12.45200, 12.42300, 12.43853, 12.50049, 12.54697, 12.59707…
## $ group     <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
## $ order     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 17, 18,…
## $ region    <chr> "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", "Aru…
## $ subregion <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…

The World data is a dataset with latitude (lat), longitude (long), and group (group) values across the entire world. We’re going to use these values to ‘sketch’ a map outline using geom_point() and geom_polygon() below:

3.5.2 coord_quickmap() with points

Just like with other plots, we want to build the labels first (we’ll store some information about the data in the labels so it’s clear where it came from).

labs_geom_point <- ggplot2::labs(
  title = "Basic World Map (geom_point)", 
  subtitle = "map_data('world')")

We will start by creating a map using a geom_point() layer, then add the coord_quickmap() function (which projects a portion of the earth, which is approximately spherical, onto a flat 2D plane):

  1. Pipe the data to the ggplot() function to initialize the plot
  2. Add the geom_point() layer, specifying the x as long and y as lat
  3. Include labels
World %>% 
  ggplot() + # initializes graph
  geom_point(aes(x = long, y = lat), show.legend = FALSE) +
  coord_quickmap() + 
  labs_geom_point

What we’ve done here is plot data-points that outline each continent (and fit the spherical location of the long and lat to a 2-D plane). But the points make the continent outlines sloppy–we should be using lines.

3.5.3 coord_quickmap() with polygons

We want to convert the point outline into lines, which we can do using geom_polygon() (read more about this here). The steps are very similar:

  1. Pipe the data to the ggplot() function to initialize the plot
  2. Add the geom_polygon() layer, specifying the x as long, y as lat, and group as group
  3. Include labels
labs_geom_polygon <- ggplot2::labs(
  title = "Basic World Map (geom_polygon)", 
  subtitle = "map_data('world')")
World %>% 
  ggplot() + 
  geom_polygon(aes(x = long, y = lat, group = group)) +
  coord_quickmap() + 
  labs_geom_polygon

That looks much better, but we should clean it up a bit by removing the x and y axes, and reducing some of the color contrast.
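
To see what the group aesthetic is doing in geom_polygon(), here is a small sketch with two made-up squares (the squares data are invented for this example, and only ggplot2 is assumed to be installed).

```r
library(ggplot2)

# Two made-up unit squares; `group` tells geom_polygon() which points
# belong to the same shape.
squares <- data.frame(
  long  = c(0, 1, 1, 0,  2, 3, 3, 2),
  lat   = c(0, 0, 1, 1,  0, 0, 1, 1),
  group = rep(c(1, 2), each = 4)
)

p_squares <- ggplot(squares) +
  geom_polygon(aes(x = long, y = lat, group = group),
               fill = "grey75", color = "white")
p_squares
```

Without group = group, all eight points would be joined into a single shape. The World data uses group the same way to keep each land mass separate.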

3.5.4 Customizing our map

  1. add fill, color and alpha arguments to lighten the color of the continents
  2. Include the ggplot2::theme_void() layer (specifically designed to remove excess chart junk and give it a ‘minimal’ look)
  3. Include labels

We will save this map as ggp_word_map

ggp_word_map <- World %>% 
  ggplot() + 
  geom_polygon(aes(x = long, y = lat, group = group), 
               # these are outside the aes() function!
               fill = "grey75", color = "white", alpha = 0.8) +
  coord_quickmap() + 
  # add theme
  ggplot2::theme_void() +
  # don't forget the labels
  labs_geom_polygon
ggp_word_map

This looks much better! Now we’re ready to add our Twitter data.

3.5.5 Map projections

The map above uses coord_quickmap(), a quick approximation that sets the aspect ratio of the plot instead of applying a full projection. Like the better-known Mercator projection (the default in ggplot2::coord_map()), it draws the globe on a rectangular grid. The Mercator projection works well for navigation because the meridians (the grid lines that run north and south) are equally spaced, but the parallels (the lines that run east/west) are not equally spaced.

This causes a distortion in the land masses near both poles. The map above makes it look like Greenland is roughly 1/2 or 2/3 the size of Africa, when in reality Africa is about 14x larger. These are well-known limitations of this kind of projection, so there’s nothing wrong with using it (but it’s good information to know!).
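
To get a feel for the size of this distortion, we can use the fact that a true Mercator projection inflates areas by a factor of 1 / cos(latitude)^2. The sketch below uses base R only. The function name and the two latitudes are illustrative choices, and because coord_quickmap() is an approximation, its map won't follow this formula exactly.

```r
# Area inflation on a Mercator map at a given latitude: 1 / cos(latitude)^2
mercator_area_factor <- function(lat_degrees) {
  1 / cos(lat_degrees * pi / 180)^2
}

# Rough, illustrative latitudes: the equator vs. central Greenland
round(mercator_area_factor(c(0, 72)), 2)
## [1]  1.00 10.47
```

So a land mass near 72 degrees north is drawn at roughly ten times its true area relative to one on the equator.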

3.6 Add twitter data (world)

ggplot2 has a handy function for creating a map quickly (appropriately called coord_quickmap()), which is what our map uses. If we want to add our data in TweetsLatLng to the existing graph, we need to include these data in a new geom_point() layer.

3.6.1 Match variables

Recall that the existing map is using the World data (printed below)

glimpse(World)
## Rows: 99,338
## Columns: 6
## $ long      <dbl> -69.89912, -69.89571, -69.94219, -70.004…
## $ lat       <dbl> 12.45200, 12.42300, 12.43853, 12.50049, …
## $ group     <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2…
## $ order     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 1…
## $ region    <chr> "Aruba", "Aruba", "Aruba", "Aruba", "Aru…
## $ subregion <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …

We need to rename lng to long (so it matches the variable names in the World data), and remove the rows with missing latitude or longitude. Store these new data in TweetsMap.

TweetsMap <- TweetsLatLng %>% 
  # rename to match
  dplyr::rename(long = lng) %>%
  # remove missing
  filter(!is.na(long) & !is.na(lat))
glimpse(TweetsMap)
## Rows: 197
## Columns: 12
## $ user_id        <chr> "1391401897035243527", "1485612379"…
## $ created_at     <dttm> 2021-11-08 05:37:26, 2021-11-08 05…
## $ screen_name    <chr> "BarstoolMKE1", "amandaheger613", "…
## $ text           <chr> "BREAKING: Sports media taking Mond…
## $ retweet_count  <int> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,…
## $ favorite_count <int> 0, 0, 0, 0, 2, 2, 1, 0, 0, 0, 1, 3,…
## $ country        <chr> "United States", "United States", "…
## $ location       <chr> "Milwaukee, WI", "Goodyear, AZ", "G…
## $ country_code   <chr> "US", "US", "US", "US", "US", "US",…
## $ friends_count  <int> 278, 40, 40, 3156, 1424, 3282, 2613…
## $ lat            <dbl> 43.06754, 33.41287, 33.41287, 32.81…
## $ long           <dbl> -88.02554, -112.42498, -112.42498, …

3.6.2 Add Twitter data layer

We will update the labels:

labs_rtweet_coord_quickmap <- ggplot2::labs(
  title = "  #NFL hashtags = World Map (labs_coord_quickmap())", 
  subtitle = "  rtweet data")

And add a ggplot2::geom_point() to include our tweets on the map:

ggp_word_map +
        ggplot2::geom_point(data = TweetsMap,
                        aes(x = long, y = lat)) +
        # add titles/labels
        labs_rtweet_coord_quickmap

Now we can see the tweets have been added as data points to the existing map projection! We can see most of these data are limited to the US, so we will build a US map below.

3.7 US Maps

We can use the ggplot2::map_data("usa") function to create a US map (USmap) dataset.

USmap <- ggplot2::map_data("usa") 
USmap %>% glimpse(78)
## Rows: 7,243
## Columns: 6
## $ long      <dbl> -101.4078, -101.3906, -101.3620, -101.3505, -101.3219, -10…
## $ lat       <dbl> 29.74224, 29.74224, 29.65056, 29.63911, 29.63338, 29.64484…
## $ group     <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ order     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
## $ region    <chr> "main", "main", "main", "main", "main", "main", "main", "m…
## $ subregion <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…

We can see these data are very similar to World, and have the same long, lat, and group variables.

3.7.1 US map (coord_quickmap())

We can plot USmap the same way we did above (with geom_polygon() and coord_quickmap()), after creating a new set of labels:

  1. Create labels (offset by a few spaces)
  2. Pipe the USmap data to the ggplot() function to initialize the plot
  3. Add the geom_polygon() layer, specifying the x as long, y as lat, and group as group
  4. Add labels
labs_geom_polygon_usa <- ggplot2::labs(
  title = "  US Map (geom_polygon)", 
  subtitle = "  map_data('usa')")
USmap %>% 
  ggplot2::ggplot() +
  ggplot2::geom_polygon(aes(x = long,
                            y = lat,
                            group = group)) + 
  ggplot2::coord_quickmap() + 
  labs_geom_polygon_usa

Once again we see this map of the US needs some customizing, so we include the fill, color, and alpha arguments.

3.7.2 Customizing US map

  1. Add fill, color and alpha arguments to lighten the color of the map
  2. Include the ggplot2::theme_void() layer (specifically designed to remove excess chart junk and give it a ‘minimal’ look)
  3. Include labels

Save this as ggp_us_map

ggp_us_map <- USmap %>% 
  ggplot2::ggplot() +
  ggplot2::geom_polygon(aes(x = long,
                            y = lat,
                            group = group),
                        fill = "grey70", color = "white", alpha = 0.8) + 
  ggplot2::coord_quickmap() + 
  ggplot2::theme_void() + 
  labs_geom_polygon_usa
ggp_us_map

3.8 Adding twitter data (US)

Now we have a canvas to work with–let’s filter the TweetsMap data to only those tweets from the US using the country_code (first we count this variable to see what the codes are).

TweetsMap %>% 
  count(country_code, sort = TRUE)

It looks like we have 150 tweets from the US–we will store these data in UsTweets

UsTweets <- TweetsMap %>% filter(country_code == "US")
glimpse(UsTweets)
## Rows: 150
## Columns: 12
## $ user_id        <chr> "1391401897035243527", "1485612379"…
## $ created_at     <dttm> 2021-11-08 05:37:26, 2021-11-08 05…
## $ screen_name    <chr> "BarstoolMKE1", "amandaheger613", "…
## $ text           <chr> "BREAKING: Sports media taking Mond…
## $ retweet_count  <int> 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0,…
## $ favorite_count <int> 0, 0, 0, 0, 2, 2, 1, 1, 2, 0, 0, 90…
## $ country        <chr> "United States", "United States", "…
## $ location       <chr> "Milwaukee, WI", "Goodyear, AZ", "G…
## $ country_code   <chr> "US", "US", "US", "US", "US", "US",…
## $ friends_count  <int> 278, 40, 40, 3156, 1424, 3282, 1989…
## $ lat            <dbl> 43.06754, 33.41287, 33.41287, 32.81…
## $ long           <dbl> -88.02554, -112.42498, -112.42498, …

We will update the labels for the US map.

3.8.1 US twitter data (geom_polygon())

  1. Create new labels
  2. Add a geom_point() layer to the existing ggp_us_map
  3. Include the data = UsTweets argument at this layer, and specify the x = long and y = lat
  4. Add labels
# new labels
labs_coord_quickmap_tweets_usa <- ggplot2::labs(
  title = "  #NFL tweets = Basic US Map (coord_quickmap)", 
  subtitle = "  map_data('usa')")

ggp_us_map +
  # twitter data layer
  ggplot2::geom_point(data = UsTweets,
                      aes(x = long, y = lat)) +
  ggplot2::theme_void() + 
  labs_coord_quickmap_tweets_usa

We can see this map output is including the tweets from Hawaii (which is skewing the map projection), so we will combine ggplot2’s layers and dplyr’s data manipulation functions to remove these points without changing the data in UsTweets.

3.8.2 Locate outliers

We can use str_view_all() to take a look at the location variable and see if we can find the Hawaii location:

# search for ", HI" pattern
str_view_all(string = UsTweets$location, pattern = ", HI", match = TRUE)

We can see two tweets with location as Honolulu, HI, so we will filter() these data inside the geom_point() layer in the data argument.

  1. Create new labels
  2. Use !str_detect() to remove any observations that match the Honolulu, HI pattern
  3. Change the size of the points to 0.9 and the color of the points to "firebrick"
  4. Add theme
  5. Add labels
labs_coord_quickmap_no_hi <- ggplot2::labs(
  title = "#NFL tweets = Basic US Map (coord_quickmap)", 
  subtitle = "map_data('usa')",
  caption = "Tweets from Honolulu, HI have been removed")

ggp_us_map +
  ggplot2::geom_point(
    data = filter(UsTweets, 
                  !str_detect(location, "Honolulu, HI")),
                      aes(x = long, y = lat),
                      size = 0.9, # reduce size of points
                      color = "firebrick") +
  ggplot2::theme_void() + 
  labs_coord_quickmap_no_hi
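
Matching on the location text is one way to drop these points. Another option is to filter on the coordinates themselves. The sketch below uses made-up data standing in for UsTweets, and the bounding box for the contiguous US is an approximation chosen for this example (check it against your own map before relying on it).

```r
library(dplyr)

# Made-up points standing in for UsTweets (illustrative values only)
toy_us <- tibble(
  location = c("Milwaukee, WI", "Goodyear, AZ", "Honolulu, HI"),
  lat      = c(43.07, 33.41, 21.31),
  long     = c(-88.03, -112.42, -157.86)
)

# Approximate bounding box for the contiguous US (an assumption for this sketch)
lower48 <- toy_us %>%
  filter(between(long, -125, -66), between(lat, 24, 50))
lower48$location
## [1] "Milwaukee, WI" "Goodyear, AZ"
```

A coordinate filter like this would also drop points in Alaska, and it doesn't depend on how users typed their location.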

This is looking better, but we should recall we have a few additional variables on the tweets in UsTweets. Let’s use what we’ve learned in previous lessons/exercises to view the distribution of favorite_count across the various locations in UsTweets.

3.8.3 Distribution of favorite_count

  1. Build labels
  2. Pipe the UsTweets data to filter() and remove the tweets from Hawaii
  3. Initiate the plot with ggplot() and assign favorite_count to the x aesthetic
  4. Add a geom_density() layer
  5. Add facet_wrap() and facet the plots by location
  6. Include a theme_minimal() to reduce the chart elements
  7. Add labels
labs_facet_wrap_favorite_count <- ggplot2::labs(
  title = "Favorite counts by location (US Twitter data)",
  x = "Favorite count", 
  y = "Density"
)

UsTweets %>% 
  filter(!str_detect(location, "Honolulu, HI")) %>% 
  ggplot(aes(x = favorite_count)) + 
  geom_density() + 
  facet_wrap(. ~ location) + 
  theme_minimal() + 
  labs_facet_wrap_favorite_count

One of the drawbacks of the density plot is that the y axis is hard to interpret, but it’s OK in this case, because this graph is telling us all we need to know: some of these aren’t like the others.
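
If we want numbers to go with the picture, a grouped summary is one way to find the locations that stand out. The toy_favs data below are invented for this sketch and only mimic the structure of UsTweets.

```r
library(dplyr)

# Made-up favorite counts standing in for UsTweets (illustrative values only)
toy_favs <- tibble(
  location       = c("Milwaukee, WI", "Milwaukee, WI",
                     "Goodyear, AZ", "Goodyear, AZ"),
  favorite_count = c(0, 2, 1, 90)
)

fav_summary <- toy_favs %>%
  group_by(location) %>%
  summarise(
    tweets      = n(),
    median_favs = median(favorite_count),
    max_favs    = max(favorite_count),
    .groups     = "drop"
  ) %>%
  # largest maximum first, so unusual locations rise to the top
  arrange(desc(max_favs))
fav_summary
```

Comparing median_favs with max_favs gives a quick sense of whether a location's distribution is driven by a single popular tweet.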

3.8.4 Add favorite_count to US Map

If we want to add the favorite_count variable to the plot, we can do this with the size aesthetic in geom_point(), which scales each point according to the favorite_count at that long and lat.

  1. Create new labels using paste0() and mean() to get the average created_at time for the tweets
  2. Add a geom_point() layer to ggp_us_map, filtering out the Hawaii locations
  3. Specify the x and y as long and lat, and map size to favorite_count, all inside the aes() function
  4. Set the color to "firebrick" again
  5. Include a custom theme() layer and move the legend using legend.position = 'bottom'
  6. Add labels
labs_coord_quickmap_favs <- ggplot2::labs(
  title = "  #NFL Tweets", 
  subtitle = paste0("  Tweets collected around ", 
                    mean(UsTweets$created_at, na.rm = TRUE)),
  size = "Favorites")

ggp_us_map +
  ggplot2::geom_point(
    data = filter(UsTweets, 
                  !str_detect(location, "Honolulu, HI")),
      aes(x = long, 
          y = lat, 
          size = favorite_count),
          color = "firebrick") +
  theme(legend.position = 'bottom') +
  labs_coord_quickmap_favs
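
The difference between mapping size inside aes() and setting it as a constant outside aes() can be seen in this small sketch. The toy_points data are made up for illustration, and only ggplot2 is assumed to be installed.

```r
library(ggplot2)

# Made-up points standing in for UsTweets (illustrative values only)
toy_points <- data.frame(
  long           = c(-88, -112, -97),
  lat            = c(43, 33, 30),
  favorite_count = c(0, 2, 90)
)

# size inside aes(): each point is scaled by its favorite_count (adds a legend)
p_mapped <- ggplot(toy_points) +
  geom_point(aes(x = long, y = lat, size = favorite_count),
             color = "firebrick")

# size outside aes(): every point gets the same fixed size (no legend)
p_fixed <- ggplot(toy_points) +
  geom_point(aes(x = long, y = lat),
             size = 0.9, color = "firebrick")

p_mapped
p_fixed
```

The color argument sits outside aes() in both plots, which is why every point is the same "firebrick" color.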

3.9 Summary

This has been a very short introduction to maps with ggplot2. In the next lesson, we will introduce how to make maps with leaflet and plotly, two popular mapping packages for creating interactive maps.

Be sure to check out the resources below for building maps:

  1. The maps chapter of the ggplot2 book
  2. The draw maps chapter of Data Visualization: A practical introduction
  3. This tutorial from r-spatial for extending ggplot2 with maps