Hooked on Feelings
Read the Submission Directions
Please submit a PDF knitted from an R Markdown file for this task to both eCampus and Slack.
You can use a chapter book of your choice or get one using the process below.
Getting Text
In addition to the packages listed in the example, you can use the gutenbergr package, which lets you download and process public domain works from the Project Gutenberg collection. Follow the mini walkthrough below to see how to get a text of your choice.
library(tidyverse)
library(gutenbergr)
library(tidytext)
library(flextable)
##
## Attaching package: 'flextable'
## The following object is masked from 'package:purrr':
##
## compose
## The following objects are masked from 'package:kableExtra':
##
## as_image, footnote
gutenberg_metadata
## # A tibble: 51,997 x 8
## gutenberg_id title author gutenberg_autho… language gutenberg_books… rights
## <int> <chr> <chr> <int> <chr> <chr> <chr>
## 1 0 <NA> <NA> NA en <NA> Publi…
## 2 1 "The … Jeffer… 1638 en United States L… Publi…
## 3 2 "The … United… 1 en American Revolu… Publi…
## 4 3 "John… Kenned… 1666 en <NA> Publi…
## 5 4 "Linc… Lincol… 3 en US Civil War Publi…
## 6 5 "The … United… 1 en American Revolu… Publi…
## 7 6 "Give… Henry,… 4 en American Revolu… Publi…
## 8 7 "The … <NA> NA en <NA> Publi…
## 9 8 "Abra… Lincol… 3 en US Civil War Publi…
## 10 9 "Abra… Lincol… 3 en US Civil War Publi…
## # … with 51,987 more rows, and 1 more variable: has_text <lgl>
Let’s say we want The War of the Worlds. We can get it by running the following:
war <- gutenberg_works() %>%
  filter(title == "The War of the Worlds")
war
## # A tibble: 1 x 8
## gutenberg_id title author gutenberg_author… language gutenberg_books… rights
## <int> <chr> <chr> <int> <chr> <chr> <chr>
## 1 36 The W… Wells,… 30 en Movie Books/Sci… Publi…
## # … with 1 more variable: has_text <lgl>
Well, that’s a bit difficult to read. Instead of using the DT package, let’s give flextable a go.
war_flex <- flextable(war) %>%
  autofit() %>%
  theme_booktabs()
war_flex
gutenberg_id | title | author | gutenberg_author_id | language | gutenberg_bookshelf | rights | has_text |
36 | The War of the Worlds | Wells, H. G. (Herbert George) | 30 | en | Movie Books/Science Fiction | Public domain in the USA. | TRUE |
That helps! If you’re interested, more customization options can be found on the package site.
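As a small taste of those options, the sketch below relabels the header and makes it bold. It uses a hypothetical one-row tibble (`demo`) in place of `war` so that it runs without the catalog, and the label text is only illustrative.

```r
library(tibble)
library(flextable)

# Hypothetical stand-in for the `war` lookup result (two columns only)
demo <- tibble(gutenberg_id = 36L, title = "The War of the Worlds")

demo_flex <- flextable(demo)
# Replace the raw column names with friendlier header labels
demo_flex <- set_header_labels(demo_flex, gutenberg_id = "ID", title = "Title")
# Bold the header row only
demo_flex <- bold(demo_flex, part = "header")
# Size the columns to their content
demo_flex <- autofit(demo_flex)
demo_flex
```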
What we actually need is the ID number given in the gutenberg_id column, because so far we have only looked the book up in the catalog. In our case that ID is 36. We can pass it to the gutenberg_download() function to get the entire text.
war_get <- gutenberg_works() %>%
  filter(title == "The War of the Worlds")
war_get
## # A tibble: 1 x 8
## gutenberg_id title author gutenberg_author… language gutenberg_books… rights
## <int> <chr> <chr> <int> <chr> <chr> <chr>
## 1 36 The W… Wells,… 30 en Movie Books/Sci… Publi…
## # … with 1 more variable: has_text <lgl>
war_text <- gutenberg_download(36)
## Determining mirror for Project Gutenberg from http://www.gutenberg.org/robot/harvest
## Using mirror http://aleph.gutenberg.org
war_flex_text <- war_text %>%
  head(n = 15) %>%
  flextable() %>%
  autofit() %>%
  theme_booktabs()
war_flex_text
gutenberg_id | text |
36 | The War of the Worlds |
36 | |
36 | by H. G. Wells [1898] |
36 | |
36 | |
36 | But who shall dwell in these worlds if they be |
36 | inhabited? . . . Are we or they Lords of the |
36 | World? . . . And how are all things made for man?-- |
36 | KEPLER (quoted in The Anatomy of Melancholy) |
36 | |
36 | |
36 | |
36 | BOOK ONE |
36 | |
36 | THE COMING OF THE MARTIANS |
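The ID was typed by hand in the download call. A minimal sketch of pulling it out of the lookup result instead is shown here, using a hypothetical one-row tibble (`war_demo`) as a stand-in for `war` so that it runs without the catalog or a network connection.

```r
library(dplyr)
library(tibble)

# Hypothetical one-row stand-in for the `war` lookup result shown earlier
war_demo <- tibble(gutenberg_id = 36L, title = "The War of the Worlds")

# Extract the ID as a plain integer instead of typing it by hand
war_id <- war_demo %>%
  pull(gutenberg_id)

war_id

# With gutenbergr loaded and a network connection, the value could be passed on:
# war_text <- gutenberg_download(war_id)
```

This keeps the lookup and the download in step if you swap in a different title.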
Recall that we need at least the title alongside the text. The meta_fields argument adds it as a column, and the chapter information is derived in the next step.
war_text_tc <- gutenberg_download(36,
                                  meta_fields = "title")
war_flex_text_tc <- war_text_tc %>%
  head(n = 15) %>%
  flextable() %>%
  autofit() %>%
  theme_booktabs()
war_flex_text_tc
gutenberg_id | text | title |
36 | The War of the Worlds | The War of the Worlds |
36 | | The War of the Worlds |
36 | by H. G. Wells [1898] | The War of the Worlds |
36 | | The War of the Worlds |
36 | | The War of the Worlds |
36 | But who shall dwell in these worlds if they be | The War of the Worlds |
36 | inhabited? . . . Are we or they Lords of the | The War of the Worlds |
36 | World? . . . And how are all things made for man?-- | The War of the Worlds |
36 | KEPLER (quoted in The Anatomy of Melancholy) | The War of the Worlds |
36 | | The War of the Worlds |
36 | | The War of the Worlds |
36 | | The War of the Worlds |
36 | BOOK ONE | The War of the Worlds |
36 | | The War of the Worlds |
36 | THE COMING OF THE MARTIANS | The War of the Worlds |
Now divide the text into documents, each representing one chapter. Note that this assumes the text column marks each chapter with a line beginning with the word "chapter". So either amend the mutate() call below or inspect the data set to make sure it uses that term to indicate the different chapters.
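war_chapters <- war_text_tc %>%
  group_by(title) %>%
  # Each line starting with "chapter" bumps the running chapter number
  mutate(chapter = cumsum(str_detect(text, regex("^chapter ", ignore_case = TRUE)))) %>%
  ungroup() %>%
  # Drop the front matter that precedes the first chapter heading
  filter(chapter > 0) %>%
  # Combine title and chapter number into one label, e.g. "The War of the Worlds_1"
  unite(document, title, chapter)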
war_chapters %>%
  head(n = 15) %>%
  flextable() %>%
  autofit() %>%
  theme_booktabs()
gutenberg_id | text | document |
36 | CHAPTER ONE | The War of the Worlds_1 |
36 | | The War of the Worlds_1 |
36 | THE EVE OF THE WAR | The War of the Worlds_1 |
36 | | The War of the Worlds_1 |
36 | | The War of the Worlds_1 |
36 | No one would have believed in the last years of the nineteenth | The War of the Worlds_1 |
36 | century that this world was being watched keenly and closely by | The War of the Worlds_1 |
36 | intelligences greater than man's and yet as mortal as his own; that as | The War of the Worlds_1 |
36 | men busied themselves about their various concerns they were | The War of the Worlds_1 |
36 | scrutinised and studied, perhaps almost as narrowly as a man with a | The War of the Worlds_1 |
36 | microscope might scrutinise the transient creatures that swarm and | The War of the Worlds_1 |
36 | multiply in a drop of water. With infinite complacency men went to | The War of the Worlds_1 |
36 | and fro over this globe about their little affairs, serene in their | The War of the Worlds_1 |
36 | assurance of their empire over matter. It is possible that the | The War of the Worlds_1 |
36 | infusoria under the microscope do the same. No one gave a thought to | The War of the Worlds_1 |
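To see how the running chapter number comes about, here is a small sketch on a handful of hypothetical lines (`toy`) standing in for the text column. It assumes, as the walkthrough does, that every chapter heading starts with the word "chapter".

```r
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical lines standing in for the `text` column
toy <- tibble(text = c("BOOK ONE", "CHAPTER ONE", "THE EVE OF THE WAR",
                       "CHAPTER TWO", "THE FALLING STAR"))

toy_chapters <- toy %>%
  mutate(
    # TRUE only on lines that begin with "chapter" (any case)
    is_heading = str_detect(text, regex("^chapter ", ignore_case = TRUE)),
    # Running total of headings seen so far = current chapter number
    chapter = cumsum(is_heading)
  )

toy_chapters

# Quick sanity check: how many chapters were detected?
n_distinct(toy_chapters$chapter[toy_chapters$chapter > 0])
```

Lines before the first heading get chapter 0, which is why the walkthrough filters on chapter > 0. If the check returns 0 for your book, its headings probably use a different convention and the pattern needs adjusting.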
Then split the text into words
war_chapters_word <- war_chapters %>%
  unnest_tokens(word, text)
and assess document-word counts
war_word_counts <- war_chapters_word %>%
  anti_join(stop_words) %>%
  count(document, word, sort = TRUE) %>%
  ungroup()
## Joining, by = "word"
war_word_counts
## # A tibble: 15,367 x 3
## document word n
## <chr> <chr> <int>
## 1 The War of the Worlds_16 brother 50
## 2 The War of the Worlds_25 ulla 28
## 3 The War of the Worlds_14 brother 26
## 4 The War of the Worlds_16 road 25
## 5 The War of the Worlds_14 people 24
## 6 The War of the Worlds_16 people 24
## 7 The War of the Worlds_12 people 20
## 8 The War of the Worlds_16 lane 20
## 9 The War of the Worlds_12 water 19
## 10 The War of the Worlds_19 martians 19
## # … with 15,357 more rows
or better yet
war_word_counts %>%
  head(n = 15) %>%
  flextable() %>%
  autofit() %>%
  theme_booktabs()
document | word | n |
The War of the Worlds_16 | brother | 50 |
The War of the Worlds_25 | ulla | 28 |
The War of the Worlds_14 | brother | 26 |
The War of the Worlds_16 | road | 25 |
The War of the Worlds_14 | people | 24 |
The War of the Worlds_16 | people | 24 |
The War of the Worlds_12 | people | 20 |
The War of the Worlds_16 | lane | 20 |
The War of the Worlds_12 | water | 19 |
The War of the Worlds_19 | martians | 19 |
The War of the Worlds_15 | guns | 17 |
The War of the Worlds_20 | machine | 17 |
The War of the Worlds_14 | martians | 16 |
The War of the Worlds_14 | street | 16 |
The War of the Worlds_24 | night | 16 |
Now that is a form you should be able to work with!
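If you want to check what the tokenizing and counting steps do before running them on a whole book, a tiny sketch like the following may help. The one-sentence document (`toy_doc`) is hypothetical; stop_words is the stop word list that ships with tidytext.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Hypothetical one-line "chapter"
toy_doc <- tibble(
  document = "toy_1",
  text = "The Martians came. The Martians left!"
)

toy_counts <- toy_doc %>%
  # One lowercased word per row, punctuation dropped
  unnest_tokens(word, text) %>%
  # Remove common words such as "the"; `by` silences the join message
  anti_join(stop_words, by = "word") %>%
  # Count each remaining word within its document
  count(document, word, sort = TRUE)

toy_counts
```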
Tasks
Perform the following
a chapterwise or bookwise sentiment analysis of your text using AFINN, Bing, and NRC, with a visual of each.
BONUS: Follow the approach given in Text Mining with R to construct a topic model of the work with a visual of the top 10 topics by prevalence.
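As a starting point for the sentiment task, here is a minimal sketch of a chapterwise net sentiment score using the Bing lexicon only, on hypothetical tokens (`toy_words`) rather than the real book. It is one possible shape for the analysis, not the required solution. Bing ships with tidytext, whereas get_sentiments("afinn") and get_sentiments("nrc") rely on the textdata package and ask you to confirm a download first, so they are left out here.

```r
library(dplyr)
library(tidyr)
library(tibble)
library(tidytext)
library(ggplot2)

# Hypothetical per-chapter tokens, one word per row
toy_words <- tibble(
  document = c("toy_1", "toy_1", "toy_2"),
  word     = c("happy", "good", "terrible")
)

toy_sentiment <- toy_words %>%
  # Keep only words that appear in the Bing lexicon
  inner_join(get_sentiments("bing"), by = "word") %>%
  # Count positive and negative words per document
  count(document, sentiment) %>%
  # One column per sentiment, filling absent combinations with 0
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(net = positive - negative)

# Simple bar chart of net sentiment per document
toy_plot <- ggplot(toy_sentiment, aes(x = document, y = net)) +
  geom_col() +
  labs(x = "Chapter", y = "Net sentiment (Bing)")

toy_plot
```

With real data you would start from the tokenized chapters instead of `toy_words`. Note that a toy example needs at least one positive and one negative word, otherwise one of the two columns used for `net` will not exist.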