Hooked on Feelings

Read the Submission Directions

Please submit a PDF rendered from an R Markdown file for this task1 both to eCampus and Slack.

You can use a chapter book of your choice or get one using the process below.

Getting Text

In addition to the packages listed in the example, you can use the gutenbergr package, which lets you download and process public domain works from the Project Gutenberg collection. Follow the mini walkthrough below to see how to get a text of your choice2.

library(tidyverse)
library(gutenbergr)
library(tidytext)
library(flextable)
## 
## Attaching package: 'flextable'
## The following object is masked from 'package:purrr':
## 
##     compose
## The following objects are masked from 'package:kableExtra':
## 
##     as_image, footnote
gutenberg_metadata
## # A tibble: 51,997 x 8
##    gutenberg_id title  author  gutenberg_autho… language gutenberg_books… rights
##           <int> <chr>  <chr>              <int> <chr>    <chr>            <chr> 
##  1            0  <NA>  <NA>                  NA en       <NA>             Publi…
##  2            1 "The … Jeffer…             1638 en       United States L… Publi…
##  3            2 "The … United…                1 en       American Revolu… Publi…
##  4            3 "John… Kenned…             1666 en       <NA>             Publi…
##  5            4 "Linc… Lincol…                3 en       US Civil War     Publi…
##  6            5 "The … United…                1 en       American Revolu… Publi…
##  7            6 "Give… Henry,…                4 en       American Revolu… Publi…
##  8            7 "The … <NA>                  NA en       <NA>             Publi…
##  9            8 "Abra… Lincol…                3 en       US Civil War     Publi…
## 10            9 "Abra… Lincol…                3 en       US Civil War     Publi…
## # … with 51,987 more rows, and 1 more variable: has_text <lgl>
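Rather than scanning the full catalog, you can filter it before downloading. gutenberg_works() restricts the catalog to works that actually have text available and accepts dplyr-style filter expressions. A quick sketch (the author string below is an assumption; check the catalog for the exact "Last, First" spelling):

```r
library(tidyverse)
library(gutenbergr)

# English works whose title mentions "Worlds"; gutenberg_works()
# already keeps only rows where text is available for download.
gutenberg_works(language == "en",
                str_detect(title, "Worlds"))

# Or filter by author, using the catalog's "Last, First" form
gutenberg_works(author == "Wells, H. G. (Herbert George)")
```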

Let’s say we want The War of the Worlds. We can get it by running the following:

war <- gutenberg_works() %>%
            filter(title == "The War of the Worlds")

war
## # A tibble: 1 x 8
##   gutenberg_id title  author  gutenberg_author… language gutenberg_books… rights
##          <int> <chr>  <chr>               <int> <chr>    <chr>            <chr> 
## 1           36 The W… Wells,…                30 en       Movie Books/Sci… Publi…
## # … with 1 more variable: has_text <lgl>

Well that’s a bit difficult to read. Instead of using the DT package, let’s give flextable a go.

war_flex <- flextable(war) %>%
                  autofit() %>%
                  theme_booktabs()

war_flex

That helps! If you’re interested, more customization options can be found on the package site.
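As a small illustration of those options (the styling choices here are just assumptions, not package defaults), a few common flextable tweaks look like this:

```r
library(flextable)

war_flex_styled <- flextable(war) %>%
  autofit() %>%
  theme_booktabs() %>%
  bold(part = "header") %>%             # bold the column labels
  fontsize(size = 9, part = "all") %>%  # shrink the text a little
  set_caption("Catalog entry for The War of the Worlds")

war_flex_styled
```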

Anyway, what we actually need is the number, or index, given by the ‘gutenberg_id’ column, because so far we haven’t done anything beyond grabbing the catalog. In our case that index number is 36. We can use that identifier along with the gutenberg_download() function to get the entire text.

war_get <- gutenberg_works() %>%
                filter(title == "The War of the Worlds")

war_get
## # A tibble: 1 x 8
##   gutenberg_id title  author  gutenberg_author… language gutenberg_books… rights
##          <int> <chr>  <chr>               <int> <chr>    <chr>            <chr> 
## 1           36 The W… Wells,…                30 en       Movie Books/Sci… Publi…
## # … with 1 more variable: has_text <lgl>
war_text <- gutenberg_download(36)
## Determining mirror for Project Gutenberg from http://www.gutenberg.org/robot/harvest
## Using mirror http://aleph.gutenberg.org
war_flex_text <- war_text %>%
                      head(n = 15) %>%
                      flextable() %>%
                      autofit() %>%
                      theme_booktabs()

war_flex_text

But recall that what we need is at least the title as a metadata field; we’ll derive chapter information from the text itself next.

war_text_tc <- gutenberg_download(36,
                                    meta_fields = "title")

war_flex_text_tc <- war_text_tc %>%
                          head(n = 15) %>%
                          flextable() %>%
                          autofit() %>%
                          theme_booktabs()

war_flex_text_tc

Now divide the text into documents, each representing one chapter. Note that this assumes the text column marks chapter headings with the term chapter. So you can either amend the mutate() below or look at the data set to make sure it includes a term that indicates the different chapters.

war_chapters <- war_text_tc %>%
  group_by(title) %>%
  mutate(chapter = cumsum(str_detect(text, regex("^chapter ", ignore_case = TRUE)))) %>%
  ungroup() %>%
  filter(chapter > 0) %>%
  unite(document, title, chapter)

war_chapters %>%
  head(n = 15) %>%
  flextable() %>%
  autofit() %>%
  theme_booktabs()
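If your book labels chapters differently, say with bare Roman numerals or an all-caps heading, you can swap in a different pattern. A hedged sketch (the patterns and the my_text_tc object are illustrative; inspect your own text first):

```r
library(tidyverse)

# Chapters marked like "I.", "II.", "XIV." at the start of a line
roman_pattern <- regex("^[IVXLC]+\\.", ignore_case = FALSE)

# Chapters marked like "CHAPTER ONE" or "Chapter 1"
word_pattern <- regex("^chapter\\s+\\w+", ignore_case = TRUE)

# my_text_tc stands in for your downloaded text with a title column
my_chapters <- my_text_tc %>%
  group_by(title) %>%
  mutate(chapter = cumsum(str_detect(text, roman_pattern))) %>%
  ungroup() %>%
  filter(chapter > 0) %>%
  unite(document, title, chapter)
```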

Then split the text into words

war_chapters_word <- war_chapters %>%
                      unnest_tokens(word, text)

and compute document-word counts

war_word_counts <- war_chapters_word %>%
                    anti_join(stop_words) %>%
                    count(document, word, sort = TRUE) %>%
                    ungroup()
## Joining, by = "word"
war_word_counts
## # A tibble: 15,367 x 3
##    document                 word         n
##    <chr>                    <chr>    <int>
##  1 The War of the Worlds_16 brother     50
##  2 The War of the Worlds_25 ulla        28
##  3 The War of the Worlds_14 brother     26
##  4 The War of the Worlds_16 road        25
##  5 The War of the Worlds_14 people      24
##  6 The War of the Worlds_16 people      24
##  7 The War of the Worlds_12 people      20
##  8 The War of the Worlds_16 lane        20
##  9 The War of the Worlds_12 water       19
## 10 The War of the Worlds_19 martians    19
## # … with 15,357 more rows

or better yet

war_word_counts %>%
  head(n = 15) %>%
  flextable() %>%
  autofit() %>%
  theme_booktabs()

Now that is a form you should be able to work with!

Tasks

Perform the following

  1. a chapterwise or bookwise sentiment analysis of your text using AFINN, Bing, and NRC, with a visual of each.

  2. BONUS3: Follow the approach given in Text Mining with R to construct a topic model of the work with a visual of the top 10 topics by prevalence.
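To get started on the sentiment task, here is a minimal sketch using the Bing lexicon, which ships with tidytext (AFINN and NRC are fetched via the textdata package the first time you call get_sentiments()). Object names follow the walkthrough above, and the plot choices are just one reasonable option:

```r
library(tidyverse)
library(tidytext)

# Net Bing sentiment per chapter-document
war_bing <- war_chapters_word %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(document, sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(net = positive - negative)

ggplot(war_bing, aes(x = document, y = net)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Net Bing sentiment (positive - negative)")
```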



  1. If you use an external file for your data, please submit that as well.↩︎

  2. As long as it exists in the public domain.↩︎

  3. Will replace the 2016 general election map activity.↩︎