Package 'Twitmo' reference manual

Title:	Twitter Topic Modeling and Visualization for R
Description:	Tailored for topic modeling with tweets and fit for visualization tasks in R. Collect, pre-process and analyze the contents of tweets using LDA and structural topic models (STM). Comes with visualizing capabilities like tweet and hashtag maps and built-in support for 'LDAvis'.
Authors:	Andreas Buchmueller [aut, cre] (github.com/abuchmueller), Gillian Kant [aut, ths], Christoph Weisser [aut, ths], Benjamin Saefken [aut, ths], Thomas Kneib [rev, ths, dgs], Krisztina Kis-Katos [rev, ths]
Maintainer:	Andreas Buchmueller <[email protected]>
License:	MIT + file LICENSE
Version:	0.1.5
Built:	2025-03-05 06:35:09 UTC
Source:	https://github.com/abuchmueller/twitmo

Cluster tweets on an interactive map

Description

Plot tweets in clusters on an interactive map.

Usage

cluster_tweets(data, ...)

Arguments

`data`	A data frame of tweets parsed by load_tweets or returned by pool_tweets.
`...`	Extra arguments passed to markerClusterOptions

Details

This function can be used to create interactive maps on OpenStreetMap.

Value

Interactive leaflet map

Examples

## Not run: 

library(Twitmo)

# load tweets (included in package)
mytweets <- load_tweets(system.file("extdata", "tweets_20191027-141233.json", package = "Twitmo"))

pool <- pool_tweets(mytweets)
cluster_tweets(mytweets)

# OR
cluster_tweets(pool$data)

## End(Not run)

Filter tweets

Description

Filter tweets by keywords.

Usage

filter_tweets(data, keywords, include = TRUE)

Arguments

`data`	Data frame containing tweets and hashtags. Works with any data frame, as long as there is a "text" column of type character string and a "hashtags" column with comma separated character vectors. Can be obtained either by using `load_tweets` on a JSON object returned by Twitter's API v1.1 or by using `stream_in` on any JSON file, as long as it has a "text" and "hashtags" field. If you are unsure about the requirements you may load the sample piece of data contained in the package by following the example in the example section of this help page.
`keywords`	Character string of keywords for black- or whitelisting provided via a comma separated character string.
`include`	Logical. Indicates whether to perform exclusive or inclusive filtering. Inclusive filtering is akin to whitelisting keywords; exclusive filtering blacklists them.

Details

Use this function if you want your Tweets to contain certain keywords. This can be used for iterative filtering to create more coherent topic models. Keyword filtering is always case insensitive (lowercase).
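The case-insensitive matching described above can be sketched in base R. This is only an illustration under stated assumptions: `filter_by_keywords` is a hypothetical helper, not Twitmo's implementation, and it matches keywords against the "text" column only.

```r
# Hypothetical helper (not part of Twitmo): keep or drop rows whose "text"
# column mentions any keyword from a comma separated string.
filter_by_keywords <- function(data, keywords, include = TRUE) {
  # split "football,mood" into c("football", "mood") and lowercase the keywords
  terms <- tolower(trimws(strsplit(keywords, ",", fixed = TRUE)[[1]]))
  text <- tolower(data$text)
  # TRUE for every row whose lowercased text contains at least one keyword
  hit <- Reduce(`|`, lapply(terms, function(k) grepl(k, text, fixed = TRUE)))
  if (include) data[hit, , drop = FALSE] else data[!hit, , drop = FALSE]
}

tweets <- data.frame(
  text = c("Football tonight!", "Good MOOD today", "Coffee first"),
  stringsAsFactors = FALSE
)

kept <- filter_by_keywords(tweets, "football,mood", include = TRUE)
dropped <- filter_by_keywords(tweets, "football,mood", include = FALSE)
```

Lowercasing both the keywords and the text before matching is what makes the filter case insensitive; `fixed = TRUE` treats each keyword as a literal string rather than a regular expression.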

Value

Data frame of Tweets that contain the specified keywords (`include = TRUE`) or that do not contain them (`include = FALSE`).

Examples

## Not run: 

library(Twitmo)


# load tweets (included in package)
mytweets <- load_tweets(system.file("extdata", "tweets_20191027-141233.json", package = "Twitmo"))

# Exclude Tweets that mention "football" and/or "mood"
keyword_dict <- "football,mood"
mytweets_reduced <- filter_tweets(mytweets, keywords = keyword_dict, include = FALSE)

## End(Not run)

Find best LDA model

Description

Find the optimal hyperparameter k for your LDA model

Usage

find_lda(pooled_dfm, search_space = seq(1, 10, 2), method = "Gibbs", ...)

Arguments

`pooled_dfm`	object of class dfm (see dfm) containing (pooled) tweets
`search_space`	Vector with number of topics to compare different models.
`method`	The method to be used for fitting. Currently method = "VEM" or method = "Gibbs" are supported.
`...`	Additional arguments passed to FindTopicsNumber.

Value

Plot with different metrics compared.

Examples

## Not run: 

library(Twitmo)

# load tweets (included in package)
mytweets <- load_tweets(system.file("extdata", "tweets_20191027-141233.json", package = "Twitmo"))

# Pool tweets into longer pseudo-documents
pool <- pool_tweets(data = mytweets)
pooled_dfm <- pool$document_term_matrix

# use the ldatuner to compare different K
find_lda(pooled_dfm, search_space = seq(1, 10, 1), method = "Gibbs")

## End(Not run)

Find best STM/CTM

Description

Gridsearch for optimal K for your STM/CTM

Usage

find_stm(data, search_space = seq(4, 20, by = 2), ...)

Arguments

`data`	Either a pooled dfm object returned by pool_tweets or a named list of pre-processed tweets for stm modeling returned by `fit_stm`.
`search_space`	Vector with number of topics to compare different models.
`...`	Additional parameters passed to searchK

Details

Wrapper function around searchK for pooled dfm objects returned by pool_tweets and prepped stm documents returned by fit_stm.

Value

Plot with different metrics compared.

Examples

## Not run: 

library(Twitmo)

# load tweets (included in package)
mytweets <- load_tweets(system.file("extdata", "tweets_20191027-141233.json", package = "Twitmo"))

# Pool tweets into longer pseudo-documents
pool <- pool_tweets(data = mytweets)
pooled_dfm <- pool$document_term_matrix

# compare different K for CTM
find_stm(pooled_dfm, search_space = seq(1, 10, 1))

# OR

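# compare different K for STM
# (fit an STM first so that the pre-processed documents in $prep exist)
stm_model <- fit_stm(mytweets, n_topics = 7, xcov = ~ retweet_count + followers_count)
prepped_stm <- stm_model$prep
find_stm(prepped_stm, search_space = seq(4, 16, by = 2))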

## End(Not run)

Fit CTM (Correlated topic model)

Description

Estimate a CTM topic model.

Usage

fit_ctm(pooled_dfm, n_topics = 2L, ...)

Arguments

`pooled_dfm`	Object of class dfm (see dfm) containing (pooled) Tweets.
`n_topics`	Integer with number of topics
`...`	Additional arguments passed to stm.

Value

Object of class stm

Examples

## Not run: 

library(Twitmo)

# load tweets (included in package)
mytweets <- load_tweets(system.file("extdata", "tweets_20191027-141233.json", package = "Twitmo"))

# Pool tweets into longer pseudo-documents
pool <- pool_tweets(data = mytweets)
pooled_dfm <- pool$document_term_matrix

# fit your CTM with 7 topics
ctm_model <- fit_ctm(pooled_dfm, n_topics = 7)

## End(Not run)

Fit LDA Topic Model

Description

Estimate an LDA topic model using VEM or Gibbs Sampling.

Usage

fit_lda(pooled_dfm, n_topics, ...)

Arguments

`pooled_dfm`	Object of class dfm (see dfm) containing (pooled) tweets.
`n_topics`	Integer with number of topics.
`...`	Additional arguments passed to LDA.

Value

Object of class LDA.

Examples

## Not run: 

library(Twitmo)

# load tweets (included in package)
mytweets <- load_tweets(system.file("extdata", "tweets_20191027-141233.json", package = "Twitmo"))

# Pool tweets into longer pseudo-documents
pool <- pool_tweets(data = mytweets)
pooled_dfm <- pool$document_term_matrix

# fit your LDA model with 7 topics
model <- fit_lda(pooled_dfm, n_topics = 7, method = "Gibbs")

## End(Not run)

Fit STM (Structural topic model)

Description

Estimate a structural topic model

Usage

fit_stm(
  data,
  n_topics = 2L,
  xcov,
  remove_punct = TRUE,
  stem = TRUE,
  remove_url = TRUE,
  remove_emojis = TRUE,
  stopwords = "en",
  ...
)

Arguments

`data`	Data frame containing tweets and hashtags. Works with any data frame, as long as there is a "text" column of type character string and a "hashtags" column with comma separated character vectors. Can be obtained either by using `load_tweets` on a JSON object returned by Twitter's API v1.1 or by using `stream_in` on any JSON file, as long as it has a "text" and "hashtags" field. If you are unsure about the requirements you may load the sample piece of data contained in the package by following the example in the example section of this help page.
`n_topics`	Integer with number of topics.
`xcov`	Either a formula with an empty left-hand side specifying external covariates (metadata) to use, e.g. `~favourites_count + retweet_count`, or a character vector (`c("favourites_count", "retweet_count")`), or a comma separated character string (`"favourites_count,retweet_count"`) with column names indicating which metadata to use as external covariates.
`remove_punct`	Logical. Indicates whether punctuation (includes Twitter hashtags and usernames) should be removed. Defaults to TRUE.
`stem`	Logical. If `TRUE` turn on word stemming for terms.
`remove_url`	Logical. If `TRUE` find and eliminate URLs beginning with http(s).
`remove_emojis`	Logical. If `TRUE` all emojis will be removed from tweets.
`stopwords`	a character vector, list of character vectors, dictionary or collocations object. See pattern for details. Defaults to stopwords("english").
`...`	Additional arguments passed to stm.

Details

Use this function to estimate an STM from a data frame of parsed Tweets. Works with unpooled Tweets only. Pre-processing and fitting are done in one run.
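The three accepted forms of `xcov` describe the same set of covariates. A minimal base R sketch of how they could be normalized to one formula follows; `as_covariate_formula` is a hypothetical helper for illustration and is not claimed to be how `fit_stm` handles the argument internally.

```r
# Hypothetical helper: convert any of the three input styles to a formula
as_covariate_formula <- function(xcov) {
  if (inherits(xcov, "formula")) {
    return(xcov)
  }
  # split comma separated strings; plain character vectors pass through unchanged
  vars <- trimws(unlist(strsplit(xcov, ",", fixed = TRUE)))
  stats::reformulate(vars)
}

f1 <- as_covariate_formula(~ favourites_count + retweet_count)
f2 <- as_covariate_formula(c("favourites_count", "retweet_count"))
f3 <- as_covariate_formula("favourites_count,retweet_count")

# all three name the same covariates
all.vars(f1)
all.vars(f2)
all.vars(f3)
```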

Value

Object of class stm. Additionally, pre-processed documents are appended into a named list called "prep".

Examples

## Not run: 

library(Twitmo)

# load tweets (included in package)
mytweets <- load_tweets(system.file("extdata", "tweets_20191027-141233.json", package = "Twitmo"))

# fit STM with tweets
stm_model <- fit_stm(mytweets,
  n_topics = 7,
  xcov = ~ retweet_count + followers_count + reply_count +
    quote_count + favorite_count,
  remove_punct = TRUE,
  remove_url = TRUE,
  remove_emojis = TRUE,
  stem = TRUE,
  stopwords = "en"
)

## End(Not run)

Sample tweets by streaming or searching

Description

Collect Tweets via streaming or searching.

Usage

get_tweets(
  method = "stream",
  location = c(-180, -90, 180, 90),
  timeout = Inf,
  keywords = "",
  n_max = 100L,
  file_name = NULL,
  ...
)

Arguments

`method`	Character string. Supported methods are streaming and searching. The default method is streaming (`method = 'stream'`). This is the recommended method as it allows you to collect larger volumes of data over time. Use `method = 'search'` if you want to collect Tweets from the past 9 days.
`location`	Character string of location to sample from. Can be a three letter country code, e.g. "USA", or a city name like "berlin". Use `Twitmo:::bbox_country` for all supported country locations or `rtweet:::citycoords` for a list of supported cities. Alternatively, use a vector of doubles with four latitude/longitude bounding box points provided via a vector of length 4, in the following format c(sw.long, sw.lat, ne.long, ne.lat) e.g., c(-125, 26, -65, 49).
`timeout`	Integer. Limit streaming time in seconds. By default will stream indefinitely until user interrupts by pressing [ctrl + c].
`keywords`	Character string of keywords provided via a comma separated character string. Only for searching Tweets. If you want to stream Tweets for a certain location AND filter by keywords, use the location parameter and, after sampling, use the filter_tweets function. If you are using the search method instead of streaming, keywords WILL work together with a location but will yield only a very limited number of Tweets.
`n_max`	Integer value. Only applies to the `search` method. Limit how many Tweets are collected.
`file_name`	Character string of desired file path and file name where Tweets will be saved. If not specified, will write to stream_tweets.json in the current working directory.
`...`	Additional arguments passed to stream_tweets or search_tweets.

Details

A function that calls on stream_tweets and search_tweets (depending on the specified method) and is specifically tailored for sampling geo-tagged data. This function supports additional arguments like location for convenient sampling of geo-tagged Tweets. Tweets can be searched up to 9 days into the past.
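Because the numeric form of `location` is order sensitive (longitude before latitude, south-west corner before north-east corner), a quick sanity check before streaming can catch swapped values. The following base R sketch is an assumption-laden illustration: `is_valid_bbox` is a hypothetical helper, and it deliberately rejects boxes that cross the antimeridian.

```r
# Hypothetical helper: check a c(sw.long, sw.lat, ne.long, ne.lat) bounding box
is_valid_bbox <- function(bbox) {
  is.numeric(bbox) && length(bbox) == 4 &&
    all(bbox[c(1, 3)] >= -180 & bbox[c(1, 3)] <= 180) && # longitudes in range
    all(bbox[c(2, 4)] >= -90 & bbox[c(2, 4)] <= 90) &&   # latitudes in range
    bbox[1] < bbox[3] && bbox[2] < bbox[4]               # SW corner lies south-west of NE corner
}

usa_ok <- is_valid_bbox(c(-125, 26, -65, 49))       # box from the argument description
swapped <- is_valid_bbox(c(26, -125, 49, -65))      # latitude and longitude swapped
world_ok <- is_valid_bbox(c(-180, -90, 180, 90))    # default location of get_tweets
```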

Value

A JSON file in the specified directory.

References

https://developer.twitter.com/en/docs/twitter-api/v1/tweets/search/api-reference/get-search-tweets
https://developer.twitter.com/en/docs/twitter-api/v1/tweets/sample-realtime/api-reference/get-statuses-sample

Examples

## Not run: 

# live stream tweets from Germany for 60 seconds and save to current working directory
get_tweets(
  method = "stream",
  location = "DEU",
  timeout = 60,
  file_name = "german_tweets.json"
)

# OR
# live stream tweets from berlin for an hour
get_tweets(
  method = "stream",
  location = "berlin",
  timeout = 3600,
  file_name = "berlin_tweets.json"
)

# OR
# use your own bounding box coordinates to stream tweets indefinitely (interrupt to stop)
get_tweets(
  method = "stream",
  location = c(-125, 26, -65, 49),
  timeout = Inf
)

## End(Not run)

View distribution of fitted LDA Models

Description

View the distribution of your fitted LDA model.

Usage

lda_distribution(lda_model, param = "gamma", tidy = FALSE)

Arguments

`lda_model`	Object of class LDA.
`param`	String. Specify either "beta" to return the term distribution over topics (term per topic probability) or "gamma" for the document distribution over topics (i.e. hashtag pool per topic probability).
`tidy`	Logical. Specify `TRUE` to return the distribution in tidy format (tbl).

Value

Data frame or tbl of Term (beta) or document (gamma) distribution over topics.

Examples

## Not run: 

library(Twitmo)

# load tweets (included in package)
mytweets <- load_tweets(system.file("extdata", "tweets_20191027-141233.json", package = "Twitmo"))

# Pool tweets into longer pseudo-documents
pool <- pool_tweets(data = mytweets)
pooled_dfm <- pool$document_term_matrix

# fit your LDA model with 7 topics
model <- fit_lda(pooled_dfm, n_topics = 7, method = "Gibbs")

# Choose either "beta" to return the term distribution
# over topics (term per topic) or "gamma" for the document distribution over
# topics (hashtag pool per topic probability)
lda_distribution(model, param = "gamma")

## End(Not run)

View Documents (hashtags) heavily associated with topics

Description

Convenience Function to extract the most likely topics for each hashtag.

Usage

lda_hashtags(lda_model)

Arguments

`lda_model`	Fitted LDA Model. Object of class LDA.

Value

Data frame with most likely topic for each hashtag.

Examples

## Not run: 

library(Twitmo)

# load tweets (included in package)
mytweets <- load_tweets(system.file("extdata", "tweets_20191027-141233.json", package = "Twitmo"))

# Pool tweets into longer pseudo-documents
pool <- pool_tweets(data = mytweets)
pooled_dfm <- pool$document_term_matrix

# fit your LDA model with 7 topics
model <- fit_lda(pooled_dfm, n_topics = 7, method = "Gibbs")

lda_hashtags(model)

## End(Not run)

View Terms heavily associated with each topic

Description

Convenience Function to extract the most likely terms for each topic.

Usage

lda_terms(lda_model, n_terms = 10)

Arguments

`lda_model`	Fitted LDA Model. Object of class LDA.
`n_terms`	Integer number of terms to return.

Value

Data frame with top n terms for each topic.

Examples

## Not run: 

library(Twitmo)

# load tweets (included in package)
mytweets <- load_tweets(system.file("extdata", "tweets_20191027-141233.json", package = "Twitmo"))

# Pool tweets into longer pseudo-documents
pool <- pool_tweets(data = mytweets)
pooled_dfm <- pool$document_term_matrix

# fit your LDA model with 7 topics
model <- fit_lda(pooled_dfm, n_topics = 7, method = "Gibbs")

# view the 10 most likely terms for each topic
lda_terms(model, n_terms = 10)

## End(Not run)

Converts Twitter stream data (JSON file) into parsed data frame

Description

Parse JSON files of collected Tweets

Usage

load_tweets(file_name)

Arguments

`file_name`	Character string. Name of JSON file with data collected by stream_tweets or get_tweets().

Details

This function replaces parse_stream which has been deprecated in rtweet 0.7 but is included here to ensure backwards compatibility for data streamed with older versions of rtweet. Alternatively stream_in in conjunction with tweets_with_users and lat_lng can be used if data has been collected with rtweet 0.7 or newer.

Value

A data frame of tweet data with additional metadata.

Examples

## Not run: 

library(Twitmo)

# load tweets (included in package)
raw_path <- system.file("extdata", "tweets_20191027-141233.json", package = "Twitmo")
mytweets <- load_tweets(raw_path)

## End(Not run)

Plot tweets containing certain hashtag

Description

Plot the locations of a given hashtag on a static map with base plot.

Usage

plot_hashtag(
  data,
  region = ".",
  alpha = 0.01,
  hashtag = "",
  ignore_case = TRUE,
  ...
)

Arguments

`data`	A data frame of tweets parsed by load_tweets or returned by pool_tweets.
`region`	Character vector specifying region. Returns a world map by default. For higher resolutions specify a region. See iso3166 for country codes.
`alpha`	A double between 0 and 1 specifying the opacity of plotted points.
`hashtag`	Character vector of the hashtag you want to plot.
`ignore_case`	Logical, if TRUE will ignore case of hashtag.
`...`	Extra arguments passed to polygon or lines.

Details

This function can be used to generate high resolution spatial plots of hashtags. Works with data frames of tweets returned by pool_tweets as well as data frames read in by load_tweets and then augmented by lat/lng coordinates with lat_lng. For a larger view, resize the plot window, then call plot_hashtag again.

Value

Maps where each dot represents a tweet.

Examples

## Not run: 
library(Twitmo)

# load tweets (included in package)
mytweets <- load_tweets(system.file("extdata", "tweets_20191027-141233.json", package = "Twitmo"))

# Plot tweets on mainland USA region
plot_hashtag(mytweets,
  region = "USA(?!:Alaska|:Hawaii)",
  hashtag = "breakfast",
  ignore_case = TRUE,
  alpha = 1
)

# Add title
title("My hashtags on a map")

## End(Not run)

Plot tweets on a static map

Description

Plot tweets on a static map with base plot.

Usage

plot_tweets(data, region = ".", alpha = 0.01, ...)

Arguments

`data`	A data frame of tweets parsed by load_tweets or returned by pool_tweets.
`region`	Character vector specifying region. Returns a world map by default. For higher resolutions specify a region. See iso3166 for country codes.
`alpha`	A double between 0 and 1 specifying the opacity of plotted points.
`...`	Extra arguments passed to polygon or lines.

Details

This function can be used to generate high resolution spatial plots of tweets. Works with data frames of tweets returned by pool_tweets as well as data frames read in by load_tweets and then augmented by lat/lng coordinates with lat_lng. For a larger view, resize the plot window, then call plot_tweets again.

Value

Maps where each dot represents a tweet.

Examples

## Not run: 

library(Twitmo)

# Plot tweets on mainland USA
mytweets <- load_tweets(system.file("extdata", "tweets_20191027-141233.json", package = "Twitmo"))

plot_tweets(mytweets, region = "USA(?!:Alaska|:Hawaii)", alpha = 1)
# Add title
title("My tweets on a map")

## End(Not run)

Prepare Tweets for topic modeling by pooling

Description

This function pools a data frame of parsed tweets into document pools.

Usage

pool_tweets(
  data,
  remove_numbers = TRUE,
  remove_punct = TRUE,
  remove_symbols = TRUE,
  remove_url = TRUE,
  remove_emojis = TRUE,
  remove_users = TRUE,
  remove_hashtags = TRUE,
  cosine_threshold = 0.9,
  stopwords = "en",
  n_grams = 1L
)

Arguments

`data`	Data frame containing tweets and hashtags. Works with any data frame, as long as there is a "text" column of type character string and a "hashtags" column with comma separated character vectors. Can be obtained either by using `load_tweets` on a JSON object returned by Twitter's API v1.1 or by using `stream_in` on any JSON file, as long as it has a "text" and "hashtags" field. If you are unsure about the requirements you may load the sample piece of data contained in the package by following the example in the example section of this help page.
`remove_numbers`	Logical. If `TRUE` remove tokens that consist only of numbers, but not words that start with digits, e.g. 2day. See tokens.
`remove_punct`	Logical. If `TRUE` remove all characters in the Unicode "Punctuation" [P] class, with exceptions for those used as prefixes for valid social media tags if `preserve_tags = TRUE`. See tokens
`remove_symbols`	Logical. If `TRUE` remove all characters in the Unicode "Symbol" [S] class.
`remove_url`	Logical. If `TRUE` find and eliminate URLs beginning with http(s).
`remove_emojis`	Logical. If `TRUE` all emojis will be removed from tweets.
`remove_users`	Logical. If `TRUE` will remove all mentions of user names from documents.
`remove_hashtags`	Logical. If `TRUE` will remove hashtags (not only the symbol but the hashtagged word itself) from documents.
`cosine_threshold`	Double. Value between 0 and 1 specifying the cosine similarity threshold to be used for document pooling. Tweets without a hashtag will be assigned to document (hashtag) pools based upon this metric. Low thresholds will reduce topic coherence by including a large number of tweets without a hashtag into the document pools. Higher thresholds will lead to more coherent topics but will reduce document sizes.
`stopwords`	a character vector, list of character vectors, dictionary or collocations object. See pattern for details. Defaults to stopwords("english").
`n_grams`	Integer vector specifying the number of elements to be concatenated in each n-gram. Each element of this vector will define a n in the n-gram(s) that are produced. See tokens_ngrams

Details

Pools tweets by hashtags using cosine similarity to create longer pseudo-documents for better LDA estimation and creates n-gram tokens. The method applies an implementation of the pooling algorithm from Mehrotra et al. 2013.
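To make the role of `cosine_threshold` concrete, here is a toy base R sketch of assigning an untagged tweet to a hashtag pool. It is an illustration only: the term counts are invented, and the actual pooling in `pool_tweets` may weight terms differently (for example with TF-IDF) before comparing documents.

```r
# Cosine similarity between two term-count vectors over the same vocabulary
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Toy term counts over the vocabulary c("coffee", "morning", "match", "goal")
pools <- rbind(
  breakfast = c(3, 2, 0, 0),
  football  = c(0, 0, 4, 3)
)
untagged_tweet <- c(1, 1, 0, 0)

# similarity of the untagged tweet to every hashtag pool
sims <- apply(pools, 1, cosine_similarity, b = untagged_tweet)
cosine_threshold <- 0.9

# assign the tweet to its most similar pool only if the threshold is met
best <- names(which.max(sims))
assigned_pool <- if (max(sims) >= cosine_threshold) best else NA_character_
```

With a lower threshold more untagged tweets clear the bar and join a pool, which lengthens the pseudo-documents but mixes in less related text; this is the trade-off described for `cosine_threshold` in the arguments.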

Value

List with corpus object and dfm object of pooled tweets.

References

Mehrotra, Rishabh & Sanner, Scott & Buntine, Wray & Xie, Lexing. (2013). Improving LDA Topic Models for Microblogs via Tweet Pooling and Automatic Labeling. 889-892. 10.1145/2484028.2484166.

Examples

## Not run: 

library(Twitmo)

# load tweets (included in package)
mytweets <- load_tweets(system.file("extdata", "tweets_20191027-141233.json", package = "Twitmo"))

pool <- pool_tweets(
  data = mytweets,
  remove_numbers = TRUE,
  remove_punct = TRUE,
  remove_symbols = TRUE,
  remove_url = TRUE,
  remove_users = TRUE,
  remove_hashtags = TRUE,
  remove_emojis = TRUE,
  cosine_threshold = 0.9,
  stopwords = "en",
  n_grams = 1
)

## End(Not run)


Predict topics of tweets using fitted LDA model

Description

Predict topics of tweets using fitted LDA model.

Usage

predict_lda(
  data,
  lda_model,
  response = "max",
  remove_numbers = TRUE,
  remove_punct = TRUE,
  remove_symbols = TRUE,
  remove_url = TRUE
)

Arguments

`data`	Data frame containing tweets and hashtags. Works with any data frame, as long as there is a "text" column of type character string and a "hashtags" column with comma separated character vectors. Can be obtained either by using `load_tweets` on a JSON object returned by Twitter's API v1.1 or by using `stream_in` on any JSON file, as long as it has a "text" and "hashtags" field. If you are unsure about the requirements you may load the sample piece of data contained in the package by following the example in the example section of this help page.
`lda_model`	Fitted LDA Model. Object of class LDA.
`response`	Type of response. Either "prob" for probabilities per topic or "max" for the single most likely topic (default).
`remove_numbers`	Logical. If `TRUE` remove tokens that consist only of numbers, but not words that start with digits, e.g. 2day. See tokens.
`remove_punct`	Logical. If `TRUE` remove all characters in the Unicode "Punctuation" [P] class, with exceptions for those used as prefixes for valid social media tags if `preserve_tags = TRUE`. See tokens
`remove_symbols`	Logical. If `TRUE` remove all characters in the Unicode "Symbol" [S] class.
`remove_url`	Logical. If `TRUE` find and eliminate URLs beginning with http(s).

Value

Data frame of topic predictions or predicted probabilities per topic (see response).
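The difference between the two response types can be sketched with a toy probability matrix in base R. The numbers and column names below are invented for illustration, and the exact layout of the data frame returned by `predict_lda` may differ.

```r
# Toy posterior: rows are tweets, columns are topics; each row sums to 1
topic_probs <- matrix(
  c(0.7, 0.2, 0.1,
    0.1, 0.3, 0.6),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("tweet_1", "tweet_2"), paste0("topic_", 1:3))
)

# "prob"-style output: the full table of probabilities per topic
prob_response <- as.data.frame(topic_probs)

# "max"-style output: the single most probable topic for each tweet
max_response <- data.frame(
  doc = rownames(topic_probs),
  topic = max.col(topic_probs, ties.method = "first")
)
```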

Examples

## Not run: 

library(Twitmo)

# load tweets (included in package)
mytweets <- load_tweets(system.file("extdata", "tweets_20191027-141233.json", package = "Twitmo"))

# Pool tweets into longer pseudo-documents
pool <- pool_tweets(data = mytweets)
pooled_dfm <- pool$document_term_matrix

# fit your LDA model with 7 topics
model <- fit_lda(pooled_dfm, n_topics = 7, method = "Gibbs")

# Predict topics of tweets using your fitted LDA model
predict_lda(mytweets, model, response = "prob")

## End(Not run)

Create interactive visualization with LDAvis

Description

Converts LDA topic model to LDAvis compatible json string and starts server. May require servr Package to run properly. For conversion of STM topic models use toLDAvis.

Usage

to_ldavis(fitted, corpus, doc_term)

Arguments

`fitted`	Fitted LDA Model. Object of class LDA.
`corpus`	Document corpus. Object of class corpus.
`doc_term`	document term matrix (dtm).

Details

Beware that to_ldavis might fail if the corpus contains documents that consist ONLY of numbers, emojis or punctuation, i.e. documents that do not contain a single word. This is due to a limitation in the topicmodels package used for model fitting, which does not consider such terms as words and omits them, causing the posterior to differ in length from the corpus. If you encounter such an error, redo your pre-processing and exclude emojis, punctuation and numbers. When using pool_tweets you can remove emojis by specifying remove_emojis = TRUE.
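A quick way to spot such documents before fitting is to look for texts without any letters. The base R sketch below is a rough check under a simplifying assumption: it only treats the Latin letters A to Z as word characters, so texts in other scripts would need a different pattern.

```r
# Toy documents; names stand in for document identifiers
docs <- c(
  doc1 = "great coffee this morning",
  doc2 = "2019 10 27",
  doc3 = "!!! ???",
  doc4 = "2day"
)

# TRUE for documents that contain at least one Latin letter
has_letters <- grepl("[A-Za-z]", docs)

# documents made up only of numbers, punctuation or other non-letter characters
problem_docs <- names(docs)[!has_letters]
```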

Value

Invisible Object (see serVis).

Examples

## Not run: 

library(Twitmo)

# load tweets (included in package)
mytweets <- load_tweets(system.file("extdata", "tweets_20191027-141233.json", package = "Twitmo"))

# Pool tweets into longer pseudo-documents
pool <- pool_tweets(data = mytweets)
pooled_dfm <- pool$document_term_matrix
pooled_corp <- pool$corpus

# fit your LDA model with 7 topics
model <- fit_lda(pooled_dfm, n_topics = 7, method = "Gibbs")

# Explore your topics with LDAvis
to_ldavis(model, pooled_corp, pooled_dfm)

## End(Not run)
