Title: | Twitter Topic Modeling and Visualization for R |
---|---|
Description: | Tailored for topic modeling with tweets and for visualization tasks in R. Collect, pre-process and analyze the contents of tweets using LDA and structural topic models (STM). Includes visualization capabilities such as tweet and hashtag maps and built-in support for 'LDAvis'. |
Authors: | Andreas Buchmueller [aut, cre] (github.com/abuchmueller),
Gillian Kant [aut, ths] |
Maintainer: | Andreas Buchmueller <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.5 |
Built: | 2025-03-05 06:35:09 UTC |
Source: | https://github.com/abuchmueller/twitmo |
Plot Tweets into clusters on an interactive map
cluster_tweets(data, ...)
data |
A data frame of tweets parsed by load_tweets or returned by pool_tweets. |
... |
Extra arguments passed to markerClusterOptions. |
This function can be used to create interactive maps on OpenStreetMap.
Interactive leaflet map
## Not run:
library(Twitmo)

# load tweets (included in package)
mytweets <- load_tweets(system.file("extdata", "tweets_20191027-141233.json", package = "Twitmo"))
pool <- pool_tweets(mytweets)
cluster_tweets(mytweets)
# OR
cluster_tweets(pool$data)
## End(Not run)
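Because extra arguments are forwarded to markerClusterOptions, the cluster behavior can be tweaked directly from this call; a minimal sketch (the option values are illustrative assumptions, not defaults):

## Not run:
# disable the coverage polygon shown on hover and
# freeze clustering at zoom level 12
cluster_tweets(pool$data,
               showCoverageOnHover = FALSE,
               freezeAtZoom = 12)
## End(Not run)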
Filter tweets by keywords.
filter_tweets(data, keywords, include = TRUE)
data |
Data frame containing tweets and hashtags. Works with any data frame, as long as there
is a "text" column of type character string and a "hashtags" column with comma separated character vectors.
Can be obtained, for example, with load_tweets. |
keywords |
Comma-separated character string of keywords for black- or whitelisting. |
include |
Logical. Indicates whether to perform exclusive or inclusive filtering. Inclusive filtering (include = TRUE) is akin to whitelisting keywords; exclusive filtering (include = FALSE) blacklists them. |
Use this function if you want your Tweets to contain certain keywords. This can be used for iterative filtering to create more coherent topic models. Keyword filtering is always case insensitive (lowercase).
Data frame of Tweets containing specified keywords
## Not run:
library(Twitmo)

# load tweets (included in package)
mytweets <- load_tweets(system.file("extdata", "tweets_20191027-141233.json", package = "Twitmo"))

# Exclude Tweets that mention "football" and/or "mood"
keyword_dict <- "football,mood"
mytweets_reduced <- filter_tweets(mytweets, keywords = keyword_dict, include = FALSE)
## End(Not run)
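The inclusive counterpart (whitelisting) keeps only Tweets that mention at least one of the keywords; a minimal sketch reusing the keyword list from the example above:

## Not run:
# keep only Tweets mentioning "football" and/or "mood"
mytweets_only <- filter_tweets(mytweets, keywords = keyword_dict, include = TRUE)
## End(Not run)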
Find the optimal hyperparameter K for your LDA model
find_lda(pooled_dfm, search_space = seq(1, 10, 2), method = "Gibbs", ...)
pooled_dfm |
Object of class dfm (see dfm) containing (pooled) Tweets. |
search_space |
Vector with number of topics to compare different models. |
method |
The method to be used for fitting. Currently method = "VEM" or method = "Gibbs" are supported. |
... |
Additional arguments passed to FindTopicsNumber. |
Plot with different metrics compared.
## Not run:
library(Twitmo)

# load tweets (included in package)
mytweets <- load_tweets(system.file("extdata", "tweets_20191027-141233.json", package = "Twitmo"))

# Pool tweets into longer pseudo-documents
pool <- pool_tweets(data = mytweets)
pooled_dfm <- pool$document_term_matrix

# use the ldatuner to compare different K
find_lda(pooled_dfm, search_space = seq(1, 10, 1), method = "Gibbs")
## End(Not run)
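Since additional arguments are forwarded to FindTopicsNumber, the grid search can be parallelized or seeded from here as well; a sketch (the argument values are illustrative assumptions):

## Not run:
# run the comparison on two cores with a fixed seed
find_lda(pooled_dfm,
         search_space = seq(2, 12, 2),
         method = "Gibbs",
         mc.cores = 2L,
         control = list(seed = 123))
## End(Not run)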
Grid search for the optimal K for your STM/CTM
find_stm(data, search_space = seq(4, 20, by = 2), ...)
data |
Either a pooled dfm object returned by pool_tweets or
a named list of pre-processed tweets for stm modeling returned by fit_stm. |
search_space |
Vector with number of topics to compare different models. |
... |
Additional parameters passed to searchK |
Wrapper function around searchK for pooled dfm objects returned by pool_tweets and prepped stm documents returned by fit_stm.
Plot with different metrics compared.
## Not run:
library(Twitmo)

# load tweets (included in package)
mytweets <- load_tweets(system.file("extdata", "tweets_20191027-141233.json", package = "Twitmo"))

# Pool tweets into longer pseudo-documents
pool <- pool_tweets(data = mytweets)
pooled_dfm <- pool$document_term_matrix

# compare different K for CTM
find_stm(pooled_dfm, search_space = seq(1, 10, 1))

# OR
# compare different K for STM
# (stm_model as returned by fit_stm)
prepped_stm <- stm_model$prep
find_stm(prepped_stm, search_space = seq(4, 16, by = 2))
## End(Not run)
Estimate a CTM topic model.
fit_ctm(pooled_dfm, n_topics = 2L, ...)
pooled_dfm |
Object of class dfm (see dfm) containing (pooled) Tweets. |
n_topics |
Integer with number of topics. |
... |
Additional arguments passed to stm. |
Object of class stm
## Not run:
library(Twitmo)

# load tweets (included in package)
mytweets <- load_tweets(system.file("extdata", "tweets_20191027-141233.json", package = "Twitmo"))

# Pool tweets into longer pseudo-documents
pool <- pool_tweets(data = mytweets)
pooled_dfm <- pool$document_term_matrix

# fit your CTM with 7 topics
ctm_model <- fit_ctm(pooled_dfm, n_topics = 7)
## End(Not run)
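Because the returned object is of class stm, the stm package's inspection helpers apply directly; a minimal sketch, assuming the stm package is installed:

## Not run:
library(stm)

# inspect the most probable words per topic of the fitted CTM
labelTopics(ctm_model, n = 10)
## End(Not run)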
Estimate an LDA topic model using VEM or Gibbs sampling.
fit_lda(pooled_dfm, n_topics, ...)
pooled_dfm |
Object of class dfm (see dfm) containing (pooled) tweets. |
n_topics |
Integer with number of topics. |
... |
Additional arguments passed to LDA. |
Object of class LDA.
## Not run:
library(Twitmo)

# load tweets (included in package)
mytweets <- load_tweets(system.file("extdata", "tweets_20191027-141233.json", package = "Twitmo"))

# Pool tweets into longer pseudo-documents
pool <- pool_tweets(data = mytweets)
pooled_dfm <- pool$document_term_matrix

# fit your LDA model with 7 topics
model <- fit_lda(pooled_dfm, n_topics = 7, method = "Gibbs")
## End(Not run)
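Additional arguments are passed on to LDA, so sampler settings can be controlled from here; a sketch with illustrative control values:

## Not run:
# fix the seed and lengthen the Gibbs chain for reproducible fits
model <- fit_lda(pooled_dfm,
                 n_topics = 7,
                 method = "Gibbs",
                 control = list(seed = 123, iter = 2000))
## End(Not run)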
Estimate a structural topic model
fit_stm(
  data,
  n_topics = 2L,
  xcov,
  remove_punct = TRUE,
  stem = TRUE,
  remove_url = TRUE,
  remove_emojis = TRUE,
  stopwords = "en",
  ...
)
data |
Data frame containing tweets and hashtags. Works with any data frame, as long as there
is a "text" column of type character string and a "hashtags" column with comma separated character vectors.
Can be obtained, for example, with load_tweets. |
n_topics |
Integer with number of topics. |
xcov |
A formula with an empty left-hand side specifying external covariates
(metadata) to use, e.g. ~ retweet_count + followers_count (see examples). |
remove_punct |
Logical. Indicates whether punctuation (includes Twitter hashtags and usernames) should be removed. Defaults to TRUE. |
stem |
Logical. If TRUE, terms are stemmed. Defaults to TRUE. |
remove_url |
Logical. If TRUE, URLs are removed. Defaults to TRUE. |
remove_emojis |
Logical. If TRUE, emojis are removed. Defaults to TRUE. |
stopwords |
A character vector, list of character vectors, dictionary or collocations object; see the pattern argument in quanteda for details. Defaults to stopwords("english"). |
... |
Additional arguments passed to stm. |
Use this function to estimate an STM from a data frame of parsed Tweets. Works with unpooled Tweets only. Pre-processing and fitting are done in one run.
Object of class stm. Additionally, pre-processed documents are appended into a named list called "prep".
## Not run:
library(Twitmo)

# load tweets (included in package)
mytweets <- load_tweets(system.file("extdata", "tweets_20191027-141233.json", package = "Twitmo"))

# fit STM with tweets
stm_model <- fit_stm(mytweets,
  n_topics = 7,
  xcov = ~ retweet_count + followers_count + reply_count + quote_count + favorite_count,
  remove_punct = TRUE,
  remove_url = TRUE,
  remove_emojis = TRUE,
  stem = TRUE,
  stopwords = "en"
)
## End(Not run)
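The pre-processed documents appended as "prep" are exactly what find_stm expects for its grid search, so the two functions chain naturally; a minimal sketch:

## Not run:
# reuse the pre-processed documents for a K grid search
prepped_stm <- stm_model$prep
find_stm(prepped_stm, search_space = seq(4, 16, by = 2))
## End(Not run)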
Collect Tweets via streaming or searching.
get_tweets(
  method = "stream",
  location = c(-180, -90, 180, 90),
  timeout = Inf,
  keywords = "",
  n_max = 100L,
  file_name = NULL,
  ...
)
method |
Character string. Supported methods are "stream" and "search".
The default method is "stream". |
location |
Character string of the location to sample from. Can be a three letter country code, e.g. "USA", or a city name like "berlin".
Alternatively, a vector of four doubles can be used to supply your own bounding box coordinates (see examples). |
timeout |
Integer. Limit streaming time in seconds. By default, streams indefinitely until the user interrupts by pressing [Ctrl + C]. |
keywords |
Character string of keywords provided via a comma separated character string. Only for searching Tweets. If you want to stream Tweets for a certain location AND filter by keywords, use the location parameter while sampling and apply the filter_tweets function afterwards. If you are using the search method instead of streaming, keywords WILL work together with a location but will yield only a very limited number of Tweets. |
n_max |
Integer value. Only applies to the "search" method. Maximum number of Tweets to collect. |
file_name |
Character string of desired file path and file name where Tweets will be saved. If not specified, will write to stream_tweets.json in the current working directory. |
... |
Additional arguments passed to stream_tweets or search_tweets. |
A function that calls on stream_tweets and search_tweets (depending on the specified method) and is specifically tailored for sampling geo-tagged data. This function supports additional arguments like location for convenient sampling of geo-tagged Tweets. Tweets can be searched up to 9 days into the past.
A JSON file of Tweets in the specified directory.
https://developer.twitter.com/en/docs/twitter-api/v1/tweets/search/api-reference/get-search-tweets
https://developer.twitter.com/en/docs/twitter-api/v1/tweets/sample-realtime/api-reference/get-statuses-sample
## Not run:
# live stream tweets from Germany for 60 seconds and save to current working directory
get_tweets(
  method = "stream",
  location = "DEU",
  timeout = 60,
  file_name = "german_tweets.json"
)

# OR
# live stream tweets from berlin for an hour
get_tweets(
  method = "stream",
  location = "berlin",
  timeout = 3600,
  file_name = "berlin_tweets.json"
)

# OR
# use your own bounding box coordinates to stream tweets indefinitely (interrupt to stop)
get_tweets(
  method = "stream",
  location = c(-125, 26, -65, 49),
  timeout = Inf
)
## End(Not run)
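Searching instead of streaming follows the same pattern; a minimal sketch (the keyword string and file name are illustrative assumptions):

## Not run:
# search recent Tweets matching keywords (up to 9 days into the past)
get_tweets(
  method = "search",
  keywords = "football,mood",
  n_max = 100L,
  file_name = "searched_tweets.json"
)
## End(Not run)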
View the distribution of your fitted LDA model.
lda_distribution(lda_model, param = "gamma", tidy = FALSE)
lda_model |
Object of class LDA. |
param |
String. Specify either "beta" to return the term distribution over topics (term per document) or "gamma" for the document distribution over topics (i.e. hashtag pool per topic probability). |
tidy |
Logical. If TRUE, results are returned as a tidy tbl instead of a data frame. |
Data frame or tbl of Term (beta) or document (gamma) distribution over topics.
## Not run:
library(Twitmo)

# load tweets (included in package)
mytweets <- load_tweets(system.file("extdata", "tweets_20191027-141233.json", package = "Twitmo"))

# Pool tweets into longer pseudo-documents
pool <- pool_tweets(data = mytweets)
pooled_dfm <- pool$document_term_matrix

# fit your LDA model with 7 topics
model <- fit_lda(pooled_dfm, n_topics = 7, method = "Gibbs")

# Choose either "beta" to return the term distribution
# over topics (term per document) or "gamma" for the document distribution over
# topics (hashtag pool per topic probability)
lda_distribution(model, param = "gamma")
## End(Not run)
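The term distribution and a tidy result can be requested the same way; a minimal sketch:

## Not run:
# term distribution over topics, returned in tidy format
lda_distribution(model, param = "beta", tidy = TRUE)
## End(Not run)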
Convenience function to extract the most likely topics for each hashtag.
lda_hashtags(lda_model)
lda_model |
Fitted LDA model. Object of class LDA. |
Data frame with most likely topic for each hashtag.
## Not run:
library(Twitmo)

# load tweets (included in package)
mytweets <- load_tweets(system.file("extdata", "tweets_20191027-141233.json", package = "Twitmo"))

# Pool tweets into longer pseudo-documents
pool <- pool_tweets(data = mytweets)
pooled_dfm <- pool$document_term_matrix

# fit your LDA model with 7 topics
model <- fit_lda(pooled_dfm, n_topics = 7, method = "Gibbs")

lda_hashtags(model)
## End(Not run)
Convenience function to extract the most likely terms for each topic.
lda_terms(lda_model, n_terms = 10)
lda_model |
Fitted LDA model. Object of class LDA. |
n_terms |
Integer number of terms to return. |
Data frame with top n terms for each topic.
## Not run:
library(Twitmo)

# load tweets (included in package)
mytweets <- load_tweets(system.file("extdata", "tweets_20191027-141233.json", package = "Twitmo"))

# Pool tweets into longer pseudo-documents
pool <- pool_tweets(data = mytweets)
pooled_dfm <- pool$document_term_matrix

# fit your LDA model with 7 topics
model <- fit_lda(pooled_dfm, n_topics = 7, method = "Gibbs")

# extract the 10 most likely terms for each topic
lda_terms(model, n_terms = 10)
## End(Not run)
Parse JSON files of collected Tweets
load_tweets(file_name)
file_name |
Character string. Name of JSON file with data collected by
stream_tweets or get_tweets. |
This function replaces parse_stream, which was deprecated in rtweet 0.7, and is included here to ensure backwards compatibility for data streamed with older versions of rtweet.
Alternatively, stream_in in conjunction with tweets_with_users and lat_lng can be used if data has been collected with rtweet 0.7 or newer.
A data frame of Tweets with additional metadata.
parse_stream, stream_in, tweets_with_users
## Not run:
library(Twitmo)

# load tweets (included in package)
raw_path <- system.file("extdata", "tweets_20191027-141233.json", package = "Twitmo")
mytweets <- load_tweets(raw_path)
## End(Not run)
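For data collected with rtweet 0.7 or newer, the alternative route mentioned above looks roughly like this; a sketch, assuming the rtweet and jsonlite packages are installed and a file written by get_tweets (here the default stream_tweets.json):

## Not run:
library(rtweet)
library(jsonlite)

# read the streamed JSON, rebuild the tweets/users data frame,
# then append lat/lng coordinates
raw <- jsonlite::stream_in(file("stream_tweets.json"))
mytweets <- rtweet::tweets_with_users(raw)
mytweets <- rtweet::lat_lng(mytweets)
## End(Not run)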
Plot the locations of a certain hashtag on a static map with base plot.
plot_hashtag(
  data,
  region = ".",
  alpha = 0.01,
  hashtag = "",
  ignore_case = TRUE,
  ...
)
data |
A data frame of tweets parsed by load_tweets or returned by pool_tweets. |
region |
Character vector specifying the region (see iso3166 for country codes). Returns a world map by default. For higher resolutions specify a region. |
alpha |
A double between 0 and 1 specifying the opacity of plotted points. |
hashtag |
Character vector of the hashtag you want to plot. |
ignore_case |
Logical. If TRUE, the case of the hashtag is ignored. |
... |
Additional arguments passed to the underlying base plotting functions. |
This function can be used to generate high resolution spatial plots of hashtags.
Works with data frames of tweets returned by pool_tweets as well as data frames
read in by load_tweets and then augmented by lat/lng coordinates with lat_lng.
For a larger view, resize the plot window and call plot_hashtag again.
Maps where each dot represents a tweet.
## Not run:
library(Twitmo)

# load tweets (included in package)
mytweets <- load_tweets(system.file("extdata", "tweets_20191027-141233.json", package = "Twitmo"))

# Plot tweets on mainland USA region
plot_hashtag(mytweets,
  region = "USA(?!:Alaska|:Hawaii)",
  hashtag = "breakfast",
  ignore_case = TRUE,
  alpha = 1
)

# Add title
title("My hashtags on a map")
## End(Not run)
Plot tweets on a static map with base plot.
plot_tweets(data, region = ".", alpha = 0.01, ...)
data |
A data frame of tweets parsed by load_tweets or returned by pool_tweets. |
region |
Character vector specifying the region (see iso3166 for country codes). Returns a world map by default. For higher resolutions specify a region. |
alpha |
A double between 0 and 1 specifying the opacity of plotted points. |
... |
Additional arguments passed to the underlying base plotting functions. |
This function can be used to generate high resolution spatial plots of tweets.
Works with data frames of tweets returned by pool_tweets as well as data frames
read in by load_tweets and then augmented by lat/lng coordinates with lat_lng.
For a larger view, resize the plot window and call plot_tweets again.
Maps where each dot represents a tweet.
## Not run:
library(Twitmo)

# Plot tweets on mainland USA
mytweets <- load_tweets(system.file("extdata", "tweets_20191027-141233.json", package = "Twitmo"))
plot_tweets(mytweets, region = "USA(?!:Alaska|:Hawaii)", alpha = 1)

# Add title
title("My tweets on a map")
## End(Not run)
This function pools a data frame of parsed tweets into document pools.
pool_tweets(
  data,
  remove_numbers = TRUE,
  remove_punct = TRUE,
  remove_symbols = TRUE,
  remove_url = TRUE,
  remove_emojis = TRUE,
  remove_users = TRUE,
  remove_hashtags = TRUE,
  cosine_threshold = 0.9,
  stopwords = "en",
  n_grams = 1L
)
data |
Data frame containing tweets and hashtags. Works with any data frame, as long as there
is a "text" column of type character string and a "hashtags" column with comma separated character vectors.
Can be obtained, for example, with load_tweets. |
remove_numbers |
Logical. If TRUE, numbers are removed. Defaults to TRUE. |
remove_punct |
Logical. If TRUE, punctuation is removed. Defaults to TRUE. |
remove_symbols |
Logical. If TRUE, symbols are removed. Defaults to TRUE. |
remove_url |
Logical. If TRUE, URLs are removed. Defaults to TRUE. |
remove_emojis |
Logical. If TRUE, emojis are removed. Defaults to TRUE. |
remove_users |
Logical. If TRUE, Twitter usernames are removed. Defaults to TRUE. |
remove_hashtags |
Logical. If TRUE, hashtags are removed. Defaults to TRUE. |
cosine_threshold |
Double. Value between 0 and 1 specifying the cosine similarity threshold to be used for document pooling. Tweets without a hashtag will be assigned to document (hashtag) pools based upon this metric. Low thresholds will reduce topic coherence by including a large number of tweets without a hashtag into the document pools. Higher thresholds will lead to more coherent topics but will reduce document sizes. |
stopwords |
A character vector, list of character vectors, dictionary or collocations object; see the pattern argument in quanteda for details. Defaults to stopwords("english"). |
n_grams |
Integer vector specifying the number of elements to be concatenated in each n-gram. Each element of this vector defines an n in the n-gram(s) that are produced. See tokens_ngrams. |
Pools tweets by hashtags using cosine similarity to create longer pseudo-documents for better LDA estimation and creates n-gram tokens. The method applies an implementation of the pooling algorithm from Mehrotra et al. 2013.
List with corpus object and dfm object of pooled tweets.
Mehrotra, Rishabh & Sanner, Scott & Buntine, Wray & Xie, Lexing. (2013). Improving LDA Topic Models for Microblogs via Tweet Pooling and Automatic Labeling. 889-892. 10.1145/2484028.2484166.
## Not run:
library(Twitmo)

# load tweets (included in package)
mytweets <- load_tweets(system.file("extdata", "tweets_20191027-141233.json", package = "Twitmo"))

pool <- pool_tweets(
  data = mytweets,
  remove_numbers = TRUE,
  remove_punct = TRUE,
  remove_symbols = TRUE,
  remove_url = TRUE,
  remove_users = TRUE,
  remove_hashtags = TRUE,
  remove_emojis = TRUE,
  cosine_threshold = 0.9,
  stopwords = "en",
  n_grams = 1
)
## End(Not run)
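Since cosine_threshold trades document size against topic coherence, comparing two thresholds on the same data can be informative; a minimal sketch (the threshold values are illustrative):

## Not run:
# looser pooling: more hashtag-less Tweets join the pools
pool_loose <- pool_tweets(mytweets, cosine_threshold = 0.5)

# stricter pooling: smaller but more coherent pseudo-documents
pool_strict <- pool_tweets(mytweets, cosine_threshold = 0.95)

# compare the resulting document-feature matrices
pool_loose$document_term_matrix
pool_strict$document_term_matrix
## End(Not run)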
Predict topics of tweets using fitted LDA model.
predict_lda(
  data,
  lda_model,
  response = "max",
  remove_numbers = TRUE,
  remove_punct = TRUE,
  remove_symbols = TRUE,
  remove_url = TRUE
)
data |
Data frame containing tweets and hashtags. Works with any data frame, as long as there
is a "text" column of type character string and a "hashtags" column with comma separated character vectors.
Can be obtained, for example, with load_tweets. |
lda_model |
Fitted LDA Model. Object of class LDA. |
response |
Type of response. Either "prob" for per-topic probabilities or "max" for the single most likely topic (default). |
remove_numbers |
Logical. If TRUE, numbers are removed. Defaults to TRUE. |
remove_punct |
Logical. If TRUE, punctuation is removed. Defaults to TRUE. |
remove_symbols |
Logical. If TRUE, symbols are removed. Defaults to TRUE. |
remove_url |
Logical. If TRUE, URLs are removed. Defaults to TRUE. |
Data frame of topic predictions or predicted probabilities per topic (see response).
## Not run:
library(Twitmo)

# load tweets (included in package)
mytweets <- load_tweets(system.file("extdata", "tweets_20191027-141233.json", package = "Twitmo"))

# Pool tweets into longer pseudo-documents
pool <- pool_tweets(data = mytweets)
pooled_dfm <- pool$document_term_matrix

# fit your LDA model with 7 topics
model <- fit_lda(pooled_dfm, n_topics = 7, method = "Gibbs")

# Predict topics of tweets using your fitted LDA model
predict_lda(mytweets, model, response = "prob")
## End(Not run)
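With response = "max" (the default) each Tweet is assigned only its single most likely topic; a minimal sketch:

## Not run:
# one predicted topic per Tweet instead of a probability matrix
predict_lda(mytweets, model, response = "max")
## End(Not run)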
Converts an LDA topic model to an LDAvis-compatible JSON string and starts a server.
May require the servr package to run properly. For conversion of STM topic models use toLDAvis.
to_ldavis(fitted, corpus, doc_term)
fitted |
Fitted LDA model. Object of class LDA. |
corpus |
Document corpus. Object of class corpus. |
doc_term |
Document term matrix (dtm). |
Beware that to_ldavis might fail if the corpus contains documents that consist ONLY of numbers, emojis or punctuation, i.e. do not contain a single character string. This is due to a limitation in the topicmodels package used for model fitting, which does not consider such terms as words and omits them, causing the posterior to differ in length from the corpus. If you encounter such an error, redo your pre-processing and exclude emojis, punctuation and numbers. When using pool_tweets you can remove emojis by specifying remove_emojis = TRUE.
Invisible object (see serVis).
## Not run:
library(Twitmo)

# load tweets (included in package)
mytweets <- load_tweets(system.file("extdata", "tweets_20191027-141233.json", package = "Twitmo"))

# Pool tweets into longer pseudo-documents
pool <- pool_tweets(data = mytweets)
pooled_dfm <- pool$document_term_matrix
pooled_corp <- pool$corpus

# fit your LDA model with 7 topics
model <- fit_lda(pooled_dfm, n_topics = 7, method = "Gibbs")

# Explore your topics with LDAvis
to_ldavis(model, pooled_corp, pooled_dfm)
## End(Not run)
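If to_ldavis fails with the length mismatch described above, a pre-processing pass along these lines usually avoids it; a sketch using the pool_tweets options from this package:

## Not run:
# strip numbers, punctuation and emojis before pooling so no
# document ends up without a single character term
pool <- pool_tweets(mytweets,
  remove_numbers = TRUE,
  remove_punct = TRUE,
  remove_emojis = TRUE
)
model <- fit_lda(pool$document_term_matrix, n_topics = 7, method = "Gibbs")
to_ldavis(model, pool$corpus, pool$document_term_matrix)
## End(Not run)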