TWEETS TOPIC MODELING ACROSS DIFFERENT COUNTRIES
Authors: ABDELWAHAB Ahmed, Chiru Costin, Robles Jose, Rebedea Traian
Volume 1 (2014) | DOI: 10.12753/2066-026X-14-019 | Pages: 134-141
Abstract
Every culture has its own topics of interest and its own hot topics. In this paper we present a system that supports a better understanding of different cultures, starting from the topics debated among their members. To do so, we recorded and analyzed the content of messages sent by citizens of different countries on Twitter (a worldwide conversation system), with the aim of capturing the topics of interest for each culture and predicting its hot topics. We restricted our analysis to tweets written in English, because English has become a global language that is used even by internet users from non-English-speaking countries when they want to share their thoughts with a global readership.
Our study captures separate topic models for the tweets and for the URLs shared in those tweets, and then compares the distribution of topics across countries for both the tweets and the URLs to check how consistent these models are. For the topic modeling task, we designed a specialized approach adapted to tweets (which have a maximum of 140 characters and are therefore too short for classic topic modeling methods). The application was tested on a corpus of English tweets that have a location attached and contain URLs, collected using the Twitter sampler API. To avoid introducing bias, we extracted tweets without any restrictions (including tweets written in other languages, tweets without URLs, and tweets without a location attached) and then checked the percentage of targeted tweets for each country. As a consequence, we extended the tweet collection period to reduce the risk of dealing with abnormal events occurring in a particular country.
Keywords
topic modeling, LDA, tweets clustering, microblogging topic modeling, multi-culturality
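
The filtering step described in the abstract (collect everything, then measure what share of each country's tweets is English, located, and contains a URL) can be illustrated with a minimal sketch. This is not the authors' implementation: the field names `lang`, `country`, and `urls`, the helper names, and the sample records are all hypothetical, and the sketch assumes that only tweets with a known country can be counted toward a per-country percentage.

```python
from collections import Counter


def is_targeted(tweet):
    """True for an English tweet that has a country and at least one URL.

    The keys "lang", "country" and "urls" are assumed field names,
    not taken from the paper or from the Twitter API.
    """
    return (
        tweet.get("lang") == "en"
        and tweet.get("country") is not None
        and len(tweet.get("urls", [])) > 0
    )


def targeted_share_by_country(tweets):
    """Return {country: fraction of that country's tweets that are targeted}."""
    total = Counter()
    targeted = Counter()
    for tweet in tweets:
        country = tweet.get("country")
        if country is None:
            # A tweet without a location cannot be assigned to a country.
            continue
        total[country] += 1
        if is_targeted(tweet):
            targeted[country] += 1
    return {country: targeted[country] / total[country] for country in total}


# Hypothetical sample records, for illustration only.
sample = [
    {"lang": "en", "country": "RO", "urls": ["http://example.com/a"]},
    {"lang": "ro", "country": "RO", "urls": []},
    {"lang": "en", "country": "US", "urls": ["http://example.com/b"]},
    {"lang": "en", "country": None, "urls": ["http://example.com/c"]},
]
shares = targeted_share_by_country(sample)
```

Computing the share on the unrestricted sample, rather than filtering at collection time, is what lets the proportion of usable tweets be compared across countries.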