Preserving data from social media is crucial for many scientific disciplines. Publicly available social media archives facilitate research in the social sciences and provide corpora for training and testing a wide range of machine learning and natural language processing methods. To reduce reliance on commercial gatekeepers, we decided in 2013 to create a large-scale longitudinal archive of tweets from X (then Twitter) for research purposes. We collected data from the 1% random sample of all tweets that Twitter's streaming API then provided free of charge.
In this talk, we will introduce TweetsKB, a knowledge base of tweets enriched with named entities and sentiments. We also show how TweetsKB can be used to create topic-specific sub-corpora focusing on important societal events such as the COVID-19 pandemic. Understanding the COVID-19 discourse, how it differs from the general Twitter discourse, and how it interacts with real-world events and (mis)information can yield valuable insights.
Presenters:
Sebastian Schellhammer