
MSc Thesis

Aspect-Based Emotion Analysis of Geo-Social Media Data using BERT

Abstract

This thesis explores the application of Aspect-Based Emotion Analysis (ABEA) in the context of disaster response, specifically analyzing social media data related to California’s 2020 wildfire season. To facilitate the fine-tuning of a BERT-based model, a novel ABEA training dataset of 2,621 English tweets was created using group annotation and majority voting strategies to support consensus-building and dataset consistency. The dataset contains aspect-level emotion labels for anger, sadness, happiness, fear, and none. Through hyperparameter optimization (HPO) and comprehensive validation, various model configurations were examined to address the challenges of overfitting and performance plateaus on limited training data. The configuration with the highest average scores achieved F1-scores of 70.1 for aspect term extraction (ATE) and 46.9 for the joint task of ATE and aspect emotion classification (AEC). While much research has been published on Aspect-Based Sentiment Analysis (ABSA), this is, to the best of the author’s knowledge, the first study to present both a dataset and model results for ABEA.


As a proof of concept, the study then applied the ABEA model in a wildfire case study to analyze spatio-temporal ABEA patterns related to “fire” and “disaster response” topics, revealing intensified public engagement and heightened emotion ratios in close proximity to active wildfires. This work contributes to ABEA research by proposing a scalable annotation methodology, detailing fine-tuning challenges, and evaluating ABEA’s applicability to spatio-temporal disaster analysis. Future research directions include refining the annotation guidelines to increase label consistency, integrating data augmentation techniques to increase model robustness, and exploring the use of general-purpose language models for ABEA tasks.

Research Questions

RQ1. What methods can be employed to increase annotation consistency for ambiguous annotation tasks, such as aspect-level emotion annotation? How can inter-annotator agreement be measured and interpreted at the aspect level?

RQ2. What is the efficacy of repurposing an aspect-based sentiment analysis (ABSA) model for aspect-based emotion analysis (ABEA) in the domain of disaster response using a new training dataset? Which hyperparameter configurations significantly affect model performance?

RQ3. Can the integration of spatio-temporal analyses with ABEA offer actionable insights for disaster response by showing distinct spatio-temporal patterns for specific emotions and emotion-targets? How does the spread of emotions correlate with validation datasets such as wildfire footprints? 
 

Methodology Overview

The methodology combines data annotation, model fine-tuning, and application analysis to explore the viability of Aspect-Based Emotion Analysis (ABEA) for disaster response. First, a dataset of 2,621 English tweets related to California’s 2020 wildfire season was developed, utilizing group annotation and majority voting to ensure label consistency for emotions at the aspect level, specifically for anger, sadness, happiness, fear, and none. This annotated dataset served as the foundation for training a BERT-based model through an extensive hyperparameter optimization (HPO) process. Various model configurations were tested to balance overfitting and underfitting challenges, with the highest-performing configuration achieving an F1-Score of 70.1 for aspect term extraction (ATE) and 46.9 for joint ATE and aspect emotion classification (AEC). As a proof of concept, the fine-tuned model was applied to a spatio-temporal case study of the California wildfires, analyzing emotional engagement and identifying geographical and temporal patterns in the public's response to the wildfires. This study thus demonstrates a structured approach to adapting ABEA for real-world applications in disaster response.


Theoretical Background

The Different Levels of Sentiment Analysis 

 

Sentiment analysis (SA) is a subfield of natural language processing (NLP) focused on extracting sentiment polarities from text, typically classifying sentences into positive, neutral, or negative categories.

 

SA operates at various levels:

  • Document-level, which classifies an entire text;

  • Sentence-level, which classifies individual sentences; and

  • Aspect-based sentiment analysis (ABSA), which focuses on sentiment at the sub-sentence or word level.

 

ABSA is particularly useful for extracting detailed opinions about specific aspects or features of an entity, such as a product's attributes, and has been increasingly researched due to advancements in NLP models, particularly transformer-based architectures like BERT. These models have demonstrated strong performance in various sentiment-related tasks, although challenges remain when dealing with social media data or more complex sentiment subtasks.

What Sub-tasks does Aspect-Based Sentiment Analysis actually solve?

 

ABSA consists of several sub-tasks. Some models aim to unify multiple sub-tasks in a single model, while other methodologies use a pipeline approach in which different models are applied sequentially to solve the sub-tasks.

Despite a lack of standard taxonomies and definitions in the field of ABSA, there are generally four sub-tasks (a small illustrative example follows the list):

  1. Aspect term extraction (ATE): finding the target word(s) of the sentiment in the text,

  2. Aspect category detection (ACD): identifying the thematic category of the aspect term,

  3. Opinion term extraction (OTE): finding the word(s) signaling the sentiment, and

  4. Aspect sentiment classification (ASC): labeling the aspect’s sentiment [positive, neutral, negative].
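
To make these four sub-tasks concrete, the following minimal Python sketch shows what their combined output could look like for a single, invented review sentence. The sentence, the ENTITY#ATTRIBUTE-style category labels, and all values are illustrative assumptions rather than examples from the thesis.

```python
# A purely illustrative example of the four ABSA sub-task outputs for one
# invented review sentence; all labels and categories are assumptions.
sentence = "The battery life is great, but the screen scratches easily."

absa_output = {
    # 1. Aspect term extraction (ATE): the target words of the sentiment
    "aspect_terms": ["battery life", "screen"],
    # 2. Aspect category detection (ACD): a thematic category per aspect
    "aspect_categories": {"battery life": "HARDWARE#BATTERY", "screen": "HARDWARE#DISPLAY"},
    # 3. Opinion term extraction (OTE): the words signalling the sentiment
    "opinion_terms": {"battery life": "great", "screen": "scratches easily"},
    # 4. Aspect sentiment classification (ASC): a polarity per aspect
    "aspect_sentiments": {"battery life": "positive", "screen": "negative"},
}

for term in absa_output["aspect_terms"]:
    print(term, "->", absa_output["aspect_sentiments"][term])
```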

Using Pre-Trained Language Models for Fine-Tuning

Pre-trained language models, such as BERT, are foundational to modern NLP due to their ability to generate context-aware word representations through extensive pre-training on large datasets. These models, built upon the Transformer architecture, effectively capture long-range dependencies in text using self-attention mechanisms, outperforming previous architectures like LSTMs. The rise of pre-trained models has led to a focus on fine-tuning, a process that adapts these models for specific tasks like sentiment analysis by retraining a task-specific classification head while preserving or adjusting the pre-trained model's learned parameters. Fine-tuning involves optimizing hyperparameters and adjusting model layers, with techniques like manual tuning, grid search, and automated methods being employed to maximize performance. This approach enables the efficient transfer of knowledge from general-purpose models to domain-specific tasks, significantly improving performance across a variety of NLP benchmarks.
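
As a hedged illustration of this fine-tuning workflow, the sketch below loads a pre-trained BERT encoder from the Hugging Face transformers library, attaches a fresh token-classification head (as would be needed for aspect-term tagging), and performs one illustrative optimization step. The checkpoint name, BIO label scheme, hyperparameter values, and dummy labels are assumptions, not the thesis's actual configuration.

```python
# Minimal fine-tuning sketch: pre-trained encoder + new task-specific head.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O", "B-ASP", "I-ASP"]  # assumed BIO tag scheme for aspect terms
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(labels)
)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5, weight_decay=0.01)

batch = tokenizer(["The fire near Vacaville is terrifying"],
                  return_tensors="pt", padding=True)
# Dummy labels aligned to the sub-word tokens, for illustration only.
batch["labels"] = torch.zeros(batch["input_ids"].shape, dtype=torch.long)

model.train()
outputs = model(**batch)   # forward pass returns the token-level loss
outputs.loss.backward()    # backpropagate through head and encoder
optimizer.step()           # one illustrative parameter update
```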

Has Aspect-Based Emotion Analysis been done? 

A review of scientific studies on Aspect-Based Emotion Analysis (ABEA) reveals varying methodologies, many of which do not perform aspect term extraction (ATE) directly from the text. Following the taxonomy of Zhang et al. (2022), the focus was placed on end-to-end models that handle both ATE and Aspect Emotion Classification (AEC). However, many studies, such as those by Padme and Kulkarni (2018) and Suciati and Budi (2020), rely on predefined categories or sentence-level classifications rather than extracting aspects from the text. Similarly, Sirisha and Bolem (2022) and Dehbozorgi and Mohandoss (2021) report sentence-level emotion classification without explicitly identifying aspect terms. This indicates a lack of joint methods in ABEA, with most research still using pipeline approaches that tackle the subtasks separately. As a result, comparing methodologies across studies is challenging due to their differing task conceptualizations and focus.

De Bruyne et al. (2022)
  • Method: Multimodal ABEA, using images and comments from the adidas Instagram account
  • Models or packages: pre-trained twitter-RoBERTa-base for ATE; pre-trained twitter-RoBERTa emotion model for AEC
  • Language: English
  • Emotion classes: anger, joy, love, longing, sadness, neutral
  • Aspects: pre-defined categories and sub-categories
  • ATE from text: No

Dehbozorgi and Mohandoss (2021)
  • Method: ABEA in two steps: first emotion identification at sentence level, then aspect term extraction
  • Models or packages: Text2Emotion Python package for emotion classification; rule-based grammatical tagging with the NLTK package for ATE
  • Language: English
  • Emotion classes: anger, joy, sadness, surprise, happiness
  • Aspects: sequences in the text that follow a certain part-of-speech pattern
  • ATE from text: Likely, but not verifiable (no examples or details)

Suciati and Budi (2020)
  • Method: Sentence-level emotion detection for four pre-defined aspect categories using four different methods
  • Models or packages: Decision Tree, Random Forest, Support Vector Machine, and Extra Tree Classifier for emotion classification
  • Language: Indonesian and English
  • Emotion classes: happy, sadness, surprised, neutral
  • Aspects: four pre-defined categories (service, ambience, price, food)
  • ATE from text: No

Padme and Kulkarni (2018)
  • Method: Sentence-level emotion analysis for topics found through topic modeling
  • Models or packages: Latent Dirichlet Allocation (LDA) for aspect (topic) identification across the dataset; NRC Emotion Dictionary for emotion classification
  • Language: English
  • Emotion classes: anger, sadness, surprise, joy, fear
  • Aspects: general topics found in the dataset through topic modeling
  • ATE from text: No

Sirisha and Bolem (2022)
  • Method: Sentence-level sentiment and emotion classification
  • Models or packages: novel hybrid ABSA-RoBERTa-LSTM for emotion and sentiment classification
  • Language: English
  • Emotion classes: anger, sadness, optimism, joy
  • Aspects: unclear, no reference made to aspect terms
  • ATE from text: No

Mehra (2023)
  • Method: “Segment-level” sentiment and emotion classification for pre-defined aspect categories
  • Models or packages: Text2Emotion Python package for emotion classification; a modified BERT model (no details) for sentiment classification
  • Language: English
  • Emotion classes: anger, fear, happiness, sadness, surprise
  • Aspects: seven pre-defined categories
  • ATE from text: No

Ismail et al. (2022)
  • Method: Sentence-level sentiment and emotion analysis for pre-defined aspect categories
  • Models or packages: TextBlob Python library for sentiment classification; linear support vector classifier for emotion classification
  • Language: English
  • Emotion classes: anger, happiness, sadness, surprise, fear, love
  • Aspects: three pre-defined categories
  • ATE from text: No

Üveges et al. (2022)
  • Method: ABSA and ABEA for Hungarian speeches, though no details are given on the aspect terms
  • Models or packages: an adapted BERT-based Hungarian model
  • Language: Hungarian
  • Emotion classes: 12 emotions based on Plutchik’s model
  • Aspects: keywords in the text that evoke the emotion
  • ATE from text: Likely, but not verifiable (no examples or details)

De Geyndt et al. (2022)
  • Method: ABSA and ABEA for reviews, emails, and conversations
  • Models or packages: Conditional Random Field for ATE; RobBERT embeddings as input for an SVM for AEC
  • Language: Dutch, English, French, German
  • Emotion classes: 12 emotions based on Plutchik’s model
  • Aspects: word(s) in the text
  • ATE from text: Yes

Mustapha (2024)
  • Method: Sentence-level emotion classification for main categories representing the “causes” of emotions
  • Models or packages: k-means clustering and GPT-3.5 interpretation for aspect category detection; GPT-3.5-turbo for AEC
  • Language: English
  • Emotion classes: 30 emotions, manually selected based on frequency in the dataset
  • Aspects: five categories found by analyzing the dataset
  • ATE from text: No

Dataset Annotations

Creating a New ABEA Training Dataset

The Annotation Process

 

The annotation process focused on constructing a high-quality dataset tailored to aspect-based emotion analysis (ABEA), comprising 2,621 tweets annotated for aspect terms and five emotion classes—anger, sadness, happiness, fear, and none. Seven human annotators followed a structured guide grounded in Shaver et al.'s hierarchical emotion model, which grouped emotions into clusters to increase clarity and reduce ambiguity in label assignment. Challenges arose from the subjective nature of emotions, especially in written text without visual or tonal context, leading to difficulty in distinguishing between closely related emotions, like worry and caring. The annotation guide aimed to balance consistency and interpretative flexibility, but cases involving sarcasm, hypothetical scenarios, or ambiguous expressions required group discussions and majority voting to reach a consensus. To capture unemotional content, the “None” category was added, which improved dataset balance and ensured that neutral expressions were adequately represented for model training.
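
The majority-voting step can be illustrated with a small sketch. The function name, the example labels, and the tie-handling rule (sending ties to group discussion) are simplified assumptions about the workflow described above, not the thesis's exact procedure.

```python
# Simplified sketch of majority voting over annotator labels for one aspect term.
from collections import Counter

def majority_vote(labels):
    """Return the majority label, or None if no label wins outright."""
    counts = Counter(labels)
    (top_label, top_count), *rest = counts.most_common()
    if rest and rest[0][1] == top_count:
        return None            # tie -> resolve in group discussion
    return top_label

# Example: three annotators label the (invented) aspect term "evacuation order"
print(majority_vote(["fear", "fear", "anger"]))   # -> "fear"
print(majority_vote(["fear", "anger"]))           # -> None (tie)
```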


The Doccano Labeling Tool Interface with an Example Tweet and Annotations


Emotion Annotation Guide, adapted from Shaver et al. (1987)

The Dataset

 

After annotation and refinement, the dataset included five aspect-level emotion labels: anger, sadness, happiness, fear, and none, with "happiness" being the most frequent, accounting for 39% of the labeled aspect terms. In contrast, "fear" appeared less frequently, at only 5%. Additionally, the aspect terms were categorized by part of speech, with nouns comprising the majority: 45% were singular nouns, 16% singular proper nouns, and 13% plural nouns. Other parts of speech, such as pronouns and adjectives, were less represented among the aspect terms.
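
A part-of-speech breakdown like the one reported here could be reproduced roughly as in the sketch below, which tags a few invented aspect terms with NLTK's Penn Treebank tagger. The terms, the resource setup, and the exact tagging pipeline are assumptions, not the thesis's implementation.

```python
# Hedged sketch: relative frequency of Penn Treebank POS tags (NN, NNP, NNS, ...)
# over a handful of invented aspect terms. Assumes the NLTK tokenizer and
# tagger resources can be downloaded.
from collections import Counter
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

aspect_terms = ["fire", "evacuations", "Vacaville", "air quality", "firefighters"]

tag_counts = Counter()
for term in aspect_terms:
    tokens = nltk.word_tokenize(term)
    tag_counts.update(tag for _, tag in nltk.pos_tag(tokens))

total = sum(tag_counts.values())
for tag, count in tag_counts.most_common():
    print(f"{tag}: {count / total:.0%}")
```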


Emotion Labels in the ABEA Training Dataset


Top 10 Part-of-Speech Tags in the ABEA Aspect Terms

Model Fine-Tuning

Fine-Tuning a BERT-Based Model for ABEA

The GRACE Model: Gradient Harmonized and Cascaded Labeling for Aspect-based Sentiment Analysis

 

The thesis employed GRACE, a BERT-based architecture optimized for ABSA, as the foundational model for ABEA. GRACE was selected due to its capacity for co-extracting aspect terms and sentiments within the same framework, aligning well with the goal of identifying and classifying both aspect terms and emotions. The model’s architecture allowed context-specific emotional nuances to be linked to aspect terms, making it an appropriate candidate for expanding from traditional sentiment analysis to more complex emotion classification. GRACE’s initial settings, designed for high-level sentiment tasks, provided a robust baseline for understanding the model’s performance when adapted for ABEA, even though the task’s increased complexity and the small dataset size posed challenges.


Schematic View of GRACE’s Base Frame with shared lower Layers, and separate computational Branches for GRACE’s two ABSA tasks ATE and ASC (Luo et al., 2020).


An Overview of Cascaded Labeling in the GRACE Architecture (Luo et al., 2020)
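
The following PyTorch sketch conveys the general idea behind this base frame: a shared encoder feeding two task branches. It is a deliberate simplification in which both branches are plain linear heads on a fully shared BERT encoder, whereas the actual GRACE model also dedicates separate upper Transformer layers and gradient-harmonization machinery to the two tasks. All names, label counts, and the checkpoint are illustrative assumptions.

```python
# Simplified sketch: shared encoder with separate branches for ATE and AEC.
import torch.nn as nn
from transformers import AutoModel

class SharedEncoderTwoBranches(nn.Module):
    def __init__(self, model_name="bert-base-uncased",
                 num_ate_labels=3, num_emotion_labels=5):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        # Branch 1: aspect term extraction (e.g. BIO tags)
        self.ate_head = nn.Linear(hidden, num_ate_labels)
        # Branch 2: aspect emotion classification (anger, sadness, ...)
        self.aec_head = nn.Linear(hidden, num_emotion_labels)

    def forward(self, input_ids, attention_mask):
        hidden_states = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state                      # shared representation
        return self.ate_head(hidden_states), self.aec_head(hidden_states)
```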

Hyperparameter Optimization (HPO) 

 

The HPO process was extensive and iterative, aiming to balance the risk of overfitting due to the limited dataset size with the need for improved generalization across diverse emotional categories. Key hyperparameters, including dropout rates, weight decay, batch size, and learning rates, were adjusted over multiple configurations, with a particular focus on avoiding the model’s tendency to overfit the training data. Additionally, modifications were made to the number of shared layers between the Aspect Term Extraction (ATE) and Aspect Emotion Classification (AEC) branches, as well as to the structure of the classification heads. 
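
The sketch below outlines the general shape of such a search over a few hyperparameters. The train_and_evaluate() helper is a hypothetical placeholder for the actual fine-tuning and cross-validation pipeline, and the value ranges are illustrative rather than the thesis's exact search space.

```python
# Hedged sketch of a simple grid search over a few key hyperparameters.
from itertools import product

def train_and_evaluate(config):
    """Hypothetical placeholder: fine-tune with `config` and return the
    validation F1-score of the joint ATE + AEC task."""
    return 0.0  # dummy value so the sketch runs end to end

search_space = {
    "dropout": [0.1, 0.2, 0.3],
    "weight_decay": [0.0, 0.01, 0.1],
    "learning_rate": [2e-5, 3e-5, 5e-5],
    "batch_size": [16, 32],
}

best_config, best_f1 = None, float("-inf")
for values in product(*search_space.values()):
    config = dict(zip(search_space.keys(), values))
    f1 = train_and_evaluate(config)
    if f1 > best_f1:
        best_config, best_f1 = config, f1

print("best configuration:", best_config, "F1:", best_f1)
```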

HPO and Model Results

The HPO results revealed several key insights into the model’s performance and limitations when adapted for Aspect-Based Emotion Analysis (ABEA). The optimization process, which spanned 55 configurations, aimed to maximize F1-scores for both the Aspect Term Extraction (ATE) and the joint ATE and Aspect Emotion Classification (AEC) tasks. The highest-performing configuration, achieved with adjustments to the dropout rates, batch size, and number of shared layers, yielded an F1-score of 70.1 for the ATE task and 46.9 for the joint ATE and AEC task. This result, while significant as a baseline for ABEA, highlighted the model’s challenge in achieving high precision across both tasks due to limited training data and the complexity of emotion classification.

During HPO, dropout rates were increased to address the model's overfitting tendencies, with the highest F1-scores obtained at a dropout rate of 0.3, indicating a balanced trade-off between model complexity and regularization. Adjustments to the classification heads and sharing fewer layers between ATE and AEC tasks also showed moderate performance gains. However, other modifications, such as weight decay increases and classifier head architecture changes, resulted in only minimal improvements or, in some cases, reduced model performance, particularly on external validation datasets. Overall, the HPO results underscored the difficulty of adapting GRACE for ABEA, suggesting that further optimization, particularly with larger datasets or additional training steps, could yield improved generalizability for emotion detection.


F1-Score per Epoch for Hyperparameter Configuration 42, taken from the 10-fold cross validation

Case Study Application

Case Study Methodology

The case study methodology involved applying the fine-tuned ABEA model to analyze social media responses to California’s 2020 wildfires, focusing specifically on spatio-temporal patterns of emotional discourse. Geotagged, timestamped tweets were filtered for relevance to the fire events, creating subsets labeled "during and near" and "not during but near" for comparison. A 5x5 km hexagonal grid was overlaid on the affected areas to capture spatial variations, while temporal spikes in discourse were analyzed in relation to key wildfire events, such as onset and rapid spread phases. For each grid cell, changes in the ratios of the emotions anger, sadness, happiness, and fear were examined, along with the frequency of aspect terms like "fire" and "disaster response", to provide insights into public engagement around the fires. This spatio-temporal framework enabled the study to assess how emotional responses varied across time and location, shedding light on the intensity and scope of public sentiment during critical moments of the wildfire crisis.
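
A rough pandas sketch of this comparison is given below: tweets are split into a "during and near" and a "not during but near" subset, and the change in the share of each emotion per grid cell is computed. The column names (timestamp, hex_id, near_fire, emotion), the toy data, and the fire period are assumptions standing in for the actual data model.

```python
# Hedged sketch: change in per-cell emotion ratios between wildfire and
# non-wildfire periods. Column names, data, and dates are assumed.
import pandas as pd

tweets = pd.DataFrame({
    "timestamp": pd.to_datetime(["2020-08-17", "2020-08-20", "2020-07-01", "2020-07-02"]),
    "hex_id": ["A1", "A1", "A1", "A1"],
    "near_fire": [True, True, True, True],
    "emotion": ["fear", "sadness", "happiness", "none"],
})

fire_start, fire_end = pd.Timestamp("2020-08-16"), pd.Timestamp("2020-10-01")
during = tweets["timestamp"].between(fire_start, fire_end)

during_near = tweets[during & tweets["near_fire"]]
not_during_near = tweets[~during & tweets["near_fire"]]

def emotion_ratios(df):
    """Share of each emotion per hexagonal grid cell."""
    counts = df.groupby(["hex_id", "emotion"]).size()
    return counts / counts.groupby(level="hex_id").transform("sum")

# Positive values indicate emotions that became relatively more frequent
# during the wildfire period.
change = emotion_ratios(during_near).subtract(
    emotion_ratios(not_during_near), fill_value=0
)
print(change)
```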


California Wildfire Season 2020: Wildfire and Twitter Datasets Overview


Example of the Comparison Subsets for the SCU Complex Fire: ‘during and near’, ‘not during but near’, and ‘during but not near’

Case Study Results

1. Emotion Changes between Wildfire-Times and Non-Wildfire-Times

The first set of results focused on changes in emotion ratios between periods of active wildfire and non-wildfire periods. The analysis revealed an increase in sadness and fear during wildfire events, particularly in areas closest to the fires, indicating heightened public concern and distress. Happiness, which included emotions like hope and support, also increased within fire-affected zones, suggesting a sense of solidarity or relief in certain contexts. Meanwhile, anger was expressed both in directly impacted areas and in nearby regions, likely reflecting frustration with the disaster response or the wildfire situation itself. Maps combining a binary color scale with 3D extrusions representing tweet volumes provided a nuanced spatial view of emotional engagement, highlighting areas of intensified public sentiment during the crisis.


Change in Sadness Ratio for the SCU Complex Fire, comparing “during” with “not-during” Subsets


Change in Sadness Ratio for the Hennessey Fire, comparing “during” with “not-during” Subsets, with Example Tweets


Change in Anger Ratio for the Hennessey Fire (top) and the SCU Complex Fire (bottom) by comparing “during” with “not-during” Subsets

2. Aspect-Term Frequencies Near and Not-Near Wildfires

The second type of analysis examined the frequency of aspect terms, specifically focusing on “fire” and “disaster response” terms, in regions close to the fires compared to those further away. Timelines created for each region showed distinct peaks in aspect term frequencies during key stages of each wildfire, particularly during the initial outbreak and periods of rapid spread. Regions near the wildfires displayed significantly higher engagement with fire-related and disaster-response topics, indicating a strong localized reaction. This pattern underscores the relevance of proximity in shaping public discourse during wildfires, as communities closer to the events were more likely to discuss immediate concerns and response efforts.
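
A minimal sketch of how such timelines could be derived is shown below: daily counts of fire-related aspect terms are computed separately for tweets near and not near the wildfires. The keyword set, column names, and toy data are illustrative assumptions, not the thesis's actual query.

```python
# Hedged sketch: daily frequency of fire-related aspect terms, split by
# proximity to the wildfires. Keywords, columns, and data are assumed.
import pandas as pd

tweets = pd.DataFrame({
    "date": pd.to_datetime(["2020-08-17", "2020-08-17", "2020-08-18"]),
    "aspect_term": ["fire", "evacuation", "fire"],
    "near_fire": [True, False, True],
})

fire_terms = {"fire", "wildfire", "smoke", "flames"}   # assumed keyword set
is_fire_related = tweets["aspect_term"].str.lower().isin(fire_terms)

timeline = (
    tweets[is_fire_related]
    .groupby(["near_fire", "date"])
    .size()
    .unstack("near_fire", fill_value=0)   # one column per region type
)
print(timeline)
```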


Timelines for tweeted “Fire”-Related and “Disaster-Response”-Related Aspect-Terms per Day, comparing Regions Near Wildfires with Regions Not Near


Emotion frequencies for “fire” aspect terms over time for the SCU Complex Fire (top) and the Hennessey Fire (bottom)

3. Emotions and Aspect Terms at the Local Level

The final analysis explored local-level emotional responses and aspect-term variations in towns near the wildfire perimeters, such as Vacaville, Napa, and Fairfield. The results showed that each location had distinct patterns of emotional and topical engagement, reflecting the unique impact of the fires on nearby communities. For example, Vacaville exhibited high levels of anger and anxiety in tweets related to evacuation efforts, while Fairfield showed discourse around relief and support themes, particularly with aspect terms like “animal rescue” and “displaced victims.” This localized breakdown provided insights into the specific concerns and emotional states of different areas, demonstrating the potential of ABEA to inform targeted disaster response efforts based on real-time public sentiment at a granular level.


A Closer Inspection of the Aspect Terms and Emotions related to Disaster Responses for Three Areas of Interest near the Hennessey Wildfire: Vacaville, Napa, and Fairfield

Discussion

This thesis achieved notable advances in developing a novel ABEA dataset and fine-tuning a BERT-based model for aspect-based emotion analysis. The annotation process was structured to improve consistency, but accurately interpreting emotions from text alone posed challenges, especially with cases involving sarcasm, hypothetical scenarios, or ambiguous emotional expressions. This limitation suggests that future annotations might benefit from refined guidelines or even context-enriched data to better capture the complexity of human emotions.

 

The dataset itself, while foundational for training ABEA models, revealed some inherent tensions. Introducing a "None" class allowed the model to differentiate unemotional content, reducing bias in classification. However, this label occasionally blurred the line between genuinely neutral statements and content whose emotion is only implied, creating a subtle ambiguity. Expanding the dataset through data augmentation techniques, such as synonym replacement or paraphrasing, could enhance its representativeness, potentially helping models generalize better to diverse text samples.

 

Fine-tuning the model highlighted persistent challenges in balancing overfitting and underfitting, largely due to the dataset's limited size. Hyperparameter optimization efforts improved aspect term extraction but revealed performance plateaus for the joint task of extracting both aspect terms and associated emotions. This outcome suggests that future work could explore more robust architectures and larger datasets to strengthen the model's generalizability and accuracy in handling complex classification tasks.

 

The case study application underscored ABEA's potential to analyze spatio-temporal patterns of public sentiment, particularly valuable for monitoring disaster-related discourse. Findings illustrated meaningful emotional shifts near active wildfires, indicating potential applications for real-time disaster response. However, practical limitations—such as moderate model performance on emotion classification and issues like the Modifiable Areal Unit Problem (MAUP) in spatial analyses—highlight areas for refinement. To make ABEA a more practical tool for real-world applications, future research should focus on optimizing both model training and data collection to enhance accuracy, scalability, and relevance in crisis management scenarios.
