MA Practical Thesis Context







Digitalization has transformed the collection, interpretation, analysis, representation, reproduction, and dissemination of visual information. In sentiment analysis, which examines the emotional tone of textual data, data visualizations and the digital tools and technologies used to produce and interpret them make it possible to discover patterns, insights, and relationships that might not be apparent through textual analysis alone. Despite this potential, the digital methods used to produce data and visualizations, and therefore the meanings interpreted through them, are rarely assessed critically. The main goal of the practical part of this research was to examine what insights a critical look at data production and visualization, and at the meanings they produce, can offer in the context of sentiment analysis. Building on these critiques, a second goal was to explore alternative visualizations designed specifically for the sentiment analysis of a Reddit conversation.









Joé Mertenat


Master of Arts in Digital Communication Environments
Academy of Arts and Design HGK
University of Applied Sciences and Art Northwestern Switzerland FHNW









Img. 1  This shows part of a dataset collected from a Reddit conversation about “Why do people still don’t believe in climate change?”. It contains the comments of the conversation along with the name of the user who posted each comment, the votes the comment received, the time it was posted, the depth of the comment, which indicates its level in the comment-reply hierarchy, and the analyzed polarity and subjectivity values. Here, polarity and subjectivity were analyzed using TextBlob (see Img. 3), a Python library for natural language processing. Polarity is quantified on a scale from -1 (negative) to 1 (positive), while subjectivity is measured from 0 (objective) to 1 (subjective). A potential issue with this process is that it relies on a predefined dictionary and takes little account of context or nuance. It may struggle with sarcasm, irony, or cultural differences in language use. Because it scores words largely in isolation, it misses the broader context in which they are used, which can lead to inaccurate polarity scores.

Img. 2 This time the polarity value of each comment is analyzed using VADER (see also Img. 4), a lexicon- and rule-based sentiment analysis tool designed specifically for social media text. It captures sentiment by considering both individual word meanings and contextual cues such as punctuation, capitalization, and emoticons, which makes it effective for informal, short texts like tweets and comments. The polarity of a comment is calculated with the SentimentIntensityAnalyzer from the vaderSentiment library, which computes four scores: positive, neutral, negative, and a combined compound score. The compound score is a normalized value between -1 (most negative) and 1 (most positive) that represents the overall sentiment, and it is used here as the primary indicator of the comment's polarity. A potential problem with VADER is that it may struggle with complex language such as sarcasm, irony, or ambiguous context. It also relies on predefined rules and dictionaries, which might not capture all nuances or adapt well to cultural or domain-specific language use, and this can lead to inaccurate results. Finally, a comparison with Img. 1 shows that the results vary depending on the technique used to measure the polarity of the comments.



Img. 3 This shows the Python code used to analyze the polarity and subjectivity of each comment from the Reddit conversation. The analysis is performed using TextBlob, a Python library for natural language processing.


Img. 4 This shows the Python code used to analyze the polarity and subjectivity of each comment from the Reddit conversation. This time the polarity of each comment is measured using VADER.






Img. 5 This shows the same part of the dataset, but this time the polarity and subjectivity are measured using ChatGPT-4, an advanced AI language model. Since ChatGPT-4 measures polarity and subjectivity with a different technique, the values differ from those of the previous analyses using TextBlob and VADER.
Img. 6  This shows the results of a second analysis using ChatGPT-4. Although the technique used by ChatGPT-4 to measure polarity and subjectivity is the same, the values differ slightly from those shown in Img. 5, which raises concerns about consistency and reliability.
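
The difference between two runs can be made explicit with a few summary numbers. The following is a minimal sketch, assuming the polarity values of both runs are available as two lists in the same comment order. The values below are hypothetical placeholders, not the values shown in Img. 5 and Img. 6.

```python
from statistics import mean


def compare_runs(run_a, run_b):
    """Summarize how much two runs of the same analysis disagree."""
    if len(run_a) != len(run_b):
        raise ValueError("Both runs must score the same number of comments.")
    differences = [abs(a - b) for a, b in zip(run_a, run_b)]
    return {
        # Average absolute change in polarity per comment.
        "mean_abs_diff": mean(differences),
        # Largest change observed for a single comment.
        "max_abs_diff": max(differences),
        # Comments whose polarity switched between positive and negative.
        "sign_flips": sum(1 for a, b in zip(run_a, run_b) if a * b < 0),
    }


# Hypothetical polarity values for four comments, two runs.
first_run = [0.5, -0.25, 0.25, 0.75]
second_run = [0.25, -0.5, -0.25, 0.75]

summary = compare_runs(first_run, second_run)
print(summary)
```

A sign flip is the most consequential kind of disagreement here, since it would change the color of a comment in a polarity-coded visualization.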





Img. 7-9 This selection of images shows different zoom levels of a visualization (tidy tree) created using D3.js, a JavaScript library for building dynamic, interactive data visualizations in web browsers.
Reddit comments are represented by dots, with comment-reply relationships indicated by lines. Dot color indicates polarity values analyzed using VADER: green for positive comments, light green for slightly positive, grey for neutral, light red for slightly negative, and red for negative. While D3.js has great potential for quickly presenting large amounts of data, it relies, for the sake of efficiency, on predefined visualization typologies or layout algorithms. This can be problematic because it reduces complexity and nuance and limits the ability to represent the unique characteristics of the data, which leads to oversimplified or standardized visual outputs.
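
A tree layout needs the comment-reply relationships as a nested structure. The following is a minimal sketch of how such a structure could be derived from the depth column, assuming the rows are stored in thread order (each reply appears after its parent) and that depth 0 marks a top-level comment. Both assumptions, the function name build_tree, and the sample identifiers are hypothetical. The output uses the name/children nesting commonly passed to D3's hierarchy layouts.

```python
def build_tree(rows):
    """Nest (comment_id, depth) rows, given in thread order, into a tree."""
    root = {"name": "conversation", "children": []}
    # stack[0] is the root; stack[d + 1] is the latest comment seen at depth d.
    stack = [root]
    for comment_id, depth in rows:
        if depth < 0 or depth > len(stack) - 1:
            raise ValueError(f"Comment {comment_id!r} has no parent at depth {depth - 1}.")
        node = {"name": comment_id, "children": []}
        # Drop everything deeper than the parent, then attach to the parent.
        del stack[depth + 1:]
        stack[depth]["children"].append(node)
        stack.append(node)
    return root


# Hypothetical rows: c2 and c3 reply to c1, c4 replies to c3, c5 is top-level.
rows = [("c1", 0), ("c2", 1), ("c3", 1), ("c4", 2), ("c5", 0)]
tree = build_tree(rows)
print([child["name"] for child in tree["children"]])
```

Even this small step is a design decision: everything a comment carries beyond its position in the hierarchy is set aside before the layout algorithm runs.
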
Img. 10 This shows another visualization (sunburst) created using D3.js. Reddit comments are represented by areas, with the comment-reply relationships indicated by one area building upon another (from the center outward). The color of the areas reflects polarity values analyzed using VADER: green for positive comments, light green for slightly positive, grey for neutral, light red for slightly negative, and red for negative. I acknowledge the potential of this visualization in the context of sentiment analysis, as it allows for a quick display of the polarity distribution throughout the conversation and can help identify insights, patterns, and trends. However, to achieve this, it significantly reduces the nuance and complexity of the original information.
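
The five color categories imply cut-off values on the compound score. The following is a minimal sketch of such a mapping. The neutral band of ±0.05 follows the convention suggested in the VADER documentation. The boundary of 0.5 between "slightly" and fully positive or negative, as well as the hex colors, are hypothetical and are not the values used in the visualizations.

```python
NEUTRAL_BAND = 0.05  # convention suggested in the VADER documentation
STRONG_BAND = 0.5    # hypothetical boundary, chosen for illustration only

# Hypothetical hex values for the five categories.
CATEGORY_COLORS = {
    "positive": "#2e8b57",
    "slightly positive": "#a8e6a1",
    "neutral": "#9e9e9e",
    "slightly negative": "#f4a6a6",
    "negative": "#d32f2f",
}


def polarity_category(compound):
    """Reduce a compound score in [-1, 1] to one of five categories."""
    if compound >= STRONG_BAND:
        return "positive"
    if compound >= NEUTRAL_BAND:
        return "slightly positive"
    if compound > -NEUTRAL_BAND:
        return "neutral"
    if compound > -STRONG_BAND:
        return "slightly negative"
    return "negative"


def polarity_color(compound):
    """Return the display color for a compound score."""
    return CATEGORY_COLORS[polarity_category(compound)]


for score in (0.8, 0.2, 0.0, -0.2, -0.8):
    print(score, polarity_category(score), polarity_color(score))
```

Moving either boundary changes which comments appear as negative or positive, so the thresholds are themselves part of the meaning the visualization produces.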



















Img. 11-14 This selection of images shows various visualizations created using Basil.js, a JavaScript library designed for scripting and automating layouts in Adobe InDesign. These visualizations present different data collected from a Reddit conversation. By correlating different data points, patterns emerge that may suggest potential relationships or influences between certain data variables. However, what I find problematic is that the ability to detect these patterns is heavily influenced by how the data is visually encoded, as well as the tools and technologies used to produce the visualization. Additionally, as seen before, the methods used to analyze, categorize, and quantify the data also impact how it is visually represented and therefore the meaning produced through the visualization.








Img. 15  This shows a graphical user interface developed to let users interact with multiple parameters and data variables in one visualization. With this interface I explored different strategies for developing alternative ways of visualizing and interacting with data. One series of buttons shows or hides the days of the conversation, another the depth (comment-reply relationship) of the conversation, and another variables such as the polarity and subjectivity values, the time, and the votes that each comment received. The goal was to move beyond traditional typologies or layouts, to incorporate more nuance and allow for different interpretations of the same data or information, to position the user as a producer of information, and to create an open tool that anyone can use and reflect on, without requiring prior experience in reading complex visualizations.



Img. 16  Users can zoom in and out and rotate the visualization.

Img. 17 When the ‘Polarity’ button is selected, polarity values analyzed using VADER are displayed using color coding: from dark red (most negative) to grey (neutral) to dark green (most positive). A light red dot indicates a slightly negative comment. Users can critically assess the analyzed polarity value of each comment because they see the text and not only the color-coded dot.
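
A continuous scale of this kind can be obtained by interpolating between three anchor colors. The following is a minimal sketch, assuming linear interpolation in RGB. The RGB values chosen for dark red, grey, and dark green are hypothetical and are not the colors used in the interface.

```python
# Hypothetical anchor colors as (red, green, blue) tuples.
DARK_RED = (139, 0, 0)
GREY = (128, 128, 128)
DARK_GREEN = (0, 100, 0)


def interpolate(start, end, t):
    """Blend two RGB colors; t = 0 gives start, t = 1 gives end."""
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))


def polarity_to_rgb(polarity):
    """Map a polarity in [-1, 1] onto dark red, grey, dark green."""
    clamped = max(-1.0, min(1.0, polarity))
    if clamped < 0:
        return interpolate(GREY, DARK_RED, -clamped)
    return interpolate(GREY, DARK_GREEN, clamped)


for value in (-1.0, -0.3, 0.0, 0.3, 1.0):
    print(value, polarity_to_rgb(value))
```

Unlike a five-step scale, this keeps small differences between scores visible, although the result still depends on the chosen anchor colors.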







Img. 18 When the ‘Interactions’ button is selected, users can see the interactions between comments and replies. This also helps identify influential comments in the conversation.




Img. 19 When the ‘Time’ button is selected, an axis displays comments according to the time of day they were posted (from 00:01 to 23:59).
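
Placing a comment on such an axis means discarding the date and keeping only the time of day. The following is a minimal sketch, assuming the posting time is stored as a Unix timestamp in seconds and is read in UTC. The storage format, the time zone, and the function name are assumptions and are not confirmed by the dataset shown in Img. 1.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 24 * 60 * 60


def time_of_day_position(unix_seconds):
    """Return the position on a 24-hour axis as a fraction in [0, 1)."""
    moment = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    seconds_since_midnight = moment.hour * 3600 + moment.minute * 60 + moment.second
    return seconds_since_midnight / SECONDS_PER_DAY


# Hypothetical timestamps: midnight, noon, and 06:00 on the following day.
for timestamp in (0, 43200, 86400 + 21600):
    print(timestamp, time_of_day_position(timestamp))
```

With this mapping, comments posted on different days at the same hour share one position, which is why the days of the conversation need their own control in the interface.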





Img. 20 When the ‘Subjectivity’ button is selected, an additional axis displays comments based on both the time they were posted and their subjectivity value as analyzed by TextBlob.
 






Img. 21 When the ‘Votes’ button is selected, another axis is used to display the number of votes each comment received on Reddit.






Img. 22 Each user may arrive at different interpretations and findings depending on the variables or parameters they choose to select or ignore. This interactive visualization should also highlight that the identification of insights or patterns, and the production of meaning, are always influenced by numerous choices, which are both enabled and constrained by the tools and technologies used to create and interact with the visualization, as well as by the decisions made by the designer and the user.