Updates coming to my Infinite Sentiment app!

A month ago, my hobby project Infinite Sentiment started out as a simple Next.js web app that took a random passage from the novel Infinite Jest, ran it through a Hugging Face Transformers.js sentiment analysis model entirely client-side, and displayed the results along with the passage. That version can still be seen on the main branch here: https://github.com/cipherphage/Infinite-Sentiment.

Since then, I've been working on a separate branch called sentiment-viewer (https://github.com/cipherphage/Infinite-Sentiment/tree/sentiment-viewer), which generalizes and extends the original web app in three main ways: 1) it now iterates over the whole array of passages so that an entire text can be analyzed, 2) it now uses Intl.Segmenter, part of the ECMAScript Internationalization API built into modern browsers, to support analysis of a broader set of Unicode strings and to take advantage of the Segmenter's built-in granularity options (specifically, words and sentences), and 3) it now displays an interactive, heat-map-like visualization of the results. I wrote about an older version of the sentiment-viewer branch previously: https://world.hey.com/cipher/infinite-sentiment-client-side-sentiment-analysis-of-novels-using-transformers-js-f9da785b.
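
To give a feel for what Intl.Segmenter does at the word and sentence granularities, here is a minimal sketch. It is not the code from the sentiment-viewer branch: the segmentText helper, the "en" default locale, and the choice to trim segments and drop non-word pieces are all assumptions I made for illustration.

```typescript
// Hypothetical helper (not from the repository): split a string into
// trimmed segments at the requested granularity using Intl.Segmenter.
type Granularity = "word" | "sentence";

function segmentText(
  text: string,
  granularity: Granularity,
  locale: string = "en" // assumed default locale
): string[] {
  const segmenter = new Intl.Segmenter(locale, { granularity });
  const segments: string[] = [];
  for (const part of segmenter.segment(text)) {
    // At word granularity the segmenter also yields spaces and punctuation;
    // isWordLike lets us keep only the actual words.
    if (granularity === "word" && !part.isWordLike) continue;
    const trimmed = part.segment.trim();
    if (trimmed.length > 0) segments.push(trimmed);
  }
  return segments;
}

const sample = "Hello world. How are you?";
console.log(segmentText(sample, "sentence"));
console.log(segmentText(sample, "word"));
```

Each resulting segment could then be handed to the sentiment model on its own, which is what makes the word and sentence levels of analysis possible.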

After achieving some interesting results using the sentiment-viewer version of the app to compare four novels from the 20th century (see my previous blog post, linked above), I decided to keep working on generalizing and extending it. Today, I am at the point where I need to spin off parts of the app: I will be breaking this project up into a separate Next.js web application, a React-Transformers-Sentiment-Analysis-Dashboard component library, and a React-Transformers-Sentiment-Analysis-Visualization component library. I will package the libraries as npm packages and publish them.

The main reasons to do this are as follows:

Decoupling the UI provides greater flexibility to those who wish to use the components.
- An engineer could use only the visualization library by providing it with appropriately structured data, which they may already have on hand.
- Perhaps someone has a use case for the dashboard components without any visualization, preferring only to read the quantitative analysis, manipulate the data, and download or transfer the data elsewhere.
- For these reasons, the Next.js client-side web app is probably not nearly as useful to an engineering team as the UI components could be. On the other hand, the web app could be very helpful to non-engineers who just want to analyze their text files.
Decoupling provides greater flexibility and efficiency for me, as the developer and maintainer:
- It's easier to keep track of work priorities and technical debt because the individual projects have a smaller scope. Keeping the scope of each piece of work small is also a core idea in agile methodologies.
- It's easier to generalize and extend the components if they are already decoupled, because it's easier to understand the smaller code bases (i.e., it's easier to follow through on making changes).
- Perhaps most importantly, it will be far easier to test the code. Refactoring the React components so that the logic can be tested in unit tests will be a simpler task. Generalizing the React components in the decoupling process will also make writing component tests simpler.
Additionally, the projects might be more useful and popular individually than they would be together as a single web app.

Infinite Sentiment on the sentiment-viewer branch has four main parts:

The Next.js client-side web app.
- Fetches the text file used in the analysis and parses it into passages (strings terminated by line breaks, which correspond more or less to individual paragraphs in a typical article, essay, or novel).
The Hugging Face Transformers.js web worker.
- Machine learning and analysis can take time. We utilize a web worker so that Transformers can do its thing on a separate thread in the background while our UI remains responsive.
The sentiment analysis dashboard React UI components.
- Button components and their logic for different kinds of sorting as well as different levels of analysis (i.e., words, sentences, or passages) and pausing/unpausing the analysis.
- Info components that display real-time information about the progress of the analysis.
The sentiment analysis visualization React UI components.
- Visualization components which are primarily colored squares in a grid. The colors correspond to the positive or negative sentiment score. Each square also contains the text and related data for a tooltip child component.
- Data slide show components that enable users to move through the analyzed text one segment at a time and see the text and related data. They have a slide-show-like UI with left and right arrow buttons on either side, which is why I call it the data slide show.
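
To illustrate how a sentiment score can become the color of a square in the grid, here is a minimal sketch. It is not the mapping used in the repository: the SentimentResult shape (a POSITIVE/NEGATIVE label plus a confidence score between 0 and 1), the sentimentToColor name, and the green/red HSL scheme are all assumptions for illustration.

```typescript
// Assumed result shape: a label plus a confidence score in [0, 1].
interface SentimentResult {
  label: "POSITIVE" | "NEGATIVE";
  score: number;
}

// Hypothetical mapping: green hue for positive, red hue for negative,
// with stronger scores producing darker (more saturated-looking) squares.
function sentimentToColor(result: SentimentResult): string {
  // Clamp defensively in case a score falls outside [0, 1].
  const clamped = Math.min(1, Math.max(0, result.score));
  const hue = result.label === "POSITIVE" ? 120 : 0;
  // Lightness runs from 90% (weak sentiment) down to 40% (strong sentiment).
  const lightness = Math.round(90 - clamped * 50);
  return `hsl(${hue}, 70%, ${lightness}%)`;
}

console.log(sentimentToColor({ label: "POSITIVE", score: 1 }));
console.log(sentimentToColor({ label: "NEGATIVE", score: 0.5 }));
```

The returned string can be used directly as a CSS background color on each square.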

The four major parts outlined above will end up living in three separate repositories/projects.

First, the Next.js client-side web app. This application has two main reasons for existing: 1) it allows me, as the developer, to quickly spin everything up, run changes, and test manually, and, more importantly, 2) it provides a simple and useful application that non-engineers can use for sentiment analysis of texts.

What changes am I planning for this application? Aside from the obvious change of separating out the UI components, which I'll get to below, this web app needs to 1) provide users a way within the UI to specify the text(s) to analyze and 2) provide users a way within the UI to analyze more than one text file. As a prospective nice-to-have, it could also 3) provide users a way within the UI to change the passage-level parsing, for example through an input that accepts a user-provided regular expression. Before committing to feature #3, I should research existing text parsing libraries and see whether it makes more sense to create a UI that exposes their parsing options to users. In fact, as I'm typing this, I like the sound of that a lot more than my original idea for feature #3; however, it remains a nice-to-have and would actually represent a fourth repository/project.

Since this app is entirely client-side, there is little in the way of security that I need to consider. If someone feeds in a bad text file or enters a regular expression that hangs the page (for example, through catastrophic backtracking), they will only be stopping themselves from using the app, and they can just close their browser tab.
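
As a rough sketch of the regular expression idea for passage-level parsing, the splitting could look something like the following. The function names (parsePassages, separatorFromUserInput) and the default separator are hypothetical, not taken from the repository, and this sketch does nothing to guard against slow patterns.

```typescript
// Hypothetical default: one or more line breaks end a passage.
const DEFAULT_SEPARATOR = /\r?\n+/;

// Split raw text into non-empty, trimmed passages.
function parsePassages(text: string, separator: RegExp = DEFAULT_SEPARATOR): string[] {
  return text
    .split(separator)
    // If the pattern contains capture groups, split() also returns the
    // captured text (or undefined), so guard before trimming.
    .map((piece) => (piece ?? "").trim())
    .filter((piece) => piece.length > 0);
}

// Turn a user-typed pattern into a RegExp, or null if it does not compile.
function separatorFromUserInput(pattern: string): RegExp | null {
  try {
    return new RegExp(pattern);
  } catch {
    return null;
  }
}

const text = "First paragraph.\n\nSecond paragraph.\r\nThird.";
console.log(parsePassages(text));

// Example: treat only blank lines (two or more line breaks) as passage breaks.
const custom = separatorFromUserInput("\\n{2,}");
if (custom !== null) {
  console.log(parsePassages("line one\nline two\n\nnext passage", custom));
}
```

Returning null for an invalid pattern lets the UI show a validation message instead of throwing while the user is still typing.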

Second, the sentiment analysis dashboard components will live in their own React-Transformers-Sentiment-Analysis-Dashboard NPM module, and the Transformers.js web worker will be an optional prop passed into its parent component. Why make the web worker optional? To give users of the library greater flexibility. They can process the data themselves before passing it to the component library, or they can provide their own web worker. If they do neither, my version of the web worker will be available for them to use to conduct the analysis. Here is a gif showing some of the current features provided by the sentiment analysis dashboard being used (the informational text at the top and bottom as well as the two rows of buttons above the visualization):

In addition to the existing features, the following new features are planned: 1) provide users with more quantitative data about the analysis, 2) provide users with a way to download a CSV or JSON file of the analysis data, 3) provide users a way to choose the sentiment analysis model from those available in Transformers.js, and 4) provide users a way to load an existing CSV or JSON file of analysis data. The first three are must-haves, in my opinion, so I will work on those first, once I have finished extracting these components into their own repository.
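
The CSV and JSON download could be built on a pair of small serializers like the ones sketched below. This is only an illustration under assumptions: the AnalysisRow shape, its column names, and the function names are hypothetical, and the real analysis data may carry different fields.

```typescript
// Assumed shape of one analyzed segment (hypothetical field names).
interface AnalysisRow {
  index: number;
  text: string;
  label: string;
  score: number;
}

// Quote a field if it contains a comma, quote, or line break,
// doubling any embedded quotes (the usual CSV convention).
function escapeCsvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

function toCsv(rows: AnalysisRow[]): string {
  const header = "index,text,label,score";
  const lines = rows.map((row) =>
    [
      String(row.index),
      escapeCsvField(row.text),
      escapeCsvField(row.label),
      String(row.score),
    ].join(",")
  );
  return [header, ...lines].join("\r\n");
}

function toJson(rows: AnalysisRow[]): string {
  return JSON.stringify(rows, null, 2);
}

const rows: AnalysisRow[] = [
  { index: 0, text: 'He said "hi", then left.', label: "POSITIVE", score: 0.9 },
];
console.log(toCsv(rows));
console.log(toJson(rows));
```

In the browser, either string could then be wrapped in a Blob and offered to the user through a temporary object URL on a download link. Escaping matters here because passages from novels are full of commas, quotation marks, and line breaks.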

Third, the sentiment analysis visualization components will live in their own React-Transformers-Sentiment-Analysis-Visualization NPM module. Here is a gif showing some of the current features provided by the sentiment analysis visualization being used:

In addition to the existing features, the following new features are planned: 1) provide users with other visualization methods such as charts, tables, and highlighted or annotated text (a should-have feature), 2) provide users a way to directly compare the sentiment data from different levels of the same text (e.g., compare the sentiment of a sentence to the sentiment of the passage it came from), 3) provide users a way to compare the sentiment data from different sentiment analysis models, and 4) provide users a way to compare the sentiment data of different text files.

As you can see, there is a lot of exciting work ahead. For something with such humble beginnings, it's really satisfying that it might turn into something useful, and in a relatively short time. It is a testament to the usefulness and accessibility of the underlying tools and technologies that this is possible at all!