In my last post, I wrote that I would have more to say about NLP/NER methods of categorizing and extracting information from book reviews. I'll come back to that subject in my next post, but I wanted to share the transcript of my presentation for the "New Horizons in Network Analysis" panel at the 2019 meeting of the Association for Computers and the Humanities (ACH). I co-presented with John R. Ladd, so I've included our shared intro and conclusion, as well as a link to John's portion of the presentation.
In his foundational 1973 essay, "The Strength of Weak Ties," Mark Granovetter helped to launch modern social network analysis by advocating for the ability of networks to act as a "micro/macro bridge," a medium-scale approach that shows how small-scale occurrences aggregate into large-scale phenomena. Lately I've become captivated by how fresh his argument still seems. Our colleagues have so far presented today on just a few of the many ways network analysis, or graph theory, can add to literary and historical knowledge by exploring medium-scale phenomena. Beginning with the first humanities network visualization projects and expanding into increased use of graph theory principles and metrics, humanities networks projects have grown up alongside the digital humanities. Though "micro" close reading approaches and "macro" statistical studies have seemed to form opposing poles of analysis, network theory has developed a strong descriptive vocabulary for medium-scale social, historical, and literary phenomena.
We now understand networks, as Granovetter did, not only as a visual aid for relational thinking but as a set of mathematical measures that can give us greater purchase on large and small questions in our field. In fact, as we will argue in today's presentation, we have methods at our disposal today that can be combined with network metrics to great effect.
I speak of the set of statistical techniques often grouped under the umbrella of "machine learning." In literary studies, statistical modeling---including both unsupervised clustering methods and supervised classification tasks---has most often meant processing and modeling data pertaining to words and word counts, but there is no reason that these same techniques can't be applied to network metrics instead. That is to say, networks give us a great deal of measurements, and these measurements can be aggregated to allow us to sort texts in new ways.
The core of our approach is simple: network metrics can be used to create a vector, or what Ruth and Sebastian Ahnert have called a fingerprint, for an individual network or even an individual node. By calculating over all of those vectors at once, networks can be sorted into various groups. And let's be clear: just as with modeling the words within literary texts, we learn more from the process of modeling than from the output of the model. We engage in this process to get a better sense of the set of features we're examining, and in this case those features tell us more about networks and the systems those networks describe.
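The fingerprint idea can be sketched as a fixed-length vector of whole-network measures, one vector per graph. The following is a minimal illustration using networkx (an assumption; the talk does not specify tooling), with a built-in sample graph standing in for a review network, and with an illustrative rather than exact choice of metrics.

```python
# A minimal sketch of a network "fingerprint": one fixed-length vector of
# global measures per graph, comparable across graphs. The metric choices
# are illustrative, not the exact feature set used in the talk.
import networkx as nx

def fingerprint(G):
    """Return a vector of whole-network measures for graph G."""
    return [
        nx.density(G),
        nx.average_clustering(G),
        nx.transitivity(G),  # global triadic closure
        sum(d for _, d in G.degree()) / G.number_of_nodes(),  # average degree
    ]

# Stand-in graph; each yearly review network would get its own vector
vec = fingerprint(nx.karate_club_graph())
print(vec)
```

Once every network (or node) has such a vector, standard clustering or classification routines can operate on the vectors directly.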
For my part of this presentation, I'm using methods similar to John's, but I'm applying those methods to an entirely different genre of texts from a different century and different country.
My most recent scholarship builds on previous work about how hierarchies of cultural taste, in the United States and Europe especially, were established in the second half of the nineteenth century and challenged/complicated in the first half of the twentieth century.1
In what seems like a contradiction, book reviews largely moved away from sweeping remarks about what made literature good or worth reading, and focused more on summarizing and making discrete recommendations.
Did book reviews stop mediating culture hierarchies, or did their approaches become more subtle? How can we best understand the roles they played? How can we, by extension, better understand how periodicals mediated cultural hierarchies, and how such mediations changed over time? These are the central concerns behind the work I'm presenting today.
Prior analysis of book reviews has largely focused on one of three models: individual close readings; one-to-many comparisons of authors and periodicals; or analysis of authors and periodicals as many among many. My work on machine learning with network features aims for the third of these.2
I generated the dataset for my portion of our experiments by text-mining documents tagged as reviews from the American Periodicals Series, originally published between 1880 and 1925. I have a licensing agreement with ProQuest that allows me to work with 11.5 million digital objects from 1,887 periodicals, with about 300,000 objects pre-tagged as reviews. For my network, I wanted to be able to show relationships among authors based on which periodicals reviewed them, or relationships among periodicals based on which authors they reviewed.
To convert book reviews to nodes and edges, I used a combination of natural language processing and named entity recognition to tag reviews as single-work reviews or multi-work reviews, and then to isolate the authors being reviewed.3
I then converted tagged reviews to network data. There are two node types, authors and periodicals, and edges represent reviews of a given author in a given periodical.4 I used projection to create a network of periodicals, with the number of shared authors as edge weights, and a network of authors, with the number of shared periodicals as edge weights. In any case where an author was reviewed by a periodical more than once in a single year, I increased the edge weight between that author and the periodical (+1 for each review). I divided data into separate networks by year, as the slide depicts.5
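The projection step can be sketched with networkx's bipartite tools (an assumption; the talk does not name a library). The author and periodical names below are invented examples, not data from the corpus; by default the projected edge weights count shared neighbors, which matches the shared-authors / shared-periodicals weighting described above.

```python
# A hedged sketch of the bipartite projection step: authors and periodicals
# as two node sets joined by review edges, projected so that edge weights
# count shared neighbors. Names here are invented examples.
import networkx as nx
from networkx.algorithms import bipartite

B = nx.Graph()
authors = ["Wharton", "James", "Chopin"]
periodicals = ["The Dial", "The Nation"]
B.add_nodes_from(authors, bipartite=0)
B.add_nodes_from(periodicals, bipartite=1)
B.add_edges_from([
    ("Wharton", "The Dial"), ("Wharton", "The Nation"),
    ("James", "The Dial"), ("Chopin", "The Nation"),
])

# Authors are linked when the same periodical reviewed both;
# the weight is the number of shared periodicals.
author_net = bipartite.weighted_projected_graph(B, authors)
# Periodicals are linked when they reviewed the same author;
# the weight is the number of shared authors.
periodical_net = bipartite.weighted_projected_graph(B, periodicals)

print(author_net["Wharton"]["James"]["weight"])  # they share The Dial
print(periodical_net["The Dial"]["The Nation"]["weight"])  # they share Wharton
```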
Next, I extracted measures of each network's overall structure, just as John described, to see if the network traits he used could predict the date of each author-periodical network.6 As a general rule, almost all real-world networks evolve over time. Borgnat and co-authors summarize that "it is possible," therefore, "to consider an evolving network as a time sequence … of networks (snapshots) and to study each of these independently."7 Such snapshots, they argue, convert each network measure into a time series, which can be "studied using signal processing notions."8
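The snapshot idea can be sketched in a few lines: compute the same global measures on each yearly network, yielding one time series per measure. Here random graphs stand in for the real yearly review networks, and networkx is again an assumption.

```python
# Sketch of the snapshot approach: the same global measures computed on
# each yearly network become one time series per measure. Random graphs
# stand in for the real yearly review networks.
import networkx as nx

def measures(G):
    return {"density": nx.density(G),
            "avg_clustering": nx.average_clustering(G)}

snapshots = {year: nx.gnp_random_graph(30, 0.2, seed=year)
             for year in range(1880, 1885)}

# Each measure becomes a signal indexed by year
series = {year: measures(G) for year, G in snapshots.items()}
density_series = [series[y]["density"] for y in sorted(series)]
print(density_series)
```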
To explore the relationship of these network measures to a network's "date signal," I performed linear regression on my book review networks. As many of you probably know, linear regression is a well-established supervised learning approach, and it is especially good at fitting one or more predictor variables to a continuous outcome variable.9 As Charles Macal has said, "we model for insights, not numbers."10 However, in order to argue that the derived insights are meaningful, we need to establish a baseline accuracy level.11 I employed a leave-one-out strategy, in which all but one sample are used as the training set and the remaining sample is the test set. Each sample is left out once, and the model is retrained each time on the rest of the data.12
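The leave-one-out procedure can be sketched as follows, using scikit-learn (an assumption; the talk does not name a library). The features are synthetic stand-ins for the per-network measure vectors, loosely correlated with the year.

```python
# A minimal sketch of leave-one-out evaluation for a linear model. The
# features below are synthetic stand-ins for per-network measures, not
# values from the actual review networks.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(0)
years = np.arange(1880, 1926)
X = np.column_stack([
    (years - 1880) * 0.01 + rng.normal(0, 0.05, years.size),  # "density"-like
    rng.normal(0.3, 0.1, years.size),                         # "clustering"-like
])

errors = []
for train_idx, test_idx in LeaveOneOut().split(X):
    # Retrain on all samples but one, then predict the held-out year
    model = LinearRegression().fit(X[train_idx], years[train_idx])
    pred = model.predict(X[test_idx])[0]
    errors.append(abs(pred - years[test_idx][0]))

print(f"mean absolute error: {np.mean(errors):.2f} years")
```

Averaging the per-sample absolute errors gives the mean accuracy figures reported below.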
Under these conditions, the linear model was able to predict the year of networks with authors as nodes with a mean accuracy of +/- 8.16 years.
The median accuracy was +/- 6.78 years, and the mode accuracy, when predictions were rounded to their nearest integer, was +/- 2 years. The variance was 39.64, and one value (the prediction for 1883) scored as an outlier, off by 30.24 years. This outlier was unsurprisingly the worst prediction, and the closest prediction was within 0.106 years of the actual date. The r2 score for the regression without any holdouts is 0.61, where 1.0 is the maximum possible value, and a 0 would describe a model where the predictor variables provide no information.13
When the projection was reversed to treat periodicals as nodes, with shared authors defining the edges, the linear model was able to predict network years with a mean accuracy of +/- 6.67 years.
The median prediction accuracy was +/- 6.06 years, and the rounded mode was +/- 11 years. The variance of these predictions was 25.18, and one value again scored as an outlier, this time the prediction for 1880, which was off by 21.86 years. This was the worst prediction, and the best prediction was within 0.79 years of the true date. The r2 score for the regression without any holdouts is 0.71.
So, what does all this mean? There's much room for improvement, but both models perform well above what we would expect from random guessing, even random guessing bounded by the date range of our data. There are outlier predictions in both models, and variance is high. That said, we are getting information about the year a network represents using only statistics that measure each network's overall structure.
To isolate the effect of each predictor, we can turn to the coefficients.14
With the authors-as-nodes network, density is by far the largest positive predictor. Each time the density score increases by one point, the predicted date will increase by about 1211 years. Density in my model was expressed as a value between 0.0 and 1.0, so in reality we would see much smaller prediction increments. The next most informative coefficient is average clustering, followed by average path distance, which reduces the predicted date as it increases. Average degree, triadic closure, and radius provide less predictive information, and diameter is effectively irrelevant to our model.15
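The scale effect described above, in which a predictor confined to a narrow range (like density, between 0.0 and 1.0) earns a numerically large coefficient, can be sketched with synthetic data. scikit-learn is again an assumption, and the trends below are invented to mirror the ones described in the talk, not taken from the corpus.

```python
# Sketch of inspecting coefficients on a fitted linear model to compare
# predictors. Data is synthetic: a narrow-range "density" rising slowly
# with the year, and an "average path distance" shrinking slowly.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
years = np.arange(1880, 1926).astype(float)
# Density confined to a narrow slice of its 0.0-1.0 range
density = (years - 1880) / 500 + rng.normal(0, 0.002, years.size)
# Average path distance trending slowly downward
avg_path = 4.0 - (years - 1880) / 50 + rng.normal(0, 0.05, years.size)
X = np.column_stack([density, avg_path])

model = LinearRegression().fit(X, years)
for name, coef in zip(["density", "avg_path_distance"], model.coef_):
    print(f"{name}: {coef:+.2f}")
```

Because density moves in tiny increments, its coefficient is large even though each real observed change in density shifts the prediction only modestly.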
Over time, more authors share at least one periodical between them.16 At the same time, since average clustering is also increasing over time, the completeness of communities is increasing. The average node and its neighbors are more likely to form a complete graph. Meanwhile, the average path distance—or the average number of steps it takes to connect two nodes (following the shortest path between them) for all possible pairs of network nodes—is decreasing.
With the periodicals-as-nodes network, density remains important, but not nearly as important as it was in the first model. Average path distance is the second largest coefficient, and it remains a negative predictor of date. A network's radius is the third largest coefficient, and diameter remains the least valuable, but now all seven network measures contribute at least partially to the predictions.
Over time, as with author networks, density is increasing. Periodicals are more likely to share at least one author between them, and the average number of steps it takes to connect two nodes is decreasing. Further, as the date increases, a given periodical is marginally more likely to form a complete network with its immediate neighbors. If nodes A and B are connected and nodes B and C are connected, nodes A and C are more likely to be connected (triadic closure). Since the radius of periodical networks is also increasing over time, we can say that the network is becoming more eccentric.17
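For listeners less familiar with the eccentricity-based terms, a toy example may help ground them; this uses networkx (an assumption) on a five-node path graph.

```python
# A toy illustration of eccentricity, radius, and diameter, using a path
# of five nodes, to ground the structural terms used above.
import networkx as nx

G = nx.path_graph(5)      # nodes 0-1-2-3-4 in a line
ecc = nx.eccentricity(G)  # each node's longest shortest-path to any other
print(ecc)                # center node 2 has eccentricity 2; the ends have 4
print(nx.radius(G))       # minimum eccentricity: 2
print(nx.diameter(G))     # maximum eccentricity: 4
```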
Throughout the data, there is substantial variability (or noise) surrounding the trends (or signal), but a signal does seem to be present. This preliminary result suggests that the periodicals-as-nodes networks provide more predictive capacity than the authors-as-nodes networks.
To return, then, to hierarchies of taste, I see a lot of potential. A more developed experiment could build on this work to address several questions. Is "who gets reviewed" consolidating over time? Or, are sets of periodicals becoming more cliquish? Is homophily changing over time (along genre lines, by periodical type, by author gender, etc.)? Are there patterns to authors garnering single-work reviews vs. multi-work reviews?
Even though book reviews were trending toward summarization with light commentary, they may have shaped wider tastes. If so, they did it more subtly, by association and affiliation. Network measures, over time, and in the aggregate, may help us see such patterns.
In this presentation, our goal was to demonstrate that machine learning methods like clustering and classification can be applied to measures of entire networks, as well as to gain insights about nodes in a network. For our examples, we deliberately chose networks representing different types of texts, different time periods, and different geographies. We have demonstrated some of the potential payoffs of applying machine learning methods to networks, with carefully considered network measures as our feature sets. There's much more to be done with these methods. Potential next steps for our work include:
Thank you all for your time and attention.18