About this Blog-like Thing

By Matt Lavin

September 01, 2016

Since the late 1990s, the business world and academia alike have seen the rise of the data scientist. Though the term data science dates back to the early 1960s, the figure of the data scientist took on additional cachet with the popularization of the World Wide Web and the data revolution that accompanied it.1

These conceptualized experts were said to be "the information and computer scientists, database and software engineers and programmers, disciplinary experts, curators and expert annotators, librarians, archivists, and others, who are crucial to the successful management of a digital data collection."2

In 2012, Harvard Business Review named data scientist the "sexiest job of the 21st century."3

The data humanist, in contrast, has not arisen as a figure. Instead of data humanities, we have seen the rise of digital humanities as a catch-all term for humanities computing, digital librarianship, digital project development, digital pedagogy, and humanities-based data curation.

Data remains the backstop of much if not all digital humanities work, either directly (in the case of, say, Matt Jockers' work on sentiment analysis) or indirectly (as in, say, the numerous critiques of Jockers' work on sentiment analysis).

This particular topic is of interest to me because I work in a space where digital humanities and digital media intersect. I'm a Clinical Assistant Professor of English and Director of the Digital Media Lab at the University of Pittsburgh. My most recent scholarship focuses on the intersection of digital humanities, book history, and U.S. literature. My programming languages of choice are Python and JavaScript, with a little R peppered in. My web stack includes Ubuntu, Docker, Nginx, WSGI, and Python. I use Lektor and Flask for websites, and I like Jupyter Notebooks.

This is me, and this is me, and this is me, and this is me.

I should probably mention that the Data Humanist won't be a very good blog, because it's barely a blog at all. Most blogs (even really good digital humanities blogs) feel a certain amount of pressure to post regularly and, to accommodate that pressure, they sometimes feel the need to slap together very short or shoddy or off-topic filler to give the appearance of an active blog. I don't mean to be overly critical of this impulse; I'm just saying it's stupid and I hate it. :)

So I will aspire to do less posting but to release posts with a bit more heft. Hence the phrase "blog-like thing."

Content will begin in October [edit: well, that didn't happen. Here it is mid-December, and I'm about to publish my first post.] and follow approximately once per month. I plan to do this for 12 months, and then reevaluate my approach.

I hope this blog-like thing will be a space where data isn't a dirty word. I plan to write about scraping, wrangling, curating, analyzing, visualizing, teaching, and critiquing--the entire lifecycle of humanities data. I plan to integrate multi-modal content and dissect it. I plan to share Jupyter Notebooks via GitHub so people can see my code.




1. Press, Gil. "A Very Short History of Data Science," Forbes (May 28, 2013).

2. National Science Foundation. "Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century," NSB-05-40 (September 2005).

3. Davenport, Thomas H., and D.J. Patil. "Data Scientist: The Sexiest Job of the 21st Century," Harvard Business Review (October 2012).