Three disciplines will come together in this collaboratory: Humanities, Art, and Technology, but it will focus on an emerging “non-discipline,” as Johanna Drucker calls it, “Digital Humanities.” Before delving into an extensive explanation of that field and the road map it will provide for our collaboratory’s first two years, we offer a word about the other two disciplines at stake, Art and Computer Science.

From Ira Greenberg’s perspective, digital art differs only in degree but not in kind from traditional art forms such as painting: the computer is a medium, as is paint, and the artist deploys it equipped with conventions, a shared language, for stating pictorial ideas. Each artistic act consists in an intuitive wielding of notations coupled with some kind of contingency – randomization of a sort. Technology can now sustain art by providing generative data sets.
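
To give a concrete sense of what Greenberg means, the following is a minimal, hypothetical Processing sketch – not a piece of the collaboratory’s software – in which a pictorial convention (a grid of marks) is wielded through contingency (seeded randomization):

    // A minimal generative sketch: convention (a grid of marks)
    // coupled with contingency (randomized angle and weight).
    void setup() {
      size(400, 400);
      background(255);
      stroke(0);
      randomSeed(42);  // contingency, made repeatable
      for (int x = 20; x < width; x += 20) {
        for (int y = 20; y < height; y += 20) {
          float angle = random(TWO_PI);  // the random gesture
          float len = random(4, 10);
          strokeWeight(random(0.5, 2));
          line(x, y, x + cos(angle) * len, y + sin(angle) * len);
        }
      }
    }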

Unfortunately, when we say “technology,” we usually mean specialists in engineering and computing who are removed from the fields in which their technologies operate. When those technologies function merely as tools – when they are merely functional rather than generative, as a medium is – this division of labor works very well. In creating digital art and digital archives, however, the latter being one function of the digital humanities, such a division is inimical to the work at hand. Technological best practices and standards for excellence must be modified to accommodate something else, and that something else is best described as human cultural history.

The Humanities are devoted to preserving and interpreting the human record. At this moment, libraries, information technology services, and humanities departments are engaged in digitizing printed texts and manuscripts for the sake of preservation, in two senses of the word. In one sense, making digital copies of textual materials preserves them in the face of disintegration, since digital copies, easily duplicated and migrated, have proved more resilient than decaying paper. In another sense, since we are now faced with a deluge of information, only those portions of the digitized cultural record that are machine readable will remain accessible. Suppose, for example, that one were researching a topic in the history of Western medicine. Which of the 30,000 16th-century Anglo-American texts, the 80,000 18th-century texts, or the 400,000 19th-century texts are relevant to that research? No human eye can examine those texts one by one to determine relevance, and any text deemed “irrelevant” is as good as lost. Disciplines solve the problem by limiting research to specific periods or by pre-selecting “canonical” texts; as many researchers have begun to notice, that disciplinary apparatus filters out texts that cannot be neatly categorized, for instance by period or discipline, or that disrupt tales of progress. Librarians solve the problem of sorting through the data deluge by creating metadata about texts, but records of title, author, and publication date (when available) offer little substantive information about the texts they represent. Properly encoded data provides finer-grained determinations of relevance and enables “distant reading” – taking a long, interdisciplinary view of the data. The major fields of endeavor that teach machines to read huge amounts of data deeply and well are text encoding, software engineering for the semantic web, and visualization. Three experts in those fields who work extraordinarily well together are here at Miami University, and our research has naturally brought us together to help each other.
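
To illustrate what such finer-grained determinations might look like, here is a hypothetical sketch in Processing (the visualization language discussed below). It assumes, purely for illustration, a set of encoded files in which scholars have tagged medical vocabulary with a <term type="medical"> element; the file names and the tagging convention are placeholders, not prescribed practice:

    // Hypothetical sketch: rank encoded texts by how many terms a
    // scholar has tagged as medical, a first pass at relevance.
    String[] files = { "text01.xml", "text02.xml", "text03.xml" };

    void setup() {
      for (String f : files) {
        int hits = countMedicalTerms(loadXML(f));
        println(f + ": " + hits + " tagged medical term(s)");
      }
    }

    // Recursively count <term type="medical"> elements in a subtree.
    int countMedicalTerms(XML node) {
      int n = 0;
      if ("term".equals(node.getName())
          && "medical".equals(node.getString("type"))) n++;
      for (XML child : node.getChildren()) {
        n += countMedicalTerms(child);
      }
      return n;
    }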

Laura Mandell, general editor of the Poetess Archive, technical editor of Romantic Circles, Associate Director of NINES, and co-director of the forthcoming 18thConnect, works in the field of English literature on document encoding, transformation, and interoperability. Ira Greenberg is a digital artist who has been engaged in developing the programming languages deemed most promising for visualizing textual information by scholars attending THATCamp, the annual un-conference put on by Digital Humanities guru Dan Cohen. Greenberg published the first of the major new books about Processing and is currently hosting at Miami a Processing summit at which the language’s developers gather to work together toward future releases. Jerry Gannod is one of the very few computer scientists whose research interests in software engineering, the semantic web, and semantic web services have led him deeply into the field of Digital Humanities, to developing encoding software and tools for manipulating texts that extend even to mobile devices.

Together, the three of us are interested in finding solutions to some of the major problems confronting the field of Digital Humanities at this time, detailed in the environmental scan below. Why? Confronting those problems offers significant research opportunities, new ways of understanding human creativity, past, present, and future. We propose to create a collaboratory – a laboratory space designed to foster collaboration among professors in English, Art, and Computer Science along with their graduate and undergraduate students – for conducting research into coding, semantic web applications, and visualization.

Environmental Scan:

Considered together, the report commissioned by the Council on Library and Information Resources about Digital Humanities Centers and the most recent summary of proceedings of the 2008 meeting of the Text Encoding Initiative Consortium emphasize one major issue confronting Digital Humanities: silos. As Diane M. Zorich’s report assessing the effectiveness of Digital Humanities Centers across the U.S. says, “The silo-like nature of current centers is creating untethered digital production that is detrimental to the needs of humanities scholarship.” It is detrimental in two ways: first, scarce resources are being used to duplicate efforts, each scholarly project producing its own software and workflow; second, digital products cordoned off from each other in institutional silos are rarely interoperable. Just as one cannot read 80,000 texts, one cannot move from digital archive to digital archive, learning each one’s individual system of categorization and then searching each accordingly. One group has addressed precisely this problem. The Text Encoding Initiative was first launched at an NEH-funded conference at Vassar College in 1987, a meeting of scholars in the humanities, library science, and linguistics. In the “Proceedings of the 2008 TEI Symposium,” Susan Schreibman, Director of the Digital Humanities Observatory, made the following comments: “The TEI Guidelines have become the de facto standard in encoding humanities texts in both the scholarly and library communities.” Her statement is true: from the Modern Language Association to the American Historical Association, every major organization in the humanities recommends adhering to the TEI Guidelines when creating electronic archives. She continues:

The TEI community is at a turning point. The value of the TEI as a standard is increasingly being recognized in the academic community and the cultural heritage sector. Yet, there are still major impediments to the TEI being more widely adopted as a standard, to the texts already in existence being reused in environments beyond which the original creators intended, and in the development of specialized tools for specific purposes. In order to promote the first two points, it is more critical than ever that our texts be amenable to semantic interchange: that they are prepared in such a way that they can be more widely shared between and amongst projects.

Encoding texts properly will make “semantic interchange” possible, but creating archives that are preeminently sharable requires more work than that.

Two Mellon-funded initiatives have been launched to combat the “silo” problem: the Bamboo Project (http://projectbamboo.uchicago.edu) and SEASR (http://www.seasr.org). Bamboo conducts a series of international seminars devoted to “advanc[ing] arts and humanities research through the development of shared technology services”; it fosters collaborations between institutions and also works to break down barriers among libraries, IT services, and scholars within single institutions. SEASR, the Software Environment for the Advancement of Scholarly Research, fosters collaboration by empowering scholars to share data and research in virtual work environments, easing scholars’ access to digital research materials now stored in a variety of incompatible formats. Collaboration, sharing, and interoperability are the key mandates for future development in Digital Humanities, and it is this mandate that CHAT, our Collaboratory for Humanities, Art, and Technology, is designed to fulfill.

What CHAT Will Do:

CHAT will operate under the auspices of the Armstrong Interactive Media Studies Program, introducing AIMS Faculty to the fields of visualization and digital humanities and developing courses for that program.

A. Research

This collaboratory will encode a digital archive of popular poetry written in Britain and America between 1700 and 1900 and then experiment with it: we will visualize a huge number of poems along various dimensions determined through encoding as well as through data-mining that is both directed by and productive of ontological categories. The visualizations will be created according to aesthetic criteria as well as principles for the effective visualization of data (see Bibliography A). We will develop software that allows humanists who are not computer experts to perform the encoding, and we will solicit encoded texts from scholars across the humanities disciplines.

Ira Greenberg and Laura Mandell have presented several types of poetry visualization tools at two humanities conferences: the Modern Language Association convention in Chicago, IL (December 2007) and the Benjamin Haydon (art and literary history) conference (November 2008). The audiences have helped us to generate research questions, not only about poetry – its metrical, visual, auditory, and intellectual effects – but also about the process of encoding. CHAT’s first project will be to take one famous poem, John Keats’s “Ode on a Grecian Urn,” and commission 50 scholars to encode it. We will then be able to visualize the subjective decision-making involved in choosing how to code the poem, and thus to produce research about what happens to humanities documents as they are processed “logically.” We will commission a larger number of scholars to encode different poems, creating a data set that will allow us to ask innumerable research questions about poetic form and function during the height of poetry’s popularity in Britain and America. In addition to publishing the first of the major new books about Processing (see above), Greenberg has a second book on the language due this summer and is a member of the Processing language development team. His digital art calls upon data of various kinds from multiple disciplines: he creates generative software art, such as his Protobyte series, works that are both original aesthetic creations and visualization agents, capable of giving original and descriptive form to any data system.
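
One way to visualize that divergence, sketched here in Processing under assumed file names (ode01.xml through ode50.xml – placeholders, not an existing data set): tally which elements each of the 50 encoders used, so that short bars expose the contested coding decisions:

    // Hypothetical sketch: given 50 encodings of the same poem,
    // chart how many encoders used each element.
    import java.util.HashMap;
    import java.util.HashSet;

    HashMap<String, Integer> usage = new HashMap<String, Integer>();

    void setup() {
      size(600, 300);
      background(255);
      for (int i = 1; i <= 50; i++) {
        HashSet<String> tags = new HashSet<String>();
        collectTags(loadXML("ode" + nf(i, 2) + ".xml"), tags);
        for (String t : tags) {
          Integer n = usage.get(t);
          usage.put(t, n == null ? 1 : n + 1);
        }
      }
      // One bar per element: a full-height bar means all 50 encoders
      // used that element; short bars mark contested decisions.
      int x = 10;
      fill(120);
      noStroke();
      for (String t : usage.keySet()) {
        float h = map(usage.get(t), 0, 50, 0, height - 20);
        rect(x, height - 10 - h, 12, h);
        println(t + ": used by " + usage.get(t) + " of 50 encoders");
        x += 16;
      }
    }

    // Record the name of every element in this subtree.
    void collectTags(XML node, HashSet<String> tags) {
      String name = node.getName();
      if (name != null && !name.startsWith("#")) tags.add(name);
      for (XML child : node.getChildren()) collectTags(child, tags);
    }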

The work Greenberg does in this collaboratory will constitute research in the fine arts that simultaneously enables research in Digital Humanities and in other disciplines, the Humanities only a subset among them. In Greenberg’s work, any data source can serve as input, and the Protobyte system can evolve using that source as its initial rules and conditions. For example, the initial parameters might be a list of chemical elements (He, O, N, C, etc.), a painting, a poem, network activity, or a real-time performance. The system would parse the data (a middleware translation problem to be solved in conjunction with our collaborators in computer science and digital humanities) and then run autonomously, with runtime opportunities for inputting additional data and/or filtering subsets of the data. Ideally, the output of Greenberg’s Protobyte Visualization System would be useful from a data-modeling and analysis standpoint and also valuable as a work of art. The project will thus expand both the field of visualization and the digital arts, solidly affixing form to function at a very high level rather than in the more conceptually facile way common in digital art and visualization. As the Protobyte project suggests, we believe that true research in the digital version of any of our disciplines – art, technology, or humanities – will eventuate in simultaneous breakthroughs in the others. We ask for the space necessary to collaborate because day-to-day interactive collaboration among producers of data sets – the humanists and scientists who, we hope, will walk into our lab looking for solutions – computer scientists, and digital artists can produce new research.
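
The pattern can be suggested in miniature. The following toy Processing sketch is emphatically not the Protobyte system itself, only an illustration of the principle: input data (here, the atomic numbers of He, O, N, and C) seeds the initial conditions, after which the system draws autonomously:

    // Toy illustration only – not Greenberg's Protobyte code.
    // Data seeds the initial conditions; the system then evolves
    // autonomously, drawing a form shaped by its seed.
    float[] seed = { 2, 8, 7, 6 };  // atomic numbers of He, O, N, C
    float x, y, angle, step;

    void setup() {
      size(500, 500);
      background(255);
      x = width / 2;  // the data fixes the starting state
      y = height / 2;
      angle = seed[0];
      step = seed[1] / 4.0;
    }

    void draw() {
      float px = x, py = y;
      // Autonomous runtime: turning behavior stays shaped by the data.
      angle += (noise(x * 0.01, y * 0.01) - 0.5) * seed[2] * 0.2;
      x += cos(angle) * step;
      y += sin(angle) * step;
      stroke(0, 60);
      line(px, py, x, y);
      if (x < 0 || x > width || y < 0 || y > height) {
        x = width / 2;  // re-enter from the center if it escapes
        y = height / 2;
      }
    }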

The three collaborators and their students already have a history of working together successfully. Jerry Gannod, his student Holly Connor, and Laura Mandell have developed a set of macros for Microsoft Word that allows non-experts to encode texts as easily as they would type them. The tool is already in use by a team at Nottingham University in the UK, and we have issued a preliminary announcement of its release on the NINES site. Jerry, Holly, and Laura hope to demonstrate it at the 2009 Digital Humanities Conference at the University of Maryland, College Park. This tool will help us generate our data set.

Ira Greenberg has created numerous versions of the poetry visualization tool (http://miamichat.wordpress.com). He intends to create more and to enlist other Processing programmers and digital artists to create their own versions. Jerry Gannod will investigate systems for categorizing poetic elements, as well as machine-generated cruxes in the data, in order to create ontologies specific to poetry. Those ontologies, together with the rich metadata associated with poetic texts, will allow us to create visualizations based on objective criteria. As Laura and Jerry also create information-visualization charts, graphs, and schemes, we will be able to compare the numerous artistic renditions of poetry and understand more about how data-rich environments can be grasped through artistic means.

B. Service to the field of Digital Humanities

Working together, we will experiment with various ways of encoding the texts that go in and various ways of visualizing and categorizing what comes out. Not only will we create visualization tools open to and available for public use, we will also determine which kinds of codes work best, in the process of using them for computer-science research and visualizing them in multitudinous ways. From this collaboration we will create Miami’s guidelines for best coding practices. That is, the TEI Guidelines offer a huge number of coding elements, and an even greater number of possible coding techniques: we will reduce that immense set of possibilities to the minimal requirements for creating interoperable – SHARABLE – documents.

While Bamboo focuses on reworking institutional structures and SEASR on sharing software and tools, Miami CHAT will focus on creating sharable documents. Scholars are loath to follow time-consuming coding standards that may disappear in a year or two. The TEI Consortium has created a standard that insures that each individual scholarly digital archive can be upgraded as medial environments evolve – as long as the scholar or institution supporting that archive goes to the expense needed to push it into new formats. We will instead develop the minimum coding necessities for future sharability, so that ANYONE could pick up the texts from a scholarly archive, no matter who originally developed it, and ingest them into a massive-text research environment.
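
The ingestion idea might look like the following hypothetical Processing sketch, in which the required elements (title, author, date) merely stand in for whatever minimum CHAT’s guidelines would actually specify, and the file names are placeholders:

    // Hypothetical minimal-requirements check before ingestion:
    // verify that a text meets the minimum coding necessities
    // before pulling it into a massive-text research environment.
    String[] required = { "title", "author", "date" };
    String[] files = { "archiveA/poem1.xml", "archiveB/poem2.xml" };

    void setup() {
      for (String f : files) {
        XML doc = loadXML(f);
        boolean ok = true;
        for (String tag : required) {
          if (!hasElement(doc, tag)) {
            println(f + " lacks <" + tag + ">: cannot ingest");
            ok = false;
          }
        }
        if (ok) println(f + " meets the minimal requirements");
      }
    }

    // Depth-first search for an element with the given name.
    boolean hasElement(XML node, String name) {
      if (name.equals(node.getName())) return true;
      for (XML child : node.getChildren()) {
        if (hasElement(child, name)) return true;
      }
      return false;
    }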

Finally, CHAT is already forging connections with other digital humanists, such as Lev Manovich, director of the Software Studies Initiative at UC San Diego (http://lab.softwarestudies.com/): the lab has received NEH funding for macroanalysis of huge datasets and would like to work with us on developing tools for microanalysis, once visualizations have determined which specific zones of information might be most interesting.

C. Service to Miami University

In addition to furthering the research agendas of the three major participants and the graduate students who work with them, CHAT will offer space and technical expertise to faculty who apply for it. We will issue calls for proposals at the beginning of each semester for work to begin the following semester. The selection process will be rigorous and demanding: no “service” project that could be implemented through IT Services will be accepted, only projects eventuating in both original research and software development.

CHAT wishes to develop a “Digital Humanities” track for the new undergraduate Software Engineering major and for Ph.D. candidates in English; the track will involve one course each taught by Jerry Gannod, Laura Mandell, and Ira Greenberg. We hope to develop it further as a possible area of focus for the recently launched M.S. in Computational Science and Engineering and, ultimately, like King’s College London, to offer a Ph.D. in Digital Humanities that will help create the Humanities faculty of the future. CHAT will also bring scholars from around the world to work on specific problems in the Digital Humanities. Ira Greenberg’s Processing summit is a prime example of how this will work; other such summits could address text delivery via mobile devices, the development of semantic web applications, or transformations of genre, to name just a few possibilities.

CHAT will also be engaged in major grant-writing endeavors. Laura Mandell and Jerry Gannod have submitted an NEH Digital Initiatives grant proposal; our next proposal will be to the NEH Digital Institute program, to host the next summit sponsored by CHAT. We will also apply for an NEH Preservation and Access grant this summer and will seek NSF funding for software engineering research.

To insure the optimal functioning of CHAT, Laura Mandell will conduct the research recommended by the CLIR report into the effective models of collaboration developed by the sciences (see Bibliography C).

Next Steps and Needs:

Laura Mandell, Jerry Gannod, and Ira Greenberg are meeting once a week during the Spring 2009 semester for a “CHAT Seminar” in which we read and discuss the newest research in information visualization (Bibliography A) and data mining (Bibliography B). Several undergraduate and graduate students planning to work with us next year will attend some of these meetings. Next fall, we will open the CHAT collaboratory, consisting of space plus four iMac workstations. We would like two RAs for the collaboratory, one allotted to Laura Mandell for 2009-2010 and one to Jerry Gannod. Other undergraduate and graduate students working in the collaboratory will take independent study courses with us. Laura Mandell has been given a full year of course release by the College of Arts and Science.

Conclusion:

In the medieval university, a “professor” stood at a podium and read aloud from a manuscript book, and the students copied the books – or, if wealthy enough, paid scribes to copy them. Out of manuscript transmission and the print production of texts in codex (book) form, the modern Humanities disciplines emerged, and the systems of interpretation specific to philosophy, history, literature, and anthropology are one and all codex-based. Those systems are going to change. How will we come to know the new ones? First, by copying texts into this representation- or modeling-machine, the computer; next, by allowing the machine to read them; and finally, by determining where machine reading breaks down – precisely where human creativity and subjectivity must come into play for the sake of understanding and preserving our cultural past. CHAT will contribute to the development of this digital humanities universe.
