Recap - Collections as Data: National Forum 2

On May 7-8, 2018, a group of librarians, archivists, researchers, and technologists gathered at the University of Nevada, Las Vegas for a second Always Already Computational: Collections as Data national forum. The goals of the forum were twofold: (1) to provide a public program showcasing the opportunities and questions that emerge in the creation, management, and use of collections as data; and (2) to draw on the collective expertise and wisdom of the invited participants to “reality check” the project’s direction and in-progress deliverables.

Public Program

The forum opened with eleven presentations by forum participants active in collections as data. Several of these projects have been published as Collections as Data Facets. The panels were livestreamed to a public audience. The panel line-up and prompts can be found here.


At the risk of oversimplifying, we want to share a brief summary of the proceedings, beginning with a taste of these exceptional presentations.

The first panel addressed the populations and people behind collections as data efforts, responding to the question: “Who are collections as data for?” Encouraging libraries to move towards designing collections as data to serve “anyone” rather than “everyone,” Shawn Averkamp detailed the design decisions that shape large-scale aggregation of collections and collection-data at the New York Public Library. Dot Porter recounted how OPENN transformed her role as a curator. Bergis Jules shared ideas coming out of the Documenting the Now project to work with and for the individuals implicated in social media data collections.

In the second panel, speakers delved into the potential of, and motivations for, collections as data. Micki Kaufman took us into a 3D representation of an archival corpus on Henry Kissinger, where it becomes possible to use color and depth perception to interpret and understand relationships among historic events on a timeline. Inna Kouper explored how we might gain insight into the coverage of history and culture within a mega corpus like the HathiTrust Digital Library. Greg Cram spoke about a multi-year New York Public Library project to digitize Copyright Office records, an effort that can inform the rights status of, and access possibilities for, other collections as data. Reflecting on Data Refuge and various humanities projects, Laurie Allen asserted that collections as data forces us to ask a different set of questions than we usually do, to consider different ways of focusing resources on collections, and to reconsider what we need from collection platforms.

The third panel tackled the “how” of collections as data, emphasizing stories of implementation. Meghan Ferriter shared a status update for LC Labs, describing how they are prototyping the creation of transformative experiences using the Library of Congress’s APIs with a diverse range of users, including internal staff. Mary Elings described how the Bancroft Library is making strides to create “research-ready” data. Helen Bailey relayed the lessons learned by MIT Libraries, including the critical work of maintaining a current understanding of user needs, as well as its plans to explore machine learning. Veronica Ikeshoji-Orlati discussed strategies at Vanderbilt for addressing sustainability, engaging audiences, and defining librarianship as data consultancy.

And that was all before lunch on the first day!

Reality Check

We were openly seeking input to strengthen and refine the project’s outcomes, and the group of experts assembled in Las Vegas generously engaged in the process.

After lunch, we all oriented ourselves to small group exercises designed to test the project’s personas and an early draft of the project’s summative work, a guide to “making” collections as data. While the feedback on our progress was very positive overall, the first exercise surfaced plenty of useful insight from the groups on how to make it better, just as we had hoped. For example, the faculty and researchers in the room reminded us of their priorities (research output/creativity and technical skill acquisition come first, before peer review acceptability and classroom use) and encouraged us to keep our work oriented around use cases and their associated research methodologies. We also received helpful suggestions on making it easier for readers to find relevant entry points to our material and then navigate across it.

In a second exercise on Day 1, we learned from the groups which actions they perceive to be high value for institutions seeking to facilitate collections as data work, and whether these actions are resource intensive. Rising to the top was the need to display explicit, unambiguous, clearly defined licenses for using and citing the data, as well as to provide documentation on the origins of the data and how it was processed. Interestingly, much of the discussion centered on policy work and rights review processes rather than on technological limitations, pointing to larger, still unaddressed access issues for digital library resources in general.

At the close of Day 1, it was clear that the group was stimulated and fully primed to dive deeper into collections as data.

Closing Out

Day 2 of the forum was devoted to thinking about future directions for collections as data. Like the close of a meeting, when everyone restates their assignments and “action items,” the group produced to-do lists for themselves and others through a series of three activities. These lists will be released at a later date.

The activities were designed to collect a wide range of thoughts on three questions:

What advice would you give to workers at an institution that is getting ready to start collections as data work to help them get some easy wins in the short term?

Within the next 1-6 months, what can you do within your own workplace to realistically and relatively easily improve / move forward collections as data work?

Within 5-10 years, what steps could be taken that will position institutions to deeply support collections as data work at scale and across the organization?

All three activities followed a common process. Forum participants were given the question and asked to think about it for five minutes. Next, participants paired up to share and refine their thoughts for ten minutes. Finally, each pair combined with another to continue refining their thoughts and add them to a Google Doc. The result is a really valuable and practical collection of ideas for information professionals in a wide variety of contexts.

For institutions that are just getting started with collections as data, the participants suggested building a network of people inside and outside the library or archive who can help plan and support collections as data work. They also suggested doing some research (perhaps starting with the collections as data Zotero library), conducting audits of local collections, and doing an environmental scan of peer institutions.

In the short term, forum participants said they wanted to focus on taking stock and spreading the word. Common suggestions were to look for low-hanging fruit, such as collections that can easily be OCR’d and those with very open licenses. To spread the word, forum participants suggested highlighting collections well suited to collections as data work, both online and through social media, and hosting workshops for colleagues as well as users.

Predictably, forum participants targeted more complex issues for the long term. Staffing was particularly on people’s minds as they considered both new kinds of positions and professional development for existing staff. They also recommended spending time nurturing relationships both with local users who are particularly interested in computationally intensive research and with regional and national partner institutions that are also working on collections as data.

One important theme that was present across all three activities was not so much an action item as it was a suggestion for a shift in mindset. One participant framed it well when they wrote about the need to “socialize collections as data as something that can be done and supported by units and staff across the library.” In other words, “de-silo it.”

To close out, we would like to express our thanks. Thank you to the forum participants: we appreciate the time you took to share your experience with the project and the broader community. Thank you to the Institute of Museum and Library Services for your support. Finally, thank you to the University of Nevada, Las Vegas Libraries for hosting the forum, with special thanks to Amy Gros-Louis, Lonnie Marshall, and Kee Choi for everything they did to make the forum a success.

Hannah Frost

Stewart Varner

Sarah Potvin