Cohort Projects

✌️ Cohort 2 (1/2020 - 4/2022)✌️

…And 25 of our closest friends: The Louisiana Digital Library as Community-Focused Data

Louisiana State University

Sophia Ziegler, Gina Costello, Leah Powell, Elizabeth Joan Kelly

The Louisiana Digital Library (LDL) is a state-wide resource for sharing digital heritage content from public libraries, academic libraries, museums, and archives. Our project enables librarians, archivists, and curators from across Louisiana to gather as a community of practice and explore the policy, practice, and ethics of reconceptualizing the LDL as data. By producing a series of sample collections as data, this project will foster community around a state-wide goal of building computationally meaningful collections that are ethically grounded and culturally relevant.

Using Newspapers as Data for Collaborative Pedagogy: A Multidisciplinary Interrogation of the Borderlands in Undergraduate Classrooms

University of Arizona

Mary Feeney, Sarah Shreeves, Anita Huizar-Hernández

Using Newspapers as Data for Collaborative Pedagogy: A Multidisciplinary Interrogation of the Borderlands in Undergraduate Classrooms explores how historical newspapers packaged as a single “collection as data” can act as a point of convergence for collaborative pedagogy in the undergraduate classroom. The dataset will include selections from the University of Arizona Libraries’ Historic Mexican and Mexican American Press digital collection and Arizona newspapers digitized for the National Digital Newspaper Program, including Spanish-language newspapers, newspapers of African American communities, and newspapers from predominantly white English-speaking communities, all located within the Southwest and published during two periods: 1915 to 1922 and 1941 to 1959. Faculty members participating in the project will use newspapers as data to explore topics in their courses in History, Journalism, English, and Spanish and Portuguese during the Fall 2020 semester. The project will culminate in an undergraduate symposium and a white paper with recommendations based on lessons learned.

Images as Data: Processing, Exploration, and Discovery at Scale

Harvard University and University of Richmond

Carol Chiodo, Lidia Uziel, Lauren Tilton, Taylor Arnold

Images as Data will increase the means of access and discovery of born-digital collections of photography and moving images. The project will draw from three born-digital collections of European ephemera from Harvard University Library and two digitized collections from the Harvard Art Museums. These contemporary materials provide unique testimony on political unrest and the culture of protest in Europe, in the context of rising nationalism, anti-immigration movements, globalism and international migration. Through the implementation of computer vision techniques, including the distant viewing framework, the project will provide a model for expanding the processing of digital images and subsequent algorithmic discovery of connections across collections. It will also illustrate how distant viewing can offer a paradigm for addressing the social and ethical challenges of using machine learning with images, particularly of sensitive topics.

LGBTQ+ Audio Archive Mining Project

University of Wisconsin Milwaukee

Ann Hanlon, Dan Siercks, Marcy Bidney, Cary Costello

The UWM Libraries house one of the largest collections of historical and contemporary LGBTQ+ materials in Wisconsin and the Midwest, including a rich record of Milwaukee’s LGBTQ+ communities. The LGBTQ+ Audio Archive Mining Project will use machine learning tools, data analysis, and visualization to build and process text datasets extracted from a variety of AV materials in these collections, including oral histories, local television news and radio broadcasts, and early LGBTQ+ community cable programming. The project will aid in better understanding the contents of these collections and enhance the discoverability of previously unrecognized topics, relationships, and patterns that shed light on the history of the LGBTQ+ community in Milwaukee and the Midwest.

Surfacing hidden water data: Water, people, displacement in Southern California

Claremont Colleges

Jeanine Finn, Jessica Dávila Greene, Char Miller

With this grant, the Claremont Colleges Library will bring computational accessibility to the digitized materials in its wide-ranging California Water Documents collection. This collection includes over 13,000 digital files of mixed archival materials, including journals, ledgers, correspondence, field notes, and maps documenting the history of water use in Southern California in the late 19th and early 20th centuries. This rich collection has the potential to surface histories ranging from those of the earliest inhabitants of the West (including the Cahuilla and Paiute peoples), who utilized local waterways, to those of the people who exploited the First Peoples’ knowledge and labor to build the region’s modern agricultural and urban economy. The project team will collaborate with community partners to attach appropriate Indigenous place names, include coordinates for geographic materials, and make PDF and image files fully discoverable and computationally useful.

dLOC as Data: A Thematic Approach to Caribbean Newspapers

Florida International University

Miguel Asencio, Jamie Rogers, Perry Collins, Hadassah St. Hubert

The Digital Library of the Caribbean (dLOC) intends to enhance access to its existing Caribbean newspaper collections by making texts available to its users for bulk download. This will facilitate modes of scholarship that depend on access to image and textual data at scale and will enable a new level of access to titles not included in newspaper data resources such as Chronicling America. To meet the needs of the dLOC community for teaching and research, we will demonstrate the potential of newspaper data by creating a pilot thematic toolkit focused on hurricanes and tropical cyclones. The toolkit will provide multilingual datasets focused on these disasters from several countries and islands in the Caribbean, such as the Bahamas, Belize, Cuba, the Dominican Republic, Grenada, Haiti, Jamaica, and Martinique.

☝️ Cohort 1 (1/2019 - 4/2020)☝️

Collections as Data: Redefining Creators, Users, and Stewards of the Charles “Teenie” Harris Photographic Archival Collection

Carnegie Museum of Art

Dominique Luster, Charlene Foggie-Barnett, Ed Motznik, Samantha Ticknor

This project seeks to build upon the rich history of the Teenie Harris archival collection and develop new opportunities for computer-generated creation and computational manipulation of collection metadata that is both produced and used by the African American community and the Carnegie Museum of Art. This project aims to develop and document the service and use capabilities and limitations of machine learning, text parsing, and computer vision technologies to make meaningful contributions to archival metadata. The public-facing deliverables will combine the notion of creators and users of the Harris data and will result in a suite of web-based in-gallery interactives that let visitors engage with and contribute to the collection as data.

On the Books: Jim Crow and Algorithms of Resistance

University of North Carolina Chapel Hill

Nathan Kelber, María R. Estorino, Amanda Henley, Matt Jansen, Lorin Bruckner, Sarah Carrier, William Sturkey

On the Books: Jim Crow and Algorithms of Resistance will create the most complete list of North Carolina Jim/Jane Crow laws (1877-1965) since Civil Rights pioneer Pauli Murray’s States’ Laws on Race and Color (1951) by using machine learning to analyze a corpus of more than one hundred years of North Carolina public, private, and local session laws from the end of the Civil War through the Civil Rights Movement (1865-1968). The project is inspired by current conversations about algorithmic bias, big data, and race in the work of authors such as Safiya Noble, Virginia Eubanks, and Cathy O’Neil. The results of this research will be shared in a plain-text corpus, a website for educators and researchers, a white paper, a code repository, a methods workshop at a Triangle Digital Humanities Institute, a Carolina K-12 teacher workshop, and a future library conference.

The Native American Educational Services College Digital Library Project

Northwestern University

Josh Honn, Kelly Wisecup, John Dorr, Dorene Wiese

The Native American Educational Services (NAES) College Digital Library Project is a metadata and meta-data project situated in Native American & Indigenous Studies and Library & Information Science that seeks to form a process and pedagogy around the data curation of at-risk community-based research collections, in our case those of an urban American Indian college open from 1974 to 2005. Working in collaboration with Native community organizations, Northwestern University librarians and faculty will curate data from digitized NAES College library catalog cards, create a digital humanities website presenting context for the data, and write a white paper examining issues of data sovereignty. The project seeks to circulate ethical, sustainable data curation processes for multiple fields and institutions, especially those collaborating with Native nations and organizations.

From Collection Records to Data Layers: A Critical Experiment in Collaborative Practice

University of Pittsburgh

Tyrica Terry Kapral, Aaron Brenner, Matthew J. Lavin

From Collection Records to Data Layers: A Critical Experiment in Collaborative Practice aims to develop effective strategies for enriching existing library-generated collections data through research-driven and critically interpretive layers of additional data that are conducive to computational use. The project team will collaborate with library partners and undergraduate scholars in the English Department to create enrichment data layers that extend the catalog data for a diverse array of materials held by the University of Pittsburgh Library System—specifically, serials (e.g., journals, magazines, newspapers, newsletters) and ephemera (e.g., broadsides, flyers, cartoons) that reflect the perspectives of African Americans, American Labor Unionists, American left-wing organizations, feminists, and the LGBTQ community. The project will yield actionable datasets and reproducible workflows for creating and sharing collections-based data layers, while teaching computationally minded data stewardship practices and fostering scholarly engagement with collections produced by marginalized and underrepresented groups.

Linking Lost Jazz Shrines Project

Weeksville Heritage

Obden Mondesir, Zakiya Collier, Cristina Pattuelli

The Linking Lost Jazz Shrines Project seeks to apply linked open data principles to our Weeksville Lost Jazz Shrines of Brooklyn Oral History collection. This collection was part of a 2008 research project that documented Central Brooklyn’s cultural legacy of jazz history between the 1930s and 1960s. By applying linked open data, we plan to make this collection and the connections it provides more discoverable to jazz researchers who would benefit from a significant collection about Central Brooklyn’s nearly lost jazz culture.

Uncovering Health History: Transcribing and Publishing Early Twentieth-Century Tuberculosis Patient Records as Data

University of Denver

Kim Pham, Kevin Clair, Jack Maness, Jeanne Abrams, Fernando Reyes, Jeff Rynhart, Alice Tarrant

This project will use handwritten text recognition (HTR) to create transcriptions of records from the Jewish Consumptives’ Relief Society, a tuberculosis sanatorium located in Denver from 1904 to 1954. These records have been unavailable through traditional OCR processes. They represent a valuable archive of primary source materials regarding the treatment history of tuberculosis in the early 20th century and the history of primarily Jewish and Eastern European immigration to Denver during the same time period. We intend to develop capacity in services and infrastructure to support and use HTR technologies in our regular workflow to produce collections as data and to contribute to the emerging HTR technology ecosystem.