The Santa Barbara Statement on Collections as Data

Comments and suggestions on the statement are encouraged, and comments and discussion are welcome on the Collections as Data Google Group. The first statement development period will run through summer 2017. The next version of the statement will be shared by September 2017.


Collections as Data National Forum, 3.3.17

What are “collections as data”? Who are they for? Why are they needed? What values guide their development? The Santa Barbara Statement on Collections as Data poses these questions and suggests a set of principles for thinking through them. It is part of a community effort to empower cultural heritage institutions to think of collections as data, and consequently to explore what might be possible if cultural heritage seen in this light were more readily open to computation.

The concept of collections as data emerges at – and is grounded by – a particular moment in the recent history of cultural heritage institutions. For decades, cultural heritage institutions have been building digital collections. Simultaneously, researchers have drawn upon computational means to ask questions and look for patterns. This work goes under a wide variety of names, including but not limited to text mining, data visualization, mapping, image analysis, audio analysis, and network analysis. With notable exceptions like the HathiTrust Research Center, the National Library of the Netherlands Data Services & APIs, the Library of Congress’ Chronicling America, and the British Library, cultural heritage institutions have rarely built digital collections or designed access with the aim of supporting computational use. Thinking about collections as data signals an intention to change that, and efforts like the Library of Congress’ Collections as Data: Stewardship and Use Models to Enhance Access and the multinational Digging into Data suggest that a broader community shift, intentionally scoped to institutions large and small, comes at an opportune time.

While the specifics of how to develop and provide access to collections as data will vary, any digital material can potentially be made available as data that are amenable to computational use. Use and reuse are encouraged by openly licensed data in non-proprietary formats, made accessible via a range of access mechanisms designed to meet specific community needs.

Ethical concerns are integral to collections as data. Collections as data should make a commitment to openness. At the same time, care must be taken to comply with legal requirements, cultural norms, and the values of vulnerable groups. The scale of some collections may also obfuscate what is hidden or missing in the histories they are perceived to represent. Cultural heritage institutions must be mindful of these absences and plan to work against their repetition. Documentation should be informed by archival principles and emergent reproducibility practice to ensure that users have the information they need to work with collections responsibly.

Principles

  1. Collections as data development is a work in progress. Work-in-progress status can be seen as a virtue. Iteration implies productive friction across a range of perspectives geared toward encouraging computational use of collections, development of internal and external collaborations, and alignment between traditional and emerging services.

  2. Collections as data development aims to encourage computational use of digitized and born-digital collections. By conceiving of, packaging, and making collections available as data, cultural heritage institutions work to expand the set of possible opportunities for engaging with collections.

  3. Ethical commitments guide collections as data development. Ethical commitments are made in light of historic and contemporary inequities represented in collection scope, description, and access. Commitments should be documented and readily accessible to those engaging with collections. Commitments should serve to respect the rights and needs of the communities who create collections as well as the communities that use those collections.

  4. Collections as data stewards aim to lower barriers to use. A range of accessible instructional materials and documentation should be developed to support collections as data use. These materials should be scoped to varying levels of technical expertise. Materials should also be scoped to a range of disciplinary, professional, creative, artistic, and educational contexts.

  5. The needs of specific communities inform collections as data development. Concrete strategies should be pursued to engage community need. Multiple approaches to data development and access are encouraged.

  6. Shared collections as data documentation helps others find a path to doing the work. For a range of individuals and institutions to engage in collections as data work, it must be possible to access documentation that demonstrates how the work is done. Documentation should be publicly accessible by default. Draft documentation is better than no documentation. Examples of documentation include workflows and code.

  7. Whenever possible, collections as data should be made openly accessible. Terms of use should align with efforts like Creative Commons, RightsStatements.org, and Traditional Knowledge licenses where appropriate.

  8. Collections as data development works toward interoperability. Working toward interoperability entails alignment with emerging and/or established community standards and infrastructure. Working toward interoperability eases integration with centralized as well as distributed infrastructure. Interoperability facilitates collections as data discovery, access, and use.

  9. Collections as data stewards work to support the integrity of collections. Claims based on collections as data depend on their integrity. Integrity is safeguarded by fault-tolerant systems and data provenance. Provenance reflects how data were created and modified, as well as the scope and intended use of the data.

  10. Collections as data may encompass or be derived from collections. Data as well as the data that describe those data are considered within scope (e.g., images, audio, and video, as well as metadata, finding aids, and catalogs). Data resulting from the analysis of those data are also included.


Code of Conduct

All project activity, both in person and online, aims to foster a welcoming and inclusive experience for everyone, regardless of gender, gender identity and expression, sexual orientation, disability, physical appearance, body size, race, age, religion, nationality, or political beliefs. Harassment of participants will not be tolerated in any form. Harassment includes any behavior that participants find intimidating, hostile or offensive. Participants asked to stop any harassing behavior are expected to comply immediately. Please contact any member of the project team if you have concerns.