From catalog to semantic web, problems and perspectives
[During the 2021/2022 academic year, the Gramsci center for the humanities took part in an academic project of the University of the Republic of San Marino entitled The San Marino Political Archives: census, digitization, fruition.
The final outcome of this work will be the publication of a Guide to the recorded archives of personalities and political parties, enriched with more detailed theoretical insights, which will serve as a first indispensable tool for accessing the holdings that exist on the territory of the Republic. Alongside this, as Gramsci center, we have tried to formulate a proposal for the digitization of the identified material, one that would provide for its use and wider promotion.
From our perspective, however, the occasion was also valuable for setting up an initial general study of the different possibilities and forms of transferring, rendering and using archival records in the digital environment. Attention to how the digital revolution, in its various forms, is changing the ways in which culture and knowledge are produced, processed, and consumed has in fact been one of the guidelines of our center's work since its foundation.
The first fruit of this reflection will be four releases, curated by Andreas Iacarella, which attempt to deepen dialogue and reflection on a theme that appears more urgent than ever. We are certain that the technical and scientific nature of the discourse cannot be evaded, and that the debate will therefore have to become increasingly interdisciplinary, involving all the relevant actors: professionals in communication, knowledge design, information technology, specialists in cultural heritage (archivists, librarians, curators, etc.), as well as historians, art historians and all those categories of scholars who represent the traditional users of archives. The different fields of knowledge will have not only to talk to each other, but to intertwine and hybridize, certainly losing something of their specificities, to the advantage, however, of a more advanced design of archival digital environments, which can become a vanguard in the construction of the common memory of the communities of reference. With this in mind, what we want to offer is but a small critical contribution to the discussion, presenting some general problems and issues in the hope that this may provoke further reflection.
Summary of the episodes:
1. Cultural heritage in the digital environment: from catalog to semantic web, issues and perspectives
2. Archives in the web: strategies and models
a. From online inventory to digital library: comparing experiences
b. Building a narrative: the paths of history in the web
3. Political archives on the web
a. Italian political archives and the challenge of the internet: a bumpy ride
b. Letters and commercials: two successful cases of valorization
c. From Archivi del Novecento to 9centRo: network experiences compared
4. The Europeana case and other international experiences of memory sharing]
In January 2023, version 1.1 of the National Plan for the Digitization of Cultural Heritage 2022/2023 was presented by the Central Institute for the Digitization of Cultural Heritage – Digital Library, which is attached to the Italian Ministry of Culture. This is the first update of a five-year plan (2022-2026) that has the ambition of directing and guiding the process of digital transformation of the heritage belonging to the various institutes and places of culture in Italy.
This is an absolute novelty on the national scene: the Central Institute for Digitization itself was born just a few years ago, in 2020. It is therefore worth starting from this project to focus on some of the recurring problems in the processes of digitization of cultural heritage.
Planning for digital
The computerization of public administration in Italy officially began in the 1990s. In a short period of time, this resulted in the development of online catalogs and portals for cultural heritage, which have become more and more substantial. According to the data offered by the National Digitization Plan, counting only the information systems managed by central institutes of the Ministry, we are currently speaking of “more than 37 million catalographic descriptions to which about 26 million images are associated; this information patrimony has been consulted by more than 100 million unique visitors in the last five years”. If we add to this what is managed at the territorial, regional or private level and not fed into national databases, we can get an idea of the extent of the context we are referring to.
What immediately emerges in this overview is what risks becoming a limitation: namely, the extreme articulation of both sectoral (archives, libraries, museums, archaeological parks, etc.) and territorial content. A granularity that on the one hand can “more easily guarantee scientific depth,” dealing with “circumscribed and disciplinarily homogeneous contexts”, but on the other shows some serious weaknesses. Four in particular are identified in the Plan, against which intervention strategies are proposed to be directed:
– “poor sustainability over time due to rapid obsolescence of data, applications and infrastructure not thought of in terms of networks of interconnected entities;
– inability to develop advanced digital services that are based on effective exchange and interrelation of data;
– limited sharing of results (…), resulting in increased costs due to the multiplication of technological and methodological tools in use;
– poor ability to follow users in the different forms of enjoyment of cultural heritage, which precede and follow the visit at the place of culture”.
This example, it seems to us, raises quite a few issues, on which we will try to offer some insights in the course of the episodes. Three in particular appear central: the paradigm of interoperability, with what it implies in both technical and scientific terms; the need for a broader interdisciplinary debate on digitization, involving all the relevant actors, both professions and institutions; and the deepening of a perspective that holds together the disciplinary rigor that comes from established traditions with the emergence of new communicative needs and of a user base that is becoming, in the digital environment, more varied than ever. Our focus will, as mentioned, be on archives, but some preliminary reflections seem useful.
Forms of the Web
«Like all media revolutions, the first wave of the digital revolution looked backward as it moved forward. Just as early codices mirrored oratorical practices, print initially mirrored the practices of high medieval manuscript culture, and film mirrored the techniques of theater, the digital first wave replicated the world of scholarly communications that print gradually codified over the course of five centuries: a world where textuality was primary and visuality and sound were secondary (and subordinated to text), even as it vastly accelerated the search and retrieval of documents, enhanced access, and altered mental habits. Now it must shape a future in which the medium-specific features of digital technologies become its core and in which print is absorbed into new hybrid modes of communication».
These remarks, by some pioneers of the Digital Humanities (DH), seem particularly apt to introduce the discourse. From their perspective as researchers, the limitation that the authors identified in the digital revolution was the use that had been made of the new tools: relying on the enormous simulation capacity of the computer, its users had been content to imitate the known, without fully exploiting the potential of the new context.
This potential has developed and deepened over time, as Francesca Tomasi has pointed out: “from a typically document-centric approach of Web 1.0”, we have passed through the intermediate phase of Web 2.0, which introduced the concept of “democratization of knowledge, focusing on the idea of a collaborative and participatory environment” (crowdsourcing). Then we come to the so-called Web 3.0, in which the focus has shifted “to the atomic datum, the resource qualified through a uniquely identified fragment, and to typified, i.e., semantic, relations”. Although simplistic, these labels are useful for gaining an overview of the growing potential of the digital space. In the Semantic Web, the theoretical starting assumption is that data should be studied “with an awareness of its multiple relations,” elaborating “conceptual models, that is, abstract approaches of data observation, in the form of ontologies”. We are within the Linked Open Data (LOD) paradigm, “which allows information to be fragmented down to its smallest terms and reassembled according to different logics,” in an open environment in which there is the “possibility of creating a network of non-hierarchical connections that is potentially infinitely expandable”; the underlying architecture, which accommodates the data, is that of the graph, to which we will return briefly below.
It is immediately clear from these brief hints how the issue cannot be reduced to the technical level. The term “digitization,” which too often in public discourse becomes synonymous with the simple acquisition in digital form of huge masses of analog materials, should be explored in its real depth. The digitization of cultural heritage is, always, a process that involves “a radical transformation of material form and so takes place in an economy of loss and gain”: digital documents, even when they are reproductions of an analog object, are not simply copies, but complex and layered structures. “[E]ach type of digital medium is a digital medium in a specific way,” writes Niels Brügger, “each has its digitality.” And it is this different “digitality” that needs to be analyzed.
As Stefano Vitali pointed out in urging the importance of a critique of digital sources, with digital documents many established categories seem to falter. First of all, it is the notion of text itself, as a “closed and perfect entity, with characters of coherence and cohesion”, that breaks down: textuality takes on, in the network, a decidedly more fluid character. This leads to a whole series of problems when we want to determine the authenticity or reliability of a document, or when we raise the question of the verifiability of sources. Digital documents are fluid, are fragile, and “do not exist as physical entities distinct from the technology and process that makes them intelligible”; that is, “I think I am seeing an object, but I am actually witnessing a performance”.
But the issues are clearly also of a more general nature: offering cultural heritage, and archival heritage in particular, in a networked context opens it up to an entirely new public dimension. “With the advent of the Internet (…) the potential and, sometimes, the actual public of archives” have taken on “mass proportions,” becoming “typologically less and less characterizable and animated by a spectrum of motivations, personal or professional, as varied as ever.”
In the dimension of crowdsourcing that has characterized the Web since the so-called Web 2.0, that is, of free interaction between creator and user and of the exchange and creation of collective content, the distinction between specialists and laymen seems to be disappearing, in favor of the emergence of a kind of “collective intelligence”.
These aspects have encouraged the school of thought that considers “digitization as a medium capable of realizing in itself a wide, capillary and tendentially universal dissemination of knowledge,” and that has its roots in the ideological approach that interprets the Net as a tool for “human liberation and democratization of access to information and knowledge”. But the risks of such a vision are obvious.
On the one hand, as many scholars have pointed out regarding online sources of the past, the risk is that users prefer to “do it themselves, telling ‘their’ story”. “Everyone a Historian,” as Roy Rosenzweig wrote. But the disappearance in the network of the intermediation of the historian, as of the other professional figures who allow a critical and conscious access to the documents of the past, clearly risks feeding a discourse with a strongly narcissistic character: the “history and memory that the network transmits, narrated and interpreted in part by anyone, allow the uncritical and decontextualized reproduction of individual and community memory, that is, of the ‘blind’ horizon of each”. An “abstract localism” that is “incapable of reading the complexity of historical processes,” and that can even feed alternative narratives and “collective memories to ‘official’ history and dust off – or invent out of whole cloth – new ‘national legends’”.
On the other hand, we cannot deny the great fascination exerted by the prospect of a widespread historical memory that can really build itself as a collective product, through platforms and portals designed ad hoc; but in this perspective, the role of memory professionals (historians, archivists, etc.) “became even more important in order to filter, organize, interpret, and resume an intermediary role with respect to this new activity of the general public”. Commitment must therefore be directed, through the broadest possible dialogue between the various professions, to the construction of tools that are able to take advantage of the distinctly communicative, hyperdemocratic aspect of the Web, developing new hermeneutics and interweaving them with those already established. That is, the question becomes, “what kind of knowledge do we construct with digital tools and how do we construct it?”.
For the average user, it is search engines that enable access to information on the Web. The logic underlying the search engine is, evidently, quite different from the classificatory and bibliographical logic – proper to a cultural and ideological project – that has accompanied humanistic research for centuries: the “mechanisms of operation of Google, not only do not rely at all on classifications, sorting, prior arrangements of ‘documents,’ but, in a sense, prevent precisely that from happening”; moreover, since the search is entrusted to software, the “code has become even more ‘secret’ than it was in traditional archives, and the user’s possibility of exercising control over the search results,” over their reliability, “over the route taken to get there is almost nil”.
Faced with the risk posed by this unknown continent that is the digital, dealing with “a country with traditions foreign to our own”, cultural institutions initially reacted by leveraging that imitative character of the web we mentioned, limiting themselves to “constructing mere containers, non-dynamic digital copies of the paper catalog, lists of objects and reproductions accompanied by misplaced or incomprehensible captions”.
This landscape has inevitably become more complex over time. The Semantic Web has introduced those characteristics we already mentioned, of extreme fragmentation of information and at the same time of enormous possibilities to aggregate it according to different logics and paths, thus prompting a reflection on how to interweave the tools for the construction of meaning typical of the Web (metadata, paradata, ontologies, etc.) with the descriptive traditions of cultural heritage.
An ontology, in particular, can be described as a “formal and shared representation of the concepts and mutual relationships that characterize a certain knowledge domain”. That is, it defines the concepts that are representative of a given knowledge domain, and the relationships that connect them to one another; it creates “networks whose nodes are capable of describing themselves and showing themselves to be logically completable by software agents navigating them.” The mode of knowledge representation that becomes dominant in this context is that of the graph, “a set of nodes connected by relationships,” a “multidimensional and relational information architecture”, different from a hierarchical and essentially closed form.
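Purely by way of illustration, the idea of an ontology laid over a graph can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a description of any real system: all the identifiers (letter:42, person:7, and so on), the relation names and the class names are hypothetical, and real Linked Open Data projects would rely on established standards and vocabularies rather than on plain tuples.

```python
# All identifiers below (letter:42, person:7, ...) are hypothetical,
# invented for illustration; they do not come from any real catalog.

# The graph: a flat list of (subject, relation, object) statements.
TRIPLES = [
    ("letter:42", "type", "Letter"),
    ("letter:42", "writtenBy", "person:7"),
    ("letter:42", "heldBy", "archive:1"),
    ("person:7", "type", "Person"),
    ("person:7", "memberOf", "party:3"),
    ("party:3", "type", "PoliticalParty"),
    ("archive:1", "type", "Archive"),
]

# A toy "ontology": for each relation, the class of thing it starts
# from and the class of thing it points to.
ONTOLOGY = {
    "writtenBy": ("Letter", "Person"),
    "heldBy": ("Letter", "Archive"),
    "memberOf": ("Person", "PoliticalParty"),
}


def class_of(node):
    """Return the declared class of a node, or None if it has none."""
    for subject, relation, obj in TRIPLES:
        if subject == node and relation == "type":
            return obj
    return None


def conforms_to_ontology():
    """Check that every relation links the classes the ontology allows."""
    for subject, relation, obj in TRIPLES:
        if relation == "type":
            continue
        if relation not in ONTOLOGY:
            return False
        expected_from, expected_to = ONTOLOGY[relation]
        if class_of(subject) != expected_from or class_of(obj) != expected_to:
            return False
    return True


def neighbours(node):
    """Nodes one relation away from `node`, in either direction."""
    found = set()
    for subject, relation, obj in TRIPLES:
        if relation == "type":
            continue
        if subject == node:
            found.add(obj)
        if obj == node:
            found.add(subject)
    return found


def reachable_from(start):
    """Every node that can be reached from `start` by following relations."""
    seen = {start}
    pending = [start]
    while pending:
        current = pending.pop()
        for other in neighbours(current):
            if other not in seen:
                seen.add(other)
                pending.append(other)
    return seen
```

Calling `reachable_from("party:3")` starts from the political party, not from the archive that holds the letter, and still arrives at the letter and at the archive. There is no root and no fixed path of descent, which is what distinguishes the graph from the hierarchical tree of a traditional inventory.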
An open system
We cannot go into the technical details of these structures here; what is relevant to our discussion is that with the graph an open system of knowledge organization is established. We have moved from a logic of closed systems to one of potentially infinitely open systems. In this context, the notion of interoperability plays a key role: the “data acquire knowledge value when they are interconnected with other data, when their interconnectedness produces deflagrating network effects”. That is, there is a need to ensure that data come “out of proprietary silos and are made searchable, accessible, intelligible and reusable”; “exposed on the web in non-proprietary formats, so that they can be re-used, re-purposed and re-mixed with other resources”.
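Without entering into technical detail, the principle of interoperability can nonetheless be made tangible with a small Python sketch. It is only an illustration under explicit assumptions: the two "silos", their identifiers and their relation names are hypothetical, and the JSON export merely stands in for the open, non-proprietary formats the quotations refer to.

```python
import json

# Two hypothetical "silos", invented for illustration. Each institution
# describes its own holdings, but both use the same identifier for the
# same person (person:7): that shared identifier is the assumption on
# which everything else rests.
ARCHIVE_SILO = {
    ("letter:42", "writtenBy", "person:7"),
    ("letter:42", "heldBy", "archive:1"),
}
LIBRARY_SILO = {
    ("book:9", "author", "person:7"),
    ("book:9", "heldBy", "library:2"),
}


def resources_linked_to(entity, triples):
    """Return every resource that points at `entity` in the given data."""
    return {subject for subject, _relation, obj in triples if obj == entity}


def export_open(triples):
    """Serialize the statements as JSON text, a non-proprietary format."""
    return json.dumps(sorted(triples), indent=2)


def import_open(text):
    """Read statements back from the JSON text produced by export_open."""
    return {tuple(item) for item in json.loads(text)}


# Once the data leave their silos, merging them is a plain set union.
MERGED = ARCHIVE_SILO | LIBRARY_SILO
```

Queried alone, the archive's data connect `person:7` only to the letter; queried after the union, the same question also returns the book held by the library. Neither institution recorded that connection: it emerges from the interconnection of the data, which is the sense in which data "acquire knowledge value" when linked.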
As far as cultural heritage is concerned, this affects the very nature of the descriptive traditions of the different disciplines, urging a greater effort of sharing and reconciliation: there are several MAB (Museums Archives Libraries) or GLAM (Galleries Libraries Archives Museums) projects in this sense, but the reflection is still open. As Tomasi points out, it would be up to cultural institutions “to take back that role as mediators of knowledge,” valuing the “description of cultural objects through the lens of interpretation, which, with the creation, selection and use of the ontological models most appropriate to research aims and objectives, can enrich cultural data, and thus information experience, through the integration of heterogeneous and transversal resources.”
The epistemological repercussions of these issues and of the linked data paradigm on the “world of cultural heritage” are enormous: what changes are the very “cognitive processes that have hitherto governed our relationship with the bibliographic universe and with the tools that have historically mediated the relationship between reader and knowledge” (catalogs, records, indexes, etc.). Hence the idea that a “worldview is possible only from an awareness that knowledge is a dynamic process of continuous composition and disarticulation of what we discover and know about the world”. This opens up numerous possibilities for the communication of cultural heritage:
«on the one hand (…) it must be an opportunity for the construction of a critical consciousness, and therefore it must promote an intelligent approach on the part of the user, encouraging the construction of relationships, suggesting research hypotheses and alternative perspectives, avoiding easy concessions to intellectual laziness; on the other hand, communication in the digital environment must make the most of semantic potential to simplify and extend access to knowledge, propose paths and make explicit – thanks to the computational power of computers and the adoption of appropriate data architectures – relationships that are invisible to the human eye».
As already mentioned, however, the considerable possibilities of empowerment, of building a cultural proposal or a shared narrative, are accompanied by equally pressing problems, such as disintermediation, that is, the possibility of accessing “directly from any part of the graph,” bypassing the “mediating function between sources and users” represented by descriptions. This in turn ties in with the issue of the loss of the “context” (descriptive, interpretive, physical, etc.) of the individual document. How to balance the drive for innovation and openness offered by the semantic web with the need for orderings and constructions of meaning with solid scientific foundations is precisely the challenge in which we are currently immersed.
It seems to us at this point that we have offered a fairly broad, albeit partial, overview of the issues that arise when talking about the digitization of cultural heritage, sufficient at least to realize in what sense the digital revolution has entailed in this field a change not of technology, but of knowledge design. As Michetti writes, the “digital world is not neutral,” so “if we want to address the issue [of cultural heritage on the Web], we must first understand in what direction digital technologies are pushing us. We need to read and use technologies critically (…). Adaptation to digital is more than the choice of a format, procedure or software. If anything, these are consequential aspects.” It is the very paradigm of “doing science” that has been revolutionized through the digital, and from this perspective producing, communicating, and organizing knowledge appear as increasingly interconnected actions, only artificially separable. Thus, the main road to follow seems to be that of increasing hybridization between disciplines.