10.2. Preservation Preparation Workflow

Preservation was already possible with the functionality provided by Pilot I. There, resources could be selected and manually preserved using the PoF Middleware. Pilot II extends this by enabling the PoF Middleware to perform Synergetic Preservation based on the user’s Preservation Strategy and the Preservation Value (PV), both explained in the previous sections.

This section describes the updated preservation workflow steps of Pilot II along the ForgetIT PoF workflow “Preservation Preparation Workflow” as defined in deliverable D8.4, with a focus on the Pilot’s contribution to the steps. The full technical details will be reported in D8.6, the final deliverable on the PoF Framework.

The figure below depicts the workflow steps and the functional entities involved in them (the entities are explained in the Functional Model of the PoF Reference Model; see D8.5). The numbers in the figure correspond to the subsection numbers below, which explain each step together with its functional entities.

The PoF Preservation Preparation Workflow (adapted from D8.5).

10.2.0 ForgetIT PoF Middleware (at DFKI)

Before explaining the workflow, let us take a look at the PoF Middleware, using the installation at DFKI premises that was deployed to preserve real resources from the DFKI PIMO. As the DFKI PIMO contains confidential material, we decided to install both the middleware and the content analysis services locally at DFKI. This installation therefore also shows that the PoF Middleware, its components and services, and the Digital Preservation System can be deployed on company premises.

DFKI would like to thank EURIX for their willingness and effort in helping us deploy the middleware, as well as our partners USFD (University of Sheffield) and CERTH (Centre for Research and Technology Hellas) for providing their services so that we could install them locally at DFKI.

Server at DFKI providing access to the ForgetIT PoF Middleware and the Preservation System.
ForgetIT PoF Middleware running at DFKI for the DFKI PIMO.
DSpace installation at DFKI connected to the PoF Middleware for preserving resources from the DFKI PIMO.
Some insights into the ForgetIT PoF Middleware: the scheduler route.
ForgetIT PoF Middleware: the automated preservation route.

10.2.1 Content Value Assessment

Before the workflow starts, the functional entity Content Value Assessment (CVA) assesses the resources in the PoF Framework. As pointed out in Section 2.4.1, this step provides the Preservation Value of a resource.

Regarding the role of the functional entity CVA in the PoF Framework, the Semantic Desktop as Active System exemplifies the case in which the Active System itself can provide the PV for the preservation decision (as well as the Memory Buoyancy (MB) for Managed Forgetting within the Active System); thus, the functional entity CVA is part of the Active System.

This design decision was made because both MB and PV are also beneficial within the Semantic Desktop infrastructure itself. The rich semantic model of the PIMO and the usage statistics of the Semantic Desktop provide a comprehensive view of the resources with respect to MB and PV. Furthermore, the PIM application scenario involves frequent access to, usage of, and changes to resources and the PIMO, resulting in heavy traffic as well as continuous content assessment in the PIMO as a knowledge base. Therefore, both values are computed in the Semantic Desktop and stored directly in the PIMO, where its components can easily access them, making the values an integral part of the PIMO.

To enable the PoF Middleware to make decisions based on the PV in the Select step, the SD/PoF Adapter (see ForgetIT architecture diagram) reports the values to the PoF Middleware and updates them at certain time intervals.

Each update contains the resource’s URI, its Preservation Value, and its last modification date. The last modification date allows the PoF Middleware to decide whether a resource needs to be sent to the archive again because it has changed since it was last preserved.
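A minimal sketch of such an update record and the resulting re-preservation check could look as follows. The `PVUpdate` class, the numeric PV field, and the `needs_represervation` helper are hypothetical; the actual message format exchanged between the SD/PoF Adapter and the PoF Middleware is not shown here and may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class PVUpdate:
    """Hypothetical update record reported by the SD/PoF Adapter."""
    uri: str
    preservation_value: float  # assumed numeric PV; the real scale may differ
    last_modified: datetime


def needs_represervation(update: PVUpdate, last_preserved: Optional[datetime]) -> bool:
    """True if the resource was never preserved or has changed since then."""
    if last_preserved is None:
        return True
    return update.last_modified > last_preserved


update = PVUpdate(
    uri="http://example.org/pimo/thing/42",  # placeholder URI
    preservation_value=0.8,
    last_modified=datetime(2016, 5, 3, tzinfo=timezone.utc),
)
print(needs_represervation(update, datetime(2016, 4, 1, tzinfo=timezone.utc)))  # True
print(needs_represervation(update, datetime(2016, 6, 1, tzinfo=timezone.utc)))  # False
```

Using timezone-aware timestamps on both sides keeps the comparison well defined when adapter and middleware run on different machines.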

10.2.2 Select

The Select step uses the functional entity Managed Forgetting & Appraisal to make conscious decisions about preserving resources of the Active System. To do so, it bases its preservation decisions on the results of the Content Value Assessment.

The Forgettor component (see ForgetIT architecture diagram) selects the set of resources to be preserved according to the Preservation Value Categories chosen in the user’s Preservation Strategy. This information is part of the Preservation Broker Contract introduced in Section 10.1.9; it is set in the Preservation Service Contract, communicated to the PoF Middleware, and managed there for each user.
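The selection can be sketched as a filter over the reported PVs. The category names, the thresholds that map a numeric PV to a category, and the per-user strategy dictionary are all assumptions for illustration; the actual Preservation Value Categories are those defined in the user’s Preservation Strategy.

```python
from typing import Dict, List, Set

# Hypothetical lower bounds per Preservation Value Category, checked from
# highest to lowest; the real categories and bounds may differ.
CATEGORY_BOUNDS = [("high", 0.7), ("medium", 0.4), ("low", 0.0)]


def pv_category(pv: float) -> str:
    """Map a numeric PV (assumed to lie in [0, 1]) to a category name."""
    for name, lower in CATEGORY_BOUNDS:
        if pv >= lower:
            return name
    return "low"


def select_for_preservation(pv_by_uri: Dict[str, float], selected: Set[str]) -> List[str]:
    """Return the URIs whose PV category is among the user's chosen categories."""
    return sorted(uri for uri, pv in pv_by_uri.items() if pv_category(pv) in selected)


# Per-user strategy as the middleware might store it (hypothetical).
strategy = {"heiko": {"high", "medium"}}
pvs = {"urn:a": 0.9, "urn:b": 0.5, "urn:c": 0.1}
print(select_for_preservation(pvs, strategy["heiko"]))  # ['urn:a', 'urn:b']
```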

10.2.3 Provide

The step Provide uses the functional entity De-Contextualization to extract a resource from its Active System context in preparation for packaging it for archiving.

Since Pilot I, the PoF Middleware has retrieved resources via the Collector, using the CMIS interface embedded in the SD/PoF Adapter. For the PIMO, this means that a thing and its grounding occurrence (i.e., the semantic representation and the actual physical file) are separated: the CMIS interface hands over the resource to be preserved as a cmis:Item, and the PIMO’s model information about the corresponding thing becomes part of the context information handed over in the forgetit:context attribute of the cmis:Item (for technical details of the interface, please refer to D8.4, Section CMIS Integration). This attribute is then available to the modules in the PoF Middleware, especially the Contextualizer in the next step.
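To make the separation concrete, the following sketch turns a PIMO thing into a CMIS-style item: the grounding occurrence becomes the content stream, and the thing’s model information travels as a Turtle string in the forgetit:context property. The `PimoThing` class and `to_cmis_item` helper are hypothetical, and a plain dictionary stands in for a real CMIS client library.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class PimoThing:
    """Hypothetical in-memory view of a PIMO thing."""
    uri: str
    label: str
    context_ttl: str                    # excerpt of the PIMO graph in Turtle
    file_bytes: Optional[bytes] = None  # grounding occurrence, if any


def to_cmis_item(thing: PimoThing) -> Tuple[Dict[str, str], Optional[bytes]]:
    """Split a thing into CMIS-style properties and an optional content stream."""
    properties = {
        "cmis:name": thing.label,
        "forgetit:context": thing.context_ttl,  # semantic representation
    }
    return properties, thing.file_bytes         # physical file travels separately


photo = PimoThing(
    "urn:pimo:photo1", "IMG_0001.jpg",
    "<urn:pimo:photo1> a <urn:pimo:Media> .", b"\xff\xd8",
)
props, stream = to_cmis_item(photo)
print(props["cmis:name"], len(stream))  # IMG_0001.jpg 2
```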

Information on a photo collection via the CMIS interface (here, with the Java CMIS Workbench).
Available properties via the CMIS interface (including, e.g., the preservation category and the relation to the context item).

Technically, the exported context information is an excerpt from the PIMO semantic graph that describes the resource in the PIMO and its connections to other things, such as the topics of a document or the persons attending an event. The excerpt is expressed in RDF/S using the PIMO Ontology RDF Schema and exchanged in Turtle (Terse RDF Triple Language), a compact textual syntax for representing RDF.
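For illustration, a small excerpt of this kind can be serialized to Turtle with the standard library alone. The namespace IRI below is the NEPOMUK PIMO namespace and is an assumption (the PIMO5 installation may use a different one); the class pimo:Document and the property pimo:hasTopic are likewise illustrative.

```python
from typing import List, Tuple

# Assumed namespaces; check them against the PIMO ontology actually in use.
PREFIXES = {
    "pimo": "http://www.semanticdesktop.org/ontologies/2007/11/01/pimo#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def literal(text: str) -> str:
    """Quote a short Turtle string literal, escaping backslash, quote, newline."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def to_turtle(subject: str, props: List[Tuple[str, str]]) -> str:
    """Serialize one subject with its (predicate, object) pairs as Turtle."""
    lines = [f"@prefix {p}: <{iri}> ." for p, iri in PREFIXES.items()]
    lines.append("")
    body = " ;\n    ".join(f"{pred} {obj}" for pred, obj in props)
    lines.append(f"<{subject}> {body} .")
    return "\n".join(lines)


ttl = to_turtle("urn:example:doc1", [
    ("a", "pimo:Document"),
    ("rdfs:label", literal("Turing workshop notes")),
    ("pimo:hasTopic", "<urn:example:topic-turing>"),
])
print(ttl)
```

A library such as rdflib would normally take care of prefixes and escaping; the hand-written serializer only keeps the sketch dependency-free.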

For Pilot II, this interface was enhanced to handle collections of resources (see D8.4) and to use the additional context delivered by the SD/PoF Adapter. Furthermore, every concept in the PIMO can now be preserved separately, i.e., the handling was extended from the classes representing (file) resources, such as pimo:Media and pimo:LifeSituation as in Pilot I, to all PIMO classes (see below for an example of the file). It is now also possible to preserve, e.g., a pimo:Project such as ForgetIT in the DFKI PIMO, even though it might not have a physical file attached.
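Handling a collection can be sketched as flattening it into the list of things to hand over, where every member is preserved separately and a thing may lack a physical file. The `Thing` class and `flatten_for_preservation` function are hypothetical, and class names other than those in the text (e.g. pimo:Collection) are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Thing:
    """Hypothetical PIMO thing; pimo_class is e.g. 'pimo:Media' or 'pimo:Project'."""
    uri: str
    pimo_class: str
    file_name: Optional[str] = None                       # None: no physical file
    members: List["Thing"] = field(default_factory=list)  # non-empty for collections


def flatten_for_preservation(root: Thing) -> List[Thing]:
    """Pre-order list of the root and all nested members, each preserved separately."""
    result: List[Thing] = []
    seen = set()
    stack = [root]
    while stack:
        thing = stack.pop()
        if thing.uri in seen:  # a thing may appear in several collections
            continue
        seen.add(thing.uri)
        result.append(thing)
        stack.extend(reversed(thing.members))  # keep members in original order
    return result


istanbul = Thing("urn:coll:istanbul", "pimo:Collection", members=[
    Thing("urn:photo:1", "pimo:Media", "IMG_1.jpg"),
    Thing("urn:photo:2", "pimo:Media", "IMG_2.jpg"),
])
project = Thing("urn:project:forgetit", "pimo:Project")  # no file attached
print([t.uri for t in flatten_for_preservation(istanbul)])
# ['urn:coll:istanbul', 'urn:photo:1', 'urn:photo:2']
print(flatten_for_preservation(project)[0].file_name)  # None
```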

10.2.4 Enrich

In the Enrich step, the functional entity Contextualization provides additional information for the content to be preserved so that archived items can be fully and correctly interpreted at some future date (see D8.3). All resources in the submitted collection are handed over to the Contextualizer, which runs three components:

First, the world knowledge contextualization, as described in deliverable D6.3, processes each textual resource in the submitted collection. This component creates a World Context by applying entity recognition to the text of a resource (e.g., a document or e-mail), using DBpedia as the source for disambiguating entities. Each entity found in the text is added to the World Context as a semantic annotation (i.e., as a URI). The World Context is then stored as additional context information in the metadata of the respective resource. See the image example of a world context below.
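The shape of a World Context can be illustrated with a much simpler technique than the one described in D6.3: the sketch below swaps in a tiny hand-made gazetteer with whole-word matching instead of a real entity linker such as YODIE, so it performs no disambiguation at all. The gazetteer entries and the JSON layout with an "entities" key are assumptions.

```python
import json
import re
from typing import Dict, List

# Hypothetical gazetteer standing in for a real entity linker:
# surface form -> DBpedia resource URI.
GAZETTEER: Dict[str, str] = {
    "Alan Turing": "http://dbpedia.org/resource/Alan_Turing",
    "Istanbul": "http://dbpedia.org/resource/Istanbul",
}


def world_context(text: str) -> Dict[str, List[str]]:
    """Return the DBpedia URIs of all gazetteer entries mentioned in the text."""
    found: List[str] = []
    for surface, uri in GAZETTEER.items():
        # Whole-word match, so e.g. 'Istanbuler' does not count as 'Istanbul'.
        if re.search(r"\b" + re.escape(surface) + r"\b", text) and uri not in found:
            found.append(uri)
    return {"entities": found}


note = "Notes on the trip to the Alan Turing workshop."
print(json.dumps(world_context(note)))
# {"entities": ["http://dbpedia.org/resource/Alan_Turing"]}
```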

Second, the visual concept detection, as described in deliverable D4.3, adds the visual concepts detected in images as additional context. See the image example of visual concepts used for contextualization below.
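The detector itself (the CERTH service) is not reproduced here. Assuming it returns a confidence score per concept, a sketch of turning such scores into context could keep only sufficiently confident concepts; the threshold, the top_k limit, and the score format are hypothetical.

```python
from typing import Dict, List, Tuple


def visual_context(scores: Dict[str, float], threshold: float = 0.5,
                   top_k: int = 5) -> List[Tuple[str, float]]:
    """Keep at most top_k concepts whose detector score reaches the threshold."""
    kept = [(concept, score) for concept, score in scores.items() if score >= threshold]
    kept.sort(key=lambda cs: (-cs[1], cs[0]))  # highest score first, ties by name
    return kept[:top_k]


# Hypothetical detector output for one photo.
scores = {"outdoor": 0.91, "building": 0.77, "water": 0.48, "sky": 0.77}
print(visual_context(scores))  # [('outdoor', 0.91), ('building', 0.77), ('sky', 0.77)]
```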

Third, the personal knowledge contextualization takes the context information that the Semantic Desktop provided in the forgetit:context attribute during the previous step and stores it as a separate context. In terms of the PoF Reference Model, this context information, generated from the personal knowledge represented in the PIMO, is stored in the so-called Local Context (see also D8.2). See the image example of a local context from the PIMO below.
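The three contexts end up as separate files next to the resource; the archived examples in Section 10.2.8 show worldContext.json, imageAnalysis.xml, and localContext.ttl. The sketch below serializes the three contexts under those file names, but the JSON and XML layouts (including the imageAnalysis and concept element names) are assumptions.

```python
import json
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def context_files(world: dict, concepts: List[Tuple[str, float]],
                  local_ttl: str) -> Dict[str, bytes]:
    """Serialize world, visual, and local context into separate files (layouts assumed)."""
    root = ET.Element("imageAnalysis")  # hypothetical XML vocabulary
    for name, score in concepts:
        ET.SubElement(root, "concept", name=name, score=f"{score:.2f}")
    return {
        "worldContext.json": json.dumps(world, indent=2).encode("utf-8"),
        "imageAnalysis.xml": ET.tostring(root, encoding="utf-8"),
        "localContext.ttl": local_ttl.encode("utf-8"),
    }


files = context_files({"entities": []}, [("sky", 0.77)], "<urn:x> a <urn:y> .")
print(sorted(files))  # ['imageAnalysis.xml', 'localContext.ttl', 'worldContext.json']
```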

The context information delivered by the Semantic Desktop satisfies the context dimensions identified in deliverable D6.1:

10.2.5 Package

In this step, the functional entity Archiver uses the content and metadata of the resource(s) collected from the Semantic Desktop to create a Submission Information Package (SIP), which is then handed over to the Transfer step.
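As a rough sketch, a SIP can be thought of as one archive file bundling content, context files, and descriptive metadata, plus checksums for later fixity checks. The layout below (a content/ folder, metadata.json, and a SHA-256 manifest.json) is an assumption and not the Archiver’s actual SIP format; a real deployment would follow whatever package format the DPS ingests.

```python
import hashlib
import io
import json
import zipfile
from typing import Dict


def build_sip(files: Dict[str, bytes], metadata: Dict[str, str]) -> bytes:
    """Pack content and context files plus metadata into a ZIP with a SHA-256 manifest."""
    manifest = {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(f"content/{name}", data)
        zf.writestr("metadata.json", json.dumps(metadata, indent=2))
        zf.writestr("manifest.json", json.dumps(manifest, indent=2, sort_keys=True))
    return buf.getvalue()


sip = build_sip(
    {"IMG_0001.jpg": b"\xff\xd8...", "localContext.ttl": b"<urn:x> a <urn:y> ."},
    {"dc.title": "Istanbul 2014"},
)
with zipfile.ZipFile(io.BytesIO(sip)) as zf:
    print(sorted(zf.namelist()))
# ['content/IMG_0001.jpg', 'content/localContext.ttl', 'manifest.json', 'metadata.json']
```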

10.2.6 Transfer

The Transfer step then submits the SIP to the Digital Preservation System (DPS) (see ForgetIT architecture diagram), which stores it as an Archival Information Package (AIP). In the case of Pilot II on the ForgetIT testbed, the DPS is composed of DSpace and OpenStack Swift. In the deployment at DFKI, the DPS currently consists of DSpace only, as shown in the screenshots above.
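The hand-over from SIP to AIP can be illustrated with a toy in-memory DPS that verifies the package checksums on receipt before storing it. The `InMemoryDPS` class, its `ingest` method, the manifest layout, and the UUID-based AIP identifiers are hypothetical; the real APIs of DSpace and OpenStack Swift are not modeled.

```python
import hashlib
import io
import json
import uuid
import zipfile
from typing import Dict


class InMemoryDPS:
    """Toy stand-in for a Digital Preservation System (hypothetical)."""

    def __init__(self) -> None:
        self.aips: Dict[str, bytes] = {}

    def ingest(self, sip: bytes) -> str:
        """Verify the SIP checksums, store it as an AIP, and return the AIP id."""
        with zipfile.ZipFile(io.BytesIO(sip)) as zf:
            manifest = json.loads(zf.read("manifest.json"))
            for name, expected in manifest.items():
                actual = hashlib.sha256(zf.read(f"content/{name}")).hexdigest()
                if actual != expected:
                    raise ValueError(f"checksum mismatch for {name}")
        aip_id = str(uuid.uuid4())
        self.aips[aip_id] = sip
        return aip_id


def make_sip(name: str, data: bytes, declared: bytes) -> bytes:
    """Build a one-file SIP whose manifest records the checksum of `declared`."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(f"content/{name}", data)
        zf.writestr("manifest.json", json.dumps({name: hashlib.sha256(declared).hexdigest()}))
    return buf.getvalue()


dps = InMemoryDPS()
aip = dps.ingest(make_sip("note.txt", b"hello", b"hello"))
print(aip in dps.aips)  # True
```

Rejecting a package whose content does not match its manifest keeps corrupted transfers from silently becoming AIPs.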

10.2.7 Preservation finished

Once the preservation is finished, the PoF Middleware notifies the Active System of the outcome. The user is notified of the outcome in two ways. First, once a collection has been preserved, the user receives a notification on the PIMO5 home screen, as shown in the figure below.

Notification of the successful preservation on the PIMO5 home screen.

Second, several places in the Semantic Desktop show whether a thing is preserved, such as the thing view (see the figures below) and the decoration of images, as well as the PIMOCloud with its green PIMOCloud icon (see D9.3).
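On the Active System side, handling the outcome could be sketched as recording a preservation flag per thing (which drives the decorations) and queuing a home-screen notification. The `Outcome` message, the `ActiveSystemState` class, and the failure message are hypothetical and only illustrate the two notification channels described above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Outcome:
    """Hypothetical outcome message from the PoF Middleware for one collection."""
    collection_uri: str
    preserved_uris: List[str]
    success: bool


class ActiveSystemState:
    """Home-screen notifications plus per-thing preservation flags (hypothetical)."""

    def __init__(self) -> None:
        self.preserved: Dict[str, bool] = {}
        self.notifications: List[str] = []

    def on_outcome(self, outcome: Outcome) -> None:
        if not outcome.success:
            self.notifications.append(f"Preservation of {outcome.collection_uri} failed")
            return
        for uri in [outcome.collection_uri, *outcome.preserved_uris]:
            self.preserved[uri] = True  # used to decorate the thing as preserved
        self.notifications.append(f"{outcome.collection_uri} has been preserved")

    def is_preserved(self, uri: str) -> bool:
        return self.preserved.get(uri, False)


state = ActiveSystemState()
state.on_outcome(Outcome("urn:coll:istanbul-2014", ["urn:photo:1"], True))
print(state.is_preserved("urn:photo:1"), state.notifications[0])
# True urn:coll:istanbul-2014 has been preserved
```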

Showing the preservation state of a photo in the PIMO5 image detail view.
Showing the preservation state of a photo collection.

10.2.8 Inspecting the results in the Archive

The following screenshots show examples from the DFKI PIMO which were preserved using the ForgetIT PoF Middleware as described above and stored in the DSpace installation at DFKI.

The DSpace community PIMOArchive listing preserved collections. Visible are several photo collections and projects. On the right-hand side, you see some of the subjects added to preserved resources; among others, we also used the rdfs:types from the PIMO ontology. Currently, more than 3400 resources are preserved for the user Heiko.
The photo collection Istanbul 2014 in the DSpace community PimoArchive, belonging to user Heiko Maus.
A photo from the Istanbul collection in DSpace with metadata, local context from the PIMO and analysis results from the image analysis.
The opened file imageAnalysis.xml contains the results of the Contextualizer component’s visual concept detection in the PoF Middleware (using the service from our colleagues at CERTH, installed at DFKI premises and invoked from the PoF Middleware).
The opened file localContext.ttl contains the context from the PIMO for the image as RDF in Turtle format (handed over via the CMIS interface in an item connected via pimo:context).
The opened file worldContext.json, for a textual note on the travel to the Turing workshop, contains the additional world context in JSON format, extracted by the Contextualizer component (Sheffield’s YODIE).

Now, material that came to the PIMO user Heiko over all the years and received a high preservation value is in the archive. Another example is the early draft of the ForgetIT proposal (called Preserve-or-Forget at that time), with which Claudia and Wolf asked us whether we were interested in joining this activity with our Semantic Desktop approach.

The early draft of the ForgetIT idea archived with metadata and PIMO context. Used in the Semantic Desktop, stored in the PIMO, and preserved for future generations using ForgetIT's PoF Middleware.