A Metadata Approach to Preservation of Digital Resources: The University of North Texas Libraries' Experience
by Daniel Gelaw Alemneh, Samantha Kelly Hastings, and Cathy Nelson Hartman
http://www.firstmonday.org/issues/issue7_8/alemneh/index.html

Preserving long-term access to digital information resources is one of the key challenges facing libraries and information centers today. The University of North Texas (UNT) Libraries have entered into partnership agreements with federal and state agencies to ensure permanent storage and public access to a variety of government information sources. As digital resource preservation encompasses a wide variety of interrelated activities, the UNT Libraries are taking a phased approach to ensure long-term access to their digital resources. Formulation of preservation policy and creation of preservation metadata for electronic files and digital collections are among the most important steps. This paper discusses the issues related to digital resources preservation and demonstrates the role of preservation metadata in facilitating preservation activities in general. In particular, it describes the efforts being made by the UNT Libraries to ensure the long-term access and preservation of various digital information resources.

Contents
Introduction
Metadata for Digital Resources Management
Digital Initiatives at UNT Libraries
Preservation Metadata Requirement Analysis
Issues and Challenges
Lessons Learned
Conclusion

Introduction

It is now common knowledge that digital information is fragile in ways that differ from traditional technologies, such as paper or microfilm. The fact that information is increasingly stored in digital form has led to an accelerated search for effective methods of managing electronic information resources.
The huge and ever-expanding sources of information on the Web normally contain special formatting and are produced with a variety of software in different versions. If the original digital resource is not "born digital," it may be a digital representation or digital surrogate of a physical medium, e.g. a page of text, an object, a painting, a photograph, a sculpture, a song, a movie, etc.

The persistence of digital information resources is an important factor for any digital library development. Addressing the preservation and long-term access issues for digital resources is one of the key challenges facing libraries and information centers today. In order to make sense of the high heterogeneity that exists among digital resources, a growing body of research has attempted to deal with the problems associated with the volume and nature of information on the Web and to look into ways to achieve consensus on a standard.

Metadata for Digital Resources Management

Metadata is a set of attributes used to describe an object. In reviewing the library and information science literature of the past few years, there is no shortage of views on the significant role of metadata in meeting the most pressing needs and challenges of digital resource management. A number of researchers (Moen, 2001; Waibel, 2001; Besser, 2000; Sutton, 1999; Zeng, 1999; among others) agree that the underlying principle for metadata is to link and integrate heterogeneous, multi-platform, massive digital information collections that are contributed by different institutions into a single unified resource so these digital repositories are accessible by anyone, from anyplace, at anytime.

A number of metadata initiatives provide detailed and descriptive information about a digital resource to facilitate discovery by users.
Resource description is essentially about describing information resources using a standard framework or set of principles. But because of the specific nature of heterogeneous digital resources, describing digital resources in a consistent fashion may not be an easy task, and in some cases, it is a complex process. Those concerned with digital information management all regard metadata as an essential component of the evolving networked information environment, but each of these communities views metadata from a notably different perspective.

Current Metadata Initiatives

Metadata standards come from various professional community efforts to support many needs in the digital environment. The literature reveals that different communities view metadata in significantly different contexts. The recent (2001) report from the Research Libraries Group (RLG), comprised of key stakeholders from a variety of institutions, affirmed the fact that no single metadata standard can be expected to accommodate the needs of all communities. Although some projects, such as Dublin Core (DC), have tried to develop a coherent set of metadata schemes that can work for a wide range of communities, they have not yet provided a complete description or solution for all types of digital information resources.

There is a great diversity of perspectives on various aspects of metadata issues. For instance, librarians have used machine-readable cataloguing (MARC) since the 1960s to identify, describe and provide access to their collections. However, what worked well for libraries may not work in other environments. Similarly, the basic metadata required for describing an image or work of art or non-text objects will bear a strong resemblance to the metadata that describes traditional print documents. However, some significantly different extra elements will be required for a complete description of non-text images and multi-media resources.
In light of this, some formats of metadata have been developed specifically for use in certain fields of study or types of information source.

Different communities have developed their own organizational and descriptive standards for accessing, arranging, and administering their specific digital collections, such as: the Medical Record Metadata for health professionals, Government Information Locator Service (GILS) metadata for describing government information resources, Visual Resources Association (VRA) Core Categories for describing visual information resources, and many more.

A number of commentators (e.g. Moen, 2001; Besser, 2000; and Sutton, 1999) are optimistic that the core element set will be as minimal as possible. Thus, the core element set meanings will be easy to understand by most users and the element set will be flexible enough for description of diversified resources in a wide range of subject areas. Of course, the various previous efforts provide ways of describing digital resources to facilitate interoperability among resource discovery tools. Further developments in Resource Description Framework (RDF), Extensible Markup Language (XML), and Z39.50 may also provide means for integrating diverse metadata-based resources. For instance, the most recent work at the Library of Congress (LC) - the Metadata Encoding and Transmission Standard (METS) schema - provides a flexible mechanism for encoding descriptive, administrative, and structural metadata for a digital library object, and for expressing the complex links between these various forms of metadata.

As indicated by Mullen (2001), most metadata initiatives have focused on resource discovery and on making it easier for people to find all of the information they need.
Although such standards and structures are the most important steps in the development of the Web to avoid a chaotic repository of information, they do not guarantee continual long-term access to digital resources.

Preservation Metadata

For years, information centers preserved important electronic resources by transferring the files at regular intervals to the latest information carriers available. As described by Besser (2001), refreshing a file involves periodically moving a file from one physical storage medium to another to avoid the physical decay or the obsolescence of that medium. Similarly, migration involves periodically moving files from one file encoding format to another that is usable in the current computing environment. But with multi-media digital resources (unlike in print), restoring in digital format may not be possible without the original software or hardware.

Preserving digital resources is made difficult by the fact that digital resources can only be read by software. This would mean that in order to ensure long-term access to digital resources, we need to preserve all the software, hardware, and operating systems on which the software ran. However, with the current quick obsolescence of information technologies, such an approach may not be feasible. Furthermore, inadequate media longevity is one of the issues. For instance, optical disks are expected to have a physical lifetime of up to 30 years, but even a life expectancy of 30 years for storage media far exceeds the lifespan of hardware and software. Considering the ever-growing global Internet traffic, another problem is the mass of data and the need to compress it for efficient storage and transmission. However, compression sometimes causes loss of data. It is also likely that repeated transfers over the years from one carrier to another may cause data loss.
This raises a number of issues including copyright, authenticity, and reliability.

Evidently, sustainable solutions to preserve digital resources are not yet available and are still being tested by various communities. Unlike the traditional notion of 'preservation' (which refers to conservation and permanent preservation of the material or its information content), the ideal digital preservation activities would ensure that digital resources in all formats would be accessible as long as necessary. As described by Chapman (2001), if the objectives of digital preservation strategies were to preserve the artifact only, regardless of usability, longevity would be measured according to the lifespan of an object stored in a given environment. A number of researchers have defined digital preservation in a variety of ways and presented their views on how digital preservation might be achieved. According to RLG/OCLC's more specific definition, "Digital preservation refers to the series of managed activities necessary to ensure continued access to and preservation of digital materials."

It is clear that digital preservation is a critical issue, calling for measures that go beyond permanent archiving, and all stakeholders agree that digital resource preservation encompasses a wide variety of interrelated activities. According to the RLG report, the problem of preserving digital sources is compounded by the fact that most of the sources do not have proper descriptions. Similarly, Besser (2001) stated that such multi-format resources created by differing software require detailed descriptions of the technical environment needed to view the digital resources.

Despite the fact that most metadata research places more emphasis on resource discovery, a small breakthrough has been achieved in the last couple of years for preservation issues.
A growing number of efforts by various organizations and agencies to perfect digital preservation methods include: the Reference Model for an Open Archival Information System (OAIS), CEDARS (CURL [Consortium of University Research Libraries] Exemplars in Digital Archives, U.K.), National Library of Australia (NLA), and RLG/OCLC (Research Libraries Group), to name but a few. These high-level preservation metadata initiatives provide much needed information required to manage the long-term preservation of digital resources.

As indicated by Besser (2000), preservation metadata is a strategy to provide sufficient technical information about the resources and to support the two primary strategies for preservation of digital resources: migration (transfer of digital resources from one generation to a subsequent generation) and emulation (developing techniques for imitating obsolete systems on future generations of computers). Besser asserts that properly used metadata facilitate long-term access to digital resources by explaining the technical environment needed to view the work, including the applications and version numbers needed, decompression schemes, and other files that need to be linked to it, among others.

Digital Initiatives at UNT Libraries

During the past few years, the University of North Texas (UNT) Libraries' Government Documents Department has made efforts to preserve various federal and state government information resources by forming partnerships with state and federal agencies. The various digital projects and undertakings in the Government Documents Department include the Cybercemetery, the Nineteenth Century Texas Law Online, and the Texas Register project.

Figure 1 depicts one of the UNT Libraries' digital project pages, available at http://govinfo.library.unt.edu/. As the name Cybercemetery indicates, the UNT Libraries collect the digital publications from "deceased" federal agencies and preserve them for current and future public access.
Furthermore, various digitization undertakings at the Government Documents Department made available selected local resources such as the Texas Criminal Justice Statistical Reports. These, together with the Texas Register and the Nineteenth Century Texas Law Online projects, can be cited as part of the UNT Libraries' Government Documents Department's initiatives to preserve state historical and agency publications. The UNT Libraries are also currently involved in a project with the Texas State Library and Archives Commission to preserve current state electronic publications.

Figure 1. Cybercemetery Digital Project Page: preserving "deceased" U.S. government digital publications.

Access, Use, and Preservation

In complying with the fundamental principle of free public access to government information resources, the goal of the UNT Government Documents Department's digital projects and initiatives is to make government information resources accessible for exploration and search by anyone, from anyplace, at anytime.

Figure 2: UNT Libraries Overall Web Usage Statistics (http://www.library.unt.edu/reports/stats/default.htm)

As can be seen from Figure 2 above, each digital resource has its own user statistics. A close look at the usage statistics shows a nearly linear increase in the monthly percentage of users for the Government Information Connection digital collections (Gammel's Laws, GovInfo, GPO, NPR, OTA, Texas Register). For instance, the Texas Register is one of the UNT Libraries' popular electronic resources; it became available online in 2001. Since then, the user base has grown dramatically, with the current number of hits more than sixteen times the hits in the early months. Usage grew from just over 30,000 (in January 2001) to over 530,000 (in April 2002).
Each digital information collection attracted a lot of users from all over the world. As the Libraries' digital collections grew, the need to address the issue of preserving long-term access to these resources became evident.

In light of this, the UNT Libraries are taking a phased approach to prolonging the usable life of the Libraries' digital resources. Formulation of preservation policy and creation of preservation metadata for electronic files and digital collections are among the most important steps in the UNT Libraries' preservation initiatives. In view of that, the Digitization Workgroup was charged to recommend a plan that will ensure long-term future access to the UNT Libraries' electronic information resources. The Workgroup is reviewing the different types of the UNT Libraries' electronic resources. These include:
State records;
The online library catalog system;
Personal and shared directories;
The Libraries' Web site; and,
The Libraries' digital collections.

Preservation Metadata Requirement Analysis

Metadata is a key factor for ensuring long-term access to digital resources. There is a continuous need for extending the existing metadata element set to be able to describe all available digital resources. During the past two years, the UNT Libraries reviewed several metadata initiatives to build an element set appropriate for their digital collections while monitoring the RLG/OCLC efforts toward building a standard metadata element set.

In addressing the issues of identifying specific metadata requirements, the UNT Libraries attempted to assess the specific characteristics of the existing digital resources.
In the preliminary needs assessment, the following issues, among others, were considered:
Specific creation features and production life cycle of the digital information resources: Structural Type (Text, Image, Audio, Video, etc.), Integrity Issues, etc.;
Users' information seeking behavior: [who, what, how ...];
UNT's objectives [to ensure longevity]; and,
Current standards and future trends [Interoperability (mapping) with current best practices and international standards plus complying with federal and state requirements].

Life Cycle Assessment of the Digital Resources

As indicated by the National Library of Australia (NLA) report (1999), to manage digital collections or individual items one needs to have a clear understanding of one's digital collection. Documentation has always played a key role in preservation practice and there are many instances where documentation provided the only information about processes and changes that had been applied and might need to be corrected.

In this regard, all available digital resources creation manuals, guidelines, and reports at the Government Documents Department were reviewed and modified accordingly. Sample documents (the Texas Register digital collection creation processes report and the ACIR procedure manual) can be viewed at http://texinfo.library.unt.edu/texasregister/text/report/TX-Reg-Report-2002.doc and http://www.library.unt.edu/gpo/ACIR/document/ACIR-procedure.doc, respectively. Those documents provide detailed information about the creation history and complete life cycle of the digital resources.
The preliminary resource assessment and evaluations assisted us in identifying the specific characteristics and requirements of the available digital resources.

Based on the thorough assessment of the available digital resources, attempts have been made to review current best practices and standards to represent a range of relevant fields. The review pays particular attention to the preservation and management metadata sets, which are needed to support various preservation approaches including migration and emulation.

The work at NLA developed a practical model for dealing with the immediate threat of disappearing digital objects, and established a workable distributed archive. Similarly, a number of research projects - such as OAIS (Open Archival Information System), CEDARS (CURL Exemplars in Digital Archives), NEDLIB (Networked European Deposit Library), and others - have investigated options for dealing with long-term preservation challenges.

Based on the preliminary survey of the existing digital collection and a detailed review of current best practices, we chose to base our recommendation of preservation metadata on a synthesis of various preservation metadata sets until the OCLC/RLG (2001) working group completes a national standard.

The Draft Metadata Architecture

The extensive literature review revealed that effective metadata is our best way of minimizing the risk of digital resources becoming inaccessible. Metadata, to be most valuable both for the users and owners, needs to be consistently maintained throughout the process. Creating documentation that governs and informs the metadata creation steps and procedures in a consistent and uniform manner is among the most important steps in metadata creation. The detailed workflow and user guide document provides procedural information required to create metadata, with examples for different file formats.
Since the metadata assigned to an item depends entirely on the metadata creators' definition of the work, the detailed user guide also provides rules, syntax and descriptive information to identify the source of information for each element.

The following chart (Figure 3) illustrates the basic structure of the UNT Libraries' draft preservation metadata contents. A detailed description of the recommended preservation metadata elements can be found in Appendix II of this paper.

Figure 3. UNT Libraries' Preservation Metadata Structure.

The following table (Table 1) describes the subheadings of each metadata element.

Table 1: Preservation Metadata Elements' Subheadings.
Name of Sub-Heading | Description | Remark
Element Name | Name of the element | The element name may not be identical to the name of the origin
Sub-element | Indicate existence of sub-elements, labels |
Origin(s) | Source of the element | Mapping to other metadata standards will start soon
Definition | A brief statement that defines the concept of the category | Further described in "Description" and "Comment" subheadings
Description | Further explanation for clarification of the purpose of the element |
Required | Indicate whether the element value is mandatory or optional | Yes/No
Repeatable | Indicate if the element is repeatable or not repeatable | Yes/No
Example | Local examples |
Comment | Notes to clarify exceptions |

As can be seen from the sample element description in Table 2 below, each of the identified metadata elements is described under separate subheadings. For a complete list of recommended elements, see also Appendix II.
Table 2: Sample Preservation Metadata Element Description.
Element Name | Access Inhibitors
Origin | NLA
Sub-element | (blank)
Definition | Description of any features of the digital resources intended to inhibit access
Purpose | Without this information the DR may not be able to be accessed, copied or migrated
Required | No
Repeatable | Yes
Example | Encryption, watermarking, digital signature, password protection, etc.
Comment | This information may be placed in the Documentation linked to the DR

Metadata Creation Workflow

Most preservation metadata project managers acknowledge that the best practice is to create the metadata at the information creation stage. Hodge (2000) recognized that creation is where long-term archiving and preservation must start. Metadata routinely collected at the point of creation would be relatively easy to gather, consistent, reliable, and automatic. Of course, the preservation and archiving process is made more efficient when the creators provide an indication of the long-term value attached to the information resources. More importantly, attention would be paid to issues of consistency in the process of metadata creation from the very beginning of the information life cycle.

Much of the preservation metadata continues to be created "by hand" and after the fact. This problem is coupled with the fact that metadata creation is not yet sufficiently incorporated into the tools used to create resources for us to rely solely on the creation process. As standards groups and vendors move to incorporate XML and RDF architectures in their word processing and database products, the creation of metadata as part of the origination of the object will be easier.

The following diagram illustrates the logical steps in creating metadata tags for digital resources in general. As can be seen in Figure 4, metadata can be incorporated into the digital resources (step 3-1), and/or can be stored in repositories separate from the resources it describes (step 3-2).
When the metadata have been saved in their appropriate location, the process of metadata creation is considered to be complete.

Figure 4. Metadata Creation Steps.

Metadata Creation and Editing Tool

There are various metadata creation tools and wizards available. For the purpose of testing and demonstrating our prototype, we selected the NoteTab Light program (http://www.notetab.com).

Figure 5. Customized NoteTab Light Metadata Creation Tool.

This freeware version of NoteTab allows us to add and modify metadata elements and also to copy metadata values, either to be embedded in resources or to be maintained in a separate repository. This tool reduces the need for editors and data enterers to learn the syntax of the metadata.

Issues and Challenges

We are just at the beginning stages. We plan to develop a prototype for experiments and to demonstrate the feasibility of preservation metadata at the UNT Libraries. Our initial prototype testing will be limited to the Government Documents Department's digital resources. The files and Web pages will be modified to include the recommended metadata elements.

Technical Issues

During the life cycle of digital resources, there are a series of processes that require various sets of hardware and software infrastructures. Similarly, as described in the metadata creation workflow, there is a series of managed activities that determine the appropriate hardware and software technologies to be used at each step of the preservation metadata creation process. These include:
identifying the appropriate metadata creation tool,
appropriate means for creating a metadata repository database,
appropriate indexing and harvesting software and search engines to use,
designing several interfaces for field searches and related considerations.
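The two storage options in the metadata creation workflow (embedding metadata in the resource, step 3-1, or keeping it in a separate repository, step 3-2) can be illustrated with a minimal sketch. It is not the UNT prototype: the names `ElementDef`, `validate`, `to_meta_tags`, and `to_repository_entry`, the `UNT.` tag prefix, and the example.org address are all hypothetical, and only the Access Inhibitors element from Table 2 is modeled.

```python
import json
from dataclasses import dataclass
from html import escape
from typing import Dict, List


@dataclass(frozen=True)
class ElementDef:
    """One metadata element, using a few of the Table 2 subheadings."""
    name: str
    origin: str
    required: bool
    repeatable: bool


# The only element the paper spells out in full (Table 2).
ACCESS_INHIBITORS = ElementDef(
    name="Access Inhibitors", origin="NLA", required=False, repeatable=True
)


def validate(record: Dict[str, List[str]], schema: List[ElementDef]) -> List[str]:
    """Return problems: missing required elements or illegal repetition."""
    problems = []
    for element in schema:
        values = record.get(element.name, [])
        if element.required and not values:
            problems.append(f"missing required element: {element.name}")
        if not element.repeatable and len(values) > 1:
            problems.append(f"element not repeatable: {element.name}")
    return problems


def to_meta_tags(record: Dict[str, List[str]], prefix: str = "UNT") -> str:
    """Step 3-1: render the record as HTML <meta> tags to embed in a Web page.

    The "UNT." prefix is an assumption made for this sketch only.
    """
    lines = []
    for name, values in record.items():
        tag_name = f"{prefix}.{name.replace(' ', '')}"
        for value in values:
            lines.append(
                f'<meta name="{escape(tag_name)}" content="{escape(value)}">'
            )
    return "\n".join(lines)


def to_repository_entry(resource_url: str, record: Dict[str, List[str]]) -> str:
    """Step 3-2: serialize the record for storage apart from the resource."""
    return json.dumps({"resource": resource_url, "metadata": record}, sort_keys=True)


if __name__ == "__main__":
    record = {"Access Inhibitors": ["Password protection", "Watermarking"]}
    print(validate(record, [ACCESS_INHIBITORS]))
    print(to_meta_tags(record))
    # Hypothetical resource address, used only for illustration.
    print(to_repository_entry("http://example.org/resource.html", record))
```

Keeping the Required and Repeatable flags next to each element definition lets the same schema drive both the creation tool and later consistency checks on existing records.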
Lessons Learned

A pretest activity will allow us to determine the resources required to implement a comprehensive preservation metadata project. Due to issues of cost, compliance, and heterogeneity, we found that the fewer elements that are required the better (provided that such a minimum set of mandatory metadata elements would not have adverse consequences for the preservation activities). In addition, the quality control analysis involves various levels of assessment, including examining the metadata records for consistency, reliability, adequacy, etc. We plan to work on a detailed analysis of the costs and benefits of the recommended metadata elements, including the amount of time and the level of skills required to create and manage a successful preservation metadata system. We have created a set of questions that provide scenarios about the issues and challenges of implementing the recommended preservation metadata system. These include:
Is the preservation metadata system easy to use?
Does the User Guide include a clear set of rules?
Is it feasible to develop controlled vocabulary lists from the many files on the Libraries' server to represent content and do so adequately?
Is it feasible to consider creating the default values in the metadata creation tool for some of the mandatory fields?
Is the preservation metadata system supported by the existing UNT Libraries' search engine topology? (Adaptability of existing schema?)
Is the preservation metadata interoperable with current and future international standards? (Semantic, structural, and syntactical mapping issues?)

Conclusion

Like many others, the UNT Libraries realize that being digital does not mean being accessible. Access to digital resources through descriptive metadata is only short-term.
Preservation metadata plays a significant role in facilitating preservation decisions, detecting preservation threats, and providing measures for minimizing risks to long-term access. We anticipate that the management, storage and serving of large datasets will be greatly improved by the use of preservation metadata management tools.

Finally, we will evaluate and assess the practical application of the whole process of metadata creation workflow and user guide documents. We expect a tremendous amount of discussion from all stakeholders regarding the types of metadata elements most useful to a specific requirement. Based on the feedback and input from the field, the preliminary versions will be reviewed and modified. Of course, the real test will be in the efficiency of our first migration.

About the Authors

Daniel Gelaw Alemneh is currently a doctoral student in information science, with a digital imaging specialty, at the University of North Texas (UNT). He is an IMLS fellow from Ethiopia, and received a Post-Master's Certification in Digital Image Management from UNT in August 2000. Prior to that, he earned his Master's Degree in Library and Information Management from the University of Sheffield, U.K. Mr. Alemneh is employed as a Super Graduate Library Assistant in the Government Documents Department, and works on various digitization projects.
E-mail: dalemneh@library.unt.edu

Dr. S.K. Hastings joined the faculty at UNT in 1995. She is very active in state and national professional associations. Dr. Hastings has served as a resource person and presented a number of papers at various professional meetings and conference programs, including Curricula Development for Multimedia Librarians, Standards for Museum Information Managers and Index Access Points in the Retrieval of Digital Art Images, and the changing role of information professionals/digital managers. Dr.
Hastings continues to research problems associated with the access, retrieval, and preservation of digital images, with particular emphasis on designing information communities for the 3D environment. She is principal investigator for a federally funded IMLS Library-Museum-University Collaboration project. You may view her various projects at http://www.courses.unt.edu/shastings/.
E-mail: hastings@lis.admin.unt.edu

Cathy Nelson Hartman is the head of the Government Documents Department at the UNT Libraries. She has been very active in state and national professional associations, serving as chairperson for a number of work groups, committees, and taskforces at state and national level. In addition, she has served as a resource person and presented a number of papers at various professional meetings including Computers in Libraries, ALA, Texas Library Association, and Depository Library Conferences. She has published a number of articles on digital resource management issues. Ms. Hartman is a successful grant recipient and is project manager for several digital projects including the Cybercemetery, the Texas Register, and others. You may also view her various projects at http://www.library.unt.edu/govinfo/.
E-mail: chartman@library.unt.edu

References

H. Besser, 2001. "Digital preservation of moving image material?" Accepted for publication in The Moving Image, at http://www.gseis.ucla.edu/~howard/Papers/amia-longevity.html, accessed 16 November 2001.

H. Besser, 2000. "Digital longevity," In: Handbook for Digital Projects: a Management Tool for Preservation and Access. Andover, Mass.: Northeast Document Conservation Center.

S. Chapman, 2001. "What is digital preservation?" OCLC Symposium: Digital past, digital future: An Introduction to digital preservation, at http://www.oclc.org/events/presentations/symposium/chapman.shtm, accessed 9 May 2002.

G. Hodge, 2000. "Best practices for digital archiving: an information life cycle approach," D-Lib Magazine, volume 6, number 1, at http://www.dlib.org/dlib/january00/01hodge.html, accessed 18 January 2002.

W. Moen, 2001. "The Metadata approach to accessing government information," [Electronic version], Government Information Quarterly, volume 18, pp. 155-165.

A. Mullen, 2001. "GILS metadata initiatives at the state level," Government Information Quarterly, volume 18, pp. 167-180.

National Library of Australia, 1999. Preservation metadata for digital collections. Exposure draft, at http://www.nla.gov.au/preserve/pmeta.html, accessed 24 October 2001.

OCLC/RLG, 2001. Preservation metadata for digital objects: A Review of the state of the art. A White Paper by the OCLC/RLG Working Group on Preservation Metadata, at http://www.oclc.org/digitalpreservation/presmeta_wp.pdf, accessed 14 January 2002.

Reference Model for an Open Archival Information System (OAIS), n.d. "Draft Recommendation for Space Data System Standards," at http://www.ccsds.org/RP9905/650x0r1.pdf, accessed 26 January 2002.

S. Sutton, 1999. "Conceptual design and deployment of a metadata framework for educational resources on the Internet," Journal of the American Society for Information Science, volume 50, pp. 1182-1192.

G. Waibel, 2001. "Produce, publish and preserve: A Holistic approach to digital assets management," at http://www.bampfa.berkeley.edu/moac/imaging/index.html, accessed 19 September 2001.

M. Zeng, 1999. "Metadata elements for object description and representation: A Case report from a digitized historical fashion collection project," Journal of the American Society for Information Science, volume 50, pp. 1193-1208.

Appendix 1. Acronyms used in this paper, with selected Web addresses

AACR
Anglo-American Cataloguing Rules. http://www.ala.org/editions/updates/aacr2/
The AACR second edition, 1988 revision (AACR2) is used in the preparation of bibliographic records by OCLC participants as well as by most libraries in the United States. Requests for changes in the rules go to the American Library Association (ALA), Association for Library Collections and Technical Services (ALCTS), Committee on Cataloging: Description and Access (CC:DA). CC:DA submits proposals for changes in the rules to the Joint Steering Committee for Revision of AACR (JSC). This international body, after appropriate consultation with all countries involved, issues changes to the rules.

ACIR
Advisory Commission on Intergovernmental Relations. http://www.library.unt.edu/gpo/ACIR/
The ACIR is a permanent, independent, bipartisan agency that was established under U.S. Public Law 86-380 in 1959 to study and consider the federal government's intergovernmental relationships and the nation's intergovernmental machination.

AMICO
Art Museum Image Consortium. http://www.amn.org/AMICO/
The AMICO is a not-for-profit organization of institutions with collections of art, collaborating to enable educational use of museum multimedia.

ANR
Access to Network Resources. http://ukoln.ac.uk/elib/lists/anr.html
ANR is part of the Electronic Libraries Programme (eLib), which was established by the Joint Information Systems Committee (JISC) (U.K.). The main aim of the eLib programme, through its projects, is to engage the Higher Education community in developing and shaping the implementation of the electronic library.

ANSI
American National Standards Institute. http://www.ansi.org/default.asp
ANSI is a private, non-profit organization founded on 18 October 1918.
The Institute's mission is to enhance both the global competitiveness of U.S. business and the U.S. quality of life by promoting and facilitating voluntary consensus standards and conformity assessment systems, and safeguarding their integrity.Creative Archiving at Michigan and Leeds: Emulating the Old on the New.CAMiLEON CEDARS CDWA CIMI CURL DC DOI FGDC GILS ISO MARC METS 10 of 18http://www.si.umich.edu/CAMILEON/CAMiLEON is a research project that is investigating emulation as a digital preservation strategy. The project is a collaborative effort of researchers at the School of Information, University of Michigan (USA) and the University of Leeds (U.K.). CURL Exemplars in Digital Archives, U.K.http://www.leeds.ac.uk/cedars/ CEDARS began in April 1998 and ended in March 2002. Its broad objective was to explore digital preservation issues. These range through acquiring digital objects, their long-term retention, sufficient description, and eventual access. Categories for the Description of Works of Arthttp://www.getty.edu/gri/standard/cdwaCDWA is a product of the Art Information Task Force (AITF), which encouraged dialog between art historians, art information professionals, and information providers so that together they could develop guidelines for describing works of art, architecture, groups of objects, and visual and textual surrogates. The Categories describe the content of art databases by articulating a conceptual framework for describing and accessing information about objects and images. They also provide a framework to which existing art information systems can be mapped and upon which new systems can be developed. Computer Interchange of Museum Informationhttp://www.cimi.org CIMI is a consortium of cultural heritage institutions and organizations that work together to bring rich cultural information to the widest possible audience. 
CURL: Consortium of University Research Libraries, U.K. http://www.curl.ac.uk/
CURL's main objective is to promote, maintain and improve library resources for research, learning and teaching in research-led universities in the U.K.

DC: Dublin Core. http://purl.oclc.org/dc/
The DC Metadata Initiative is an open forum engaged in the development of interoperable online metadata standards that support a broad range of purposes and business models. Its activities include consensus-driven working groups, global workshops, conferences, standards liaison, and educational efforts to promote widespread acceptance of metadata standards and practices. See also OCLC.

DOI: Digital Object Identifier. http://www.doi.org/
The DOI is a system for identifying and exchanging intellectual property in the digital environment. It provides a framework for managing intellectual content, for linking customers with content suppliers, for facilitating electronic commerce, and enabling automated copyright management for all types of media.

FGDC: Federal Geographic Data Committee. http://www.fgdc.gov/
The FGDC coordinates the development of the National Spatial Data Infrastructure (NSDI). The NSDI encompasses policies, standards, and procedures for organizations to cooperatively produce and share geographic data.

GILS: Government Information Locator Service. http://www.access.gpo.gov/su_docs/gils/
The GILS is an effort to identify, locate, and describe publicly available Federal and state information resources. GILS is a decentralized collection of agency-based information locators using network technology and international standards to direct users to relevant information resources within the Federal Government.

ISO: International Organization for Standardization. http://www.iso.ch/iso/en/ISOOnline.frontpage
The ISO is a worldwide federation of national standards bodies from some 140 countries, established in 1947.
The mission of ISO is to promote the development of standardization and related activities in the world with a view to facilitating the international exchange of goods and services, and to developing cooperation in the spheres of intellectual, scientific, technological and economic activity.

MARC: Machine Readable Cataloging. http://www.loc.gov/marc/
The MARC formats are ANSI/NISO (Z39.2) standards for the representation and communication of bibliographic and related information in machine-readable form.

METS: Metadata Encoding and Transmission Standard. http://www.loc.gov/standards/mets/
The METS schema is a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library, expressed using the XML schema language of the World Wide Web Consortium. The standard is maintained in the Network Development and MARC Standards Office of the Library of Congress, and is being developed as an initiative of the Digital Library Federation.

MOAII: The Making of America II. http://sunsite.berkeley.edu/MOA2/
The Making of America II is a Digital Library Federation project to create a proposed digital library object standard by encoding defined descriptive, administrative and structural metadata, along with the primary content, inside a digital library object.

NISO: National Information Standards Organization. http://www.niso.org/
NISO, founded in 1939 as a non-profit association accredited by the American National Standards Institute (ANSI), identifies, develops, maintains, and publishes technical standards to manage information in our changing and ever-more digital environment.
NISO standards apply both traditional and new technologies to the full range of information-related needs, including retrieval, re-purposing, storage, metadata, and preservation.

NLA: National Library of Australia. http://www.nla.gov.au/preserve/
The NLA is among the pioneer institutions in digital preservation research. Its preservation activities page provides links to its various initiatives and projects. The set of preservation metadata developed by NLA is an invaluable resource.

OAIS: Open Archival Information System. http://www.ccsds.org/RP9905/
The OAIS Reference Model has been of great value in providing a comprehensive and consistent frame of reference that encompasses many of the issues surrounding the creation of digital repositories. See also CCSDS.

OCLC: Online Computer Library Center. http://www.oclc.org
OCLC is a nonprofit membership organization serving 41,000 libraries in 82 countries and territories around the world. Its mission is to further access to the world's information and reduce library costs by offering services for libraries and their users. OCLC is the leading global library cooperative, helping libraries serve people by providing economical access to knowledge through innovation and collaboration.

PANDORA: Preserving and Accessing Networked Documentary Resources of Australia. http://pandora.nla.gov.au/index.html
PANDORA is an archive of the National Collection of Australian Online Publications, copied with the publisher's permission, preserved, and made available for the future.

RDF: Resource Description Framework. http://www.w3.org/RDF/
The Resource Description Framework (RDF) integrates a variety of applications, from library catalogs and world-wide directories to syndication and aggregation of news, software, and content to personal collections of music, photos, and events, using XML as an interchange syntax.
The RDF specifications provide a lightweight ontology system to support the exchange of knowledge on the Web.

RLG: The Research Libraries Group. http://www.rlg.org
RLG is a not-for-profit membership corporation of over 160 universities, national libraries, archives, historical societies, and other institutions with remarkable collections for research and learning. Rooted in collaborative work that addresses members' shared goals for these collections, RLG develops and operates information resources used by members and nonmembers around the world.

SGML: Standard Generalized Markup Language. http://www.w3.org/MarkUp/SGML/
SGML is an international standard for the definition of device-independent, system-independent methods of representing texts in electronic form. See also XML.

TRAIL: Texas Records and Information Locator Service. http://www.tsl.state.tx.us/trail/
TRAIL provides access to Texas State government information contained in electronic publications. TRAIL facilitates ready access to the information for Texas citizens and other users.

VRA: Visual Resources Association. http://www.vraweb.org/
The VRA is a firmly established association with over 600 active members, devoted to advancing knowledge, research, and education in the field of visual information resources.

XML: Extensible Markup Language. http://www.w3.org/XML/
XML is the universal format for structured documents and data on the Web. It is a simple, very flexible text format derived from SGML (ISO 8879). XML is also playing an increasingly important role in the exchange of a wide variety of data on the Web.
The base specifications are XML 1.0, W3C Recommendation February 1998, and Namespaces, January 1999.

W3C: World Wide Web Consortium. http://www.w3.org/
The W3C develops interoperable technologies (specifications, guidelines, software, and tools) to lead the Web to its full potential as a forum for information, commerce, communication, and collective understanding.

Z39.50: http://www.loc.gov/z3950/agency/
Z39.50 refers to the International Standard, ISO 23950: "Information Retrieval (Z39.50): Application Service Definition and Protocol Specification".

Appendix 2. Draft Preservation Metadata Elements

Columns: Element's Name and Origin; Sub-Element; Definition; Purpose; Example; Note/Comments/Remarks.

Digital Resource Description

Title
Origin: OAIS (Reference Information)
Definition: This element will record the name given to the resource. Typically, a Title will be a name by which the resource is formally known.
Example: Texas Register: Volume 27 Number 8
Note: Any form of the title used as an alternative to the formal title of the resource.

Creator
Origin: OAIS (Reference Information)
Definition: This element will record the entity primarily responsible for making the content of the resource.
Example: The Office of the Texas Secretary of State.
Note: An Author or Creator may include a person, an organization or a service.

Date
Definition: The date the resource was created or became available in its current form, or the date that it was last modified. Therefore, the following qualifiers may be used:
- created: creation date of the resource
- modified: modification date of the resource
- issued: date on which the resource was made formally available
Example: 2002-02-25
Note: This element will record a date associated with an event in the life cycle of the resource. Recommended best practice for encoding the date value is defined in a profile of ISO 8601 [W3CDTF] and follows the YYYY-MM-DD format. If the full date is unknown, month and year (YYYY-MM) or just year (YYYY) may be used.

Persistent Identifier
Origin: OAIS (Reference Information); NLA (Persistent Identifier)
Definition: An identifier or 'permanent name' for a digital resource that identifies it uniquely and persistently.
Purpose: An identifier or 'permanent name' for an object that identifies it uniquely, enables links to metadata about it, and to other objects related to it.
Example: URL, ISBN, ISSN, File-Name, ...
Sub-element: Other (Metadata) Identifier
  Definition: This element will record different kinds of identifiers such as Local Control Numbers.
  Purpose: These identifiers will serve as pointers to the metadata information in the local system.
  Example: III catalogue address, OCLC no, ...

Original Content Type
Origin: Cedars (Object Origin)
Definition: Contains a description of the original information item from which the digitized content was produced.
Purpose: It describes the physical nature of the original resource prior to being created (or modified) as the current digital form.
Note: Use this element if the original is not born digital or is different from the current format.

Relation
Origin: NLA (Relationships); OAIS - Context Information (Relation)
Definition: Specifies any other information resources that were judged to be significantly related to the digital resource being described and necessary for preservation management.
Purpose: It is essential to maintaining a history of the change of a related digital information source. It also enables a resource to be linked to earlier or later editions of it, other forms of it, to its metadata, and other objects, including finding aids.
Example:
- Is part of a higher aggregation, e.g., this is part (section) of [Identifier]
- Contains the lower component (repeatable), e.g., contains [Unique Identifier]
- Relation to the primary digital resources, e.g., this is the html version of [Unique Identifier of primary digital resources]
- Related to accompanying digital information resources, e.g., accompanied by map [Unique Identifier of the accompanying information resources]
- Linked to previous and/or following in a migration sequence, e.g., was migrated from or to [Unique Identifier]

Structural type
Origin: NLA
Definition: Class of digital object represented by the digital resource.
Purpose: Choice of appropriate preservation strategy depends on knowing structural type.
Example: Still image, sound, text, data base, Web document, executable program, etc. (List of MIME types may serve as a useful reference).

Technical infrastructure (of complex digital resource)
Origin: NLA
Definition: Internal structure of complex digital resource: i.e. an enumeration of the components of a complex object, along with their interrelationships.
Purpose: Managing preservation requires managing the structure of complex digital resources as well as their components.
Example:
- e.g. 1: Web page (consists of one ASCII HTML file, along with three embedded static GIF files and one embedded audio WAV file)
- e.g. 2: CD-ROM containing 22 files (14 .gif image files, 3 .wav audio files, 3 .txt files and 2 .exe executables) assembled in accordance with ISO 9660.

File description
Origin: NLA
Sub-elements: File-format and version, resolution, dimensions in pixels; color palette; compression, other-info.
Definition: Technical specifications of the digital resource(s) or file(s) comprising a content Data Object.
Purpose: Describe type-specific metadata essential for managing preservation.
Example: e.g. Image (TIFF v.4.0, 600 dpi, dimensions in pixels; color palette; compression algorithms).
Note: This metadata should apply to file formats which are used to directly render or access content, rather than file formats which are used for storage convenience (e.g., ZIP or TAR files). Depending on the local requirements and type of DR, it can be broken down further for various classes of DR.

Installation requirements*
Origin: NLA
Definition: Any specialized procedures needed to install an object.
Purpose: Enables access to objects with special installation requirements.
Example: DR is in the form of a ZIP file, which must be unpacked and stored on local hard drive in a specified directory tree prior to use; computer must be re-booted after installation, etc.
Note: *Information pertaining to IR (with System Requirements) may be placed in the Documentation linked to the DR.

File-Size
Origin: RLG/OCLC (size); NLA (storage Information)
Definition: Describes details of the storage requirements of the digital resources (in bytes).
Purpose: Necessary for managing the object within the archive system. E.g. migration of storage media to CD-ROM might require this information, since standard CD-ROMs have a maximum capacity of 700 MB.
Example: Size of DR: 1.4 MB.
Note: This element is important for dissemination purposes as some versions of Windows cannot accept files greater than 2 GB.

Access Inhibitors*
Origin: NLA
Definition: Description of any features of the digital resources intended to inhibit access.
Purpose: Without this information the DR may not be able to be accessed, copied or migrated.
Example: Encryption, watermarking, digital signature, password protection, etc.
Note: This information may be placed in the Documentation linked to the DR.

Access Facilitators*
Origin: NLA
Definition: Description of any system or method used to enhance access to information within the content of the digital resource, which needs to be maintained in successive generations.
Purpose: Enable the aids and facilitators to be taken into account in any preservation process.
Example: Time markers in audio or video files, navigational links in a hypertext document, CD type ID points linked to file, Metadata description, etc.
Note: Information about Access Facilitators can be placed in the Documentation linked to the DR.

Exception
Origin: NLA (Quirks)
Definition: Any characteristic that may appear as a loss in functionality or change in the look and feel of the digital resource resulting from the preservation processes and procedures.
Purpose: Help to assess the success (or otherwise) of preservation strategies, and prevent time being spent on trying to solve problems that were inherent in the object at the time the strategy was applied.
Example:
- Web page: has been migrated from HTML to PDF (as a result, hyperlinks are broken; embedded JavaScript application no longer functional).
- The shockwave files could not be captured from the source document.
Note: This element describes peculiarities or exceptions that occur as a result of digitization, migration, and other processes in the preservation cycle. See also Functionality.

Functionality
Origin: RLG/OCLC
Definition: Description of any functional or "look and feel" attributes of the rendered digital resource (in regards to its current manifestation).
Purpose: Enumerate the set of functional properties exhibited by the digital resource relative to the current stage of the preservation cycle.
Example: Web page: contains an interactive JavaScript application and embedded animations.
Note: See also Exception. Exceptions vs. Functionality: one is the "negative" of the other. Functionality metadata records all of the attributes which still exist in the current instance of the DR. Conversely, the Exceptions (NLA's Quirks) metadata lists all of the attributes which no longer exist as part of the current DR. [I.e. E+F=original (all) attributes].

Documentation
Origin: RLG/OCLC
Definition: Supporting documentation necessary/useful for display and/or interpretation of the digital resource.
Purpose: Link the digital resource to supporting documentation useful for rendering and understanding its content.
Example: Manual, Procedure, Glossary, etc.
Sub-element: Location
  Definition: Location of Documentation.
  Purpose: Link the digital resource to the document.
  Example: URL, File name, etc.

Technical Environment Descriptions

2.1 Software

Access Application (Software Name and version)
Origin: RLG/OCLC (Display/Access Application); CEDARS (partly Render/analyze engine, Input formats, output formats)
Definition: Identification of software program capable of displaying or accessing the content of the digital resource.
Purpose: Translate the archived byte stream into human-readable content.
Example: Internet Explorer 6.0; Adobe Acrobat Reader 4.0, etc.
Note: Specify whether it is the minimum or recommended environment.
Sub-element: Location (RLG/OCLC)
  Definition: Location of the Access Application needed to display and/or access the digital resource's content.
  Purpose: Link digital resource to compatible Display/Access Application.
  Note: This (a description of where the required Access Application can be obtained) may take the form of anything ranging from manufacturer information to a pointer (e.g. URL) to the location where the Access Application can be directly obtained (e.g., via download, or through the archive itself).

Operating System (Name and Version)
Origin: NEDLIB
Definition: Name/designation and version of the Operating System or software platform upon which rendering programs operate.
Purpose: Identify the operating environment used by the rendering programs of the DR and also distinguish between different versions of an operating environment, which could potentially impact the ability to access the DR.
Example: Windows (Windows 3.1, Windows 95, Windows 98, Windows ME), Windows NT, Linux, Apple, Solaris, etc.
Note: According to NEDLIB, for example, Windows NT is a general operating environment, characterized perhaps by a particular look and feel and set of functionality. Windows NT 4.0, however, is a specific implementation of the Windows NT environment. [...] metadata for OS name and OS Version separately.
Sub-element: Location (RLG/OCLC)
  Definition: Location of working copy of the Operating System.
  Purpose: Link DR to compatible Operating System.
  Example: URL to download OS from manufacturer, or from a digital repository holding an archived copy of the OS. Could also include the location of an emulator for this environment.
Sub-element: Documentation Location
  Definition: Location of supporting documentation useful for operation or use of the OS.
  Purpose: Link the OS metadata to supporting documentation useful for operation.
  Example: E.g. URL of Users' Manual, Glossary, etc.

2.2 Hardware

Microprocessor Requirements
Origin: NEDLIB; RLG/OCLC
Definition: Description of Microprocessor specifications necessary to operate the content of the DR's software environment.
Purpose: Ensure that users obtain sufficient processing power to run the software necessary to display the DR.
Example: Could be a general specification (e.g. 333 MHz), or a particular microprocessor (e.g. Intel Pentium II 333 MHz).
Sub-element: Documentation Location
  Definition: Location of supporting documentation useful for operation or use of the Microprocessor.
  Purpose: Link the Microprocessor metadata to supporting documentation useful for operation.
  Example: E.g. URL of Users' Manual, Glossary, etc.

[Storage Requirements]
Origin: NLA (Storage Information); RLG/OCLC
Definition: Description of any permanent storage resources necessary for the operation of the software environment and rendering of the digital resource.
Purpose: Ensure that users obtain sufficient storage resources to display/render the digital resource.
Example: User must have 33 MB of hard disk space free in order to install/run the software or environment.
Sub-element: Documentation Location
  Definition: Location of supporting documentation useful for operation or use of the Microprocessor.
  Purpose: Link the storage metadata to supporting documentation useful for operation.
  Example: E.g. URL of Users' Manual, Glossary, etc.

Peripheral Requirements
Origin: NEDLIB; RLG/OCLC
Definition: Description of additional equipment needed to render/display the digital resource.
Purpose: Describe the complete set of physical resources necessary to access the object's content.
Example: Sound card, speakers, a monitor with a particular resolution, CD-ROM drive, etc.
Sub-element: Documentation Location
  Definition: Location of supporting documentation useful for operation or use of Peripherals.
  Purpose: Link the Peripherals metadata to supporting documentation useful for operation.
  Example: E.g. URL of Users' Manual, Glossary, etc.

Location of Hardware
Origin: RLG/OCLC
Definition: Location of the physical devices needed to render the digital resource.
Purpose: Link DR to compatible Hardware Environment.
Example: Description of where the required Hardware Environment can be obtained. This may take the form of anything ranging from contact information for a "technology museum" to the location of emulation programs (perhaps maintained by the UNT itself).
Note: This may take the form of an enumeration of all possible hardware environments. We may choose to describe only a minimum or (recommended) environment.

Alteration history
Origin: OAIS Provenance Information (Modification history); NLA (Process)
Sub-elements: This element will have sub-elements such as:
- Action: describes what was done to change the original digital resource.
- Policy Applied: this element will serve as a pointer to existing policies relating to system processes like migrations.
Definition: This element documents what has happened to a particular digital resource. It describes any changes made, from the time of creation of the digital resource.
Purpose: The series of linked records pertaining to the digital resources builds up a change history over time.
Example: This field will probably store information such as that the element was disintegrated into its integral parts, or a change in format.
Note: All relevant details of any process applied to a digital resource, including specific settings or actions that were required to produce the digital resources. This information is essential to document what preservation methods have been applied to the digital resources and how the various copies or formats of digital resources might differ from each other.

Preservation Metadata Creation Date
Definition: The date on which the preservation metadata record was created.
Example: YYYY-MM-DD format.
Note: We may record the most recent date on which the preservation metadata was updated.

Preservation Metadata Creator
Origin: NLA (Record Creator)
Definition: The names of individuals who have contributed data to this metadata record.
Purpose: This element records responsibility for the metadata creation and/or alteration.
Note: A system-generated log could be one way of recording preservation metadata creator information.

Note
Definition: Any other information relevant to the preservation of the digital resources.
Purpose: This element will serve as a catch-all note field.
Example: Free form text.
Note: Use of this element is not encouraged.

Editorial history
Paper received 17 May 2002; accepted 22 July 2002.

Copyright ©2002, First Monday
Copyright ©2002, Daniel Gelaw Alemneh
Copyright ©2002, Samantha Kelly Hastings
Copyright ©2002, Cathy Nelson Hartman

A Metadata Approach to Preservation of Digital Resources: The University of North Texas Libraries' Experience by Daniel Gelaw Alemneh, Samantha Kelly Hastings, and Cathy Nelson Hartman
First Monday, volume 7, number 8 (August 2002),
URL: http://firstmonday.org/issues/issue7_8/alemneh/index.html
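
As a minimal sketch, assuming a Python representation that the paper itself does not prescribe, a handful of the draft elements in Appendix 2 could be held in a small record type. The class and function names below (PreservationRecord, AlterationEvent, is_w3cdtf_date) and the example.org identifier are hypothetical. Only the element names, the sample values, and the YYYY, YYYY-MM, and YYYY-MM-DD date forms come from the appendix.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Date granularities named in the Date element: YYYY, YYYY-MM, YYYY-MM-DD.
# This checks the shape only; it does not reject dates such as 2002-02-31.
_W3CDTF_DATE = re.compile(r"\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?")


def is_w3cdtf_date(value: str) -> bool:
    """Return True if value has the form YYYY, YYYY-MM, or YYYY-MM-DD."""
    return _W3CDTF_DATE.fullmatch(value) is not None


@dataclass
class AlterationEvent:
    """One entry in the Alteration history element (hypothetical shape)."""
    action: str                           # what was done to the resource
    policy_applied: Optional[str] = None  # pointer to an existing policy


@dataclass
class PreservationRecord:
    """Hypothetical container for a few of the Appendix 2 elements."""
    title: str
    creator: str
    date: str
    persistent_identifier: str
    structural_type: str
    file_size_bytes: int
    alteration_history: List[AlterationEvent] = field(default_factory=list)

    def problems(self) -> List[str]:
        """Return validation problems; an empty list means none were found."""
        found = []
        if not is_w3cdtf_date(self.date):
            found.append(f"date {self.date!r} is not YYYY, YYYY-MM, or YYYY-MM-DD")
        if self.file_size_bytes < 0:
            found.append("file size must not be negative")
        return found


record = PreservationRecord(
    title="Texas Register: Volume 27 Number 8",
    creator="The Office of the Texas Secretary of State",
    date="2002-02-25",
    persistent_identifier="http://example.org/texreg/27/8",  # placeholder URL
    structural_type="Web document",
    file_size_bytes=1_400_000,  # roughly 1.4 MB, assuming decimal megabytes
)
record.alteration_history.append(
    AlterationEvent(action="Migrated from HTML to PDF")
)
```

Keeping the alteration history as a list of linked events mirrors the table's remark that a series of records builds up a change history over time. A real implementation would likely serialize such records to XML or another interchange format.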