Archive Impact

 



A_________________________________

Antonym is a word of opposite meaning. 1 See Associative Relationships

Archive is a central place to store and maintain records and historical materials (regardless of format) created by an organization, government or individual(s).

Associative Indexing is a method of automatic indexing that augments the terms found in documents with related terms obtained from a term association map. A term association map is a vocabulary tool that shows the similarity between terms based on the co-occurrence of the terms in the database documents. 16
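The sketch below shows one way a term association map could be built and used. It assumes documents are plain whitespace-separated words and measures similarity as a simple co-occurrence count (how many documents contain both terms). The function names and the `min_cooccur` threshold are hypothetical; real systems usually use normalized similarity measures.

```python
from collections import Counter
from itertools import combinations


def build_association_map(docs):
    """Count how many documents each pair of terms co-occurs in."""
    pair_counts = Counter()
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        for a, b in combinations(terms, 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def expand_terms(terms, pair_counts, min_cooccur=2):
    """Augment a document's terms with terms that co-occur with them often enough."""
    expanded = set(terms)
    for (a, b), count in pair_counts.items():
        if count < min_cooccur:
            continue
        if a in terms:
            expanded.add(b)
        if b in terms:
            expanded.add(a)
    return expanded


docs = [
    "archive film video",
    "archive film negative",
    "video proxy encoding",
]
assoc = build_association_map(docs)
print(sorted(expand_terms({"film"}, assoc)))  # ['archive', 'film']
```

Lowering `min_cooccur` adds more loosely associated terms, which tends to raise recall at the cost of precision.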

Associative Relationships link terms in a hierarchical thesaurus without being part of the hierarchy. See Antonym, Synonym and Related Term

Audio Logging is the transcribing of the spoken word into text using voice recognition technology. See Text Extraction

Automatic Indexing is the use of algorithms (software) to analyze the contents of records, such as bibliographic entries, and assign keywords that represent the content of the given database entries. The techniques used to determine appropriate keywords from the contents of database entries include phrase detection, thesaural lookup, linguistic analysis, statistical analysis, and term occurrence probabilities. 16
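As a small illustration of the "statistical analysis" technique named above, the sketch below assigns the most frequent non-trivial words of a record as its keywords. The stop list, the `max_keywords` limit and the sample record are hypothetical; production indexers combine many more techniques (phrase detection, thesaural lookup, and so on).

```python
import re
from collections import Counter

# A tiny, hypothetical stop list; real systems use much larger ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"}


def assign_keywords(text, max_keywords=3):
    """Pick the most frequent non-stop-words in a record as its keywords."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    # Counter.most_common orders equal counts by first occurrence.
    return [word for word, _ in counts.most_common(max_keywords)]


record = ("The archive stores film and video. Film negatives and video "
          "proxies are indexed so the archive can retrieve film quickly.")
print(assign_keywords(record))  # ['film', 'archive', 'video']
```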

B_________________________________

Back-file Conversion is an imaging process that converts existing records into a digital format. Reasons for a back-file conversion include space savings, preservation, instant access, cost savings, and legal requirements.

Bandwidth is the rate at which data can be transmitted over a line or network connection. 2

Bit is short for "binary digit" and is the smallest unit of information in a computer system. It has one of two values: on (represented by the number 1), or off (represented by 0).

Bitmap Image is an image composed of bits and displayed on the screen as pixels. Bitmap images lose quality, appearing jagged or blurred, as they are enlarged. A bitmap image is sometimes identified by the extension .bmp on the filename.

Bit Rate is the number of binary digits that pass a given point in a telecommunication network in a given amount of time, usually a second. The term bit rate is a synonym for data transfer speed (or simply data rate). 3 Generally speaking, when used to refer to digitally encoded video files, the higher the bit rate, the higher the quality of the image. A 56 kb/s file is suitable for World Wide Web display or transmission over modem lines. By comparison, a 5 Mb/s file is suitable for VHS-quality video.
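The arithmetic behind these figures is straightforward: size in bytes is bit rate times duration divided by 8, and transfer time is size in bits divided by bandwidth. The sketch below assumes a constant bit rate, decimal prefixes (k = 1,000, M = 1,000,000) and no container or protocol overhead; the function names are hypothetical.

```python
def file_size_bytes(bit_rate_bps, duration_s):
    """Approximate size of a constant-bit-rate stream (8 bits per byte)."""
    return bit_rate_bps * duration_s / 8


def transfer_time_s(size_bytes, bandwidth_bps):
    """Time to send a file over a link, ignoring protocol overhead."""
    return size_bytes * 8 / bandwidth_bps


web_clip = file_size_bytes(56_000, 60)       # one minute at 56 kb/s
vhs_clip = file_size_bytes(5_000_000, 60)    # one minute at 5 Mb/s
print(web_clip)   # 420000.0 bytes (about 0.42 MB)
print(vhs_clip)   # 37500000.0 bytes (37.5 MB)
print(round(transfer_time_s(vhs_clip, 56_000) / 60))  # 89 (minutes over a 56 kb/s modem)
```

One minute of VHS-quality video would take about an hour and a half to download over a 56 kb/s line, which is why low bit rates were used for Web delivery.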

Boolean Search is a search allowing the inclusion or exclusion of documents containing certain words through the use of operators such as AND, NOT and OR. 4
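A minimal sketch of how the three operators map onto set operations over the documents containing each word: AND is intersection, OR is union, NOT is difference. The three-document collection is hypothetical, and tokenizing on whitespace is a simplifying assumption.

```python
# Hypothetical mini-collection: document id -> text.
DOCS = {
    1: "archive of film and video",
    2: "video proxy encoding",
    3: "film negatives in the archive",
}

# Map each word to the set of documents that contain it.
index = {}
for doc_id, text in DOCS.items():
    for word in text.lower().split():
        index.setdefault(word, set()).add(doc_id)

all_docs = set(DOCS)


def docs_with(word):
    return index.get(word.lower(), set())


print(sorted(docs_with("archive") & docs_with("film")))    # archive AND film -> [1, 3]
print(sorted(docs_with("film") | docs_with("proxy")))      # film OR proxy -> [1, 2, 3]
print(sorted(docs_with("archive") - docs_with("video")))   # archive NOT video -> [3]
print(sorted(all_docs - docs_with("video")))               # NOT video -> [3]
```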

Broader Term (BT or B) is a term in a subject thesaurus or indexing system which includes as a more specific subclass the descriptor or subject heading under which it is listed (example: "Libraries" listed under "School libraries"). A descriptor or subject heading may have more than one broader term. Compare with narrower term and related term. 5 See Hierarchical Relationships

Byte is a string of eight bits, which is the number needed to store one character, such as a letter or a digit, in a single-byte encoding such as ASCII. It is also the standard unit for measuring file size.

C_________________________________

CD-ROM (Compact Disc Read-Only Memory) is a type of optical disc that stores up to about 650 megabytes (MB) of data. The designation "read-only" means that once the disc is stamped with information, its contents can be read but not written over.

CGM (Computer Graphics Metafile) is the ANSI (American National Standards Institute) standard for vector and bitmap images.

CMYK (Cyan, Magenta, Yellow, and Black) is a subtractive color model used for printing on paper.

Case Sensitive [Sensitivity] is a program's ability to distinguish between uppercase (capital) and lowercase (small) letters. Programs that distinguish between uppercase and lowercase are said to be case sensitive. A case-sensitive program that expects you to enter all commands in uppercase will not respond correctly if you enter one or more characters in lowercase. It will treat the command RUN differently from run. Programs that do not distinguish between uppercase and lowercase are said to be case insensitive. 6
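The RUN/run example can be shown directly in Python, where string comparison with `==` is case sensitive and folding both sides with `str.casefold()` gives a case-insensitive comparison.

```python
command = "run"

# Case-sensitive comparison: "RUN" and "run" are different strings.
print(command == "RUN")                        # False

# Case-insensitive comparison: fold both sides to a common case first.
print(command.casefold() == "RUN".casefold())  # True
```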

Cataloging is the "preparation of bibliographic records which entails recording descriptions and determining all points of access to the record." Cataloging rules are spelled out in the Anglo-American Cataloguing Rules, second edition (AACR2). MARC (machine-readable cataloging) format records are the standard for recording electronic bibliographic data. 23

Chain Indexing is a subject categorization scheme in which terms describing objects (typically documents) are linked or chained together on the basis of a set of rules in a hierarchical relationship. 16

Classification is the process of grouping like terms into classification groups or classes. The classes may exhibit a variety of properties, such as monothetic, polythetic, exclusive, overlapping, ordered, and unordered. In textual systems, one generally classifies either the documents into classification groups or the keywords into like groups. 16

Closed Captioning is a service for persons with hearing disabilities that translates television program dialog[ue] into written words on the television screen. 7 Closed captioning is not visible without a specially installed decoder. See Text Extraction

Clustering is the grouping of items in a database so that members of a cluster exhibit similarities to each other and dissimilarities to other clusters. In retrieval systems, the items in a cluster are often retrieved together in response to a query. For each cluster a composite item, called the centroid, can be generated that represents the cluster and is used as the basis for retrieval of the cluster. There are two classes of clustering methods: hierarchical methods (such as Ward's method and single-link methods), which produce a nested set of clusters, and non-hierarchical or partitioning methods (such as k-means), which produce a single layer of clusters. Clusters may overlap (i.e., some items may occur in more than one cluster). 16
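The sketch below illustrates a partitioning method (k-means) and centroid-based retrieval. Two-dimensional points stand in for document vectors, the first k points are used as starting centroids (real implementations often use random or k-means++ seeding), and a query retrieves the cluster whose centroid is nearest. All names and data are hypothetical.

```python
import math


def mean(points):
    """Centroid: the coordinate-wise mean of a cluster's members."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, iterations=10):
    # Naive seeding: use the first k points as initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters


points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
centroids, clusters = kmeans(points, k=2)
query = (9.0, 9.0)
print(nearest(query, centroids))  # 1: retrieve the cluster whose centroid is closest
```

Note that k-means assigns each item to exactly one cluster, so it does not produce the overlapping clusters mentioned above.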

Concordance, or inverted list, is a data structure for indexing textual data records by the substantive terms or keywords associated with each record. The inverted list is an index for each keyword containing the location of each occurrence of the keyword in the database. 16
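A minimal sketch of an inverted list that stores the location of each occurrence as a (record id, word position) pair. It assumes whitespace tokenization and no stop list; the record ids and texts are hypothetical.

```python
from collections import defaultdict


def build_inverted_list(records):
    """Map each keyword to a list of (record id, word position) occurrences."""
    inverted = defaultdict(list)
    for rec_id, text in records.items():
        for pos, word in enumerate(text.lower().split()):
            inverted[word].append((rec_id, pos))
    return dict(inverted)


records = {
    "r1": "moving image archive",
    "r2": "archive of moving image documents",
}
inverted = build_inverted_list(records)
print(inverted["archive"])  # [('r1', 2), ('r2', 0)]
```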

Conservation is the treatment of archival materials to stabilize them chemically or strengthen them physically, sustaining their survival as long as possible in their original form. 23

Content is the digital representation of a multimedia object. Content may be a still, moving or 3-D image; a text document; an audio file or a compound document (compound documents consist of more than one page, or combine video and audio content). Information about the physical or intellectual characteristics of content is called metadata.

Content Attribute is a physical characteristic of a multimedia object. An attribute may include physical dimension, pattern, chroma, file format, resolution, color mode, and/or compression.

Controlled Vocabulary is a listing of terms that specifies which terms may be used for indexing and the relationships among them, produced by the process of vocabulary control. 8

Coordinate Indexing is a method of post-coordinate indexing based on the assignment of keywords that capture the concepts covered by a document. These keywords (or word symbols) are then used as the basis for subsequent retrieval of the records. 16

Copyright is the exclusive right to publish or sell a book, composition, photograph, work of art, software program, etc. This right is granted by the government for a defined period of time.

D_________________________________

DPI (Dots-Per-Inch) is a measure of the resolution. It can refer to a printer, scanner, or monitor. It is the number of dots in a one-inch line. The more dots per inch, the higher the resolution.
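To make the dpi arithmetic concrete: the number of dots across a scan is the width in inches times the dpi, and likewise for the height. The sketch assumes a US letter page (8.5 x 11 inches) and, for the size estimate, 24-bit color (3 bytes per pixel) with no compression; the function names are hypothetical.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Columns x rows of dots for a page scanned at a given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(columns, rows, bytes_per_pixel=3):
    """Raw size assuming 24-bit color (3 bytes per pixel) and no compression."""
    return columns * rows * bytes_per_pixel


cols, rows = pixel_dimensions(8.5, 11, 300)   # a US letter page at 300 dpi
print(cols, rows)                             # 2550 3300
print(uncompressed_bytes(cols, rows))         # 25245000 (about 25 MB)
```

Because dpi applies in both directions, doubling it from 300 to 600 quadruples the number of dots and the uncompressed file size.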

Database is a structured collection of data, managed by software so that its order, retrieval, input, output, etc. can be manipulated.

Digital Archive is an ideal: a central database and storage facility for all formats and media of an organization's institutional knowledge and history.

Digitize is to make digital. Digitization converts paper or analog documents into digital (computer-readable) format.

E _________________________________

Encoding is the process by which analog tape is converted into a digital file. The file can be stored in a number of different formats: RM is the standard format for RealMedia files, and MPEG is another standard. Time code is converted from SMPTE time code into proxy time code. SMPTE time code from videotape counts 30 frames per second, while proxy time code is measured in milliseconds. This difference can create a number of difficulties when indexing proxy files.
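The conversion between the two time codes can be sketched as follows. This assumes non-drop-frame time code in the form HH:MM:SS:FF at exactly 30 frames per second, as the entry states; NTSC video actually runs at about 29.97 frames per second and often uses drop-frame time code, which this sketch does not handle. The function names are hypothetical.

```python
FPS = 30  # nominal rate from the entry; drop-frame time code is not handled


def smpte_to_ms(timecode, fps=FPS):
    """Convert non-drop-frame 'HH:MM:SS:FF' time code to milliseconds."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    total_frames = ((hours * 60 + minutes) * 60 + seconds) * fps + frames
    return round(total_frames * 1000 / fps)


def ms_to_smpte(ms, fps=FPS):
    """Convert milliseconds back to the nearest whole frame of time code."""
    total_frames = round(ms * fps / 1000)
    frames = total_frames % fps
    total_seconds = total_frames // fps
    hours, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


print(smpte_to_ms("00:01:30:15"))  # 90500
print(ms_to_smpte(90500))          # 00:01:30:15
print(smpte_to_ms("00:00:00:01"))  # 33 (one frame is about 33.3 ms)
```

The last line hints at one of the indexing difficulties: frame boundaries do not fall on whole milliseconds, so values must be rounded when converting between the two systems.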

F_________________________________

Face Recognition is a type of pattern recognition specific to identifying human faces.

Faceted Classification is a style of pre-coordinate subject description (typified by Universal Decimal Classification and Ranganathan's Colon Classification) which provides a flexible system for generating controlled vocabulary subject classification. Techniques and guiding principles are used to build up the vocabulary and the relationships among the terms of the vocabulary rather than a hard and fast classification scheme of subject headings. 16

Fuzzy Logic is a multi-valued (as opposed to binary) logic developed to deal with imprecise or vague data. Classical logic holds that everything can be expressed in binary terms: 0 or 1, black or white, yes or no; in terms of Boolean algebra, everything is in one set or another but not in both. Fuzzy logic allows for partial membership in a set: values between 0 and 1, shades of gray, and maybe. It introduces the concept of the "fuzzy set." 9
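A short sketch of partial membership in a fuzzy set, using the standard (Zadeh) operators: AND as minimum, OR as maximum, NOT as one minus the value. The fuzzy set "tall" and its 160 cm to 190 cm ramp are hypothetical choices for illustration.

```python
def tall_membership(height_cm):
    """Degree (0 to 1) to which a height belongs to the fuzzy set 'tall'.

    Hypothetical linear ramp: 0 at or below 160 cm, 1 at or above 190 cm.
    """
    if height_cm <= 160:
        return 0.0
    if height_cm >= 190:
        return 1.0
    return (height_cm - 160) / 30


# Standard (Zadeh) fuzzy operators.
def fuzzy_and(a, b):
    return min(a, b)


def fuzzy_or(a, b):
    return max(a, b)


def fuzzy_not(a):
    return 1.0 - a


t = tall_membership(175)           # 0.5: partly "tall", partly "not tall"
print(t, fuzzy_not(t))             # 0.5 0.5
print(fuzzy_and(t, fuzzy_not(t)))  # 0.5, whereas classical "A AND NOT A" is always 0
```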

Fuzzy Logic Search is a search that matches close approximations of words. These can include matches for misspelled words or for words hidden within other words. For example, a search for position could also retrieve positioning, positioned, or possibly positions. 10
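One common way to find close approximations is edit (Levenshtein) distance: the minimum number of single-character insertions, deletions and substitutions that turn one word into another. The sketch below is an illustration of that technique only; it is not necessarily how any particular search engine implements fuzzy search, and the `max_distance` threshold and vocabulary are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def fuzzy_search(query, vocabulary, max_distance=2):
    """Return vocabulary words within max_distance edits of the query."""
    return [w for w in vocabulary
            if edit_distance(query.lower(), w.lower()) <= max_distance]


vocabulary = ["position", "positions", "positioned", "positioning", "portion", "potion"]
print(fuzzy_search("positon", vocabulary, max_distance=1))  # ['position']
```

Edit distance alone treats positioning as three edits away from position, so search systems often combine it with truncation or stemming to catch word endings.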

G_________________________________

GIF (Graphics Interchange Format) is a bitmap file format for graphics sometimes identified by the .gif extension on the filename. GIF images support up to 256 colors.

H ________________________________

Hierarchical Relationships are a structured organization of controlled vocabulary terms in association with other terms or categories of terms. An example of successively narrower terms is: Animals; Mammals; Dogs; Working Dogs; German Shepherds. See Narrower Term and Broader Term

I _________________________________

Imaging is a process, not a product. It combines technology and information management to create a digital archive.

Imaging Technology is the hardware and software needed to create digital files from analog material. The technology is divided into input devices and output devices.

Index is an auxiliary data structure used to speed up access to a data set (e.g., a file of records) in which a pointer to each record of the data set is stored. The pointers in the index are accessed on the basis of a key value of each record. The index may actually contain the key values and the pointers, or the key may be used to generate the address of the pointer in the index, perhaps by hashing. The indexing can be used both to provide an order to the data records and to provide direct access to records in the data set. 16 An index is NOT a list of all the terms in a data set. See Concordance
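The sketch below builds such an index over a small in-memory "file" of records: each key value maps to the byte offset (the pointer) where its record starts, so a record can be read directly instead of scanning the whole file. Python's `dict` is itself a hash table, so the key lookup is done by hashing. The record layout and keys are hypothetical.

```python
import io

# A hypothetical "file of records", one record per line.
data = b"A001,Antonym\nA002,Archive\nB001,Bandwidth\n"
records_file = io.BytesIO(data)

# Build the index: key value -> byte offset (the pointer) of its record.
index = {}
offset = 0
for line in data.splitlines(keepends=True):
    key = line.split(b",", 1)[0].decode()
    index[key] = offset
    offset += len(line)


def fetch(key):
    """Use the index to seek directly to a record instead of scanning the file."""
    records_file.seek(index[key])
    return records_file.readline().decode().rstrip("\n")


print(fetch("B001"))   # B001,Bandwidth
print(sorted(index))   # keys in order: ['A001', 'A002', 'B001']
```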

Input (Imaging) Hardware includes the scanner, monitor/display device, computer network or PC, and storage device. Imaging software comprises capture software that works with the scanner, Optical Character Recognition software, a system (which is also connected to the Information Management system), and compression software.

J _________________________________

JPEG (Joint Photographic Experts Group) is a graphics storage format identified by the extension .jpg on the filename. The JPEG format uses lossy compression, a compression technique that discards some image data.

JPEG 2000 is the ISO (International Organization for Standardization) answer to the need for an integrated standard format. The proposed format encapsulates the features of the Digital Imaging Group's FlashPix format: resolution independence, size independence, metadata, and an unambiguous color model.

K_________________________________

KWIC (KeyWord In Context) is a simple printed index for textual material in which keywords in the text are sorted alphabetically and presented linearly, surrounded by portions of the preceding and following text for context. 16

KWOC (KeyWord Out of Context) is a simple printed index for textual material in which keywords found in the text are sorted alphabetically and presented linearly, followed by the original string for context. Sometimes the keyword is replaced by a character such as "^" in the context string. 16
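Both KWIC and KWOC indexes can be generated mechanically from titles, as in the sketch below. The stop list, the three-word context width for KWIC, the column widths and the sample titles are hypothetical choices; the KWOC version replaces the keyword with "^" as described above.

```python
# Hypothetical stop list: words that never become index entries.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the"}


def kwic(titles, width=3):
    """KWIC: each keyword shown in place, with up to `width` words on each side."""
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in STOP_WORDS:
                continue
            before = " ".join(words[max(0, i - width):i])
            after = " ".join(words[i + 1:i + 1 + width])
            # Right-align the preceding text so the keywords line up in a column.
            line = f"{before:>30} [{word}] {after}".rstrip()
            entries.append((word.lower(), line))
    return [line for _, line in sorted(entries)]


def kwoc(titles, marker="^"):
    """KWOC: keyword pulled out front, then the full title with the keyword replaced."""
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in STOP_WORDS:
                continue
            context = " ".join(words[:i] + [marker] + words[i + 1:])
            entries.append((word.lower(), f"{word.upper():<14}{context}"))
    return [line for _, line in sorted(entries)]


titles = ["Preservation of Moving Images", "Moving Image Archives"]
for line in kwic(titles):
    print(line)
for line in kwoc(titles):
    print(line)
```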

Keyword is (1) a word used in a text search, (2) a word in a text document that is used in an index to best describe the contents of the document, and (3) a reserved word in a programming or command language. 11

Key Frame is a snapshot image taken at a scene change. Key frames may be generated automatically, at specified intervals or at scene changes, or manually by the user. A series of consecutive key frames is called a storyboard.

L _________________________________

Latent Semantic Index is an approach to automatic indexing that is based on the assumption of an underlying association or correlation of the terms used in documents, and the content of the documents for retrieval purposes. Most of the techniques used to determine the relevant associations begin with the term occurrences from which term similarities and term associations can then be calculated. 16

Licensing is the act of giving formal legal permission to reproduce or sell a copyrighted work.

Lossy Compression is a compression technique that permanently discards some data during compression, so the restored file is not identical to the original.
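The difference between lossless and lossy compression can be shown in a few lines. The lossless half uses Python's `zlib` module, which restores the bytes exactly; the lossy half is a toy quantization that stores coarser values and cannot recover the discarded detail. The quantization is only an illustration of the principle, not the actual JPEG or MPEG algorithm, and the sample values are hypothetical.

```python
import zlib

samples = [12, 15, 14, 200, 203, 199, 14, 13]  # e.g., pixel brightness values
raw = bytes(samples)

# Lossless: zlib (DEFLATE) restores the data exactly.
restored = zlib.decompress(zlib.compress(raw))
print(list(restored) == samples)   # True

# Lossy (toy example): quantize to multiples of 10, keeping fewer distinct values.
STEP = 10
quantized = [round(v / STEP) for v in samples]  # what would be stored
approx = [q * STEP for q in quantized]          # what is restored
print(approx)                      # [10, 20, 10, 200, 200, 200, 10, 10]
print(approx == samples)           # False: the discarded detail cannot be recovered
```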

M ________________________________

MPEG (Moving Picture Experts Group) is a lossy compression format for digital video. It can be identified by the file extension .mpg.

MPEG-1 is a coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s…. the standard on which such products as Video CD and MP3 are based…. 13

MPEG-2 is designed to produce higher quality images at higher bit rates (e.g., 720x485 studio quality CCIR-601 images at up to 15 Mbits/sec). 12

MPEG-7 is the content representation standard for multimedia information search, filtering, management and processing (to be approved July 2001). 13

Mapping [field] is used to identify where metadata will be placed during data migration. The contents of a field in one database are designated for placement into a specified field in another database. The fields may or may not have matching names.
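A field map can be expressed as a simple lookup table from source field names to target field names, as in the sketch below. All field names and the sample record are hypothetical; unmapped fields are set aside for review rather than silently dropped, which is one reasonable design choice among several.

```python
# Hypothetical field names for a source and a target database.
FIELD_MAP = {
    "Title": "title",
    "Photographer": "creator",
    "Date Taken": "date_created",
    "Format": "format",          # same name, different case
}


def migrate(record, field_map):
    """Place each mapped source field into its designated target field."""
    migrated = {}
    unmapped = {}
    for field, value in record.items():
        if field in field_map:
            migrated[field_map[field]] = value
        else:
            unmapped[field] = value   # flag for review rather than silently dropping
    return migrated, unmapped


source = {"Title": "Harbor at dusk", "Photographer": "J. Smith",
          "Date Taken": "1962-08-14", "Shelf": "B12"}
print(migrate(source, FIELD_MAP))
```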

Media Archive is a type of archive that specializes in storing and maintaining media materials such as photography, digital files, graphics, sound recordings, video and film, etc.

Media Assets are the digital and traditional media formats (video, files, film negatives, audio, photographic prints, slides, graphic materials) that have resale or reuse value for an institution.

Mediagraphical is any integrated multimedia information system that includes a wide range of data types, such as audio, graphical, textual, and pictorial. 16

Meta is a prefix often used in information science terminology, such as "metadata" and meta knowledge. Meta X means "X about X", so that metadata means "data about data" (e.g., data dictionaries), and meta knowledge means "knowledge about knowledge" (e.g., knowledge structures). 16

Metadata is data that describes, for purposes of retrieval or classification, the overall architecture and format of a document or file. It is used extensively on the World Wide Web. Recently, metadata has been linked with imaging and indexing as a way to create a standard information infrastructure for multimedia databases.

Micron is an icon that contains a moving video clip that allows the user to select from a variety of videos based on a sample of each video. Microns could be used to represent other dynamic data sets such as simulations. 16

Moving Image Document (MID) is a database item defined as a matrix of sequential phenomena (i.e. a string of images plus strings of sounds) that is synchronized by time (a micron). A major difficulty in retrieval of MIDs is the lack of easily defined units, such as the words of a text, that can be used for indexing or abstracting purposes. 16

N _________________________________

Narrower Term (NT or N) is a term in a subject thesaurus or indexing system which is a subclass of the descriptor or subject heading under which it is listed (example: "Picture Books" under "Books"). A descriptor or subject heading may have more than one narrower term. Compare with broader term and related term. 5 See Hierarchical Relationships

Natural Language Processing (NLP) is a range of computational techniques for analyzing and representing naturally occurring texts at one or more levels of linguistic analysis for the purpose of achieving human-like language processing for a range of particular tasks or applications. 14

O_________________________________

Object-oriented Database is a database management system that facilitates the management of objects rather than records. Viewing the data as objects, instead of as records, provides more flexibility in the data types used and removes the need to normalize the data. Since objects can contain other objects as sub-components, these databases can implement inheritance hierarchies. 16

Optical Character Recognition (OCR) is software that works in tandem with the scanner and "recognizes" the characters the scanner "sees" (only letters and numbers in certain fonts) and converts them from the original analog format to digital format.

Output (Imaging) is the final function of the imaging process. Traditionally, output meant printed paper, but in the digital world it has taken on different formats and characteristics. The information can be kept as a digital file, viewed on a monitor, compressed, transferred, or stored. Other output forms of digital information are printed paper, video, photograph, negative, slide, or transparency.

P _________________________________

PCX (Picture Image) is a bitmap file format supported by many programs. It was developed by ZSoft Corporation for its PC Paintbrush program and can sometimes be identified by the file extension .pcx.

POPSI (Postulate-based Permuted Subject Index) is a string indexing algorithm based on Colon Classification, a faceted classification scheme used widely in India. The indexer assigns the appropriate faceted description and the string permutations of that description are generated automatically for the index. 16

PRECIS (Preserved Context Indexing System) is a string indexing system developed by Derek Austin at the British Library in the early 1970s for subject indexing of the British National Bibliography. The terms in the PRECIS string are arranged and connected by relationships found in the original text or context, rather than from a classification scheme. The human indexer chooses the base string (i.e. terms and the relationships) and the subsequent permutations of the base string are performed by a computer. 16

Pattern Recognition is a branch of artificial intelligence concerned with the identification of visual or audio patterns by computers. For the computer to recognize the patterns, the patterns must be converted into digital signals and compared with patterns already stored in memory. Some uses of this technology are in character recognition, voice recognition, handwriting recognition, and robotics. 15 Pattern recognition may also be used in image applications for the analysis of image composition (looks-like-this functionality) or face recognition.

Post-coordinate System is a style of indexing in which the relationships of the indexing terms and database entries are not fixed at the time an entry is added to the database, but rather the user can combine and manipulate the indexing terms at query time. The Boolean combination of keywords drawn from the full text of records that is common to most information retrieval systems is an extreme example of post-coordinate indexing. 16

Pre-coordinate System is an indexing method that establishes, at the time an entry is added to the database, the access points for that entry (typically bibliographic). 16

Preservation is the activity associated with maintaining archival materials for use, either in their original physical form or in some other format. 23

Proximity Operator is an operator sometimes used in conjunction with Boolean queries to approximate the specification of phrases by requiring that given keywords occur within a certain number of words. To process this operator, either a word position must be kept for each occurrence in the inverted file or a scan of the text of the selected records must be made at retrieval time. 16
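The sketch below takes the first approach described above: it keeps a word position for each occurrence in the inverted file and then checks whether two words appear within a given number of words of each other. The `near` function name, the `max_gap` parameter and the sample documents are hypothetical; actual query syntax for proximity operators varies between systems.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Keep a word position for every occurrence: word -> doc id -> positions."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index


def near(index, word1, word2, max_gap):
    """Documents where word1 and word2 occur within max_gap words of each other."""
    hits = set()
    postings1 = index.get(word1, {})
    postings2 = index.get(word2, {})
    for doc_id in postings1.keys() & postings2.keys():
        if any(abs(p1 - p2) <= max_gap
               for p1 in postings1[doc_id] for p2 in postings2[doc_id]):
            hits.add(doc_id)
    return hits


docs = {1: "digital video archive",
        2: "video files are stored in the central archive"}
index = build_positional_index(docs)
print(near(index, "video", "archive", 2))  # {1}
```

A plain Boolean AND of video and archive would return both documents; the proximity requirement narrows the result to the one where the words appear close together.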

Q_________________________________

R ________________________________

RGB (Red, Green, and Blue) is a color scheme for screen display. RGB images can be converted to CMYK for printing.
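A commonly cited naive formula for converting RGB values to CMYK is sketched below: black (K) is taken from the brightest channel, and cyan, magenta and yellow are scaled from what remains. This is an assumption-laden illustration only; professional print workflows convert colors through ICC color profiles for specific inks and papers, which this sketch ignores.

```python
def rgb_to_cmyk(r, g, b):
    """Naive RGB (0-255) to CMYK (0-1) conversion, with no color profile."""
    r, g, b = r / 255, g / 255, b / 255
    k = 1 - max(r, g, b)
    if k == 1:                    # pure black: avoid dividing by zero
        return 0.0, 0.0, 0.0, 1.0
    c = (1 - r - k) / (1 - k)
    m = (1 - g - k) / (1 - k)
    y = (1 - b - k) / (1 - k)
    return c, m, y, k


print(rgb_to_cmyk(255, 0, 0))     # red -> (0.0, 1.0, 1.0, 0.0)
print(rgb_to_cmyk(0, 0, 0))       # black -> (0.0, 0.0, 0.0, 1.0)
```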

Re-purposing is the sharing and reusing of digital and traditional media that was originally intended for graphic designs, ads, manuals, and Web sites. Institutions now recognize the value of their media assets and the need to synthesize their digital images into a collective central digital archive.

Related Term (RT or R) is a term in a subject thesaurus or indexing system which has a close conceptual relationship to the descriptor or subject heading under which it is listed, but is not related hierarchically (example: "Media Specialists" listed under "School Libraries"). A descriptor or subject heading may have more than one related term. Compare with broader term and narrower term. 5 See Associative Relationships

Relevancy is the degree to which a retrieved item is judged by the user to be "about" the topic of the query. Relevancy evaluations tend to be subjective as they depend on the user's perception of the relatedness of the given item to some information need, which may or may not, in fact, have been expressed entirely by the query. Relevancy is used in most measures of retrieval effectiveness such as precision and recall. 16
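Given a user's relevancy judgments, precision and recall are simple ratios: precision is the share of retrieved items that are relevant, and recall is the share of relevant items that were retrieved. The document ids and judgments below are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Precision: share of retrieved items that are relevant.
    Recall: share of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments: 3 of the 4 retrieved items are relevant,
# and the collection holds 6 items the user would consider relevant.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d2", "d4", "d7", "d8", "d9"]
print(precision_recall(retrieved, relevant))  # (0.75, 0.5)
```

Because the relevant set comes from the user's judgments, two users can assign different precision and recall to the same result list, which reflects the subjectivity noted above.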

Resolution is often expressed in dpi (dots per inch). The pixel dimensions of an image are measured as the number of columns by rows. The greater the number of dots per inch, the higher the resolution.

S _________________________________

Search Term is a term expressing an information need or query in the language and format acceptable to the specific system [in an information retrieval system]. Search terms may be combined to form a search statement. 17

String Indexing System is a form of document indexing characterized by a string or set of indexing terms for each entry. The terms within each string are connected by relationships to each other according to a set of rules for the particular scheme. Typically, a basic string is generated by a human indexer, and subsequently manipulated by a computer, to produce a multiple-access index to the documents. 16

Synonym is a word or phrase that has the same or nearly the same meaning as another word or phrase in the same language. 18 See Associative Relationships

T _________________________________

TIFF (Tagged Image File Format) is a format used most often for archiving images. It can sometimes be identified by the file extension .tif. Most Web browsers do not display TIFF images.

Teletext is the textual and graphic information broadcast in the vertical blanking interval between conventional video frames in television signals. It requires a special adapter. 19 See Text Extraction

Text Extraction is the automatic input of associated digital text. This includes closed captioning, teletext, audio logging and the extraction of captions, signs or other written language that appears within an image. See Closed Captioning, Teletext, Audio Logging

Thesaurifacet is an indexing tool that combines the alphabetic access of a thesaurus with the hierarchical access of a faceted classification scheme. The two parts complement each other in that the hierarchical relationships are contained in the classification arrangement, while all other relationships are contained in the thesaurus part. Terms may, however, occur in only one hierarchy of the facet component, but secondary relationships can be included by the thesaural relationships. 16

Thesaurus is a vocabulary tool that provides information about the use of terms, and certain relationships between terms. The relationships normally used between terms include the following: broader term (BT), narrower term (NT), use for (or synonymous) (UF), related term (RT), and use (replace this term with some other). 16 See Thesaurifacet, Classification and Faceted Classification
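The sketch below models a small thesaurus fragment using the hierarchy example from this glossary. Only USE and BT are stored; UF and NT are derived as their reciprocals, which keeps the two directions from drifting out of sync. The non-preferred term "Canines" and all function names are hypothetical.

```python
# A hypothetical thesaurus fragment, using the hierarchy example from this glossary.
USE = {"Canines": "Dogs"}                  # non-preferred term -> preferred term
BT = {                                     # term -> its broader term
    "Mammals": "Animals",
    "Dogs": "Mammals",
    "Working Dogs": "Dogs",
    "German Shepherds": "Working Dogs",
}


def preferred(term):
    """Follow a USE reference from a non-preferred term to the preferred one."""
    return USE.get(term, term)


def use_for(term):
    """UF is the reciprocal of USE: the non-preferred terms that point here."""
    return sorted(t for t, pref in USE.items() if pref == term)


def narrower(term):
    """NT is the reciprocal of BT."""
    return sorted(t for t, broader in BT.items() if broader == term)


def broader_chain(term):
    """All broader terms, from the nearest up to the top of the hierarchy."""
    chain = []
    while term in BT:
        term = BT[term]
        chain.append(term)
    return chain


print(preferred("Canines"))                 # Dogs
print(use_for("Dogs"))                      # ['Canines']
print(narrower("Dogs"))                     # ['Working Dogs']
print(broader_chain("German Shepherds"))    # ['Working Dogs', 'Dogs', 'Mammals', 'Animals']
```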

Truncation is the ability to enter the first part of a keyword [in a search], insert a symbol (usually *), and accept any variant spellings or word endings from the position of the symbol onward (e.g., femini* retrieves feminine, feminism, feminist, etc.). 20 See Wildcard
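Right-hand truncation amounts to a prefix match, as in this sketch. It assumes "*" as the truncation symbol and only handles a symbol at the end of the search term; the function name and vocabulary are hypothetical.

```python
def truncation_search(pattern, vocabulary, symbol="*"):
    """Right-hand truncation: 'femini*' matches every word starting with 'femini'."""
    if not pattern.endswith(symbol):
        return [w for w in vocabulary if w == pattern]   # no truncation: exact match
    stem = pattern[:-len(symbol)]
    return [w for w in vocabulary if w.startswith(stem)]


vocabulary = ["female", "feminine", "feminism", "feminist", "femur"]
print(truncation_search("femini*", vocabulary))  # ['feminine', 'feminism', 'feminist']
```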

U _________________________________

V _________________________________

Value List is a list that provides a form of vocabulary control consisting of pre-determined lists of terms usually used in a drop-down menu. The user selects from these terms instead of typing free text into a field.

Vector Image is an image defined by geometric formulas (such as points, lines, and curves), so it can be reduced or enlarged without losing quality.

Video Proxy is a digitally encoded video file made from an analog or digital original.

Voice Recognition is a technology that allows a user to use his/her voice as an input device. Voice recognition may be used to dictate text into the computer or to give commands to the computer (such as opening application programs, pulling down menus, or saving work). 21 It is also used in audio capture from tape (video or audio) to transcribe speech to text.

W X Y Z ___________________________

Wildcard is a special character that represents one or more other characters. The most commonly used wildcard characters are the asterisk (*), which typically represents zero or more characters in a string of characters, and the question mark (?), which typically represents any one character. 22 Wildcard characters are commonly used in searching. See Truncation
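Python's standard `fnmatch` module uses the same conventions (Unix shell-style "*" and "?"), so it can stand in for a search system's wildcard matching in a quick sketch. Note that `fnmatch` also treats square brackets as character sets, which a given search system may not. The vocabulary is hypothetical.

```python
from fnmatch import fnmatchcase

vocabulary = ["color", "colour", "colors", "colonial", "collar"]

# "*" matches zero or more characters; "?" matches exactly one.
print([w for w in vocabulary if fnmatchcase(w, "colo*r")])  # ['color', 'colour']
print([w for w in vocabulary if fnmatchcase(w, "col?r")])   # ['color']
```

`fnmatchcase` is used instead of `fnmatch.fnmatch` so that matching is case sensitive on every platform.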


SOURCES
__________________________

1.   Merriam-Webster OnLine. 2001. Merriam-Webster Collegiate Dictionary. 12 May 2001
<http://www.merriam-webster.com/dictionary.htm>.

2.   Northern Lights Internet Glossary Page. Northern Lights Technology Inc. 12 May 2001 <http://www.northernlight.com/docs/glossary_help_terms_b.html>.

3.   Whatis?com Definitions Page. TechTarget.com, Inc. 12 May 2001
<http://whatis.techtarget.com/definitionsSearchResults/1,289878,sid9,00.html?query=bit%2Brate> .

4.   Search Engine Glossary Page. Search Engine Watch. 12 May 2001
<http://www.searchenginewatch.com/facts/glossary.html>.

5.   ODLIS: Online Dictionary of Library and Information Science. 2001. Western Connecticut State University Libraries. 12 May 2001

6.   Webopedia Online Encyclopedia. 2001. internet.com. 12 May 2001
<http://webopedia.internet.com/TERM/c/case_sensitive.html>.

7. State of Connecticut Glossary of Telecommunications Terms Page. State of Connecticut Office of Consumer Counsel [OCC]. 12 May 2001
<http://www.occ.state.ct.us/phone/glossary.htm>.

8.   American National Standards Institute. American National Standard for Basic Criteria for Indexes: Z39.4. New York: American National Standards Institute, 1984.

9.   Columbia Encyclopedia Online. 6th ed. 2001. The Columbia Encyclopedia. 12 May 2001
<http://www.bartleby.com/65/fu/fuzzylogi.html>.

10. Cyberconstructors.com Search Engine Glossary Page. cyberconstructors.com. 12 May 2001
<http://www.cyberconstructors.com/glossary_e-f-g-h.htm>.

11.   Techencyclopedia Page. The Complete Language Company Inc. 12 May 2001
<http://www.techweb.com/encyclopedia/defineterm?term=keyword>.

12.   Berkeley MPEG Tools Page. Berkeley Multimedia Research Center. 12 May 2001
<http://bmrc.berkeley.edu/frame/research/mpeg/>.

13. MPEG Home Page. Moving Picture Experts Group. 12 May 2001
<http://www.cselt.it/mpeg/>.

14.   Liddy, Elizabeth D. "Enhanced Text Retrieval Using Natural Language Processing." Bulletin of the American Society of Information Science and Technology. April (1998). 12 May 2001
<http://www.asis.org/Bulletin/Apr-98/liddy.html>.


15.   ComputerUser High-Tech Dictionary Definitions Page. ComputerUser.com, Inc. 12 May 2001
<http://www.computeruser.com/resources/dictionary/definition.html?lookup=5337>.

16.   Watters, Carolyn. Dictionary of Information Science and Technology. San Diego, CA: Academic Press: Boston, 1992.

17.   The ALA Glossary of Library and Information Science. Ed. Heartsill Young. American Library Association: Chicago, 1983.

18.   Cambridge Dictionaries Online. 2001. Cambridge University Press. 12 May 2001
<http://dictionary.cambridge.org/define.asp?key=synonym*1%2B0>.

19.  Glossary of Telecommunication Terms Page. Wiresystems.com. 12 May 2001 <http://www.wiresystems.com/definitons.htm>.

20.  UC Berkeley Library Internet Resources Glossary Page. University of California, Berkeley Library. 12 May 2001
<http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/Glossary.html>.

21.  ATRC Technical Glossary Page. University of Toronto Adaptive Technology Resource Centre [ATRC]. 12 May 2001
<http://www.utoronto.ca/atrc/reference/tech/voicerecog.html#description>.

22.  Whatis?com Definition Page. TechTarget.com, Inc. 12 May 2001
<http://whatis.techtarget.com/definition/0,289893,sid9_gci213935,00.html>.

23.  DePew, John N. A Library, Media, and Archival Preservation Handbook. Santa Barbara: ABC-CLIO, 1991.
