To browse this glossary,
click the corresponding letter of the alphabet below
A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z
A_________________________________ Top
Antonym is a word of
opposite meaning. 1 See Associative
Relationships
Archive is a central
place to store and maintain records and historical materials (regardless
of format) created by an organization, government or individual(s).
Associative Indexing is a method of automatic
indexing that augments the terms found in documents with related
terms obtained from a term association map. A term association map
is a vocabulary tool that shows the similarity between terms based on
the co-occurrence of the terms in the database documents. 16
Associative Relationship demonstrates a link between
terms in a hierarchical thesaurus, but is not
part of the hierarchy. See Antonym, Synonym and Related Term
Audio Logging is the transcribing
of the spoken word into text using voice recognition technology. See
Text Extraction
Automatic Indexing is the use of algorithms
(software) to analyze the contents of records, such as bibliographic
entries, and assign keywords that represent the content of the given
database entries. The techniques used to determine
appropriate keywords from the contents of database
entries include phrase detection, thesaural lookup, linguistic analysis, statistical
analysis, and term occurrence probabilities. 16
B_________________________________ Top
Back-file Conversion is an imaging process that converts records into a digital
format. Reasons for a back-file conversion are space savings, preservation, instant access, cost-savings,
and legal requirements.
Bandwidth is the rate at which data can be
transmitted over a line or network connection. 2
Bit is short for "binary
digit" and is the smallest unit of information in a computer system.
It has one of two values: on (represented by the number 1), or off (represented
by 0).
Bitmap Image is composed
of bits and is displayed on the screen as pixels.
Bitmap images lose resolution as they are enlarged. A bitmap image
is sometimes identified by the extension .bmp on the filename.
Bit Rate is the number of binary
digits that pass a given point in a telecommunication network in a given
amount of time, usually a second. The term bit rate is a synonym for data transfer speed (or simply data
rate). 3 Generally speaking, when used to refer to digitally
encoded video files, the higher the bit rate, the higher the quality
of the image. A 56 kb/s file is suitable for World Wide Web display
or transmission on modem lines. By comparison, a 5 Mb/s file is suitable
for VHS.
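As a rough illustration of what the two example bit rates mean in practice, the sketch below estimates the size of a one-minute clip at each rate. The function name is hypothetical, and the arithmetic assumes a constant bit rate with no container overhead.

```python
def file_size_bytes(bit_rate_bps: float, duration_s: float) -> float:
    """Approximate payload size: bits per second times seconds, over 8 bits per byte."""
    return bit_rate_bps * duration_s / 8


# One minute of video at the two example rates from the entry above.
modem_clip = file_size_bytes(56_000, 60)      # 56 kb/s
vhs_clip = file_size_bytes(5_000_000, 60)     # 5 Mb/s

print(f"56 kb/s for 60 s: {modem_clip / 1_000_000:.2f} MB")
print(f"5 Mb/s for 60 s:  {vhs_clip / 1_000_000:.2f} MB")
```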
Boolean Search is a search allowing the
inclusion or exclusion of documents containing certain words through
the use of operators such as AND, NOT and OR. 4
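The three operators correspond to set operations on the documents that contain each word. A minimal sketch, assuming a tiny in-memory collection (the documents and the helper name are invented for illustration):

```python
# Hypothetical three-document collection, keyed by document number.
docs = {
    1: "digital video archive",
    2: "digital image archive",
    3: "analog video tape",
}


def containing(word):
    """Return the set of document numbers whose text contains the word."""
    return {doc_id for doc_id, text in docs.items() if word in text.split()}


and_result = containing("digital") & containing("archive")   # digital AND archive
or_result = containing("video") | containing("image")        # video OR image
not_result = containing("video") - containing("analog")      # video NOT analog

print(and_result, or_result, not_result)
```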
Broader Term (BT or B)
is a term in a subject thesaurus or indexing
system which includes as a more specific subclass the descriptor or
subject heading under which it is listed (example: "Libraries" listed
under "School libraries"). A descriptor or subject heading may have
more than one broader term. Compare with narrower term and related term. 5 See Hierarchical Relationships
Byte is a string of eight bits, which is
the number needed to store one character such as a letter or a number.
It is also the standard measurement unit of a file.
C_________________________________ Top
CD-ROM (Compact Disc Read-Only Memory) is
a type of optical disk that stores up to 640 Megabytes (MB) of data. The
designation "read-only" means once stamped with information, the disk
can only be viewed and not written over.
CGM (Computer Graphics Metafile) is the ANSI
(American National Standards Institute) standard for vector and bitmap images.
CMYK (Cyan, Magenta, Yellow, and Black) is a subtractive color model used for printing
on paper.
Case Sensitive [Sensitivity] is a program's
ability to distinguish between uppercase (capital) and lowercase (small)
letters. Programs that distinguish between uppercase and lowercase are
said to be case sensitive. A case-sensitive program that expects you
to enter all commands in uppercase will not respond correctly if you
enter one or more characters in lowercase. It will treat the command
RUN differently from run. Programs that do not distinguish between uppercase
and lowercase are said to be case insensitive. 6
Cataloging is the "preparation of bibliographic
records which entails recording descriptions and determining all points
of access to the record." Cataloging rules are spelled out in Anglo-American
Cataloging Rules (AACR2). MARC (machine-readable cataloging) format
records are used as the standard in recording electronic bibliographic
data. 23
Chain Indexing is a subject categorization
scheme in which terms describing objects (typically documents) are linked
or chained together on the basis of a set of rules in a hierarchical relationship. 16
Classification
is the process of grouping like terms into classification groups or
classes. The classes may exhibit a variety of properties, such as monothetic,
polythetic, exclusive, overlapping, ordered, and unordered. In textual
systems, one generally classifies either the documents into classifications
groups or the keywords into like groups. 16
Closed Captioning is a
service for persons with hearing disabilities that translates television
program dialog[ue] into written words on the television screen. 7
Closed captioning is not visible without the use of a specially installed
decoder. See Text Extraction
Clustering is the grouping of items in a
database so that members of a cluster exhibit
similarities to each other and dissimilarities to other clusters. In
retrieval systems, the items in a cluster are often retrieved together
in response to a query. For each cluster a composite item, called the
centroid, can be generated that represents the cluster and is used as
the basis for retrieval of the cluster. There are two classes of clustering
methods: hierarchical methods (such as Ward's method or single-link methods),
which produce a nested set of clusters, and non-hierarchical
or partitioning methods (such as k-means), which produce
a single layer of clusters. Clusters may overlap (i.e., some items may
occur in more than one cluster). 16
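One common way to build the composite item (centroid) mentioned above is to average the members of the cluster component by component. The sketch below assumes each item is already represented as a list of numeric term weights, which is an assumption made for illustration and not something the entry specifies.

```python
def centroid(vectors):
    """Average equal-length term-weight vectors component by component."""
    if not vectors:
        raise ValueError("a cluster needs at least one member")
    count = len(vectors)
    return [sum(component) / count for component in zip(*vectors)]


# Two hypothetical documents described by weights for three terms.
cluster = [[1, 0, 2], [3, 0, 0]]
print(centroid(cluster))
```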
Concordance, or inverted
list, is a data structure for indexing textual data records by the substantive
terms or keywords associated with each record. The inverted
list is an index for each keyword containing the location of each occurrence
of the keyword in the database. 16
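A minimal sketch of an inverted list follows. It assumes whitespace tokenization and a small stop list to approximate "substantive" terms; both are simplifications chosen for illustration, and the sample records are invented.

```python
from collections import defaultdict

STOP_WORDS = {"a", "an", "and", "is", "of", "the"}  # assumed stop list


def build_inverted_list(records):
    """Map each keyword to (record id, word position) for every occurrence."""
    index = defaultdict(list)
    for record_id, text in records.items():
        for position, word in enumerate(text.lower().split()):
            if word not in STOP_WORDS:
                index[word].append((record_id, position))
    return dict(index)


records = {1: "the archive of video", 2: "video and audio archive"}
index = build_inverted_list(records)
print(index["archive"])
```

Looking up a keyword in the resulting dictionary returns every location of that keyword without scanning the records themselves.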
Conservation is the treatment of
archival materials to stabilize them chemically or strengthen them physically,
sustaining their survival as long as possible in their original form.
23
Content is the digital representation
of a multimedia object. Content may be a still, moving or 3-D image;
a text document; an audio file or a compound document (compound documents
consist of more than one page, or combine video and audio content).
Information about the physical or intellectual characteristics of content
is called metadata.
Content Attribute is a physical characteristic
of a multimedia object. An attribute may include physical dimension,
pattern, chroma, file format, resolution, color mode, and/or compression.
Controlled Vocabulary is a listing of terms that specifies
which terms may be used for indexing and the relationships among them,
produced by the process of vocabulary control. 8
Coordinate Indexing is a method of post-coordinate indexing based on the assignment
of keywords that capture the concepts covered by
a document. These keywords (or word symbols) are then used as the
basis for subsequent retrieval of the records. 16
Copyright is the exclusive right to publish
or sell a book, composition, photograph, work of art, software program,
etc. The government grants this right for a defined period of time.
D_________________________________ Top
DPI (Dots-Per-Inch) is a
measure of resolution. It can refer to a printer, scanner,
or monitor. It is the number of dots in a one-inch line. The more dots
per inch, the higher the resolution.
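Because DPI counts dots along one inch, the pixel dimensions of a scan follow from the physical size of the original. A small sketch, with a hypothetical function name and a US Letter page as the example:

```python
def pixel_dimensions(width_in: float, height_in: float, dpi: int):
    """Return (width, height) in pixels for a page scanned at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


# An 8.5 x 11 inch page scanned at two different resolutions.
print(pixel_dimensions(8.5, 11, 300))
print(pixel_dimensions(8.5, 11, 72))
```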
Database is a program
that captures data for the purpose of being able to manipulate its order,
retrieval, input, output, etc.
Digital Archive is an ideal; a focal point
database and storage facility for all formats
and media of an organization's institutional knowledge and history.
Digitize is to make digital. Digitization
transforms paper or analog documents to electrical/computer or digital
format.
E _________________________________ Top
Encoding is the process by which analog tape
is converted into a digital file. The file can be preserved in a number
of different formats. RM is a standard format for RealMedia files.
MPEG is another standard. Time code is converted
from SMPTE time code into proxy time code. SMPTE time code from
videotape is measured in frames, at 30 frames per second, while proxy time code
is measured in milliseconds. This can create a number of difficulties
when indexing proxy files.
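The conversion described above can be sketched as follows. This is a simplified illustration that assumes an HH:MM:SS:FF time code at exactly 30 frames per second and ignores drop-frame time code; the function name is hypothetical.

```python
def smpte_to_milliseconds(timecode: str, fps: int = 30) -> int:
    """Convert an HH:MM:SS:FF time code to whole milliseconds (non-drop-frame)."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    if not 0 <= frames < fps:
        raise ValueError(f"frame number must be between 0 and {fps - 1}")
    total_frames = ((hours * 60 + minutes) * 60 + seconds) * fps + frames
    # One frame lasts 1000/30 = 33.33... ms, so most frames need rounding.
    return round(total_frames * 1000 / fps)


print(smpte_to_milliseconds("00:00:01:15"))
print(smpte_to_milliseconds("00:00:00:01"))
```

The rounding step hints at one of the indexing difficulties: a frame boundary rarely falls on a whole millisecond, so a proxy time code can only approximate the original frame position.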
F_________________________________ Top
Face Recognition is a type
of pattern recognition specific to identifying
human faces.
Faceted Classification is a style of pre-coordinate subject description (typified by Universal
Decimal Classification and Ranganathan's Colon Classification) which
provides a flexible system for generating controlled vocabulary subject classification. Techniques and guiding
principles are used to build up the vocabulary and the relationships
among the terms of the vocabulary rather than a hard and fast classification scheme of subject headings.
16
Fuzzy Logic is a multi-valued
(as opposed to binary) logic developed to deal with imprecise or vague
data. Classical logic holds that everything can be expressed in binary
terms: 0 or 1, black or white, yes or no; in terms of Boolean algebra,
everything is in one set or another but not in both. Fuzzy logic allows
for partial membership in a set, values between 0 and 1, shades of gray,
and maybe; it introduces the concept of the "fuzzy set." 9
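Partial membership can be illustrated with a simple ramp function that returns a value between 0 and 1. The sketch uses the minimum, maximum, and complement operators commonly associated with fuzzy sets; the thresholds and function names are assumptions chosen for the example.

```python
def membership(value: float, low: float, high: float) -> float:
    """Degree of membership: 0 at or below low, 1 at or above high, linear between."""
    if value <= low:
        return 0.0
    if value >= high:
        return 1.0
    return (value - low) / (high - low)


def fuzzy_and(a: float, b: float) -> float:
    return min(a, b)


def fuzzy_or(a: float, b: float) -> float:
    return max(a, b)


def fuzzy_not(a: float) -> float:
    return 1.0 - a


# A value halfway between the thresholds is a member to degree 0.5.
print(membership(175, 160, 190))
```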
Fuzzy Logic Search is a search that matches
close approximations of words. These can include matches from misspelled
words or words hidden within other words. For example, position could
be positioning, positioned, or possibly positions. 10
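Approximate matching of this kind can be sketched with the similarity ratio in Python's standard difflib module. This illustrates tolerance for misspellings only; it is not a claim about how any particular search engine implements the feature, and the term list and cutoff are invented.

```python
import difflib

# Hypothetical list of index terms.
index_terms = ["position", "positioning", "positioned", "archive", "metadata"]


def approximate_matches(query, terms, cutoff=0.7):
    """Return terms whose similarity to the query meets the cutoff, best first."""
    return difflib.get_close_matches(query, terms, n=5, cutoff=cutoff)


# The misspelled query still finds "position" and its longer forms.
matches = approximate_matches("positon", index_terms)
print(matches)
```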
G_________________________________ Top
GIF (Graphics Interchange Format) is a bitmap
file format for graphics sometimes identified by the .gif extension
on the filename. GIF images support up to 256 colors.
H ________________________________ Top
Hierarchical Relationships are a structured organization of
controlled vocabulary terms in association
with other terms or categories of terms. An example of narrower
terms in succession is: Animals; Mammals; Dogs; Working Dogs; German
Shepherds. See Narrower Term and Broader Term
I _________________________________ Top
Imaging is a process, not a product.
It combines technology and information management to create a digital
archive.
Imaging Technology is the hardware and software
needed to create digital files from analog material. The technology
is divided into input devices and output devices.
Index is an auxiliary data structure
used to speed up access to a data set (e.g., a file of records) in which
a pointer to each record of the data set is stored. The pointers in
the index are accessed on the basis of a key value of each record. The
index may actually contain the key values and the pointers, or the key
may be used to generate the address of the pointer in the index, perhaps
by hashing. The indexing can be used both to provide an order to the
data records and to provide direct access to records in the data set.
16 An index is NOT a list of all the terms
in a data set. See Concordance
Input (Imaging) Hardware includes the scanner,
monitor/display device, computer network or PC, and storage device.
Imaging software comprises capture software that
works with the scanner, Optical Character Recognition software, system
(which is also connected to the Information Management system), and
compression software.
J _________________________________ Top
JPEG (Joint Photographic Experts Group) is
a graphics storage format identified by the extension .jpg on the filename.
The JPEG format uses Lossy compression, which is a compression technique
that results in lost data.
JPEG 2000 is the ISO (International Organization
for Standardization) answer to an integrated standard format. The proposed
format encapsulates the Digital Imaging Group's FlashPix features of
resolution independence, size independence, metadata, and an unambiguous color model.
K_________________________________ Top
KWIC (KeyWord In Context) is a simple printed
index for textual material in which keywords in the text are sorted alphabetically
and presented linearly, surrounded by portions of the preceding and
following text for context. 16
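A KWIC index can be sketched in a few lines. The version below assumes the indexed text is a list of titles and that a small stop list decides which words count as keywords; the sample titles, stop list, and function names are all illustrative.

```python
STOP_WORDS = {"a", "an", "and", "in", "of", "the"}  # assumed stop list


def kwic_entries(titles):
    """Return (keyword, left context, right context) tuples sorted by keyword."""
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in STOP_WORDS:
                continue
            left = " ".join(words[:i])
            right = " ".join(words[i + 1:])
            entries.append((word.lower(), left, right))
    return sorted(entries)


def format_kwic(entries, width=24):
    """Align every keyword in one column with its context on either side."""
    return [f"{left[-width:]:>{width}}  {keyword.upper()}  {right[:width]}"
            for keyword, left, right in entries]


titles = ["Indexing of Moving Images", "Digital Image Archives"]
for line in format_kwic(kwic_entries(titles)):
    print(line)
```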
KWOC (KeyWord Out of Context) is a simple
printed index for textual material in which keywords found in the text are sorted alphabetically
and presented linearly, followed by the original string for context.
Sometimes the keyword is replaced by a character such as "^"
in the context string. 16
Keyword is 1] a word
used in a text search, 2] a word in a text document that is used in an
index to best describe the contents of the document, and 3] a reserved
word in a programming or command language. 11
Key Frame is a snapshot image of each of
the scene changes. Key frames may be automatically generated at specified
intervals or scene changes or manually generated by the user. A series
of consecutive key frames is called a storyboard.
L _________________________________ Top
Latent Semantic Index is an approach to automatic indexing that is based on the assumption
of an underlying association or correlation of the terms used in documents,
and the content of the documents for retrieval purposes. Most of the
techniques used to determine the relevant associations begin with the
term occurrences from which term similarities and term associations
can then be calculated. 16
Licensing is the act of giving formal legal
permission to reproduce or sell a copyrighted work.
Lossy Compression is a
compression technique that loses some data during compression and file
restoration.
M ________________________________ Top
MPEG (Moving Picture Experts Group) is a digital video Lossy compression format. It can
be identified by the file extension .mpg.
MPEG-1 is a coding of moving pictures and
associated audio for digital storage media at up to about 1.5 Mbit/s….
the standard on which such products as Video CD and MP3 are based….
13
MPEG-2 is designed to produce higher quality
images at higher bit rates (e.g., 720x485 studio quality CCIR-601 images
at up to 15 Mbits/sec). 12
MPEG-7 is the content representation standard
for multimedia information search, filtering, management and processing
(to be approved July 2001). 13
Mapping [field] is used to identify where
metadata will be placed during data migration.
The contents of a field in one database are designated for placement into a
specified field in another database. The fields may or may not have matching
names.
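A field mapping can be represented as a simple lookup from source field names to target field names. In the sketch below every field name is hypothetical, and fields without a mapping are left out of the migrated record.

```python
# Hypothetical mapping: source field name -> target field name.
FIELD_MAP = {
    "Title": "title",
    "Creator": "author",
    "DateCreated": "created_on",
}


def migrate_record(source_record, field_map):
    """Copy each mapped field's contents into its designated target field."""
    return {
        target: source_record[source]
        for source, target in field_map.items()
        if source in source_record
    }


old_record = {"Title": "Harbor at dusk", "Creator": "J. Doe", "Notes": "unmapped"}
print(migrate_record(old_record, FIELD_MAP))
```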
Media Archive is a type of archive that specializes in storing and maintaining
media materials such as photography, digital files, graphics, sound
recordings, video and film, etc.
Media Assets are the digital and traditional
media formats (video, files, film negatives, audio, photographic prints,
slides, graphic materials), that have resale or reusable value for an
institution.
Mediagraphical is any integrated multimedia
information system that includes a wide range of data types, such as
audio, graphical, textual, and pictorial. 16
Meta is a prefix often used in information
science terminology, such as "metadata" and "meta knowledge". Meta X means "X
about X", so that metadata means "data about data" (e.g., data
dictionaries), and meta knowledge means "knowledge about knowledge"
(e.g., knowledge structures). 16
Metadata is the data
that describes, for purposes of retrieval or classification, the overall architecture
and format of the document or file. It is used extensively on the World
Wide Web. Recently, metadata has been linked with imaging
and indexing as a way to create a standard information infrastructure
for multimedia databases.
Micron is an icon that
contains a moving video clip that allows the user to select from a variety
of videos based on a sample of each video. Microns could be used to
represent other dynamic data sets such as simulations. 16
Moving Image Document (MID) is a database item defined as a matrix of sequential
phenomena (i.e. a string of images plus strings of sounds) that is synchronized
by time (a micron). A major difficulty in retrieval of MIDs
is the lack of easily defined units, such as text, which can be used
for indexing or abstracting purposes. 16
N _________________________________ Top
Narrower Term (NT or N) is a term in a subject thesaurus or indexing system which is a subclass
of the descriptor or subject heading under which it is listed (example:
"Picture Books" under "Books"). A descriptor or subject heading may
have more than one narrower term. Compare with broader
term and related term. 5
See Hierarchical Relationships
Natural Language Processing (NLP) is a range
of computational techniques for analyzing and representing naturally
occurring texts at one or more levels of linguistic analysis for the
purpose of achieving human-like language processing for a range of particular
tasks or applications. 14
O_________________________________ Top
Object-oriented Database is a database management system that facilitates the
management of objects rather than records. Viewing the data as objects,
instead of as records, provides more flexibility in the data types used
and removes the need to normalize the data. Since objects can contain
other objects as sub-components, these databases can implement inheritance hierarchies.
16
Optical Character Recognition (OCR) is software that works in tandem with
the scanner and "recognizes" the characters the scanner "sees" (only
letters and numbers in certain fonts) and converts them from the original
analog format to digital format.
Output (Imaging) is the final function of
the imaging process. Traditionally this means
the printed paper; but in the digital world, output has taken on different
formats and characteristics. The information can be kept as a digital
file, viewed on a monitor, compressed, transferred, or stored. Other
output forms of digital information are printed paper, video, photograph,
negative, slide, or transparency.
P _________________________________ Top
PCX (Picture Image) is a bitmap file format
supported by many programs. It was developed by ZSoft Corporation for PC Paintbrush
and can sometimes be identified by the file extension .pcx.
POPSI (Postulate-based Permuted Subject Index)
is a string indexing algorithm based on Colon Classification,
a faceted classification scheme used widely in India.
The indexer assigns the appropriate faceted description and the string
permutations of that description are generated automatically for the
index. 16
PRECIS (Preserved Context Indexing System) is
a string indexing system developed by Derek Austin
at the British Library in the early 1970s for subject indexing of the
British National Bibliography. The terms in the PRECIS string are arranged
and connected by relationships found in the original text or context,
rather than from a classification scheme. The human indexer
chooses the base string (i.e. terms and the relationships) and the subsequent
permutations of the base string are performed by a computer. 16
Pattern Recognition is
a branch of artificial intelligence concerned with the identification
of visual or audio patterns by computers. For the computer to recognize
the patterns, the patterns must be converted into digital signals and
compared with patterns already stored in memory. Some uses of this technology
are in character recognition, voice recognition, handwriting recognition,
and robotics. 15 Pattern recognition may also be used in image
applications for the analysis of image composition (looks-like-this
functionality) or face recognition.
Post-coordinate System is
a style of indexing in which the relationships of the indexing terms
and database entries are not fixed at the time
an entry is added to the database, but rather the user can combine and
manipulate the indexing terms at query time. The Boolean
combination of keywords drawn from the full text of records that
is common to most information retrieval systems is an extreme example
of post-coordinate indexing. 16
Pre-coordinate System is
an indexing method that establishes, at the time an entry is added to
the database, the access points for that entry
(typically bibliographic). 16
Preservation is
the activity associated with maintaining archival materials for use,
either in their original physical form or in some other format. 23
Proximity Operator is an operator sometimes
used in conjunction with Boolean queries to approximate the specification
of phrases by requiring that given keywords occur within a certain number of words of each other.
To process this operator, either a word position must be kept for each
occurrence in the inverted file or a scan of the text of the selected
records must be made at retrieval time. 16
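The second option described above, scanning the text at retrieval time, can be sketched as follows. The function name and the whitespace tokenization are assumptions made for illustration.

```python
def within(text: str, first: str, second: str, max_distance: int) -> bool:
    """True if the two keywords occur within max_distance words of each other."""
    words = text.lower().split()
    first_positions = [i for i, word in enumerate(words) if word == first]
    second_positions = [i for i, word in enumerate(words) if word == second]
    return any(
        abs(i - j) <= max_distance
        for i in first_positions
        for j in second_positions
    )


sample = "closed captioning for television program dialogue"
print(within(sample, "closed", "television", 3))
print(within(sample, "closed", "television", 2))
```

A system that stores word positions in its inverted file could run the same comparison on the stored positions without rescanning the text.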
Q_________________________________ Top
R ________________________________ Top
RGB (Red, Green, and Blue) is a color scheme
for screen display. RGB images can be converted to CMYK for printing.
Re-purposing is the sharing and reusing of
digital and traditional media that was originally intended for graphic
designs, ads, manuals, and Web sites. Institutions now recognize the
value of their media assets and the need to synthesize their digital
images into a collective central digital archive.
Related Term (RT or R)
is a term in a subject thesaurus or indexing system which has a close
conceptual relationship to the descriptor or subject heading under which
it is listed, but is not related hierarchically (example: "Media Specialists"
listed under "School Libraries"). A descriptor or subject heading may
have more than one related term. Compare with broader
term and narrower term. 5
See Associative Relationships
Relevancy is the degree to which a retrieved
item is judged by the user to be "about" the topic of the query. Relevancy
evaluations tend to be subjective as they depend on the user's perception
of the relatedness of the given item to some information
need, which may or may not, in fact, have been expressed entirely by the
query. Relevancy is used in most measures of retrieval effectiveness
such as precision and recall. 16
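Precision and recall are both computed from the overlap between the items retrieved and the items judged relevant. A minimal sketch, with hypothetical item numbers standing in for a user's relevance judgments:

```python
def precision(retrieved: set, relevant: set) -> float:
    """Fraction of retrieved items that are relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def recall(retrieved: set, relevant: set) -> float:
    """Fraction of relevant items that were retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


retrieved = {1, 2, 3, 4}   # items returned for a query
relevant = {2, 4, 5}       # items the user judged relevant

print(precision(retrieved, relevant))
print(recall(retrieved, relevant))
```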
Resolution is often
referred to as dpi (dots-per-inch). The dpi
of an image is measured as the number of rows by columns. The greater
the number of dots, the higher the resolution.
S _________________________________ Top
Search Term is a term expressing an information
need or query in the language and format acceptable to the specific
system [in an information retrieval system]. Search terms may be combined
to form a search statement. 17
String Indexing System is
a form of document indexing characterized by a string or set of indexing
terms for each entry. The terms within each string are connected by
relationships to each other according to a set of rules for the particular
scheme. Typically, a basic string is generated by a human indexer, and
subsequently manipulated by a computer, to produce a multiple-access
index to the documents. 16
Synonym is a word or
phrase, which has the same or nearly the same meaning as another word
or phrase in the same language. 18 See Associative Relationships
T _________________________________ Top
TIFF (Tagged Image File Format) is a format
used most often for archiving images. It can sometimes be identified
by the file extension .tif. It is NOT Web supported.
Teletext is the textual
and graphic information broadcast in the vertical blanking interval
between conventional video frames in television signals. It requires
a special adapter. 19 See Text Extraction
Text Extraction is the automatic
input of associated digital text. This includes closed-captioning, teletext,
audio logging and the extraction of captions, signs
or other written language that appears within an image. See Closed-Captioning, Teletext, Audio Logging
Thesaurifacet is
an indexing tool that combines the alphabetic access of a thesaurus with the hierarchical access of a faceted
classification scheme. The two parts complement each other in that
the hierarchical relationships are contained
in the classification arrangement, while all other
relationships are contained in the thesaurus part. Terms may, however, occur in
only one hierarchy of the facet component, but secondary relationships
can be included by the thesaural relationships. 16
Thesaurus is a vocabulary
tool that provides information about the use of terms, and certain relationships
between terms. The relationships normally used between terms include
the following: broader term (BT), narrower term (NT), use for [or synonymous]
(UF), related term (RT), and use (replace this
term with some other). 16 See Thesaurifacet, Classification and Faceted
Classification
Truncation is the
ability to enter the first part of a keyword [in a search], insert a symbol (usually
*), and accept any variant spellings or word endings, from the occurrence
of the symbol forward (e.g., femini* retrieves feminine, feminism,
etc.). 20 See Wildcard
U _________________________________ Top
V _________________________________ Top
Value List is a list that provides a form
of vocabulary control consisting of pre-determined lists of terms usually
used in a drop-down menu. The user selects from these terms instead
of typing free text into a field.
Vector Image is an image
composed in a geometrical formulation that can be reduced or enlarged
without losing quality.
Video Proxy is a digitally encoded video
file made from an analog or digital original.
Voice Recognition is a technology that allows
a user to use his/her voice as an input device. Voice recognition may
be used to dictate text into the computer or to give commands to the
computer (such as opening application programs, pulling down menus,
or saving work). 21 Also used in audio capture from tape (video
or audio) to transcribe speech to text.
W X Y Z ___________________________ Top
Wildcard is a special
character that represents one or more other characters. The most commonly
used wildcard characters are the asterisk (*), which typically represents
zero or more characters in a string of characters, and the question
mark (?), which typically represents any one character. 22
Wildcard characters are commonly used in searching. See Truncation
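The asterisk and question mark behave as described above in Python's standard fnmatch module, which makes it a convenient way to try wildcard and truncation searches. The term list is invented for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical list of index terms.
terms = ["feminine", "feminism", "female", "woman", "women"]


def wildcard_search(pattern, candidates):
    """Return the candidates that match the pattern, using * and ? as wildcards."""
    return [term for term in candidates if fnmatchcase(term, pattern)]


print(wildcard_search("femini*", terms))   # truncation: any ending after "femini"
print(wildcard_search("wom?n", terms))     # exactly one character in place of "?"
```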
SOURCES __________________________ Top
1. Merriam-Webster OnLine. 2001. Merriam-Webster Collegiate Dictionary. 12 May 2001
<http://www.merriam-webster.com/dictionary.htm>.
2. Northern Lights Internet Glossary Page. Northern Lights Technology Inc. 12 May 2001
<http://www.northernlight.com/docs/glossary_help_terms_b.html>.
3. Whatis?com Definitions Page. TechTarget.com, Inc. 12 May 2001
<http://whatis.techtarget.com/definitionsSearchResults/1,289878,sid9,00.html?query=bit%2Brate>.
4. Search Engine Glossary Page. Search Engine Watch. 12 May 2001
<http://www.searchenginewatch.com/facts/glossary.html>.
5. ODLIS: Online Dictionary of Library and Information Science. 2001. Western Connecticut State
University Libraries. 12 May 2001.
6. Webopedia Online Encyclopedia. 2001. internet.com. 12 May 2001
<http://webopedia.internet.com/TERM/c/case_sensitive.html>.
7. State of Connecticut Glossary of Telecommunications Terms Page. State
of Connecticut Office of Consumer Counsel [OCC]. 12 May 2001
<http://www.occ.state.ct.us/phone/glossary.htm>.
8. American National Standards Institute.
American National Standard for Basic Criteria for Indexes: Z39.4.
New York: American National Standards Institute, 1984.
9. Columbia Encyclopedia Online. 6th ed. 2001. The Columbia Encyclopedia. 12 May 2001
<http://www.bartleby.com/65/fu/fuzzylogi.html>.
10. Cyberconstructors.com Search Engine Glossary Page. cyberconstructors.com. 12 May 2001
<http://www.cyberconstructors.com/glossary_e-f-g-h.htm>.
11. Techencyclopedia Page. The Complete Language Company Inc. 12 May 2001
<http://www.techweb.com/encyclopedia/defineterm?term=keyword>.
12. Berkeley MPEG Tools Page. Berkeley Multimedia Research Center. 12 May 2001
<http://bmrc.berkeley.edu/frame/research/mpeg/>.
13. MPEG Home Page. Moving Picture Experts Group. 12 May 2001
<http://www.cselt.it/mpeg/>.
14. Liddy, Elizabeth D. "Enhanced Text
Retrieval Using Natural Language Processing." Bulletin of the American
Society of Information Science and Technology. April (1998). 12 May 2001
<http://www.asis.org/Bulletin/Apr-98/liddy.html>.
15. ComputerUser High-Tech Dictionary Definitions Page. ComputerUser.com, Inc. 12 May 2001
<http://www.computeruser.com/resources/dictionary/definition.html?lookup=5337>.
16. Watters, Carolyn. Dictionary of Information Science
and Technology. San Diego, CA: Academic Press: Boston, 1992.
17. The ALA Glossary of Library
and Information Science. Ed. Heartsill Young. Chicago: American Library Association, 1983.
18. Cambridge Dictionaries Online. 2001. Cambridge University Press. 12 May 2001
<http://dictionary.cambridge.org/define.asp?key=synonym*1%2B0>.
19. Glossary of Telecommunication Terms Page. Wiresystems.com. 12 May 2001
<http://www.wiresystems.com/definitons.htm>.
20. UC Berkeley Library Internet Resources Glossary Page. University
of Berkeley Library. 12 May 2001
<http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/Glossary.html>.
21. ATRC Technical Glossary Page. University of Toronto Adaptive
Technology Resource Centre [ATRC]. 12 May 2001
<http://www.utoronto.ca/atrc/reference/tech/voicerecog.html#description>.
22. Whatis?com Definition Page. TechTarget.com, Inc. 12 May 2001
<http://whatis.techtarget.com/definition/0,289893,sid9_gci213935,00.html>.
23. DePew, John N. A Library, Media, and Archival Preservation
Handbook. Santa Barbara: ABC-CLIO, 1991.