Data models vary in both complexity and richness. However, all data models
are equivalent in their ability to model information. When selecting a
model, what matters more is how closely it matches the inherent structure
of the problem being modeled. This structure varies
as the problem is investigated and refined. In the beginning, when little
is known about what the final model will be, the simplest, most
flexible and least structured scheme provides the greatest freedom of
expression. As time passes, it becomes more important to fashion the
model using a scheme that closely matches the final implementation.
Text file
Although it's not a typical approach, this model stems from the premise that
all information can be stored in written form. As far as data storage is
concerned, the text file is unsurpassed in the quality and variety of
information that may be stored. A text editor can be used to access and update
information. There are no restrictions on what kinds of data may be stored or
how long the individual data items may be.
Few people think of a text file as a data base. However, the scheme shows the
basic notion of storing information in files, and text files capture the
notion of holding the information as an integral whole. Other schemes look
at information in other ways.
Flat file
If text files capture information as the whole loaf of bread, then
flat files capture information as slices of the loaf. Loosely, flat files
view information as an ordered collection of "records." Well-defined
subsets of each record are known as "fields," although a record may consist
of only a single field. Flat file fields may contain data of any data type,
and there is no requirement for fields to be named.
While records and fields may be fixed in size, there is no
requirement for records or fields to all be the same size. Variable-length
records and fields are usually implemented with some kind of
separator between items. Some simple implementations of flat
files are stored in text files, with each record constituting a single line of
text terminated by end-of-line characters (linefeed characters or
carriage-return/linefeed character pairs).
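As a minimal sketch of the simple text-based implementation just described, the Python below treats each line as a record and splits it into fields on a separator. The `|` separator, the sample data and the function names are hypothetical; a real format would also need a way to escape separators that appear inside field values.

```python
# Hypothetical flat file: one record per line, fields separated by "|".
# Records differ in length and number of fields; line endings may be
# "\n" (linefeed) or "\r\n" (carriage return + linefeed).
SAMPLE = "milk|2 litres\r\neggs|12\nbread\n"


def read_records(text: str, sep: str = "|") -> list[list[str]]:
    """Split text into records (lines), then each record into fields."""
    # str.splitlines() recognizes both "\n" and "\r\n" as line boundaries.
    return [line.split(sep) for line in text.splitlines()]


def write_records(records: list[list[str]], sep: str = "|") -> str:
    """Join each record's fields with the separator, one record per line."""
    return "".join(sep.join(fields) + "\n" for fields in records)


records = read_records(SAMPLE)
print(records)  # [['milk', '2 litres'], ['eggs', '12'], ['bread']]
```

Writing the records back out normalizes every line ending to a single linefeed, which is one of the two conventions mentioned above.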
Nor is there any requirement for records to all be of the same type (that is,
to contain the same kinds of fields in the same order). However, mixing record
types within a single file is usually done to implement some other kind of data
model.
The order of the records may be important within a flat file. The record
"key" (a way to distinguish individual records from others in a collection)
is usually the index of the record within the file. Duplicate records
(records that contain the same content) are acceptable.
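To illustrate why record order matters when the key is just a position, here is a small sketch that uses a Python list as a stand-in for a flat file (the shopping-list data and the helper name are hypothetical). Duplicate records coexist because their positions differ, but deleting a record renumbers every record after it.

```python
# A Python list stands in for a flat file; a record's key is its index.
shopping_list = ["milk", "eggs", "bread", "milk"]  # duplicates are allowed


def keys_of(records: list[str], content: str) -> list[int]:
    """Return the keys (positions) of every record with the given content."""
    return [key for key, record in enumerate(records) if record == content]


print(keys_of(shopping_list, "milk"))  # [0, 3]: same content, different keys

# Because keys are positions, deleting a record shifts later keys down by one.
del shopping_list[1]
print(keys_of(shopping_list, "bread"))  # [1] (it was [2] before the deletion)
```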
Flat files may be implemented in either text or
binary form. There is no requirement to make flat files
text-editor readable.
Examples of flat files include shopping lists,
configuration files and single-page spreadsheets.
Hierarchical
In flat files, records are deemed to exist at the same level.
Hierarchical data files permit records to be grouped together. This allows
a superior-subordinate or parent-child relationship (a single one-to-many
relationship) to be defined between records. In simple forms,
the superior or parent records are used to collect information
that is common to all the subordinate/child records of the
same group. This has the immediate effect of reducing redundancy within the
data base.
A record's "key" usually includes the keys of all its superiors/parents. Keys
tend to reflect the order of a particular record within its group rather
than within the data file. Duplicate records are acceptable.
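The sketch below models a small organizational telephone directory as nested Python dictionaries, under the assumption that each record's key is its parent's key extended by its position within its own group. Department phone numbers are stored once in the parent rather than repeated in each employee record. The data, field names and helper functions are hypothetical.

```python
# Hypothetical directory: each record holds its own data plus its child group.
directory = {
    "name": "Acme", "phone": "555-0100",
    "children": [
        {"name": "Sales", "phone": "555-0110", "children": [
            {"name": "Ann", "ext": "11", "children": []},
            {"name": "Bob", "ext": "12", "children": []},
        ]},
        {"name": "Support", "phone": "555-0120", "children": [
            {"name": "Cy", "ext": "21", "children": []},
        ]},
    ],
}


def walk(node: dict, key: tuple = ()):
    """Yield (key, record) pairs; a key is the parent's key plus the
    record's position within its group."""
    yield key, node
    for position, child in enumerate(node["children"]):
        yield from walk(child, key + (position,))


def lookup(root: dict, key: tuple) -> list[dict]:
    """Follow a key down from the root, returning the path of records."""
    path = [root]
    for position in key:
        path.append(path[-1]["children"][position])
    return path


def full_phone(root: dict, key: tuple) -> str:
    """Combine the parent's shared phone number with the child's extension."""
    *ancestors, person = lookup(root, key)
    return f"{ancestors[-1]['phone']} x{person['ext']}"


print([key for key, _ in walk(directory)])
# [(), (0,), (0, 0), (0, 1), (1,), (1, 0)]
print(full_phone(directory, (0, 1)))  # 555-0110 x12
```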
Examples include topical outlines,
organizational telephone directories, Microsoft Windows INI files, the Microsoft
Windows registry and automated report formats. Hierarchical
data bases have a natural alignment with report
formats, even if the information is being extracted from data bases
implementing other models.
Continuing with the staff-of-life analogy, while flat files look at
information in slices, hierarchical data bases look at information as an
ordered collection of different kinds of slices, like a club sandwich.
Network (CODASYL)
While hierarchical data
bases emphasize the use of a single "path" to access all records, network
data bases may provide multiple paths to locate individual
records and sets of records. The term "network" has little to
do with communications between computers. Instead,
"network" refers to the ways in which records may reference
other records. The term "CODASYL" (from the abbreviated name of the group
that formalized the use of such data bases) is often used, but it does not
convey the multi-relationship nature of the model and is usually inaccurate.
Records and fields tend to be fixed in size, but may be stored at arbitrary
locations within the data file (or files). The record "key" represents a
physical location (e.g., file, block, and offset) in the data-base file
structure.
Network data bases allow records to participate in multiple
relationships or sets. Access tends to be extremely efficient.
A well-designed network data base would permit application programs and
ad hoc queries simply to follow key links between records, without
having to look individual records up through a separate index or
directory.
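The following sketch imitates these ideas in Python: records are stored under hypothetical "physical" addresses of the form (file, block, offset), and each set is a chain of pointers running from an owner record through its members. Employee records belong to two sets at once (a department and a project) by carrying a separate next-pointer for each. CODASYL implementations typically chain the last member back to the owner; ending the chain with `None` here is a simplification, and all names and addresses are assumptions.

```python
# Records keyed by hypothetical physical addresses: (file, block, offset).
store: dict[tuple, dict] = {}

# Owner records: each points to the first member of its set.
store[("emp.db", 0, 0)] = {"name": "Sales", "first_member": ("emp.db", 1, 0)}
store[("emp.db", 0, 64)] = {"name": "Apollo", "first_member": ("emp.db", 1, 64)}

# Member records participate in two sets via two independent pointer fields.
store[("emp.db", 1, 0)] = {"name": "Ann",
                           "next_in_dept": ("emp.db", 1, 64),
                           "next_in_project": None}
store[("emp.db", 1, 64)] = {"name": "Bob",
                            "next_in_dept": None,
                            "next_in_project": ("emp.db", 1, 0)}


def members(owner_address: tuple, next_field: str) -> list[str]:
    """Walk a set by following stored addresses directly, not an index."""
    names = []
    address = store[owner_address]["first_member"]
    while address is not None:
        record = store[address]
        names.append(record["name"])
        address = record[next_field]
    return names


print(members(("emp.db", 0, 0), "next_in_dept"))      # ['Ann', 'Bob']
print(members(("emp.db", 0, 64), "next_in_project"))  # ['Bob', 'Ann']
```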
Because of this efficiency, this model tends to be followed in DBMS
implementations (in contrast with DB implementations) for disk storage.
However, the network-like structure usually is hidden, quite deliberately,
from DBMS users.
If hierarchical data bases look at information like an
assembly-line sandwich, network data bases look at information as a food
service table in a delicatessen where sandwiches may be ordered custom-made
and substitutions are welcomed. There's more than one way to make a
sandwich.
Relational
Before the relational data model, existing data models had no
particularly good way to separate conceptual designs from
implementations. Pre-relational models depended upon being able to
determine explicitly where and how individual records were stored. Early
relational proponents argued that the relational data model viewed
information logically rather than physically, but this
is not quite correct. Earlier data models associated the logical and
physical aspects of information together; logically related information was
stored in physical proximity within a data file. The relational data model
was the first to separate the logical from the physical aspects.
The relational data model looks at information as an unordered
collection of "relations." Each relation is populated with
unordered "tuples" of the same unordered "field" structure.
Fields may only contain
values of a well-defined ("atomic") domain or the null value.
The unordered aspect needs to be emphasized. For expository purposes,
relations are often viewed as "tables". The tuples constitute the "rows" of
the table; values for a specific field constitute "columns". However,
the "table data model" tends to impose a very non-relational ordering on both
tuples and fields. Relations are an abstraction of how data is
stored; tables are just one of many possible implementations.
Some of the relational terms are crafted to emphasize the distinction
between logical and physical features, to avoid confusing one concept with
another. However, vocabulary from other disciplines has leaked
into the conversation of relational proponents. There is a strong tendency to
refer to an individual tuple/row as a "record" because collections of fields
in other models are called records. "Attribute" is often used synonymously
with field.
To be sure, "unordered" implies neither "chaotic" nor "random".
Relations and fields are named uniquely and identified easily.
Distinguishing between tuples is more subtle since the order is
not pre-defined. Rather than depending upon relative (as in hierarchy) or
absolute (as in network) locations, tuples may only be
differentiated according to their contents.
Consequently, duplicate tuples are not permitted within a single
relation. Even more strongly, distinct tuples must have a unique "key" (some
combination of a relation's named fields). Each minimal key is a "candidate
key"; one candidate key is designated the "primary key" and the rest are
called "alternate keys".
Within a tuple, references to other tuples are expressed as a "foreign
key," which should contain the values of the referenced tuple's primary key.
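A relation's defining properties can be mimicked in Python by representing each tuple as a `frozenset` of (field, value) pairs, so that neither fields nor tuples carry any order, and each relation as a `set`, so that duplicate tuples simply cannot exist. This is a hypothetical teaching sketch, not how any RDBMS stores data; the relation contents and helper names are assumptions, and nulls are not modeled.

```python
def row(**fields) -> frozenset:
    """Build a tuple as an unordered set of (field, value) pairs."""
    return frozenset(fields.items())


def is_key(relation: set, fields: set) -> bool:
    """True if no two distinct tuples agree on all of the given fields."""
    projected = {frozenset((f, v) for f, v in t if f in fields) for t in relation}
    return len(projected) == len(relation)


def foreign_key_ok(referencing: set, fk: str, referenced: set, pk: str) -> bool:
    """Every foreign-key value must match some referenced tuple's key value."""
    key_values = {dict(t)[pk] for t in referenced}
    return all(dict(t)[fk] in key_values for t in referencing)


dept = {row(dept_id=1, name="Sales"), row(dept_id=2, name="Support")}
emp = {row(emp_id=10, name="Ann", dept_id=1),
       row(emp_id=11, name="Bob", dept_id=1),
       row(emp_id=12, name="Cy", dept_id=2)}

# Field order is irrelevant, so this "new" tuple is a duplicate and is absorbed.
emp.add(row(dept_id=1, name="Ann", emp_id=10))
print(len(emp))                                           # 3
print(is_key(emp, {"emp_id"}), is_key(emp, {"dept_id"}))  # True False
print(foreign_key_ok(emp, "dept_id", dept, "dept_id"))    # True
```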
Relational theory provides a firm mathematical foundation for data
management. Set theory can be applied to relations using relational
algebra operations (union, intersection, join, projection, etc.).
Assertions about the existence or non-existence of some condition within a data
base can be proven with a rigor unachievable with earlier models.
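Because relations are sets, several relational algebra operations map directly onto Python set operations. The sketch below uses the same kind of hypothetical representation (tuples as `frozenset`s of (field, value) pairs) and defines selection, projection and a natural join; union and intersection are simply Python's `|` and `&` on union-compatible relations. It is illustrative only and makes no attempt at efficiency.

```python
def row(**fields) -> frozenset:
    """Build a tuple as an unordered set of (field, value) pairs."""
    return frozenset(fields.items())


def select(relation: set, predicate) -> set:
    """Keep the tuples for which predicate(tuple-as-dict) is true."""
    return {t for t in relation if predicate(dict(t))}


def project(relation: set, fields: set) -> set:
    """Keep only the named fields; duplicates collapse automatically."""
    return {frozenset((f, v) for f, v in t if f in fields) for t in relation}


def natural_join(r: set, s: set) -> set:
    """Combine every pair of tuples that agree on all fields they share."""
    result = set()
    for a in r:
        da = dict(a)
        for b in s:
            db = dict(b)
            if all(da[f] == db[f] for f in da.keys() & db.keys()):
                result.add(frozenset({**da, **db}.items()))
    return result


emp = {row(name="Ann", dept_id=1), row(name="Bob", dept_id=1), row(name="Cy", dept_id=2)}
dept = {row(dept_id=1, dept="Sales"), row(dept_id=2, dept="Support")}

sales_staff = project(select(natural_join(emp, dept), lambda t: t["dept"] == "Sales"), {"name"})
print(sales_staff == {row(name="Ann"), row(name="Bob")})  # True
print(len(project(emp, {"dept_id"})))                     # 2
print(len(emp | {row(name="Di", dept_id=2)}))             # 4 (union)
```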
Vendor-specific RDBMS implementations have created some confusion
about what features are required in a relational model. Most vendor RDBMSs
take a decidedly "table-oriented" view that is not strictly relational.
Among the delicatessens of data management, relational data models
represent an open buffet with trays of breads, meats and cheeses ready for
customers to make their own sandwich.
Entity-Relationship (E-R)
The abstractness of the relational data
model was essential to eliminating the reliance of data models
upon machine implementations. However, the abstractness also obscured how
tuples in one relation were associated with tuples in other relations. In
some ways, the entity-relationship model (or ERM, as long as you don't
confuse it with enterprise resource management) can be viewed as an
extension of the relational model in which the associations between relations
are made explicit. Relational purists have suggested that ERM is totally
unnecessary. Even so, the entity-relationship model exists independently
of the relational model and should be judged on its own merits.
The ERM approach presumes that all information can be stored in entities and
relationships between entities. An "entity" is similar to a relation (of the
relational data model), except that any references to other entities are
removed. This certainly includes all foreign keys and may include
other association information as well. What constitutes a tuple in the
relational model is called an "entity instance" by ER purists. In
practice, both schematic entities and entity instances are considered
"entities."
It's in the area of relationships where the ERM approach takes a decidedly
different twist from the relational. For one thing, it's important to
distinguish between the relational "relation" (a collection of tuples with the
same structure) and the ERM "relationship" (an explicit association
between two entities). The closest parallel to an ERM relationship in the
relational model is the notion of foreign keys. In ERM, relationships are
much richer. For instance, relationships are characterized by
cardinality: whether a relationship is one-to-one, one-to-many,
or many-to-many.
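The contrast can be sketched in Python: below, entity instances carry no references to one another, and the association lives in a separate relationship object that records its cardinality and refuses links that would violate it. The class names, the cardinality strings ("1:1", "1:N", "M:N") and the sample data are assumptions made for illustration, not part of any ER notation standard.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """An entity instance: its own attributes only, no foreign keys."""
    kind: str
    name: str


@dataclass
class Relationship:
    """An explicit association between two entity kinds, with cardinality."""
    name: str
    from_kind: str
    to_kind: str
    cardinality: str  # "1:1", "1:N" or "M:N"
    pairs: set = field(default_factory=set)

    def connect(self, a: Entity, b: Entity) -> None:
        if (a.kind, b.kind) != (self.from_kind, self.to_kind):
            raise TypeError(f"{self.name} relates {self.from_kind} to {self.to_kind}")
        # In 1:1 and 1:N, each "to" entity may be linked to only one "from" entity.
        if self.cardinality in ("1:1", "1:N") and any(y == b for _, y in self.pairs):
            raise ValueError(f"{b.name} already participates in {self.name}")
        # In 1:1, each "from" entity may also be linked only once.
        if self.cardinality == "1:1" and any(x == a for x, _ in self.pairs):
            raise ValueError(f"{a.name} already participates in {self.name}")
        self.pairs.add((a, b))


sales, support = Entity("dept", "Sales"), Entity("dept", "Support")
ann, bob = Entity("emp", "Ann"), Entity("emp", "Bob")

works_in = Relationship("works in", "dept", "emp", "1:N")
works_in.connect(sales, ann)
works_in.connect(sales, bob)  # one department, many employees: allowed

try:
    works_in.connect(support, ann)  # Ann cannot work in two departments under 1:N
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```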
In practice, many ERMs are
implemented as relational models within an RDBMS. This has been a
remarkably successful approach, but it has been known to blur the
distinction between ERM and relational models.
If the relational model is like a buffet of different sandwich ingredients,
ERM is like a picture of a sandwich on the wall over the buffet that reminds
users how the sandwich is supposed to look.
Object-Oriented (O-O)
It should come as no surprise by now that the premise of the OO approach to
data modeling is that all information can be stored in objects. The problem
arises in trying to get OO proponents to agree upon exactly what an "object"
is supposed to be. Objects tend to be defined at a very general level.
There are many things that an object could be and many features that
may be implemented, but relatively few things that an object is
required to be.
For instance, in some ways, an object looks a lot like an "entity
instance" from the ERM or a "tuple" from the relational model. Objects
may also include some kind of behavior that manipulates their fields or
attributes. On the other hand, an object need not have any such behavior,
which makes entities and tuples perfectly good examples of objects.
Unfortunately, this gives practitioners who have mastered the ERM or
relational model a distorted notion of what objects are. It would
quickly raise questions about the value of OO approaches if they did not
seem to provide anything different from what could be done with existing
models.
Consequently, the principles for the OO data model are not as
well-established as for other models. Even the vocabulary
is sometimes at odds. At least two distinct
viewpoints for OO have emerged: an analysis view that
defines a class as the common intersection of features shared
by distinct objects, and a development view that
defines a class as a blueprint for instantiating objects with
common features. To confuse the issue further, another definition of class
means the collection of all objects either instantiated from the
same blueprint (development view) or that happen to possess the same shared
properties (analysis view). Implementations of OO programming languages
(where OO has been remarkably successful) have created some confusion about
what features are needed in an OO data model. Programming languages are
definitely in the development camp.
OO masters deem something to be an object because it is useful to
consider that thing as if it were an object. Objects include state (attributes
or fields) and dynamic behavior (methods). Object behavior is triggered by
receiving a "message" or "event". These triggers may originate
externally or within the object. With regard to data models, using objects
in the ways shown in the following table has been shown to have some value:
| Object may be... |
| ...a value in a field | Binary large objects (blobs). OO programmers consider blobs to be "snapshots" of some object instantiated in a computer's memory. OO data modelers consider the blob to be the object; a clone of that object may be constructed transiently in memory to implement behavior. Some RDBMS vendors implement only this and then advertise that they support OO. This is technically correct, but it's about as useful as equating tuples with objects. Prospective RDBMS users should look closely at the features to determine whether the vendor's notion of OO means the same thing to them. |
| ...a field type (or column) | One of the hallmarks of OO is the ability to create user-defined types. Most RDBMSs permit only a limited number of well-defined types for fields (TEXT, NUMERIC, VARCHAR, DATE, CURRENCY, etc.). |
| ...a tuple or row | Objects may include complex states using variables filled with simple values. Objects without significant behavior defined are little more than wrappers around some record of information. The behavior usually enforces consistency constraints when the state is changed. |
| ...a relation or table or view | An object may incorporate information contained within a whole relation (base or derived). |
| ...a whole data base | An object may incorporate information about all relations contained within a data base. The notion that an object is like a miniature data base goes a long way to characterize the complexity that objects are able to encapsulate. |
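The "value in a field" case can be sketched as follows: an object's state is serialized into bytes that would sit in a blob column, and a transient clone is rebuilt from those bytes whenever behavior is needed. JSON is used here only as a convenient, hypothetical serialization choice, the `row` dictionary stands in for a table row, and the `Document` class is invented for the example.

```python
import json


class Document:
    """Hypothetical object whose state can be stored in a blob field."""

    def __init__(self, title: str, paragraphs: list[str]) -> None:
        self.title = title
        self.paragraphs = paragraphs

    def word_count(self) -> int:
        """Behavior that exists only on the in-memory object."""
        return sum(len(p.split()) for p in self.paragraphs)

    def to_blob(self) -> bytes:
        """Snapshot the object's state as bytes."""
        return json.dumps(vars(self)).encode("utf-8")

    @classmethod
    def from_blob(cls, blob: bytes) -> "Document":
        """Construct a transient in-memory clone from a stored snapshot."""
        return cls(**json.loads(blob))


# A dict stands in for a table row whose "body" column holds the blob.
row = {"doc_id": 1, "body": Document("Memo", ["Hello world", "Lunch at noon"]).to_blob()}

clone = Document.from_blob(row["body"])
print(clone.title, clone.word_count())  # Memo 5
```

The stored bytes carry only state; the behavior (`word_count`) comes from the class definition in the program that rebuilds the clone, which is the distinction the table draws between programmers' and modelers' views of a blob.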
The OO approach relies upon a number of features that have become recognized
as defining how OO differs from other approaches.
| OO features |
| encapsulation | Objects form natural collections of state variables and the behavior needed to manipulate such variables on an object-wise basis. |
| inheritance | Different objects can be defined in a way that allows common parts to be shared rather than duplicated. |
| persistence | Once an object is constructed, the object exists until it is no longer required. This feature is not as important in programming languages where objects may be either persistent or transient (the same as other types of variables). |
| polymorphism | Different objects may exhibit different reactions to the same message invocation. This permits objects that are known to respond to the same set of messages to be implemented in different ways. |
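Three of these features can be shown in a few lines of Python; persistence is left out because it depends on a storage layer. The `Account` and `SavingsAccount` classes are hypothetical: the base class encapsulates its balance behind a method that enforces a consistency constraint, the subclass inherits that state and behavior instead of duplicating it, and both respond to the same `describe()` message in different ways.

```python
class Account:
    def __init__(self, owner: str) -> None:
        self._owner = owner
        self._balance = 0  # state reached only through the methods below

    def deposit(self, amount: int) -> None:
        # Encapsulation: the method enforces a consistency constraint.
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount

    @property
    def balance(self) -> int:
        return self._balance

    def describe(self) -> str:
        return f"{self._owner}: {self._balance}"


class SavingsAccount(Account):
    # Inheritance: state and deposit() are shared with Account, not duplicated.
    def describe(self) -> str:
        # Polymorphism: same message, different reaction.
        return f"{self._owner} (savings): {self._balance}"


accounts = [Account("Ann"), SavingsAccount("Bob")]
for account in accounts:
    account.deposit(100)
print([account.describe() for account in accounts])
# ['Ann: 100', 'Bob (savings): 100']
```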
OO practitioners have identified a few fields that are valuable but not
always essential in data modeling efforts. One is a reference to the
object's class (though there is disagreement about what a class is). Another is
the object's "key," a system-wide unique identifier. The key is a little
controversial; relational theory abhors system-generated values, but objects
may be so large in scope that it's not reasonable to address their contents
all at once.
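A sketch of those two fields: each object below receives a system-generated key from Python's `uuid.uuid4()` and can report a reference to its class. Two objects with identical state remain distinguishable by key, which relational theory would not allow for tuples. The base class, the `Point` subclass and the `to_record()` layout are hypothetical.

```python
import uuid


class StoredObject:
    """Hypothetical base class adding a class reference and a system key."""

    def __init__(self) -> None:
        # System-generated, system-wide unique key (random UUID).
        self.oid = uuid.uuid4()

    def to_record(self) -> dict:
        """Flatten the object together with its class reference and key."""
        state = {k: v for k, v in vars(self).items() if k != "oid"}
        return {"class": type(self).__qualname__, "key": str(self.oid), **state}


class Point(StoredObject):
    def __init__(self, x: int, y: int) -> None:
        super().__init__()
        self.x, self.y = x, y


a, b = Point(1, 2), Point(1, 2)
print(a.to_record()["class"])    # Point
print((a.x, a.y) == (b.x, b.y))  # True: identical state
print(a.oid == b.oid)            # False: distinct keys (a uuid4 collision is vanishingly unlikely)
```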
Examples include composite word-processing documents with embedded
graphics. The OO approach tends to make situations that involve
complicated mixtures of data types more manageable.
The flat file model took a loaf of bread and sliced it up in order to make
the bread useful for sandwiches. The OO approach would be to start with
bread-like objects (e.g., biscuits, rolls, crackers, buns, bagels, muffins).
Multi-dimensional
The multi-dimensional approach starts with the premise that all
information can be stored in multi-dimensional spreadsheet-like structures.
This is no surprise; it's been possible to store information in
two-dimensional spreadsheets ever since flat files were first introduced. The
real story begins when we recognize that storing information
in multi-dimensional spreadsheet-like structures "makes sense." That is,
there is some value in storing information in that format. It allows the
data modeler to manipulate the information in useful ways that could not be
accomplished as easily with other models.
The multi-dimensional approach may be considered an extension of the
relational model in which denormalization has been carried to an extreme.
In addition to the relational algebra operations,
operations for "slicing," "pivoting" and other transforms are included.
This makes the multi-dimensional approach useful for trend-line analyses and
statistical correlations within data warehouses.
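To make "slicing" and "pivoting" concrete, the sketch below stores a tiny hypothetical sales cube as a Python dictionary keyed by (product, region, quarter). Slicing fixes one dimension at a single member; pivoting sums the cube into a two-dimensional table whose row and column dimensions the user chooses. OLAP engines work very differently internally; only the operations themselves are being illustrated, and the data and function names are assumptions.

```python
from collections import defaultdict

# Hypothetical sales cube: (product, region, quarter) -> units sold.
DIMS = ("product", "region", "quarter")
cube = {
    ("tea", "north", "Q1"): 10, ("tea", "south", "Q1"): 5,
    ("tea", "north", "Q2"): 12, ("coffee", "north", "Q1"): 7,
    ("coffee", "south", "Q2"): 9,
}


def slice_cube(cube: dict, dim: str, member: str) -> dict:
    """Fix one dimension at a single member, leaving a smaller cube."""
    i = DIMS.index(dim)
    return {key: value for key, value in cube.items() if key[i] == member}


def pivot(cube: dict, row_dim: str, col_dim: str) -> dict:
    """Sum the cube into a 2-D table with the chosen row and column dimensions."""
    r, c = DIMS.index(row_dim), DIMS.index(col_dim)
    table = defaultdict(lambda: defaultdict(int))
    for key, value in cube.items():
        table[key[r]][key[c]] += value
    return {row: dict(cols) for row, cols in table.items()}


print(pivot(cube, "product", "quarter"))
# {'tea': {'Q1': 15, 'Q2': 12}, 'coffee': {'Q1': 7, 'Q2': 9}}
print(pivot(cube, "quarter", "product"))  # same totals, axes swapped
# {'Q1': {'tea': 15, 'coffee': 7}, 'Q2': {'tea': 12, 'coffee': 9}}
print(pivot(slice_cube(cube, "region", "north"), "product", "quarter"))
# {'tea': {'Q1': 10, 'Q2': 12}, 'coffee': {'Q1': 7}}
```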
In comparison with ERM and network models, the multi-dimensional approach
is much more flexible about the way the user may want to relate information.
The flexibility comes at a cost, however. Generally, ERM will out-perform
the multi-dimensional approach because predefined relationships can be
optimized for access. The multi-dimensional approach may not be what you want to
use for routine operations on your data, but it becomes more important when you
are trying to mine information for those elusive and lucrative nuggets of gold.
Many
multi-dimensional data models are implemented as relational models. Others
are usually implemented in some proprietary network data model. Neither
implementation hampers the use of the multi-dimensional model for conceptual
purposes.
Examples include decision support software for online analytical
processing (OLAP).
Most other data models are going to look at a fruitcake as a sweet bread
with a fruit and nut mixture. The multi-dimensional approach says that
sometimes that makes sense, but sometimes you want to be able to look at
fruitcake as a piece of fruit surrounded by sweet bread.
Future
There is no reason to expect that all the important data
models that there ever will be have already been engineered. The
evolution of data models has shown remarkable ingenuity on the part of
data modelers in applying foreign disciplines to their craft. When one problem is
solved, modelers compete to optimize some other aspect. It's not a case
of trying to store new kinds of information; being able to represent all
kinds of information has been fundamental to data models since it was
shown that data could be stored in files. The battle is on to match data
model structure with domain-related problem structure.
For example, the object-oriented paradigm encapsulates both data and
behavior in a way that obscures the difference between the two. For
programmers, this is perfect. It means that you don't have to know how an
object is implemented in order to send it messages. However,
obscuring the boundary between information and processes is not always well
suited to business management practices, where managers prefer seeing the process made
quite visible. It's plausible that in the not-so-distant future, a data
modelling scheme derived from object-oriented principles that models
processes separate from information could become an important aspect of
strategic planning.
Of course, we have no idea what it might be called. Someone will always
be able to create a new kind of sandwich.