This is a story about the
underpinnings of what we now call relational databases. It may sound
fancy, but remember that Codd and Date, the progenitors of the
relational database, are not household names. And even with these
two, the story goes back further than several years. This is that
story in outline, or at least a version of it, which reaches back not
to Codd and Date but to Galileo Galilei, and forward to an unknown
future.
It is a story of theory, that is, how
things ought to be; practice, that is, the trade-offs between
different kinds of theory; and structure, that is, what needs to be
codified and what will be left for other people to decide. Because in
programming languages there is what you need to decide, and what can
be left for other people to decide at a later date.
Before we can begin with the
beginning, we have to start out with what a Codd and Date database
actually is, and what has to happen on the computer side. Is it to be
done by the processor, or by the database? Because in the end,
processor and database are two completely separate objects.
What is the database supposed to do?
Remember that the database is a way of storing structured data in a
binary format, which regards tuples as the primary way of storing
data. You may think that this is obvious, but you would be mistaken.
Several brilliant minds, some you know and some you don't, have made
it obvious through a design that is obvious only in retrospect.
That's why we're going to cover this in detail, so that you will know
what kind of power you're dealing with in a relational database.
There are simpler ways of dealing with data, and it may be, for
reasons of size for example, that you will use one of them.
First of all, we need to say what a
relation is, and how we can fake it. Because that is what we do. A
relational database combines a series of tuples across a large range
of rows. This, then, is key: there are a lot more rows, which hold
what is specific to one database, than there are fields, which hold
what is generic. In other words, there are a lot more records than
fields. Not in all cases, but most databases have at least one such
table, and probably more than one. The other thing that databases
have is a cross between two fields, say people and companies, and
they relate these two fields.
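To make the people and companies example concrete, here is a minimal
sketch using Python's built-in sqlite3 module. The table names, column
names, and sample rows are all invented for illustration; the only
point is that each row is a tuple, and that a shared company_id value
is what relates one table to the other.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companies (
        company_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    CREATE TABLE people (
        person_id  INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        company_id INTEGER REFERENCES companies(company_id)
    );
""")

# Each row is a tuple of values, one value per field.
conn.executemany("INSERT INTO companies VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO people VALUES (?, ?, ?)",
                 [(1, "Ada", 1), (2, "Grace", 2), (3, "Edgar", 1)])


def employees_of(company_name):
    """Return the names of people related to the named company."""
    rows = conn.execute(
        """
        SELECT people.name
        FROM people
        JOIN companies ON people.company_id = companies.company_id
        WHERE companies.name = ?
        ORDER BY people.name
        """,
        (company_name,),
    ).fetchall()
    return [name for (name,) in rows]


print(employees_of("Acme"))  # ['Ada', 'Edgar']
```

Neither table knows about the other except through the matching
company_id values, which is the cross between two fields described
above.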
So what did people use before
relational databases came into being? Usually a hierarchical database
design, which organized data in trees, and is still used today. What
the relational database proved is that any hierarchy can be
relational. Even more databases were flat: a tab-delimited text file
with one set of columns and a series of records. This meant that it
was relational in one sense, because its numbers in one direction
could easily be changed. But its numbers in the other direction were
fixed. It had only one set of fields, which were defined by the
program. And that is where Codd and Date came in. They proved that by
making just a few changes, they could make any number of fields. And
as we shall show, that made all the difference between a flat file
and any number of flat files.
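As one illustration of a hierarchy stored relationally, the sketch
below keeps a small tree in a single table where each row points at
its parent. It is not a proof of the general claim, only an example;
the table name, the sample tree, and the helper function are
assumptions made up for this sketch, and it relies on SQLite's
recursive common table expressions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One flat table; the tree lives entirely in the parent_id column.
conn.execute("""
    CREATE TABLE nodes (
        node_id   INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        parent_id INTEGER REFERENCES nodes(node_id)
    )
""")
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?)", [
    (1, "root", None),
    (2, "sales", 1),
    (3, "engineering", 1),
    (4, "databases", 3),
])


def path_to_root(node_id):
    """Walk from a node up to the root by following parent_id."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain(node_id, name, parent_id, depth) AS (
            SELECT node_id, name, parent_id, 0
            FROM nodes
            WHERE node_id = ?
            UNION ALL
            SELECT n.node_id, n.name, n.parent_id, chain.depth + 1
            FROM nodes AS n
            JOIN chain ON n.node_id = chain.parent_id
        )
        SELECT name FROM chain ORDER BY depth
        """,
        (node_id,),
    ).fetchall()
    return [name for (name,) in rows]


print(path_to_root(4))  # ['databases', 'engineering', 'root']
```

The tree structure is never stored as a tree. It is recovered by
relating rows of the same table to each other.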
And that leads to our next nut to
crack. As far as pure relational theorists are concerned, there are
no relational databases, or perhaps one. As far as practical purposes
go, there are dozens. If you're already a computer programmer, this
is just another case out of several. If you are not, it seems like
the theoreticians and the programmers are at odds with each other. I
would tell you there are reasons for this, which we will get into at
a later date, and there are widely different ideas. But the main
thrust of the contention is whether or not this discussion should
even be had. To the theoreticians, it says that no relational
database exists, and they can stay in their towers, dreaming about
things that don't exist, while programmers can argue about which
model is close enough for government work, and they have several
different options to choose from. What this asks is: are you a
theoretician or a programmer?
So the discussion is, is it
relational, and to what degree? If you have to suit all of the ()
you don't have more than one choice, and so I will let you get on
with that choice. You know who you are. For everyone else, you have
to define how relational a database you want, and then select a
choice based on that.
So for the theoreticians, Codd and
Date are your keys to the entire world of theoretical databases. You
will have to look back at the ancestors of Codd and Date to see who
the contributors to their theory were. But for the programmers there
are two contributors, Kernighan and Ritchie, the two gentlemen behind
the C programming language and its manual. You might think, what does
a programming language do for the relational database construct? It
was these two gentlemen who formulated the hidden language: not what
the relational database model used, but how it was implemented.
Without the relational database model, C would have just been another
programming language. And remember, there were lots of programming
languages.
C was written up in a style called
“the tutorial,” which began with an explanation by way of examples.
It then moved to a more broad-based explanation and finished with a
reference manual. Whereas Codd and Date made myriad attempts to
explain what they were doing, Kernighan and Ritchie produced exactly
one manual, and then moved on to different things, only coming back
to the manual when it was time to revisit it for ANSI C, that is,
when it became a standard that needed a revision. That revision
included a trivial change saying that 8 and 9 are not octal digits.
The reason that C was the language
of programming was that it took the relational database model and
implemented it as just another way of doing databases. ANSI C didn't
care overly much about what database you were using; that was
something to be decided when you linked in libraries. Remember, this
was a revelation. The computer language only worried about the
processor, and it made decisions only about the processor. And that
meant that whether your database was flat or something else was a
matter for the libraries you linked in, not for the processor.
This meant something important. C was
small. This important point needs to be emphasized. C was small. That
meant that anything which could be done in a library was done there.
The second point of C was that it was
developed at the same time as the relational database model. This was
not a coincidence. 1970 was a time when people looked at flat files
and flat file relational databases, and they were at the cusp of
doing things a different way, and they knew it. Most of the time,
people do not know that they are doing things a different way. But in
1970, in the world of computing, they knew that there had to be a
better way than Lisp, or Cobol, or any one of a number of other
languages. That is why, when you look back, the years around 1970
produced two of the seminal pieces of programming logic. These two
pieces are still around, though they have changed, and they have been
crowded by imitators. They have even been superseded, but not
forgotten. Not yet.
So is there a point to all this? Yes,
there are three points.
One, no matter which database you
select, there will be a compromise on a purely theoretical basis.
Since we are coming to the relational database on theoretical terms,
this is an issue that we must decide. People who are coming to this
from practical terms may spend a few minutes saying, “This database
is relational enough, or has a few exceptions that we know about.”
Sadly for us, that will not be possible, since we're defining which
things are possible in which circumstances. Suffice it to say we are
the people that others look to for answers when they have questions.
Two, do we want to be relational at
all? This is a serious question: when do we want to allow enough
exceptions that the core of a relational database is not worth the
trouble? Clearly there are reasons to do so, which is why we have
NoSQL, and object databases, and other forms that are not relational
at all. Some of these are reformulations of older things, with a
purpose that relational database models are not necessary for. Some
of these are for when there is no need for the model, or at least
only for a very small set of relational database design. Other times,
there is so much data to be crunched through that the full relational
database model is too ornate. Why bother to do it if you don't have
to? Why not allow somebody else to do it? Remember, there are
thousands of examples, and only a few will need the full relational
database model.
And third, do we want programmers to
know that we have done this? They don't really need to know, in most
cases. I will give you an example. If you assign a number to
something, that number will come around and have two special cases:
one case is zero, and the other case is some large number which isn't
supposed to go from positive to negative, but does so anyway. In what
cases do they need to know about this other number? And that will
actually change. For example, what about sex, as in male or female?
In 1960, the way you determined whether people needed to know or not
was to ask: were they part of the medical establishment that needed
to know that there were more than two genders? In 1960, that was a
very small number indeed. But in 2010, everyone had to know, because
anyone could have a gender that was different from what it seemed to
be. A person could be one, have an aunt or an uncle who was one, or
simply know someone who was different.
So while some questions I will
answer for you, there are many more questions which you will
encounter and have to answer for yourself. This is because you may be
practicing relational database building in 50 years. And in that
time, things may be different than they are right now. And you will
have to answer, and explain the answer to other people, and it might
be a different answer than we come up with here. Things change, and
that is actually a relational database way of looking at things.
Because in our place there may only be three answers, true, false,
and null. But in the world of 2100, at least some of you, I remind
you, will be sitting here with a group of students, answering the
same questions, and they will have different answers. You will have
to note the fact that way back in the 2000s, people did not realize
at the time that that was the case.
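The three answers of true, false, and null can be seen directly in a
small sketch. It uses Python's sqlite3 module as a convenient SQL
engine; the evaluate helper is a made-up name for this example.
SQLite reports true as 1 and false as 0, and Python shows SQL's null
as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def evaluate(expression):
    """Evaluate a SQL expression and return its single result.

    The expression is pasted into the query text, so this helper is
    only suitable for trusted, hard-coded examples like those below.
    """
    (value,) = conn.execute(f"SELECT {expression}").fetchone()
    return value


print(evaluate("1 = 1"))        # 1    (true)
print(evaluate("1 = 2"))        # 0    (false)
print(evaluate("1 = NULL"))     # None (null: the answer is unknown)
print(evaluate("NULL = NULL"))  # None (two unknowns are not "equal")
print(evaluate("NULL AND 0"))   # 0    (false wins regardless of the unknown)
print(evaluate("NULL OR 1"))    # 1    (true wins regardless of the unknown)
```

Null is a third answer in its own right: comparing anything to it
yields neither true nor false.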
So what we will look at is what
the relational database model is in theory and in practice, and how
those things come together to form this triangle of theory, practice,
and structure.