Saturday, December 20, 2014

Wagner, Rackham, Joyce, Tolkien, Pound

There needs to be something on the figures, and on their relationship. I will work on this
after doing the relational database management.

Relational database management 1

Before

1

This is a story about the underpinnings of what we now call relational databases. It may sound fancy, but remember that Codd and Date, the progenitors of the relational database, are not household names. And even with these two, the story goes back further than they do. This is that story in outline, or at least a version of it, which reaches back not to Codd and Date but to Galileo Galilei, and forward to an unknown future.

It is a story of theory, that is, how things ought to be; of practice, that is, the trade-offs between different kinds of theory; and of structure, that is, what needs to be codified and what will be left for other people to decide. Because in programming languages there is what you need to decide, and what can be left for other people to decide at a later date.

Before we can begin with the beginning, we have to start out with what a Codd and Date database actually is, and what has to happen on the computer side. Is it to be done by the processor, or by the database? Because in the end the processor and the database are two completely separate objects.

What is the database supposed to do? Remember that the database is a way of storing structured data in a binary format, which regards tuples as the primary way of storing data. You may think that this is obvious, but you would be mistaken. Several brilliant minds, some you know and some you don't, have made it obvious through a design that is obvious only in retrospect. That's why we're going to cover this in detail, so that you will know what kind of power you're dealing with in a relational database. There are simpler ways of dealing with data, and it may be, for reasons of size for example, that you will use one of them.
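To make the idea of the tuple as the unit of storage concrete, here is a minimal sketch in Python. It is not how any real database engine stores data; the `Person` type, its attributes, and the sample values are all made up for illustration. It only shows that a relation can be treated as a set of tuples, and that the basic operations work on whole tuples.

```python
from typing import NamedTuple


class Person(NamedTuple):
    """One tuple of a hypothetical 'person' relation (names are illustrative)."""
    person_id: int
    name: str
    city: str


# The relation is modelled as a set: no duplicate tuples and no inherent order.
people = {
    Person(1, "Ada", "London"),
    Person(2, "Edgar", "Portland"),
    Person(1, "Ada", "London"),  # an exact duplicate, absorbed by the set
}


def select_by_city(relation, city):
    """Restriction: keep only the tuples whose city attribute matches."""
    return {t for t in relation if t.city == city}


def project_names(relation):
    """Projection: keep only the name attribute of each tuple."""
    return {t.name for t in relation}
```

Calling `project_names(select_by_city(people, "London"))` chains the two operations, which is roughly what a query does when it filters rows and then picks columns.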

First of all, we need to say what a relation is, and how we can fake it. Because that is what we do. A relational database combines a series of tuples across a large range of rows. This then is key: there are a lot more rows, and it is the rows that hold what is specific to a database rather than what is generic. In other words, there are a lot more records than fields. Not in all cases, but most databases have at least one table like this, and probably more than one. The other thing that databases have is a cross between two sets of records, say people and companies, and they relate the two.
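The people-and-companies case can be sketched with Python's built-in `sqlite3` module. The table names, column names, and sample rows below are assumptions made for the example, and SQLite is only one of the "relational enough" databases one might pick. The point is simply that the relationship lives in a shared value (`company_id`), and a join puts the two sets of records side by side.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with two related, hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE company (
            company_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL
        );
        CREATE TABLE person (
            person_id  INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            company_id INTEGER REFERENCES company(company_id)
        );
        INSERT INTO company VALUES (1, 'Acme'), (2, 'Globex');
        INSERT INTO person  VALUES (1, 'Ada', 1), (2, 'Edgar', 2), (3, 'Chris', 1);
        """
    )
    return conn


def people_with_companies(conn):
    """Join the two tables on the shared company_id value."""
    return conn.execute(
        """
        SELECT person.name, company.name
        FROM person
        JOIN company ON person.company_id = company.company_id
        ORDER BY person.person_id
        """
    ).fetchall()
```

Note that there are many more person rows than there are columns in either table, and that neither table has to know about the other beyond the one shared field.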

So what did people use before relational databases came into being? Usually hierarchical database design, which organized data by trees, and is still used today. What the relational database proved is that any hierarchy can be relational. Even more databases were flat: a tab-delimited text file holding a series of records. This meant that it was relational in one sense, because its numbers in one direction could easily be changed. But its numbers in the other direction were fixed. It had only one set of fields, which were defined by the program. And that is where Codd and Date came in. They proved that by making just a few changes, they could make any number of fields. And as we shall show, that made all the difference between a flat file and any number of flat files.
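The claim that a hierarchy can be expressed relationally can be illustrated with a small sketch. This is one common way of doing it (each row carries the identifier of its parent), not a reconstruction of anything Codd or Date wrote, and the node names are invented for the example.

```python
# Each tuple is (node_id, parent_id, name); parent_id is None at the root.
# The tree is stored as flat rows, and the hierarchy lives in the parent_id values.
nodes = [
    (1, None, "company"),
    (2, 1, "engineering"),
    (3, 1, "sales"),
    (4, 2, "databases"),
]


def children_of(rows, parent_id):
    """Return the names of the direct children of the given node."""
    return [name for (_, pid, name) in rows if pid == parent_id]


def path_to(rows, node_id):
    """Walk parent links upward and return the names from the root down."""
    by_id = {nid: (pid, name) for (nid, pid, name) in rows}
    path = []
    current = node_id
    while current is not None:
        parent, name = by_id[current]
        path.append(name)
        current = parent
    return list(reversed(path))
```

Adding a new branch is just adding a row. Nothing about the layout of the table has to change, which is the flexibility the flat file with its fixed fields did not have.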

And that leads to our next nut to crack. As far as pure relational theorists are concerned, there are no relational databases, or perhaps one. As far as practice is concerned, there are dozens. If you are already a computer programmer, this is just another case out of several. If you are not, it seems like the theoreticians and the programmers are at odds with each other. I would tell you that there are reasons for this, which we will get into at a later date, and that there are widely different ideas. But the main thrust of the contention is whether or not this discussion should even be had. To the theoreticians, no relational database exists, and they can stay in their towers, dreaming about things that don't exist. Meanwhile programmers can argue about which model is close enough for government work, and they have several different options to choose from. What this asks is: are you a theoretician or a programmer?

So the discussion is: is it relational, and to what degree? If you have to suit all of the () you don't have more than one choice, and so I will let you get on with that choice. You know who you are. For everyone else, you have to define how relational a database you want, and then make a choice based on that.

So for the theoreticians, Codd and Date are your keys to the entire world of theoretical databases. You will have to look back at the ancestors of Codd and Date to see who the contributors to their theory were. But for the programmers there are two contributors, Kernighan and Ritchie, the two gentlemen behind the C programming language. You might think: what does a programming language do for the relational database construct? It was these two gentlemen who formulated the hidden language: not what the relational database model used, but how it was implemented. Without the relational database model, C would have just been another programming language. And remember, there were lots of programming languages.

C was presented in a style called “the tutorial,” which began with an explanation by way of examples. It then moved to a more broad-based explanation, and finally ended with a reference manual. Whereas Codd and Date made myriad attempts to explain what they were doing, Kernighan and Ritchie produced exactly one manual and then moved on to different things, only coming back to the manual when it was time to revisit it for ANSI C, that is, when C became a standard and the manual needed a revision, including a trivial change saying that 8 and 9 are not octal digits.

The reason that C was the language of programming was that it took the relational database model and implemented it as just another way of doing databases. ANSI C didn't care overly much about what database you were using; that was something to be decided when you linked in libraries. Remember, this was a revelation. The computer language only worried about the processor, and it made its other decisions only about the processor. And that meant that whether your database was flat or something else was a matter for the language, not the processor.

This meant something important. C was small. This important point needs to be emphasized. C was small. That meant that anything which could be done in a library was done there.

The second point of C was that it was developed at the same time as the relational database model. This was not a coincidence. 1970 was a time when people looked at flat files and flat-file relational databases, and they were at the cusp of doing things a different way, and they knew it. People do not know that they are doing things a different way most of the time. But in 1970, in the world of computing, they knew that there had to be a better way than Lisp, or Cobol, or any one of a number of other languages. That is why, when you look back, the years around 1970 produced two of the seminal pieces of programming logic. These two pieces are still around, though they have changed, and they have been crowded by imitators. They have even been superseded, but not forgotten. Not yet anyway.

So is there a point to all this? Yes, there are three points.

One, no matter which database you select, there will be a compromise on a purely theoretical basis. Since we are coming to the relational database on theoretical terms, this is an issue that we must decide. People who are coming to this from practical terms may spend a few minutes and say, “this database is relational enough, or has a few exceptions that we know about.” Sadly for us, that will not be possible: we are defining which things are possible in which circumstances. Suffice it to say we are the people that people are looking to for answers when they have questions.

Two, do we want to be relational at all? This is a serious question: when do we want to allow enough exceptions that the core of a relational database is not worth the trouble? Clearly there are reasons to do so, which is why we have NoSQL, and object databases, and other forms that are not relational at all. Some of these are reformulations of older things, with a purpose for which relational database models are not necessary. Some of these are for when there is no need for the relational model, or at least only for a very small subset of relational database design. Other times, there is so much data to be crunched through that the full relational database model is too ornate. Why bother to do it if you don't have to? Why not allow somebody else to do it? Remember there are thousands of examples, and only a few will need the full relational database model.

And three, do we want programmers to know that we have done this? They don't really need to know, in most cases. I will give you an example. If you assign a number to something, that number will come around and have two special cases: one case is zero, and the other case is some large number which isn't supposed to go from positive to negative, but does so anyway. In what cases do they need to know about this other number? And that will actually change. For example, what about sex, as in male or female? In 1960, the way you determined whether people needed to know or not was: were they part of the medical establishment that needed to know that there were more than two genders? In 1960, that was a very small number indeed. But in 2010, everyone had to know, because anyone could have a gender that was different from what it seemed to be. A person could be one, have an aunt or an uncle who was one, or simply know someone who was different.

So while some questions I will answer for you, there are many more questions which you will encounter and have to answer for yourself. This is because you may be practicing relational database building in 50 years. And in that time, things may be different than they are right now. And you will have to answer, and explain the answer to other people, and it might be a different answer than we come up with here. Things change, and that is actually a relational database way of looking at things. Because in our time there may only be three answers, true, false, and null, but in the world of 2100, I remind you, at least some of you will be sitting here with a group of students and answering the same questions, and they will have different answers. You will then have to note the fact that way back in the 2000s people did not realize that that was the case.
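The three answers mentioned above, true, false, and null, can be seen directly by asking a database to evaluate a few constant expressions. This sketch uses Python's built-in `sqlite3` module as one convenient example; the helper name `evaluate` and the choice of expressions are mine, and other databases may print the results differently. SQLite reports true as `1`, false as `0`, and null as Python's `None`.

```python
import sqlite3


def evaluate(expression):
    """Evaluate a constant SQL expression; None stands for SQL NULL.

    Only meant for the fixed expressions below, not for untrusted input.
    """
    conn = sqlite3.connect(":memory:")
    try:
        (value,) = conn.execute(f"SELECT {expression}").fetchone()
        return value
    finally:
        conn.close()


# A few expressions that show where the third answer appears.
expressions = ("1 = 1", "1 = 2", "NULL = NULL", "NULL AND 0", "NULL OR 1", "NOT NULL")
results = {expr: evaluate(expr) for expr in expressions}
```

The interesting rows are the ones involving `NULL`: comparing null with null does not give true, while `NULL AND 0` and `NULL OR 1` still have definite answers because the other operand settles the question.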

So what we will look at is what the relational database model is in theory and in practice, and how those things come together to form this triangle of theory, practice, and structure.