The techniques around data management remains an important aspect of computer science. Which is why we dedicated three chapters to databases: Chapter 32, Chapter 33 and Chapter 34. This post serves as an introduction to the vast topic.
Glossary
Database | A collection of organised data files held in the computer. |
File | A collection of records that are categorised by an entity e.g., Parent. |
Table or Entity | This is a file in a DBMS. |
Record | A group of related data that points to a single entity. |
Field | A single attribute or property within a record. |
Primary Key | A special field in a record that must be uniquely identify a record. |
DBMS | Database Management System, a piece of software used by programmers to structure data so that is can be used by programs. |
Data Type | A field must belong to a data type to indicates what kind of data is to be stored. |
Data, Records and Fields
Do you remember from the early days what we said data is? Let us refresh your memory.
Data can be anything that we would like to store and make use of for a particular application. Let's think of a couple of examples:
Social Media We might want to store data related to a user's profile details like name, surname, interests and display photos. We might also want to store comments and likes.
School We might want to store data about students, teachers, lessons, classrooms etc.
Online Tournament We might want to store data about each player, including their login information, games to be played, scores etc.
Job Hire Application We might want to store data about candidates, and job offers available etc.
What is a record?
A record is a group of related data about an entity. In the above examples, a record would be a group of data about
A Social Media User
A Student
A Player
A Job Candidate (e.g. on the right)
What is a field?
Records tend to consist of multiple fields, where each field is a specific data item. In the example of the job candidate displayed above, some example fields are
Name e.g., "Ricardo"
Surname e.g., "Cooper"
Date of Birth e.g., "June 8 1990"
Email e.g., "ricardocooper@example.com"
Data Source
We have talked about data but we did not mention yet where it will live and how it will be stored. Data is normally stored in a file. You have probably worked with files before.
What is a file?
A file is a container of information. This means that it contains a sequence of records, each having a number of fields. This determines how much 'space' or 'size' the fields inside the record are allocated.
A record can be:
Fixed-Length Each field is of a fixed size e.g. a field can always take up the space of 15 characters.
Variable-Length Each field can take up different sizes based on their need e.g. a field can take up any space of no more than 100 characters.
Let's explain with an example. Imagine we have a file with information about students in a school. For each student record we are only keeping the following information:
Name
Surname
Address
ID number (or no. in short)
From what we learnt so far we can deduce that:
We must have a file called Students.
Every record in the file is about a single student.
Every record has 4 fields: Name, Surname, Address and ID number.
The image below shows that when fields are fixed-length, the row always takes the same amount of space, but there may be a lot of wasted blank characters. In contrast, when fields are variable-length, space is used more efficiently!
What happens if a data item is bigger than the fixed-length field size?
This is an important factor to keep in mind. It is what we call field overflow. If you try and enter a 110 character address in a field with a fixed length of 100 characters, the last 10 characters will be lost because of overflow.
Field Types
When working with fields, we should keep in mind different types, based on the nature of the data they will contain. For textual data we generally make use of the following options:
TEXT or LONG TEXT Used for long generic text of variable-length, e.g., Description that contains paragraphs.
VARCHAR or SHORT TEXT Used for shorter text of variable-length, e.g., Full Name like "Margaret Thatcher".
CHAR Used for short text of a fixed-length, e.g., Number Plate like "EQZ 432".
For numerical data we generally make use of the following options:
NUMBER Used for numbers with a precision set.
Other useful options include:
DATE Used to specify a Date, Month and Year e.g., Date of Birth like 04/05/1987.
BOOLEAN or YES/NO Used to flag a setting as on, TRUE, or off, FALSE, e.g., Email Notifications.
CURRENCY Used to represent money, e.g., Price like £3.50.
AUTONUMBER Integers, e.g., "1", "2" that can be automatically assigned by the DBMS when a record is created.
We will discuss these types more when we get into Databases and the definition on tables.
We now have a good understanding of data in the context of files, records and fields. By keeping these examples in mind, we can finally start talking about databases.
What is a Database?
A database is a collection of files used to store data. We started with the simplest database whereby data is organised and maintained manually in a flat file. Do you remember the file with a record for each student? We could easily open that file in a program like Microsoft Excel and use the data, in fact some programmers do.
However many times we need something a little bit more sophisticated than flat file databases for a number of reasons:
❌ Flat files cannot easily be read from/written to from applications. Have you tried writing to a file in Java? It is a lot of work!
❌ Flat files cannot enforce data integrity in the sense that nothing stops someone from putting a word in a Date of Birth field.
❌ Flat files cannot handle complex queries. Searching through a file to find all the students whose name starts with a letter between A and F is difficult!
❌ It is not possible to adding/remove records without opening the file.
Most modern applications require a lot of control over the data so manual record keeping is not common practice.
What if there is a way for our data to be stored in a better way? A way that is structured but also easily used by an application?
Database Management System (DBMS)
A DBMS is an electronic database tool that creates and manages a number of files to store our data in a structured way. Moreover, it allows us to run queries to get data, update and remove data in a controlled way. Some ways in which DBMSs control this is:
User Permissions Some users can update and delete, others can only view.
Definitions for records and fields We don't want someone putting the value '123' in a Date of Birth.
Keeping backups and logs We want our data safe in case of system failure or attempted intrusion.
Hereunder we list the major features of a DBMS.
Data Security The DBMS can prevent unauthorised users from viewing or updating the database. This is done by using passwords.
Data Integrity The DBMS ensures that the data in the database is correct and is not corrupted. Backups help in this issue. Data integrity is also maintained through the use of various error-checking methods and validation procedures. This aspect will be covered comprehensively in Chapter 41.
Interactive Query Language A DBMS provides a query language and report writer that lets users interactively interrogate the database.
Interactive Data Entry and Updating A DBMS typically provides a way to interactively enter and edit data.
Data Independence The data inside the database is independent of the programs that use them or the users that access them i.e. the database can be accessed by means of different languages and different query languages.
Relational Databases
The data that we would like to store in the computer is complex in nature. Various bits of data is related.
In our School example, we mentioned that a file must store Student records. In real life students also have parents. This means that we might want another file for Parent records to store a lot of data about them, making them entities in their own right. It would be really untidy if all the information about the parents is stored in the Student file.
File or Table | Parent |
Record | Single parent |
Fields | Name, Surname, Mobile Number, Occupation |
How would we keep track of which Parent is linked to which Student?
This is precisely why Relational Databases are popular. With relational databases we can define relationships between different entities or tables. This feature is very handy and makes it convenient for us to grow our database. For example, if our School suddenly started giving badges to students (for being really good), we can easily add a new table called Badge and link (or relate it) it to our Student.
We will see more about how these relationships are defined in Chapter 33. For now, we just need to understand why a relational database is useful.
Defining Your First Table
Let's have a look at what a table looks like in a database. Key things to keep in mind...
A table has a number of records.
Each record has a number of fields.
Each field has a type and a length, which may be variable.
How do we make sure that we don't mix up similar records?
It is very important to define one field as a Primary Key. This is a special field that is unique across all the records in that table. Very often we call this field the identifier or in short simply ID.
Comments