What is a database?

Understanding databases

November 29, 2021
10 minute reading
databeses

Database is a term you likely regularly hear, particularly when discussing the buzzword of the 21st Century: data. While there's no denying these two terms are intrinsically linked, you may be surprised how vital databases are to the collection, access, and storage of data — particularly for your own business.

Data might be driving our increasingly digital world, but you'd be forgiven for not knowing a lot about the subject of databases. Thankfully, we've created this in-depth guide to help you learn not just what is a database, but also its evolution, benefits, challenges, examples in everyday life, and so much more.

Database (DB)

1. Database defined

A database is an organized collection of related data, or information, which is stored and accessed electronically within a computer system. This data is usually controlled by a database management system (DBMS). Through the DBMS, users can also store and retrieve large amounts of data. This database management system, along with its data and associated applications, is referred to as a database system — or simply ‘database’ for short.

Most databases use what is known as structured query language (SQL) for writing and querying data. Additionally, the data within these databases are typically modeled — or represented — in the form of rows and columns within tables. Setting out the data in this way allows it to be easily accessed, controlled, revised, updated, managed, and structured.

There are many different types of databases, including relational databases, NoSQL databases, object-oriented databases, cloud databases, and more. We examine all of these in additional detail in Chapter 4.

2. What is SQL?

In the previous chapter, we briefly mentioned SQL (or structured query language), along with the fact that most databases use it.

To elaborate on exactly what SQL is: It is a programming language that is used to communicate with a database. SQL is so widely used, that the American National Standards Institute (ANSI) deemed it the standard language for relational database management systems.

Relational databases are those we described in the previous chapter which use SQL and present their data in rows, columns, and tables.

Just some of the most well-known relational database management systems that use SQL include:

  • Oracle

  • Sybase

  • Microsoft SQL Server

  • Access

  • Ingres

The SQL language communicates with databases in a manner that allows users to perform vital tasks such as updating or retrieving data. Standard SQL commands, for example, include Select, Insert, Update, Delete, Create, and Drop. Together, these six basic commands allow a user to carry out practically anything they may need to do within a database.

It’s also important to point out that although the majority of database systems use SQL, most of them also have their own additional built-in extensions, which are unique to their system alone.

3. Evolution of database

The database isn’t a new concept; however, it is one that has continued to evolve over time to become the intelligent, big data-housing system that we know of today.

Data modeling and databases actually date back to the 1960s, and their evolution can be discussed in five key phases.

In the very beginning, database management systems were developed to adhere to three different models: the hierarchical model, network model, and inverted file model.

This took place from the 1960s, however in the 1990s, a new model of DBMS was born: the object-oriented model, also known as object databases.

The second phase of the database’s evolution came in the form of the relational model, which introduced all of the SQL products, along with a few non-SQL products, during the early 1990s. In 2008, this database model began to experience some decline.

Online Analytical Processing (OLAP) and specialized DBMSs were also introduced around 1990, both of which are still quite popular to this day.

The fourth phase, known as the graph database phase, began in 1999 with The Semantic Web stack from the Worldwide Web Consortium. In 2008, this trend continued with the emergence of property graph databases.

Finally, the most recent stage of the evolution of the database began in 2008. Known as the NoSQL phase, it includes databases as we know of them today, including the big data phenomenon.

4. Types of databases

By now, you’ve probably realized that there are numerous different types of databases, each with its own properties and systems. Let’s take a closer look at the various database types below, along with some well-known examples of each.

1. Relational databases

Relational databases are one of the most well-known and widely used types of databases. They get their name from the way they store their data, which is in many related tables. These tables then feature the information in rows and columns. Because of this, relational databases are very reliable, and they work well with structured data. This makes them unsuitable for companies that work with a lot of unstructured or semi-structured data, however.

Relational databases also utilize SQL (structured query language, as discussed earlier in Chapter 2) to read, create, update, and delete data. The program that allows users to create, update, and manage a relational database is known as a relational database management system (RDBMS).

This type of database is also compliant with what’s known as ACID, standing for Atomicity, Consistency, Isolation, and Durability. When these four properties are present, it assures a database transaction is reliable and accurate, despite any other errors which may occur.

Examples of well-known relational databases include:

  • Microsoft SQL Server

  • Oracle Database

  • MySQL

  • PostgreSQL

  • IBM Db2

2. NoSQL databases

You’ve learned about databases that use SQL, but what about NoSQL? NoSQL databases are simply those that don’t use structured query language as their central data access language. They are also referred to as non-relational databases because, unlike relational databases, the data in a NoSQL database doesn’t have to obey a predefined structure or model. Because of this, they are perfect for businesses or users looking to store and oversee both unstructured and semi-structured data. 

Another benefit of a NoSQL database is the fact that developers are able to make a change to the database “on the fly” — or as it is carrying out other tasks — without affecting the programs that are using that database.

Examples of NoSQL databases include:

  • Apache Cassandra

  • MongoDB

  • CouchDB

  • CouchBase

3. Cloud databases

As the name suggests, a cloud database is one that has been designed to run “in the cloud” — or in other words, in servers that are accessed over the Internet. Because they are offered in this software as a service (SaaS) distribution model, these databases are often low maintenance. Additional benefits include flexibility, scalability, and high availability.

Some common examples of cloud databases that you may have heard of include:

  • Microsoft Azure SQL Database

  • Amazon Relational Database Service

  • Oracle Autonomous Database

4. Columnar databases

Columnar databases store their data in columns rather than rows. Also known as column data stores, this database model is well-suited to processing analytical queries, thus making them common within data warehouses.

When requesting to generate data from a columnar database, you can only retrieve information from the columns you want. Therefore, it basically ignores all of the data that doesn’t apply to your query.

Just some examples of columnar databases include:

  • Google BigQuery

  • Cassandra

  • HBase

  • MariaDB

  • Azure SQL Data Warehouse

5.  Wide column databases 

Wide column databases, also referred to as wide column stores, are perfect for supporting real-time big data applications. Because they store their data in column groups rather than rows and columns, they are both highly scalable and can handle an incredibly large amount of data.

Three wide column databases you may have heard of include:

  • BigTable

  • Apache Cassandra

  • Scylla

6. Object-oriented databases

With an object-oriented database, all of its attributes are tied together as an object. These databases are based on object-oriented programming and managed by object-oriented database management systems (OODBMS). As a result, an object-oriented database responds to object-oriented programming languages, such as Java and C++. Furthermore, these databases conform to ACID (Atomicity, Consistency, Isolation, Durability standards), just like relational databases.

Examples of object-oriented databases are:

  • Wakanda

  • ObjectStore

7. Key-value databases

Key-value databases are a type of NoSQL database. Also referred to as a key-value store, these databases save data as a group of key-value pairs containing two data items each. Due to their high scalability and ability to cope with high volumes of traffic, key-value databases are optimal for tasks such as managing sessions of web applications, online shopping carts, or user sessions for giant multi-player online games.

Two key-value databases include:

  • Amazon DynamoDB

  • Redis

8. Hierarchical databases

Known as a parent-child model, hierarchical databases organize data in a family tree-like model, with one object branching out to multiple objects underneath it. It was originally developed by IBM during the early 1960s, so is therefore one of the earliest database models around.

The family tree-like structure that hierarchical databases use is inflexible, so child records cannot have more than one parent record (the data from which it stems). Even to this day, however, this type of database is commonly used to support high-performance and high-availability applications.

Examples include:

  • IBM Information Management System (IMS)

  • Windows Registry

9. Document databases

Document databases are also commonly referred to as document stores or document-oriented databases. They differ from the other databases we’ve talked about so far in the sense that they use JSON-like documents to model data rather than rows and columns. JSON documents comprise of fields, which are name-value pairs. The fields can be in any order and be nested or arranged in arrays (a data structure consisting of a collection of elements).

Document databases were created to store and manage document-oriented information, which is simply another term to describe semi-structured data. Because this database model is straightforward and easy to scale, it’s a great solution for mobile apps that require regular and fast updates or improvements.

Three well-known examples of document databases include:

  • MongoDB

  • Amazon DocumentDB

  • Apache CouchDB

10. Graph databases

Graph databases are another example of a NoSQL database; however, this model is also based on graph theory. Graph theory is the study of relationships through graphs, or more specifically, a method for quantifying and simplifying the numerous moving parts of dynamic systems.

Therefore, graph databases are commonly used to analyze the relationships between varied data points, such as in fraud prevention or for mining data about customers from a company’s social media following.

In order to identify and work with the connections between data points, a user must utilize Graph-Oriented Database Management Systems (DBMS) software.

Some graph databases are:

  • Datastax Enterprise Graph

  • Neo4J

11. Time series databases

As the name implies, a time-series database deals with time series or time-stamped data. This might include things such as network data, sensor data, and application performance monitoring data, for example.

Just some common types of time series databases include:

  • Druid

  • eXtremeDB

  • InfluxDB

5. Database examples

As you learned in the previous chapter, there are numerous different types of databases. By now, you may be wondering how databases are used in your day-to-day life. From gaming to the products you buy online, databases are extremely versatile. Here are 6 examples to show you how.

1. On-demand online video streaming

When using an on-demand video streaming platform, you may have noticed how it keeps track of your viewing preferences. Platforms such as Netflix, for example, match these preferences with the TV shows or movies they feature on your home screen.

Video streaming companies actually use databases to do this, as they have a staggering amount of data to organize and analyze at any given time. Doing so allows them to provide more optimal watching recommendations every time you log in to their service.

netlix

2. Personal Cloud Storage

Personal cloud storage systems you might use include Dropbox, Google Drive, Microsoft OneDrive, and iCloud. Each of these use complex data models and powerful data warehouses to guarantee your data is stored securely and is available to you any time you need to access it — day or night.

icloud

3. Social media

Social media platforms collect and store an extraordinary amount of user information. They also need this data to be cross-referenced, so they can accurately recommend friends, businesses, products, and topics to each user, thus requiring highly reliable and capable database software.

Facebook, for example, runs MySQL in its data centers and has been doing so since its creation. Other social media platforms utilize NoSQL databases, proving that there isn’t a single solution that fits all.

facebook

5. eCommerce

Without the use of databases, eCommerce stores simply wouldn’t be able to run. This is because databases are responsible for tasks such as organizing products, pricing, customer information, and purchasing history.

An online store can use their data to assist them with a wide range of things, such as recommending specific products to individual users based on their past purchases, for example.

As a result, a highly secure database is also important to have, particularly one with PCI Compliance certification.

amazon

6. Healthcare

Healthcare, including hospitals and doctor’s offices, is another major industry that relies heavily on database usage. They must store vast amounts of patient data and be able to access it both securely and easily. As you can imagine, these databases contain complex data structures to not only handle these enormous amounts of data but also to comply with HIPAA (Health Insurance Portability and Accountability Act of 1996) standards for sensitive data management.

Leading healthcare website Healthcare.gov uses NoSQL databases to manage their health insurance information, for example.

healthcare gov uk

6. What is database software?

We’ve mentioned the term database software multiple times now, so let’s take a closer look at what database software is, as well as how it is used.

Database software allows users to create, change, and manage database files and records. Without it, tasks such as data entry, data editing, updating, and reporting would be incredibly difficult.  Database software is also responsible for things such as storage, backup, multi-access control, and security of data.

This last point is particularly important, particularly as volumes of data continue to increase and play a major role in our world. Data theft has become more frequent over time and is an increasingly important issue that organizations must protect against.

Additionally, database software can be praised for making the process of data management much easier. It does this by allowing users to store data in a structured form then access it. Usually, the database software has a graphical interface from which users can create and manage the data. In more advanced situations, users may even be able to construct their own databases through the assistance of database software.

7. What is a database management system (DBMS)?

A database management system — or DBSM as it’s shortened to — is quite similar to database software, as explored in Chapter 6. Its primary purpose, however, is to function as an interface between the end-user and the database. To do this, the DBSM must manage the data, database engine, and database schema all at once in order to enable the organization and management of data.

The main features and capabilities of a DBMS include:

  • A user-accessible catalog describing metadata

  • A DBMS library management system

  • Data abstraction and independence

  • Data security

  • Recording and reviewing activity

  • Concurrency and transaction support

  • Support for authorization of access

  • The ability to access support from remote locations

  • DBMS data recovery in the event of damage or loss

  • Applying restrictions to ensure data follows specific rules

A database management system operates through the use of system commands. First, it obtains instructions from a database administrator in the management system. Then, when it is instructed accordingly, it either retrieves data, modifies data, or loads existing data from the system.

8. What is a MySQL database?

MySQL database is another term you will encounter when dealing with data. Simply put, a MySQL database is an open-source relational database management system, or RDBMS.

It gained its name from the word ‘My’ (the name of the co-founder’s daughter) as well as SQL (structured query language) which is used to generate, change, and extract data from relational databases. As a type of relational database, it organizes data into data tables, where the relationship between data types helps to structure the data.

MySQL also works with an operating system to create a relational database in a computer's storage system. Additionally, it makes it possible to manage users, provides network access, and helps with testing database integrity, along with data backup.

Most often, MySQL is used with other programs to execute applications that need relational database capability.

MySQL is used by many popular websites, including Facebook, Flickr, MediaWiki, Twitter, and YouTube, for example.

9. Key factors that influence database performance

There are five key factors that influence database performance. These include workload, throughput, resources, optimization, and contention. Let’s take a closer look at each one below.

  1. Workload refers to the demands of the database. This might include processes such as online transactions, batch tasks, ad hoc queries, data warehousing analysis, utilities, and system commands — all of which can be happening at any time. Sometimes, this workload might be predictable, such as a lighter workload after employees go home, or a heavier workload at the end of the month when payroll is processed. Usually, however, the workload can fluctuate drastically — even on a minute-by-minute basis.

  2. Throughput describes the capacity and ability of the hardware and software to process data. It is made up of several elements, including I/O speed, CPU speed, parallel capacities of the machine, the core of the DBMS, and the competence of the operating system and software.

  3. The resources of the system refer to the hardware and software tools at the database’s disposal. These resources include things such as database kernel, disk space, memory, cache controllers, and microcode, for example.

  4. Optimization is another key factor that influences database performance. All types of database systems can be optimized, and many can also perform their own query optimization internally. In order to gain the most efficient access paths to the required data, things such as SQL formulation, database parameters, database organization, and more must also be optimized.

  5. Contention describes the scenario when two or more components of the workload are attempting to use a single resource in a contradictory way. This may happen when the demand for a specific resource is high, such as when the system is trying to apply dual updates to the same dataset. As contention increases, however, it also has an impact on throughput, decreasing it.

10. Database management challenges

Although databases provide businesses with unique advantages that weren’t available just decades ago, they do still come with their difficulties. Let’s delve into the five most common database management challenges an organization might experience.

  1. Growing complexity in landscape. As we covered in Chapter 4, there are numerous types of databases available, including relational databases, columnar databases, object-oriented databases, NoSQL databases, and so many more. When selecting a new database management system, the number of options can be overwhelming. As the database market continues to grow and evolve, many businesses are finding it difficult to determine which solution meets their needs the most.

  2. Limits on scalability. All software is limited by resource usage and scalability, and database services are no different. There are a number of elements that can test — and potentially break — the limits of a database, including cataloging components, database architecture, and the configuration of operating systems and hardware.

  3. Increasing data volumes. Did you know more data has been created in the past two years than in the entirety of the human race? As data is being created and stored at an alarming rate, businesses are finding it increasingly difficult to keep up. Just a 10% increase in access to data can generate more than $65 million in net income for an average Fortune 1000 company, however. Therefore, it is in a business’s best interests to continue pursuing more advanced database systems — despite the struggle.

  4. Data security. Databases have become the core of every business, particularly when it comes to their private or sensitive data. Over the years, however, people have learned to take advantage of these systems to gain access to this restricted data. Data security is more important now than ever. After all, a single data breach costs a company $4 million on average, not to mention a loss of trust and reputation.

  5. Decentralized data management. Although decentralized data management (splitting the workload up among multiple machines) has its perks, it does present challenges too. Without centralized knowledge of the entire database, it can be extremely difficult to design and manage. Questions that arise include:

  • How will the data be allocated?

  • What’s the optimal decentralization method?

  • What’s the appropriate degree of decentralization?

11. What is a database administrator?

By now, you might be wondering how a business gains a database to store and manage all of their data in the first place. This is where a data administrator, or DBA, comes in handy. They set up databases based on the organization’s specific needs and then maintain it, so it operates smoothly. Additionally, they assist a business by fine-tuning, upgrading, and testing adjustments made to the database when required.

Simply put, you can think of a data administrator as the guardian of a company’s precious database. They are relied on to employ security measures that ensure sensitive data doesn't fall prey to data theft, while also managing the databases that help store and organize it.

Database administrators are employed in a broad range of settings, including in the public and private sectors. Additionally, some database administrators work as consultants to businesses.

12: What does a database administrator do?

Now that you know what a data administrator is, let’s take a closer look at their role and responsibilities. On any given day, a database administrator carries out the following tasks:

  • Making sure data analysts can easily use the database to find the information they need

  • Developing security measures

  • Backing up systems in case of a power outage or other damaging event

  • Restoring data to prevent loss

  • Guaranteeing the data stored within a database comes from reliable sources

  • Monitoring a database system’s performance to determine when action is required

  • Diagnosing and correcting database problems as they arise

  • Merging old databases into new ones

Database administrators often carry out this work within teams and therefore must be able to communicate effectively with stakeholders such as developers, managers, and other staff.

While general-purpose data administrators perform all of these tasks, specialties within the role can exist.

  • System database administrators are in charge of the physical and technical aspects of a database. They do things such as ensure the database in a firm’s computer system works, as well as install upgrades and patches to fix bugs.

  • Application database administrators, on the other hand, maintain a database that has been created for a specific purpose or a set of functions, such as customer service software. Using complex programming languages, they might write programs or debug them. Additionally, they must be able to manage applications that work with the database. They also do all the tasks of a general DBA, but only for their specific application.

13. Database FAQs

To clear up any questions you may still have about this topic, here are our answers to the most frequently asked questions about databases.

1. Why are databases important?

Along with allowing users to collect and store vast amounts of data in an organized and easily accessible fashion, databases also assist businesses to grow in various ways:

  • Helping organizations to make more informed and research-backed decisions

  • Being able to effectively store and retrieve relevant data

  • Assisting with the aggregation and analysis of business data

  • Gathering and storing critical customer data from various applications

  • Delivering applications and analytics that are data-driven, timely, personalized, and detailed

  • Guaranteeing speedy access to important business data that can be used by numerous departments within a business to understand data patterns, produce reports, and forecast future trends

  • Mapping data from categorized databases used by legacy systems to relational databases used in the data warehouses.

2. What is the difference between spreadsheet and database?

Simply put, a spreadsheet is a better choice for those wanting to track numbers or keep a list of information that’s shared with others from time to time. In scenarios where more complicated data is being worked with or a single source of data is being used for numerous purposes, a database is preferred.

The major technical difference between the two, however, lies in the way they store data. In a spreadsheet, for example, data is stored in cells. They can then be formatted, modified, and manipulated within that cell.

Databases, on the other hand, have cells that contain information from external tables. This simply means that spreadsheets are static (data that remains the same after it's collected), whereas databases can be relational. As a result, when you upload, change, or delete a piece of data in one place, the change will be made in every other place that addresses that data.

3. What is a relational database?

We spoke in-depth about relational databases in Chapter 4, but to summarize, a relational database is a type of database that houses and enables access to data points that are related to one another.

Data in a relational database is presented in tables, and each row, column, and record hold its own interconnecting information. Every row contains a record with a unique ID called the key, while the columns hold attributes of the data, and each record usually features a value for each attribute.

Although relational databases have been around for numerous decades, they are still one of the most popular databases to this day. They are particularly used to track inventories, process eCommerce transactions, manage vast amounts of customer information, and more.

4. What are database fields?

A database field can be described as a set of data values — of the same data type — presented in a table structure.  You may also hear a database field referred to as a column or an attribute. Database fields are arranged into records, which possess all the information within the table related to a specific entity.  The fields make up the columns of the table, whereas their records make up the rows. Additionally, database fields can either be:

  • Required: mandatory to enter data

  • Optional: may contain data or be left blank

  • Calculated: the value is a result of an automatically applied formula concerning other fields

5. What is a record in a database?

A record in a database is a set of fields or group of data saved in a table. Where a field acts as the column within a table, a record acts as the row. Therefore, when a new record is produced, it creates a new row in the table.

A record can have one or more values. Because of this, records are also referred to as a file, data set, or table.

Records provide a practical way to save and extract data from a database because they can be easily created, modified, and removed without affecting other data in the database.

The bottom line: databases are the key in our data-driven world

Although many think of the database as a home for data, it really is so much more than that. Not only are databases integral to things such as video streaming services, online gaming, eCommerce, and even healthcare, but they also offer companies benefits that are key to their success. 

From understanding data patterns and making research-backed decisions to producing reports and forecasting future trends, databases play a pivotal role in allowing companies to not just collect and store vital data but to access, analyze, secure, and back it up too.

Now that you understand what is a database, along with all of its key elements, you have all the knowledge you need to move forward and take advantage of the incredible benefits that databases have to offer.