Relational Databases

Relational Databases#

Midjourney: Tabular Relief, ref. Ben Nicholson

If you think you understand SQL, you’re probably wrong.

— C. J. Date

Slides/PDF #

Introduction to Relational Databases#

Relational databases form the foundation of modern information systems and are indispensable in today’s engineering practice. A relational database is a database system that stores information in tables, where the data are organized as tuples and each column has a defined data type. This structured approach enables engineers to efficiently manage complex data sets and to relate them in a meaningful way.

📘 Definition: Relational Database

A relational database is a database system that stores information in tables (relations), where the data are organized as tuples (rows) and each column (attribute) has a defined data type.

The term “relation” originates from mathematics and denotes a table with all its data rows. A relational database thus consists of a collection of tables that are logically connected. For engineers, this means, for example, that project data, bills of materials, and cost estimates can be stored in separate, but linked, tables.

Table structure and basic concepts#

Each table in a relational database has a unique name (Relationennamen), which describes its content. The Relation itself refers to the set of all data tuples in this table. A Tupel corresponds to a single row and consists of a sequence of attribute values. The meaning of these values is defined by the Relationenschema, which defines the column names and their order.

Attribute 1	Attribute 2	Attribute 3
Attribute value	…	…
Attribute value	…	…

In environmental engineering, a table ‘Measurement Stations’ could, for example, contain information about various monitoring locations. Each row would represent a specific measurement station, while the columns describe properties such as location, installation date, or sensor type.

Unique Keys#

Relational databases are built on fixed principles. Information is typically spread across different tables (construction sites, addresses, orders, etc.) and must be linked to derive actionable knowledge. This is done via key attributes. These are special columns that serve to uniquely identify records. These attributes are uniquely assignable and exist in the table only once, which enables reliable identification of each tuple. Row numbers are not suitable as keys, since they can be altered during delete operations and thus lose their uniqueness.

Keys can be defined as a single attribute or consist of a set of attributes. The most common are:

Single attribute or a set of attributes
Unique ID (UID) - single attribute
Globally Unique Identifiers (GUID) - globally unique
UUID (universally unique identifier) - ISO standard

The strength of relational databases lies in the ability to connect tables via keys. The keys of a table are called the Primary Key, while keys from other tables act as Foreign Keys. The foreign key always references the primary key in the other table.

This relationship can be illustrated with the example of municipalities and buildings. The Buildings table contains, in addition to its own BuildingID, the MunicipalityID as a Foreign Key. This creates a logical connection to the Municipalities table without repeating municipality information in the Buildings table. A civil engineer can, for example, find all hotels in Dummerstorf by querying buildings with the type “Hotel” and the corresponding MunicipalityID.

Municipalities Table

MunicipalityID	Name	Population
1	Dummerstorf	7,329
2	Graal-Müritz	4,278
3	Sanitz	5,831

Primary Key: MunicipalityID

Buildings Table

BuildingID	BuildingType	MunicipalityID
5000	Gas Station	2
5001	Hotel	1
5002	Church	2

Foreign Key: MunicipalityID

Integrity constraints and data quality#

Keys define important integrity constraints that ensure data quality and consistency. Primary keys guarantee the local integrity of individual tables by preventing identical records from being stored more than once. Foreign keys, on the other hand, safeguard the global integrity of the entire database system by ensuring that all references between tables remain valid.

For engineers, this integrity is crucial. In a construction database, for example, every reference to a building material must point to an existing material. An invalid reference could lead to errors in cost calculation or material procurement. Equally critical in environmental engineering is the correct mapping of measurement data to their source stations to avoid drawing incorrect conclusions about environmental conditions.

Thus, the tables are related via keys, creating a cohesive and consistent data model that forms the foundation for reliable engineering decisions.

Quiz#

--- shuffleQuestions: true shuffleAnswers: true --- ### Which Python packages are used to load CSV data from the Internet? - [x] `urllib` and `pandas` - [ ] `requests` and `numpy` - [ ] `json` and `matplotlib` - [ ] `sqlite3` and `os` ### What is SQL? - [x] A language for querying and manipulating databases - [ ] A program for data visualization - [ ] A programming language like Python - [ ] A data format like JSON ### What is SQLite? - [x] A local relational database without a server - [ ] A cloud database service - [ ] A NoSQL database system - [ ] A data visualization tool ### Sort the following lines for saving a Pandas table to SQLite: ```python import sqlite3 import pandas as pd ``` 1. `df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})` 2. `con = sqlite3.connect("test.sqlite")` 3. `df.to_sql("tabelle", con, if_exists="replace")` ### Which error is in this example? ```python con = sqlite3.connect("test.sqlite") df.to_sql("tabelle", con, if_exists="append") ``` - [ ] `sqlite3` is not imported - [ ] The database name is too short - [x] No error - the code is correct - [ ] `to_sql` does not exist ### What does the SQL command `SELECT * FROM baustellen;` do? - [ ] Deletes all data from the table - [x] Returns all columns and rows of the table - [ ] Creates a new table - [ ] Counts the rows in the table ### How do you execute an SQL command in Python? - [x] With `cur.execute(sql)` after creating a cursor - [ ] With `con.run(sql)` - [ ] With `sqlite3.query(sql)` - [ ] With `pd.sql(sql)` ### Sort the following lines for an SQL query: ```python con = sqlite3.connect("test.sqlite") ``` 1. `cur = con.cursor()` 2. `sql = 'SELECT * FROM tabelle;'` 3. `for row in cur.execute(sql):` 4. ` print(row)` ### Which SQL command filters Baustellen with a construction start date from 2023? - [ ] `FILTER baubeginn >= '2023-01-01'` - [x] `WHERE baubeginn >= '2023-01-01 00:00:00+01'` - [ ] `IF baubeginn >= '2023-01-01'` - [ ] `SELECT baubeginn >= '2023-01-01'` ### How do you combine multiple conditions in SQL? - [x] With `AND`, `OR`, `NOT` - [ ] With `&&`, `||`, `!` - [ ] With `+`, `-`, `*` - [ ] With `COMBINE`, `MERGE`, `JOIN` ### Which error is in this SQL example? ```sql SELECT * FROM baustellen WHERE baubeginn = '2023-01-01' AND bauende = '2024-01-01' ``` - [ ] `AND` is not allowed - [ ] Date format is wrong - [x] No error - the code is correct - [ ] `WHERE` must be at the end ### What does the SQL function `count(*)` do? - [ ] Sums all values - [x] Counts the number of rows - [ ] Finds the maximum value - [ ] Calculates the average ### Which aggregate functions are in SQL? - [x] `min()`, `max()`, `avg()`, `sum()`, `count()` - [ ] `first()`, `last()`, `middle()` - [ ] `add()`, `subtract()`, `multiply()` - [ ] `top()`, `bottom()`, `center()` ### Sort the following parts of an SQL aggregate query: 1. `SELECT sparte, count(*), avg(baudauer)` 2. `FROM baustellen` 3. `GROUP BY sparte` ### What does `GROUP BY sparte` do in an SQL query? - [ ] Sorts by category - [x] Groups rows with the same category - [ ] Filters by category - [ ] Deletes duplicate categories ### How do you sort results in SQL in descending order? - [ ] `SORT BY spalte DOWN` - [x] `ORDER BY spalte DESC` - [ ] `ARRANGE BY spalte REVERSE` - [ ] `DESCEND BY spalte` ### What does `LIMIT 3` do in an SQL query? - [ ] Limits to 3 columns - [x] Returns at most 3 result rows - [ ] Sorts the first 3 entries - [ ] Filters for values smaller than 3 ### Which error is in this SQL example? ```sql SELECT sparte, count(*) FROM baustellen ORDER BY count(*) GROUP BY sparte ``` - [x] `ORDER BY` must come after `GROUP BY` - [ ] `count(*)` cannot be sorted - [ ] `GROUP BY` is in the wrong position - [ ] A `WHERE` is missing ### What does a `JOIN` do in SQL? - [ ] Combines columns of a table - [x] Joins data from multiple tables - [ ] Sorts table data - [ ] Deletes duplicate entries ### Why are table prefixes like `b.spalte` used in JOINs? - [ ] To beautify the code - [x] To uniquely identify columns with the same name - [ ] To speed up the query - [ ] For error handling ### What does `DISTINCT` do in an SQL query? - [ ] Sorts the results - [ ] Limits the number of results - [x] Removes duplicate rows from the results - [ ] Joins multiple tables ### What is the difference between INNER JOIN and LEFT JOIN? - [x] INNER JOIN returns only matching rows, LEFT JOIN returns all rows from the left table - [ ] LEFT JOIN returns only matching rows, INNER JOIN returns all rows from the left table - [ ] There is no difference, both are the same - [ ] INNER JOIN is faster than LEFT JOIN ### Sort the following parts of a JOIN query: 1. `SELECT b.spalte1, a.spalte2` 2. `FROM baustellen AS b` 3. `JOIN adressenliste AS a` 4. `WHERE b.id = a.id` ### What is the difference between different JOIN types? - [x] They determine which rows are returned when there is no match - [ ] They influence the speed of the query - [ ] They change the sorting of the results - [ ] They determine the number of joined tables ### Which error is in this JOIN example? ```sql SELECT * FROM baustellen NATURAL JOIN adressenliste, gemeinden ``` - [ ] `NATURAL JOIN` does not exist - [ ] Too many tables in the JOIN - [x] Potential cross-join due to multiple tables without explicit conditions - [ ] Falsche Syntax bei `NATURAL JOIN`