Create tables with SQL#

Midjourney: The creation of SQL, ref. Michelangelo

The problem with SQL is that it pretends to implement the relational model, but doesn’t.

— David McGoveran

Slides/PDF#

Implement Tables with SQL#

SQL not only allows us to query data from existing tables, but also to create new tables and modify them. We want to use this functionality in the following to implement our logical design for the geometry objects.

For this, we will use SQLite again and create a new database geometry.sqlite. We will also set up a cursor cur directly to execute SQL commands on the database.

import sqlite3

# Create a SQL connection to our SQLite database
con = sqlite3.connect("geometry.sqlite")
cur = con.cursor()

First, we must map the generic data types of the logical design to the concrete data types of SQLite. In principle, data types differ somewhat between DBMSs, and one must always consider which data types are best suited. SQLite is limited to a relatively small set of data types that align well with Python’s primitive data types. The following table compares SQLite’s data types with Python.

SQLite

Python

Boolean

(INTEGER)

bool

Integer

INTEGER

int

Floating-Point Number

REAL

float

Interprets arbitrary input data as a number

NUMERIC

-

Text string stored in UTF-8 or UTF-16

TEXT

str

Arbitrary block of binary data

BLOB

bytes

No data

NULL

None

One feature here is the NUMERIC data type, which is intended to automatically convert numbers from text input into the appropriate numeric representation in the database. The NULL value is not a data type in the table itself; it is used when loading data into tables to convey missing values.

With these data types and the ER diagram of the logical design model above, we can now create our tables. This is done in SQL with the CREATE TABLE command, to which the table name (entity name) follows, and then in parentheses the individual attributes. For the five entities, the following SQL commands result.

sql='''
CREATE TABLE Points (
  point_id INTEGER,
  x REAL,
  y REAL
);

CREATE TABLE Lines (
  lines_id INTEGER,
  start INTEGER,
  end INTEGER
);

CREATE TABLE PolygonTypes (
  polygontype_id INTEGER,
  name TEXT
);

CREATE TABLE Polygons (
  polygon_id INTEGER,
  polygontype INTEGER
);

CREATE TABLE PolygonPoints (
  polypoint_id INTEGER,
  polygon_id INTEGER,
  point_id INTEGER
);
'''

In principle, we could already send these commands to the SQLite database and create the tables. However, the declaration of the primary and foreign keys as well as the NOT NULL constraints is still missing here.

To do this, we mark the lines of the attributes that serve as primary keys with the annotation PRIMARY KEY. In our example we define the key as AUTOINCREMENT, meaning that new rows added without a key are automatically assigned a key.

We define the foreign keys as an additional constraint (condition) by specifying which attributes are foreign keys and on which table and key attributes they refer. The foreign key polytype in the table Polygons for instance references the primary key polytype_id in the table PolygonTypes. Multiple occurrences are possible here, as the example for the start and end Point in the table Lines shows.

We continue to annotate all attributes and foreign keys that must not be null with NOT NULL. Primary keys must never be null, which is why we do not need this annotation.

Thus, our completed SQL script for creating all tables is:

sql='''
CREATE TABLE Points (
  point_id INTEGER PRIMARY KEY AUTOINCREMENT,
  x REAL NOT NULL,
  y REAL NOT NULL
);

CREATE TABLE Lines (
  lines_id INTEGER PRIMARY KEY AUTOINCREMENT,
  start INTEGER NOT NULL,
  end INTEGER NOT NULL,
  FOREIGN KEY(start,end) REFERENCES Points(point_id,point_id)
);

CREATE TABLE PolygonTypes (
  polytype_id INTEGER PRIMARY KEY AUTOINCREMENT,
  name TEXT
);

CREATE TABLE Polygons (
  polygon_id INTEGER PRIMARY KEY AUTOINCREMENT,
  polytype INTEGER,
  FOREIGN KEY(polytype) REFERENCES PolygonTypes(polytype_id)
);

CREATE TABLE PolygonPoints (
  polypoint_id INTEGER PRIMARY KEY AUTOINCREMENT,
  polygon_id INTEGER,
  point_id INTEGER,
  FOREIGN KEY(polygon_id) REFERENCES Polygons(polygon_id),
  FOREIGN KEY(point_id) REFERENCES Points(point_id)
);
'''

Since this SQL script contains multiple commands, we can’t send it to the database with cur.execute(sql), but we use

cur.executescript(sql)

To verify that our tables have been created, we can query all tables in the SQLite database with:

sql="SELECT name FROM sqlite_schema WHERE type='table' ORDER BY name;"

rows=cur.execute(sql)
for row in rows:
    print(row)
('Lines',)
('Points',)
('PolygonPoints',)
('PolygonTypes',)
('Polygons',)
('sqlite_sequence',)

Add data#

In the next step, we want to load some values into the database. This is done with the SQL command INSERT INTO with the table name and the values to be inserted, in exactly the order of the previously declared columns (that is, point_id, x, y).

For example, we want two points with coordinates (54.083336, 12.108811) and (12.094167, 54.075211). We also specify the primary key values as 1 and 2:

sql="""
INSERT INTO Points VALUES(1, 54.083336, 12.108811);
INSERT INTO Points VALUES(2, 12.094167, 54.075211);
"""
cur.executescript(sql)

This primary index must be unique. If we try to create the same points again, the database will return an error indicating that the points already exist.

cur.executescript(sql)
---------------------------------------------------------------------------
IntegrityError                            Traceback (most recent call last)
Cell In[8], line 1
----> 1 cur.executescript(sql)

IntegrityError: UNIQUE constraint failed: Points.point_id

Should we deviate from the column order or omit values? This way we can also specify the column names to be written in the INSERT command. If we omit a column, it will be set to NULL, provided the column allows NULL values. Since we declared the primary key point_id as AUTOINCREMENT in the table declaration, we do not necessarily need to include it. A value will then be assigned to it automatically.

sql="INSERT INTO Points(x, y) VALUES(54.083336, 12.108811);"
cur.execute(sql)

However, we have a problem here: we don’t know the value, and thus can’t use it later as a foreign key. In Python, this is stored in the cursor under

cur.lastrowid
3

Next, we want to create a line between the point (54.083336, 12.108811) and (12.094167, 54.075211). We can’t do this directly with the coordinates; instead we must use the foreign keys of the points that we assigned when creating them. So 1 and 2.

sql="INSERT INTO Lines(start, end) VALUES(1, 2);"
cur.execute(sql)
cur.lastrowid
1

Before we can create a polygon, we first need to declare the polygon type in the PolygonTypes table. For example, we say that the polygon type ‘Triangle’ exists with the primary key value 0.

sql='INSERT INTO PolygonTypes VALUES(0, "Dreieck");'
cur.execute(sql)
cur.lastrowid
0

Now we create the entry for the polygon in the Polygon table. Here there is only the primary key, e.g., 33, and the foreign key 0 for the polygon type.

sql='INSERT INTO Polygons VALUES(33, 0);'
cur.execute(sql)
cur.lastrowid
33

For example, if we try to create a polygon for a non-existent polygon type with the foreign key 6, the database will return an error message.

sql='INSERT INTO Polygons VALUES(66, 6);'
cur.execute(sql)
cur.lastrowid
66

Here you can see the database that safeguards data integrity and keeps us from entering incorrect data.

However, our polygon is not yet fully defined, and we still need to link the points belonging to the polygon. We do this with three entries in the table PolygonPoints, specifying the new primary keys (331, 332, 333), the foreign key for the polygon 33, and the foreign keys for the three points (1, 2, 3).

sql="""INSERT INTO PolygonPoints VALUES(331, 33, 1);
INSERT INTO PolygonPoints VALUES(332, 33, 2);
INSERT INTO PolygonPoints VALUES(333, 33, 3);
"""
cur.executescript(sql)
cur.lastrowid
66

Delete data#

To delete entries from a table in SQL, you use the DELETE FROM command with the table and a WHERE clause that specifies which entries should be deleted. For example, if we want to delete the points that were just linked from the PolygonPoints table, we can write that we delete all rows where the referenced polygon has a polygon_id of 33.

sql="DELETE FROM PolygonPoints WHERE polygon_id=33;"
cur.execute(sql)
cur.lastrowid  # returns the most recently deleted row
333

Renaming and Modifying Tables#

Sometimes it is necessary to back up tables or to ‘materialize’ SQL queries with a SELECT, i.e., to store them as a new table. For this we can use CREATE TABLE ... AS with a SELECT. This is useful, for example, to create a copy of a table before making changes. Note

sql="CREATE TABLE PointsCopy AS SELECT * FROM Points;"
cur.execute(sql)

If you want to rename a table, you use the ALTER TABLE ... RENAME command in SQL. To rename the table Points to Punkte, we write

sql="ALTER TABLE Points RENAME TO Punkte;"
cur.execute(sql)

In a similar way, we can also rename columns in tables or add new columns. For example, if we want to add a new column to the polygon types that contains the maximum number of points, this is done as follows:

sql="ALTER TABLE PolygonTypes ADD COLUMN maxPoints INTEGER DEFAULT 0;"
cur.execute(sql)

It should be noted that this creates the column but doesn’t write any values. Therefore, we assign the default value of 0 to the column here using DEFAULT 0.

Delete tables#

SQL also naturally provides a command to completely delete a table’s contents. If you want to delete only the data from a table while keeping the table’s definition (for example when you swap the data), you can use DELETE FROM without a WHERE clause. This is also called truncating a table.

sql="DELETE FROM PointsCopy;"
cur.execute(sql)
<sqlite3.Cursor at 0x10ab5e0c0>

If you want to drop a table entirely, you use the DROP TABLE statement. Here we use the optional IF EXISTS clause to ensure that the table is only dropped if it exists.

sql='''
DROP TABLE IF EXISTS PointsCopy;
DROP TABLE IF EXISTS Punkte;
DROP TABLE IF EXISTS Lines;
DROP TABLE IF EXISTS PolygonTypes;
DROP TABLE IF EXISTS Polygons;
DROP TABLE IF EXISTS PolygonPoints;
'''
cur.executescript(sql)

That wraps up our foray into SQL for working with databases. Finally, we need to close the database connection with close().

cur.close()
con.close()

Note on storing tables in SQLite with Pandas#

In the last exercise we created tables directly with Pandas in SQLite. This is certainly sensible when you want to quickly and securely store larger datasets in a database. However, it should be noted that the tables generated by Pandas do not define relationships with foreign keys, constraints (e.g., NOT NULL), or search indexes. The resulting tables are therefore by no means 3NF-compliant and not optimized for querying and storing large amounts of data in practical applications. It is always recommended to create a proper ER design. Create the tables with CREATE including all relations, constraints, and indexes, and only then load data into the existing tables with Pandas.

Quiz#

--- shuffleQuestions: true shuffleAnswers: true --- ### Which steps are part of the classic database design? - [x] Requirements analysis, conceptual design, logical design - [ ] Only conceptual and physical design - [x] Physical design, implementation and testing - [ ] Only implementation and maintenance ### What is the main difference between conceptual and logical design? - [x] Conceptual design is a rough design, logical is a detailed design - [ ] Conceptual design uses SQL, logical does not - [ ] There is no difference - [ ] Logical design comes before conceptual ### Sort the steps of database design in the correct order: 1. Requirements analysis 2. Conceptual design 3. Logical design 4. Physical design 5. Implementation and testing ### What characterizes an entity in the UML notation for ER diagrams? - [x] The annotation `<<Entity>>` - [ ] The name `Entity` - [ ] The annotation `<Table>` - [ ] The special symbol `ER` - [ ] No special designation ### Which UML elements are NOT used in ER diagrams? - [x] Methods and inheritance - [ ] Attributes and classes - [x] Polymorphism and encapsulation - [ ] Associations ### What does the symbol `>>` in a relation mean? - [x] The relation is read from left to right - [ ] The relation is bidirectional - [ ] It is a notation error - [ ] The relation is read from right to left ### What does the cardinality "1 to 0..*" mean? - [x] An entity can be linked to zero to many others - [ ] An entity must be linked to at least one other - [ ] Exactly one linkage is allowed - [ ] The cardinality is invalid ### In which cardinality are new tables for relations required? - [ ] 1 to 1 - [ ] 1 to 0..1 - [x] 1 to 0..* - [x] 1 to 1..* ### What is the error in this ER diagram fragment? ``` Person >>lives in>> Place 1 1..* ``` - [x] The reading direction doesn't make sense for this cardinality - [ ] People cannot live in places - [ ] The notation is completely wrong - [ ] Attributes are missing ### What is the goal of normalization? - [x] Minimize redundancies - [x] Ensure data integrity - [ ] Slow down queries - [ ] Create more tables ### What does the 1st Normal Form (1NF) require? - [x] All attribute values are atomic - [ ] All attributes are primary keys - [x] No complex attributes like lists - [ ] At least three tables ### Which normal form is violated if customer data is stored directly in the invoice table? - [ ] 1NF - [x] 2NF - [ ] 3NF - [ ] No violation ### Sort the normal forms by their strictness (from less to more strict): 1. 1NF (First Normal Form) 2. 2NF (Second Normal Form) 3. 3NF (Third Normal Form) 4. BCNF (Boyce-Codd Normal Form) ### What is the error in this table design? ```sql CREATE TABLE Customers ( id INTEGER, name TEXT, addresses LIST<TEXT> ) ``` - [x] The attribute `addresses` violates 1NF (not atomic) - [ ] The datatype INTEGER is incorrect - [ ] A primary key is missing - [ ] The table name is invalid ### How is a 1:1 relation represented in tables? - [x] By a foreign key in one of the tables - [ ] By a separate relation table - [ ] By duplicating all attributes - [ ] Not at all, as it is invalid ### When is a foreign key "Nullable"? - [x] With cardinality 1 to 0..1 - [ ] With cardinality 1 to 1 - [x] If the relationship is optional - [ ] Never ### Sort these steps for normalizing a 1:n relation: 1. Create a new relation table 2. Add foreign keys to both original tables 3. Define the primary key for the relation table 4. Check cardinality ### Which SQL command creates a new table? - [x] CREATE TABLE - [ ] INSERT TABLE - [ ] NEW TABLE - [ ] MAKE TABLE ### What does AUTOINCREMENT do on a primary key? - [x] Automatically assigns unique values - [ ] Automatically deletes old entries - [x] Facilitates inserting new records - [ ] Prevents data changes ### What is the error in this SQL code? ```sql CREATE TABLE Points ( point_id INTEGER PRIMARY KEY, x REAL NOT NULL, y REAL NOT NULL, FOREIGN KEY(x) REFERENCES Lines(line_id) ) ``` - [x] An attribute (x) cannot be both a coordinate and a foreign key - [ ] REAL is not a valid data type - [ ] PRIMARY KEY is misspelled - [ ] NOT NULL is redundant ### Sort these SQL commands in the logical order for creating a database: 1. CREATE TABLE for main entities 2. CREATE TABLE for foreign-key dependent tables 3. INSERT seed data 4. INSERT linked data ### With which SQL command do you add data? - [x] INSERT INTO - [ ] ADD DATA - [ ] CREATE DATA - [ ] PUT INTO ### What happens with this SQL command? ```sql DELETE FROM Points WHERE x > 50; ``` - [x] All points with x-coordinate > 50 are deleted - [ ] The table Points is deleted - [ ] Only the first found point is deleted - [ ] There is a syntax error ### Which error is in this INSERT statement? ```sql INSERT INTO Lines VALUES(1, 99, 100); ``` *Assume the points with IDs 99 and 100 do not exist* - [x] Foreign key constraint is violated - [ ] Too many values provided - [ ] INTEGER is not a valid data type - [ ] INSERT is misspelled ### Sort these steps for safe deletion of data: 1. Create a backup of the table 2. Test the DELETE statement with a WHERE condition 3. Perform the actual deletion 4. Verify the result ### With which command do you rename a table? - [x] ALTER TABLE ... RENAME TO - [ ] CHANGE TABLE NAME - [ ] RENAME TABLE - [ ] UPDATE TABLE NAME ### What is the difference between DELETE FROM table and DROP TABLE? - [x] DELETE deletes only data, DROP deletes the entire table - [ ] There is no difference - [x] DROP also removes the table structure - [ ] DELETE is faster ### Which error is in this ALTER statement? ```sql ALTER TABLE Points ADD COLUMN z REAL; ``` *Assume there is already data in the table* - [ ] ADD COLUMN is wrong - [ ] REAL is not a valid data type - [x] A DEFAULT value is missing for existing rows - [ ] The column name z is too short ### Sort these steps for safe table modification: 1. Create a backup of the table 2. Plan the ALTER statement 3. Apply the change 4. Check data integrity