Create tables with SQL#

The problem with SQL is that it pretends to implement the relational model, but doesn’t.
— David McGoveran
Slides/PDF#
Implement Tables with SQL#
SQL not only allows us to query data from existing tables, but also to create new tables and modify them. We want to use this functionality in the following to implement our logical design for the geometry objects.
For this, we will use SQLite again and create a new database geometry.sqlite
. We will also set up a cursor cur
directly to execute SQL commands on the database.
import sqlite3
# Create a SQL connection to our SQLite database
con = sqlite3.connect("geometry.sqlite")
cur = con.cursor()
First, we must map the generic data types of the logical design to the concrete data types of SQLite. In principle, data types differ somewhat between DBMSs, and one must always consider which data types are best suited. SQLite is limited to a relatively small set of data types that align well with Python’s primitive data types. The following table compares SQLite’s data types with Python.
SQLite |
Python |
|
---|---|---|
Boolean |
(INTEGER) |
bool |
Integer |
INTEGER |
int |
Floating-Point Number |
REAL |
float |
Interprets arbitrary input data as a number |
NUMERIC |
- |
Text string stored in UTF-8 or UTF-16 |
TEXT |
str |
Arbitrary block of binary data |
BLOB |
bytes |
No data |
NULL |
None |
One feature here is the NUMERIC
data type, which is intended to automatically convert numbers from text input into the appropriate numeric representation in the database. The NULL
value is not a data type in the table itself; it is used when loading data into tables to convey missing values.
With these data types and the ER diagram of the logical design model above, we can now create our tables. This is done in SQL with the CREATE TABLE
command, to which the table name (entity name) follows, and then in parentheses the individual attributes. For the five entities, the following SQL commands result.
sql='''
CREATE TABLE Points (
point_id INTEGER,
x REAL,
y REAL
);
CREATE TABLE Lines (
lines_id INTEGER,
start INTEGER,
end INTEGER
);
CREATE TABLE PolygonTypes (
polygontype_id INTEGER,
name TEXT
);
CREATE TABLE Polygons (
polygon_id INTEGER,
polygontype INTEGER
);
CREATE TABLE PolygonPoints (
polypoint_id INTEGER,
polygon_id INTEGER,
point_id INTEGER
);
'''
In principle, we could already send these commands to the SQLite database and create the tables. However, the declaration of the primary and foreign keys as well as the NOT NULL
constraints is still missing here.
To do this, we mark the lines of the attributes that serve as primary keys with the annotation PRIMARY KEY
. In our example we define the key as AUTOINCREMENT
, meaning that new rows added without a key are automatically assigned a key.
We define the foreign keys as an additional constraint (condition) by specifying which attributes are foreign keys and on which table and key attributes they refer. The foreign key polytype
in the table Polygons
for instance references the primary key polytype_id
in the table PolygonTypes
. Multiple occurrences are possible here, as the example for the start
and end
Point in the table Lines
shows.
We continue to annotate all attributes and foreign keys that must not be null with NOT NULL
. Primary keys must never be null, which is why we do not need this annotation.
Thus, our completed SQL script for creating all tables is:
sql='''
CREATE TABLE Points (
point_id INTEGER PRIMARY KEY AUTOINCREMENT,
x REAL NOT NULL,
y REAL NOT NULL
);
CREATE TABLE Lines (
lines_id INTEGER PRIMARY KEY AUTOINCREMENT,
start INTEGER NOT NULL,
end INTEGER NOT NULL,
FOREIGN KEY(start,end) REFERENCES Points(point_id,point_id)
);
CREATE TABLE PolygonTypes (
polytype_id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT
);
CREATE TABLE Polygons (
polygon_id INTEGER PRIMARY KEY AUTOINCREMENT,
polytype INTEGER,
FOREIGN KEY(polytype) REFERENCES PolygonTypes(polytype_id)
);
CREATE TABLE PolygonPoints (
polypoint_id INTEGER PRIMARY KEY AUTOINCREMENT,
polygon_id INTEGER,
point_id INTEGER,
FOREIGN KEY(polygon_id) REFERENCES Polygons(polygon_id),
FOREIGN KEY(point_id) REFERENCES Points(point_id)
);
'''
Since this SQL script contains multiple commands, we can’t send it to the database with cur.execute(sql)
, but we use
cur.executescript(sql)
To verify that our tables have been created, we can query all tables in the SQLite database with:
sql="SELECT name FROM sqlite_schema WHERE type='table' ORDER BY name;"
rows=cur.execute(sql)
for row in rows:
print(row)
('Lines',)
('Points',)
('PolygonPoints',)
('PolygonTypes',)
('Polygons',)
('sqlite_sequence',)
Add data#
In the next step, we want to load some values into the database. This is done with the SQL command INSERT INTO
with the table name and the values to be inserted, in exactly the order of the previously declared columns (that is, point_id, x, y).
For example, we want two points with coordinates (54.083336, 12.108811) and (12.094167, 54.075211). We also specify the primary key values as 1 and 2:
sql="""
INSERT INTO Points VALUES(1, 54.083336, 12.108811);
INSERT INTO Points VALUES(2, 12.094167, 54.075211);
"""
cur.executescript(sql)
This primary index must be unique. If we try to create the same points again, the database will return an error indicating that the points already exist.
cur.executescript(sql)
---------------------------------------------------------------------------
IntegrityError Traceback (most recent call last)
Cell In[8], line 1
----> 1 cur.executescript(sql)
IntegrityError: UNIQUE constraint failed: Points.point_id
Should we deviate from the column order or omit values? This way we can also specify the column names to be written in the INSERT
command. If we omit a column, it will be set to NULL
, provided the column allows NULL values. Since we declared the primary key point_id
as AUTOINCREMENT
in the table declaration, we do not necessarily need to include it. A value will then be assigned to it automatically.
sql="INSERT INTO Points(x, y) VALUES(54.083336, 12.108811);"
cur.execute(sql)
However, we have a problem here: we don’t know the value, and thus can’t use it later as a foreign key. In Python, this is stored in the cursor under
cur.lastrowid
3
Next, we want to create a line between the point (54.083336, 12.108811) and (12.094167, 54.075211). We can’t do this directly with the coordinates; instead we must use the foreign keys of the points that we assigned when creating them. So 1 and 2.
sql="INSERT INTO Lines(start, end) VALUES(1, 2);"
cur.execute(sql)
cur.lastrowid
1
Before we can create a polygon, we first need to declare the polygon type in the PolygonTypes
table. For example, we say that the polygon type ‘Triangle’ exists with the primary key value 0.
sql='INSERT INTO PolygonTypes VALUES(0, "Dreieck");'
cur.execute(sql)
cur.lastrowid
0
Now we create the entry for the polygon in the Polygon
table. Here there is only the primary key, e.g., 33, and the foreign key 0 for the polygon type.
sql='INSERT INTO Polygons VALUES(33, 0);'
cur.execute(sql)
cur.lastrowid
33
For example, if we try to create a polygon for a non-existent polygon type with the foreign key 6, the database will return an error message.
sql='INSERT INTO Polygons VALUES(66, 6);'
cur.execute(sql)
cur.lastrowid
66
Here you can see the database that safeguards data integrity and keeps us from entering incorrect data.
However, our polygon is not yet fully defined, and we still need to link the points belonging to the polygon. We do this with three entries in the table PolygonPoints
, specifying the new primary keys (331, 332, 333), the foreign key for the polygon 33, and the foreign keys for the three points (1, 2, 3).
sql="""INSERT INTO PolygonPoints VALUES(331, 33, 1);
INSERT INTO PolygonPoints VALUES(332, 33, 2);
INSERT INTO PolygonPoints VALUES(333, 33, 3);
"""
cur.executescript(sql)
cur.lastrowid
66
Delete data#
To delete entries from a table in SQL, you use the DELETE FROM
command with the table and a WHERE
clause that specifies which entries should be deleted. For example, if we want to delete the points that were just linked from the PolygonPoints
table, we can write that we delete all rows where the referenced polygon has a polygon_id
of 33.
sql="DELETE FROM PolygonPoints WHERE polygon_id=33;"
cur.execute(sql)
cur.lastrowid # returns the most recently deleted row
333
Renaming and Modifying Tables#
Sometimes it is necessary to back up tables or to ‘materialize’ SQL queries with a SELECT
, i.e., to store them as a new table. For this we can use CREATE TABLE ... AS
with a SELECT
. This is useful, for example, to create a copy of a table before making changes. Note
sql="CREATE TABLE PointsCopy AS SELECT * FROM Points;"
cur.execute(sql)
If you want to rename a table, you use the ALTER TABLE ... RENAME
command in SQL. To rename the table Points
to Punkte
, we write
sql="ALTER TABLE Points RENAME TO Punkte;"
cur.execute(sql)
In a similar way, we can also rename columns in tables or add new columns. For example, if we want to add a new column to the polygon types that contains the maximum number of points, this is done as follows:
sql="ALTER TABLE PolygonTypes ADD COLUMN maxPoints INTEGER DEFAULT 0;"
cur.execute(sql)
It should be noted that this creates the column but doesn’t write any values. Therefore, we assign the default value of 0 to the column here using DEFAULT 0
.
Delete tables#
SQL also naturally provides a command to completely delete a table’s contents. If you want to delete only the data from a table while keeping the table’s definition (for example when you swap the data), you can use DELETE FROM
without a WHERE
clause. This is also called truncating a table.
sql="DELETE FROM PointsCopy;"
cur.execute(sql)
<sqlite3.Cursor at 0x10ab5e0c0>
If you want to drop a table entirely, you use the DROP TABLE
statement. Here we use the optional IF EXISTS
clause to ensure that the table is only dropped if it exists.
sql='''
DROP TABLE IF EXISTS PointsCopy;
DROP TABLE IF EXISTS Punkte;
DROP TABLE IF EXISTS Lines;
DROP TABLE IF EXISTS PolygonTypes;
DROP TABLE IF EXISTS Polygons;
DROP TABLE IF EXISTS PolygonPoints;
'''
cur.executescript(sql)
That wraps up our foray into SQL for working with databases. Finally, we need to close the database connection with close()
.
cur.close()
con.close()
Note on storing tables in SQLite with Pandas#
In the last exercise we created tables directly with Pandas in SQLite. This is certainly sensible when you want to quickly and securely store larger datasets in a database. However, it should be noted that the tables generated by Pandas do not define relationships with foreign keys, constraints (e.g., NOT NULL), or search indexes. The resulting tables are therefore by no means 3NF-compliant and not optimized for querying and storing large amounts of data in practical applications. It is always recommended to create a proper ER design. Create the tables with CREATE
including all relations, constraints, and indexes, and only then load data into the existing tables with Pandas.