<div class="vslide">
  <div class="vslide-title">
    <p style="font-family: Protomolecule; font-size: 2.3em; line-height: 90%; margin: 0px auto; text-align: center; width: 100%;"><span style="letter-spacing: .04rem;">programming</span><br><span style="letter-spacing: .0rem;">and databases</span></p>
<p class="author" style="font-family: Protomolecule; margin: 0px auto;  text-align: center; width: 100%; font-size: 1.2em;">Joern Ploennigs</p>
<p class="subtitle" style="font-family: Protomolecule; margin: 1em auto; text-align: center; width: 100%; font-size: 1.2em;">Data storage</p>
    <figcaption>Midjourney: Librarian, ref. Giuseppe Arcimboldo</figcaption>
  </div>
<script>
  function setSectionBackground(c,v){
    let e=document.currentScript.previousElementSibling;
    while(e&&e.tagName!=='SECTION')e=e.parentElement;
    if(e){
      if(c)e.setAttribute('data-background-color',c);
      if(v){
        e.setAttribute('data-background-video',v);
        e.setAttribute('data-background-video-loop','true');
        e.setAttribute('data-background-video-muted','true');
      }
    }
  }
  setSectionBackground('#000000', 'images/09_Datenhaltung/mj_title.mp4');
</script>
<style>
.flex-row{display:flex; gap:2rem; align-items:flex-start; justify-content:space-between;}
.flex-row .col1{flex:1; min-width:10px}
.flex-row .col2{flex:2; min-width:10px}
.flex-row .col3{flex:3; min-width:10px}
.flex-row .col4{flex:4; min-width:10px}
.flex-row .col5{flex:5; min-width:10px}
.flex-row .col6{flex:6; min-width:10px}
.flex-row .col7{flex:7; min-width:10px}
.vcent{display:flex; align-items:center; justify-content:center}
</style>
</div>

# Working with Files

<figure class="mj-tile-band">
    <img src='images/09_Datenhaltung/mj_title_band.jpg'>
    <figcaption>Midjourney: Librarian, ref. Giuseppe Arcimboldo</figcaption>
</figure>

> We don’t have better algorithms. We just have more data.
>
> — Peter Norvig

## <a href="../lec_slides/09_Datenhaltung.slides.html">Slides</a>/<a href="../pdf/slides/09_Datenhaltung.pdf">PDF</a>
<iframe src="../lec_slides/09_Datenhaltung.slides.html" width="750" height="500"></iframe>

## Process

![](images/partB_1.svg)

In programs, data must be loaded and saved regularly. This is done on computers using files that are organized into directories. In the previous section on packages, we have already seen some examples that work with files. We now want to explore them in more detail.

## Reading Files

A typical task is loading files. For this, Python provides the `open()` function. You pass it the path of the file to be loaded and a mode string. The mode combines the access type, here `r` (read), with the file type, i.e., whether the file is a text file `t` or a binary file `b`.

The `open()` function is usually used within the `with` construct, which assigns the file to a variable (`fi`) and automatically closes the file after the block ends. To read the contents of the file, we use the `read()` method of the file object.

In [None]:
with open("geometry/shapes/Line.py", "tr") as fi:
    file_contents = fi.read()
    print(f"File type {type(fi)}")
    print(f"Type of variable {type(file_contents)}\n")
    print(file_contents)

In the same way, binary files can also be read. For this, we replace the file type `t` with the binary `b` and load the file. We can see that the data type of the loaded file contents now changes to `bytes`. If we print the file contents, we also immediately see the special characters in the file such as `\r` and `\n`, which denote line breaks.

In [None]:
with open("geometry/shapes/Line.py", "br") as fi:
    file_content = fi.read()
    print(f"Data type of file {type(fi)}")
    print(f"Data type of variable {type(file_content)}\n")
    print(file_content)
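
The relationship between the two modes can be sketched without any file: text is turned into `bytes` by encoding it, and back by decoding. The sample string and the choice of UTF-8 are assumptions made for illustration only.

```python
# Hypothetical sample text with a line break and two non-ASCII characters
text = "Größe: 5 m\n"

raw = text.encode("utf-8")      # str -> bytes, as seen when reading in mode "br"
decoded = raw.decode("utf-8")   # bytes -> str, as seen when reading in mode "tr"

print(type(raw), raw)
print(type(decoded), decoded)
print(len(text), len(raw))      # characters vs. bytes: non-ASCII characters need more than one byte
```

In text mode, `open()` performs this decoding for us. Passing `encoding="utf-8"` explicitly avoids surprises on systems with a different default encoding.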

## Writing files

In the same way, we can also create new files using the `open()` function. For this, we use the mode `w` (write). Here too, `t` selects a text file and `b` a binary file. To write the file, we use the `write()` method of the file object `fo`.

In [None]:
with open("meineDatei.txt", "tw") as fo:
    file_content = "My own content"
    fo.write(file_content)

To verify, we read the file again.

In [None]:
with open("meineDatei.txt", "tr") as f:
    print(f.read())

Note that opening an existing file with `w` replaces its previous contents completely.

In [None]:
with open("meineDatei.txt", "tw") as fo:
    file_content = "New content"
    fo.write(file_content)

In [None]:
with open("meineDatei.txt", "tr") as fi:
    print(fi.read())
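
If the existing contents should be kept, the file can be opened in append mode `a` instead of `w`. The following is a minimal sketch that works in a temporary directory with the hypothetical file name `notes.txt`, so no real file is touched.

```python
import os
import tempfile

# Work in a temporary directory so the example leaves nothing behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "notes.txt")  # hypothetical file name

    with open(path, "tw") as fo:   # "w" creates the file or overwrites it
        fo.write("first line\n")

    with open(path, "ta") as fo:   # "a" appends to the existing contents
        fo.write("second line\n")

    with open(path, "tr") as fi:
        appended_content = fi.read()

print(appended_content)
```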

## File existence test

Often you want to check whether a file already exists and load it accordingly or, for example, recreate it. The standard library `os`, which we have already learned about, offers such functions and more.

In [None]:
import os

if os.path.exists("meineDatei.txt"):
    print("File exists")
else:
    print("File does not exist yet")
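
The "load it or create it" pattern mentioned above could look like the following sketch. The helper `load_or_create` and the file name `settings.txt` are hypothetical and only serve as an illustration.

```python
import os
import tempfile

def load_or_create(path, default_text=""):
    """Return the file's text; create the file with default_text if it is missing."""
    if os.path.exists(path):
        with open(path, "tr") as fi:
            return fi.read()
    with open(path, "tw") as fo:
        fo.write(default_text)
    return default_text

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "settings.txt")   # hypothetical file name
    first = load_or_create(path, "created")        # file is missing, so it is created
    second = load_or_create(path, "ignored")       # file exists now, so it is loaded

print(first, second)
```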

## List files

To list all entries in a directory, we can use the `os.listdir()` function. With the `os.path.isfile()` function, we can check whether a name refers to a file or a directory. If it is a file, we can open it with `open()`, for example to load its contents and count the lines of code. For this, we use the `readlines()` method instead of `read()` to obtain all lines individually in a list.

In [None]:
import os

folder = "geometry/shapes/"
files = 0
codelines = 0
for name in os.listdir(folder):
    path = os.path.join(folder, name)
    if os.path.isfile(path):  # skip subdirectories
        with open(path, "tr") as fi:
            codelines += len(fi.readlines())
            files += 1

print(f"{codelines} lines of code in {files} files")
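
Often only files of a certain type are of interest. A minimal sketch, assuming we want to filter by file extension: the helper `list_files_with_suffix` and the demo file names are hypothetical.

```python
import os
import tempfile

def list_files_with_suffix(folder, suffix):
    """Return the sorted names of all regular files in folder that end with suffix."""
    return sorted(
        name for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name)) and name.endswith(suffix)
    )

with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ["a.py", "b.txt", "c.py"]:
        with open(os.path.join(tmp_dir, name), "tw") as fo:
            fo.write("# demo\n")
    os.mkdir(os.path.join(tmp_dir, "sub.py"))  # a directory, so it is skipped
    py_files = list_files_with_suffix(tmp_dir, ".py")

print(py_files)
```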

## Delete files

The `os` package also provides functions for deleting files. Of course, these should be used with caution.

In [None]:
os.remove("meineDatei.txt")

## Common text file formats

### TXT files

One of the simplest formats for saving text on a computer is the plain text file. Such files usually have the file extension `.txt`. We have already used this file extension above.

### JSON files

Nowadays, structured information is often exchanged in the `JSON` format. In particular, many APIs of web servers on the Internet use this standard. It has the advantage that the data remains readable by humans and can thus be interpreted by web developers. At its core, the standard resembles the representation of a `dict` in Python.

For example, we can store the data record for a person in the following `dict`.

In [None]:
person={
	"firstName": "John",
	"lastName": "Smith",
	"isAlive": True,
	"age": 25,
	"address": {
		"streetAddress": "21 2nd Street",
		"city": "New York",
		"state": "NY",
		"postalCode": "10021-3100" 
	},
 	"children": [],
	"spouse": None
}

Using the `json` package, this dataset can now be easily converted into a `JSON` string and written to a file.

In [None]:
import json

with open("person.json", "tw") as fo:
    json.dump(person, fo, indent=2)

Let's take a look at the file. Since it's a text file, we can load it with `open(name, "tr")`.

In [None]:
with open("person.json", "tr") as file:
    file_content = file.read()
    print(file_content)

We can see that the JSON representation is very similar to the dictionary `person` defined above. The only differences are that the capitalized `True` in Python is written in lowercase here, and the `None` from Python has been replaced with `null`. The structure of both representations is, however, identical.

From this text file we can now load our dataset directly back as a `dict`. For display we will use pretty print this time, because it is easier to read.

In [None]:
from pprint import pprint

with open("person.json", "tr") as fi:
    person_loaded = json.load(fi)
    print(f"Data type {type(person_loaded)}")
    pprint(person_loaded)

The loaded `dict` corresponds to our original dictionary `person`. The order of the entries differs in the output because `pprint` sorts the keys alphabetically. JSON itself does not guarantee any order for the keys of an object.
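
The round trip can also be checked without a file by using `json.dumps()` and `json.loads()`, which work on strings. This is a small sketch with a shortened, hypothetical record.

```python
import json

# Shortened, hypothetical record with the special values discussed above
record = {"isAlive": True, "age": 25, "children": [], "spouse": None}

json_text = json.dumps(record)      # dict -> JSON string
restored = json.loads(json_text)    # JSON string -> dict

print(json_text)                    # True becomes true, None becomes null
print(restored == record)           # the content survives the round trip
```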

### GeoJSON files

A special variant of the JSON format that is especially relevant for environmental informatics is the standardized [GeoJSON format](https://geojson.org/). This JSON-based format defines how certain geometric objects such as points, lines, and polygons should be represented in JSON. Each element is described as a JSON object (`dict` in Python) and defines the attributes `type` and `coordinates`.

A point is defined here as an element of type `Point` with two coordinates.

In [None]:
point = {
    "type": "Point",
    "coordinates": [12.095843457646907, 54.075229197333016]
}

A line is given as a `LineString` with a list of point coordinates, which are usually the start and end coordinates. If the `LineString` contains more than two coordinates, we have a polyline.

In [None]:
line_oki_to_auf = {
    "type": "LineString",
    "coordinates": [
        [ 12.095844241344963, 54.075206445655795 ],
        [ 12.09606074723871, 54.075028604743636 ],
        [ 12.09593084370266, 54.074930156768204 ],
        [ 12.096282665780166, 54.07495873846247 ],
        [ 12.096558710795335, 54.07507941651065 ],
        [ 12.096840168457192, 54.074863466071434 ],
        [ 12.098052601464076, 54.07534617726671 ],
        [ 12.098187917647891, 54.07534617726671 ],
        [ 12.098317821183883, 54.07541286718799 ],
        [ 12.098377360305278, 54.075339825840246 ],
        [ 12.098501851194726, 54.0753779343855 ]
    ]
}

A polygon is defined with the type `Polygon`, whose coordinates are given as a list of one or more closed line strings, such that the end point coincides with the start point.

In [None]:
campus = { 
    "type": "Polygon",
    "coordinates": [
        [
            [ 12.093402064538196, 54.07479416035679 ],
            [ 12.094194380118807, 54.074246433609375 ],
            [ 12.094578770845374, 54.074103747303894 ] ,
            [ 12.095018074534778, 54.074191200259065 ],
            [ 12.095661340649713, 54.074435147002276 ],
            [ 12.096328140890677, 54.073947252082434 ],
            [ 12.098359920447564, 54.075010487417984 ],
            [ 12.098822758261605, 54.07471591412107 ],
            [ 12.099866104521425, 54.07523141601854 ],
            [ 12.09959938442529, 54.075383303749476 ],
            [ 12.100462302384159, 54.075700885391115 ],
            [ 12.098869826513692, 54.0770356222489 ],
            [ 12.09752838132394, 54.076602988106 ],
            [ 12.095394620552042, 54.076082900668524 ],
            [ 12.09422575895411, 54.07581595060367 ],
            [ 12.094743509729398, 54.07538790639916 ],
            [ 12.093402064538196, 54.07479416035679 ]
        ]
    ]
}
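
The closing rule can be verified in code. The following sketch uses a hypothetical helper `is_closed_ring` and a small made-up triangle. It assumes the GeoJSON convention that a ring needs at least four positions, with the first and last being identical.

```python
def is_closed_ring(ring):
    """Return True if the ring has at least four positions and ends where it starts."""
    return len(ring) >= 4 and ring[0] == ring[-1]

# Made-up example data, not part of the campus polygon
triangle = {
    "type": "Polygon",
    "coordinates": [[[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]],
}
open_ring = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

ring_checks = [is_closed_ring(ring) for ring in triangle["coordinates"]]
open_check = is_closed_ring(open_ring)
print(ring_checks, open_check)
```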

Since many objects not only have geometry but also additional attributes, there is the helper object `Feature`, which provides the `properties` attribute where you can define your own metadata. This way we can define a GeoJSON object to store the position of the OKI.

In [None]:
oki_feature = {
  "type": "Feature",
  "geometry": {
    "type": "Point",
    "coordinates": [12.095843457646907, 54.075229197333016]
  },
  "properties": {
    "name": "OKI",
    "addresse": "Justus-von-Liebig-Weg 2",
    "stadt": "Rostock",
    "postleitzahl": "18059",
    "land": "Deutschland"
  }
}

auf_feature = {
  "type": "Feature",
  "geometry": {
    "type": "Point",
    "coordinates": [12.098494794410726, 54.075390284810425]
  },
  "properties": {
    "name": "AUF",
    "addresse": "Justus-von-Liebig-Weg 6",
    "stadt": "Rostock",
    "postleitzahl":"18059",
    "land": "Deutschland"
  }
}

oki_to_auf_route = {
  "type": "Feature",
  "geometry": line_oki_to_auf,
  "properties": {
    "name": "Weg OKI zu AUF",
    "stadt": "Rostock",
    "postleitzahl":"18059",
    "land": "Deutschland"
  }
}
    
campus_to_auf = {
  "type": "Feature",
  "geometry": campus,
  "properties": {
    "name": "Campus",
    "stadt": "Rostock",
    "postleitzahl":"18059",
    "land": "Deutschland"
  }
}

A collection of `Features` is stored in a `FeatureCollection`. It includes, besides the `type`, the `features` list.

In [None]:
features = {
  "type": "FeatureCollection",
  "features": [
    oki_feature,
    auf_feature,
    oki_to_auf_route,
    campus_to_auf
  ]
}
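
Because a `FeatureCollection` is just a nested `dict`, it can be traversed with ordinary Python. The sketch below uses a shortened, hypothetical collection with rounded coordinates and collects the name and geometry type of each feature.

```python
# Shortened, hypothetical collection (coordinates rounded for readability)
collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [12.0958, 54.0752]},
            "properties": {"name": "OKI"},
        },
        {
            "type": "Feature",
            "geometry": {
                "type": "LineString",
                "coordinates": [[12.0958, 54.0752], [12.0985, 54.0754]],
            },
            "properties": {"name": "Weg OKI zu AUF"},
        },
    ],
}

# Collect (name, geometry type) for every feature in the collection
summary = [
    (feature["properties"]["name"], feature["geometry"]["type"])
    for feature in collection["features"]
]
print(summary)
```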

To process these GeoJSON objects in Python, we can use the `geojson` package. We'll install it again with `pip`.

In [None]:
import subprocess
subprocess.run(["pip", "install", "geojson", "--quiet"], check=True)

The `geojson` package also provides standard classes for points, lines, and polygons, which we had previously defined ourselves as [classes](7a_Object.html). Due to the broad range of Python packages, you can often find packages that provide corresponding classes for your own problems, so a search is always worthwhile. A point in GeoJSON is created using the package.

In [None]:
from geojson import Point

geojson_point = Point((12.095843457646907, 54.075229197333016))
print(type(geojson_point))
print(geojson_point)

New instances can also be created directly from JSON. For this, we use the package's `loads` function. It converts a JSON string into an object. To create the JSON string, we convert the dictionary `point` into a string using the `json.dumps()` function.

In [None]:
import geojson
import json

json_str = json.dumps(point)
geojson_point = geojson.loads(json_str)

print(type(geojson_point))
print(geojson_point)

This is how our entire Feature Collection can be loaded as well.

In [None]:
import geojson
import json

json_str=json.dumps(features)
gson_features=geojson.loads(json_str)

print(type(gson_features))
print(gson_features)

The advantage of GeoJSON objects is that there are many other packages available to analyze this format. If we want to visualize our feature collection on a map, we can use the `geojsonio` package.

In [None]:
# pip install geojsonio  --quiet

In [None]:
import geojsonio

geojsonio.display(json_str)

If we follow the link, we reach a webpage that displays the polygons, the line, and the points.

![Campus](images/09_Datenhaltung/campus.png)

We will learn about additional applications of GeoJSON during the exercise.

### XML files

XML is another very widespread file format. HTML, the language in which websites are written, is closely related to it. XML is older than JSON and still very popular because it allows schemas (XSD) to be defined, against which a file can be validated. This makes it possible to check automatically whether a file has the expected structure.

With the help of the external packages `dicttoxml` and `xmltodict`, XML files can also be written and read easily. We install them with `pip`.

In [None]:
import subprocess
subprocess.run(["pip", "install", "dicttoxml", "xmltodict", "--quiet"], check=True)

In [None]:
import dicttoxml

with open("person.xml", "bw") as fo:
    fo.write(dicttoxml.dicttoxml(person, custom_root="person"))

In [None]:
with open("person.xml", "tr") as fi:
    date_content = fi.read()
    print(date_content)

In [None]:
import xmltodict
from pprint import pprint

with open("person.xml", "rt") as fi:
    person_loaded = xmltodict.parse(fi.read(), xml_attribs=False)
    print(f"Data type {type(person_loaded)}")
    pprint(person_loaded)

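Here too, the loaded `dict` contains the data of our original. Note, however, that `xmltodict` nests it under the root element `person` and returns the values as strings, since XML itself has no data types.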

### CSV files

Tables and measurements are frequently exchanged as CSV files. This is a very simple format in which the first line of the text file contains the column names, and then each line represents a row of the table. All values are separated by commas `,`. Since the comma is used as the decimal separator in German, a `;` or a tab character `\t` is often used here.

For processing tables, Python typically uses the `pandas` library. For example, if we want to store the data set of two people, we first convert it into a `pandas` DataFrame.

In [None]:
people=[
    {"FirstName":"John", "LastName":"Smith", "IsAlive":True, "Age":25},
    {"FirstName":"Mary", "LastName":"Sue", "IsAlive":True, "Age":30}
]

In [None]:
import pandas as pd

table = pd.DataFrame(people)
table

We can now save this table as a CSV file.

In [None]:
table.to_csv("leute.csv", index=False) # index=False ensures that the row indices 0 and 1 are omitted

We'll read the file back in as a test. Since it's text-based, we can use `open()` with `tr`.

In [None]:
with open("leute.csv", "tr") as fi:
    date_content = fi.read()
    print(date_content)

We can now load the CSV file back into a table.

In [None]:
table_read = pd.read_csv("leute.csv")
table_read

And convert it back into a list of dictionaries.

In [None]:
table_read.to_dict("records")

In [None]:
# we delete the file
os.remove("leute.csv")

## Typical binary file formats

### XLSX files

These CSV files can also be easily opened in other programs such as Microsoft Excel, or saved from there. Excel’s native format is `.xlsx` files. We can also write these directly from pandas using the `openpyxl` package. We install `openpyxl` using `pip`.

In [None]:
import subprocess
subprocess.run(["pip", "install", "openpyxl", "--quiet"], check=True)

After the installation, we can simply export the table as an Excel file.

In [None]:
table.to_excel("leute.xlsx", index=False)  # index=False ensures that the row index is omitted

This file is a binary file. We can't read it with `open()` and `tr`, so we have to use the binary variant with `br`.

In [None]:
with open("leute.xlsx", "br") as fi:
    file_content = fi.read()
    print(file_content)

What we're seeing is a lot of unreadable binary data. Behind it, in this case, lies a compressed ZIP file, since the `.xlsx` file format is actually just a ZIP file that contains several XML files.
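
That a binary file is a ZIP archive can be recognized from its first bytes: ZIP archives with content start with the signature `PK`. The following sketch does not need Excel or `pandas`. It builds a small archive in memory (the member name `hello.txt` is hypothetical) and inspects it.

```python
import io
import zipfile

# Build a small ZIP archive in memory instead of on disk
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zip_file:
    zip_file.writestr("hello.txt", "Hello ZIP")   # hypothetical member name

raw_bytes = buffer.getvalue()
signature = raw_bytes[:4]
print(signature)                    # the archive starts with the bytes "PK"

buffer.seek(0)
looks_like_zip = zipfile.is_zipfile(buffer)
print(looks_like_zip)
```

The same check with `zipfile.is_zipfile()` also accepts a file name, so it could be applied to an `.xlsx` file directly.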

### ZIP files

ZIP files are files that contain other files and directories and compress them. This allows multiple files to be consolidated into a single file and to take up less space. Therefore ZIP files are commonly used when sending multiple files.

Also, the `.xlsx` file from Excel is a disguised ZIP file that contains several XML files in the Open XML format.

This can be shown by renaming the file to a .zip file with `os.rename()`.

In [None]:
os.rename("leute.xlsx", "leute.zip")

If we want to view the files in the ZIP file, we can open it with the `ZipFile` class from the standard library's `zipfile` module. It is used much like `open()`, but for ZIP files. With the `namelist()` method, we can list all the files in the ZIP file.

In [None]:
import zipfile

with zipfile.ZipFile("leute.zip",'r') as zip_file:
    for fname in zip_file.namelist():
        print(fname)

To read a single file from the ZIP file, we can use the `read()` method. For example, loading `xl/worksheets/sheet1.xml`, which contains our data, will show our data in the typical XML structure.

In [None]:
import zipfile
from pprint import pprint

with zipfile.ZipFile("leute.zip",'r') as zip_file:
    xml_file = zip_file.read("xl/worksheets/sheet1.xml")
    print(xml_file)

Using the `parse` function of the `xmltodict` package, for example, we can convert this XML file into a `dict` in Python.

In [None]:
xml_dict = xmltodict.parse(xml_file, xml_attribs=False)
pprint(xml_dict)

Our original list of dictionaries `people`, which we defined above, is no longer recognizable in this dictionary. That's because this format was defined for Microsoft Excel and not specifically designed for our purposes. However, the format is human-readable, so today many other tools, such as LibreOffice and Google Docs, can read and write it. This is an important reason for using open XML formats.

In [None]:
# we delete the temporary files
os.remove("person.json")
os.remove("person.xml")
os.remove("leute.zip")

## Where is our data stored?

<div class="flex-row">
  <div class="col1">
  
**Smartphones and Tablets**
- Use: Apps that are as simple to use as possible and highly focused
- Data: Data are stored per-app on the device's built-in storage

  </div>
  <div class="col1">

**Desktop PCs**
- Use: Nowadays mostly used as a workstation or hobby machine
- Data: Data reside in folder structures, stored on local drives

  </div>
  <div class="col1">

**Web and Cloud Applications**
- Use: Everywhere—from apps to high-performance computing
- Data: Data reside in globally distributed server farms, usually managed by large corporations

  </div>
</div>

## Lecture Hall Question

<script>setSectionBackground('#FFD966');</script>
<div class="flex-row">
  <div class="col4 vcent">

- What hardware makes up a computer?
- How do these compare to human memory?

  </div>
  <div class="col6"> 
    <figure class="mj-fig">
        <img src="images/09_Datenhaltung/image_4.jpg" class="mj-fig-img">
        <figcaption class="mj-fig-cap">
            DALL-E 2: Early designs of the iPhone by Leonardo da Vinci
        </figcaption>
    </figure>
  </div>
</div>

## Introduction - Where is data stored in a computer?

<img src="images/09_Datenhaltung/computer_hardware2.svg" style="width: 60%; display: block; margin: 0 auto;" alt="Image">

The computer has memory types similar to those of humans:
- CPUs and GPUs have small registers and cache memory (ultra-short-term memory)
- RAM is volatile memory, i.e., its contents are lost when the power is turned off (short-term memory)
- The HDD/SSD is non-volatile storage, i.e., the contents remain intact (long-term memory)

## CPU Register & Cache - Storage at the Processor Level

- *Register* – The memory used by the CPU for calculations (very small)
- *Cache* – Holds code and data that will likely be needed soon (prefetching) or that still has to be written back to slower memory

- Characteristics:
    - Very fast, on-processor memory
    - Expensive, extremely small capacity
    - Must operate at roughly the same speed as the arithmetic unit to avoid creating a bottleneck

## RAM - Random Access Memory

- Also known as "Direct-access memory" or "Working memory"
- Read-write memory that does not have to be read sequentially; data can be addressed directly by its address
- These accesses are fast; blocks can be addressed efficiently
- Today the term mostly refers to the main memory close to the CPU or GPU: it holds the data currently in use, which is lost when the power is turned off

## How is data organized in RAM?

- The memory cells in RAM are divided into addressable blocks
- The operating system assigns each running program as many blocks as it needs
- Programs organize these blocks into *Stack* and *Heap*:
  - *Stack* contains the function calls and important (simple) variables
  - *Heap* contains all other variables

- Each variable in code consists of:
    - a reference (pointer) to the memory address of its value
    - the size of the value in memory (determined by the data type)
    - in garbage-collected languages (Python, Java, JavaScript, etc.), additional bookkeeping on whether the value is still in use, e.g., a counter of how many references point to it
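
These three ingredients can be observed from within Python. The sketch below is specific to CPython, the standard Python interpreter; the exact numbers it prints depend on the interpreter version and platform and are therefore not shown.

```python
import sys

data = [1, 2, 3]
refs_before = sys.getrefcount(data)   # number of references to the list object

alias = data                          # a second name for the same object
refs_after = sys.getrefcount(data)

same_object = id(data) == id(alias)   # id() identifies the object (its address in CPython)
size_in_bytes = sys.getsizeof(data)   # size of the list object itself

print(same_object, size_in_bytes, refs_before, refs_after)
```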

## HDD/SSD – Hard Disk Drive / Solid State Drive

<div class="flex-row">
  <div class="col1">

**Hard Disk Drive (HDD)**
- Data is stored magnetically on the storage medium
- Can last for many years
- Susceptible to shocks

  </div>
  <div class="col1"> 

**Solid State Drive (SSD)**
- Data is stored in electrical charges
- Can last several years, but will eventually discharge
- Erasing uses a voltage spike (flash) → Flash memory

  </div>
  <div class="col1"> 

**Magnetic tapes**
- Data is magnetically stored on plastic tapes
- The data can last for many decades
- Are still used for backups today
- Inexpensive, but very slow

  </div>
</div>

## How is data organized in long-term storage?

- The memory cells in long-term storage are also divided into addressable blocks.
- The operating system organizes these blocks and keeps track of which data is stored where (e.g., which blocks belong to which file)
- This organizational structure is called a *file system*

## File System - Desktop PCs: Files in Folder Structures

- Hierarchically organized (like an office filing system) into drives (volumes), folders & files
- Data is stored in files
- These are placed in folder hierarchies (tree structures!)
- They are identified by folder paths and names

<br/>
<div class="flex-row">
<div class="col4">

*Advantages:*
- This allows a very large number of files to be organized

</div>
<div class="col4"> 

*Disadvantages:*
- Can quickly become hard to navigate
- Programs can read almost anything

</div>
</div>

<center>
<img src="images/09_Datenhaltung/oscomp.png" alt="OS File Systems" style="height: 40vh;">
</center>

## File System - Smartphones & Tablets: App-specific Storage

- Data are typically assigned to apps (encapsulated)
- This way the app doesn't see the full file system (and the user often doesn't either)
- This reduces the app's ability to mess things up and improves security

<br/>
<div class="flex-row">
  <div class="col1">

*Advantages:*
- More intuitive use with fewer files per app
- Increased security

  </div>
  <div class="col1"> 

*Disadvantage:*
- It is difficult to sync files between apps

  </div>
</div>

## File System - Cloud: Databases

- Data cannot be stored locally
- They are usually stored only in databases — see the next lecture

<br/>
<div class="flex-row">
  <div class="col1">

*Advantages:*
- Programs and data are physically separated
- Many program instances can access the same data
- Program instances can be added or removed as needed

  </div>
  <div class="col1">

*Disadvantages:*
- Harder to configure and debug
- A certain loss of control

  </div>
</div>

## File System - in Python

- Python generalizes working with file systems as much as possible
- Many different functions and libraries! `os`, `io`, `open()`, `fileinput` …
- Different levels of abstraction – from direct string-based read operations to the hierarchical, object-oriented representations of entire directory structures
- Files are rarely read "as a whole" (inefficient for large files), but line by line, character by character, or also selectively, e.g., through a table of contents
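
Reading line by line can be sketched as follows: a file object can be used directly in a `for` loop and then yields one line at a time, so the whole file never has to be in memory. The file name `measurements.txt` and its contents are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "measurements.txt")   # hypothetical file name
    with open(path, "tw") as fo:
        fo.write("1.5\n2.5\n3.0\n")

    total = 0.0
    line_count = 0
    with open(path, "tr") as fi:
        for line in fi:              # the file object yields one line at a time
            total += float(line)     # float() ignores the trailing line break
            line_count += 1

print(line_count, total)
```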

## File formats – Rough categorization

<div class="flex-row">
  <div class="col1">

*Text*
- The file is a large string
- Can be read by humans and software
- Usually results in larger files
- Easier to debug since readable
- Good for structured content (e.g., text, attributes, statistics)
- Additional structure is added via syntax rules (e.g., .csv, .json, …)

  </div>
  <div class="col1"> 

*Binary*
- The file is a byte array
- Only readable by software, not by humans
- Usually results in smaller files
- Harder to debug since not readable
- Good for unstructured and large content (e.g., to display images and videos)
- File extensions indicate which programs the files are associated with

  </div>
</div>
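
The size difference between the two categories can be illustrated with a few numbers. This sketch stores three made-up floating-point values once as comma-separated text and once as raw 8-byte values using the standard library's `struct` module. The little-endian layout `<` is an arbitrary choice for the example.

```python
import struct

values = [3.141592653589793, 2.718281828459045, 1.4142135623730951]

# Text: human-readable, one character per digit
text_bytes = ",".join(repr(v) for v in values).encode("utf-8")

# Binary: exactly 8 bytes per value (double precision), not human-readable
binary_bytes = struct.pack(f"<{len(values)}d", *values)

print(text_bytes)
print(binary_bytes)
print(len(text_bytes), len(binary_bytes))

# Reading the binary data back requires knowing the layout in advance
restored = list(struct.unpack(f"<{len(values)}d", binary_bytes))
```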

## Data formats for BU engineers - Type 1: Geometric Model Formats

- Vector and raster data defined using a coordinate system
- Different dimensionalities (2D, 2.5D, 3D, …)

- Commonly used formats:
    - *2D Design Formats:* DWG, DXF, SVG, PDF, PNG, TIFF
    - *3D Design Formats:* IFC (STEP), IFC (XML), IFC (JSON), DWG, DXF, OBJ, 3DS
    - *Geospatial Data Formats:* Shapefile, GML (XML), KML (XML), GeoTIFF, GeoJSON (JSON)

## Data formats for BU engineers - Type 2: Attribute formats

- Descriptive, non-geometric data for a specific context
- Often organized as tables or lists of data objects

- Commonly used formats:
    - CSV, ODF (XML), XLSX (XML), XLS, JSON
    - Different levels of complexity

## Data formats for BU engineers - Type 3: Graphic formats

- Geometric data is usually purely mathematical, so there is no single clearly defined way to display it
- Graphic formats define such 'styles', e.g., for points, lines, polygons, etc.

- Commonly used formats:
    - CSS, SLD (XML), ArcGIS Styles (*.lyr)

## Data Formats for Civil Engineers - Type 4: Topology Formats

- Geometry by adjacency relations instead of coordinate systems
- Nodes, edges, meshes, grids

- Commonly used formats:
    - GML (XML), TopoJSON (JSON)

## Exchange formats - JSON (JavaScript Object Notation)

<div class="flex-row">
  <div class="col1">

- A human- and machine-readable, structured data format
- Primarily used as an interchange format
- Implemented using key-value pairs (similar to Python dictionaries)
- Allows mapping of all data types from Python:
  - Number
  - String
  - Boolean
  - List
  - Dict
  - None (null)

  </div>
  <div class="col1"> 

```json
{
  "firstName": "John",
  "lastName": "Smith",
  "isAlive": true,
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021-3100"
  },
  "children": [ ],
  "spouse": null
}
```

  </div>
</div>

## Exchange formats - XML (Extensible Markup Language)

<div class="flex-row">
  <div class="col1">

- Markup languages allow marking up parts of a text to add additional (mostly machine-readable) information and semantics
- XML is in this context a meta-language that allows such languages to be defined
- Markup here is done through opening and closing tags, which are enclosed by `<` and `>`

- Examples of XML-related languages:
    - HTML
    - XHR (XML HTTP Request)
    - GML

  </div>
  <div class="col1"> 

```xml
<person>
  <name> John </name>
  <isAlive> true </isAlive>
  <age> 25 </age>
  <address>
    <cityStreet> New York, 21 2nd Street </cityStreet>
    <postalCode> 10021-3100 </postalCode>
  </address>
  <children> </children>
  <spouse> </spouse>
</person>
```

  </div>
</div>
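
Tags like these can be read with the standard library module `xml.etree.ElementTree`, as an alternative to external packages. The sketch parses a shortened version of the example and converts the values by hand, since XML itself only contains text.

```python
import xml.etree.ElementTree as ET

# Shortened version of the example document
xml_text = """
<person>
  <name> John </name>
  <age> 25 </age>
  <address>
    <postalCode> 10021-3100 </postalCode>
  </address>
</person>
"""

root = ET.fromstring(xml_text)                 # root is the <person> element
name = root.find("name").text.strip()          # text between the tags, without padding
age = int(root.find("age").text)               # values are text and must be converted
postal_code = root.find("address/postalCode").text.strip()

print(root.tag, name, age, postal_code)
```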

## Exchange formats - CSV (Comma-Separated Values)

<div class="flex-row">
  <div class="col1">

- Textual representation of structured data (tables, lists, etc.)
- Table rows and columns are separated by delimiter characters
- Delimiter characters can be semicolon, comma, tab, etc.

  </div>
  <div class="col1"> 

```
FirstName;LastName;IsAlive;Age
John;Smith;true;25
Mary;Sue;true;30
```

  </div>
</div>
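
A file with a delimiter other than the comma can be read with the standard library's `csv` module by naming the delimiter explicitly. The sketch parses the example from this slide from a string. With a real file, the file object returned by `open()` would take the place of `io.StringIO`.

```python
import csv
import io

csv_text = (
    "FirstName;LastName;IsAlive;Age\n"
    "John;Smith;true;25\n"
    "Mary;Sue;true;30\n"
)

# The first line supplies the column names, every further line becomes one dict
reader = csv.DictReader(io.StringIO(csv_text), delimiter=";")
rows = list(reader)

ages = [int(row["Age"]) for row in rows]   # all values arrive as strings
print(rows[0]["FirstName"], ages)
```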

## Lesson Learned

<script>setSectionBackground('#66ccffff');</script>
<div class="flex-row">
<div class="col2">


</div>
<div class="col3">
    <figure class="mj-fig">
        <img src="images/09_Datenhaltung/mj_title.png" class="mj-fig-img">
        <figcaption class="mj-fig-cap">
            Midjourney: Wilhelm Tell's son with an apple on his head, an arrow stuck in it
        </figcaption>
    </figure>
  </div>
</div>

## Lesson Learned - Goal Setting

<script>setSectionBackground('#66ccffff');</script>

- **S**pecific: The goal is described clearly and precisely in just two sentences.
- **M**easurable: The goal's attainment can be determined quantitatively or qualitatively.
- **A**ttractive: The attainment of the goal is desirable for YOU (I-perspective).
- **R**ealistic: The goal is ambitious and achievable.
- **T**ime-bound: A concrete deadline by which the goal should be achieved.

## Quiz


```{quizdown}
    ---
    shuffleQuestions: true
    shuffleAnswers: true
    ---

    ### Why are files commonly used in programs?
    - [x] To store data permanently
    - [ ] To execute the code faster
    - [ ] To automatically initialize variables
    - [ ] To create graphical user interfaces

    ### Where are files located on a computer?
    - [x] In directories (folders)
    - [ ] In RAM
    - [ ] In the CPU
    - [ ] Only in the cloud

    ### Which of the following statements about handling files in Python is correct?
    - [x] You should always close files after opening or use `with`.
    - [ ] Files can only be loaded from the Internet without specifying a path.
    - [ ] There is no way to read files line by line.
    - [ ] Python does not support a write mode.

    ### What does the following code do?
    ```python
    with open("notizen.txt", "r") as f:
        zeilen = f.readlines()
    ```
    - [x] It reads all lines of the file `notizen.txt` into a list.
    - [ ] It writes all lines to the file.
    - [ ] It deletes the file.
    - [ ] It prints each line to the screen immediately.

    ### What does the function `open("datei.txt", "r")` do in Python?
    - [x] It opens the file `datei.txt` for reading.
    - [ ] It creates a new file named `datei.txt`.
    - [ ] It opens the file `datei.txt` for writing.
    - [ ] It deletes the contents of `datei.txt`.

    ### What happens if the file does not exist and `open("nichtda.txt", "r")` is executed?
    - [x] A `FileNotFoundError` is raised.
    - [ ] An empty file is created.
    - [ ] The program continues normally.
    - [ ] The file is automatically opened in write mode.

    ### What error is in this example?
    ```python
    file = open("daten.txt", "r"
    data = file.read()
    ```
    - [x] A closing parenthesis is missing in the `open` function.
    - [ ] The filename is invalid.
    - [ ] `read()` can only be applied to binary files.
    - [ ] `open` is not a valid command.


    ### Which mode opens a file for writing and overwrites its contents?
    - [x] "w"
    - [ ] "r"
    - [ ] "a"
    - [ ] "x"

    ### What happens in the following code if the file already exists?
    ```python
    with open("ausgabe.txt", "w") as f:
        f.write("Hallo Welt")
    ```
    - [x] The old contents of the file are deleted and overwritten.
    - [ ] An error is raised.
    - [ ] The new text is appended to the end of the file.
    - [ ] The file is not changed.

    ### How do you check in Python whether a file exists?
    - [x] With `os.path.exists("datei.txt")`
    - [ ] With `open("datei.txt")`
    - [ ] With `exists("datei.txt")`
    - [ ] With `os.exist("datei.txt")`

    ### What is necessary to be able to use `os.path.exists`?
    - [x] The `os` module must be imported.
    - [ ] The `datetime` module must be imported.
    - [ ] No imports are required.
    - [ ] `os.path.exists` is a built-in function.

    ### Which function lists all files in a directory?
    - [x] `os.listdir()`
    - [ ] `os.showfiles()`
    - [ ] `os.files()`
    - [ ] `list.files()`

    ### How can you list only `.txt` files from a folder with Python?
    - [x] By combining `os.listdir()` with filtering in a loop
    - [ ] Only with `os.gettxtfiles()`
    - [ ] Automatically with `open("*.txt")`
    - [ ] With `file.list("*.txt")`

    ### How do you delete a file in Python?
    - [x] With `os.remove("datei.txt")`
    - [ ] With `os.delete("datei.txt")`
    - [ ] With `os.clear("datei.txt")`
    - [ ] With `os.erase("datei.txt")`

    ### How do you load a JSON file in Python?
    - [x] With `json.load(open("datei.json"))`
    - [ ] With `json.read("datei.json")`
    - [ ] With `json.load("datei.json")`
    - [ ] With `json.open("datei.json")`

    ### What is the purpose of `json.dump()`?
    - [x] To write Python objects to a JSON file.
    - [ ] To read JSON data.
    - [ ] To delete JSON data.
    - [ ] To format JSON data.

    ### What is JSON?
    - [x] A text-based format for exchanging data.
    - [ ] A binary file format.
    - [ ] An image format

    ### What are text-based files?
    - [ ] *.png
    - [x] *.csv
    - [x] *.xml
    - [x] *.json
    - [ ] *.jpg

    ### What is the difference between JSON and XML?
    - [x] JSON is simpler and more compact, XML is more extensive and flexible.
    - [ ] JSON is older than XML.
    - [ ] XML is easier to read than JSON.
    - [ ] JSON cannot represent nested structures.

```

<div class="vslide">
  <div class="vslide-title">
    <p style="font-family: Protomolecule; font-size: 2.3em; margin: 0px auto; text-align: center; width: 100%;">questions?</p>
  </div>
  <script>setSectionBackground('#000000', 'images/mj_questions.mp4');</script>
</div>