Database Design using the E-R Model

View on GitHub

Design Phases

Initial phase: characterize fully the data needs of the prospective database users. Feedback from domain experts and users.
Second phase: conceptual design
- applying the concepts of the chosen data model
- translating these requirements into a conceptual schema of the database
- a fully developed conceptual schema indicates the functional requirements of the enterprise
  - describe the operations that will be performed
Final phase: Moving from an abstract data model to the implementation of the database
- Logical design: deciding on the database schema. Database design requires that we find a "good" collection of relation schemas
  - Business decision - what attributes?
  - Computer science decision - what relation schemas should we have and how should the attributes be distributed among the various relation schemas?
- Physical design - deciding physical layout

Alternatives

Two pitfalls
- Redundancy: a bad design may result in repeat information, and data inconsistency
- Incompleteness: a bad design may make somethings impossible to model

Entity Relationship Model (E-R Model)

Models an enterprise as a collection of entities and relationships
- Entity: a "thing" or "object" in the enterprise that is distinguishable from other objects
  - Described by a set of attributes
- Relationship: an association among entities
Represented diagrammatically by an entity-relationship diagram

Database Modelling

The ER data model specifies an enterprise schema that represents the overall logical structure of a database.

The ER model employs three concepts:

entity sets
relationship sets
attributes

Also has an associated diagram representation, the ER diagram which expresses the logical structure.

Entity Sets

An entity is an object, ex. specific person/company
An entity set is a set of entities of the same type
An entity is represented by a set of attributes
- ex instructor = (ID, name, salary)
A subset of the attributes form a primary key of the dataset

Relationship Sets

A relationship is an association among several entities. For example:

44553 (Peltier) -     *advisor*   -  22222 (Einstein)
student entity - relationship set -  instructor entity

A relationship set is a mathematical relation among $n \geq 2$ entities, each taken from entity sets.

An attribute can also be associated with a relationship set.

Roles

Entity sets of a relationship need not be distinct, you can have a 'role' in the relationship

Degree of a Relationship Set

Binary relationship
- involves two entities
- most relationships in a DB are Binary
more than two are rare, but do occur in certain scenarios

Complex attributes

Simple and Composite
- Composite attributes can be divided into sub parts.
singe-valued and multivalued (phone_numbers)
Derived attributes
Domain - the set of permitted values for each attribute

Mapping Cardinality Constraints

One to one
One to many
Many to one
Many to many

In diagrams, arrow for one, and no arrow for many.

Total and Partial Participation

Total participation is when every entity in the entity set participates in at least one relationship set.

Partial participation is where some entities may not participate in the relationship set.

Keys

Superkeys: a set of one ore more attributes that, taken collectively allow us to identify uniquely a tuple in the relation (table)
Candidate keys: superkeys for which no proper subset is a superkey
Primary keys: a candidate key that is chose by the database designer as the principal means by identifying tuples within a relation (table)

Primary Key

Primary keys provide a way to specify how entities and relations are distinguished.

Entity Sets

No two entities in an entity set are allowed to have exactly the same value for all attributes. A key for an entity is a set of attributes that suffice to distinguish entities from each other.

Relationship Sets

To distinguish among the various relationships of a relationship set we use the individual primary keys of the entities in the relationship set.

Let R be a relationship set involving entity sets $E_1, E_2, ... E_n$
The primary key for R consists of the union of the primary keys of entity sets $E_1, E_2, ... E_n$ , $PK(R)= PK(E_1) \union PK(E_2) \union ... PK(E_n)$
if the relationship set $R$ has attributes $a_1, a_2, ..., a_m$ associated with it, then the primary key $R$ also includes those attributes.

Choice of Primary key for Binary Relationship

Many-to-Many relationships: the union of the primary keys is chosen as the primary key
One-to-Many/Many-to-One relationship: the primary key of the "Many" side is used as the primary key
One-to-One relationships: the primary key of either one of the participating entity sets can be chosen as the primary key

Weak Entity Sets

A weak entity set is one whose existence is dependent on another entity, called its identifying entity. Instead of associating a primary key with a week entity, we use the identifying entity, along with extra attributes called discriminator to uniquely identifying a weak entity.

An entity set that is not a weak entity set is termed a strong entity set.

Every weak entity must be associated with an identifying entity. In other words the weak entity set is said to be existence dependent on the identifying entity set.

The identifying entity set is said to own the weak entity set that it identifies. The relationship associating the weak entity set with the identifying set is called the identifying relationship.

Extended E-R features

Specialization

Top-down design process; we designate sub-groupings within an entity set that are distinctive from other entities in the set.
These sub-groupings become lower-level entity sets that have attributes or participate in relationships that do not apply to the higher-level entity set.
Depicted by a hollow arrow-head pointing from the specialized entity to the other entity
We refer to this relationship as the ISA relationship (which stands for “is a”)
Attribute inheritance: a lower-level entity set inherits all the attributes and relationship participation of the higher-level entity set to which it is linked.
Higher- and lower-level entity sets also may be designated by the terms superclass and subclass, respectively.
Overlapping: entity may belong to more than one lower-level entity set
Disjoint: an entity belongs to no more than one lower-level entity set

Generalization

A bottom-up design process – combine a number of entity sets that share the same features into a higher-level entity set.
Specialization and generalization are simple inversions of each other; they are represented in an E-R diagram in the same way.
The terms specialization and generalization are used interchangeably.

Completeness Constraints

specifies whether or not an entity in the higher-level entity set must belong to at least one of the lower-level entity sets within a generalization or specialization
- total: an entity must belong to one of the lower-level entity sets
- partial: an entity need not belong to one of the lower-level entity sets
Partial generalization is the default.
We can specify total generalization in an ER diagram by adding the keyword total in the diagram and drawing a dashed line from the keyword to the corresponding hollow arrow-head to which it applies (for a total generalization), or to the set of hollow arrow-heads to which it applies (for an overlapping generalization)

Aggregation

You can eliminate some redundancy via Aggregation

treat relationship as an abstract entity
allows relationships between relationships
abstraction relationship into new entity

Design Decisions

The use of an attribute or entity set to represent an object.
Whether a real-world concept is best expressed by an entity set or a relationship set.
The use of a ternary relationship versus a pair of binary relationships.
The use of a strong or weak entity set.
The use of specialization/generalization – contributes to modularity in the design.
The use of aggregation – can treat the aggregate entity set as a single unit without concern for the details of its internal structure.

Symbols and Notation

Intro to Databases

Relational Model