Database Design using the E-R Model
Design Phases
- Initial phase: characterize fully the data needs of the prospective database users. Feedback from domain experts and users.
- Second phase: conceptual design
- applying the concepts of the chosen data model
- translating these requirements into a conceptual schema of the database
- a fully developed conceptual schema indicates the functional requirements of the enterprise
- describe the operations that will be performed
- Final phase: Moving from an abstract data model to the implementation of the database
- Logical design: deciding on the database schema. Database design requires that we find a "good" collection of relation schemas
- Business decision - what attributes?
- Computer science decision - what relation schemas should we have and how should the attributes be distributed among the various relation schemas?
- Physical design - deciding physical layout
- Logical design: deciding on the database schema. Database design requires that we find a "good" collection of relation schemas
Alternatives
- Two pitfalls
- Redundancy: a bad design may result in repeat information, and data inconsistency
- Incompleteness: a bad design may make somethings impossible to model
Entity Relationship Model (E-R Model)
- Models an enterprise as a collection of entities and relationships
- Entity: a "thing" or "object" in the enterprise that is distinguishable from other objects
- Described by a set of attributes
- Relationship: an association among entities
- Entity: a "thing" or "object" in the enterprise that is distinguishable from other objects
- Represented diagrammatically by an entity-relationship diagram
Database Modelling
The ER data model specifies an enterprise schema that represents the overall logical structure of a database.
The ER model employs three concepts:
- entity sets
- relationship sets
- attributes
Also has an associated diagram representation, the ER diagram which expresses the logical structure.
Entity Sets
- An entity is an object, ex. specific person/company
- An entity set is a set of entities of the same type
- An entity is represented by a set of attributes
- ex instructor = (ID, name, salary)
- A subset of the attributes form a primary key of the dataset
Relationship Sets
A relationship is an association among several entities. For example:
44553 (Peltier) - *advisor* - 22222 (Einstein)student entity - relationship set - instructor entity
A relationship set is a mathematical relation among entities, each taken from entity sets.
An attribute can also be associated with a relationship set.
Roles
Entity sets of a relationship need not be distinct, you can have a 'role' in the relationship
Degree of a Relationship Set
- Binary relationship
- involves two entities
- most relationships in a DB are Binary
- more than two are rare, but do occur in certain scenarios
Complex attributes
- Simple and Composite
- Composite attributes can be divided into sub parts.
- singe-valued and multivalued (phone_numbers)
- Derived attributes
- Domain - the set of permitted values for each attribute
Mapping Cardinality Constraints
- One to one
- One to many
- Many to one
- Many to many
In diagrams, arrow for one, and no arrow for many.
Total and Partial Participation
Total participation is when every entity in the entity set participates in at least one relationship set.
Partial participation is where some entities may not participate in the relationship set.
Keys
- Superkeys: a set of one ore more attributes that, taken collectively allow us to identify uniquely a tuple in the relation (table)
- Candidate keys: superkeys for which no proper subset is a superkey
- Primary keys: a candidate key that is chose by the database designer as the principal means by identifying tuples within a relation (table)
Primary Key
Primary keys provide a way to specify how entities and relations are distinguished.
Entity Sets
No two entities in an entity set are allowed to have exactly the same value for all attributes. A key for an entity is a set of attributes that suffice to distinguish entities from each other.
Relationship Sets
To distinguish among the various relationships of a relationship set we use the individual primary keys of the entities in the relationship set.
- Let R be a relationship set involving entity sets
- The primary key for R consists of the union of the primary keys of entity sets ,
- if the relationship set has attributes associated with it, then the primary key also includes those attributes.
Choice of Primary key for Binary Relationship
- Many-to-Many relationships: the union of the primary keys is chosen as the primary key
- One-to-Many/Many-to-One relationship: the primary key of the "Many" side is used as the primary key
- One-to-One relationships: the primary key of either one of the participating entity sets can be chosen as the primary key
Weak Entity Sets
A weak entity set is one whose existence is dependent on another entity, called its identifying entity. Instead of associating a primary key with a week entity, we use the identifying entity, along with extra attributes called discriminator to uniquely identifying a weak entity.
An entity set that is not a weak entity set is termed a strong entity set.
Every weak entity must be associated with an identifying entity. In other words the weak entity set is said to be existence dependent on the identifying entity set.
The identifying entity set is said to own the weak entity set that it identifies. The relationship associating the weak entity set with the identifying set is called the identifying relationship.
Extended E-R features
Specialization
- Top-down design process; we designate sub-groupings within an entity set that are distinctive from other entities in the set.
- These sub-groupings become lower-level entity sets that have attributes or participate in relationships that do not apply to the higher-level entity set.
- Depicted by a hollow arrow-head pointing from the specialized entity to the other entity
- We refer to this relationship as the ISA relationship (which stands for “is a”)
- Attribute inheritance: a lower-level entity set inherits all the attributes and relationship participation of the higher-level entity set to which it is linked.
- Higher- and lower-level entity sets also may be designated by the terms superclass and subclass, respectively.
- Overlapping: entity may belong to more than one lower-level entity set
- Disjoint: an entity belongs to no more than one lower-level entity set
Generalization
- A bottom-up design process – combine a number of entity sets that share the same features into a higher-level entity set.
- Specialization and generalization are simple inversions of each other; they are represented in an E-R diagram in the same way.
- The terms specialization and generalization are used interchangeably.
Completeness Constraints
- specifies whether or not an entity in the higher-level entity set must belong to at least one of the lower-level entity sets within a generalization or specialization
- total: an entity must belong to one of the lower-level entity sets
- partial: an entity need not belong to one of the lower-level entity sets
- Partial generalization is the default.
- We can specify total generalization in an ER diagram by adding the keyword total in the diagram and drawing a dashed line from the keyword to the corresponding hollow arrow-head to which it applies (for a total generalization), or to the set of hollow arrow-heads to which it applies (for an overlapping generalization)
Aggregation
You can eliminate some redundancy via Aggregation
- treat relationship as an abstract entity
- allows relationships between relationships
- abstraction relationship into new entity
Design Decisions
- The use of an attribute or entity set to represent an object.
- Whether a real-world concept is best expressed by an entity set or a relationship set.
- The use of a ternary relationship versus a pair of binary relationships.
- The use of a strong or weak entity set.
- The use of specialization/generalization – contributes to modularity in the design.
- The use of aggregation – can treat the aggregate entity set as a single unit without concern for the details of its internal structure.