Skip to content

Latest commit

 

History

History
433 lines (271 loc) · 22.1 KB

Introduction_to_Relational_Databases.md

File metadata and controls

433 lines (271 loc) · 22.1 KB

Data Management

For Scientific Research

stanza body copyPhoto: © Stanza. Used with permission.

Systems Development Life Cycle (SDLC)

One of several approaches to systems development is the SDLC, also called the "Waterfall" model.

waterfall modelImage: Peter Kemp / Paul Smith / Wikimedia

========================================================

publish and forget modelImage: Mario Valle. Used with permission.

========================================================

publish and publish and publish modelImage: Mario Valle. Used with permission.

Session 4: Introduction to Relational Databases

ERD exampleImage: Wikimedia

several relationsImage: Wikimedia

Relational Databases

A relational database ...

  • Is based on the relational model developed by E.F. Codd
  • Allows the definition of ...
    • data structures
    • storage and retrieval operations
    • integrity constraints

In such a database, the data and relationships between them are organized into tables.

Source: Wikia.com

Relational Model in Action

Example:

Given an Activity table and Event table, find all events of the "Overlay" activity.

  1. Find code for "Overlay".
  2. "Overlay" has a code of 24.
  3. Find dates with code = 24.
  4. Two dates have code = 24.

relational modelImage: Wikimedia

Designing a Relational Data System

To design a data system, we need to identify requirements and map out interactions and components.

  • Use Cases
  • Process Models
  • Data Flow Diagrams
  • Entity Relationship Diagrams

data modelingGraphic: EPISTLE and its successors / Matthew West, Julian Fowler, Razorbliss / Wikimedia

Conceptual Models and Logical Models

Conceptual Model:

For non-technical or higher-level stakeholders

artist performs song conceptual erd


Logical Model:

For technical stakeholders involved in design or implementation

artist performs song logical erd

Entity-Relationship Diagrams

For a Data Model, ERDs present ...


relational databaseGraphic: Mozilla/dietrich

ER Diagram Symbols

several relationsImage: Wikimedia

Conceptual Data Model ERD

Shows:

May also show:


artist performs song conceptual erd

Keep in mind:

  • Uses common language of the business or field
  • For non-technical or higher-level stakeholders

Example Use Case Diagram

example use case diagram

Example Use Case Diagram Highlighted

example use case diagram

Example Conceptual ERD

example conceptual ERD

Example Conceptual ERD Simplified

example conceptual ERD simplified

Logical Data Model ERD

Shows:


artist performs song logical erd

Database Normalization

Normalization is part of successful database design; without normalization, database systems can be inaccurate, slow, and inefficient, and they might not produce the data you expect.

-- Michelle A. Poolet, SQL by Design: Why You Need Database Normalization

Goals of Database Normalization

  • Logical groupings
  • Minimal duplication
  • Efficient access
  • Data integrity

Normalization in a Nutshell

"Each attribute must represent a [single] fact about the key,^1 the whole key,^2 and nothing but the key."^3

-- Chris Date, An Introduction to Database Systems

  1. Attributes contain single values -- no attribute groups
  2. Non-primary-key attributes depend on entire primary key
  3. Non-primary-key attributes depend only on primary key

Normalization Forms

  • 1st Normal Form (1NF): Remove repeating groups of data
  • 2nd Normal Form (2NF): Remove partial dependencies
  • 3rd Normal Form (3NF): Remove transitive dependencies

There are other forms, but they become increasingly tedious.

Basically, the other forms are about dividing a table into smaller tables to avoid anomalies and reduce duplication.

The First Three Forms: 1NF-3NF

The first three forms will usually cover most real-world situations adequately.

normal formsGraphic: Wikimedia

Basic Normalization Tips

Break up attributes that ...

  • Don't directly relate
  • Are inconsistently structured
  • Lead to repetition

... into separate entities.

Generally, imagine having to input or maintain the data in your tables. What problems or annoyances might come up?

Detailed Normalization Tips

Cardinality:

  • Separate tables into natural (real world) entities
  • Indentify cardinality (1:1, 1:many, many:many, etc.)
  • 1:1 are rare ... think about combining tables
  • many:many are common and messy: divide into more tables

Primary Keys:

  • Using a "natural" unique identifier is often recommended
  • Auto-numbered "id" fields as primary keys avoid problems

Example Logical ERD

example logical ERD

Hands-on Group Exercise

Working as a group create Conceptual and Logical Model ERDs for this use case: Subject takes survey. (Keep it simple.)

Subject takes survey

Discussion

Explain your ERDs.

discussionGraphic: Jagbirlehl / Wikimedia

Confusing Relational Database Terms

We have puposely avoided the use of some basic jargon of relational database theory.

The terms are mathematical in nature and conflict with the terminology of the tools we have just been using.

The next few slides are included for the curious, but can be safely skipped by the impatient, bored, overwhelmed or confused reader.

Okay, let's get pe · dan · tic ...

Basic Relational Database Terms

Here is a comparison of three sets of terms commonly used with relational databases.

Table Row Column
Relation Tuple Attribute
File Record Field

In our previous diagrams, we have used the terms "Actor" and "Entity". In relational-theory-speak these become "Relations".


relational database termsImage: Wikimedia

relational database termsImage: Charles Severance

Relations??

It is a common error to think that "relational" in a database context has something to do with relating data items. It does not. It comes from the mathematical concept of a "relation," basically a collection of data elements that all relate to a single object.

-- Egmont, "relational database technology?", wordreference.com

Relations versus Relationships

A "relation" is different from a "relationship".

  • Relation (noun)
  • Relationship (verb)

For example:

Artist (relation) performs (relationship) song (relation).

artist performs song conceptual erd

Relations (Tables)

A relation is a table organized by rows and columns, according to these rules:

  1. Rows represent a unique instance of an entity.
  2. Uniquely-named columns are the attributes of an entity.
  3. Cells only hold a single value.
  4. All cells in a column hold values of the same data type.

In the Coming Sessions...

  • Building Database Tables
  • Database Applications
  • Structured Query Language (SQL)

Action Items (videos, readings, and tasks)

watching
readings
tasks

See Also

Questions and Comments

questionImage: © Nevit Dilmen / Wikimedia

Some Parting Words

The greatest value of a picture is when it forces us to notice what we never expected to see.

-- John Tukey, American Mathematician


flammarionSource: Andreas Schmidt-Rhaesa, Corinna Schulze and Ricardo Neves/Nikon Small World/Discover Magazine