For Scientific Research
Photo: © Stanza. Used with permission.
One of several approaches to systems development is the systems development life cycle (SDLC), whose classic sequential form is often called the "Waterfall" model.
Image: Peter Kemp / Paul Smith / Wikimedia
Image: Mario Valle. Used with permission.
Image: Mario Valle. Used with permission.
Image: Wikimedia
Image: Wikimedia
A relational database ...
- Is based on the relational model developed by E.F. Codd
- Allows the definition of ...
  - data structures
  - storage and retrieval operations
  - integrity constraints
In such a database, the data and relationships between them are organized into tables.
Source: Wikia.com
Example:
Given an Activity table and an Event table, find all events of the "Overlay" activity.
- Find the code for "Overlay".
- "Overlay" has a code of 24.
- Find the event dates with code = 24.
- Two dates have code = 24.
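The two-step lookup above can be sketched with Python's built-in `sqlite3` module and an in-memory database. The column names (`code`, `name`, `event_date`), the second activity, and the dates are hypothetical; only the "Overlay" name and code 24 come from the example. The last query shows that a `JOIN` answers the same question in a single step.

```python
import sqlite3

# In-memory database. Column names and sample rows are hypothetical,
# chosen to mirror the Activity/Event example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Activity (
        code INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE Event (
        event_date TEXT NOT NULL,
        code       INTEGER NOT NULL REFERENCES Activity(code)
    );
    INSERT INTO Activity (code, name) VALUES (23, 'Baseline'), (24, 'Overlay');
    INSERT INTO Event (event_date, code) VALUES
        ('2020-03-01', 23),
        ('2020-03-15', 24),
        ('2020-04-02', 24);
""")

# Step 1: find the code for "Overlay".
(overlay_code,) = conn.execute(
    "SELECT code FROM Activity WHERE name = ?", ("Overlay",)
).fetchone()

# Step 2: find the event dates with that code.
dates = [row[0] for row in conn.execute(
    "SELECT event_date FROM Event WHERE code = ? ORDER BY event_date",
    (overlay_code,),
)]

# The same question as one query: the JOIN matches rows across the tables.
joined = [row[0] for row in conn.execute("""
    SELECT e.event_date
    FROM Event AS e
    JOIN Activity AS a ON a.code = e.code
    WHERE a.name = 'Overlay'
    ORDER BY e.event_date
""")]

print(overlay_code, dates)
```

With these sample rows, both approaches return the same two dates.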
Image: Wikimedia
To design a data system, we need to identify requirements and map out interactions and components.
- Use Cases
- Process Models
- Data Flow Diagrams
- Entity Relationship Diagrams
Graphic: EPISTLE and its successors / Matthew West, Julian Fowler, Razorbliss / Wikimedia
Conceptual Model:
For non-technical or higher-level stakeholders
Logical Model:
For technical stakeholders involved in design or implementation
For a Data Model, ERDs present ...
Graphic: Mozilla/dietrich
Image: Wikimedia
Shows:
May also show:
Keep in mind:
- Uses common language of the business or field
- For non-technical or higher-level stakeholders
Shows:
Normalization is part of successful database design; without normalization, database systems can be inaccurate, slow, and inefficient, and they might not produce the data you expect.
-- Michelle A. Poolet, SQL by Design: Why You Need Database Normalization
- Logical groupings
- Minimal duplication
- Efficient access
- Data integrity
"Each attribute must represent a [single] fact about the key,^1 the whole key,^2 and nothing but the key."^3
-- Chris Date, An Introduction to Database Systems
- Attributes contain single values (no repeating groups)
- Non-primary-key attributes depend on the entire primary key
- Non-primary-key attributes depend only on the primary key
- 1st Normal Form (1NF): Remove repeating groups of data
- 2nd Normal Form (2NF): Remove partial dependencies
- 3rd Normal Form (3NF): Remove transitive dependencies
There are higher normal forms (BCNF, 4NF, 5NF), but they become increasingly specialized and tedious to apply.
Essentially, each of them divides a table into smaller tables to avoid anomalies and reduce duplication.
In practice, the first three forms adequately cover most real-world situations.
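To make the first and third normal forms concrete, here is a small sketch in plain Python. It assumes a hypothetical spreadsheet-style table of water samples. The `tests` cell holds several values (the 1NF problem), and `station_name` depends on `station_id` rather than on the key `sample_id` (a transitive dependency, the 3NF problem). Partial dependencies (2NF) only arise with composite keys, so this toy table has no 2NF step. All field names and values are illustrative.

```python
# Hypothetical spreadsheet-style table with two design problems:
#  - 'tests' packs several values into one cell (violates 1NF)
#  - 'station_name' depends on 'station_id', not on the key 'sample_id'
#    (a transitive dependency, which violates 3NF)
flat = [
    {"sample_id": 1, "station_id": "S1", "station_name": "North Lab", "tests": "pH; Salinity"},
    {"sample_id": 2, "station_id": "S1", "station_name": "North Lab", "tests": "pH"},
    {"sample_id": 3, "station_id": "S2", "station_name": "South Lab", "tests": "Salinity"},
]


def split_repeating_group(rows):
    """1NF: move the multi-valued 'tests' cell into its own table,
    one row per (sample_id, test) pair."""
    return [
        {"sample_id": row["sample_id"], "test": test.strip()}
        for row in rows
        for test in row["tests"].split(";")
    ]


def split_transitive_dependency(rows):
    """3NF: store each station's name once, keyed by station_id,
    and keep only the station_id reference on each sample."""
    samples = [{"sample_id": r["sample_id"], "station_id": r["station_id"]} for r in rows]
    stations = {}
    for r in rows:
        known = stations.setdefault(r["station_id"], r["station_name"])
        if known != r["station_name"]:
            # The flat table disagrees with itself: an update anomaly.
            raise ValueError(f"conflicting names for station {r['station_id']}")
    return samples, stations


sample_tests = split_repeating_group(flat)
samples, stations = split_transitive_dependency(flat)
print(sample_tests)
print(samples)
print(stations)
```

After the split, renaming a station changes one entry in `stations` instead of every sample row that mentions it.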
Graphic: Wikimedia
Break up attributes that ...
- Don't directly relate
- Are inconsistently structured
- Lead to repetition
... into separate entities.
Generally, imagine having to input or maintain the data in your tables. What problems or annoyances might come up?
Cardinality:
- Separate tables into natural (real world) entities
- Identify cardinality (1:1, 1:many, many:many, etc.)
- 1:1 relationships are rare ... consider combining the tables
- many:many relationships are common and messy: resolve them with an additional (junction) table
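A many:many relationship, such as artists performing songs, is usually resolved with a junction table holding one row per pairing. The sketch below assumes Python's built-in `sqlite3` module; the table names (`Artist`, `Song`, `Performs`) and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
    CREATE TABLE Artist (
        artist_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE Song (
        song_id INTEGER PRIMARY KEY,
        title   TEXT NOT NULL
    );
    -- Junction table: one row per (artist, song) pairing.
    CREATE TABLE Performs (
        artist_id INTEGER NOT NULL REFERENCES Artist(artist_id),
        song_id   INTEGER NOT NULL REFERENCES Song(song_id),
        PRIMARY KEY (artist_id, song_id)
    );
    INSERT INTO Artist VALUES (1, 'Artist A'), (2, 'Artist B');
    INSERT INTO Song VALUES (10, 'Song X'), (11, 'Song Y');
    -- Artist A performs both songs; Song X is performed by both artists.
    INSERT INTO Performs VALUES (1, 10), (1, 11), (2, 10);
""")

# Each side of the many:many relationship is reached through Performs.
performers_of_x = [name for (name,) in conn.execute("""
    SELECT a.name FROM Artist a
    JOIN Performs p ON p.artist_id = a.artist_id
    WHERE p.song_id = 10 ORDER BY a.name
""")]
songs_by_a = [title for (title,) in conn.execute("""
    SELECT s.title FROM Song s
    JOIN Performs p ON p.song_id = s.song_id
    WHERE p.artist_id = 1 ORDER BY s.title
""")]

# The composite primary key rejects a duplicate pairing.
try:
    conn.execute("INSERT INTO Performs VALUES (1, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(performers_of_x, songs_by_a, duplicate_rejected)
```

Neither `Artist` nor `Song` needs a repeating column for the other side; adding a pairing is one new row in `Performs`.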
Primary Keys:
- Using a "natural" unique identifier is often recommended
- Auto-numbered "id" fields as primary keys avoid problems when a natural identifier changes or turns out not to be unique
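A sketch of the two primary-key approaches side by side, assuming Python's `sqlite3` module and a hypothetical `Station` table: an auto-assigned integer `station_id` serves as the primary key, while the natural identifier `station_code` is still declared `UNIQUE` so duplicates are caught. In SQLite, a column declared `INTEGER PRIMARY KEY` is filled in automatically when an insert omits it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Station (
        station_id   INTEGER PRIMARY KEY,   -- auto-numbered key, assigned by SQLite
        station_code TEXT NOT NULL UNIQUE,  -- natural identifier, kept unique
        name         TEXT NOT NULL
    )
""")

# Omitting station_id lets SQLite number the rows.
cur = conn.execute(
    "INSERT INTO Station (station_code, name) VALUES (?, ?)", ("ST-01", "North Lab")
)
north_id = cur.lastrowid
conn.execute(
    "INSERT INTO Station (station_code, name) VALUES (?, ?)", ("ST-02", "South Lab")
)

# The UNIQUE constraint still rejects a repeated natural identifier.
try:
    conn.execute(
        "INSERT INTO Station (station_code, name) VALUES (?, ?)", ("ST-02", "Copy")
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# If the natural code is later revised, only this one cell changes;
# anything that refers to the station by station_id is unaffected.
conn.execute(
    "UPDATE Station SET station_code = ? WHERE station_id = ?", ("ST-01A", north_id)
)
(new_code,) = conn.execute(
    "SELECT station_code FROM Station WHERE station_id = ?", (north_id,)
).fetchone()
print(north_id, new_code, duplicate_rejected)
```

This combines both bullet points: the auto-numbered key stays stable, and the natural identifier is still checked for uniqueness.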
Working as a group, create Conceptual and Logical Model ERDs for this use case: Subject takes survey. (Keep it simple.)
Explain your ERDs.
Graphic: Jagbirlehl / Wikimedia
We have purposely avoided using some of the basic jargon of relational database theory.
The terms are mathematical in nature and conflict with the terminology of the tools we have just been using.
The next few slides are included for the curious, but can be safely skipped by the impatient, bored, overwhelmed or confused reader.
Okay, let's get pe · dan · tic ...
Here is a comparison of three sets of terms commonly used with relational databases.
Table | Row | Column |
---|---|---|
Relation | Tuple | Attribute |
File | Record | Field |
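Python's built-in `sqlite3` module happens to mirror this vocabulary: each row comes back as a Python tuple, and the cursor's `description` attribute lists the column (attribute) names. A small sketch with a hypothetical `Song` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Song (song_id INTEGER PRIMARY KEY, title TEXT, artist TEXT)")
conn.executemany(
    "INSERT INTO Song VALUES (?, ?, ?)",
    [(1, "Song X", "Artist A"), (2, "Song Y", "Artist B")],
)

cur = conn.execute("SELECT song_id, title, artist FROM Song ORDER BY song_id")
# description holds one entry per column; the first item of each entry is its name.
attributes = [col[0] for col in cur.description]  # columns = attributes = fields
tuples = cur.fetchall()                           # rows = tuples = records
print(attributes)
print(tuples)
```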
In our previous diagrams, we have used the terms "Actor" and "Entity". In relational-theory-speak these become "Relations".
Image: Wikimedia
Image: Charles Severance
It is a common error to think that "relational" in a database context has something to do with relating data items. It does not. It comes from the mathematical concept of a "relation," basically a collection of data elements that all relate to a single object.
-- Egmont, "relational database technology?", wordreference.com
A "relation" is different from a "relationship".
- Relation: a thing, named with a noun
- Relationship: an association between things, named with a verb
For example:
Artist (relation) performs (relationship) song (relation).
A relation is a table organized into rows and columns according to these rules:
- Each row represents a unique instance of an entity.
- Uniquely named columns are the attributes of an entity.
- Each cell holds only a single value.
- All cells in a column hold values of the same data type.
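Some of these rules can be enforced by the database itself. The sketch below assumes Python's `sqlite3` module and a hypothetical `Measurement` table: the primary key keeps each row unique, and a `CHECK` on `typeof()` keeps every value in the `ph` column the same type (SQLite otherwise tolerates mixed types in a column). The single-value rule is mostly a matter of design: a text cell holding "pH; Salinity" is still one value as far as the database is concerned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Measurement (
        sample_id INTEGER PRIMARY KEY,                       -- one row per unique sample
        ph        REAL NOT NULL CHECK (typeof(ph) = 'real')  -- one value, one type
    )
""")


def try_insert(sample_id, ph):
    """Return True if the row was stored, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO Measurement VALUES (?, ?)", (sample_id, ph))
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert(1, 7.2)        # valid row
duplicate = try_insert(1, 6.8)       # same key as an existing row: rejected
wrong_type = try_insert(2, "seven")  # text in a REAL column: rejected by CHECK
print(accepted, duplicate, wrong_type)
```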
- Building Database Tables
- Database Applications
- Structured Query Language (SQL)
- UW Libraries Data Management Guide
- How to Develop a Data Management and Sharing Plan
- C2.com Database Best Practices
- Normal Forms, without the logic fetish
- A Simple Guide to Five Normal Forms
- The Normal Forms: In a Nutshell (lots of related guides here)
- Enhanced entity–relationship (EER) model
- MySQL Workbench EER Diagram
- Gliffy
- ERD Tutorial
- DB Designer
Image: © Nevit Dilmen / Wikimedia
The greatest value of a picture is when it forces us to notice what we never expected to see.
-- John Tukey, American Mathematician
Source: Andreas Schmidt-Rhaesa, Corinna Schulze and Ricardo Neves/Nikon Small World/Discover Magazine