Syllabus
This is a tentative syllabus for a collaborative, hands-on course in data management for academic scientific research projects.
Please see the rationale page for why we are running this course. You may also wish to view some participant profiles to get an idea of who might want to be involved in a course like this.
We will take a casual, study-group or workshop approach in the spirit of Guerrilla Education: no tuition, no tests, no grades, no credits; just fun and learning!
We will meet once a week for a presentation and workshop, followed by a quick summary discussion. Before parting, we will agree on action items (i.e., "homework") to prepare for the next meeting.
5 min. - Review of last meeting and "homework"
15 min. - Presentation of new material (see outline below)
20 min. - HoE: A guided "hands-on exercise" (laptop or pen/paper)
10 min. - Discussion: share exercise results and choose action items
Total: 50 min.
We will be using the following as a textbook for our workshop sessions:
Practical Computing for Biologists
- http://www.sinauer.com/catalog/biology/practical-computing-for-biologists.html
- http://www.amazon.com/dp/0878933913
The handy reference tables from the appendices can be downloaded freely here:
We will not have time to review much of this material during our workshops. Instead, we will assign readings from this text and will refer to (and use) the information and techniques it describes. Ideally, this material would already have been covered in a previous course, as it lays a foundation in the computer skills needed for data management and analysis. These skills include navigating filesystems, using a command-line interface (CLI) known as the "shell" (Terminal), matching text with regular expressions, creating data pipelines, shell scripting, and installing software. We will have some time in our meetings to answer questions about these topics.
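As a small taste of the "matching text with regular expressions" skill, here is a minimal Python sketch (Python is used here only for illustration). It assumes a hypothetical field-log format invented for this example: it keeps only the lines that match a pattern and converts them into comma-separated rows, which is a typical first step in a data pipeline.

```python
import re

# Hypothetical raw field-log lines (invented for illustration)
raw_lines = [
    "Site A  2014-03-12  temp=12.5C",
    "Site B  2014-03-13  temp=9.0C",
    "note: sensor offline",
]

# Capture the site letter, an ISO-style date, and the temperature value
pattern = re.compile(r"^Site (\w)\s+(\d{4}-\d{2}-\d{2})\s+temp=([-\d.]+)C$")

def to_csv_rows(lines):
    """Keep only lines matching the pattern and emit comma-separated rows."""
    rows = ["site,date,temp_c"]  # header row
    for line in lines:
        match = pattern.match(line)
        if match:  # non-matching lines (e.g. notes) are skipped
            rows.append(",".join(match.groups()))
    return rows

for row in to_csv_rows(raw_lines):
    print(row)
```

The same search-and-capture idea carries over to regular expressions in the shell and in text editors, although the exact syntax differs slightly between tools.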
We will skip the middle section of the book, which covers programming in the Python language. Python is a great choice for the book and for your data work, but we simply do not have time to cover it. We encourage you to consider learning Python, along with R, if you do not already know these two important data science languages. The chapter on relational databases, however, will be covered in our workshops and expanded upon with material from other sources.
For more in-depth coverage of database design and SQL, please consider (optional):
- Relational Database Design and Implementation: Clearly Explained, 3rd ed.
- SQL Clearly Explained, 3rd ed.
... both by Jan L. Harrington, who really does "clearly explain" things. Used copies are very affordable, at about $8 to $12 each.
Most other course materials will be freely available on the Internet. Some resources, however, will be accessed as eBooks through the Seattle Public Library (SPL). If you do not already have an SPL card, you can register for one here (restrictions apply):
http://www.spl.org/using-the-library/get-started/get-a-library-card
Participants in this course should expect to learn:
- When to consider the use of a database system for scientific research projects
- How to determine project requirements and anticipate disk, memory and processing needs
- The basics of data security in networked environments
- Practical skills in managing, converting, and processing data files
- How to use a command-line interface (CLI), such as the Bash, R, and SQL interactive consoles
- Basic database programming using the SQL language
- How to design and implement a relational database
- How to connect to and use a database from various statistical applications
- How websites are built on (and from) database systems (and other web technologies)
- Basic systems administration skills such as installing software and configuring services
- Familiarity with virtual machine (VM) technology and how to use it for data system development
- How to use collaborative project management applications and revision control systems
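As a rough preview of what "design and implement a relational database" and "basic database programming using the SQL language" involve, here is a minimal sketch using Python's built-in `sqlite3` module. The two-table `site`/`observation` schema and its data are hypothetical, and the course may well use a different database system; the point is only to show tables linked by a foreign key and a SQL query that joins and summarizes them.

```python
import sqlite3

# In-memory database; a real project would use a file or a database server
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical design: each observation belongs to exactly one site
conn.executescript("""
CREATE TABLE site (
    site_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE observation (
    obs_id   INTEGER PRIMARY KEY,
    site_id  INTEGER NOT NULL REFERENCES site(site_id),
    obs_date TEXT NOT NULL,
    temp_c   REAL
);
""")

# Parameterized inserts (the ? placeholders) avoid building SQL by string pasting
conn.executemany("INSERT INTO site (name) VALUES (?)", [("A",), ("B",)])
conn.executemany(
    "INSERT INTO observation (site_id, obs_date, temp_c) VALUES (?, ?, ?)",
    [(1, "2014-03-12", 12.5), (1, "2014-03-14", 11.5), (2, "2014-03-13", 9.0)],
)
conn.commit()

# Join the two tables and compute per-site summaries
query = """
SELECT s.name, COUNT(*) AS n_obs, AVG(o.temp_c) AS mean_temp
FROM observation AS o
JOIN site AS s ON s.site_id = o.site_id
GROUP BY s.name
ORDER BY s.name
"""
results = conn.execute(query).fetchall()
for name, n_obs, mean_temp in results:
    print(name, n_obs, mean_temp)

conn.close()
```

Splitting sites and observations into separate tables means each site's details are stored once, while the join recombines them only when a query needs them.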
Exact topics, exercises, dates and times TBD.
- Session 1: Data System Essentials
- Session 2: Database Analysis and Design
- Session 3: Introduction to Relational Databases
- Session 4: Building Database Tables
- Session 5: Database Applications
- Session 6: Structured Query Language (SQL)
- Session 7: Embedded SQL
- Session 8: Mobile Data Collection
- Session 9: Server Administration
- Session 10: Project Management (PM) and Version Control Systems (VCS)
- Session 11: Web-enabled Data: Applications and Frameworks