diff --git a/.travis.yml b/.travis.yml index 5912109..6fdd7f5 100644 --- a/.travis.yml +++ b/.travis.yml @@ -1,34 +1,16 @@ +dist: xenial language: python python: - # We don't actually use the Travis Python, but this keeps it organized. - - "3.6" - -before_install: cd tools + - 3.7 install: - - sudo apt-get update - # We do this conditionally because it saves us some downloading if the - # version is the same. - #- if [[ "$TRAVIS_PYTHON_VERSION" == "2.7" ]]; then - # wget https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh -O miniconda.sh; - # else - - wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh; - # fi - - bash miniconda.sh -b -p $HOME/miniconda - - export PATH="$HOME/miniconda/bin:$PATH" - - hash -r - - conda config --set always_yes yes --set changeps1 no - - conda update -q conda - # Useful for debugging any issues with conda - - conda info -a - - # Replace dep1 dep2 ... with your dependencies - - conda create -q -n test-environment python=3.6 jupyter - - source activate test-environment - + - pip install nbfancy + - pip install --upgrade nbfancy script: - - make html + - nbfancy configure -y all_magic + - nbfancy render + - nbfancy html deploy: provider: pages @@ -39,3 +21,4 @@ deploy: skip-cleanup: true on: branch: master + python: 3.7 diff --git a/LICENSE.md b/LICENSE.md new file mode 100644 index 0000000..6e9624d --- /dev/null +++ b/LICENSE.md @@ -0,0 +1,32 @@ +## Instructional Material + +This work is Copyright © Jack Betteridge and contains material derived from sources. +This material is made available under the Creative Commons Attribution license. +The following is a human-readable summary of (and not a substitute for) the full legal text of the CC BY 4.0 license. + +You are free: + +- to **Share**---copy and redistribute the material in any medium or format +- to **Adapt**---remix, transform, and build upon the material for any purpose, even commercially. 
+ +The licensor cannot revoke these freedoms as long as you follow the license terms. + +Under the following terms: + +- **Attribution** --- You must give appropriate credit (mentioning that your work is derived from work that is Copyright © James Grant), [the material from which it was derived](https://github.com/arc-lessons/intro-data-plotting/blob/master/README.md), and, where practical linking to https://github.com/arc-lessons/intro-python), provide a [link to the license][cc-by-human], and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use. +- **No additional restrictions** --- You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits. With the understanding that: + +Notices: + +You do not have to comply with the license for elements of the material in the public domain or where your use is permitted by an applicable exception or limitation. +No warranties are given. The license may not give you all of the permissions necessary for your intended use. For example, other rights such as publicity, privacy, or moral rights may limit how you use the material. +Software + +Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 
+ +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + +[cc-by-human]: https://creativecommons.org/licenses/by/4.0/ +[cc-by-legal]: https://creativecommons.org/licenses/by/4.0/legalcode diff --git a/notebooks_plain/00_schedule.ipynb b/nbplain/00_schedule.ipynb similarity index 81% rename from notebooks_plain/00_schedule.ipynb rename to nbplain/00_schedule.ipynb index 5073411..baeaa6f 100644 --- a/notebooks_plain/00_schedule.ipynb +++ b/nbplain/00_schedule.ipynb @@ -25,7 +25,7 @@ } }, "source": [ - "## Prerequisites\n", + "## Prerequisites:\n", "\n", "In order to complete the lesson you should be familar with the content of the course:\n", "* Introduction to Python" @@ -39,7 +39,7 @@ } }, "source": [ - "## Schedule\n", + "## Schedule:\n", "Approximate timings for the lesson:\n", "\n", "| Time | Episode | Description |\n", @@ -66,20 +66,7 @@ "source": [ "## Setup:\n", "\n", - "Log on to the server for today's course at https://rss.jupyterhub.bath.ac.uk. Data and files for the research software courses is available in the folder `RS50001`. We will need to tale copy the `data-plotting` folder, which will also use for the notebooks we will generate during the lesson. You can do this by creating a new notebook and executing the cell:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "subslide" - } - }, - "outputs": [], - "source": [ - "!cp -r -n RS50001/data-plotting ." + "Log on to the server for today's course at https://rss.jupyterhub.bath.ac.uk. 
We will need to make a copy of the `data-plotting` folder, which we will also use for the notebooks we will generate during the lesson. Do this by opening the `Welcome.ipynb` notebook and following the setup for Data and plotting. Data and files for the research software courses are available in the folder `data-plotting`." ] } ], @@ -100,7 +87,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.7.4" + "version": "3.8.3" } }, "nbformat": 4, diff --git a/notebooks_plain/01_jupyter.ipynb b/nbplain/01_jupyter.ipynb similarity index 96% rename from notebooks_plain/01_jupyter.ipynb rename to nbplain/01_jupyter.ipynb index cceb231..47a6ead 100644 --- a/notebooks_plain/01_jupyter.ipynb +++ b/nbplain/01_jupyter.ipynb @@ -11,7 +11,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Overview\n", + "## Overview:\n", "- **Teaching:** 10 min\n", "- **Exercises:** 10 min\n", "\n", @@ -36,7 +36,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Info: Trying things out\n", + "## Information: Trying things out\n", "In this lesson you can have a Python 3 jupyter notebook open to try out any of the commands you see here and reproduce the results." ] }, @@ -132,7 +132,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Info: Out of order execution\n", + "## Information: Out of order execution\n", "The ability to go back and change only small snippets of code is very useful, but also very dangerous form a coding point of view. If you edit a code cell and don't run _all_ the code cells after it, then any cell that isn't re-executed is still using the old code. Jupyter allows you to keep track of this by numbering its input, `In [3]` for instance means this block was executed third.\n", "\n", "If you get in a complete mess you can also clear all output, without removing the input and re-execute the code blocks in order." @@ -204,7 +204,9 @@ "\n", "Execute the cells. 
This should print out `Hello Notebook!`.\n", "\n", - "Now go back and change the values of `a` and `b` to `Goodbye` and `Everyone`. Re-execute your cells. This should now print out `Goodbye Everyone`." + "Now go back and change the values of `a` and `b` to `Goodbye` and `Everyone`. Re-execute your cells. This should now print out `Goodbye Everyone`.\n", + "\n", + "[Solution]()" ] }, { @@ -221,7 +223,9 @@ "source": [ "## Exercise: Getting help\n", "\n", - "Use the interactive Python help to get help about the `open` function for reading and writing files." + "Use the interactive Python help to get help about the `open` function for reading and writing files.\n", + "\n", + "[Solution]()" ] }, { @@ -278,7 +282,9 @@ "source": [ "## Exercise: Mastering markdown\n", "\n", - "Now change the type of the cell in your notebook to markdown. Type in some markdown in the cell and experiment with adding in headings and hyperlinks. Take a look through the [markdown cheat sheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet) and see if you can add bullet point lists, images, or code blocks to your cell." + "Now change the type of the cell in your notebook to markdown. Type in some markdown in the cell and experiment with adding in headings and hyperlinks. 
Take a look through the [markdown cheat sheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet) and see if you can add bullet point lists, images, or code blocks to your cell.\n", + "\n", + "[Solution]()" ] }, { @@ -314,7 +320,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Key Points\n", + "## Key Points:\n", "* Jupyter notebooks allow you to write any Python code into a web interface.\n", "* Cell contents can be easily modified.\n", "* You need to be wary of out of order execution.\n", @@ -338,7 +344,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.6" + "version": "3.8.3" } }, "nbformat": 4, diff --git a/nbplain/02_numpy_pt1.ipynb b/nbplain/02_numpy_pt1.ipynb new file mode 100644 index 0000000..449806c --- /dev/null +++ b/nbplain/02_numpy_pt1.ipynb @@ -0,0 +1,758 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Introduction to NumPy" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Overview:\n", + "- **Teaching:** 15 min\n", + "- **Exercises:** 10 min\n", + "\n", + "**Questions**\n", + "* What is NumPy?\n", + "* Why should I use it?\n", + "\n", + "**Objectives**\n", + "* Use NumPy to convert lists to NumPy arrays.\n", + "* Use NumPy to create arrays from scratch.\n", + "* Manipulate and reshape NumPy arrays." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "NumPy ('Numerical Python') is **the** standard module for doing numerical work in Python. 
Its main feature is its array data type, which allows very compact and efficient storage of homogeneous (of the same type) data.\n", + "\n", + "There is a standard convention for importing `numpy`, and that is as `np`:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that we have access to the `numpy` package we can start using its features." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Information: Documentation\n", + "As you go through this material, you may find it useful to refer to the [NumPy documentation](https://docs.scipy.org/doc/numpy/), particularly the [array objects](https://docs.scipy.org/doc/numpy/reference/arrays.html) section." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Creating arrays from lists\n", + "\n", + "In many ways a NumPy array can be treated like a standard Python `list` and much of the way you interact with it is identical. 
Given a list, you can create an array as follows:" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[1 2 3 4 5 6 7 8]\n" + ] + } + ], + "source": [ + "python_list = [1, 2, 3, 4, 5, 6, 7, 8]\n", + "numpy_array = np.array(python_list)\n", + "print(numpy_array)" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# ndim gives the number of dimensions\n", + "print(numpy_array.ndim)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(8,)" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# the shape of an array is a tuple of its length in each dimension. In this case it is only 1-dimensional\n", + "print(numpy_array.shape)" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "8" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# as in standard Python, len() gives a sensible answer\n", + "print(len(numpy_array))" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[[1 2 3]\n", + " [4 5 6]]\n" + ] + } + ], + "source": [ + "nested_list = [[1, 2, 3], [4, 5, 6]]\n", + "two_dim_array = np.array(nested_list)\n", + "print(two_dim_array)" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "print(two_dim_array.ndim)" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(2, 3)" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "print(two_dim_array.shape)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Creating arrays from scratch\n", + "\n", + "It's very common when working with data to not have it already in a Python list but rather to want to create some data from scratch. `numpy` comes with a whole suite of functions for creating arrays. We will now run through some of the most commonly used.\n", + "\n", + "The first is `np.arange` (meaning \"array range\") which works in a very similar fashion to the standard Python `range()` function, including how it defaults to starting from zero, doesn't include the number at the top of the range and how it allows you to specify a step:" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.arange(10) #0 .. n-1 (!)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([1, 3, 5, 7])" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.arange(1, 9, 2) # start, end (exclusive), step" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next up is `np.linspace` (meaning \"linear space\") which generates a given number of floating point numbers starting from the first argument up to the second argument. 
The third argument defines how many numbers to create:" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([0. , 0.2, 0.4, 0.6, 0.8, 1. ])" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.linspace(0, 1, 6) # start, end, num-points" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note how it included the end point unlike `arange()`. You can change this feature by using the `endpoint` argument:" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([0. , 0.2, 0.4, 0.6, 0.8])" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.linspace(0, 1, 5, endpoint=False)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`np.ones` creates an n-dimensional array filled with the value `1.0`. 
The argument you give to the function defines the shape of the array:" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[1., 1., 1.],\n", + " [1., 1., 1.],\n", + " [1., 1., 1.]])" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.ones((3, 3)) # reminder: (3, 3) is a tuple" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Likewise, you can create an array of any size filled with zeros:" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[0., 0.],\n", + " [0., 0.]])" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.zeros((2, 2))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`np.eye` (referring to the mathematical identity matrix, commonly labelled `I`) creates a square matrix of a given size with `1.0` on the diagonal and `0.0` elsewhere:" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[1., 0., 0.],\n", + " [0., 1., 0.],\n", + " [0., 0., 1.]])" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.eye(3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`np.diag` creates a square matrix with the given values on the diagonal and zeros elsewhere:" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[1, 0, 0, 0],\n", + " [0, 2, 0, 0],\n", + " [0, 0, 3, 0],\n", + " [0, 0, 0, 4]])" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.diag([1, 2, 3, 4])" + ] + }, + { + "cell_type": 
"markdown", + "metadata": {}, + "source": [ + "Finally, you can fill an array with random numbers:" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([0.10694928, 0.88985274, 0.63606749, 0.59386516])" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.random.rand(4) # uniform in [0, 1]" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([-1.81972346, -0.13515826, 1.95490428, 0.70545204])" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.random.randn(4) # Gaussian or normally distributed" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Try executing these cells multiple times and notice how you get a different result each time." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Exercise: print()\n", + "In each of these examples we have omitted the `print()`. How does including it change the output of the cell?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Exercise: Different arrays\n", + "- Create at least one one dimensional array with each of `arange`, `linspace` and `ones`.\n", + "- Create at least one two dimensional array with each of `zeros`, `eye` and `diag`.\n", + "- Create at least two arrays with different types of random numbers (eg. uniform and Gaussian random numbers).\n", + "- Look at the function `np.empty`. What does it do? 
When might this be useful?\n", + "\n", + "[Solution]()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Solution: Different arrays\n", + "\n", + "* We will make each array one dimensional, with three values for `arange`, `linspace` and `ones`:\n", + "```python\n", + "np.arange(3)\n", + "np.linspace(0,1,3)\n", + "np.ones(3)\n", + "```\n", + "* We will make each array two dimensional, with three values in each dimension for `zeros`, `eye` and `diag`:\n", + "```python\n", + "np.zeros((3,3))\n", + "np.eye(3)\n", + "np.diag(np.arange(1,4))\n", + "```\n", + "* We will make each array one dimensional, with three values for `random.rand` (uniform random numbers) and `random.randn` (Gaussian):\n", + "```python\n", + "np.random.rand(3)\n", + "np.random.randn(3)\n", + "```\n", + "* `np.empty` creates an array of given size eg: `np.empty(3)` with uninitialised memory (seemingly random values). This is **NOT** useful and can cause errors if these uninitialised values are used accidentally in a calculation. If you wish to allocate a NumPy array, but not set numerical values, you might use `np.ones(3)*np.nan` to fill an appropriately sized array with `np.nan` the not a number value. This will now cause errors if the value is not set correctly, or at least be obvious if it is used in a calculation. See [NaN](https://en.wikipedia.org/wiki/NaN) for detailed information.\n", + "\n", + "Notice if you put all these in the same cell you only see the last array, you can either put each array in its own cell, or print each one individually." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Reshaping arrays\n", + "\n", + "Behind the scenes, a multi-dimensional NumPy `array` is just stored as a linear segment of memory. The fact that it is presented as having more than one dimension is simply a layer on top of that (sometimes called a *view*). 
This means that we can simply change that interpretive layer and change the shape of an array very quickly (i.e. without NumPy having to copy any data around).\n", + "\n", + "This is mostly done with the `reshape()` method on the array object:" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "my_array = np.arange(16)\n", + "my_array" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(16,)" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "my_array.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[ 0, 1, 2, 3, 4, 5, 6, 7],\n", + " [ 8, 9, 10, 11, 12, 13, 14, 15]])" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "my_array.reshape((2, 8))" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[ 0, 1, 2, 3],\n", + " [ 4, 5, 6, 7],\n", + " [ 8, 9, 10, 11],\n", + " [12, 13, 14, 15]])" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "my_array.reshape((4, 4))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note that if you check, `my_array.shape` will still return `(16,)`, as `reshape()` simply returns a *view* on the original data; it hasn't actually *changed* it. 
If you want to edit the original object in-place then you can use the `resize()` method.\n", + "\n", + "You can also transpose an array using the `transpose()` method which mirrors the array along its diagonal:" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[ 0, 8],\n", + " [ 1, 9],\n", + " [ 2, 10],\n", + " [ 3, 11],\n", + " [ 4, 12],\n", + " [ 5, 13],\n", + " [ 6, 14],\n", + " [ 7, 15]])" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "my_array.reshape((2, 8)).transpose()" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[ 0, 4, 8, 12],\n", + " [ 1, 5, 9, 13],\n", + " [ 2, 6, 10, 14],\n", + " [ 3, 7, 11, 15]])" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "my_array.reshape((4,4)).transpose()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Exercise: An array puzzle\n", + "\n", + "Using the NumPy [documentation](https://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html), create, **in one line**, a NumPy array which looks like:\n", + "\n", + "```python\n", + "[10, 60, 20, 70, 30, 80, 40, 90, 50, 100]\n", + "```\n", + "\n", + "Hint: you might need to use `transpose()`, `reshape()` and `arange()` as well as other functions from the \"Shape manipulation\" section of the documentation. 
Can you find a method which uses fewer than 4 function calls?\n", + "\n", + "[Solution]()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Solution: An array puzzle\n", + "One solution using 4 function calls is:\n", + "```python\n", + "np.arange(10,101,10).reshape(2,5).transpose().flatten()\n", + "```\n", + "\n", + "A one line solution which only uses one function call is:\n", + "```python\n", + "np.array([10, 60, 20, 70, 30, 80, 40, 90, 50, 100])\n", + "```\n", + "Although not in the spirit of the puzzle exercise, it is far easier to see what is happening for this small array.\n", + "Of course for larger arrays this would be impractical." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Key Points:\n", + "* `np.array` can convert Python lists to NumPy arrays.\n", + "* NumPy gives many functions for initialising arrays, like `arange`, `linspace`, `ones` and `zeros`.\n", + "* NumPy arrays can be reshaped and resized using the `reshape` and `resize` functions." 
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.3" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks_plain/03_interlude.ipynb b/nbplain/03_interlude.ipynb similarity index 98% rename from notebooks_plain/03_interlude.ipynb rename to nbplain/03_interlude.ipynb index e08af5c..2cf4e95 100644 --- a/notebooks_plain/03_interlude.ipynb +++ b/nbplain/03_interlude.ipynb @@ -11,7 +11,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Overview\n", + "## Overview:\n", "- **Teaching:** 5 min\n", "- **Exercises:** 5 min\n", "\n", @@ -37,7 +37,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Info: Tuple\n", + "## Information: Tuple\n", "\n", "In the previous episode introducing `numpy` we also used a new data structure the `tuple`, which we will explore a little further. Let's create a new notebook, and import numpy: " ] @@ -140,7 +140,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Info: Dictionaries\n", + "## Information: Dictionaries\n", "\n", "* Python dicts are like Python lists, they can store anything!\n", "* Python dicts are different to Python lists in the way they are indexed. Lists are indexed with whole numbers, whereas dicts are indexed with some key, which can be anything. 
In many cases the key is a string.\n", @@ -220,7 +220,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Key Points\n", + "## Key Points:\n", "* Tuples are likke lists but immutable and declared with `(`, `)`.\n", "* Dictionaries are like lists but use a `key` to reference items rather than an `index`.\n", "* Enumerate provides a compact way of iterating index and value in lists.\n", @@ -244,7 +244,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.4" + "version": "3.8.3" } }, "nbformat": 4, diff --git a/notebooks_plain/03_numpy_pt2.ipynb b/nbplain/03_numpy_pt2.ipynb similarity index 98% rename from notebooks_plain/03_numpy_pt2.ipynb rename to nbplain/03_numpy_pt2.ipynb index e689a80..218eb58 100644 --- a/notebooks_plain/03_numpy_pt2.ipynb +++ b/nbplain/03_numpy_pt2.ipynb @@ -11,7 +11,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Overview\n", + "## Overview:\n", "- **Teaching:** 10 min\n", "- **Exercises:** 10 min\n", "\n", @@ -226,7 +226,9 @@ "source": [ "## Exercise: `dtype`s\n", "\n", - "Recreate some of the arrays we created in the previous lesson and look at what dtype they have. Try looking at the solutions to the exercise \"Different arrays\"." + "Recreate some of the arrays we created in the previous lesson and look at what dtype they have. Try looking at the solutions to the exercise \"Different arrays\".\n", + "\n", + "[Solution]()" ] }, { @@ -345,7 +347,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Info: `enumerate()`\n", + "## Information: `enumerate()`\n", "\n", "We briefly introduced the function `enumerate()` earlier. How can you find out what it does?" ] @@ -358,7 +360,9 @@ "\n", "Using `%timeit`, time how long finding the square roots of a list of numbers would take under both standard Python and NumPy.\n", "\n", - "Hint: Python's square root function is `math.sqrt`. NumPy's is `np.sqrt`." + "Hint: Python's square root function is `math.sqrt`. 
NumPy's is `np.sqrt`.\n", + "\n", + "[Solution]()" ] }, { @@ -543,7 +547,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Key Points\n", + "## Key Points:\n", "* NumPy arrays consist of values that are all the same type (or `dtype`).\n", "* Python lists do not have to be all the same type.\n", "* NumPy is more often faster than Python, partially due to arrays being of the same type, partialy due to running more optimised and compiled code.\n", @@ -567,7 +571,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.4" + "version": "3.8.3" } }, "nbformat": 4, diff --git a/notebooks_plain/03_plotting.ipynb b/nbplain/03_plotting.ipynb similarity index 99% rename from notebooks_plain/03_plotting.ipynb rename to nbplain/03_plotting.ipynb index bca8719..eabe29e 100644 --- a/notebooks_plain/03_plotting.ipynb +++ b/nbplain/03_plotting.ipynb @@ -11,7 +11,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Overview\n", + "## Overview:\n", "- **Teaching:** 5 min\n", "- **Exercises:** 10 min\n", "\n", @@ -66,7 +66,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Info: Plotting in notebooks\n", + "## Information: Plotting in notebooks\n", "You may have noticed the line\n", "```python\n", "%config InlineBackend.figure_format = 'svg'\n", @@ -751,7 +751,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Info: matplotlib in notebooks\n", + "## Information: matplotlib in notebooks\n", "Due to the nature of notebooks, you can see `[]` before the plot figure. Really here we should have called `plot.show()` to produce the plot, which is what we do in later examples. The interplay between notebooks and their content can be complex, but if you follow the guidlines here, you should at least be able to reproduce what you see." 
] }, @@ -2686,7 +2686,9 @@ "metadata": {}, "source": [ "## Exercise: Another function\n", - "Another interesting class of functions, sometimes used in machine learning applications is the sigmoid class of functions. The hyberbolic tangent function is an example of a sigmoid function and `np.tanh` can be used to calculate its value. Following the steps above plot a graph of the hyperbolic tangent function. Dont forget to label your axes!" + "Another interesting class of functions, sometimes used in machine learning applications, is the sigmoid class of functions. The hyperbolic tangent function is an example of a sigmoid function and `np.tanh` can be used to calculate its value. Following the steps above, plot a graph of the hyperbolic tangent function. Don't forget to label your axes!\n", + "\n", + "[Solution]()" ] }, { @@ -2717,7 +2719,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Key Points\n", + "## Key Points:\n", "* We can plot a function quickly using `plot.plot(x, y)`.\n", "* We can (and should) add a title and axes labels to our plots." 
] @@ -2739,7 +2741,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.6" + "version": "3.8.3" } }, "nbformat": 4, diff --git a/nbplain/04_pandas_pt1.ipynb b/nbplain/04_pandas_pt1.ipynb new file mode 100644 index 0000000..55a349e --- /dev/null +++ b/nbplain/04_pandas_pt1.ipynb @@ -0,0 +1,1583 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "collapsed": true + }, + "source": [ + "# Introduction to pandas" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Overview:\n", + "- **Teaching:** 20 min\n", + "- **Exercises:** 10 min\n", + "\n", + "**Questions**\n", + "* What is pandas?\n", + "* Why should I use series and data frames?\n", + "\n", + "**Objectives**\n", + "* Use pandas to convert lists to series.\n", + "* Learn about slicing and broadcasting series (and by extension NumPy arrays).\n", + "* Use pandas to convert dicts to data frames." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": true + }, + "source": [ + "Pandas is a library providing high-performance, easy-to-use data structures and data analysis tools. The core of pandas is its *dataframe* which is essentially a table of data. Pandas provides easy and powerful ways to import data from a variety of sources and export it to just as many. It is also explicitly designed to handle *missing data* elegantly which is a very common problem in data from the real world.\n", + "\n", + "The official [pandas documentation](http://pandas.pydata.org/pandas-docs/stable/) is very comprehensive and you will be able to answer a lot of questions there; however, it can sometimes be hard to find the right page. 
Don't be afraid to use Google to find help.\n", + "\n", + "Just like NumPy, pandas has a standard convention for importing it:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We also explicitly import `Series` and `DataFrame` as we will be using them a lot." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "from pandas import Series, DataFrame" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Series\n", + "\n", + "The simplest of pandas' data structures is the `Series`. It is a one-dimensional list-like structure.\n", + "Let's create one from a `list`:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "0 14\n", + "1 7\n", + "2 3\n", + "3 -7\n", + "4 8\n", + "dtype: int64" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "Series([14, 7, 3, -7, 8])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There are three main components to this output.\n", + "The first column (`0`, `1`, etc.) is the index; by default this numbers each row starting from zero.\n", + "The second column is our data, stored in the same order we entered it in our list.\n", + "Finally, at the bottom there is the `dtype`, which stands for 'data type' and tells us that all our data is being stored as 64-bit integers.\n", + "Usually you can ignore the `dtype` until you start doing more advanced things.\n", + "\n", + "We previously came across `dtype`s when learning about NumPy. This is because `pandas` uses NumPy as its underlying library. 
A `pandas.Series` is essentially a `np.array` with some extra features wrapped around it.\n", + "\n", + "In the first example above we allowed pandas to automatically create an index for our `Series` (this is the `0`, `1`, `2`, etc. in the left column) but often you will want to specify one yourself" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "a 14\n", + "b 7\n", + "c 3\n", + "d -7\n", + "e 8\n", + "dtype: int64\n" + ] + } + ], + "source": [ + "s = Series([14, 7, 3, -7, 8], index=['a', 'b', 'c', 'd', 'e'])\n", + "print(s)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can use this index to retrieve individual rows" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "14" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "s['a']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "to replace values in the series" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "s['c'] = -1" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "or to get a set of rows" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "a 14\n", + "c -1\n", + "d -7\n", + "dtype: int64" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "s[['a', 'c', 'd']]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Exercise: Make a Series\n", + "\n", + "- Create a Pandas `Series` with 10 or so elements where the indices are years and the values are numbers.\n", + "- Experiment with retrieving elements from the `Series`.\n", + "- Try making 
another `Series` with duplicate values in the index, what happens when you access those elements?\n", + "- How does a Pandas `Series` differ from a Python `list` or `dict`?\n", + "\n", + "[Solution]()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Solution: Make a Series\n", + "* Ten elements indexed by ten years:\n", + "\n", + "```python\n", + "my_series = Series(range(10), index=range(1920, 2020, 10))\n", + "print('My series:')\n", + "print(my_series)\n", + "print()\n", + "print('My series for 1990:')\n", + "print(my_series[1990])\n", + "print()\n", + "```\n", + "\n", + "Output:\n", + "```bash\n", + "My series:\n", + "1920 0\n", + "1930 1\n", + "1940 2\n", + "1950 3\n", + "1960 4\n", + "1970 5\n", + "1980 6\n", + "1990 7\n", + "2000 8\n", + "2010 9\n", + "dtype: int64\n", + "\n", + "My series for 1990:\n", + "7\n", + "```\n", + "\n", + "* Another series with a repeated index:\n", + "\n", + "```python\n", + "another_series = Series(range(5), index=['a', 'b', 'b', 'c', 'd'])\n", + "print('Another series, but with duplicated index:')\n", + "print(another_series)\n", + "print()\n", + "print('Another series accessing duplicated index:')\n", + "print(another_series['b'])\n", + "```\n", + "\n", + "Output:\n", + "```brainfuck\n", + "Another series, but with duplicated index:\n", + "a 0\n", + "b 1\n", + "b 2\n", + "c 3\n", + "d 4\n", + "dtype: int64\n", + "\n", + "Another series accessing duplicated index:\n", + "b 1\n", + "b 2\n", + "dtype: int64\n", + "```\n", + "\n", + "* Series are different to lists since they must contain all the same data type.\n", + "* Series are different to dicts since they can have keys with multiple values." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Series operations\n", + "\n", + "A `Series` is `list`-like in the sense that it is an ordered set of values. It is also `dict`-like since its entries can be accessed via key lookup. 
One very important way in which it differs is how it allows operations to be done over the whole `Series` in one go, a technique often referred to as 'broadcasting'. It should also be noted that, since these series objects are based on NumPy arrays, any slicing or broadcasting operation in this section can also be applied to a NumPy array, with the same result.\n", + "\n", + "A simple example is wanting to double the value of every entry in a set of data. In standard Python, you might have a list like" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "my_list = [3, 6, 8, 4, 10]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you wanted to double every entry you might try simply multiplying the list by `2`:" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[3, 6, 8, 4, 10, 3, 6, 8, 4, 10]" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "my_list * 2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "but as you can see, that simply duplicated the elements. 
Instead you would have to use a `for` loop or a list comprehension:" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[6, 12, 16, 8, 20]" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "[i * 2 for i in my_list]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "With a pandas `Series`, however, you can perform bulk mathematical operations to the whole series in one go:" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0 3\n", + "1 6\n", + "2 8\n", + "3 4\n", + "4 10\n", + "dtype: int64\n" + ] + } + ], + "source": [ + "my_series = Series(my_list)\n", + "print(my_series)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 6\n", + "1 12\n", + "2 16\n", + "3 8\n", + "4 20\n", + "dtype: int64" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "my_series * 2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As well as bulk modifications, you can perform bulk selections by putting more complex statements in the square brackets:" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "c -1\n", + "d -7\n", + "dtype: int64" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "s[s < 0] # All negative entries" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "a 14\n", + "b 7\n", + "e 8\n", + "dtype: int64" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "s[(s * 
2) > 4] # All entries which, when doubled, are greater than 4" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "These operations work because the `Series` index selection can be passed a series of `True` and `False` values which it then uses to filter the result:" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "a True\n", + "b True\n", + "c False\n", + "d False\n", + "e True\n", + "dtype: bool" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "(s * 2) > 4" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here you can see that the rows `a`, `b` and `e` are `True` while the others are `False`. Passing this to `s[...]` will only show rows that are `True`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Multi-Series operations\n", + "\n", + "It is also possible to perform operations between two `Series` objects:" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 16\n", + "1 -1\n", + "2 29\n", + "3 3\n", + "4 2\n", + "dtype: int64" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "s2 = Series([23, 5, 34, 7, 5])\n", + "s3 = Series([7, 6, 5, 4, 3])\n", + "s2 - s3" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Exercise: Broadcasting\n", + "\n", + "- Create two `Series` objects of equal length with no specified index and containing any values you like. Perform some mathematical operations on them and experiment to make sure it works how you think.\n", + "- What happens when you perform an operation on two series which have different lengths? 
How does this change when you give the series some indices?\n", + "- Using the `Series` from the first exercise with the years for the index, select all entries with even-numbered years. Also, select all those with odd-numbered years.\n", + "\n", + "[Solution]()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Solution: Broadcasting\n", + "* Two series of the same size with no index broadcast together, element for element.\n", + "\n", + "```python\n", + "series_a = Series(range(5))\n", + "series_b = Series(range(5,10))\n", + "print(series_a*series_b)\n", + "```\n", + "\n", + "Output:\n", + "```brainfuck\n", + "0 0\n", + "1 6\n", + "2 14\n", + "3 24\n", + "4 36\n", + "dtype: int64\n", + "```\n", + "\n", + "* Two series of different sizes with no index broadcast together, element for element, until one series runs out of elements; every element after that in the longer series is set to `NaN` (not a number).\n", + "\n", + "```python\n", + "series_c = Series(range(7))\n", + "print(series_a + series_c)\n", + "```\n", + "\n", + "Output:\n", + "```brainfuck\n", + "0 0.0\n", + "1 2.0\n", + "2 4.0\n", + "3 6.0\n", + "4 8.0\n", + "5 NaN\n", + "6 NaN\n", + "dtype: float64\n", + "```\n", + "\n", + "* Two series of different sizes, each with an index, broadcast together index for index; any elements that don't have a matching index are set to `NaN` (not a number).\n", + "\n", + "```python\n", + "series_d = Series(range(5), index=range(10,60,10))\n", + "series_e = Series(range(7), index=range(30,100,10))\n", + "print(series_d + series_e)\n", + "```\n", + "\n", + "Output:\n", + "```brainfuck\n", + "10 NaN\n", + "20 NaN\n", + "30 2.0\n", + "40 4.0\n", + "50 6.0\n", + "60 NaN\n", + "70 NaN\n", + "80 NaN\n", + "90 NaN\n", + "dtype: float64\n", + "```\n", + "." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## DataFrame\n", + "\n", + "While you can think of the `Series` as a one-dimensional list of data, pandas' `DataFrame` is a two (or possibly more) dimensional table of data. You can think of each column in the table as being a `Series`." + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [], + "source": [ + "data = {'city': ['Paris', 'Paris', 'Paris', 'Paris',\n", + " 'London', 'London', 'London', 'London',\n", + " 'Rome', 'Rome', 'Rome', 'Rome'],\n", + " 'year': [2001, 2008, 2009, 2010,\n", + " 2001, 2006, 2011, 2015,\n", + " 2001, 2006, 2009, 2012],\n", + " 'pop': [2.148, 2.211, 2.234, 2.244,\n", + " 7.322, 7.657, 8.174, 8.615,\n", + " 2.547, 2.627, 2.734, 2.627]}\n", + "df = DataFrame(data)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This has created a `DataFrame` from the dictionary `data`. The keys will become the column headers and the values will be the values in each column. As with the `Series`, an index will be created automatically." + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " city year pop\n", + "0 Paris 2001 2.148\n", + "1 Paris 2008 2.211\n", + "2 Paris 2009 2.234\n", + "3 Paris 2010 2.244\n", + "4 London 2001 7.322\n", + "5 London 2006 7.657\n", + "6 London 2011 8.174\n", + "7 London 2015 8.615\n", + "8 Rome 2001 2.547\n", + "9 Rome 2006 2.627\n", + "10 Rome 2009 2.734\n", + "11 Rome 2012 2.627" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Or, if you just want a peek at the data, you can just grab the first few rows with:" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " city year pop\n", + "0 Paris 2001 2.148\n", + "1 Paris 2008 2.211\n", + "2 Paris 2009 2.234" + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head(3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Since we passed in a dictionary to the `DataFrame` constructor, the order of the columns will not necessarily match the order in which you defined them. To enforce a certain order, you can pass a `columns` argument to the constructor giving a list of the columns in the order you want them:" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " year city pop\n", + "0 2001 Paris 2.148\n", + "1 2008 Paris 2.211\n", + "2 2009 Paris 2.234\n", + "3 2010 Paris 2.244\n", + "4 2001 London 7.322\n", + "5 2006 London 7.657\n", + "6 2011 London 8.174\n", + "7 2015 London 8.615\n", + "8 2001 Rome 2.547\n", + "9 2006 Rome 2.627\n", + "10 2009 Rome 2.734\n", + "11 2012 Rome 2.627" + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "DataFrame(data, columns=['year', 'city', 'pop'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "When we accessed elements from a `Series` object, it would select an element by row. However, by default `DataFrame`s index primarily by column. You can access any column directly by using square brackets or by named attributes:" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 2001\n", + "1 2008\n", + "2 2009\n", + "3 2010\n", + "4 2001\n", + "5 2006\n", + "6 2011\n", + "7 2015\n", + "8 2001\n", + "9 2006\n", + "10 2009\n", + "11 2012\n", + "Name: year, dtype: int64" + ] + }, + "execution_count": 27, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df['year']" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 Paris\n", + "1 Paris\n", + "2 Paris\n", + "3 Paris\n", + "4 London\n", + "5 London\n", + "6 London\n", + "7 London\n", + "8 Rome\n", + "9 Rome\n", + "10 Rome\n", + "11 Rome\n", + "Name: city, dtype: object" + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.city" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Accessing a column like this returns a `Series` which will act in the same way as those we were using earlier.\n", + "\n", + "Note that there is one additional part to this output, `Name: 
city`. Pandas has remembered that this `Series` was created from the `'city'` column in the `DataFrame`." + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "pandas.core.series.Series" + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type(df.city)" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 True\n", + "1 True\n", + "2 True\n", + "3 True\n", + "4 False\n", + "5 False\n", + "6 False\n", + "7 False\n", + "8 False\n", + "9 False\n", + "10 False\n", + "11 False\n", + "Name: city, dtype: bool" + ] + }, + "execution_count": 30, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.city == 'Paris'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This has created a new `Series` which has `True` set where the city is Paris and `False` elsewhere.\n", + "\n", + "We can use filtered `Series` like this to filter the `DataFrame` as a whole. `df.city == 'Paris'` has returned a `Series` containing booleans. Passing it back into `df` as an indexing operation will use it to filter based on the `'city'` column." + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " city year pop\n", + "0 Paris 2001 2.148\n", + "1 Paris 2008 2.211\n", + "2 Paris 2009 2.234\n", + "3 Paris 2010 2.244" + ] + }, + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[df.city == 'Paris']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can then carry on and grab another column after that filter:" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 2001\n", + "1 2008\n", + "2 2009\n", + "3 2010\n", + "Name: year, dtype: int64" + ] + }, + "execution_count": 32, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[df.city == 'Paris'].year" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you want to select a **row** from a `DataFrame` then you can use the `.loc` attribute which allows you to pass index values like:" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "city Paris\n", + "year 2009\n", + "pop 2.234\n", + "Name: 2, dtype: object" + ] + }, + "execution_count": 33, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.loc[2]" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "'Paris'" + ] + }, + "execution_count": 34, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.loc[2]['city']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Adding new columns\n", + "\n", + "New columns can be added to a `DataFrame` simply by assigning them by index (as you would for a Python `dict`) and can be deleted with the `del` keyword in the same way:" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": {}, + "outputs": [ + { + 
"data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " city year pop continental\n", + "0 Paris 2001 2.148 True\n", + "1 Paris 2008 2.211 True\n", + "2 Paris 2009 2.234 True\n", + "3 Paris 2010 2.244 True\n", + "4 London 2001 7.322 False\n", + "5 London 2006 7.657 False\n", + "6 London 2011 8.174 False\n", + "7 London 2015 8.615 False\n", + "8 Rome 2001 2.547 True\n", + "9 Rome 2006 2.627 True\n", + "10 Rome 2009 2.734 True\n", + "11 Rome 2012 2.627 True" + ] + }, + "execution_count": 38, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df['continental'] = (df.city != 'London')\n", + "df" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": {}, + "outputs": [], + "source": [ + "del df['continental']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Exercise: Making your own dataframe\n", + "- Create the `DataFrame` containing the census data for the three cities as we did above.\n", + "- Select the data for the year 2001. Which city had the smallest population that year?\n", + "- Find all the cities which had a population smaller than 2.6 million.\n", + "\n", + "[Solution]()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Solution: Making your own dataframe\n", + "* To set up the dataframe as before:\n", + "\n", + "```python\n", + "data = {'city': ['Paris', 'Paris', 'Paris', 'Paris',\n", + " 'London', 'London', 'London', 'London',\n", + " 'Rome', 'Rome', 'Rome', 'Rome'],\n", + " 'year': [2001, 2008, 2009, 2010,\n", + " 2001, 2006, 2011, 2015,\n", + " 2001, 2006, 2009, 2012],\n", + " 'pop': [2.148, 2.211, 2.234, 2.244,\n", + " 7.322, 7.657, 8.174, 8.615,\n", + " 2.547, 2.627, 2.734, 2.627]}\n", + "df = DataFrame(data)\n", + "```\n", + "\n", + "Output: (no output)\n", + "\n", + "* To select the data for the year 2001:\n", + "\n", + "```python\n", + "print(df[df['year'] == 2001])\n", + "```\n", + "\n", + "Output:\n", + "```brainfuck\n", + " city year pop\n", + "0 Paris 2001 2.148\n", + 
"4 London 2001 7.322\n", + "8 Rome 2001 2.547\n", + "```\n", + "* To find all cities with population less than 2.6 million:\n", + "\n", + "```python\n", + "print(df[df['pop'] < 2.6].city)\n", + "```\n", + "\n", + "Output:\n", + "```brainfuck\n", + "0 Paris\n", + "1 Paris\n", + "2 Paris\n", + "3 Paris\n", + "8 Rome\n", + "Name: city, dtype: object\n", + "```\n", + "." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Key Points:\n", + "* `Series` converts lists to pandas series.\n", + "* Series can be sliced and broadcast together.\n", + "* Pandas is a wrapper around NumPy.\n", + "* By extension NumPy arrays can be sliced and broadcast together in the same way.\n", + "* `DataFrame` converts dicts to pandas data frames." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.3" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/nbplain/05_pandas_pt2.ipynb b/nbplain/05_pandas_pt2.ipynb new file mode 100644 index 0000000..1a88974 --- /dev/null +++ b/nbplain/05_pandas_pt2.ipynb @@ -0,0 +1,741 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "collapsed": true + }, + "source": [ + "# Reading a file with pandas" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Overview:\n", + "- **Teaching:** 10 min\n", + "- **Exercises:** 5 min\n", + "\n", + "**Questions**\n", + "* How can I read my data file into pandas?\n", + "\n", + "**Objectives**\n", + "* Use pandas to read in a CSV file.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "One of the most common situations is that you have some data file containing the data you want to read. 
Perhaps this is data you've produced yourself or maybe it's from a colleague. In an ideal world the file will be perfectly formatted and trivial to import into pandas, but since this is so often not the case, pandas provides a number of features to make your life easier.\n", + "\n", + "Full information on reading and writing is available in the pandas manual on [IO tools](http://pandas.pydata.org/pandas-docs/stable/io.html) but first it's worth noting the common formats that pandas can work with:\n", + "- Comma-separated tables (or tab-separated or space-separated etc.)\n", + "- Excel spreadsheets\n", + "- HDF5 files\n", + "- SQL databases\n", + "\n", + "For this lesson we will focus on plain-text CSV files as they are perhaps the most common format. Imagine we have a CSV file like (if you are not running on notebooks.azure.com you will need to download this file from [city_pop.csv](../data/city_pop.csv)):" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "This is an example CSV file\r\n", + "The text at the top here is not part of the data but instead is here\r\n", + "to describe the file. You'll see this quite often in real-world data.\r\n", + "A -1 signifies a missing value.\r\n", + "\r\n", + "year;London;Paris;Rome\r\n", + "2001;7.322;2.148;2.547\r\n", + "2006;7.652;;2.627\r\n", + "2008;-1;2.211;\r\n", + "2009;-1;2.234;2.734\r\n", + "2011;8.174;;\r\n", + "2012;-1;2.244;2.627\r\n", + "2015;8.615;;\r\n" + ] + } + ], + "source": [ + "!cat ../data/city_pop.csv # Uses IPython's '!' shell escape to run cat and print the file" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can use the pandas function `read_csv()` to read the file and convert it to a `DataFrame`. 
Full documentation for this function can be found in [the manual](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html) or, as with any Python object, directly in the notebook by typing `help(pd.read_csv)`." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
[HTML table markup omitted; the same DataFrame is shown in the text/plain output that follows]" + ], + "text/plain": [ + " This is an example CSV file\n", + "0 The text at the top here is not part of the da...\n", + "1 to describe the file. You'll see this quite of...\n", + "2 A -1 signifies a missing value.\n", + "3 year;London;Paris;Rome\n", + "4 2001;7.322;2.148;2.547\n", + "5 2006;7.652;;2.627\n", + "6 2008;-1;2.211;\n", + "7 2009;-1;2.234;2.734\n", + "8 2011;8.174;;\n", + "9 2012;-1;2.244;2.627\n", + "10 2015;8.615;;" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import pandas as pd\n", + "\n", + "csv_file = '../data/city_pop.csv'\n", + "pd.read_csv(csv_file)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can see that by default pandas has done a fairly bad job of parsing the file (this is mostly because the file has been constructed to be as obtuse as possible). It's making a lot of assumptions about the structure of the file but in general it's taking quite a naïve approach.\n", + "\n", + "The first thing we notice is that pandas is treating the text at the top of the file as though it's data. Checking [the documentation](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html) we see that the simplest way to solve this is to use the `skiprows` argument, to which we pass an integer giving the number of rows to skip:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
[HTML table markup omitted; the same DataFrame is shown in the text/plain output that follows]
" + ], + "text/plain": [ + " year;London;Paris;Rome\n", + "0 2001;7.322;2.148;2.547\n", + "1 2006;7.652;;2.627\n", + "2 2008;-1;2.211;\n", + "3 2009;-1;2.234;2.734\n", + "4 2011;8.174;;\n", + "5 2012;-1;2.244;2.627\n", + "6 2015;8.615;;" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "pd.read_csv(csv_file,\n", + " skiprows=5,\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Information: Editing cells\n", + "If you are following along with this material in a notebook, don't forget you can edit a cell and execute it again.\n", + "In this lesson, you can just keep modifying the input to the `read_csv()` function and re-execute the cell, rather than making a new cell for each modification." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The next most obvious problem is that it is not separating the columns at all. This is controlled by the `sep` argument which is set to `','` by default (hence *comma* separated values). We can simply set it to the appropriate semi-colon:" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
[HTML table markup omitted; the same DataFrame is shown in the text/plain output that follows]
" + ], + "text/plain": [ + " year London Paris Rome\n", + "0 2001 7.322 2.148 2.547\n", + "1 2006 7.652 NaN 2.627\n", + "2 2008 -1.000 2.211 NaN\n", + "3 2009 -1.000 2.234 2.734\n", + "4 2011 8.174 NaN NaN\n", + "5 2012 -1.000 2.244 2.627\n", + "6 2015 8.615 NaN NaN" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "pd.read_csv(csv_file,\n", + " skiprows=5,\n", + " sep=';'\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Reading the descriptive header of our data file we see that a value of `-1` signifies a missing reading so we should mark those too. This can be done after the fact but it is simplest to do it at import-time using the `na_values` argument:" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
[HTML table markup omitted; the same DataFrame is shown in the text/plain output that follows]" + ], + "text/plain": [ + " year London Paris Rome\n", + "0 2001 7.322 2.148 2.547\n", + "1 2006 7.652 NaN 2.627\n", + "2 2008 NaN 2.211 NaN\n", + "3 2009 NaN 2.234 2.734\n", + "4 2011 8.174 NaN NaN\n", + "5 2012 NaN 2.244 2.627\n", + "6 2015 8.615 NaN NaN" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "pd.read_csv(csv_file,\n", + " skiprows=5,\n", + " sep=';',\n", + " na_values='-1'\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The last thing we want to do is use the `year` column as the index for the `DataFrame`. This can be done by passing the name of the column to the `index_col` argument:" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
[HTML table markup omitted; the same DataFrame is shown in the text/plain output that follows]" + ], + "text/plain": [ + " London Paris Rome\n", + "year \n", + "2001 7.322 2.148 2.547\n", + "2006 7.652 NaN 2.627\n", + "2008 NaN 2.211 NaN\n", + "2009 NaN 2.234 2.734\n", + "2011 8.174 NaN NaN\n", + "2012 NaN 2.244 2.627\n", + "2015 8.615 NaN NaN" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df3 = pd.read_csv(csv_file,\n", + " skiprows=5,\n", + " sep=';',\n", + " na_values='-1',\n", + " index_col='year'\n", + " )\n", + "df3" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Exercise: Comma separated files\n", + "- There is another file called `cetml1659on.dat` (available from [here](../data/cetml1659on.dat)). This contains some historical weather data for a location in the UK. Import that file as a pandas `DataFrame` using `read_csv()`, making sure that you cover all the NaN values. Be sure to look at the [documentation](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html#pandas.read_csv) for `read_csv()`.\n", + "- How many years had a negative average temperature in January?\n", + "- What was the average temperature in June over the years in the data set? Tip: look in the [documentation](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.html) for which method to call.\n", + "\n", + "We will come back to this data set at a later stage.\n", + "\n", + "Hints for the first part:\n", + "* The syntax for whitespace delimited data is `sep='\\s+'`, which is not immediately obvious from the documentation.\n", + "* The data is almost complete (which is unusual for scientific data) and there are only two invalid entries. Look at the last row of the file and, given that the data is temperature data, deduce which values need to be listed in `na_values`. 
(You can use a list to give multiple `na_values`)\n", + "* If you can't work out how to do the first part of this exercise, take a look at the solutions.\n", + "\n", + "[Solution]()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Solution: Comma separated files\n", + "* Read in the file, skipping the first 6 rows, using whitespace as the separator and treating -99.9 and -99.99 as missing values:\n", + "\n", + "```python\n", + "import pandas as pd\n", + "\n", + "weather_csv = 'cetml1659on.dat'\n", + "weather_df = pd.read_csv(weather_csv,\n", + " skiprows=6,\n", + " sep='\\s+',\n", + " na_values=['-99.9', '-99.99']\n", + " )\n", + "print(weather_df.head())\n", + "```\n", + "\n", + "Output:\n", + "```brainfuck\n", + " JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC YEAR\n", + "1659 3.0 4.0 6.0 7.0 11.0 13.0 16.0 16.0 13.0 10.0 5.0 2.0 8.87\n", + "1660 0.0 4.0 6.0 9.0 11.0 14.0 15.0 16.0 13.0 10.0 6.0 5.0 9.10\n", + "1661 5.0 5.0 6.0 8.0 11.0 14.0 15.0 15.0 13.0 11.0 8.0 6.0 9.78\n", + "1662 5.0 6.0 6.0 8.0 11.0 15.0 15.0 15.0 13.0 11.0 6.0 3.0 9.52\n", + "1663 1.0 1.0 5.0 7.0 10.0 14.0 15.0 15.0 13.0 10.0 7.0 5.0 8.63\n", + "\n", + "```\n", + "\n", + "* Select all rows where the January value is less than 0, and use `len()` so we don't have to count the rows ourselves.\n", + "\n", + "```python\n", + "weather_df[weather_df['JAN'] < 0] # Would output all the entries\n", + "len(weather_df[weather_df['JAN'] < 0]) # Just counts the number of rows\n", + "```\n", + "\n", + "Output:\n", + "```brainfuck\n", + "20\n", + "```\n", + "\n", + "* The average of the data can be found using the `.mean()` method:\n", + "\n", + "```python\n", + "weather_df['JUN'].mean()\n", + "```\n", + "\n", + "Output:\n", + "```brainfuck\n", + "14.325977653631282\n", + "```\n", + "." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Key Points:\n", + "* Pandas provides the `read_csv()` function for reading in CSV files.\n", + "* Although it saves us a lot of work the syntax can be quite tricky." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.3" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks_plain/06_matplotlib_pt1.ipynb b/nbplain/06_matplotlib_pt1.ipynb similarity index 99% rename from notebooks_plain/06_matplotlib_pt1.ipynb rename to nbplain/06_matplotlib_pt1.ipynb index 13342d7..d429ff5 100644 --- a/notebooks_plain/06_matplotlib_pt1.ipynb +++ b/nbplain/06_matplotlib_pt1.ipynb @@ -11,7 +11,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Overview\n", + "## Overview:\n", "- **Teaching:** 5 min\n", "- **Exercises:** 10 min\n", "\n", @@ -1560,7 +1560,9 @@ "## Exercise: Summer climate\n", "* Try reproducing the plot above but for the month of June.\n", "* Try putting in two `plot()` calls with different months (both January and June for example) before calling `show()`.\n", - "* Add a legend to distinguish the two lines." + "* Add a legend to distinguish the two lines.\n", + "\n", + "[Solution]()" ] }, { @@ -1772,7 +1774,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Info: Lambda functions\n", + "## Information: Lambda functions\n", "You may be wondering how\n", "```python\n", "lambda x: x[:3]+'0'\n", @@ -3746,14 +3748,16 @@ "2. Plot a *histogram* of the average annual temperature. 
Make sure that the x-axis is labelled correctly.\n", "Hint: Look in the [documentation](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.plot.html) for the right command to run.\n", " \n", - "3. Plot a scatter plot of each year's February temperature plotted against that year's January temperature. Is there an obvious correlation?" + "3. Plot a scatter plot of each year's February temperature plotted against that year's January temperature. Is there an obvious correlation?\n", + "\n", + "[Solution]()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Solution: Other graphs\n", + "## Solution:+ Other graphs\n", "1.Code for a bar chart of average temperature per century:\n", "\n", "```python\n", @@ -3792,7 +3796,6 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Solution: Other graphs\n", "2.Code for a histogram of the average annual temperature:\n", "\n", "```python\n", @@ -3817,7 +3820,6 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Solution: Other graphs\n", "3.Code to plot a scatter diagram of each year's February temperature plotted against that year's January temperature:\n", "\n", "```python\n", @@ -3834,14 +3836,16 @@ "plt.show()\n", "```\n", "which produces the plot:\n", - "![](../images/other_graphs_3.png)" + "![](../images/other_graphs_3.png)\n", + "\n", + ":solution+" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Key Points\n", + "## Key Points:\n", "* We can plot a function quickly using `plot.plot(x, y)`.\n", "* We can (and should) add a title and axes labels to our plots.\n", "* The `bar()` function creates bar charts.\n", @@ -3866,7 +3870,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.4" + "version": "3.8.3" } }, "nbformat": 4, diff --git a/nbplain/07_matplotlib_pt2.ipynb b/nbplain/07_matplotlib_pt2.ipynb new file mode 100644 index 0000000..1ab0d50 --- /dev/null +++ b/nbplain/07_matplotlib_pt2.ipynb @@ -0,0 
+1,10134 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Customising matplotlib output" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Overview:\n", + "- **Teaching:** 10 min\n", + "- **Exercises:** 10 min\n", + "\n", + "**Questions**\n", + "* How can I customise my plot?\n", + "* How can I make the graph look the way I want it to?\n", + "* What other features does matplotlib have?\n", + "\n", + "**Objectives**\n", + "* Learn how to change the limits on plot axes.\n", + "* Change the line colours and styles.\n", + "* Change the ticks and tick labels on axes.\n", + "* See how a graph can be annotated and customised.\n", + "* Save a graph as an image." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Making it better\n", + "\n", + "On the previous page we saw how to plot the sinc function and data from a `DataFrame`, and we customised the plots by adding axis labels and a title. \n", + "\n", + "This is good practice, but what if we want to modify the graph itself? Matplotlib has a rich feature set which we will explore in the following examples. 
First let's set up matplotlib:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "%config InlineBackend.figure_format = 'svg'\n", + "import matplotlib.pyplot as plt" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can make use of the same feature set with NumPy:" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "[SVG markup omitted: plot of sinc(x) titled 'The sinc function', with axes labelled x and f(x)]\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import numpy as np\n", + "\n", + "\n", + "x = np.linspace(-5, 5, 1000)\n", + "# sinc(x) is defined to be sin(pi*x)/(pi*x) in numpy\n", + "y_sinc = np.sinc(x)\n", + "\n", + "plt.plot(x, y_sinc, label='sinc(x)')\n", + "\n", + "# Set title and legend, then show plot\n", + "plt.title('The sinc function')\n", + "plt.legend(loc='upper left')\n", + "plt.xlabel('x')\n", + "plt.ylabel('f(x)')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setting limits\n", + 
"\n", + "Sinc(x) is bounded above and below by the two functions\n", + "$$\\frac{1}{\\pi x} \\text{ and } \\frac{-1}{\\pi x}$$\n", + "\n", + "We can add these to the plot by making two further NumPy arrays. Notice that we have to mask the arrays; this just prevents us plotting over the asymptote. If you don't like maths, bear with me; we'll be back to the plotting soon!" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "# sinc(x) bounded by plus/minus 1/(pi*x)\n", + "y_above = 1/(np.pi*x)\n", + "y_below = -1/(np.pi*x)\n", + "\n", + "# mask out very large values\n", + "y_above = np.ma.masked_outside(y_above, -60, 60)\n", + "y_below = np.ma.masked_outside(y_below, -60, 60)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can then plot all three sets of y values on the same axes as follows:" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "[SVG markup omitted: plot of sinc(x), 1/(pi x) and -1/(pi x) titled 'The sinc function', with axes labelled x and f(x)]\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plt.plot(x, y_sinc, label='sinc(x)')\n", + "plt.plot(x, y_above, label='$1/\\pi x$')\n", + "plt.plot(x, y_below, label='$-1/\\pi x$')\n", + "\n", + "# Set title and legend, then show plot\n", + "plt.title('The sinc function')\n", + "plt.legend(loc='upper left')\n", + "plt.xlabel('x')\n", + "plt.ylabel('f(x)')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "However, we seem to have lost the shape of the sinc function and all we can see are the new functions we have plotted. We can fix this by setting the limits on the x and y axes. This is done with the `xlim` and `ylim` functions respectively." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Information: Editing cells, again\n", + "If you are following along with this material in a notebook, recall you can edit a cell and execute it again.\n", + "In this lesson, you can just keep modifying the input to the code block you use for plotting and re-execute the cell, rather than making a new cell for each modification." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "[SVG markup omitted: plot of sinc(x), 1/(pi x) and -1/(pi x) titled 'The sinc function', with x limited to (-5, 5) and f(x) limited to (-0.5, 1.2)]\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plt.plot(x, y_sinc, label='sinc(x)')\n", + "plt.plot(x, y_above, label='$1/\\pi x$')\n", + "plt.plot(x, y_below, label='$-1/\\pi x$')\n", + "\n", + "# Set new limits\n", + "plt.xlim(-5, 5)\n", + "plt.ylim(-0.5, 1.2)\n", + "\n", + "# Set title 
and legend, then show plot\n", + "plt.title('The sinc function')\n", + "plt.legend(loc='upper left')\n", + "plt.xlabel('x')\n", + "plt.ylabel('f(x)')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we can see the sinc function again. Notice also that the sinc function touches both edges of the graph and is no longer floating in the centre." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Changing colours and line widths\n", + "\n", + "We can also change the colours and style of the lines we use in the plot. We can do this explicitly using the keyword arguments `color`, `linewidth` and `linestyle` or with format strings, which are documented [here](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.plot.html#matplotlib.pyplot.plot). In the example below, the important sinc function is drawn bolder than the bounding lines, and the positive and negative bounds are given different colours." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "[SVG markup omitted: plot titled 'The sinc function' with sinc(x) as a thick orange line and the two bounds as dashed black and red lines]\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plt.plot(x, y_sinc, label='sinc(x)', color=\"orange\", linewidth=2.5, linestyle=\"-\")\n", + "plt.plot(x, y_above, 'k--', label='$1/\\pi x$')\n", + "plt.plot(x, y_below, 'r--', label='$-1/\\pi x$')\n", + "\n", + "# Set limits\n", + "plt.xlim(-5, 5)\n", + "plt.ylim(-0.5, 1.2)\n", + "\n", + "# Set title and legend, then show plot\n", + "plt.title('The sinc function')\n", + "plt.legend(loc='upper left')\n", + "plt.xlabel('x')\n", + "plt.ylabel('f(x)')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setting ticks\n", + "\n", + "Currently there are only x ticks at the even integers, and the y ticks are quite dense. If we want more or fewer ticks we can use the `xticks` and `yticks` functions." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plt.plot(x, y_sinc, label='sinc(x)', color=\"orange\", linewidth=2.5, linestyle=\"-\")\n", + "plt.plot(x, y_above, 'k--', label=r'$1/\\pi x$')\n", + "plt.plot(x, y_below, 'r--', label=r'$-1/\\pi
x$')\n", + "\n", + "# Set limits\n", + "plt.xlim(-5, 5)\n", + "plt.ylim(-0.5, 1.2)\n", + "\n", + "# Set ticks\n", + "plt.xticks(range(-5, 6))\n", + "plt.yticks([-0.5, 0, 0.5, 1])\n", + "\n", + "# Set title and legend, then show plot\n", + "plt.title('The sinc function')\n", + "plt.legend(loc='upper left')\n", + "plt.xlabel('x')\n", + "plt.ylabel('f(x)')\n", + "plt.show()" + ] + },
+ { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Information: Setting tick labels\n", + "\n", + "When we set tick values, we can also provide the corresponding labels as a second list argument. We can even use LaTeX to allow for nice rendering of the labels. This is useful for trigonometric functions where we might want tick labels that are multiples of $\\pi$. For example:\n", + "```python\n", + "plt.xticks([-np.pi, -np.pi/2, 0, np.pi/2, np.pi],\n", + "           [r'$-\\pi$', r'$-\\pi/2$', r'$0$', r'$+\\pi/2$', r'$+\\pi$'])\n", + "```" + ] + },
+ { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Exercise: Customising pandas plots\n", + "Pandas plots can be customised in exactly the same way. Using your solution to the Summer Climate exercise on the previous page, make the following changes:\n", + "\n", + "- Using the temperature dataset, set the colours of the June and January lines to a warm colour and a cool colour.\n", + "- Add in the yearly average column to the plot with a dashed line style.\n", + "- (Harder) Add an annotation to one of the spikes in the data.
Make sure the label is placed nicely.\n", + "- Save the figure to a file and display it in your Jupyter notebook.\n", + "\n", + "Hint: you can get the year and temperature for a spike using:\n", + "```python\n", + "warm_winter_year = df['JAN'].idxmax()\n", + "warm_winter_temp = df['JAN'].max()\n", + "```\n", + "\n", + "[Solution]()" + ] + },
+ { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Solution: Customising pandas plots\n", + "Full working code for this exercise is:\n", + "```python\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "\n", + "# Import the data\n", + "csv_file = 'cetml1659on.dat'\n", + "df = pd.read_csv(csv_file,  # file name\n", + "                 skiprows=6,  # skip header\n", + "                 sep=r'\\s+',  # whitespace separated\n", + "                 na_values=['-99.9', '-99.99']  # NaNs\n", + "                 )\n", + "\n", + "# Plot the January and June values, plus the yearly average (dashed)\n", + "df['JAN'].plot(color='cyan')\n", + "df['JUN'].plot(color='orange')\n", + "df['YEAR'].plot(color='black', linestyle='--')\n", + "\n", + "# Add a title and axes labels\n", + "plt.title('Summer, Winter and average Climate Plots')\n", + "plt.xlabel('Year')\n", + "plt.ylabel(r'Temperature ($^\\circ$C)')\n", + "\n", + "# Add a legend\n", + "plt.legend()\n", + "\n", + "# Find warm winter year point\n", + "warm_winter_year = df['JAN'].idxmax()\n", + "warm_winter_temp = df['JAN'].max()\n", + "\n", + "# Annotate plot\n", + "plt.annotate('A warm winter',\n", + "             xy=(warm_winter_year, warm_winter_temp),\n", + "             xytext=(-150, -100), textcoords='offset points', fontsize=14,\n", + "             arrowprops=dict(arrowstyle=\"->\", connectionstyle=\"arc3,rad=.2\"))\n", + "\n", + "plt.savefig('fancy_summer_climate.png')\n", + "# display with ![](fancy_summer_climate.png)\n", + "```\n", + "This produces the figure:\n", + "![](../images/fancy_summer_climate.png)\n" + ] + },
+ { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Moving spines - Back to sinc ...\n", + "\n", + "Our plotting routine is beginning to get more complicated.
What follows is designed to show you what is possible and give a reference you can come back to. Feel free to follow along, or just read through the next part to see what is possible.\n", + "\n", + "Spines are the lines connecting the axis tick marks and marking the boundaries of the data area. They can be placed at arbitrary positions; until now, they were on the border of the axes. Sometimes it is useful to have them in the middle. Since there are four of them (top/bottom/left/right), we’ll discard the top and right by setting their colour to none, and we’ll move the bottom and left ones to coordinate 0 in data space coordinates." + ] + },
+ { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plt.plot(x, y_sinc, label='sinc(x)', color=\"orange\", linewidth=2.5, linestyle=\"-\")\n", + "plt.plot(x, y_above, 'k--', label=r'$1/\\pi x$')\n", + "plt.plot(x, y_below, 'r--', label=r'$-1/\\pi x$')\n", + "\n", + "# Set limits\n", + "plt.xlim(-5, 5)\n", + "plt.ylim(-0.5, 1.2)\n", + "\n", + "# Set ticks\n", + "plt.xticks(range(-5, 6))\n", + "plt.yticks([-0.5, 0, 0.5, 1])\n", + "\n", + "# Move the axis spines\n", + "ax = plt.gca() # gca stands for 'get current axes'\n", + "ax.spines['right'].set_color('none')\n", + "ax.spines['top'].set_color('none')\n", + "ax.xaxis.set_ticks_position('bottom')\n", + "ax.spines['bottom'].set_position(('data', 0))\n", + "ax.yaxis.set_ticks_position('left')\n", + "ax.spines['left'].set_position(('data', 0))\n", + "\n", + "# Set title and legend, then show plot\n", + "plt.title('The sinc function')\n", + "plt.legend(loc='upper left')\n", + "plt.xlabel('x')\n", + "plt.ylabel('f(x)')\n", + "plt.show()" + ] + },
+ { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In our case the tick labels now overlap the function we have plotted, which is a bit of a problem. We can either change the plot back or modify things further.\n", + "\n", + "## Annotate some points\n", + "\n", + "We can annotate some interesting points on the graph using the `annotate` function. We choose the first positive x values where sinc(x) is equal to $1/\\pi x$ and $-1/\\pi x$. This is done by first drawing a marker on the curve as well as a straight dashed line.
Then, we’ll use the `annotate` function to display some text with an arrow.\n", + "\n", + "We also fix the overlapping tick labels by using the `zorder` keyword argument, which controls the order in which things are drawn (lower zorder means drawn underneath items with a higher zorder). To make the tick labels stand out even more, we can apply a semi-transparent background (so we can still see the lines passing underneath) and increase the font size.\n", + "\n", + "There are also a few more tweaks to tidy the plot up, like moving the axis labels." + ] + },
+ { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# NB: We had to introduce a zorder parameter here\n", + "plt.plot(x, y_sinc, label='sinc(x)', color=\"orange\", linewidth=2.5, linestyle=\"-\", zorder=0)\n", + "plt.plot(x, y_above, 'k--', label=r'$1/\\pi x$', zorder=0.1)\n", + "plt.plot(x, y_below, 'r--', label=r'$-1/\\pi x$', zorder=0.1)\n", + "\n", + "# Set limits\n", + "plt.xlim(-5, 5)\n", + "plt.ylim(-0.5, 1.2)\n", + "\n", + "# Set ticks\n", + "plt.xticks(range(-5, 6))\n", + "plt.yticks([-0.5, 0, 0.5, 1])\n", + "\n", + "# Move the axis spines\n", + "ax = plt.gca() # gca stands for 'get current axes'\n", +
"ax.spines['right'].set_color('none')\n", + "ax.spines['top'].set_color('none')\n", + "ax.xaxis.set_ticks_position('bottom')\n", + "ax.spines['bottom'].set_position(('data',0))\n", + "ax.yaxis.set_ticks_position('left')\n", + "ax.spines['left'].set_position(('data',0))\n", + "\n", + "# Annotate the graph\n", + "t = 0.5\n", + "plt.plot([t, t], [0, np.sinc(t)], color='black', linewidth=1, linestyle=\"--\")\n", + "plt.scatter([t], [np.sinc(t)], 50, color='black')\n", + "\n", + "plt.annotate(r'sinc$\\left(\\frac{1}{2}\\right)=\\frac{2}{\\pi}$',\n", + " xy=(t, np.sinc(t)), xycoords='data',\n", + " xytext=(50, 30), textcoords='offset points', fontsize=16,\n", + " arrowprops=dict(arrowstyle=\"->\", connectionstyle=\"arc3,rad=.2\"))\n", + "\n", + "s = 1.5\n", + "plt.plot([s, s],[0, np.sinc(s)], color='red', linewidth=1, linestyle=\"--\")\n", + "plt.scatter([s],[np.sinc(s)], 50, color='red')\n", + "\n", + "plt.annotate(r'sinc$\\left(\\frac{3}{2}\\right)=\\frac{-2}{3\\pi}$',\n", + " xy=(s, np.sinc(s)), xycoords='data',\n", + " xytext=(30, -30), textcoords='offset points', fontsize=16,\n", + " arrowprops=dict(arrowstyle=\"->\", connectionstyle=\"arc3,rad=-.2\"))\n", + "\n", + "# Increase the size of the tick labels in both axes\n", + "# and apply semi-transparent background\n", + "for label in ax.get_xticklabels() + ax.get_yticklabels():\n", + " label.set_fontsize(12)\n", + " label.set_bbox(dict(facecolor='white', edgecolor='none', pad=0.2, alpha=0.7))\n", + " \n", + "# Set title and legend, then show plot\n", + "plt.title('The sinc function')\n", + "plt.legend(loc='upper left')\n", + "plt.xlabel('x', labelpad=-20, x=1.05, fontsize=12)\n", + "plt.ylabel('f(x)', labelpad=-30)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now you can really customise your plots!" 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Information: Fiddle until it is right\n", + "This section is inspired by Nicolas P. Rougier's [tutorial](http://www.labri.fr/perso/nrougier/teaching/matplotlib/), which is linked to from matplotlib's own website. The last few tricks from that tutorial for sorting out the spines on the graph no longer work in the latest versions of matplotlib. As a result, in Chrys Woods' [tutorial](https://chryswoods.com/python_and_data/) this step is omitted and the graph they save has the spines on the outside, avoiding the issue.\n", + "\n", + "The way to get tick labels to draw on top of plot lines is to use the `zorder` keyword argument, but this isn't obvious from the documentation. We mention this here as a useful reference, and also to show that tweaking your plot to look just right can be tricky, but is worth the perseverance." + ] + },
+ { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Saving plot to a file\n", + "\n", + "You can take any plot you've created within Jupyter and save it to a file on disk using the `plt.savefig()` function. You give the function the name of the file to create and it will infer the output format from the file extension (for example, `.png`). This is useful if you want to use the plot outside of Jupyter. It is also possible to generate plots like this in the terminal, where it may be preferable to save straight to disk."
+ ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# NB: We had to introduce a zorder parameter here\n", + "plt.plot(x, y_sinc, label='sinc(x)', color=\"orange\", linewidth=2.5, linestyle=\"-\", zorder=0)\n", + "plt.plot(x, y_above, 'k--', label=r'$1/\\pi x$', zorder=0.1)\n", + "plt.plot(x, y_below, 'r--', label=r'$-1/\\pi x$', zorder=0.1)\n", + "\n", + "# Set limits\n", + "plt.xlim(-5, 5)\n", + "plt.ylim(-0.5, 1.2)\n", + "\n", + "# Set ticks\n", + "plt.xticks(range(-5, 6))\n", + "plt.yticks([-0.5, 0, 0.5, 1])\n", + "\n", + "# Move the axis spines\n", + "ax = plt.gca() # gca stands for 'get current axes'\n", + "ax.spines['right'].set_color('none')\n", + "ax.spines['top'].set_color('none')\n", + "ax.xaxis.set_ticks_position('bottom')\n", + "ax.spines['bottom'].set_position(('data', 0))\n", + "ax.yaxis.set_ticks_position('left')\n", + "ax.spines['left'].set_position(('data', 0))\n", + "\n", + "# Annotate the graph\n", + "t = 0.5\n", + "plt.plot([t, t], [0, np.sinc(t)], color='black', linewidth=1, linestyle=\"--\")\n", + "plt.scatter([t], [np.sinc(t)], 50, color='black')\n", + "\n", + "plt.annotate(r'sinc$\\left(\\frac{1}{2}\\right)=\\frac{2}{\\pi}$',\n", + "             xy=(t,
np.sinc(t)), xycoords='data',\n", + "             xytext=(50, 30), textcoords='offset points', fontsize=16,\n", + "             arrowprops=dict(arrowstyle=\"->\", connectionstyle=\"arc3,rad=.2\"))\n", + "\n", + "s = 1.5\n", + "plt.plot([s, s], [0, np.sinc(s)], color='red', linewidth=1, linestyle=\"--\")\n", + "plt.scatter([s], [np.sinc(s)], 50, color='red')\n", + "\n", + "plt.annotate(r'sinc$\\left(\\frac{3}{2}\\right)=\\frac{-2}{3\\pi}$',\n", + "             xy=(s, np.sinc(s)), xycoords='data',\n", + "             xytext=(30, -30), textcoords='offset points', fontsize=16,\n", + "             arrowprops=dict(arrowstyle=\"->\", connectionstyle=\"arc3,rad=-.2\"))\n", + "\n", + "# Increase the size of the tick labels in both axes\n", + "# and apply semi-transparent background\n", + "for label in ax.get_xticklabels() + ax.get_yticklabels():\n", + "    label.set_fontsize(12)\n", + "    label.set_bbox(dict(facecolor='white', edgecolor='none', pad=0.2, alpha=0.7))\n", + "\n", + "# Set title and legend, then SAVE plot\n", + "plt.title('The sinc function', fontsize=20)\n", + "plt.legend(loc='upper left')\n", + "plt.xlabel('x', labelpad=-20, x=1.05, fontsize=12)\n", + "plt.ylabel('f(x)', labelpad=-30, y=0.45, fontsize=12)\n", + "# plt.show()\n", + "\n", + "# Save final plot\n", + "plt.savefig('../images/sinc.png')\n", + "# You don't need to save in this folder; you could just use:\n", + "# plt.savefig('sinc.png')" + ] + },
+ { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can then display the figure in a Jupyter markdown cell with `![](sinc.png)`:\n", + "\n", + "![](../images/sinc.png)" + ] + },
+ { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Exercise: Sine and cosine\n", + "\n", + "Recreate a plot similar to the one above, but using the sine and cosine functions (available in NumPy as `np.sin` and `np.cos`) plotted over the range $-\\pi$ to $\\pi$.\n", + "\n", + "[Solution]()" + ] + },
+ { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Solution: Sine and cosine\n", + "Complete code for this solution
looks like:\n", + "```python\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "\n", + "# Data to plot\n", + "x = np.linspace(-np.pi, np.pi, 700)\n", + "y_sin = np.sin(x)\n", + "y_cos = np.cos(x)\n", + "\n", + "# NB: We had to introduce a zorder parameter here\n", + "plt.plot(x, y_cos, label='cos(x)', color=\"blue\", linewidth=2.5, linestyle=\"-\", zorder=0)\n", + "plt.plot(x, y_sin, label='sin(x)', color=\"red\", linewidth=2.5, linestyle=\"-\", zorder=0)\n", + "\n", + "# Set limits\n", + "plt.xlim(-np.pi, np.pi)\n", + "plt.ylim(-1.1, 1.1)\n", + "\n", + "# Set ticks and labels\n", + "plt.xticks([-np.pi, -np.pi/2, 0, np.pi/2, np.pi],\n", + " [r'$-\\pi$', r'$-\\pi/2$', r'$0$', r'$+\\pi/2$', r'$+\\pi$'])\n", + "\n", + "plt.yticks([-1, 0, +1],\n", + " [r'$-1$', r'$0$', r'$+1$'])\n", + "\n", + "# Move the spines\n", + "ax = plt.gca() # gca stands for 'get current axis'\n", + "ax.spines['right'].set_color('none')\n", + "ax.spines['top'].set_color('none')\n", + "ax.xaxis.set_ticks_position('bottom')\n", + "ax.spines['bottom'].set_position(('data',0))\n", + "ax.yaxis.set_ticks_position('left')\n", + "ax.spines['left'].set_position(('data',0))\n", + "\n", + "# Annotate the graph\n", + "t = 2 * np.pi / 3\n", + "plt.plot([t, t], [0, np.cos(t)], color='blue', linewidth=2.5, linestyle=\"--\")\n", + "plt.scatter([t, ], [np.cos(t), ], 50, color='blue')\n", + "\n", + "plt.annotate(r'$cos(\\frac{2\\pi}{3})=-\\frac{1}{2}$',\n", + " xy=(t, np.cos(t)), xycoords='data',\n", + " xytext=(-90, -50), textcoords='offset points', fontsize=16,\n", + " arrowprops=dict(arrowstyle=\"->\", connectionstyle=\"arc3,rad=.2\"))\n", + "\n", + "plt.plot([t, t],[0, np.sin(t)], color='red', linewidth=2.5, linestyle=\"--\")\n", + "plt.scatter([t, ],[np.sin(t), ], 50, color='red')\n", + "\n", + "plt.annotate(r'$sin(\\frac{2\\pi}{3})=\\frac{\\sqrt{3}}{2}$',\n", + " xy=(t, np.sin(t)), xycoords='data',\n", + " xytext=(+30, 0), textcoords='offset points', fontsize=16,\n", + "
arrowprops=dict(arrowstyle=\"->\", connectionstyle=\"arc3,rad=.2\"))\n", + "\n", + "# Increase the size of the tick labels in both axes\n", + "# and apply semi-transparent background\n", + "for label in ax.get_xticklabels() + ax.get_yticklabels():\n", + " label.set_fontsize(12)\n", + " label.set_bbox(dict(facecolor='white', edgecolor='none', pad=0.2, alpha=0.7))\n", + "\n", + "# Set title and legend, then SAVE plot\n", + "plt.title('The sine and cosine functions', fontsize=20)\n", + "plt.legend(loc='upper left')\n", + "plt.xlabel('x', labelpad=-20, x=1.05, fontsize=12)\n", + "plt.ylabel('f(x)', labelpad=-20, y=0.7, fontsize=12)\n", + "#plt.show()\n", + "\n", + "plt.savefig('cos_sin.png')\n", + "```\n", + "The code produces the figure:\n", + "![](../images/cos_sin.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Key Points:\n", + "* The limits on plot axes can be changed with `xlim` and `ylim`.\n", + "* Keyword arguments can be used to change line colours and styles.\n", + "* Alternatively format strings can be used as a shortcut.\n", + "* Ticks and tick labels are changed with `xticks` and `yticks`.\n", + "* A graph can be annotated and almost every element moved.\n", + "* `savefig` saves the figure that we generate as an image." 
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.3" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks_plain/02_numpy_pt1.ipynb b/notebooks_plain/02_numpy_pt1.ipynb deleted file mode 100644 index a70a78c..0000000 --- a/notebooks_plain/02_numpy_pt1.ipynb +++ /dev/null @@ -1,557 +0,0 @@ -{ - "cells": [ - { - "metadata": {}, - "cell_type": "markdown", - "source": "# Introduction to NumPy" - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Overview\n- **Teaching:** 15 min\n- **Exercises:** 10 min\n\n**Questions**\n* What is NumPy?\n* Why should I use it?\n\n**Objectives**\n* Use NumPy to convert lists to NumPy arrays.\n* Use NumPy to create arrays from scratch.\n* Manipulate and reshape NumPy arrays." - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "NumPy ('Numerical Python') is **the** standard module for doing numerical work in Python. Its main feature is its array data type which allows very compact and efficient storage of homogenous (of the same type) data\n\nThere is a standard convention for importing `numpy`, and that is as `np`:" - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "import numpy as np", - "execution_count": 1, - "outputs": [] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "Now that we have access to the `numpy` package we can start using its features." 
- }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Info: Documentation\nAs you go through this material, you may find it useful to refer to the [NumPy documentation](https://docs.scipy.org/doc/numpy/), particularly the [array objects](https://docs.scipy.org/doc/numpy/reference/arrays.html) section." - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Creating arrays from lists\n\nIn many ways a NumPy array can be treated like a standard Python `list` and much of the way you interact with it is identical. Given a list, you can create an array as follows:" - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "python_list = [1, 2, 3, 4, 5, 6, 7, 8]\nnumpy_array = np.array(python_list)\nprint(numpy_array)", - "execution_count": 2, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": "[1 2 3 4 5 6 7 8]\n" - } - ] - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "# ndim give the number of dimensions\nprint(numpy_array.ndim)", - "execution_count": 3, - "outputs": [ - { - "data": { - "text/plain": "1" - }, - "execution_count": 3, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "# the shape of an array is a tuple of its length in each dimension. 
In this case it is only 1-dimensional\nprint(numpy_array.shape)", - "execution_count": 4, - "outputs": [ - { - "data": { - "text/plain": "(8,)" - }, - "execution_count": 4, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "# as in standard Python, len() gives a sensible answer\nprint(len(numpy_array))", - "execution_count": 5, - "outputs": [ - { - "data": { - "text/plain": "8" - }, - "execution_count": 5, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "nested_list = [[1, 2, 3], [4, 5, 6]]\ntwo_dim_array = np.array(nested_list)\nprint(two_dim_array)", - "execution_count": 6, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": "[[1 2 3]\n [4 5 6]]\n" - } - ] - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "print(two_dim_array.ndim)", - "execution_count": 7, - "outputs": [ - { - "data": { - "text/plain": "2" - }, - "execution_count": 7, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "print(two_dim_array.shape)", - "execution_count": 8, - "outputs": [ - { - "data": { - "text/plain": "(2, 3)" - }, - "execution_count": 8, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Creating arrays from scratch\n\nIt's very common when working with data to not have it already in a Python list but rather to want to create some data from scratch. `numpy` comes with a whole suite of functions for creating arrays. 
We will now run through some of the most commonly used.\n\nThe first is `np.arange` (meaning \"array range\") which works in a very similar fashion to the standard Python `range()` function, including how it defaults to starting from zero, doesn't include the number at the top of the range and how it allows you to specify a 'step':" - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "np.arange(10) #0 .. n-1 (!)", - "execution_count": 9, - "outputs": [ - { - "data": { - "text/plain": "array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])" - }, - "execution_count": 9, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "np.arange(1, 9, 2) # start, end (exclusive), step", - "execution_count": 10, - "outputs": [ - { - "data": { - "text/plain": "array([1, 3, 5, 7])" - }, - "execution_count": 10, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "Next up is `np.linspace` (meaning \"linear space\") which generates a given number of evenly spaced floating point numbers starting from the first argument up to the second argument. The third argument defines how many numbers to create:" - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "np.linspace(0, 1, 6) # start, end, num-points", - "execution_count": 11, - "outputs": [ - { - "data": { - "text/plain": "array([0. , 0.2, 0.4, 0.6, 0.8, 1. ])" - }, - "execution_count": 11, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "Note how it included the end point unlike `arange()`. You can change this feature by using the `endpoint` argument:" - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "np.linspace(0, 1, 5, endpoint=False)", - "execution_count": 12, - "outputs": [ - { - "data": { - "text/plain": "array([0.
, 0.2, 0.4, 0.6, 0.8])" - }, - "execution_count": 12, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "`np.ones` creates an n-dimensional array filled with the value `1.0`. The argument you give to the function defines the shape of the array:" - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "np.ones((3, 3)) # reminder: (3, 3) is a tuple", - "execution_count": 13, - "outputs": [ - { - "data": { - "text/plain": "array([[1., 1., 1.],\n [1., 1., 1.],\n [1., 1., 1.]])" - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "Likewise, you can create an array of any size filled with zeros:" - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "np.zeros((2, 2))", - "execution_count": 14, - "outputs": [ - { - "data": { - "text/plain": "array([[0., 0.],\n [0., 0.]])" - }, - "execution_count": 14, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "`np.eye` (referring to the mathematical identity matrix, commonly labelled as `I`) creates a square matrix of a given size with `1.0` on the diagonal and `0.0` elsewhere:" - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "np.eye(3)", - "execution_count": 15, - "outputs": [ - { - "data": { - "text/plain": "array([[1., 0., 0.],\n [0., 1., 0.],\n [0., 0., 1.]])" - }, - "execution_count": 15, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "`np.diag` creates a square matrix with the given values on the diagonal and `0` elsewhere:" - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "np.diag([1, 2, 3, 4])", - "execution_count": 16, - "outputs": [ - { - "data": { -
"text/plain": "array([[1, 0, 0, 0],\n [0, 2, 0, 0],\n [0, 0, 3, 0],\n [0, 0, 0, 4]])" - }, - "execution_count": 16, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "Finally, you can fill an array with random numbers:" - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "np.random.rand(4) # uniform in [0, 1]", - "execution_count": 17, - "outputs": [ - { - "data": { - "text/plain": "array([0.10694928, 0.88985274, 0.63606749, 0.59386516])" - }, - "execution_count": 17, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "np.random.randn(4) # Gaussian or normally distributed", - "execution_count": 18, - "outputs": [ - { - "data": { - "text/plain": "array([-1.81972346, -0.13515826, 1.95490428, 0.70545204])" - }, - "execution_count": 18, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "Try executing these cells multiple times and notice how you get a different result each time." - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Pen: print()\nIn each of these examples we have omitted the `print()`. How does including it change the output of the cell?" - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Exercise: Different arrays\n- Create at least one one dimensional array with each of `arange`, `linspace` and `ones`.\n- Create at least one two dimensional array with each of `zeros`, `eye` and `diag`.\n- Create at least two arrays with different types of random numbers (eg. uniform and Gaussian random numbers).\n- Look at the function `np.empty`. What does it do? When might this be useful?" 
- }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Solution: Different arrays\n\n* We will make each array one dimensional, with three values for `arange`, `linspace` and `ones`:\n```python\nnp.arange(3)\nnp.linspace(0,1,3)\nnp.ones(3)\n```\n* We will make each array two dimensional, with three values in each dimension for `zeros`, `eye` and `diag`:\n```python\nnp.zeros((3,3))\nnp.eye(3)\nnp.diag(np.arange(1,4))\n```\n* We will make each array one dimensional, with three values for `random.rand` (uniform random numbers) and `random.randn` (Gaussian):\n```python\nnp.random.rand(3)\nnp.random.randn(3)\n```\n* `np.empty` creates an array of given size eg: `np.empty(3)` with uninitialised memory (seemingly random values). This is **NOT** useful and can cause errors if these uninitialised values are used accidentally in a calculation. If you wish to allocate a NumPy array, but not set numerical values, you might use `np.ones(3)*np.nan` to fill an appropriately sized array with `np.nan` the not a number value. This will now cause errors if the value is not set correctly, or at least be obvious if it is used in a calculation. See [NaN](https://en.wikipedia.org/wiki/NaN) for detailed information.\n\nNotice if you put all these in the same cell you only see the last array, you can either put each array in its own cell, or print each one individually." - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Reshaping arrays\n\nBehind the scenes, a multi-dimensional NumPy `array` is just stored as a linear segment of memory. The fact that it is presented as having more than one dimension is simply a layer on top of that (sometimes called a *view*). 
This means that we can simply change that interpretive layer and change the shape of an array very quickly (i.e. without NumPy having to copy any data around).\n\nThis is mostly done with the `reshape()` method on the array object:" - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "my_array = np.arange(16)\nmy_array", - "execution_count": 19, - "outputs": [ - { - "data": { - "text/plain": "array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])" - }, - "execution_count": 19, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "my_array.shape", - "execution_count": 20, - "outputs": [ - { - "data": { - "text/plain": "(16,)" - }, - "execution_count": 20, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "my_array.reshape((2, 8))", - "execution_count": 21, - "outputs": [ - { - "data": { - "text/plain": "array([[ 0, 1, 2, 3, 4, 5, 6, 7],\n [ 8, 9, 10, 11, 12, 13, 14, 15]])" - }, - "execution_count": 21, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "my_array.reshape((4, 4))", - "execution_count": 22, - "outputs": [ - { - "data": { - "text/plain": "array([[ 0, 1, 2, 3],\n [ 4, 5, 6, 7],\n [ 8, 9, 10, 11],\n [12, 13, 14, 15]])" - }, - "execution_count": 22, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "Note that if you check, `my_array.shape` will still return `(16,)`, because `reshape()` simply returns a *view* on the original data; it hasn't actually *changed* it.
If you want to edit the original object in-place then you can use the `resize()` method.\n\nYou can also transpose an array using the `transpose()` method which mirrors the array along its diagonal:" - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "my_array.reshape((2, 8)).transpose()", - "execution_count": 23, - "outputs": [ - { - "data": { - "text/plain": "array([[ 0, 8],\n [ 1, 9],\n [ 2, 10],\n [ 3, 11],\n [ 4, 12],\n [ 5, 13],\n [ 6, 14],\n [ 7, 15]])" - }, - "execution_count": 23, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "my_array.reshape((4,4)).transpose()", - "execution_count": 24, - "outputs": [ - { - "data": { - "text/plain": "array([[ 0, 4, 8, 12],\n [ 1, 5, 9, 13],\n [ 2, 6, 10, 14],\n [ 3, 7, 11, 15]])" - }, - "execution_count": 24, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Exercise: An array puzzle\n\nUsing the NumPy [documentation](https://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html), create, **in one line**, a NumPy array which looks like:\n\n```python\n[10, 60, 20, 70, 30, 80, 40, 90, 50, 100]\n```\n\nHint: you might need to use `transpose()`, `reshape()` and `arange()` as well as other functions from the \"Shape manipulation\" section of the documentation. Can you find a method which uses fewer than 4 function calls?" 
- }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Solution: An array puzzle\nOne solution using 4 function calls is:\n```python\nnp.arange(10,101,10).reshape(2,5).transpose().flatten()\n```\n\nA one line solution which only uses one function call is:\n```python\nnp.array([10, 60, 20, 70, 30, 80, 40, 90, 50, 100])\n```\nAlthough not in the spirit of the puzzle exercise, it is far easier to see what is happening for this small array.\nOf course for larger arrays this would be impractical." - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Key Points\n* `np.array` can convert Python lists to NumPy arrays.\n* NumPy gives many functions for initialising arrays, like `arange`, `linspace`, `ones` and `zeros`.\n* NumPy arrays can be reshaped and resized using the `reshape` and `resize` functions." - } - ], - "metadata": { - "kernelspec": { - "name": "python3", - "display_name": "Python 3", - "language": "python" - }, - "language_info": { - "mimetype": "text/x-python", - "nbconvert_exporter": "python", - "name": "python", - "pygments_lexer": "ipython3", - "version": "3.5.4", - "file_extension": ".py", - "codemirror_mode": { - "version": 3, - "name": "ipython" - } - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file diff --git a/notebooks_plain/04_pandas_pt1.ipynb b/notebooks_plain/04_pandas_pt1.ipynb deleted file mode 100644 index 087aa5b..0000000 --- a/notebooks_plain/04_pandas_pt1.ipynb +++ /dev/null @@ -1,687 +0,0 @@ -{ - "cells": [ - { - "metadata": { - "collapsed": true - }, - "cell_type": "markdown", - "source": "# Introduction to pandas" - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Overview\n- **Teaching:** 20 min\n- **Exercises:** 10 min\n\n**Questions**\n* What is pandas?\n* Why should I use series and data frames?\n\n**Objectives**\n* Use pandas to convert lists to series.\n* Learn about slicing and broadcasting series (and by extension NumPy arrays).\n* Use pandas to
convert dicts to data frames." - }, - { - "metadata": { - "collapsed": true - }, - "cell_type": "markdown", - "source": "Pandas is a library providing high-performance, easy-to-use data structures and data analysis tools. The core of pandas is its *dataframe* which is essentially a table of data. Pandas provides easy and powerful ways to import data from a variety of sources and export it to just as many. It is also explicitly designed to handle *missing data* elegantly which is a very common problem in data from the real world.\n\nThe official [pandas documentation](http://pandas.pydata.org/pandas-docs/stable/) is very comprehensive and you will be able to answer a lot of questions there; however, it can sometimes be hard to find the right page. Don't be afraid to use Google to find help.\n\nJust like numpy, pandas has a standard convention for importing it:" - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "import pandas as pd", - "execution_count": 1, - "outputs": [] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "We also explicitly import `Series` and `DataFrame` as we will be using them a lot." - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "from pandas import Series, DataFrame", - "execution_count": 2, - "outputs": [] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Series\n\nThe simplest of pandas' data structures is the `Series`.
It is a one-dimensional list-like structure.\nLet's create one from a `list`:" - }, - { - "metadata": { - "scrolled": true, - "trusted": false - }, - "cell_type": "code", - "source": "Series([14, 7, 3, -7, 8])", - "execution_count": 3, - "outputs": [ - { - "data": { - "text/plain": "0 14\n1 7\n2 3\n3 -7\n4 8\ndtype: int64" - }, - "execution_count": 3, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "There are three main components to this output.\nThe first column (`0`, `1`, etc.) is the index; by default this numbers each row starting from zero.\nThe second column is our data, stored in the same order we entered it in our list.\nFinally at the bottom there is the `dtype`, which stands for 'data type' and tells us that all our data is being stored as 64-bit integers.\nUsually you can ignore the `dtype` until you start doing more advanced things.\n\nWe previously came across `dtype`s when learning about NumPy. This is because `pandas` uses NumPy as its underlying library. A `pandas.Series` is essentially a `np.array` with some extra features wrapped around it.\n\nIn the first example above we allowed pandas to automatically create an index for our `Series` (this is the `0`, `1`, `2`, etc.
in the left column) but often you will want to specify one yourself" - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "s = Series([14, 7, 3, -7, 8], index=['a', 'b', 'c', 'd', 'e'])\nprint(s)", - "execution_count": 4, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": "a 14\nb 7\nc 3\nd -7\ne 8\ndtype: int64\n" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "We can use this index to retrieve individual rows" - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "s['a']", - "execution_count": 5, - "outputs": [ - { - "data": { - "text/plain": "14" - }, - "execution_count": 5, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "to replace values in the series" - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "s['c'] = -1", - "execution_count": 6, - "outputs": [] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "or to get a set of rows" - }, - { - "metadata": { - "scrolled": true, - "trusted": false - }, - "cell_type": "code", - "source": "s[['a', 'c', 'd']]", - "execution_count": 7, - "outputs": [ - { - "data": { - "text/plain": "a 14\nc -1\nd -7\ndtype: int64" - }, - "execution_count": 7, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Exercise: Make a Series\n\n- Create a Pandas `Series` with 10 or so elements where the indices are years and the values are numbers.\n- Experiment with retrieving elements from the `Series`.\n- Try making another `Series` with duplicate values in the index, what happens when you access those elements?\n- How does a Pandas `Series` differ from a Python `list` or `dict`?" 
- }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Solution: Make a Series\n* Ten elements indexed by ten years:\n\n```python\nmy_series = Series(range(10), index=range(1920, 2020, 10))\nprint('My series:')\nprint(my_series)\nprint()\nprint('My series for 1990:')\nprint(my_series[1990])\nprint()\n```\n\nOutput:\n```brainfuck\nMy series:\n1920 0\n1930 1\n1940 2\n1950 3\n1960 4\n1970 5\n1980 6\n1990 7\n2000 8\n2010 9\ndtype: int64\n\nMy series for 1990:\n7\n```\n\n* Another series with a repeated index:\n\n```python\nanother_series = Series(range(5), index=['a', 'b', 'b', 'c', 'd'])\nprint('Another series, but with duplicated index:')\nprint(another_series)\nprint()\nprint('Another series accessing duplicated index:')\nprint(another_series['b'])\n```\n\nOutput:\n```brainfuck\nAnother series, but with duplicated index:\na 0\nb 1\nb 2\nc 3\nd 4\ndtype: int64\n\nAnother series accessing duplicated index:\nb 1\nb 2\ndtype: int64\n```\n\n* Series are different to lists since they must contain all the same data type.\n* Series are different to dicts since they can have keys with multiple values." - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Series operations\n\nA `Series` is `list`-like in the sense that it is an ordered set of values. It is also `dict`-like since its entries can be accessed via key lookup. One very important way in which it differs is how it allows operations to be done over the whole `Series` in one go, a technique often referred to as 'broadcasting'. Note also that, since these series objects are based on NumPy arrays, any slicing or broadcasting operation in this section can also be applied to a NumPy array, with the same result.\n\nA simple example is wanting to double the value of every entry in a set of data.
In standard Python, you might have a list like" - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "my_list = [3, 6, 8, 4, 10]", - "execution_count": 12, - "outputs": [] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "If you wanted to double every entry you might try simply multiplying the list by `2`:" - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "my_list * 2", - "execution_count": 13, - "outputs": [ - { - "data": { - "text/plain": "[3, 6, 8, 4, 10, 3, 6, 8, 4, 10]" - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "but as you can see, that simply duplicated the elements. Instead you would have to use a `for` loop or a list comprehension:" - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "[i * 2 for i in my_list]", - "execution_count": 14, - "outputs": [ - { - "data": { - "text/plain": "[6, 12, 16, 8, 20]" - }, - "execution_count": 14, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "With a pandas `Series`, however, you can perform bulk mathematical operations to the whole series in one go:" - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "my_series = Series(my_list)\nprint(my_series)", - "execution_count": 15, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": "0 3\n1 6\n2 8\n3 4\n4 10\ndtype: int64\n" - } - ] - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "my_series * 2", - "execution_count": 16, - "outputs": [ - { - "data": { - "text/plain": "0 6\n1 12\n2 16\n3 8\n4 20\ndtype: int64" - }, - "execution_count": 16, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "As well as bulk 
modifications, you can perform bulk selections by putting more complex statements in the square brackets:" - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "s[s < 0] # All negative entries", - "execution_count": 17, - "outputs": [ - { - "data": { - "text/plain": "c -1\nd -7\ndtype: int64" - }, - "execution_count": 17, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "s[(s * 2) > 4] # All entries which, when doubled are greater than 4", - "execution_count": 18, - "outputs": [ - { - "data": { - "text/plain": "a 14\nb 7\ne 8\ndtype: int64" - }, - "execution_count": 18, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "These operations work because the `Series` index selection can be passed a series of `True` and `False` values which it then uses to filter the result:" - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "(s * 2) > 4", - "execution_count": 19, - "outputs": [ - { - "data": { - "text/plain": "a True\nb True\nc False\nd False\ne True\ndtype: bool" - }, - "execution_count": 19, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "Here you can see that the rows `a`, `b` and `e` are `True` while the others are `False`. Passing this to `s[...]` will only show rows that are `True`." 
- }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "### Multi-Series operations\n\nIt is also possible to perform operations between two `Series` objects:" - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "s2 = Series([23,5,34,7,5])\ns3 = Series([7, 6, 5,4,3])\ns2 - s3", - "execution_count": 20, - "outputs": [ - { - "data": { - "text/plain": "0 16\n1 -1\n2 29\n3 3\n4 2\ndtype: int64" - }, - "execution_count": 20, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Exercise: Broadcasting\n\n- Create two `Series` objects of equal length with no specified index and containing any values you like. Perform some mathematical operations on them and experiment to make sure it works how you think.\n- What happens when you perform an operation on two series which have different lengths? How does this change when you give the series some indices?\n- Using the `Series` from the first exercise with the years for the index, select all entries with even-numbered years. Also, select all those with odd-numbered years."
- }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Solution: Broadcasting\n* Two series of the same size with no index broadcast together, element for element.\n\n```python\nseries_a = Series(range(5))\nseries_b = Series(range(5,10))\nprint(series_a*series_b)\n```\n\nOutput:\n```brainfuck\n0 0\n1 6\n2 14\n3 24\n4 36\ndtype: int64\n```\n\n* Two series of different sizes with no index broadcast together, element for element, until one series runs out of elements. Every element after that in the longer series is set to `NaN` (not a number).\n\n```python\nseries_c = Series(range(7))\nprint(series_a + series_c)\n```\n\nOutput:\n```brainfuck\n0 0.0\n1 2.0\n2 4.0\n3 6.0\n4 8.0\n5 NaN\n6 NaN\ndtype: float64\n```\n\n* Two series of different sizes, each with an index, broadcast together index for index. Any elements that don't have a matching index are set to `NaN` (not a number).\n\n```python\nseries_d = Series(range(5), index=range(10,60,10))\nseries_e = Series(range(7), index=range(30,100,10))\nprint(series_d + series_e)\n```\n\nOutput:\n```brainfuck\n10 NaN\n20 NaN\n30 2.0\n40 4.0\n50 6.0\n60 NaN\n70 NaN\n80 NaN\n90 NaN\ndtype: float64\n```" - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## DataFrame\n\nWhile you can think of the `Series` as a one-dimensional list of data, pandas' `DataFrame` is a two (or possibly more) dimensional table of data. You can think of each column in the table as being a `Series`."
- }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "data = {'city': ['Paris', 'Paris', 'Paris', 'Paris',\n 'London', 'London', 'London', 'London',\n 'Rome', 'Rome', 'Rome', 'Rome'],\n 'year': [2001, 2008, 2009, 2010,\n 2001, 2006, 2011, 2015,\n 2001, 2006, 2009, 2012],\n 'pop': [2.148, 2.211, 2.234, 2.244,\n 7.322, 7.657, 8.174, 8.615,\n 2.547, 2.627, 2.734, 2.627]}\ndf = DataFrame(data)", - "execution_count": 23, - "outputs": [] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "This has created a `DataFrame` from the dictionary `data`. The keys will become the column headers and the values will be the values in each column. As with the `Series`, an index will be created automatically." - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "df", - "execution_count": 24, - "outputs": [ - { - "data": { - "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
city year pop
0 Paris 2001 2.148
1 Paris 2008 2.211
2 Paris 2009 2.234
3 Paris 2010 2.244
4 London 2001 7.322
5 London 2006 7.657
6 London 2011 8.174
7 London 2015 8.615
8 Rome 2001 2.547
9 Rome 2006 2.627
10 Rome 2009 2.734
11 Rome 2012 2.627
\n
", - "text/plain": " city year pop\n0 Paris 2001 2.148\n1 Paris 2008 2.211\n2 Paris 2009 2.234\n3 Paris 2010 2.244\n4 London 2001 7.322\n5 London 2006 7.657\n6 London 2011 8.174\n7 London 2015 8.615\n8 Rome 2001 2.547\n9 Rome 2006 2.627\n10 Rome 2009 2.734\n11 Rome 2012 2.627" - }, - "execution_count": 24, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "Or, if you just want a peek at the data, you can just grab the first few rows with:" - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "df.head(3)", - "execution_count": 25, - "outputs": [ - { - "data": { - "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
city year pop
0 Paris 2001 2.148
1 Paris 2008 2.211
2 Paris 2009 2.234
\n
", - "text/plain": " city year pop\n0 Paris 2001 2.148\n1 Paris 2008 2.211\n2 Paris 2009 2.234" - }, - "execution_count": 25, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "Since we passed in a dictionary to the `DataFrame` constructor, the order of the columns will not necessarily match the order in which you defined them (dictionaries only guarantee insertion order from Python 3.7 onwards). To enforce a certain order, you can pass a `columns` argument to the constructor giving a list of the columns in the order you want them:" - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "DataFrame(data, columns=['year', 'city', 'pop'])", - "execution_count": 26, - "outputs": [ - { - "data": { - "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
year city pop
0 2001 Paris 2.148
1 2008 Paris 2.211
2 2009 Paris 2.234
3 2010 Paris 2.244
4 2001 London 7.322
5 2006 London 7.657
6 2011 London 8.174
7 2015 London 8.615
8 2001 Rome 2.547
9 2006 Rome 2.627
10 2009 Rome 2.734
11 2012 Rome 2.627
\n
", - "text/plain": " year city pop\n0 2001 Paris 2.148\n1 2008 Paris 2.211\n2 2009 Paris 2.234\n3 2010 Paris 2.244\n4 2001 London 7.322\n5 2006 London 7.657\n6 2011 London 8.174\n7 2015 London 8.615\n8 2001 Rome 2.547\n9 2006 Rome 2.627\n10 2009 Rome 2.734\n11 2012 Rome 2.627" - }, - "execution_count": 26, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "When we accessed elements from a `Series` object, it would select an element by row. However, by default `DataFrame`s index primarily by column. You can access any column directly by using square brackets or by named attributes:" - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "df['year']", - "execution_count": 27, - "outputs": [ - { - "data": { - "text/plain": "0 2001\n1 2008\n2 2009\n3 2010\n4 2001\n5 2006\n6 2011\n7 2015\n8 2001\n9 2006\n10 2009\n11 2012\nName: year, dtype: int64" - }, - "execution_count": 27, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "df.city", - "execution_count": 28, - "outputs": [ - { - "data": { - "text/plain": "0 Paris\n1 Paris\n2 Paris\n3 Paris\n4 London\n5 London\n6 London\n7 London\n8 Rome\n9 Rome\n10 Rome\n11 Rome\nName: city, dtype: object" - }, - "execution_count": 28, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "Accessing a column like this returns a `Series` which will act in the same way as those we were using earlier.\n\nNote that there is one additional part to this output, `Name: city`. Pandas has remembered that this `Series` was created from the `'city'` column in the `DataFrame`." 
- }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "type(df.city)", - "execution_count": 29, - "outputs": [ - { - "data": { - "text/plain": "pandas.core.series.Series" - }, - "execution_count": 29, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "df.city == 'Paris'", - "execution_count": 30, - "outputs": [ - { - "data": { - "text/plain": "0 True\n1 True\n2 True\n3 True\n4 False\n5 False\n6 False\n7 False\n8 False\n9 False\n10 False\n11 False\nName: city, dtype: bool" - }, - "execution_count": 30, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "This has created a new `Series` which has `True` set where the city is Paris and `False` elsewhere.\n\nWe can use filtered `Series` like this to filter the `DataFrame` as a whole. `df.city == 'Paris'` has returned a `Series` containing booleans. Passing it back into `df` as an indexing operation will use it to filter based on the `'city'` column." - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "df[df.city == 'Paris']", - "execution_count": 31, - "outputs": [ - { - "data": { - "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
city year pop
0 Paris 2001 2.148
1 Paris 2008 2.211
2 Paris 2009 2.234
3 Paris 2010 2.244
\n
", - "text/plain": " city year pop\n0 Paris 2001 2.148\n1 Paris 2008 2.211\n2 Paris 2009 2.234\n3 Paris 2010 2.244" - }, - "execution_count": 31, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "You can then carry on and grab another column after that filter:" - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "df[df.city == 'Paris'].year", - "execution_count": 32, - "outputs": [ - { - "data": { - "text/plain": "0 2001\n1 2008\n2 2009\n3 2010\nName: year, dtype: int64" - }, - "execution_count": 32, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "If you want to select a **row** from a `DataFrame` then you can use the `.loc` attribute which allows you to pass index values like:" - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "df.loc[2]", - "execution_count": 33, - "outputs": [ - { - "data": { - "text/plain": "city Paris\nyear 2009\npop 2.234\nName: 2, dtype: object" - }, - "execution_count": 33, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": { - "scrolled": true, - "trusted": false - }, - "cell_type": "code", - "source": "df.loc[2]['city']", - "execution_count": 34, - "outputs": [ - { - "data": { - "text/plain": "'Paris'" - }, - "execution_count": 34, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Adding new columns\n\nNew columns can be added to a `DataFrame` simply by assigning them by index (as you would for a Python `dict`) and can be deleted with the `del` keyword in the same way:" - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "df['continental'] = (df.city != 'London')\ndf", - "execution_count": 38, - "outputs": [ - { - "data": { - "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
city year pop continental
0 Paris 2001 2.148 True
1 Paris 2008 2.211 True
2 Paris 2009 2.234 True
3 Paris 2010 2.244 True
4 London 2001 7.322 False
5 London 2006 7.657 False
6 London 2011 8.174 False
7 London 2015 8.615 False
8 Rome 2001 2.547 True
9 Rome 2006 2.627 True
10 Rome 2009 2.734 True
11 Rome 2012 2.627 True
\n
", - "text/plain": " city year pop continental\n0 Paris 2001 2.148 True\n1 Paris 2008 2.211 True\n2 Paris 2009 2.234 True\n3 Paris 2010 2.244 True\n4 London 2001 7.322 False\n5 London 2006 7.657 False\n6 London 2011 8.174 False\n7 London 2015 8.615 False\n8 Rome 2001 2.547 True\n9 Rome 2006 2.627 True\n10 Rome 2009 2.734 True\n11 Rome 2012 2.627 True" - }, - "execution_count": 38, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "del df['continental']", - "execution_count": 39, - "outputs": [] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Exercise: Making your own dataframe\n- Create the `DataFrame` containing the census data for the three cities as we did above.\n- Select the data for the year 2001. Which city had the smallest population that year?\n- Find all the cities which had a population smaller than 2.6 million." - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Solution: Making your own dataframe\n* To set up the dataframe as before:\n\n```python\ndata = {'city': ['Paris', 'Paris', 'Paris', 'Paris',\n 'London', 'London', 'London', 'London',\n 'Rome', 'Rome', 'Rome', 'Rome'],\n 'year': [2001, 2008, 2009, 2010,\n 2001, 2006, 2011, 2015,\n 2001, 2006, 2009, 2012],\n 'pop': [2.148, 2.211, 2.234, 2.244,\n 7.322, 7.657, 8.174, 8.615,\n 2.547, 2.627, 2.734, 2.627]}\ndf = DataFrame(data)\n```\n\nOutput: (no output)\n\n* To select the data for the year 2001:\n\n```python\nprint(df[df['year'] == 2001])\n```\n\nOutput:\n```brainfuck\n city year pop\n0 Paris 2001 2.148\n4 London 2001 7.322\n8 Rome 2001 2.547\n```\nParis had the smallest population that year (2.148 million).\n* To find all cities with population less than 2.6 million:\n\n```python\nprint(df[df['pop'] < 2.6].city)\n```\n\nOutput:\n```brainfuck\n0 Paris\n1 Paris\n2 Paris\n3 Paris\n8 Rome\nName: city, dtype: object\n```\n." 
- }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Key Points\n* `Series` converts lists to pandas series.\n* Series can be sliced and broadcast together.\n* Pandas is built on top of NumPy.\n* By extension NumPy arrays can be sliced and broadcast together in the same way.\n* `DataFrame` converts dicts to pandas data frames." - } - ], - "metadata": { - "kernelspec": { - "name": "python3", - "display_name": "Python 3", - "language": "python" - }, - "language_info": { - "mimetype": "text/x-python", - "nbconvert_exporter": "python", - "name": "python", - "file_extension": ".py", - "version": "3.5.4", - "pygments_lexer": "ipython3", - "codemirror_mode": { - "version": 3, - "name": "ipython" - } - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file diff --git a/notebooks_plain/05_pandas_pt2.ipynb b/notebooks_plain/05_pandas_pt2.ipynb deleted file mode 100644 index 1331b4c..0000000 --- a/notebooks_plain/05_pandas_pt2.ipynb +++ /dev/null @@ -1,197 +0,0 @@ -{ - "cells": [ - { - "metadata": { - "collapsed": true - }, - "cell_type": "markdown", - "source": "# Reading a file with pandas" - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Overview\n- **Teaching:** 10 min\n- **Exercises:** 5 min\n\n**Questions**\n* How can I read my data file into pandas?\n\n**Objectives**\n* Use pandas to read in a CSV file.\n" - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "One of the most common situations is that you have some data file containing the data you want to read. Perhaps this is data you've produced yourself or maybe it's from a colleague. 
In an ideal world the file will be perfectly formatted and will be trivial to import into pandas but, since this is so often not the case, pandas provides a number of features to make your life easier.\n\nFull information on reading and writing is available in the pandas manual on [IO tools](http://pandas.pydata.org/pandas-docs/stable/io.html) but first it's worth noting the common formats that pandas can work with:\n- Comma separated tables (or tab-separated or space-separated etc.)\n- Excel spreadsheets\n- HDF5 files\n- SQL databases\n\nFor this lesson we will focus on plain-text CSV files as they are perhaps the most common format. Imagine we have a CSV file like the following (if you are not running on notebooks.azure.com you will need to download this file from [city_pop.csv](../data/city_pop.csv)):" - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "!cat ../data/city_pop.csv # Uses IPython's '!' shell escape to run cat and print the file", - "execution_count": 1, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": "This is an example CSV file\r\nThe text at the top here is not part of the data but instead is here\r\nto describe the file. You'll see this quite often in real-world data.\r\nA -1 signifies a missing value.\r\n\r\nyear;London;Paris;Rome\r\n2001;7.322;2.148;2.547\r\n2006;7.652;;2.627\r\n2008;-1;2.211;\r\n2009;-1;2.234;2.734\r\n2011;8.174;;\r\n2012;-1;2.244;2.627\r\n2015;8.615;;\r\n" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "We can use the pandas function `read_csv()` to read the file and convert it to a `DataFrame`. Full documentation for this function can be found in [the manual](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html) or, as with any Python object, directly in the notebook by typing `help(pd.read_csv)`." 
- }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "import pandas as pd\n\ncsv_file = '../data/city_pop.csv'\npd.read_csv(csv_file)", - "execution_count": 4, - "outputs": [ - { - "data": { - "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
This is an example CSV file
0 The text at the top here is not part of the da...
1 to describe the file. You'll see this quite of...
2 A -1 signifies a missing value.
3 year;London;Paris;Rome
4 2001;7.322;2.148;2.547
5 2006;7.652;;2.627
6 2008;-1;2.211;
7 2009;-1;2.234;2.734
8 2011;8.174;;
9 2012;-1;2.244;2.627
10 2015;8.615;;
\n
", - "text/plain": " This is an example CSV file\n0 The text at the top here is not part of the da...\n1 to describe the file. You'll see this quite of...\n2 A -1 signifies a missing value.\n3 year;London;Paris;Rome\n4 2001;7.322;2.148;2.547\n5 2006;7.652;;2.627\n6 2008;-1;2.211;\n7 2009;-1;2.234;2.734\n8 2011;8.174;;\n9 2012;-1;2.244;2.627\n10 2015;8.615;;" - }, - "execution_count": 4, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "We can see that by default pandas has done a fairly bad job of parsing the file (this is mostly because the file has been constructed to be as awkward as possible). It's making a lot of assumptions about the structure of the file but in general it's taking quite a naïve approach.\n\nThe first thing we notice is that it's treating the text at the top of the file as though it's data. Checking [the documentation](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html) we see that the simplest way to solve this is to use the `skiprows` argument, to which we pass an integer giving the number of rows to skip:" - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "pd.read_csv(csv_file,\n skiprows=5,\n )", - "execution_count": 6, - "outputs": [ - { - "data": { - "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
year;London;Paris;Rome
0 2001;7.322;2.148;2.547
1 2006;7.652;;2.627
2 2008;-1;2.211;
3 2009;-1;2.234;2.734
4 2011;8.174;;
5 2012;-1;2.244;2.627
6 2015;8.615;;
\n
", - "text/plain": " year;London;Paris;Rome\n0 2001;7.322;2.148;2.547\n1 2006;7.652;;2.627\n2 2008;-1;2.211;\n3 2009;-1;2.234;2.734\n4 2011;8.174;;\n5 2012;-1;2.244;2.627\n6 2015;8.615;;" - }, - "execution_count": 6, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Info: Editing cells\nIf you are following along with this material in a notebook, don't forget you can edit a cell and execute it again.\nIn this lesson, you can just keep modifying the input to the `read_csv()` function and re-execute the cell, rather than making a new cell for each modification." - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "The next most obvious problem is that it is not separating the columns at all. This is controlled by the `sep` argument which is set to `','` by default (hence *comma* separated values). We can simply set it to the appropriate semi-colon:" - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "pd.read_csv(csv_file,\n skiprows=5,\n sep=';'\n )", - "execution_count": 7, - "outputs": [ - { - "data": { - "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
year London Paris Rome
0 2001 7.322 2.148 2.547
1 2006 7.652 NaN 2.627
2 2008 -1.000 2.211 NaN
3 2009 -1.000 2.234 2.734
4 2011 8.174 NaN NaN
5 2012 -1.000 2.244 2.627
6 2015 8.615 NaN NaN
\n
", - "text/plain": " year London Paris Rome\n0 2001 7.322 2.148 2.547\n1 2006 7.652 NaN 2.627\n2 2008 -1.000 2.211 NaN\n3 2009 -1.000 2.234 2.734\n4 2011 8.174 NaN NaN\n5 2012 -1.000 2.244 2.627\n6 2015 8.615 NaN NaN" - }, - "execution_count": 7, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "Reading the descriptive header of our data file we see that a value of `-1` signifies a missing reading so we should mark those too. This can be done after the fact but it is simplest to do it at import-time using the `na_values` argument:" - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "pd.read_csv(csv_file,\n skiprows=5,\n sep=';',\n na_values='-1'\n )", - "execution_count": 9, - "outputs": [ - { - "data": { - "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
year London Paris Rome
0 2001 7.322 2.148 2.547
1 2006 7.652 NaN 2.627
2 2008 NaN 2.211 NaN
3 2009 NaN 2.234 2.734
4 2011 8.174 NaN NaN
5 2012 NaN 2.244 2.627
6 2015 8.615 NaN NaN
\n
", - "text/plain": " year London Paris Rome\n0 2001 7.322 2.148 2.547\n1 2006 7.652 NaN 2.627\n2 2008 NaN 2.211 NaN\n3 2009 NaN 2.234 2.734\n4 2011 8.174 NaN NaN\n5 2012 NaN 2.244 2.627\n6 2015 8.615 NaN NaN" - }, - "execution_count": 9, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "The last thing we want to do is use the `year` column as the index for the `DataFrame`. This can be done by passing the name of the column to the `index_col` argument:" - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "df3 = pd.read_csv(csv_file,\n skiprows=5,\n sep=';',\n na_values='-1',\n index_col='year'\n )\ndf3", - "execution_count": 10, - "outputs": [ - { - "data": { - "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
London Paris Rome
year
2001 7.322 2.148 2.547
2006 7.652 NaN 2.627
2008 NaN 2.211 NaN
2009 NaN 2.234 2.734
2011 8.174 NaN NaN
2012 NaN 2.244 2.627
2015 8.615 NaN NaN
\n
", - "text/plain": " London Paris Rome\nyear \n2001 7.322 2.148 2.547\n2006 7.652 NaN 2.627\n2008 NaN 2.211 NaN\n2009 NaN 2.234 2.734\n2011 8.174 NaN NaN\n2012 NaN 2.244 2.627\n2015 8.615 NaN NaN" - }, - "execution_count": 10, - "metadata": {}, - "output_type": "execute_result" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Exercise: Comma separated files\n- There is another file called `cetml1659on.dat` (available from [here](../data/cetml1659on.dat)). This contains some historical weather data for a location in the UK. Import that file as a Pandas `DataFrame` using `read_csv()`, making sure that you mark all the missing values as NaN. Be sure to look at the [documentation](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html#pandas.read_csv) for `read_csv()`.\n- How many years had a negative average temperature in January?\n- What was the average temperature in June over the years in the data set? Tip: look in the [documentation](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.html) for which method to call.\n\nWe will come back to this data set at a later stage.\n\nHints for the first part:\n* The syntax for whitespace delimited data is `sep='\\s+'`, which is not immediately obvious from the documentation.\n* The data is almost complete (which is unusual for scientific data) and there are only two invalid entries. Look at the last row of the file and, given that the data is temperature data, deduce which values need to be `na_values`. (You can use a list to give multiple `na_values`)\n* If you can't work out how to do the first part of this exercise, take a look at the solutions." 
- }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Solution: Comma separated files\n* Read in the CSV file, skipping the first 6 rows, using whitespace to separate data and treating -99.9 and -99.99 as invalid data:\n\n```python\nimport pandas as pd\n\nweather_csv = 'cetml1659on.dat'\nweather_df = pd.read_csv(weather_csv,\n skiprows=6,\n sep='\\s+',\n na_values=['-99.9', '-99.99']\n )\nprint(weather_df.head())\n```\n\nOutput:\n```brainfuck\n JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC YEAR\n1659 3.0 4.0 6.0 7.0 11.0 13.0 16.0 16.0 13.0 10.0 5.0 2.0 8.87\n1660 0.0 4.0 6.0 9.0 11.0 14.0 15.0 16.0 13.0 10.0 6.0 5.0 9.10\n1661 5.0 5.0 6.0 8.0 11.0 14.0 15.0 15.0 13.0 11.0 8.0 6.0 9.78\n1662 5.0 6.0 6.0 8.0 11.0 15.0 15.0 15.0 13.0 11.0 6.0 3.0 9.52\n1663 1.0 1.0 5.0 7.0 10.0 14.0 15.0 15.0 13.0 10.0 7.0 5.0 8.63\n\n```\n\n* Select all data in the January column less than 0, using `len()` so we don't have to count the rows ourselves.\n\n```python\nweather_df[weather_df['JAN'] < 0] # Would output all the entries\nlen(weather_df[weather_df['JAN'] < 0]) # Just counts the number of rows\n```\n\nOutput:\n```brainfuck\n20\n```\n\n* The average of the data can be found using the `.mean()` method:\n\n```python\nweather_df['JUN'].mean()\n```\n\nOutput:\n```brainfuck\n14.325977653631282\n```\n." - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Key Points\n* Pandas provides the `read_csv()` function for reading in CSV files.\n* Although it saves us a lot of work, the syntax can be quite tricky." 
- } - ], - "metadata": { - "kernelspec": { - "name": "python3", - "display_name": "Python 3", - "language": "python" - }, - "language_info": { - "mimetype": "text/x-python", - "nbconvert_exporter": "python", - "name": "python", - "pygments_lexer": "ipython3", - "version": "3.5.4", - "file_extension": ".py", - "codemirror_mode": { - "version": 3, - "name": "ipython" - } - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file diff --git a/notebooks_plain/07_matplotlib_pt2.ipynb b/notebooks_plain/07_matplotlib_pt2.ipynb deleted file mode 100644 index f453b6b..0000000 --- a/notebooks_plain/07_matplotlib_pt2.ipynb +++ /dev/null @@ -1,302 +0,0 @@ -{ - "cells": [ - { - "metadata": {}, - "cell_type": "markdown", - "source": "# Customising matplotlib output" - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Overview\n- **Teaching:** 10 min\n- **Exercises:** 10 min\n\n**Questions**\n* How can I customise my plot?\n* How can I make the graph look the way I want it to?\n* What other features does matplotlib have?\n\n**Objectives**\n* Learn how to change the limits on plot axes.\n* Change the line colours and styles.\n* Change the ticks and tick labels on axes.\n* See how a graph can be annotated and customised.\n* Save a graph as an image." - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Making it better\n\nOn the previous page we saw how we could plot the sinc function and data from a dataframe, and we customised the plots by adding axis labels and a title. \n\nThis is good practice, but what if we want to modify the graph itself? Matplotlib has a rich feature set which we will explore in the following examples. 
First let's set up matplotlib:" - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "%config InlineBackend.figure_format = 'svg'\nimport matplotlib.pyplot as plt", - "execution_count": 1, - "outputs": [] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "We will reuse the sinc function plot from the previous page, computed with NumPy:" - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "import numpy as np\n\n\nx = np.linspace(-5, 5, 1000)\n# sinc(x) is defined to be sin(pi*x)/(pi*x) in numpy\ny_sinc = np.sinc(x)\n\nplt.plot(x, y_sinc, label='sinc(x)')\n\n# Set title and legend, then show plot\nplt.title('The sinc function')\nplt.legend(loc='upper left')\nplt.xlabel('x')\nplt.ylabel('f(x)')\nplt.show()", - "execution_count": 2, - "outputs": [ - { - "data": { - "image/svg+xml": "[SVG figure omitted: line plot of sinc(x) titled 'The sinc function', x-axis 'x', y-axis 'f(x)', legend upper left]", - "text/plain": "" - }, - "metadata": {}, - "output_type": "display_data" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Setting limits\n\nSinc(x) is bounded above 
and below by the two functions\n$$\\frac{1}{\\pi x} \\text{ and } \\frac{-1}{\\pi x}$$\n\nWe can add these to the plot by making two further NumPy arrays. Notice that we have to mask the arrays; this prevents us from plotting a line across the asymptote at x = 0. If you don't like maths, bear with me, we'll be back to the plotting soon!" - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "# sinc(x) bounded by plus/minus 1/(pi*x)\ny_above = 1/(np.pi*x)\ny_below = -1/(np.pi*x)\n\n# mask out very large values\ny_above = np.ma.masked_outside(y_above, -60, 60)\ny_below = np.ma.masked_outside(y_below, -60, 60)", - "execution_count": 3, - "outputs": [] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "We can then plot all three sets of y values on the same axes as follows:" - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "plt.plot(x, y_sinc, label='sinc(x)')\nplt.plot(x, y_above, label='$1/\\pi x$')\nplt.plot(x, y_below, label='$-1/\\pi x$')\n\n# Set title and legend, then show plot\nplt.title('The sinc function')\nplt.legend(loc='upper left')\nplt.xlabel('x')\nplt.ylabel('f(x)')\nplt.show()", - "execution_count": 4, - "outputs": [ - { - "data": { - "image/svg+xml": "[SVG figure omitted: sinc(x), 1/(pi x) and -1/(pi x) on default axis limits, titled 'The sinc function', x-axis 'x', y-axis 'f(x)']
\n", - "text/plain": "" - }, - "metadata": {}, - "output_type": "display_data" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "However, we seem to have lost the shape of the sinc function and all we can see are the new functions we have plotted. We can fix this by setting the limits of the x and y axes. This is done with the `xlim` and `ylim` functions respectively." - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Info: Editing cells, again\nIf you are following along with this material in a notebook, recall you can edit a cell and execute it again.\nIn this lesson, you can just keep modifying the input to the code block you use for plotting and re-execute the cell, rather than making a new cell for each modification." - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "plt.plot(x, y_sinc, label='sinc(x)')\nplt.plot(x, y_above, label='$1/\\pi x$')\nplt.plot(x, y_below, label='$-1/\\pi x$')\n\n# Set new limits\nplt.xlim(-5, 5)\nplt.ylim(-0.5, 1.2)\n\n# Set title and legend, then show plot\nplt.title('The sinc function')\nplt.legend(loc='upper left')\nplt.xlabel('x')\nplt.ylabel('f(x)')\nplt.show()", - "execution_count": 5, - "outputs": [ - { - "data": { - "image/svg+xml": "[SVG figure omitted: sinc(x), 1/(pi x) and -1/(pi x) with x limits -5 to 5 and y limits -0.5 to 1.2, titled 'The sinc function']
\n", - "text/plain": "" - }, - "metadata": {}, - "output_type": "display_data" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "Now we can see the sinc function again. Notice also that the sinc function touches both edges of the graph and is no longer floating in the centre." - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Changing colours and line widths\n\nWe can also change the colours and style of the lines we use in the plot. We can do this explicitly using the keyword arguments `color`, `linewidth` and `linestyle` or with format strings, which are documented [here](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.plot.html#matplotlib.pyplot.plot). In the next plot the important sinc function is bolder than the bounding lines, and the positive and negative bounds are different colours." 
- }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "plt.plot(x, y_sinc, label='sinc(x)', color=\"orange\", linewidth=2.5, linestyle=\"-\")\nplt.plot(x, y_above, 'k--', label='$1/\\pi x$')\nplt.plot(x, y_below, 'r--', label='$-1/\\pi x$')\n\n# Set limits\nplt.xlim(-5, 5)\nplt.ylim(-0.5, 1.2)\n\n# Set title and legend, then show plot\nplt.title('The sinc function')\nplt.legend(loc='upper left')\nplt.xlabel('x')\nplt.ylabel('f(x)')\nplt.show()", - "execution_count": 6, - "outputs": [ - { - "data": { - "image/svg+xml": "\n\n", - "text/plain": "" - }, - "metadata": {}, - "output_type": "display_data" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Setting ticks\n\nCurrently there are only x ticks at the even integers, and the y ticks are quite dense. 
If we want more or fewer ticks we can use the `xticks` and `yticks` functions." - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "plt.plot(x, y_sinc, label='sinc(x)', color=\"orange\", linewidth=2.5, linestyle=\"-\")\nplt.plot(x, y_above, 'k--', label='$1/\\pi x$')\nplt.plot(x, y_below, 'r--', label='$-1/\\pi x$')\n\n# Set limits\nplt.xlim(-5, 5)\nplt.ylim(-0.5, 1.2)\n\n# Set ticks\nplt.xticks(range(-5, 6))\nplt.yticks([-0.5, 0, 0.5, 1])\n\n# Set title and legend, then show plot\nplt.title('The sinc function')\nplt.legend(loc='upper left')\nplt.xlabel('x')\nplt.ylabel('f(x)')\nplt.show()", - "execution_count": 7, - "outputs": [ - { - "data": { - "image/svg+xml": "\n\n", - "text/plain": "" - }, - "metadata": {}, - "output_type": "display_data" - } - ] - }, - { - "metadata": {}, 
- "cell_type": "markdown", - "source": "## Info: Setting tick labels\n\nWhen we set tick values, we can also provide a corresponding list of labels as the second argument. We can even use LaTeX to allow for nice rendering of the labels. This is useful for trigonometric functions where we might want axis labels that are multiples of $\\pi$. For example:\n```python\nplt.xticks([-np.pi, -np.pi/2, 0, np.pi/2, np.pi],\n [r'$-\\pi$', r'$-\\pi/2$', r'$0$', r'$+\\pi/2$', r'$+\\pi$'])\n```" - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Exercise: Customising pandas plots\nPandas plots can be customised in exactly the same way. Using your solution to the Summer Climate exercise on the previous page, make the following changes:\n\n- Using the temperature dataset, set the colours of the July and January lines to a warm colour and a cool colour.\n- Add in the yearly average column to the plot with a dashed line style.\n- (Harder) Add an annotation to one of the spikes in the data. Make sure the label is placed nicely.\n\nHint: you can get the year and temperature for a spike using:\n```python\nwarm_winter_year = df['JAN'].idxmax()\nwarm_winter_temp = df['JAN'].max()\n```\n- Save the figure to a file and display it in your Jupyter notebook."
- }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Solution: Customising pandas plots\nFull working code for this exercise is:\n```python\nimport pandas as pd\nimport matplotlib.pyplot as plt\n\n# Import the data \ncsv_file = 'cetml1659on.dat'\ndf = pd.read_csv(csv_file, # file name\n skiprows=6, # skip header\n sep=r'\\s+', # whitespace separated\n na_values=['-99.9', '-99.99'] # NaNs\n )\n\n# Plot the January and July values, plus the yearly average (dashed)\ndf['JAN'].plot(color='cyan')\ndf['JUL'].plot(color='orange')\ndf['YEAR'].plot(color='black', linestyle='--')\n\n# Add a title and axes labels\nplt.title('Summer, Winter and average Climate Plots')\nplt.xlabel('Year')\nplt.ylabel(r'Temperature ($^\\circ$C)')\n\n# Add a legend\nplt.legend()\n\n# Find warm winter year point\nwarm_winter_year = df['JAN'].idxmax()\nwarm_winter_temp = df['JAN'].max()\n\n# Annotate plot\nplt.annotate('A warm winter',\n xy=(warm_winter_year, warm_winter_temp),\n xytext=(-150, -100), textcoords='offset points', fontsize=14,\n arrowprops=dict(arrowstyle=\"->\", connectionstyle=\"arc3,rad=.2\"))\n\nplt.savefig('fancy_summer_climate.png')\n# display with ![](fancy_summer_climate.png)\n```\nThis produces the figure:\n![](../images/fancy_summer_climate.png)\n" - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Moving spines - Back to sinc ...\n\nOur plotting routine is beginning to get more complicated. What follows is designed to show you what is possible and give you a reference you can come back to. Feel free to follow along, or just read through the next part to see what is possible.\n\nSpines are the lines connecting the axis tick marks and noting the boundaries of the data area. They can be placed at arbitrary positions and until now, they were on the border of the axis. Sometimes it is useful to have them in the middle. 
Since there are four of them (top/bottom/left/right), we’ll discard the top and right by setting their colour to none and we’ll move the bottom and left ones to coordinate 0 in data space coordinates." - }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "plt.plot(x, y_sinc, label='sinc(x)', color=\"orange\", linewidth=2.5, linestyle=\"-\")\nplt.plot(x, y_above, 'k--', label='$1/\\pi x$')\nplt.plot(x, y_below, 'r--', label='$-1/\\pi x$')\n\n# Set limits\nplt.xlim(-5, 5)\nplt.ylim(-0.5, 1.2)\n\n# Set ticks\nplt.xticks(range(-5, 6))\nplt.yticks([-0.5, 0, 0.5, 1])\n\n# Move the axis spines\nax = plt.gca() # gca stands for 'get current axis'\nax.spines['right'].set_color('none')\nax.spines['top'].set_color('none')\nax.xaxis.set_ticks_position('bottom')\nax.spines['bottom'].set_position(('data',0))\nax.yaxis.set_ticks_position('left')\nax.spines['left'].set_position(('data',0))\n\n# Set title and legend, then show plot\nplt.title('The sinc function')\nplt.legend(loc='upper left')\nplt.xlabel('x')\nplt.ylabel('f(x)')\nplt.show()", - "execution_count": 8, - "outputs": [ - { - "data": { - "image/svg+xml": "\n\n
\n\n", - "text/plain": "" - }, - "metadata": {}, - "output_type": "display_data" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "In our case the tick labels now overlap with the functions we have plotted, which is a bit of a problem. We can either change the plot back or modify things further.\n\n## Annotate some points\n\nWe can annotate some interesting points on the graph using the `annotate` function. We choose the first positive x values where sinc(x) is equal to $1/\\pi x$ and to $-1/\\pi x$. For each point, we first draw a marker on the curve and a straight dashed line to the x axis. Then, we use the `annotate` function to display some text with an arrow.\n\nWe also fix our tick labels, by introducing the `zorder` keyword argument, which controls the order in which things are drawn (lower zorder means drawn underneath items with a higher zorder). To make the tick labels stand out even more, we can apply a semi-transparent background (so we can still see the lines passing underneath) and increase the font size.\n\nThere are also a few more tweaks to tidy the plot up, like moving the axes labels."
- }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "# NB: We had to introduce a zorder parameter here\nplt.plot(x, y_sinc, label='sinc(x)', color=\"orange\", linewidth=2.5, linestyle=\"-\", zorder=0)\nplt.plot(x, y_above, 'k--',label='$1/\\pi x$', zorder=0.1)\nplt.plot(x, y_below, 'r--', label='$-1/\\pi x$', zorder=0.1)\n\n# Set limits\nplt.xlim(-5, 5)\nplt.ylim(-0.5, 1.2)\n\n# Set ticks\nplt.xticks(range(-5,6))\nplt.yticks([-0.5, 0, 0.5, 1])\n\n# Move the axis spines\nax = plt.gca() # gca stands for 'get current axis'\nax.spines['right'].set_color('none')\nax.spines['top'].set_color('none')\nax.xaxis.set_ticks_position('bottom')\nax.spines['bottom'].set_position(('data',0))\nax.yaxis.set_ticks_position('left')\nax.spines['left'].set_position(('data',0))\n\n# Annotate the graph\nt = 0.5\nplt.plot([t, t], [0, np.sinc(t)], color='black', linewidth=1, linestyle=\"--\")\nplt.scatter([t], [np.sinc(t)], 50, color='black')\n\nplt.annotate(r'sinc$\\left(\\frac{1}{2}\\right)=\\frac{2}{\\pi}$',\n xy=(t, np.sinc(t)), xycoords='data',\n xytext=(50, 30), textcoords='offset points', fontsize=16,\n arrowprops=dict(arrowstyle=\"->\", connectionstyle=\"arc3,rad=.2\"))\n\ns = 1.5\nplt.plot([s, s],[0, np.sinc(s)], color='red', linewidth=1, linestyle=\"--\")\nplt.scatter([s],[np.sinc(s)], 50, color='red')\n\nplt.annotate(r'sinc$\\left(\\frac{3}{2}\\right)=\\frac{-2}{3\\pi}$',\n xy=(s, np.sinc(s)), xycoords='data',\n xytext=(30, -30), textcoords='offset points', fontsize=16,\n arrowprops=dict(arrowstyle=\"->\", connectionstyle=\"arc3,rad=-.2\"))\n\n# Increase the size of the tick labels in both axes\n# and apply semi-transparent background\nfor label in ax.get_xticklabels() + ax.get_yticklabels():\n label.set_fontsize(12)\n label.set_bbox(dict(facecolor='white', edgecolor='none', pad=0.2, alpha=0.7))\n \n# Set title and legend, then show plot\nplt.title('The sinc function')\nplt.legend(loc='upper left')\nplt.xlabel('x', labelpad=-20, x=1.05, 
fontsize=12)\nplt.ylabel('f(x)', labelpad=-30)\nplt.show()", - "execution_count": 9, - "outputs": [ - { - "data": { - "image/svg+xml": "\n\n", - "text/plain": "" - }, - "metadata": {}, - "output_type": "display_data" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "Now you can really customise your plots!"
- }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Info: Fiddle until it is right\nThis section is inspired by Nicolas P. Rougier's [tutorial](http://www.labri.fr/perso/nrougier/teaching/matplotlib/), which is linked to from matplotlib's own website. The last few tricks for sorting out the spines on the graph no longer work in the latest versions of matplotlib. As a result, in Chrys Woods' [tutorial](https://chryswoods.com/python_and_data/) this step is omitted and the graph they save has the spines on the outside, avoiding the issue.\n\nThe way to get tick labels to draw on top of plot lines is to use the `zorder` keyword argument, but this isn't so obvious from the documentation. We mention this here as a useful reference, and also to show that tweaking your plot to look just right can be tricky, but is worth the perseverance." - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Saving plot to a file\n\nYou can take any plot you've created within Jupyter and save it to a file on disk using the `plt.savefig()` function. You give the function the name of the file to create and it will choose the file format from the extension in that name. This is useful if you want to use the plot outside of Jupyter. It is also possible to generate plots like this in the terminal, where it may be preferable to save straight to disk."
- }, - { - "metadata": { - "trusted": false - }, - "cell_type": "code", - "source": "# NB: We had to introduce a zorder parameter here\nplt.plot(x, y_sinc, label='sinc(x)', color=\"orange\", linewidth=2.5, linestyle=\"-\", zorder=0)\nplt.plot(x, y_above, 'k--',label='$1/\\pi x$', zorder=0.1)\nplt.plot(x, y_below, 'r--', label='$-1/\\pi x$', zorder=0.1)\n\n# Set limits\nplt.xlim(-5, 5)\nplt.ylim(-0.5, 1.2)\n\n# Set ticks\nplt.xticks(range(-5,6))\nplt.yticks([-0.5, 0, 0.5, 1])\n\n# Move the axis spines\nax = plt.gca() # gca stands for 'get current axis'\nax.spines['right'].set_color('none')\nax.spines['top'].set_color('none')\nax.xaxis.set_ticks_position('bottom')\nax.spines['bottom'].set_position(('data',0))\nax.yaxis.set_ticks_position('left')\nax.spines['left'].set_position(('data',0))\n\n# Annotate the graph\nt = 0.5\nplt.plot([t, t], [0, np.sinc(t)], color='black', linewidth=1, linestyle=\"--\")\nplt.scatter([t], [np.sinc(t)], 50, color='black')\n\nplt.annotate(r'sinc$\\left(\\frac{1}{2}\\right)=\\frac{2}{\\pi}$',\n xy=(t, np.sinc(t)), xycoords='data',\n xytext=(50, 30), textcoords='offset points', fontsize=16,\n arrowprops=dict(arrowstyle=\"->\", connectionstyle=\"arc3,rad=.2\"))\n\ns = 1.5\nplt.plot([s, s],[0, np.sinc(s)], color='red', linewidth=1, linestyle=\"--\")\nplt.scatter([s],[np.sinc(s)], 50, color='red')\n\nplt.annotate(r'sinc$\\left(\\frac{3}{2}\\right)=\\frac{-2}{3\\pi}$',\n xy=(s, np.sinc(s)), xycoords='data',\n xytext=(30, -30), textcoords='offset points', fontsize=16,\n arrowprops=dict(arrowstyle=\"->\", connectionstyle=\"arc3,rad=-.2\"))\n\n# Increase the size of the tick labels in both axes\n# and apply semi-transparent background\nfor label in ax.get_xticklabels() + ax.get_yticklabels():\n label.set_fontsize(12)\n label.set_bbox(dict(facecolor='white', edgecolor='none', pad=0.2, alpha=0.7))\n \n# Set title and legend, then SAVE plot\nplt.title('The sinc function', fontsize=20)\nplt.legend(loc='upper left')\nplt.xlabel('x', labelpad=-20, 
x=1.05, fontsize=12)\nplt.ylabel('f(x)', labelpad=-30, y=0.45, fontsize=12)\n#plt.show()\n\n# Save final plot\nplt.savefig('../images/sinc.png')\n# You don't need to save in this folder; you could just use:\n#plt.savefig('sinc.png')", - "execution_count": 10, - "outputs": [ - { - "data": { - "image/svg+xml": "\n\n", - "text/plain": "" - }, - "metadata": {}, - "output_type": "display_data" - } - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": 
"You can then display the figure in jupyter with `![](sinc.png)`\n\n![](../images/sinc.png)" - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Exercise: Sine and cosine\n\nRecreate a similar plot to the one above, but using the sine and cosine functions plotted over the range $-\\pi$ to $\\pi$, available in NumPy as `np.sin` and `np.cos`." - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Solution: Sine and cosine\nComplete code for this solution looks like:\n```python\nimport numpy as np\nimport matplotlib.pyplot as plt\n\n# Data to plot\nx = np.linspace(-np.pi, np.pi, 700)\ny_sin = np.sin(x)\ny_cos = np.cos(x)\n\n# NB: We had to introduce a zorder parameter here\nplt.plot(x, y_cos, label='sin(x)', color=\"blue\", linewidth=2.5, linestyle=\"-\", zorder=0)\nplt.plot(x, y_sin, label='cos(x)', color=\"red\", linewidth=2.5, linestyle=\"-\", zorder=0)\n\n# Set limits\nplt.xlim(-np.pi, np.pi)\nplt.ylim(-1.1, 1.1)\n\n# Set ticks and labels\nplt.xticks([-np.pi, -np.pi/2, 0, np.pi/2, np.pi],\n [r'$-\\pi$', r'$-\\pi/2$', r'$0$', r'$+\\pi/2$', r'$+\\pi$'])\n\nplt.yticks([-1, 0, +1],\n [r'$-1$', r'$0$', r'$+1$'])\n\n# Move the spines\nax = plt.gca() # gca stands for 'get current axis'\nax.spines['right'].set_color('none')\nax.spines['top'].set_color('none')\nax.xaxis.set_ticks_position('bottom')\nax.spines['bottom'].set_position(('data',0))\nax.yaxis.set_ticks_position('left')\nax.spines['left'].set_position(('data',0))\n\n# Annotate the graph\nt = 2 * np.pi / 3\nplt.plot([t, t], [0, np.cos(t)], color='blue', linewidth=2.5, linestyle=\"--\")\nplt.scatter([t, ], [np.cos(t), ], 50, color='blue')\n\nplt.annotate(r'$cos(\\frac{2\\pi}{3})=-\\frac{1}{2}$',\n xy=(t, np.cos(t)), xycoords='data',\n xytext=(-90, -50), textcoords='offset points', fontsize=16,\n arrowprops=dict(arrowstyle=\"->\", connectionstyle=\"arc3,rad=.2\"))\n\nplt.plot([t, t],[0, np.sin(t)], color='red', linewidth=2.5, linestyle=\"--\")\nplt.scatter([t, ],[np.sin(t), ], 50, 
color='red')\n\nplt.annotate(r'$sin(\\frac{2\\pi}{3})=\\frac{\\sqrt{3}}{2}$',\n xy=(t, np.sin(t)), xycoords='data',\n xytext=(+30, 0), textcoords='offset points', fontsize=16,\n arrowprops=dict(arrowstyle=\"->\", connectionstyle=\"arc3,rad=.2\"))\n\n# Increase the size of the tick labels in both axes\n# and apply semi-transparent background\nfor label in ax.get_xticklabels() + ax.get_yticklabels():\n label.set_fontsize(12)\n label.set_bbox(dict(facecolor='white', edgecolor='none', pad=0.2, alpha=0.7))\n\n# Set title and legend, then SAVE plot\nplt.title('The sine and cosine functions', fontsize=20)\nplt.legend(loc='upper left')\nplt.xlabel('x', labelpad=-20, x=1.05, fontsize=12)\nplt.ylabel('f(x)', labelpad=-20, y=0.7, fontsize=12)\n#plt.show()\n\nplt.savefig('cos_sin.png')\n```\nThe code produces the figure:\n![](../images/cos_sin.png)" - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": "## Key Points\n* The limits on plot axes can be changed with `xlim` and `ylim`.\n* Keyword arguments can be used to change line colours and styles.\n* Alternatively format strings can be used as a shortcut.\n* Ticks and tick labels are changed with `xticks` and `yticks`.\n* A graph can be annotated and almost every element moved.\n* `savefig` saves the figure that we generate as an image." - } - ], - "metadata": { - "kernelspec": { - "name": "python3", - "display_name": "Python 3", - "language": "python" - }, - "language_info": { - "mimetype": "text/x-python", - "nbconvert_exporter": "python", - "name": "python", - "file_extension": ".py", - "version": "3.5.4", - "pygments_lexer": "ipython3", - "codemirror_mode": { - "version": 3, - "name": "ipython" - } - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file