{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Starting Computations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ":::{admonition} Lesson Content\n", ":class: note, dropdown\n", "\n", "- Dataset\n", "- Some computation\n", "- Filtering and masking values\n", "\n", ":::" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Context\n", "\n", "Yesterday we explored the data structures that `xarray` uses to organize data. Today we are going to use those data structures to manipulate data!" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import xarray as xr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dataset\n", "\n", "The dataset of the day is NOAA OISST (Optimum Interpolation Sea Surface Temperature), a daily sea surface temperature dataset on a 1/4-degree global grid that goes back to the 1980s.\n", "\n", "- [NOAA NCEI Data listing](https://www.ncei.noaa.gov/products/optimum-interpolation-sst)\n", "- [THREDDS Catalog](https://www.ncei.noaa.gov/thredds/catalog/OisstBase/NetCDF/V2.1/AVHRR/198210/catalog.html)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Unlike yesterday, when we downloaded a copy of the dataset to our computer, today we will access the data by URL. Instead of giving `xr.open_dataset()` a file path on our local computer, we give it a URL from what is called a THREDDS Catalog, and xarray reads the file over the network. At first xarray only loads the metadata and coordinates; the values of the data variables are transferred when a computation or plot actually needs them." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "\n", "Dimensions: (lat: 720, lon: 1440, time: 1, zlev: 1)\n", "Coordinates:\n", " * lat (lat) float32 -89.88 -89.62 -89.38 -89.12 ... 89.38 89.62 89.88\n", " * lon (lon) float32 0.125 0.375 0.625 0.875 ... 359.1 359.4 359.6 359.9\n", " * time (time) datetime64[ns] 1982-10-07T12:00:00\n", " * zlev (zlev) float32 0.0\n", "Data variables:\n", " anom (time, zlev, lat, lon) float32 ...\n", " err (time, zlev, lat, lon) float32 ...\n", " ice (time, zlev, lat, lon) float32 ...\n", " sst (time, zlev, lat, lon) float32 ...\n", "Attributes: (12/38)\n", " title: NOAA/NCEI 1/4 Degree Daily Optimum Inter...\n", " source: ICOADS, NCEP_GTS, GSFC_ICE, NCEP_ICE, Pa...\n", " id: oisst-avhrr-v02r01.19821007.nc\n", " naming_authority: gov.noaa.ncei\n", " summary: NOAAs 1/4-degree Daily Optimum Interpola...\n", " cdm_data_type: Grid\n", " ... ...\n", " ncei_template_version: NCEI_NetCDF_Grid_Template_v2.0\n", " comment: Data was converted from NetCDF-3 to NetC...\n", " sensor: Thermometer, AVHRR\n", " Conventions: CF-1.6, ACDD-1.3\n", " references: Reynolds, et al.(2007) Daily High-Resolu...\n", " DODS_EXTRA.Unlimited_Dimension: time" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# open one day of OISST directly from the NCEI THREDDS server by its URL\n", "sst = xr.open_dataset(\"https://www.ncei.noaa.gov/thredds/dodsC/OisstBase/NetCDF/V2.1/AVHRR/198210/oisst-avhrr-v02r01.19821007.nc\")\n", "\n", "sst" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ":::{admonition} 📝 Check your understanding\n", ":class: tip\n", "\n", "What type of data structure is the `sst` object? What are the dimensions, how big is each one, and how many variables are there?\n", "\n", ":::" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before we go on with today's examples, I'm going to do a bit of pre-processing on this data: 1) take just the `sst` DataArray (`sst['sst']`), and 2) get rid of the vertical depth dimension, `zlev`, since there is only one level for sea surface temperature (the surface)." 
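, "\n", "The next cell applies these two steps to the real OISST data. As a minimal offline sketch, here they are on a small, hypothetical Dataset (its sizes, values, and coordinates are made up and only mimic the OISST layout). It shows that `squeeze(dim='zlev', drop=True)` removes the length-1 `zlev` dimension and, because of `drop=True`, its coordinate as well.\n", "
```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the OISST file: 1 time step, 1 depth level, 2 lats, 3 lons
ds = xr.Dataset(
    {'sst': (('time', 'zlev', 'lat', 'lon'),
             np.arange(6, dtype='float32').reshape(1, 1, 2, 3))},
    coords={
        'time': np.array(['1982-10-07T12:00:00'], dtype='datetime64[ns]'),
        'zlev': [0.0],
        'lat': [-0.125, 0.125],
        'lon': [0.125, 0.375, 0.625],
    },
)

# 1) keep just the sst DataArray, 2) drop the length-1 zlev dimension and its coordinate
sst_small = ds['sst'].squeeze(dim='zlev', drop=True)

print(sst_small.dims)               # ('time', 'lat', 'lon')
print('zlev' in sst_small.coords)   # False
```
", "\n", "Without `drop=True`, `zlev` would remain on the result as a scalar (0-dimensional) coordinate."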
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "sst = sst['sst'].squeeze(dim='zlev', drop=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Some Computation\n", "\n", "### Arithmetic\n", "\n", "Arithmetic operators work on a whole DataArray at once. For example, we can convert the sea surface temperatures from degrees Celsius to Kelvin:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# add 273.15 to every value to convert from degrees Celsius to Kelvin\n", "sst_kelvin = sst + 273.15" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ":::{admonition} 🌀 More Info: Broadcasting during arithmetic\n", ":class: note, dropdown\n", "\n", "This works because numpy (and therefore xarray) uses a technique called **broadcasting**: the single number `273.15` is treated as if it had the same shape as `sst`, so it is added to every grid cell. You can read more about it in the [xarray tutorial section on broadcasting](https://xarray-contrib.github.io/xarray-tutorial/scipy-tutorial/03_computation_with_xarray.html#Broadcasting).\n", "\n", ":::" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "### Aggregations\n", "\n", "There are a lot (a lot!) of built-in **methods** that summarize data. Some common ones are:\n", "\n", "| Method | Description |\n", "| ----------- | ----------- |\n", "| `.max()` | Maximum |\n", "| `.min()` | Minimum |\n", "| `.std()` | Standard deviation |\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "\n", "array(33.21, dtype=float32)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sst.max()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "\n", "array(-1.8, dtype=float32)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sst.min()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In an earlier lesson we talked about functions/methods as verbs and attributes as adjectives when describing a data object. The methods listed above are examples of verbs for xarray DataArray objects: each one does something to the data.\n", "\n", "There are a lot of methods for DataArrays. One list (of many on the internet) is [here](https://www.pythonprogramming.in/numpy-aggregate-and-statistical-functions.html)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ":::{note}\n", "One way that programming languages grow is when people build new tools on top of tools that someone else already built. This accelerates progress!\n", "\n", "You'll notice that the link above lists functions that are part of the `numpy` library. `xarray` builds on top of `numpy`, so we can use resources that others have made for numpy to help us with xarray. Not every numpy function is available in xarray, but many of them are.\n", "\n", ":::" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ":::{admonition} 📝 Check your understanding\n", ":class: tip\n", "\n", "What is the mean value of the `sst` DataArray?\n", "\n", ":::" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Reading documentation\n", "\n", "To practice looking at documentation, let's look at [the docs page](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.max.html) for the `xarray.DataArray.max()` method.\n", "\n", "We notice two optional arguments, `dim` and `axis`, that control which part of the data the maximum is taken over. `dim` takes the name of a dimension as a string (such as `'lat'`), and `axis` takes the position of a dimension as an integer (such as `2`). Use one or the other, not both. Let's try them out." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "\n", "array([[ nan, nan, nan,\n", " nan, nan, nan,\n", " nan, nan, nan,\n", " nan, nan, nan,\n", " nan, nan, nan,\n", " nan, nan, nan,\n", " nan, nan, nan,\n", " nan, nan, nan,\n", " nan, nan, nan,\n", " nan, nan, nan,\n", " nan, nan, nan,\n", " nan, nan, nan,\n", " nan, nan, nan,\n", " nan, nan, nan,\n", " nan, nan, nan,\n", " nan, -1.30999994e+00, -1.32999992e+00,\n", " -1.36000001e+00, -1.36000001e+00, -1.38000000e+00,\n", " -1.15999997e+00, -1.07999992e+00, -1.10000002e+00,\n", " -1.02999997e+00, -1.22000003e+00, -1.33999991e+00,\n", " -1.33999991e+00, -1.36000001e+00, -1.32999992e+00,\n", "...\n", " 7.00000000e+00, 6.34999990e+00, 5.42000008e+00,\n", " 5.27999973e+00, 5.30999994e+00, 5.38999987e+00,\n", " 4.42999983e+00, 3.74000001e+00, 3.42999983e+00,\n", " 3.13999987e+00, 3.01999998e+00, 2.95000005e+00,\n", " 3.09999990e+00, 3.52999997e+00, 3.51999998e+00,\n", " 3.34999990e+00, 3.24000001e+00, 2.88999987e+00,\n", " 2.49000001e+00, 2.54999995e+00, 1.89999998e+00,\n", " 1.14999998e+00, 1.05999994e+00, 7.99999952e-01,\n", " 4.29999977e-01, 9.99999978e-03, -2.09999993e-01,\n", " -3.59999985e-01, -4.79999989e-01, -5.50000012e-01,\n", " -6.39999986e-01, -6.99999988e-01, -7.29999959e-01,\n", " -7.59999990e-01, -7.50000000e-01, -7.19999969e-01,\n", " -6.99999988e-01, -6.89999998e-01, -6.80000007e-01,\n", " -6.80000007e-01, -7.29999959e-01, -7.79999971e-01,\n", " -8.19999993e-01, -8.59999955e-01, -8.99999976e-01,\n", " -9.39999998e-01, -9.69999969e-01, -1.00000000e+00,\n", " -1.02999997e+00, -1.05999994e+00, -1.07999992e+00,\n", " -1.10000002e+00, -1.12000000e+00, -1.13999999e+00,\n", " -1.14999998e+00, -1.16999996e+00, -1.17999995e+00,\n", " -1.17999995e+00, -1.18999994e+00, -1.18999994e+00]], dtype=float32)\n", "Coordinates:\n", " * lat (lat) float32 -89.88 -89.62 -89.38 -89.12 ... 
89.38 89.62 89.88\n", " * time (time) datetime64[ns] 1982-10-07T12:00:00" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sst.max(axis=2)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "\n", "array([[25.96 , 25.99 , 26.029999, ..., 26.17 , 26.099998,\n", " 26.029999]], dtype=float32)\n", "Coordinates:\n", " * lon (lon) float32 0.125 0.375 0.625 0.875 ... 359.1 359.4 359.6 359.9\n", " * time (time) datetime64[ns] 1982-10-07T12:00:00" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sst.max(dim='lat')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We see that when we use these arguments, instead of smushing all the data together and taking one maximum value, we take the maximum along one dimension of the data. Whichever dimension we specify disappears after we take the maximum: `sst.max(axis=2)` removed `lon` (axis positions count from 0, so axis 2 is the third dimension of `(time, lat, lon)`), and `sst.max(dim='lat')` removed `lat`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ":::{note}\n", "While we won't cover it here, this pattern of applying a function to the full dataset or along one dimension works for almost any function in xarray. Even if the function you want to apply isn't a built-in method (maybe it's an algorithm you wrote yourself!), you can apply it using `DataArray.reduce()`.\n", "\n", ":::" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ":::{admonition} 📝 Check your understanding\n", ":class: tip\n", "\n", "Look at [the documentation page](https://xarray.pydata.org/en/v2022.03.0/generated/xarray.DataArray.std.html) for the `.std()` method in xarray and [the documentation page](https://numpy.org/doc/stable/reference/generated/numpy.std.html) for `.std()` in numpy.\n", "\n", "- What does the function do? (Use the numpy page)\n", "- Name one argument to the function and describe what it does.\n", "- What type of object does the function return?\n", "\n", ":::" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Filtering and masking values\n", "\n", "Let's start by looking at how we use booleans with DataArrays. Previously we saw how to compare single values using comparison operators like `<` and `==`." 
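, "\n", "One detail that the full-size outputs below hide: land cells in `sst` are NaN (the `nan` values printed earlier), and NaN compares as False with `>`, `<`, `>=`, `<=`, and `==`. Here is a minimal sketch on a small, hypothetical 2 x 3 DataArray (the values and coordinates are made up for illustration) so that every result is visible. It also shows that a boolean DataArray can be counted with `.sum()`, because True counts as 1 and False as 0.\n", "
```python
import numpy as np
import xarray as xr

# Hypothetical 2 x 3 grid of temperatures in degrees C; NaN plays the role of a land cell
temps = xr.DataArray(
    np.array([[-1.5, 12.0, 18.5], [22.0, 16.0, np.nan]], dtype='float32'),
    dims=('lat', 'lon'),
    coords={'lat': [10.0, 20.0], 'lon': [100.0, 110.0, 120.0]},
)

# The comparison is applied to every value and returns a boolean DataArray
warm = temps > 15
print(warm.values)
# [[False False  True]
#  [ True  True False]]   <- the NaN cell is False

# Combine conditions with & (and), | (or), ~ (not); each condition needs parentheses
in_range = (temps > 15) & (temps < 20)

# True counts as 1 and False as 0, so .sum() counts the cells that match
print(int(in_range.sum()))   # 2
```
"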
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# mask the values with ice or err set" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "7 < 10" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = 'hello'\n", "x == 'hola'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Can we make boolean comparisons with xarray data? Turns out we can! We can use the same comparison operators (`>`, `<`, `==`, `!=`, `>=`, `<=`), and the comparison is applied to every value in the DataArray." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "# import numpy as np\n", "\n", "# np.set_printoptions(threshold=20)\n", "# xr.set_options(display_expand_data=True)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "\n", "array([[[False, False, False, ..., False, False, False],\n", " [False, False, False, ..., False, False, False],\n", " [False, False, False, ..., False, False, False],\n", " ...,\n", " [False, False, False, ..., False, False, False],\n", " [False, False, False, ..., False, False, False],\n", " [False, False, False, ..., False, False, False]]])\n", "Coordinates:\n", " * lat (lat) float32 -89.88 -89.62 -89.38 -89.12 ... 89.38 89.62 89.88\n", " * lon (lon) float32 0.125 0.375 0.625 0.875 ... 359.1 359.4 359.6 359.9\n", " * time (time) datetime64[ns] 1982-10-07T12:00:00" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sst > 15" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What did we get? A DataArray of the same shape, where each value is a boolean (True or False) telling us whether the condition holds at that grid cell. The printed values are all False because only the corners of the grid are shown, and those cells are near the poles.\n", "\n", "We can even combine conditions using `and` and `or`, like we talked about earlier in the week, but with DataArrays we have to use different symbols and wrap each condition in parentheses:\n", "\n", "* and -> `&`\n", "* or -> `|`\n", "* not -> `~`" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "\n", "array([[[False, False, False, ..., False, False, False],\n", " [False, False, False, ..., False, False, False],\n", " [False, False, False, ..., False, False, False],\n", " ...,\n", " [False, False, False, ..., False, False, False],\n", " [False, False, False, ..., False, False, False],\n", " [False, False, False, ..., False, False, False]]])\n", "Coordinates:\n", " * lat (lat) float32 -89.88 -89.62 -89.38 -89.12 ... 89.38 89.62 89.88\n", " * lon (lon) float32 0.125 0.375 0.625 0.875 ... 359.1 359.4 359.6 359.9\n", " * time (time) datetime64[ns] 1982-10-07T12:00:00" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(sst > 15) & (sst < 20)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "(sst > 15).isel(time=0).plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ":::{admonition} 📝 Check your understanding\n", ":class: tip\n", "\n", "Create a plot that shows where `sst` is above 10 degrees Celsius.\n", "\n", ":::" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Masking values with `xr.where()`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another common kind of data manipulation is giving grid cells new values based on their old values. For that we will use [`xr.where()`](https://xarray.pydata.org/en/stable/generated/xarray.where.html).\n", "\n", "`xr.where()` takes three arguments:\n", "\n", "> `xr.where(condition, x, y)`\n", "\n", "- `condition` is a boolean statement like the ones above, which returns a DataArray of True/False values.\n", "- `x` is what xarray should put into every place where `condition` is True.\n", "- `y` is what xarray should put into every place where `condition` is False.\n", "\n", "`x` and `y` can each be a single value (like `0` or `np.nan`) or a DataArray (like `sst` itself, to keep the original values).\n", "\n", "**Note:** DataArrays and Datasets also have a [`.where()` method](https://xarray.pydata.org/en/v2022.03.0/generated/xarray.Dataset.where.html). `sst.where(condition)` keeps the values where `condition` is True and replaces all other values with NaN, or with a second argument if you give one.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this example, anywhere that `sst` is greater than 20, xarray will put the word \"warm\". Everywhere else it will put the word \"cold\", including land cells, because NaN is never greater than 20." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
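" ], "text/plain": [ "\n", "array([[['cold', 'cold', 'cold', ..., 'cold', 'cold', 'cold'],\n", " ['cold', 'cold', 'cold', ..., 'cold', 'cold', 'cold'],\n", " ['cold', 'cold', 'cold', ..., 'cold', 'cold', 'cold'],\n", " ...,\n", " ['cold', 'cold', 'cold', ..., 'cold', 'cold', 'cold'],\n", " ['cold', 'cold', 'cold', ..., 'cold', 'cold', 'cold'],\n", " ['cold', 'cold', 'cold', ..., 'cold', 'cold', 'cold']]],\n", " dtype='<U4')\n", "Coordinates:\n", " * lat (lat) float32 -89.88 -89.62 -89.38 -89.12 ... 89.38 89.62 89.88\n", " * lon (lon) float32 0.125 0.375 0.625 0.875 ... 359.1 359.4 359.6 359.9\n", " * time (time) datetime64[ns] 1982-10-07T12:00:00" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "xr.where(sst > 20, \"warm\", \"cold\")" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "# numpy provides np.nan, the 'not a number' value used for missing data\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we keep `sst` wherever it is 0 or above and replace negative values with `np.nan`. The corners printed below are NaN because they are either land or polar water colder than 0 degrees Celsius." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "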
" ], "text/plain": [ "\n", "array([[[nan, nan, nan, ..., nan, nan, nan],\n", " [nan, nan, nan, ..., nan, nan, nan],\n", " [nan, nan, nan, ..., nan, nan, nan],\n", " ...,\n", " [nan, nan, nan, ..., nan, nan, nan],\n", " [nan, nan, nan, ..., nan, nan, nan],\n", " [nan, nan, nan, ..., nan, nan, nan]]], dtype=float32)\n", "Coordinates:\n", " * lat (lat) float32 -89.88 -89.62 -89.38 -89.12 ... 89.38 89.62 89.88\n", " * lon (lon) float32 0.125 0.375 0.625 0.875 ... 359.1 359.4 359.6 359.9\n", " * time (time) datetime64[ns] 1982-10-07T12:00:00" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "xr.where(sst < 0, np.nan, sst)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that the condition doesn't have to come from the data values themselves: it could be one of the coordinates, or even a totally different DataArray, as long as xarray can line it up with `sst` by dimension name. In the next cell, `sst.lat > 60` only has a `lat` dimension, so xarray broadcasts it across `time` and `lon` and sets every grid cell north of 60°N to 0." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "# set every grid cell north of 60 degrees N to 0 and keep sst everywhere else\n", "masked = xr.where(sst.lat > 60, 0, sst)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "masked.isel(time=0).plot()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "interpreter": { "hash": "6092879fba406c8c6ca22f91e04d2ebf6b536b44c8b2e1d9154b002fdf6ee7b3" }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 4 }