<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>Outsider Data Science</title>
<link>https://outsiderdata.netlify.app/</link>
<atom:link href="https://outsiderdata.netlify.app/index.xml" rel="self" type="application/rss+xml"/>
<description>Putting what&#39;s in there, out there. With R!</description>
<generator>quarto-1.6.40</generator>
<lastBuildDate>Thu, 05 Dec 2024 05:00:00 GMT</lastBuildDate>
<item>
  <title>Predicting Water Quality in New York Harbor</title>
  <dc:creator>Art Steinmetz</dc:creator>
  <link>https://outsiderdata.netlify.app/posts/2024-12-05-predicting-water-quality-in-new-york-harbor/oyster.html</link>
  <description><![CDATA[ 




<section id="motivation" class="level2">
<h2 class="anchored" data-anchor-id="motivation">Motivation</h2>
<p>This is an exercise in using machine learning to predict the level of harmful bacteria in New York Harbor based on environmental factors like tidal conditions, rainfall and location. One reason this is useful is that it helps us understand how to rebuild a marine life ecosystem in the harbor, where oysters were a keystone species. We use the R data science language, <a href="https://www.tidymodels.org/">Tidymodels</a> tools from Posit.co to build the model, and public data gathered through volunteer water sampling and provided by <a href="https://www.billionoysterproject.org/">The Billion Oyster Project</a>.</p>
<p>The model shows some ability to predict “safe” and “unacceptable” bacterial concentrations, but the rate of false predictions is quite high.</p>
</section>
<section id="oyster-farm-to-the-world" class="level2">
<h2 class="anchored" data-anchor-id="oyster-farm-to-the-world">Oyster Farm to the World</h2>
<p>New York Harbor is one of the world’s greatest natural harbors. Those of us who live in New York City today see it mainly as an obstacle that we travel over or under. In our more reflective moods we see the history of what was in the piers. We can easily imagine the enormous ship traffic the piers once supported. What we can’t see are the echoes of the vibrant ecosystem of marine life that once thrived beneath the surface. The harbor sustained a rich diversity of marine life that provided abundant sustenance for the indigenous people before the arrival of Europeans and later for the colonists and migrants who arrived in their millions.</p>
<p><img src="https://outsiderdata.netlify.app/posts/2024-12-05-predicting-water-quality-in-new-york-harbor/img/George_Schlegel_-_George_Degen_-_New_York_1873.jpg" class="img-fluid"></p>
<p>Oysters were the keystone species for this ecosystem. They were so abundant that, as filter feeders, they filtered the entire volume of the harbor every few days. They provided habitat for other marine life and helped to stabilize the shoreline. At one point, fully a third of the world’s oyster harvest came from New York Harbor. Pearl Street in Manhattan was the site of a giant oyster shell midden left by the Lenape people. The oyster was the food of rich and poor alike. Stories of the colorful characters in Gilded Age New York are replete with oyster orgies at Delmonico’s restaurant. Oysters were so important to the economy of the city that before it became “The Big Apple” it was called “The Big Oyster.”</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://outsiderdata.netlify.app/posts/2024-12-05-predicting-water-quality-in-new-york-harbor/img/delmonicos_1899.png" class="img-fluid figure-img"></p>
<figcaption>Delmonico’s 1899. Oysters Have Pride of Place. (25 cents, not dollars!)</figcaption>
</figure>
</div>
<p>Alas, by the late 19th century, the oyster beds were so depleted that the oyster industry collapsed. The risk of over-harvesting was recognized as early as 1715, when “An Act for Preserving of Oysters” was passed, but the law was suspended in 1807 and harvesting became relentless. This was also the harbinger of the harbor’s decline as species up the food chain suffered in parallel. Pollution was the final blow. Industry along the Hudson River flushed toxins downstream and, in 1856, the <a href="https://www.nyc.gov/site/dep/water/combined-sewer-overflows.page">“Combined Sewer Overflow”</a> (CSO) system was established. This combines storm water drainage with raw sewage sent to treatment plants. During heavy rain, the system overflows and raw sewage is pumped into the harbor. After the passage of the Clean Water Act in 1972, industrial pollution declined but, incredibly, the CSO remains and is the major source of pollutants around the city.</p>
<p>Today we understand the need to support keystone species as vital to an ecosystem. <a href="https://www.billionoysterproject.org/">The Billion Oyster Project (BOP)</a> is a non-profit organization whose mission is to restore oyster reefs to New York. They are doing this work in collaboration with schools, universities, and government agencies. I have been a supporter of their mission for several years now. Getting the public to understand the importance of clean water is an important part of this effort. The project has been using volunteers to collect water quality data from the harbor since 2014, and the data are available to the public in a <a href="https://docs.google.com/spreadsheets/d/1813b2nagaxZ80xRfyMZNNKySZOitro5Nt7W4E9WNQDA/edit?usp=sharing">water quality spreadsheet</a>.</p>
<p>Downloading the full spreadsheet is a bit slow and there is some cleaning involved, so we’ll use data files we previously created. If you want to replicate this, the code for downloading and processing the spreadsheet is below.</p>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Click on the “► Code” buttons throughout this document to see the source code for the project.
</div>
</div>
<div class="callout-body-container callout-body">

</div>
</div>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(here)</span>
<span id="cb1-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(googlesheets4)</span>
<span id="cb1-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(rvest)</span>
<span id="cb1-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(duckplyr)</span>
<span id="cb1-6"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(arrow)</span>
<span id="cb1-7"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(leaflet)</span>
<span id="cb1-8"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(htmltools)</span>
<span id="cb1-9"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(sf)</span>
<span id="cb1-10"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidymodels)</span>
<span id="cb1-11"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(gt)</span>
<span id="cb1-12"></span>
<span id="cb1-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># enterococci thresholds</span></span>
<span id="cb1-14">SAFE <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">34</span></span>
<span id="cb1-15">CAUTION <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">104</span></span></code></pre></div>
</details>
</div>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Get Water Quality Data from BOP ----------------------------------</span></span>
<span id="cb2-2"></span>
<span id="cb2-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># no need for a google account to access this sheet</span></span>
<span id="cb2-4">googlesheets4<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gs4_deauth</span>()</span>
<span id="cb2-5"></span>
<span id="cb2-6">wq_url <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb2-7">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"https://docs.google.com/spreadsheets/d/1813b2nagaxZ80xRfyMZNNKySZOitro5Nt7W4E9WNQDA/edit?usp=sharing"</span></span>
<span id="cb2-8"></span>
<span id="cb2-9"></span>
<span id="cb2-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># DOWNLOAD meta data worksheet</span></span>
<span id="cb2-11">wq_meta <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gs4_get</span>(wq_url)</span>
<span id="cb2-12"><span class="co" style="color: #5E5E5E;
background-color: null;
# take 400">
font-style: inherit;"># take 400 rows of metadata to accommodate future growth in testing stations (up to 400 &lt;grin&gt;)</span></span>
<span id="cb2-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># assumes row 10 is column names, column A is site ID which is duplicated in column D</span></span>
<span id="cb2-14">wq_meta_raw <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_sheet</span>(wq_url,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Information"</span>,<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">range =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"B10:BA400"</span>)</span>
<span id="cb2-15"></span>
<span id="cb2-16"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Clean Meta Station Data ----------------------------------</span></span>
<span id="cb2-17"></span>
<span id="cb2-18">wq_meta <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> wq_meta_raw <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-19">  <span class="co" style="color: #5E5E5E;
background-color: null;
# remove empty columns">
font-style: inherit;"># remove empty columns by dropping columns whose names start with a dot</span></span>
<span id="cb2-20">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">matches</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"^</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\\</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">."</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-21">  janitor<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">clean_names</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-22">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rename</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"site"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-23">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># remove empty rows</span></span>
<span id="cb2-24">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.na</span>(site)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-25">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">site_id =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.factor</span>(site_id)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-26">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># why these come in as a list of 1 is beyond me and one value is NULL</span></span>
<span id="cb2-27">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># this is some complicated dplyr-fu.</span></span>
<span id="cb2-28">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">district_council_number =</span>  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.factor</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unlist</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map</span>(</span>
<span id="cb2-29">    district_council_number,  <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ifelse</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.null</span>(.x), <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA</span>, .x)</span>
<span id="cb2-30">  )))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-31">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">harmonic_noaa_tide_stations =</span>  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.factor</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unlist</span>(</span>
<span id="cb2-32">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map</span>(harmonic_noaa_tide_stations,  <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ifelse</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.null</span>(.x), <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA</span>, .x))</span>
<span id="cb2-33">  ))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-34">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># FIXED as of Aug 2024.  some longitudes are erroneously positive</span></span>
<span id="cb2-35">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">longitude =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">if_else</span>(longitude <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>longitude, longitude)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-36">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">currently_testing =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.logical</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">if_else</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.na</span>(currently_testing), <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-37">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rename_with</span>( <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"nyc_dep_wrrf_or_sewershed"</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">starts_with</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"associated"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-38">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># make NA entries in character columns actual NA so there is only one kind of NA</span></span>
<span id="cb2-39">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">across</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">where</span>(is.character), \(x) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">if_else</span>(x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"N/A"</span>, <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA</span>, x))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-40">  <span class="co" style="color: #5E5E5E;
background-color: null;
# some columns are NA">
font-style: inherit;"># some values are NA because the sites are in NJ. Make "NJ" the value</span></span>
<span id="cb2-41">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb2-42">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">district_council_number =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">if_else</span>(</span>
<span id="cb2-43">      district_council_number <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"N/A"</span>,</span>
<span id="cb2-44">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"NJ"</span>,</span>
<span id="cb2-45">      district_council_number</span>
<span id="cb2-46">    )</span>
<span id="cb2-47">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-48">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">nyc_dep_wrrf_or_sewershed =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">if_else</span>(</span>
<span id="cb2-49">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.na</span>(nyc_dep_wrrf_or_sewershed),</span>
<span id="cb2-50">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"NJ"</span>,</span>
<span id="cb2-51">    nyc_dep_wrrf_or_sewershed</span>
<span id="cb2-52">  )) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-53">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb2-54">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">nys_dec_water_body_classification =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">if_else</span>(</span>
<span id="cb2-55">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.na</span>(nys_dec_water_body_classification),</span>
<span id="cb2-56">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"NJ"</span>,</span>
<span id="cb2-57">      nys_dec_water_body_classification</span>
<span id="cb2-58">    )</span>
<span id="cb2-59">  )</span>
<span id="cb2-60"></span>
<span id="cb2-61"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># save meta data as parquet file</span></span>
<span id="cb2-62"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># arrow::write_parquet(wq_meta,here("data/wq_meta.parquet"))</span></span>
<span id="cb2-63"></span>
<span id="cb2-64"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># DOWNLOAD water quality data worksheet ---------------------------------------------</span></span>
<span id="cb2-65">wq_data_raw <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_sheet</span>(wq_url,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Data"</span>)</span>
<span id="cb2-66"></span>
<span id="cb2-67">data_names <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"site"</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"site_id"</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"date"</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"year"</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"month"</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"high_tide"</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sample_time"</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"bacteria"</span>,</span>
<span id="cb2-68">                <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"precip_t0"</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"precip_t1"</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"precip_t2"</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"precip_t3"</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"precip_t4"</span>,</span>
<span id="cb2-69">                <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"precip_t5"</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"precip_t6"</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"notes"</span>)</span>
<span id="cb2-70"></span>
<span id="cb2-71"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># get average sample time and use that for NA sample times</span></span>
<span id="cb2-72">sample_time_avg <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> wq_data_raw<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Sample Time</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-73">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-74">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.POSIXct</span>()</span>
<span id="cb2-75"></span>
<span id="cb2-76"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># function to extract day of week from date</span></span>
<span id="cb2-77">day_of_week <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(x) {</span>
<span id="cb2-78">  x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-79">    lubridate<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">wday</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">week_start =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-80">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">factor</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">levels =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>,<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">labels =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sun"</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Mon"</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Tue"</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Wed"</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Thu"</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Fri"</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sat"</span>))</span>
<span id="cb2-81">}</span>
<span id="cb2-82"></span>
<span id="cb2-83"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># clean up data</span></span>
<span id="cb2-84">wq_data <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> wq_data_raw <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-85">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set_names</span>(data_names) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-86">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">date =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as_date</span>(date)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-87">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># change NA sample times to average of all sample times. Good idea?</span></span>
<span id="cb2-88">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sample_time =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">if_else</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.na</span>(sample_time),sample_time_avg,sample_time)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-89">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sample_time =</span> hms<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as_hms</span>(sample_time)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-90">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sample_day =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">day_of_week</span>(date),<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.before =</span> bacteria) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-91">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">high_tide =</span> hms<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as_hms</span>(high_tide)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-92">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># add date to sample time</span></span>
<span id="cb2-93">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sample_time =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ymd_hms</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(date,sample_time),<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">tz=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"America/New_York"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-94">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">high_tide =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ymd_hms</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(date,high_tide),<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">tz=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"America/New_York"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-95">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">across</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">where</span>(is.list), as.character)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-96">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># We chose to make "Trace" and "&lt;10" into zero.</span></span>
<span id="cb2-97">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">across</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">where</span>(is.character), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.fns =</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_replace</span>(.x, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"&lt;10"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"0"</span>))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-98">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">across</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">where</span>(is.character), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.fns =</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_replace</span>(.x, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Trace"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"0"</span>))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-99">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># &gt; 24196 test limit?  This value is so far out of the range of the other values</span></span>
<span id="cb2-100">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># that it might as well be infinity.</span></span>
<span id="cb2-101">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">across</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">where</span>(is.character), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.fns =</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_replace</span>(.x, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"&gt;"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-102">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># get rid of snow inches next to precip as water</span></span>
<span id="cb2-103">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">across</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">where</span>(is.character), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.fns =</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_replace</span>(.x, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\\</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">(.+</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\\</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">)"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-104">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">across</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">where</span>(is.character), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.fns =</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">na_if</span>(.x, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"N/A"</span>))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-105">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">across</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">contains</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"precip"</span>), as.numeric)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-106">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">bacteria =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.numeric</span>(bacteria)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-107">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">notes =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">replace_na</span>(notes, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-108">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># fix some typos</span></span>
<span id="cb2-109">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">site =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_replace</span>(site, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Daylighted Section"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"daylighted section"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-110">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">site =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_replace</span>(site, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Govenors"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Governors"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-111">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># convert site and site_id to factors</span></span>
<span id="cb2-112">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">site =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.factor</span>(site)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-113">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">site_id =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.factor</span>(site_id))</span>
<span id="cb2-114"></span>
<span id="cb2-115">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># write to parquet</span></span>
<span id="cb2-116">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># arrow::write_parquet(wq_data,here("data/wq_data.parquet"))</span></span></code></pre></div>
</details>
</div>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># load data files</span></span>
<span id="cb3-2">water_body_classifications <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_csv</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data/NYDEC_water_classifications.csv"</span>,<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">col_types =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"fcc"</span>)</span>
<span id="cb3-3">wq_meta <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> arrow<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_parquet</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data/wq_meta.parquet"</span>)</span>
<span id="cb3-4">wq_data_raw <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> arrow<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_parquet</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data/wq_data.parquet"</span>)</span></code></pre></div>
</details>
</div>
</section>
<section id="data-cleaning" class="level2">
<h2 class="anchored" data-anchor-id="data-cleaning">Data Cleaning</h2>
<p>While we cleaned up the data in the previous section, we still want to do some feature engineering to get the data ready for modeling. In particular, bacterial levels are classified according to NY DEP standards as “SAFE” or “UNSAFE,” with another category in between that indicates heightened vigilance, which we’ll call “CAUTION,” so let’s make a column for that.</p>
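<p>As a minimal sketch of that classification step, base R’s <code>cut()</code> can bin numeric counts into labeled categories. The threshold values and the names <code>safe_limit</code> and <code>caution_limit</code> below are illustrative assumptions, not the constants used in this post.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical thresholds for illustration only; substitute the values
# from the standard you are applying.
safe_limit &lt;- 34
caution_limit &lt;- 104

# Made-up bacteria counts, including values that sit exactly on a threshold
bacteria &lt;- c(0, 10, 34, 35, 104, 105, 24196)

# cut() uses right-closed intervals by default, so a count equal to a
# threshold falls into the lower category. The -1 lower bound keeps 0 in "Safe".
quality &lt;- cut(
  bacteria,
  breaks = c(-1, safe_limit, caution_limit, Inf),
  labels = c("Safe", "Caution", "Unsafe")
)

table(quality)</code></pre></div>
<p>Starting the breaks at -1 rather than 0 matters because the intervals are open on the left; with a lower break of 0, a count of exactly zero would become NA.</p>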
<p>The data contains rainfall amounts for each day in the week preceding the sample date. The NY DEP standard is to use the 48-hour rainfall amount as a predictor of bacterial levels. We’ll aggregate the rainfall columns into three non-overlapping intervals: same day, previous 48 hours and earlier days in the week. It’s worth noting that rainfall for the sample day covers the entire 24-hour period, not just the hours preceding the sample.</p>
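<p>The aggregation can be sketched in base R on a toy table. The rainfall values are made up, and <code>precip_t0</code> is an assumed name for the same-day column; <code>precip_t1</code> through <code>precip_t6</code> stand for one to six days before sampling.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Toy data: two samples with made-up daily rainfall amounts.
# precip_t0 (same day) is an assumed column name.
rain &lt;- data.frame(
  precip_t0 = c(0.2, 0),
  precip_t1 = c(0.5, NA),
  precip_t2 = c(0.25, 0.75),
  precip_t3 = c(0, 0.5),
  precip_t4 = c(1, NA),
  precip_t5 = c(0, 0),
  precip_t6 = c(0.25, 0)
)

# Previous 48 hours: the two days before the sample day
rain$precip_48 &lt;- rowSums(rain[, c("precip_t1", "precip_t2")], na.rm = TRUE)

# Earlier in the week: days three through six before the sample day
rain$precip_earlier &lt;- rowSums(rain[, paste0("precip_t", 3:6)], na.rm = TRUE)

# The three non-overlapping intervals
rain[, c("precip_t0", "precip_48", "precip_earlier")]</code></pre></div>
<p>With <code>na.rm = TRUE</code>, a missing daily reading is treated as no rain. Whether that is appropriate is a judgment call.</p>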
<p>While we have the precise location of each sampling site, it might be useful to have more general location information. This will include the water body classification, the sewershed, the name of the body of water and the lab processing the samples.</p>
<p>The month is a cyclical feature. December, month number 12, is closer in climate to month number 1 than it is to month number 6, so we’ll recode the month numbers to reflect that, setting August as the hottest month.</p>
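<p>One possible encoding, sketched here on the assumption that a simple circular distance from August is adequate (the recoding actually used in this post may differ), counts how many months each month is from the peak, wrapping around the year end. The names <code>hottest</code> and <code>months_from_peak</code> are hypothetical.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Month numbers 1 through 12
month_num &lt;- 1:12

# Assumed peak month (August)
hottest &lt;- 8

# Circular distance in months from the peak: take the shorter way around
# the calendar, so the result runs from 0 (August) to 6 (February).
raw_gap &lt;- abs(month_num - hottest)
months_from_peak &lt;- pmin(raw_gap, 12 - raw_gap)
names(months_from_peak) &lt;- month.abb

months_from_peak</code></pre></div>
<p>On this scale December scores 4 and January scores 5, so the two are adjacent, while June scores 2. A cosine transform of the month number would be a smoother alternative.</p>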
<p>Finally, we’ll convert all NAs in the factor columns to a “missing” level. This lets us keep rows with NA in the models, but all missing values will share the same level. This is a choice that could be revisited later.</p>
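<p>To illustrate what the explicit level does, here is a small sketch using <code>fct_na_value_to_level()</code> from the forcats package (version 1.0.0 or later is assumed). The factor values are made-up samples.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(forcats)

# Made-up factor with two missing entries
sewershed &lt;- factor(c("Newtown Creek", NA, "Red Hook", NA))

# Turn NA values into an explicit factor level named "missing"
sewershed_filled &lt;- fct_na_value_to_level(sewershed, level = "missing")

levels(sewershed_filled)
table(sewershed_filled)</code></pre></div>
<p>After the conversion the NAs are counted like any other category, so a model treats every row with unknown metadata as belonging to one group.</p>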
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># get all levels of lab_analysis columns matched with year</span></span>
<span id="cb4-2">all_labs <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> wq_meta <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb4-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">contains</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lab_analysis"</span>),site_id) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">contains</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"x"</span>),</span>
<span id="cb4-5">               <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"year"</span>,</span>
<span id="cb4-6">               <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lab"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb4-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">year =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.numeric</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_extract</span>(year,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"[0-9]+"</span>))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb4-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lab =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.character</span>(lab)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb4-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(lab <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"NA"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb4-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lab =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.factor</span>(lab))</span>
<span id="cb4-11">  </span>
<span id="cb4-12"></span>
<span id="cb4-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># convert all N/As in factor columns to a "missing" level</span></span>
<span id="cb4-14"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># This will let us keep rows with NA in the models but all missing will have the same value</span></span>
<span id="cb4-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># good or bad?</span></span>
<span id="cb4-16">wq_meta <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> wq_meta <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-17">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">across</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">where</span>(is.factor), <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fct_na_value_to_level</span>(.x,<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">level =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"missing"</span>)))</span>
<span id="cb4-18"></span>
<span id="cb4-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># feature engineering</span></span>
<span id="cb4-20">wq_data <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> wq_data_raw <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb4-21">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># use the magrittr pipe "%&gt;%" here, not the native "|&gt;" pipe: needed for duckplyr and for the "." placeholder below</span></span>
<span id="cb4-22">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># make 2-day precip column since 48-hour precip is a DEP standard</span></span>
<span id="cb4-23">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># since observation time is typically in the morning don't include the current day's precip</span></span>
<span id="cb4-24">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># since we don't know if it came before, during or after collection</span></span>
<span id="cb4-25">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">precip_week =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowSums</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(., <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">starts_with</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"precip"</span>)), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.after=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"bacteria"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb4-26">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">precip_48 =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowSums</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(., precip_t1,precip_t2), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.after=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"bacteria"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb4-27">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">precip_earlier =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowSums</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(., precip_t3,precip_t4,precip_t5,precip_t6), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.after=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"precip_48"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb4-28">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>precip_t1,<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>precip_t2,<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>precip_t3,<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>precip_t4,<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>precip_t5,<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>precip_t6) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb4-29">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># categorize bacteria levels as quality levels</span></span>
<span id="cb4-30">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">quality =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as_factor</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cut</span>(</span>
<span id="cb4-31">    bacteria,</span>
<span id="cb4-32">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">breaks =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,SAFE, CAUTION, <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">Inf</span>),</span>
<span id="cb4-33">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">labels =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Safe"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Caution"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Unsafe"</span>)</span>
<span id="cb4-34">  ))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb4-35">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># compute time between sample time and high tide</span></span>
<span id="cb4-36">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">time_since_high_tide =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.numeric</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">difftime</span>(sample_time,high_tide,<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">units =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"hours"</span>)),</span>
<span id="cb4-37">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.after =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sample_time"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb4-38">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as_tibble</span>()</span>
<span id="cb4-39"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># add some metadata that might be useful in prediction</span></span>
<span id="cb4-40"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># add water body type, water body and sewershed from metadata</span></span>
<span id="cb4-41">wq_data <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> wq_data <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-42">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">left_join</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"site_id"</span>,<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(wq_meta,</span>
<span id="cb4-43">                   site_id,</span>
<span id="cb4-44">                   water_body,</span>
<span id="cb4-45">                   nys_dec_water_body_classification,</span>
<span id="cb4-46">                   nyc_dep_wrrf_or_sewershed)</span>
<span id="cb4-47">            ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-48">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># change all character columns to factors</span></span>
<span id="cb4-49">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">across</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">where</span>(is.character), as.factor)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-50">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># rename columns</span></span>
<span id="cb4-51">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rename</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">water_class =</span> nys_dec_water_body_classification,</span>
<span id="cb4-52">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sewershed =</span> nyc_dep_wrrf_or_sewershed)</span>
<span id="cb4-53"></span>
<span id="cb4-54"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># create a column that corresponds to the relative temperature of the month</span></span>
<span id="cb4-55"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># July is the hottest month in NYC</span></span>
<span id="cb4-56">wq_data <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> wq_data <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-57">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">season =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.numeric</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">factor</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">month</span>(date), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">levels =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">labels =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>))))</span>
<span id="cb4-58"></span>
<span id="cb4-59">wq_data <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> wq_data <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-60">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">left_join</span>(all_labs,<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"site_id"</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"year"</span>)) </span>
<span id="cb4-61"></span>
<span id="cb4-62"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># convert all NAs in factor columns to a "missing" level</span></span>
<span id="cb4-63"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># This will let us keep rows with NA in the models but all missing will have the same value</span></span>
<span id="cb4-64"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># good or bad?</span></span>
<span id="cb4-65"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#wq_data &lt;- wq_data |&gt;</span></span>
<span id="cb4-66"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#   mutate(across(where(is.factor), ~ fct_na_value_to_level(.x,level = "missing")))</span></span>
<span id="cb4-67">wq_data <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> wq_data <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-68">   <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lab =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fct_na_value_to_level</span>(lab,<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">level =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"missing"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb4-69">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">drop_na</span>()</span></code></pre></div>
</details>
</div>
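<p>One detail of the <code>season</code> column above may be worth checking: <code>as.numeric()</code> applied to a factor returns the integer level codes rather than the label text. The sketch below compares the factor approach with plain vector indexing. The names <code>months</code> and <code>temp_rank</code> are hypothetical stand-ins for <code>month(date)</code> and the label vector, not objects from the original code. In recent versions of R the codes should come out one higher than the labels, so January maps to 1 rather than 0.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical stand-ins for month(date) and the temperature ranks used as labels
months &lt;- 1:12
temp_rank &lt;- c(0, 1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1)

# Factor approach: as.numeric() returns integer level codes, not the label values
season_codes &lt;- as.numeric(factor(months, levels = 1:12, labels = temp_rank))

# Alternative: index a plain lookup vector to get the rank values themselves
season_lookup &lt;- temp_rank[months]

# Show both encodings side by side
print(data.frame(month = months, season_codes, season_lookup))</code></pre></div>
<p>Both encodings rank the months in the same order, so the choice is unlikely to matter for tree-based models. The lookup version may simply be easier to read.</p>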
</section>
<section id="exploratory-data-analysis" class="level2">
<h2 class="anchored" data-anchor-id="exploratory-data-analysis">Exploratory Data Analysis</h2>
<p>Let’s start by looking at a map of all the sampling sites. Many sites (in red, below) are no longer being tested. The upper half of Manhattan, on both the Hudson and East River sides, is not included in the most recent data. Because the set of sites is not constant, we have to be careful about drawing conclusions about changes over time for the sampling set as a whole.</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># convert wq_meta into a simple features object</span></span>
<span id="cb5-2">wq_meta_sf <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> wq_meta <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">left_join</span>(water_body_classifications,</span>
<span id="cb5-4">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"nys_dec_water_body_classification"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"water_body_class"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_as_sf</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">coords =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"longitude"</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"latitude"</span>),<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">crs =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4326</span>)</span>
<span id="cb5-6"></span>
<span id="cb5-7"></span>
<span id="cb5-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># display a leaflet map --------------------------------------------------------</span></span>
<span id="cb5-9">map_labels <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> glue<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glue</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"&lt;strong&gt;{wq_meta_sf$site}&lt;/strong&gt;&lt;br/&gt;</span></span>
<span id="cb5-10"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">                          Currently Testing? {wq_meta_sf$currently_testing}&lt;br/&gt;</span></span>
<span id="cb5-11"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">                          Sewershed: {wq_meta_sf$nyc_dep_wrrf_or_sewershed}&lt;br/&gt;</span></span>
<span id="cb5-12"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">                         Use: {wq_meta_sf$best_uses}"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lapply</span>(htmltools<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>HTML)</span>
<span id="cb5-14"></span>
<span id="cb5-15">wq_meta_sf <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-16">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">leaflet</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-17">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">addTiles</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-18">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">addCircleMarkers</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">radius =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>,</span>
<span id="cb5-19">                            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">label =</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>map_labels,</span>
<span id="cb5-20">                            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ifelse</span>(currently_testing,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"green"</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>))</span></code></pre></div>
</details>
<div class="cell-output-display">
<div class="leaflet html-widget html-fill-item" id="htmlwidget-311c2242efcfd54e880b" style="width:100%;height:464px;"></div>
<script type="application/json" data-for="htmlwidget-311c2242efcfd54e880b">{"x":{"options":{"crs":{"crsClass":"L.CRS.EPSG3857","code":null,"proj4def":null,"projectedBounds":null,"options":{}}},"calls":[{"method":"addTiles","args":["https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png",null,null,{"minZoom":0,"maxZoom":18,"tileSize":256,"subdomains":"abc","errorTileUrl":"","tms":false,"noWrap":false,"zoomOffset":0,"zoomReverse":false,"opacity":1,"zIndex":1,"detectRetina":false,"attribution":"&copy; <a href=\"https://openstreetmap.org/copyright/\">OpenStreetMap<\/a>,  <a href=\"https://opendatacommons.org/licenses/odbl/\">ODbL<\/a>"}]},{"method":"addCircleMarkers","args":[[40.50138,40.797291,40.801921,40.862411,40.82449,40.8178447,40.8097,40.8144,40.832385,40.8321416,40.691375,40.678574,40.5835405,40.5844306,40.5810247,40.5825464,40.5791505,40.748051,40.706887,40.7252056,40.723181,40.7044059,40.77719,40.709882,40.743849,40.7168567,40.769516,40.81012,40.793278,40.73157,40.72191,40.718943,40.700789,40.699967,40.709208,40.709865,40.6968031,40.7103,40.7142915,40.7130241,40.710069,40.7333646,40.8091363,40.8048868,40.7055253,40.730086,40.771964,40.7634239,40.760445,40.7598868,40.760543,40.675486,40.6761467,40.6783319,40.676586,40.674423,40.6768139,40.5853425,40.60138,40.662968,40.763925,40.859316,40.807001,40.688829,40.842404,40.806018,40.806394,40.7910125,40.874568,40.864361,40.864388,40.856102,40.78917,40.9566318,40.9566318,40.7413476,40.739174,40.739183,40.740726,40.7391216,40.748265,40.8674926,40.9566318,40.751744,40.7215745,40.728249,40.752502,40.764275,40.771322,40.7735219,40.7799272,40.7553196,40.799369,40.818007,40.8206007,40.8283762,40.832214,40.8461433,40.781422,40.7859195,40.9379388,40.8634849,40.6278942,40.602409,40.6264122,40.5824832,40.64512,40.74727,40.7801708,40.7375594,40.7292914,40.7429327,40.7378927,40.7167533,40.7120691,40.739025,40.7355189,40.739534,40.7385105,40.7248845,40.736974,40.655305,40.571734,40.601792,40.599456,40.5422,40.572109,40.5665366,40.5496
42,40.579248,40.48334,40.48769,40.47483,40.54067,40.48826,40.50007,40.9356552,40.583419,40.6546621,40.6548329,40.645481,40.639326,40.8898985,40.728539],[-74.25416,-73.916026,-73.92408399999999,-73.874364,-73.88505000000001,-73.881072,-73.8682,-73.8708,-73.88282100000001,-73.88292800000001,-74.01253199999999,-74.018372,-73.9927194,-73.990388,-73.997941,-73.97483699999999,-73.98747,-73.954628,-74.001142,-73.959234,-73.962509,-73.990526,-73.94211,-73.989594,-73.960111,-73.96717700000001,-73.935593,-73.80231999999999,-73.848917,-73.96176,-73.96301,-73.96574099999999,-73.996922,-73.99807,-73.988505,-73.988629,-73.99888,-73.980614,-73.968192,-73.96871400000001,-73.96965,-73.973866,-73.80270299999999,-73.79435599999999,-73.970263,-73.961404,-73.85035000000001,-73.843532,-73.849571,-73.85146,-73.83645799999999,-73.990887,-73.992577,-73.98931399999999,-73.98961,-73.996425,-73.98991599999999,-73.99837890000001,-74.0125,-74.133458,-74.08723000000001,-74.032318,-74.05710000000001,-74.11227599999999,-73.929444,-73.930089,-73.9303903,-73.927415,-73.917539,-73.91591200000001,-73.915755,-73.922073,-73.93277,-73.89769,-73.89769,-74.025702,-74.010909,-74.011194,-74.009682,-74.010559,-74.023647,-73.93291499999999,-73.89769,-74.022864,-74.01355700000001,-74.013786,-74.00897999999999,-74.002129,-73.995991,-73.99377800000001,-73.989316,-74.026094,-73.97579500000001,-73.96212800000001,-73.959577,-73.954165,-73.95148500000001,-73.9464,-73.98835,-73.98522699999999,-73.903492,-73.82018600000001,-73.8837,-73.93171599999999,-73.904386,-73.920157,-74.12563,-73.80655,-73.768804,-73.83879,-73.93728,-73.939319,-73.945885,-73.92292500000001,-73.93129399999999,-73.952972,-73.94509100000001,-73.95224399999999,-73.95996100000001,-73.9258,-73.94683999999999,-73.96420999999999,-74.21247700000001,-74.256497,-74.26831799999999,-74.1254,-74.08794,-74.0914556,-74.11161800000001,-74.07495299999999,-74.2698,-74.38409,-74.35586000000001,-74.51219,-74.43384,-74.27719,-73.90179500000001,-73.945179,-74.0183470000
0001,-74.01824499999999,-74.074752,-74.038539,-73.891811,-73.83408],5,null,null,{"interactive":true,"className":"","stroke":true,"color":["green","green","green","red","green","green","green","green","green","red","green","green","green","red","green","red","red","red","green","green","green","green","green","green","green","green","green","green","red","green","green","green","red","green","green","green","green","red","red","green","green","green","green","green","green","green","green","green","green","red","green","red","red","red","red","red","green","green","green","green","green","green","green","green","red","red","green","green","red","red","red","red","green","red","red","green","green","green","green","green","green","red","green","green","green","green","green","green","green","green","red","green","red","red","red","red","red","green","red","red","green","red","red","red","green","red","green","green","green","green","green","green","red","green","green","red","red","green","green","red","red","green","green","green","green","green","green","green","green","red","green","green","green","green","green","green","green","red","green","green","red","green","red","red"],"weight":5,"opacity":0.5,"fill":true,"fillColor":["green","green","green","red","green","green","green","green","green","red","green","green","green","red","green","red","red","red","green","green","green","green","green","green","green","green","green","green","red","green","green","green","red","green","green","green","green","red","red","green","green","green","green","green","green","green","green","green","green","red","green","red","red","red","red","red","green","green","green","green","green","green","green","green","red","red","green","green","red","red","red","red","green","red","red","green","green","green","green","green","green","red","green","green","green","green","green","green","green","green","red","green","red","red","red","red","red","green","red","red","green","red","red"
,"red","green","red","green","green","green","green","green","green","red","green","green","red","red","green","green","red","red","green","green","green","green","green","green","green","green","red","green","green","green","green","green","green","green","red","green","green","red","green","red","red"],"fillOpacity":0.2},null,null,null,null,["<strong>Arthur Kill, Conference House Park<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Oakwood Beach<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Bronx Kill, east end<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Wards Island<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Bronx Kill, west end<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Wards Island<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Bronx River, Bronx Botanical Gardens<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: Hunts Point<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Bronx River, Concrete Plant Park Canoe Launch<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Hunts Point<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Bronx River, Hunts Point Riverside Park<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Hunts Point<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Bronx River, Soundview Park (mouth)<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Hunts Point<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Bronx River, Soundview Park, HP009 CSO<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Hunts Point<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Bronx River, Starlight Park, North Dock<\/strong><br/>\n Currently Testing? 
<\/strong>TRUE<br/>\n Sewershed: Hunts Point<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Bronx River, Starlight Park, south of dam<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: Hunts Point<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Buttermilk Channel, Pier 101, Governors Island<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Red Hook<br/>\nUse: (marine waters) Swimming and other recreation and fishing","<strong>Buttermilk Channel, Valentino Pier<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Red Hook<br/>\nUse: (marine waters) Swimming and other recreation and fishing","<strong>Coney Island Creek, Calvert Vaux Beach<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Coney Island<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Coney Island Creek, Calvert Vaux Park<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: Coney Island<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Coney Island Creek, Kaiser Park<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Coney Island<br/>\nUse: (marine waters) Swimming and other recreation and fishing","<strong>Coney Island Creek, Shell Road (head of creek)<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: Coney Island<br/>\nUse: (marine waters) Swimming and other recreation and fishing","<strong>Coney Island Creek, West 21st Street<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: Coney Island<br/>\nUse: (marine waters) Swimming and other recreation and fishing","<strong>East River, Anable Basin<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: Bowery Bay<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>East River, Brooklyn Bridge Beach, Manhattan<\/strong><br/>\n Currently Testing? 
<\/strong>TRUE<br/>\n Sewershed: Newtown Creek<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>East River, Bushwick Inlet<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Newtown Creek<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>East River, Bushwick Inlet Park Beach<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Newtown Creek<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>East River, Dumbo Cove<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Red Hook<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>East River, E 90th St Ferry<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Wards Island<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>East River, Esplanade (+Pool)<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Newtown Creek<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>East River, Gantry State Park<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Bowery Bay<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>East River, Grand Ferry Park<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Newtown Creek<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>East River, Hallets Cove<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Bowery Bay<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>East River, Hammond Creek (HC)<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Hunts Point<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>East River, Hermon A. MacNeil Park<\/strong><br/>\n Currently Testing? 
<\/strong>FALSE<br/>\n Sewershed: Tallmans Island<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>East River, India Street<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Newtown Creek<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>East River, Marsha P Johnson State Park<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Newtown Creek<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>East River, North 3rd Street<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Newtown Creek<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>East River, Pier 1, Brooklyn Bridge Park<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: Red Hook<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>East River, Pier 2 Kayak Dock, Brooklyn Bridge Park<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Red Hook<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>East River, Pier 35 (+POOL), End-Pier<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Newtown Creek<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>East River, Pier 35 (+POOL), Mid-Pier<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Newtown Creek<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>East River, Pier 4 Beach, Brooklyn Bridge Park<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Red Hook<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>East River, Pier 42/Jackson Street<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: Newtown Creek<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>East River, South 3rd Street/Domino Park<\/strong><br/>\n Currently Testing? 
<\/strong>FALSE<br/>\n Sewershed: Newtown Creek<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>East River, South 5th Street/Domino Park<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Newtown Creek<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>East River, South 8th St<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Newtown Creek<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>East River, Stuy Cove Park<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Newtown Creek<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>East River, SUNY Maritime Campus Entrance (IT)<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Hunts Point<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>East River, SUNY Maritime Waterfront Center (MAR)<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Hunts Point<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>East River, Wallabout Channel, Brooklyn Navy Yard<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Red Hook<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>East River, WNYC Transmitter Park<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Newtown Creek<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Flushing Bay, 28th Avenue, Big Rock Beach<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Tallmans Island<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Flushing Bay, World's Fair Marina Boat Ramp<\/strong><br/>\n Currently Testing? 
<\/strong>TRUE<br/>\n Sewershed: Bowery Bay<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Flushing Bay, World's Fair Marina Pier 1 East<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Bowery Bay<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Flushing Bay, World's Fair Marina Pier 1 West<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: Bowery Bay<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Flushing Creek<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Tallmans Island<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Gowanus Canal, 2nd Avenue Salt Lot<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: Owls Head<br/>\nUse: (marine waters) Fishing but these waters may not support fish propagation","<strong>Gowanus Canal, Bond Street<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: Red Hook<br/>\nUse: (marine waters) Fishing but these waters may not support fish propagation","<strong>Gowanus Canal, Carroll Street<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: Red Hook<br/>\nUse: (marine waters) Fishing but these waters may not support fish propagation","<strong>Gowanus Canal, Denton's Pond Outfall<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: Owls Head<br/>\nUse: (marine waters) Fishing but these waters may not support fish propagation","<strong>Gowanus Canal, Lowlands Nursery<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: Red Hook<br/>\nUse: (marine waters) Fishing but these waters may not support fish propagation","<strong>Gowanus Canal, Second Street Sponge Park<\/strong><br/>\n Currently Testing? 
<\/strong>TRUE<br/>\n Sewershed: Red Hook<br/>\nUse: (marine waters) Fishing but these waters may not support fish propagation","<strong>Gravesend Bay, Calvert Vaux Cove<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Owls Head<br/>\nUse: (marine waters) Swimming and other recreation and fishing","<strong>Gravesend Bay, OH-015<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Owls Head<br/>\nUse: (marine waters) Swimming and other recreation and fishing","<strong>Hackensack River, Bayonne City Park, Bayonne NJ<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: NJ<br/>\nUse: NA","<strong>Hackensack River, Laurel Hill Park, Secaucus NJ<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: NJ<br/>\nUse: NA","<strong>Hackensack River, Ridgefield Park, NJ<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: NJ<br/>\nUse: NA","<strong>Hackensack River, River Barge Park, Carlstadt NJ<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: NJ<br/>\nUse: NA","<strong>Hackensack River, Rutkowski Park, Bayonne NJ<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: NJ<br/>\nUse: NA","<strong>Harlem River, High Bridge<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: Wards Island<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Harlem River, Lincoln Avenue 1<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: Wards Island<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Harlem River, Lincoln Avenue 2<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Wards Island<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Harlem River, Little Hell Gate<\/strong><br/>\n Currently Testing? 
<\/strong>TRUE<br/>\n Sewershed: Wards Island<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Harlem River, Muscota Marsh<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: North River<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Harlem River, North Cove<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: North River<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Harlem River, North Cove Spring<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: North River<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Harlem River, Swindlers Cove/Peter J. Sharpe Boathouse<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: North River<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Harlem River, Water's Edge Garden Beach<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Wards Island<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Hudson River, Beczak Beach #1<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: NJ<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Hudson River, Beczak Beach #2<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: NJ<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Hudson River, Frank Sinatra Park, Hoboken NJ<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: NJ<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Hudson River, Gansevoort Peninsula South – Middle<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: North River<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Hudson River, Gansevoort Peninsula South – Ramp<\/strong><br/>\n Currently Testing? 
<\/strong>TRUE<br/>\n Sewershed: North River<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Hudson River, Gansevoort Peninsula, North<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: North River<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Hudson River, Gansevoort Peninsula, South<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: North River<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Hudson River, Hoboken Cove Beach, Hoboken NJ<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: NJ<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Hudson River, Inwood Canoe Club<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: North River<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Hudson River, JFK Marina<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: NJ<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Hudson River, Pier 13, Hoboken, NJ<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: NJ<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Hudson River, Pier 26<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Newtown Creek<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Hudson River, Pier 40<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Newtown Creek<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Hudson River, Pier 66<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: North River<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Hudson River, Pier 84<\/strong><br/>\n Currently Testing? 
<\/strong>TRUE<br/>\n Sewershed: North River<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Hudson River, Pier 96<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: North River<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Hudson River, Pier 99 Boat Launch<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: North River<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Hudson River, Pier I/70th Street<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: North River<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Hudson River, Weehawken Cove, Hoboken NJ<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: NJ<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Hudson River, West 100th Street<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: North River<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Hudson River, West 129th Street, St. Clair Place<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: North River<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Hudson River, West 133rd Street/Piers Park Kayak Dock<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: North River<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Hudson River, West 145th Street<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: North River<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Hudson River, West 154th Street<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: North River<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Hudson River, West 172nd Street, Riverside Park<\/strong><br/>\n Currently Testing? 
<\/strong>TRUE<br/>\n Sewershed: North River<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Hudson River, West 72nd Street Kayak Launch<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: North River<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Hudson River, West 79th Street Boat Basin<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: North River<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Hudson River, Yonkers Paddling and Rowing Club<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: NJ<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Hutchinson River, Amtrak Bridge<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: Hunts Point<br/>\nUse: (marine waters) Swimming and other recreation and fishing","<strong>Jamaica Bay, Canarsie Pier<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: Coney Island<br/>\nUse: (marine waters) Swimming and other recreation and fishing","<strong>Jamaica Bay, Gerritsen Creek Kayak Launch (Marine Park Salt Marsh)<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: Coney Island<br/>\nUse: (marine waters) Swimming and other recreation and fishing","<strong>Jamaica Bay, Paerdegat Basin, Sebago Canoe Club<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Coney Island<br/>\nUse: (marine waters) Swimming and other recreation and fishing","<strong>Jamaica Bay, Plumb Beach<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: Coney Island<br/>\nUse: (marine waters) Swimming and other recreation and fishing","<strong>Kill Van Kull, Brady’s Dock, Bayonne NJ<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: NJ<br/>\nUse: NA","<strong>Kissena Lake<\/strong><br/>\n Currently Testing? 
<\/strong>TRUE<br/>\n Sewershed: Tallmans Island<br/>\nUse: Swimming and other recreation and fishing","<strong>Little Neck Bay<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Tallmans Island<br/>\nUse: (marine waters) Swimming and other recreation and fishing","<strong>Meadow Lake, Flushing Meadows-Corona Park<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Bowery Bay<br/>\nUse: Swimming and other recreation and fishing","<strong>Newtown Creek, Apollo Street<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Newtown Creek<br/>\nUse: (marine waters) Fishing but these waters may not support fish propagation","<strong>Newtown Creek, Dutch Kills (head)<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Bowery Bay<br/>\nUse: (marine waters) Fishing but these waters may not support fish propagation","<strong>Newtown Creek, Dutch Kills (mouth)<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: Bowery Bay<br/>\nUse: (marine waters) Fishing but these waters may not support fish propagation","<strong>Newtown Creek, East Branch<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Newtown Creek<br/>\nUse: (marine waters) Fishing but these waters may not support fish propagation","<strong>Newtown Creek, English Kills<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Newtown Creek<br/>\nUse: (marine waters) Fishing but these waters may not support fish propagation","<strong>Newtown Creek, North Brooklyn Community Boathouse/Pulaski Bridge<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: Newtown Creek<br/>\nUse: (marine waters) Fishing but these waters may not support fish propagation","<strong>Newtown Creek, North Henry Street<\/strong><br/>\n Currently Testing? 
<\/strong>FALSE<br/>\n Sewershed: Newtown Creek<br/>\nUse: (marine waters) Fishing but these waters may not support fish propagation","<strong>Newtown Creek, Pulaski Bridge, Queens<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Bowery Bay<br/>\nUse: (marine waters) Fishing but these waters may not support fish propagation","<strong>Newtown Creek, Second Street Kayak Launch<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Bowery Bay<br/>\nUse: (marine waters) Fishing but these waters may not support fish propagation","<strong>Newtown Creek, Turning Basin<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: Newtown Creek<br/>\nUse: (marine waters) Fishing but these waters may not support fish propagation","<strong>Newtown Creek, Whale Creek<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: Newtown Creek<br/>\nUse: (marine waters) Fishing but these waters may not support fish propagation","<strong>Prospect Park Lake<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Owls Head<br/>\nUse: Swimming and other recreation and fishing","<strong>Rahway River, Carteret Waterfront Park<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: NJ<br/>\nUse: NA","<strong>Rahway River, Rahway Valley Sewerage Authority<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: NJ<br/>\nUse: NA","<strong>Rahway River, Riverfront Park<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: NJ<br/>\nUse: NA","<strong>Raritan Bay, Great Kills Beach<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Oakwood Beach<br/>\nUse: (marine waters) Swimming and other recreation and fishing","<strong>Raritan Bay, Midlland Beach<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Oakwood Beach<br/>\nUse: (marine waters) Swimming and other recreation and fishing","<strong>Raritan Bay, New Dorp Beach<\/strong><br/>\n Currently Testing? 
<\/strong>TRUE<br/>\n Sewershed: Oakwood Beach<br/>\nUse: (marine waters) Swimming and other recreation and fishing","<strong>Raritan Bay, Oakwood Beach<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Oakwood Beach<br/>\nUse: (marine waters) Swimming and other recreation and fishing","<strong>Raritan Bay, South Beach<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: Oakwood Beach<br/>\nUse: (marine waters) Swimming and other recreation and fishing","<strong>Raritan Bay, Waterfront Park, South Amboy, NJ<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: NJ<br/>\nUse: NA","<strong>Raritan River, Edison Boat Basin, Edison NJ<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: NJ<br/>\nUse: NA","<strong>Raritan River, Ken Buchanan Park, Sayreville NJ<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: NJ<br/>\nUse: NA","<strong>Raritan River, Riverside Park, Piscataway NJ<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: NJ<br/>\nUse: NA","<strong>Raritan River, Rutgers Boathouse, New Brunswick NJ<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: NJ<br/>\nUse: NA","<strong>Raritan River, Second Street Park, Perth Amboy NJ<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: NJ<br/>\nUse: NA","<strong>Saw Mill River, daylighted section<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: NJ<br/>\nUse: (marine waters) Swimming and other recreation and fishing","<strong>Sheepshead Bay<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: Coney Island<br/>\nUse: (marine waters) Secondary contact recreation and fishing","<strong>Upper Harbor, Bush Terminal Park (Inner Lagoon)<\/strong><br/>\n Currently Testing? 
<\/strong>TRUE<br/>\n Sewershed: Owls Head<br/>\nUse: (marine waters) Swimming and other recreation and fishing","<strong>Upper Harbor, Bush Terminal Park (North Embayment)<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Owls Head<br/>\nUse: (marine waters) Swimming and other recreation and fishing","<strong>Upper Harbor, North Shore Esplanade<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: Port Richmond<br/>\nUse: (marine waters) Swimming and other recreation and fishing","<strong>Upper Harbor, Pier 69<\/strong><br/>\n Currently Testing? <\/strong>TRUE<br/>\n Sewershed: Owls Head<br/>\nUse: (marine waters) Swimming and other recreation and fishing","<strong>Van Cortlandt Park Lake<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: Wards Island<br/>\nUse: Swimming and other recreation and fishing","<strong>Willow Lake, Flushing Meadows-Corona Park<\/strong><br/>\n Currently Testing? <\/strong>FALSE<br/>\n Sewershed: Bowery Bay<br/>\nUse: Swimming and other recreation and fishing"],{"interactive":false,"permanent":false,"direction":"auto","opacity":1,"offset":[0,0],"textsize":"10px","textOnly":false,"className":"","sticky":true},null]}],"limits":{"lat":[40.47483,40.9566318],"lng":[-74.51219,-73.768804]}},"evals":[],"jsHooks":[]}</script>
</div>
</div>
<p>The data include the “sewershed” for each site. Sewersheds are areas of the city whose runoff and sewage flow through a particular section of the sewer system and its treatment plant before draining into the rivers. It may be that particular sewersheds are more likely to overflow during heavy rain and are thus more prone to high bacteria levels. We shall see.</p>
<iframe width="100%" height="520" frameborder="0" src="https://openseweratlas.tumblr.com/dryweathermap">
</iframe>
<p>A problem we often face with prediction models is that the data are not nicely distributed. Bacteria levels in our dataset are heavily skewed. We get a more balanced distribution if we bin the levels into DEP quality categories, so we will use the binned quality category as our target variable for prediction.</p>
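<p>The binning step described above can be sketched with base R’s <code>cut()</code>. This is a minimal illustration rather than the code used to build <code>wq_data</code>: the sample counts, the cut points (35 and 104) and the category labels are all assumptions made for the example, so substitute the DEP thresholds that actually apply.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical bacteria counts: mostly low, with a few extreme values
bacteria &lt;- c(5, 10, 10, 20, 31, 52, 98, 120, 2400, 24196)

# Assumed cut points and labels, for illustration only
breaks &lt;- c(-Inf, 35, 104, Inf)
labels &lt;- c("acceptable", "unacceptable if persistent", "unacceptable")

# right = FALSE makes each bin closed on the left: [-Inf, 35), [35, 104), [104, Inf)
quality &lt;- cut(bacteria, breaks = breaks, labels = labels,
               right = FALSE, ordered_result = TRUE)

# Counts per category are far more balanced than the raw values
table(quality)</code></pre></div>
<p>Returning an ordered factor keeps the natural ranking of the categories, which is convenient for plotting and for any model that can make use of ordinal targets.</p>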
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># ggplot histogram of bacteria levels</span></span>
<span id="cb6-2">wq_data <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb6-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> bacteria)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_histogram</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">bins =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">30</span>,<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lightblue"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Histogram of Bacteria Levels"</span>,</span>
<span id="cb6-6">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Bacteria Levels"</span>,</span>
<span id="cb6-7">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Count"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>()</span>
<span id="cb6-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># ggplot histogram of quality levels</span></span>
<span id="cb6-10">wq_data <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb6-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> quality)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_bar</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lightblue"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Histogram of Bacteria Quality Levels"</span>,</span>
<span id="cb6-14">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Quality Levels"</span>,</span>
<span id="cb6-15">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Count"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-16">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>()</span></code></pre></div>
</details>
<div class="quarto-layout-panel" data-layout-ncol="2">
<div class="quarto-layout-row">
<div class="cell quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://outsiderdata.netlify.app/posts/2024-12-05-predicting-water-quality-in-new-york-harbor/oyster_files/figure-html/bacteria_hist-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<div class="cell quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://outsiderdata.netlify.app/posts/2024-12-05-predicting-water-quality-in-new-york-harbor/oyster_files/figure-html/quality_hist-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
</div>
</div>
<p>Similarly, we can look at the distributions for the other numeric features in our dataset. We have already converted month numbers into a seasonality feature so we won’t use “month” as a predictor. We see that most of the observations happen in the warm months.</p>
<p>Precipitation is also skewed, and it is an example of how exploratory data analysis (EDA) can help us think about model features. On most days it does not rain, so the single-day values pile up at zero. The wider time windows are slightly more evenly distributed, so let’s use only the precipitation for the entire prior week in the model.</p>
<p>Tide times are evenly distributed, as we would expect.</p>
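<p>To see why a wider window smooths out the rainfall feature, here is a minimal base R sketch of a trailing seven-day sum. The daily values and the helper <code>trailing_sum()</code> are hypothetical, and the window here ends on the current day, which may differ from how <code>precip_week</code> was actually constructed.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical daily rainfall in inches; most days are dry
daily_precip &lt;- c(0, 0, 1.2, 0, 0, 0, 0.3, 0, 0, 0, 2.0, 0, 0, 0)

# Sum of the k most recent values ending at each position (hypothetical helper).
# Assumes length(x) &gt; k. Positions without a full window are set to NA.
trailing_sum &lt;- function(x, k) {
  cs &lt;- cumsum(x)
  out &lt;- cs - c(rep(0, k), head(cs, -k))
  out[seq_len(k - 1)] &lt;- NA
  out
}

precip_week &lt;- trailing_sum(daily_precip, 7)

# Share of zero values: daily series versus the weekly window
mean(daily_precip == 0)
mean(precip_week == 0, na.rm = TRUE)</code></pre></div>
<p>In this toy series most single days are zero, while none of the complete weekly windows are, which is the same effect the histograms suggest for the real data.</p>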
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># distribution of numeric features</span></span>
<span id="cb7-2">wq_data <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb7-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ungroup</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb7-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>year,<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>bacteria) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb7-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(where(is.numeric)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb7-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gather</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb7-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> value)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb7-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_histogram</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">bins=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>,<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lightblue"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb7-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>key, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">scales =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"free"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb7-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Distribution of All Numeric Variables"</span>,</span>
<span id="cb7-11">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Count"</span>,<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb7-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>()</span></code></pre></div>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://outsiderdata.netlify.app/posts/2024-12-05-predicting-water-quality-in-new-york-harbor/oyster_files/figure-html/feature_dist-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>We might be interested in examining quality trends over time. This has to be done with caution because the number of reporting stations has increased over time, except during the COVID crisis. Let’s look at trends using only those stations that have been reporting for at least ten years. We add annual rainfall to see if there is any obvious correlation. Alas, water quality (defined as bacteria levels) has not been improving over the last decade.</p>
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># show that reporting stations have increased over time</span></span>
<span id="cb8-2">wq_data <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">year =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">year</span>(date)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(year) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarise</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">n_distinct</span>(site_id)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> n)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb8-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_col</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lightblue"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb8-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Reporting Stations Have Steadily Increased"</span>,</span>
<span id="cb8-9">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Year"</span>,</span>
<span id="cb8-10">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Number of Stations"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb8-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>()</span>
<span id="cb8-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># estimate annual rainfall as mean weekly precipitation times 52</span></span>
<span id="cb8-13">annl_rain <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> wq_data <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(year) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-15">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarise</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">annual_rainfall =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(precip_week,<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">52</span>)</span>
<span id="cb8-16"></span>
<span id="cb8-17"></span>
<span id="cb8-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># plot median bacteria levels over time only for sites that have been reporting for at least ten years</span></span>
<span id="cb8-19">wq_10 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> wq_data <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-20">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># filter out stations that have been reporting for fewer than 10 years</span></span>
<span id="cb8-21">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(site) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-22">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarise</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n_years =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">n_distinct</span>(year)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-23">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(n_years <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">9</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-24">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(site) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-25">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">inner_join</span>(wq_data, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"site"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-26">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ungroup</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-27">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">year</span>(date) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">year</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.Date</span>())<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">11</span>)</span>
<span id="cb8-28"></span>
<span id="cb8-29">last_obs <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> wq_data<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>date <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">max</span>()</span>
<span id="cb8-30">first_obs <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">year</span>(last_obs)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span></span>
<span id="cb8-31">rain_axis <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb8-32"></span>
<span id="cb8-33">wq_10 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-34">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ungroup</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-35">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(year <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2011</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-36">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarise</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.by =</span> year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">median_bacteria =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">median</span>(bacteria)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-37">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">left_join</span>(annl_rain) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-38">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arrange</span>(year) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-39">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> median_bacteria)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_col</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lightblue"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb8-40">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> annual_rainfall), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"blue"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb8-41">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_hline</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">yintercept =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">34</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb8-42">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">annotate</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"text"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2016</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">34</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">label =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'"Safe" Level = 34 Colonies'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">vjust =</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb8-43">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># show every year on the x-axis</span></span>
<span id="cb8-44">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_x_continuous</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">breaks =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2000</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2024</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb8-45">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># put annual_rainfall on secondary y-axis</span></span>
<span id="cb8-46">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_y_continuous</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sec.axis =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sec_axis</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>.<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>rain_axis, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Annual Rainfall (Blue Line)"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb8-47">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Median Bacteria Concentration (Blue Bar)"</span>,</span>
<span id="cb8-48">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_to_title</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"water is not getting cleaner over time"</span>),</span>
<span id="cb8-49">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">subtitle =</span> glue<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glue</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sites in NY Harbor Reporting Continuously from {first_obs} to {last_obs}"</span>)</span>
<span id="cb8-50">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb8-51">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>()</span></code></pre></div>
</details>
<div class="quarto-layout-panel" data-layout-ncol="2">
<div class="quarto-layout-row">
<div class="cell quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://outsiderdata.netlify.app/posts/2024-12-05-predicting-water-quality-in-new-york-harbor/oyster_files/figure-html/stations-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<div class="cell quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://outsiderdata.netlify.app/posts/2024-12-05-predicting-water-quality-in-new-york-harbor/oyster_files/figure-html/rain vs quality-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
</div>
</div>
<p>In addition to numeric variables, we have several categorical (aka “factor”) variables in the dataset. Each sampling site has a unique identifier, an associated sewershed, the body of water it is in (e.g.&nbsp;Hudson River) and a classification of suitable usage of the water body (e.g.&nbsp;fishing, swimming, etc.).</p>
<p>More recently, the dataset has included the lab that did the water analysis. It might be interesting to see if a lab bias exists but fewer than half the observations have this information, so we don’t expect much.</p>
<p>Are some sites cleaner or dirtier than others? Yes. As we would expect, sites in water bodies with more flow are cleaner.</p>
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># show boxplot of cleanest and dirtiest ----------------------------------------</span></span>
<span id="cb9-2">site_boxplots <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(wq_data, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">label =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Cleanest"</span>) {</span>
<span id="cb9-3">  median_all <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">median</span>(wq_data<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>bacteria)</span>
<span id="cb9-4">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># show a boxplot of bacteria concentration by site</span></span>
<span id="cb9-5">  selected_sites <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> wq_data <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb9-6">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># avoid Inf log values</span></span>
<span id="cb9-7">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># mutate(bacteria = bacteria + .001) |&gt;</span></span>
<span id="cb9-8">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># mutate(bacteria = ifelse(bacteria == 0, 1, bacteria)) |&gt; </span></span>
<span id="cb9-9">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(site) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb9-10">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">nest</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb9-11">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb9-12">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n_obs =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">nrow</span>(data)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb9-13">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(n_obs <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb9-14">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">median_bacteria =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">median</span>(data<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>bacteria)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb9-15">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ungroup</span>()</span>
<span id="cb9-16"></span>
<span id="cb9-17">  <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (label <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Cleanest"</span>) {</span>
<span id="cb9-18">    selected_sites <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">slice_min</span>(selected_sites, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">order_by =</span> median_bacteria, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>)</span>
<span id="cb9-19">  } <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> {</span>
<span id="cb9-20">    selected_sites <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">slice_max</span>(selected_sites, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">order_by =</span> median_bacteria, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>)</span>
<span id="cb9-21">  }</span>
<span id="cb9-22"></span>
<span id="cb9-23">  selected_sites <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb9-24">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unnest</span>(data) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb9-25">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">reorder</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">factor</span>(site), median_bacteria), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> bacteria)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb9-26">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_y_log10</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">oob =</span> scales<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>squish_infinite) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb9-27">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_boxplot</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lightblue"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb9-28">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># geom_violin(draw_quantiles = .5) +</span></span>
<span id="cb9-29">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># geom_jitter(width = .1) +</span></span>
<span id="cb9-30">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># annotate("text", x = log(SAFE)-2, y = 1500, label = "Safe Levels", color = "darkgreen") +</span></span>
<span id="cb9-31"></span>
<span id="cb9-32">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb9-33">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> glue<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glue</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"{label} Sites"</span>),</span>
<span id="cb9-34">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">subtitle =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Based on Median Bacteria Count"</span>,</span>
<span id="cb9-35">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Site"</span>,</span>
<span id="cb9-36">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Boxplot of Enterococci Concentration</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">(Log Scale)"</span></span>
<span id="cb9-37">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb9-38">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coord_flip</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb9-39">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_hline</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">yintercept =</span> SAFE,</span>
<span id="cb9-40">               <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>,</span>
<span id="cb9-41">               <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">linewidth =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb9-42">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">annotate</span>(</span>
<span id="cb9-43">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"label"</span>,</span>
<span id="cb9-44">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">9</span>,</span>
<span id="cb9-45">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> SAFE,</span>
<span id="cb9-46">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">label =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Safe</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Level"</span>,</span>
<span id="cb9-47">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span></span>
<span id="cb9-48">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb9-49">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>()</span>
<span id="cb9-50">}</span>
<span id="cb9-51"></span>
<span id="cb9-52"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">site_boxplots</span>(wq_data, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Cleanest"</span>)</span>
<span id="cb9-53"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">site_boxplots</span>(wq_data, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Dirtiest"</span>)</span></code></pre></div>
</details>
<div class="quarto-layout-panel" data-layout-ncol="2">
<div class="quarto-layout-row">
<div class="cell quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://outsiderdata.netlify.app/posts/2024-12-05-predicting-water-quality-in-new-york-harbor/oyster_files/figure-html/cleanest sites-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<div class="cell quarto-layout-cell" style="flex-basis: 50.0%;justify-content: flex-start;">
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://outsiderdata.netlify.app/posts/2024-12-05-predicting-water-quality-in-new-york-harbor/oyster_files/figure-html/dirtest_sites-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
</div>
</div>
</section>
<section id="lets-build-a-model" class="level2">
<h2 class="anchored" data-anchor-id="lets-build-a-model">Let’s Build a Model!</h2>
<p>Next to linear regression, random forests are one of the most popular machine learning algorithms. Random forests are an ensemble learning method, where multiple decision trees are trained on different subsets of the data and their predictions are then combined. Since random forests can handle a mix of categorical and numeric variables, they are well suited to our dataset. You can learn more about this class of models <a href="https://www.ibm.com/topics/random-forest">here</a>. The <code>tidymodels</code> suite of packages from <a href="https://posit.co/">posit.co</a> makes it easy to build and test a random forest model in R.</p>
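<p>To make the “many trees, one vote” idea concrete, here is a minimal sketch that bags decision trees by hand. It is not the model used in this post: it assumes the built-in <code>iris</code> data and the <code>rpart</code> package, and it only resamples rows. A real random forest, such as the one in <code>ranger</code>, also considers a random subset of predictors at each split.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Illustrative sketch only: bagged decision trees on the built-in iris data.
# Assumption: rpart stands in for the tree learner; this is not the post's model.
library(rpart)

set.seed(123)
n_trees &lt;- 25

# hold out 30 rows to predict on
test_rows &lt;- sample(nrow(iris), 30)
train &lt;- iris[-test_rows, ]
test &lt;- iris[test_rows, ]

# fit each tree on a bootstrap sample (rows drawn with replacement)
trees &lt;- lapply(seq_len(n_trees), function(i) {
  boot &lt;- train[sample(nrow(train), replace = TRUE), ]
  rpart(Species ~ ., data = boot, method = "class")
})

# each tree votes on every held-out row; the most common class wins
votes &lt;- sapply(trees, function(tree) {
  as.character(predict(tree, test, type = "class"))
})
ensemble_pred &lt;- apply(votes, 1, function(v) names(which.max(table(v))))

# share of held-out rows the ensemble classifies correctly
mean(ensemble_pred == test$Species)</code></pre></div>
<p>Averaging over many trees trained on slightly different data is what makes the ensemble more stable than any single tree.</p>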
<p>Models can easily <u>describe</u> any data set but we are interested in <u>predicting</u> out-of-sample data. To do this we need to split our data into a training set and a testing set. We will use the training set to build the model and the testing set to evaluate the model’s ability to predict. We have over 13,000 individual samples across all dates and sites. Our training set is a sort-of random sample of 75% of the data. I say “sort of” because we need to stratify the split so that the same proportion of “SAFE”, “CAUTION” and “UNSAFE” samples are in both the training and testing sets.</p>
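<p>As a rough illustration of what stratifying buys us, the sketch below splits a made-up data frame with <code>rsample::initial_split()</code> and compares class proportions in the two pieces. The <code>toy</code> data, its class mix and the object names are hypothetical and exist only for this example.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Illustrative sketch only: toy data, not the Billion Oyster Project dataset.
library(rsample)

set.seed(123)
toy &lt;- data.frame(
  quality = factor(sample(c("SAFE", "CAUTION", "UNSAFE"), 1000,
                          replace = TRUE, prob = c(0.6, 0.25, 0.15))),
  precip_week = rexp(1000)
)

# 75% training / 25% testing, stratified on the outcome
toy_split &lt;- initial_split(toy, prop = 0.75, strata = quality)
toy_train &lt;- training(toy_split)
toy_test &lt;- testing(toy_split)

# class proportions should be nearly the same in both sets
round(prop.table(table(toy_train$quality)), 2)
round(prop.table(table(toy_test$quality)), 2)</code></pre></div>
<p>Without <code>strata</code>, a rare class such as “UNSAFE” could end up over- or under-represented in the testing set purely by chance.</p>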
<p>At this point we are ready to set up the model. The tidymodels framework involves creating a workflow that starts with a model, then a “recipe” for preprocessing the data and finally a “fit” stage where the model is trained on the data. This framework makes it easy to try out different models and different recipes. We’ll use the random forest model in the <code>ranger</code> package. The recipe will normalize all the numeric variables. This will help with the skewness of the precipitation data.</p>
<p>Select data, add model, add recipe and fit. That’s it? Well, the coding is the easy part. In truth, there is a lot to think about in deciding what machine learning algorithm is best suited to the task, how to transform the inputs for imbalanced data and how to tune the model. The good news is the tidymodels framework makes it easy to experiment with different approaches. Check out the <a href="https://www.tidymodels.org/start/">tidymodels site</a> for a doorway to the rabbit hole of machine learning in R.</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Model ------------------------------------------------------------------------</span></span>
<span id="cb10-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># select only variables for model</span></span>
<span id="cb10-3">wq_subset <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> wq_data <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb10-4">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># make year a category, rather than numeric</span></span>
<span id="cb10-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">year =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.factor</span>(year)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb10-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(</span>
<span id="cb10-7">    quality,</span>
<span id="cb10-8">    site_id,</span>
<span id="cb10-9">    year,</span>
<span id="cb10-10">    time_since_high_tide,</span>
<span id="cb10-11">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># precip_t0,</span></span>
<span id="cb10-12">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># precip_48,</span></span>
<span id="cb10-13">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># precip_earlier,</span></span>
<span id="cb10-14">    precip_week,</span>
<span id="cb10-15">    water_body,</span>
<span id="cb10-16">    water_class,</span>
<span id="cb10-17">    sewershed,</span>
<span id="cb10-18">    lab</span>
<span id="cb10-19">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb10-20">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># remove rows with missing values</span></span>
<span id="cb10-21">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">drop_na</span>()</span>
<span id="cb10-22"></span>
<span id="cb10-23"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># or...we could aggregate the precipitation data</span></span>
<span id="cb10-24"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># wq_subset &lt;- wq_subset |&gt;</span></span>
<span id="cb10-25"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#   #sum across all precip columns</span></span>
<span id="cb10-26"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#   mutate(precip_total = precip_t0 + precip_48 + precip_earlier) |&gt;</span></span>
<span id="cb10-27"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#   select(-c(precip_t0, precip_48, precip_earlier))</span></span>
<span id="cb10-28"></span>
<span id="cb10-29"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># seeding the random number generator ensures reproducibility</span></span>
<span id="cb10-30"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">123</span>)</span>
<span id="cb10-31"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># split data into training and testing sets</span></span>
<span id="cb10-32">wq_split <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">initial_split</span>(wq_subset, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">prop =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.75</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">strata =</span> quality)</span>
<span id="cb10-33">wq_train <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">training</span>(wq_split)</span>
<span id="cb10-34">wq_test <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">testing</span>(wq_split)</span>
<span id="cb10-35"></span>
<span id="cb10-36"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># choose the model</span></span>
<span id="cb10-37">wq_rf <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rand_forest</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb10-38">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set_engine</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ranger"</span>,<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">importance =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"impurity"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb10-39">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set_mode</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"classification"</span>)</span>
<span id="cb10-40"></span>
<span id="cb10-41"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># create a recipe</span></span>
<span id="cb10-42">wq_recipe <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">recipe</span>(quality <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> ., <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> wq_train) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb10-43">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">step_normalize</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">all_numeric_predictors</span>())</span>
<span id="cb10-44"></span>
<span id="cb10-45"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># combine into a workflow</span></span>
<span id="cb10-46">wq_wf <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">workflow</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb10-47">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_model</span>(wq_rf) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb10-48">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_recipe</span>(wq_recipe)</span>
<span id="cb10-49"></span>
<span id="cb10-50"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># fit with training data</span></span>
<span id="cb10-51">wq_fit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> wq_wf <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb10-52">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fit</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> wq_train)</span></code></pre></div>
</details>
</div>
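<p>The tuning mentioned above is not done in this post, but a rough sketch of what it could look like follows. It uses R’s built-in <code>iris</code> data so that it runs on its own. The <code>toy_</code> objects, the five-fold resampling, the grid size and the choice of accuracy as the metric are all my assumptions for illustration, not settings used for the harbor model.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(tidymodels)

# Toy tuning sketch on iris; none of these settings come from the harbor model.
set.seed(123)
toy_split &lt;- initial_split(iris, prop = 0.75, strata = Species)
toy_train &lt;- training(toy_split)

# mark two random forest hyperparameters as "to be tuned"
toy_rf &lt;- rand_forest(trees = 200, mtry = tune(), min_n = tune()) |&gt;
  set_engine("ranger") |&gt;
  set_mode("classification")

toy_wf &lt;- workflow() |&gt;
  add_model(toy_rf) |&gt;
  add_formula(Species ~ .)

# 5-fold cross-validation, using the training set only
toy_folds &lt;- vfold_cv(toy_train, v = 5, strata = Species)

# try 5 candidate combinations of mtry and min_n
toy_tuned &lt;- tune_grid(toy_wf, resamples = toy_folds, grid = 5)

# keep the combination with the best cross-validated accuracy
best &lt;- select_best(toy_tuned, metric = "accuracy")

# refit on the full training set with the chosen values
toy_final &lt;- toy_wf |&gt;
  finalize_workflow(best) |&gt;
  fit(data = toy_train)</code></pre></div>
</div>
<p>The testing set stays untouched throughout, so it can still give an honest out-of-sample evaluation afterwards.</p>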
<p>Unlike linear regression, random forests are not easily interpretable. We can’t just look at the coefficients to see how each variable contributes to the prediction. After fitting to our training set, we can look at the variable importance plot. This plot shows the relative contribution of each variable to the prediction. Still, we don’t know the direction of the relationship or the strength of the relationship. We can only say that the variable is important in predicting the outcome.</p>
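<p>To make impurity-based importance concrete, here is a minimal sketch that calls <code>ranger</code> directly on the built-in <code>iris</code> data. The toy model and the number of trees are assumptions for illustration only. The scores are non-negative and carry no sign, which is why they say nothing about the direction of a relationship.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(ranger)

# Toy random forest on iris, not the harbor model
set.seed(123)
toy_rf &lt;- ranger(
  Species ~ .,
  data = iris,
  num.trees = 500,
  importance = "impurity"  # same importance type as in the workflow above
)

# a named numeric vector, one score per predictor, largest first
sort(toy_rf$variable.importance, decreasing = TRUE)</code></pre></div>
</div>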
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># show variable importance</span></span>
<span id="cb11-2">wq_fit <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb11-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_fit_parsnip</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb11-4">  vip<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vip</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">aesthetics =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lightblue"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb11-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Water Quality Variable Importance"</span>,</span>
<span id="cb11-6">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">subtitle =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Random Forest Classifier to Predict "Safe", "Caution" or "Unsafe"'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb11-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>()</span></code></pre></div>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://outsiderdata.netlify.app/posts/2024-12-05-predicting-water-quality-in-new-york-harbor/oyster_files/figure-html/variable importance-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>Rainfall looks to be the most important variable, followed by tide time and site location. Surprisingly, the sewershed and water body are far down the list. The more specific location factor, <code>site_id</code>, contributes more. Again, this is not like linear regression, but if we compare median weekly rainfall across the quality classes, there is some relation. This reinforces our intuition that CSOs are a driver of pollution.</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1">wq_subset <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb12-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarise</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.by =</span> quality, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">median_precip =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">median</span>(precip_week)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb12-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(quality, median_precip)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb12-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_col</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lightblue"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb12-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">subtitle =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Median Weekly Precipitation by Water Quality"</span>,</span>
<span id="cb12-6">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"More Rain, More Bacteria"</span>,</span>
<span id="cb12-7">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Water Quality"</span>,</span>
<span id="cb12-8">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Median Weekly Precipitation (inches)"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb12-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>()</span></code></pre></div>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://outsiderdata.netlify.app/posts/2024-12-05-predicting-water-quality-in-new-york-harbor/oyster_files/figure-html/rain vs bacteria-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>As noted above, we expect to get a good fit in-sample since we are really just describing the training set. The output of the model is expressed as the likelihood that each observation is from the “Safe”, “Caution” or “Unsafe” class. The plot below shows, for each observation in the training set, the predicted probability of “Safe” against the predicted probability of “Unsafe”. Each panel, and the color of its points, corresponds to the actual class. The diagonal line is where the model rates “Safe” and “Unsafe” as equally likely. The farther a point is from the line, the more strongly the model favors one of those two classes over the other.</p>
<p>For example, in the chart below, the “SAFE” panel contains all the observations where the bacteria level is in the “SAFE” range. The dots in the lower right corner are the observations where the model is nearly certain of its prediction: 100% likely that the water is “SAFE” and 0% likely that it is “UNSAFE.” Most of the remaining dots are also in the correct quadrant so our description of the training set is pretty good.</p>
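<p>A small hand-made example may help in reading these charts. The three rows of probabilities below are invented, and I am assuming the predicted class is simply the one with the highest probability. The <code>margin</code> column is the gap between the “Safe” and “Unsafe” probabilities, which is what distance from the diagonal line represents.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Three invented observations with class probabilities that sum to 1
probs &lt;- data.frame(
  .pred_Safe    = c(0.90, 0.40, 0.10),
  .pred_Caution = c(0.05, 0.35, 0.20),
  .pred_Unsafe  = c(0.05, 0.25, 0.70)
)
m &lt;- as.matrix(probs)

# each row is a full probability distribution over the three classes
rowSums(m)

# assumed rule: the predicted class is the most probable one
classes &lt;- sub("^\\.pred_", "", colnames(m))
pred_class &lt;- classes[max.col(m, ties.method = "first")]

# positive = leans "Safe", negative = leans "Unsafe", near zero = on the diagonal
margin &lt;- m[, ".pred_Safe"] - m[, ".pred_Unsafe"]

data.frame(probs, pred_class, margin)</code></pre></div>
</div>
<p>The first row would sit in the lower right corner of a panel, the third toward the upper left, and the second close to the diagonal.</p>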
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># plot prediction confidence</span></span>
<span id="cb13-2">wq_pred_train <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> wq_fit <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb13-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">augment</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">new_data =</span> wq_train)</span>
<span id="cb13-4"></span>
<span id="cb13-5">wq_pred_train <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb13-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> .pred_Safe, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> .pred_Unsafe, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> quality)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb13-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>quality) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb13-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb13-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_abline</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb13-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"green"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"orange"</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb13-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Prediction Confidence"</span>,</span>
<span id="cb13-12">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">subtitle =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"It's easy to get a good fit in-sample."</span>,</span>
<span id="cb13-13">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Probability Actual Is Safe"</span>,</span>
<span id="cb13-14">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Probability Actual Is Unsafe"</span>)</span></code></pre></div>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://outsiderdata.netlify.app/posts/2024-12-05-predicting-water-quality-in-new-york-harbor/oyster_files/figure-html/plot fit for training set-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
</section>
<section id="is-the-model-any-good" class="level2">
<h2 class="anchored" data-anchor-id="is-the-model-any-good">Is the Model Any Good?</h2>
<p>Now let’s apply the fitted model to the testing set and see how well it predicts the out-of-sample data. While the larger mass of points seems to be in the correct quadrants, there are many observations where the model is very confident but wrong.</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># plot prediction confidence</span></span>
<span id="cb14-2">wq_pred <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> wq_fit <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb14-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">augment</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">new_data =</span> wq_test)</span>
<span id="cb14-4"></span>
<span id="cb14-5">xt <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> wq_pred <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb14-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">conf_mat</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">truth =</span> quality, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">estimate =</span> .pred_class) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb14-7">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># return just the confusion matrix</span></span>
<span id="cb14-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pluck</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"table"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb14-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as_tibble</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb14-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(Prediction) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb14-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.prop =</span> n<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(n)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb14-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ungroup</span>()</span>
<span id="cb14-13"></span>
<span id="cb14-14">safe_risk <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> xt <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb14-15">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(Prediction <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Safe"</span>,Truth <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Safe"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb14-16">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pull</span>(.prop) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">prod</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>()</span>
<span id="cb14-17"></span>
<span id="cb14-18">unsafe_risk <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> xt <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb14-19">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(Prediction <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Safe"</span>,Truth <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Unsafe"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb14-20">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pull</span>(.prop) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">prod</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>()</span>
<span id="cb14-21"></span>
<span id="cb14-22">wq_pred <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb14-23">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> .pred_Safe, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> .pred_Unsafe, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> quality)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb14-24">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>quality) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb14-25">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb14-26">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_abline</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb14-27">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"green"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"orange"</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb14-28">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Prediction Confidence"</span>,</span>
<span id="cb14-29">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">subtitle =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Many predictions are confidently wrong."</span>,</span>
<span id="cb14-30">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Probability Actual Is Safe"</span>,</span>
<span id="cb14-31">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Probability Actual Is Unsafe"</span>)</span></code></pre></div>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://outsiderdata.netlify.app/posts/2024-12-05-predicting-water-quality-in-new-york-harbor/oyster_files/figure-html/plot fit for test set-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>We can more precisely quantify the performance of the model with a “truth table,” also known as a confusion matrix. Here we see the actual and predicted counts of each class. Given a prediction, how likely is it to be correct? 62% of the “SAFE” predictions were correct, but 18% of the time when the model said the water was “SAFE” it was actually “UNSAFE,” which is probably the most problematic result. Would I swim in “safe” water where there is an 18% chance I might get sick? No.&nbsp;The model also has very little ability to identify “CAUTION” conditions. In fairness, the range of bacteria levels between “SAFE” and “UNSAFE” is very narrow so we expect that only slight changes in conditions would tip the observation into another category.</p>
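<p>The arithmetic behind “given a prediction, how likely is it to be correct?” can be sketched in base R. The counts below are invented. Only the “Safe” row was chosen to echo the 62% and 18% quoted above, and the layout, with predictions in rows and truth in columns, is an assumption of this sketch.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Invented confusion matrix: rows are predictions, columns are the truth
lv &lt;- c("Safe", "Caution", "Unsafe")
cm &lt;- matrix(
  c(620, 200, 180,
     10,  20,  20,
     70, 130, 400),
  nrow = 3, byrow = TRUE,
  dimnames = list(Prediction = lv, Truth = lv)
)

# share of each true class within a given prediction (each row sums to 1)
share &lt;- prop.table(cm, margin = 1)
round(100 * share)

# "Safe" predictions that were right, and those that were actually "Unsafe"
share["Safe", "Safe"]
share["Safe", "Unsafe"]

# for comparison: of the truly unsafe samples, how many did the model flag?
cm["Unsafe", "Unsafe"] / sum(cm[, "Unsafe"])</code></pre></div>
</div>
<p>Dividing by row totals answers “how much can I trust this prediction?” Dividing by column totals answers a different question: “how many of the truly unsafe days does the model catch?” For a swimmer both matter.</p>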
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># show a confusion matrix</span></span>
<span id="cb15-2">xt <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> wq_pred <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">conf_mat</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">truth =</span> quality, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">estimate =</span> .pred_class) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-4">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># return just the confusion matrix</span></span>
<span id="cb15-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pluck</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"table"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as_tibble</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(Prediction) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.prop =</span> n<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(n)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ungroup</span>()</span>
<span id="cb15-10"></span>
<span id="cb15-11">xt_count <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> xt <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>.prop) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_wider</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_from =</span> Truth,<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_from =</span> n) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-15">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Total =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(Safe,Caution,Unsafe)))</span>
<span id="cb15-16"></span>
<span id="cb15-17">gt_domain <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> xt_count <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>Total,<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>Prediction) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">max</span>()</span>
<span id="cb15-18"></span>
<span id="cb15-19">xt_prop <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> xt <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-20">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>n) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-21">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_wider</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_from =</span> Truth,<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_from =</span> .prop) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-22">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-23">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Total =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(Safe,Caution,Unsafe)))</span>
<span id="cb15-24"></span>
<span id="cb15-25"></span>
<span id="cb15-26">truth_table <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(xt,<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"count"</span>)){</span>
<span id="cb15-27">  gt_xt <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> xt <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-28">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gt</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">rowname_col =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Prediction"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-29">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tab_header</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Truth Table"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-30">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tab_spanner</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">label =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Truth"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">columns =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">where</span>(is.numeric)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-31">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># add stub header label</span></span>
<span id="cb15-32">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tab_stubhead</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">label =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Prediction"</span>)</span>
<span id="cb15-33"></span>
<span id="cb15-34">  <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span>(type <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"prop"</span>){</span>
<span id="cb15-35">    gt_xt <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> gt_xt <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fmt_percent</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">columns =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">where</span>(is.numeric),<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">decimals =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span>
<span id="cb15-36">  } <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> {</span>
<span id="cb15-37">    gt_xt <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> gt_xt <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fmt_number</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">columns =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">where</span>(is.numeric),<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">decimals =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-38">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">grand_summary_rows</span>(</span>
<span id="cb15-39">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fns =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">id=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Total"</span>,<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">label =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Total"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(.)</span>
<span id="cb15-40"></span>
<span id="cb15-41">      ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-42">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tab_style</span>(</span>
<span id="cb15-43">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">style =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cell_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">weight =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"bold"</span>),</span>
<span id="cb15-44">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">locations =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cells_stub_grand_summary</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">rows =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Total"</span>))</span>
<span id="cb15-45">      )</span>
<span id="cb15-46"></span>
<span id="cb15-47">  }</span>
<span id="cb15-48">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># fmt_number(columns = where(is.numeric),decimals = 0) |&gt;</span></span>
<span id="cb15-49">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># fmt_percent(columns = where(is.numeric),decimals = 0) |&gt;</span></span>
<span id="cb15-50">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># color the cells with a heat map</span></span>
<span id="cb15-51">  gt_xt <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> gt_xt <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-52">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">data_color</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">columns =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>,</span>
<span id="cb15-53">               <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">direction =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"row"</span>),</span>
<span id="cb15-54">               <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">domain =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">if_else</span>(type <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"count"</span>,gt_domain,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)),</span>
<span id="cb15-55">               <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">method =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"numeric"</span>,</span>
<span id="cb15-56">               <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">palette =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Blues"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-57">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># color prediction labels</span></span>
<span id="cb15-58">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tab_style</span>(</span>
<span id="cb15-59">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">style =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cell_fill</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"green"</span>),<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cell_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"black"</span>)),</span>
<span id="cb15-60">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">locations =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cells_column_labels</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Safe"</span>),</span>
<span id="cb15-61">                       <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cells_body</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">column =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">row =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>))</span>
<span id="cb15-62">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-63">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tab_style</span>(</span>
<span id="cb15-64">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">style =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cell_fill</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"yellow"</span>),<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cell_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"black"</span>)),</span>
<span id="cb15-65">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">locations =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cells_column_labels</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Caution"</span>),</span>
<span id="cb15-66">                       <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cells_body</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">column =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">row =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>))</span>
<span id="cb15-67">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-68">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tab_style</span>(</span>
<span id="cb15-69">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">style =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cell_fill</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>),<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cell_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"white"</span>)),</span>
<span id="cb15-70">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">locations =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cells_column_labels</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Unsafe"</span>),</span>
<span id="cb15-71">                       <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cells_body</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">column =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">row =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>))</span>
<span id="cb15-72">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-73">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># color Truth labels</span></span>
<span id="cb15-74">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tab_style</span>(</span>
<span id="cb15-75">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">style =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cell_fill</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"green"</span>),<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cell_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"black"</span>)),</span>
<span id="cb15-76">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">locations =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cells_stub</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Safe"</span>)</span>
<span id="cb15-77">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-78">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tab_style</span>(</span>
<span id="cb15-79">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">style =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cell_fill</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"yellow"</span>),<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cell_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"black"</span>)),</span>
<span id="cb15-80">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">locations =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cells_stub</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Caution"</span>)</span>
<span id="cb15-81">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-82">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tab_style</span>(</span>
<span id="cb15-83">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">style =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cell_fill</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>),<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cell_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"white"</span>)),</span>
<span id="cb15-84">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">locations =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cells_stub</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Unsafe"</span>)</span>
<span id="cb15-85">    )  <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-86">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tab_style</span>(</span>
<span id="cb15-87">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">style =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cell_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">weight =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"bold"</span>),</span>
<span id="cb15-88">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">locations =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cells_body</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">columns =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>),</span>
<span id="cb15-89">                       <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cells_column_labels</span>(),</span>
<span id="cb15-90">                       <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cells_column_spanners</span>(),</span>
<span id="cb15-91">                       <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cells_title</span>(),</span>
<span id="cb15-92">                       <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cells_stub</span>(),</span>
<span id="cb15-93">                       <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cells_stubhead</span>())</span>
<span id="cb15-94">    )</span>
<span id="cb15-95">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">return</span>(gt_xt)</span>
<span id="cb15-96">}</span>
<span id="cb15-97"></span>
<span id="cb15-98">roc <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> wq_pred <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-99">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">roc_auc</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">truth =</span> quality, .pred_Safe,.pred_Caution,.pred_Unsafe)</span>
<span id="cb15-100"></span>
<span id="cb15-101"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">truth_table</span>(xt_prop,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"prop"</span>)</span></code></pre></div>
</details>
<div class="cell-output-display">
<div id="whyaylitys" style="padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
<style>#whyaylitys table {
  font-family: system-ui, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol', 'Noto Color Emoji';
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
}

#whyaylitys thead, #whyaylitys tbody, #whyaylitys tfoot, #whyaylitys tr, #whyaylitys td, #whyaylitys th {
  border-style: none;
}

#whyaylitys p {
  margin: 0;
  padding: 0;
}

#whyaylitys .gt_table {
  display: table;
  border-collapse: collapse;
  line-height: normal;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 16px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

#whyaylitys .gt_caption {
  padding-top: 4px;
  padding-bottom: 4px;
}

#whyaylitys .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

#whyaylitys .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 3px;
  padding-bottom: 5px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

#whyaylitys .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#whyaylitys .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#whyaylitys .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#whyaylitys .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

#whyaylitys .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

#whyaylitys .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

#whyaylitys .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

#whyaylitys .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 5px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

#whyaylitys .gt_spanner_row {
  border-bottom-style: hidden;
}

#whyaylitys .gt_group_heading {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  text-align: left;
}

#whyaylitys .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

#whyaylitys .gt_from_md > :first-child {
  margin-top: 0;
}

#whyaylitys .gt_from_md > :last-child {
  margin-bottom: 0;
}

#whyaylitys .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

#whyaylitys .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
}

#whyaylitys .gt_stub_row_group {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
  vertical-align: top;
}

#whyaylitys .gt_row_group_first td {
  border-top-width: 2px;
}

#whyaylitys .gt_row_group_first th {
  border-top-width: 2px;
}

#whyaylitys .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#whyaylitys .gt_first_summary_row {
  border-top-style: solid;
  border-top-color: #D3D3D3;
}

#whyaylitys .gt_first_summary_row.thick {
  border-top-width: 2px;
}

#whyaylitys .gt_last_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#whyaylitys .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#whyaylitys .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

#whyaylitys .gt_last_grand_summary_row_top {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: double;
  border-bottom-width: 6px;
  border-bottom-color: #D3D3D3;
}

#whyaylitys .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

#whyaylitys .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#whyaylitys .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#whyaylitys .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#whyaylitys .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#whyaylitys .gt_sourcenote {
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#whyaylitys .gt_left {
  text-align: left;
}

#whyaylitys .gt_center {
  text-align: center;
}

#whyaylitys .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

#whyaylitys .gt_font_normal {
  font-weight: normal;
}

#whyaylitys .gt_font_bold {
  font-weight: bold;
}

#whyaylitys .gt_font_italic {
  font-style: italic;
}

#whyaylitys .gt_super {
  font-size: 65%;
}

#whyaylitys .gt_footnote_marks {
  font-size: 75%;
  vertical-align: 0.4em;
  position: initial;
}

#whyaylitys .gt_asterisk {
  font-size: 100%;
  vertical-align: 0;
}

#whyaylitys .gt_indent_1 {
  text-indent: 5px;
}

#whyaylitys .gt_indent_2 {
  text-indent: 10px;
}

#whyaylitys .gt_indent_3 {
  text-indent: 15px;
}

#whyaylitys .gt_indent_4 {
  text-indent: 20px;
}

#whyaylitys .gt_indent_5 {
  text-indent: 25px;
}

#whyaylitys .katex-display {
  display: inline-flex !important;
  margin-bottom: 0.75em !important;
}

#whyaylitys div.Reactable > div.rt-table > div.rt-thead > div.rt-tr.rt-tr-group-header > div.rt-th-group:after {
  height: 0px !important;
}
</style>

<table class="gt_table caption-top table table-sm table-striped small" data-quarto-postprocess="true" data-quarto-disable-processing="false" data-quarto-bootstrap="false">
<colgroup>
<col style="width: 20%">
<col style="width: 20%">
<col style="width: 20%">
<col style="width: 20%">
<col style="width: 20%">
</colgroup>
<thead>
<tr class="gt_heading header">
<th colspan="5" class="gt_heading gt_title gt_font_normal gt_bottom_border" style="font-weight: bold">Truth Table</th>
</tr>
<tr class="gt_col_headings gt_spanner_row even">
<th rowspan="2" id="a::stub" class="gt_col_heading gt_columns_bottom_border gt_left" data-quarto-table-cell-role="th" style="font-weight: bold" scope="col">Prediction</th>
<th colspan="4" id="Truth" class="gt_center gt_columns_top_border gt_column_spanner_outer" data-quarto-table-cell-role="th" style="font-weight: bold" scope="colgroup"><div class="gt_column_spanner">
Truth
</div></th>
</tr>
<tr class="gt_col_headings header">
<th id="Safe" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" style="background-color: #00FF00; color: #000000; font-weight: bold" scope="col">Safe</th>
<th id="Caution" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" style="background-color: #FFFF00; color: #000000; font-weight: bold" scope="col">Caution</th>
<th id="Unsafe" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" style="background-color: #FF0000; color: #FFFFFF; font-weight: bold" scope="col">Unsafe</th>
<th id="Total" class="gt_col_heading gt_columns_bottom_border gt_right" data-quarto-table-cell-role="th" style="font-weight: bold" scope="col">Total</th>
</tr>
</thead>
<tbody class="gt_table_body">
<tr class="odd">
<td id="stub_1_1" class="gt_row gt_left gt_stub" data-quarto-table-cell-role="th" scope="row" style="background-color: #00FF00; color: #000000; font-weight: bold">Safe</td>
<td class="gt_row gt_right" headers="stub_1_1 Safe" style="background-color: #4493C7; color: #FFFFFF">62%</td>
<td class="gt_row gt_right" headers="stub_1_1 Caution" style="background-color: #D0E2F2; color: #000000">20%</td>
<td class="gt_row gt_right" headers="stub_1_1 Unsafe" style="background-color: #D3E3F3; color: #000000">18%</td>
<td class="gt_row gt_right" headers="stub_1_1 Total">100%</td>
</tr>
<tr class="even">
<td id="stub_1_2" class="gt_row gt_left gt_stub" data-quarto-table-cell-role="th" scope="row" style="background-color: #FFFF00; color: #000000; font-weight: bold">Caution</td>
<td class="gt_row gt_right" headers="stub_1_2 Safe" style="background-color: #94C4DF; color: #000000">40%</td>
<td class="gt_row gt_right" headers="stub_1_2 Caution" style="background-color: #BAD6EB; color: #000000">29%</td>
<td class="gt_row gt_right" headers="stub_1_2 Unsafe" style="background-color: #B3D3E8; color: #000000">31%</td>
<td class="gt_row gt_right" headers="stub_1_2 Total">100%</td>
</tr>
<tr class="odd">
<td id="stub_1_3" class="gt_row gt_left gt_stub" data-quarto-table-cell-role="th" scope="row" style="background-color: #FF0000; color: #FFFFFF; font-weight: bold">Unsafe</td>
<td class="gt_row gt_right" headers="stub_1_3 Safe" style="background-color: #D6E6F4; color: #000000">17%</td>
<td class="gt_row gt_right" headers="stub_1_3 Caution" style="background-color: #D4E4F4; color: #000000">18%</td>
<td class="gt_row gt_right" headers="stub_1_3 Unsafe" style="background-color: #3B89C2; color: #FFFFFF">66%</td>
<td class="gt_row gt_right" headers="stub_1_3 Total">100%</td>
</tr>
</tbody>
</table>

</div>
</div>
</div>
<p>Another common way to visualize the performance of a model is with a “receiver operating characteristic” (ROC) curve. The ROC curve shows the trade-off between the true positive rate and the false positive rate for each class as the decision threshold varies. The larger the area under the curve (AUC), the better the model. An AUC of 1.0 is perfect, while 0.5 is what random chance would produce. In our case we have a combined AUC of 0.73, which is better than useless but not great.</p>
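<p>To make the AUC less abstract, here is a minimal sketch of one way to compute it for a single class against the rest. It relies on the fact that the AUC equals the probability that a randomly chosen positive case gets a higher score than a randomly chosen negative case. The <code>truth</code> and <code>score</code> vectors and the <code>auc_rank()</code> helper are made up for illustration. They are not the harbor data, and this is not a description of how <code>roc_auc()</code> works internally or how it combines the three classes into one number.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># toy one-vs-rest example: hypothetical labels and scores, not the harbor data
truth &lt;- c(1, 1, 1, 0, 0, 0, 1, 0)   # 1 = "Unsafe", 0 = any other class
score &lt;- c(0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.5)

# hypothetical helper: AUC as the share of positive/negative pairs
# in which the positive case has the higher score (ties count as half)
auc_rank &lt;- function(truth, score) {
  pos &lt;- score[truth == 1]
  neg &lt;- score[truth == 0]
  # compare every positive score with every negative score
  wins &lt;- outer(pos, neg, function(p, n) (p &gt; n) + 0.5 * (p == n))
  mean(wins)
}

# 13 of the 16 positive/negative pairs are ordered correctly: 13 / 16 = 0.8125
auc_rank(truth, score)</code></pre></div>
</div>
<p>A model that ranks every positive above every negative scores 1.0 by this measure, and a model that gives every case the same score lands at 0.5, which matches the two reference points above.</p>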
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># plot a ROC curve</span></span>
<span id="cb16-2"></span>
<span id="cb16-3">wq_pred <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb16-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">roc_curve</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">truth =</span> quality, .pred_Safe,.pred_Caution,.pred_Unsafe) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb16-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> specificity, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> sensitivity,<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color=</span>.level)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb16-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_path</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb16-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"orange"</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"green"</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb16-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_abline</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lty =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb16-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coord_equal</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb16-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb16-11">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ROC Curve"</span>,</span>
<span id="cb16-12">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">subtitle =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"All Three Water Classifications"</span>,</span>
<span id="cb16-13">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"False Positive Rate"</span>,</span>
<span id="cb16-14">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"True Positive Rate"</span>,</span>
<span id="cb16-15">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Classification"</span></span>
<span id="cb16-16">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb16-17">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_bw</span>()</span></code></pre></div>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://outsiderdata.netlify.app/posts/2024-12-05-predicting-water-quality-in-new-york-harbor/oyster_files/figure-html/plot ROC-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>We can easily tweak this model. Splitting rainfall into three separate time windows, rather than using a single full-week total, improves the truth table slightly but masks rainfall as the most important factor. We can quickly visualize the difference using the ROC curve.</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># select only variables for model</span></span>
<span id="cb17-2">wq_subset_2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> wq_data <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb17-3">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># make year a category, rather than numeric</span></span>
<span id="cb17-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">year =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.factor</span>(year)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb17-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(</span>
<span id="cb17-6">    quality,</span>
<span id="cb17-7">    site_id,</span>
<span id="cb17-8">    year,</span>
<span id="cb17-9">    time_since_high_tide,</span>
<span id="cb17-10">    precip_t0,</span>
<span id="cb17-11">    precip_48,</span>
<span id="cb17-12">    precip_earlier,</span>
<span id="cb17-13">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># precip_week,</span></span>
<span id="cb17-14">    water_body,</span>
<span id="cb17-15">    water_class,</span>
<span id="cb17-16">    sewershed,</span>
<span id="cb17-17">    lab</span>
<span id="cb17-18">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb17-19">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># remove rows with missing values</span></span>
<span id="cb17-20">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">drop_na</span>()</span>
<span id="cb17-21"></span>
<span id="cb17-22"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># or...we could aggregate the precipitation data</span></span>
<span id="cb17-23"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># wq_subset &lt;- wq_subset |&gt;</span></span>
<span id="cb17-24"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#   #sum across all precip columns</span></span>
<span id="cb17-25"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#   mutate(precip_total = precip_t0 + precip_48 + precip_earlier) |&gt;</span></span>
<span id="cb17-26"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#   select(-c(precip_t0, precip_48, precip_earlier))</span></span>
<span id="cb17-27"></span>
<span id="cb17-28"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># seeding the random number generator ensures reproducibility</span></span>
<span id="cb17-29"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">123</span>)</span>
<span id="cb17-30"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># split data into training and testing sets</span></span>
<span id="cb17-31">wq_split_2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">initial_split</span>(wq_subset_2, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">prop =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.75</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">strata =</span> quality)</span>
<span id="cb17-32">wq_train_2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">training</span>(wq_split_2)</span>
<span id="cb17-33">wq_test_2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">testing</span>(wq_split_2)</span>
<span id="cb17-34"></span>
<span id="cb17-35"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># choose the model</span></span>
<span id="cb17-36">wq_rf_2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rand_forest</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb17-37">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set_engine</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ranger"</span>,<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">importance =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"impurity"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb17-38">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set_mode</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"classification"</span>)</span>
<span id="cb17-39"></span>
<span id="cb17-40"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># create a recipe</span></span>
<span id="cb17-41">wq_recipe_2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">recipe</span>(quality <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> ., <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> wq_train_2) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb17-42">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">step_normalize</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">all_numeric_predictors</span>())</span>
<span id="cb17-43"></span>
<span id="cb17-44"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># combine into a workflow</span></span>
<span id="cb17-45">wq_wf_2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">workflow</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb17-46">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_model</span>(wq_rf_2) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb17-47">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_recipe</span>(wq_recipe_2)</span>
<span id="cb17-48"></span>
<span id="cb17-49"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># fit with training data</span></span>
<span id="cb17-50">wq_fit_2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> wq_wf_2 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb17-51">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fit</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> wq_train_2)</span>
<span id="cb17-52"></span>
<span id="cb17-53">wq_pred_2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> wq_fit_2 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb17-54">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">augment</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">new_data =</span> wq_test_2)</span>
<span id="cb17-55"></span>
<span id="cb17-56"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># plot a ROC curve</span></span>
<span id="cb17-57"></span>
<span id="cb17-58">rc1 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> wq_pred <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb17-59">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">roc_curve</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">truth =</span> quality, .pred_Safe,.pred_Caution,.pred_Unsafe) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb17-60">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(.level <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Unsafe"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb17-61">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.level =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Single Full Week Precip Factor"</span>)</span>
<span id="cb17-62"></span>
<span id="cb17-63">rc2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> wq_pred_2 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb17-64">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">roc_curve</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">truth =</span> quality, .pred_Safe,.pred_Caution,.pred_Unsafe) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb17-65">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(.level <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Unsafe"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb17-66">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.level =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"3 Precip Factors"</span>)</span>
<span id="cb17-67"></span>
<span id="cb17-68"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bind_rows</span>(rc1, rc2) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb17-69">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb17-70">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> specificity,</span>
<span id="cb17-71">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> sensitivity,</span>
<span id="cb17-72">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> .level</span>
<span id="cb17-73">  )) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb17-74">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_path</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb17-75">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_abline</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lty =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb17-76">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb17-77">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Effect of Feature Choices on Accuracy"</span>,</span>
<span id="cb17-78">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">subtitle =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ROC Curve for "SAFE" class only'</span>,</span>
<span id="cb17-79">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"False Positive Rate"</span>,</span>
<span id="cb17-80">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"True Positive Rate"</span>,</span>
<span id="cb17-81">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Features Choice"</span></span>
<span id="cb17-82">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb17-83">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coord_equal</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb17-84">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_bw</span>()</span></code></pre></div>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://outsiderdata.netlify.app/posts/2024-12-05-predicting-water-quality-in-new-york-harbor/oyster_files/figure-html/alt feature set-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>There are other metrics we could use, but you get the idea. We could also tune the model’s hyperparameters to try to improve performance, but in this case I found that tuning did not help much, so I don’t include it here. Refer to the tidymodels site for tuning tips and more performance statistics.</p>
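<p>For readers curious what tuning looks like, here is a minimal sketch. It is not the code used for this post. It assumes a random forest fit with the <code>ranger</code> engine and uses a made-up two-class data set (<code>toy</code>, with hypothetical columns <code>precip_48</code> and <code>tide_level</code>) in place of the BOP data. It tries three values of <code>min_n</code> with five-fold cross-validation and ranks them by ROC AUC.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(tidymodels) # the ranger package must also be installed

set.seed(123)

# Hypothetical stand-in for the water quality data (not the BOP data set)
toy &lt;- tibble(
  precip_48  = rexp(300),
  tide_level = runif(300, -1, 1),
  quality    = factor(
    ifelse(precip_48 + rnorm(300, sd = 0.5) &gt; 1, "UNSAFE", "SAFE"),
    levels = c("SAFE", "UNSAFE")
  )
)

# Five-fold cross-validation, stratified on the outcome
folds &lt;- vfold_cv(toy, v = 5, strata = quality)

# Mark min_n as a parameter to be tuned
rf_spec &lt;- rand_forest(trees = 200, min_n = tune()) |&gt;
  set_engine("ranger") |&gt;
  set_mode("classification")

rf_wf &lt;- workflow() |&gt;
  add_formula(quality ~ precip_48 + tide_level) |&gt;
  add_model(rf_spec)

# Fit every candidate value on every fold and score by ROC AUC
tuned &lt;- tune_grid(
  rf_wf,
  resamples = folds,
  grid = tibble(min_n = c(2, 10, 30)),
  metrics = metric_set(roc_auc)
)

show_best(tuned, metric = "roc_auc")</code></pre></div>
<p>If the best and worst candidates score about the same, as they did for me on the real data, tuning is probably not where the remaining accuracy is hiding.</p>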
<p>We could also look for other variables not present in the BOP data set that might be useful. I’ve looked at imputing tidal current and including daily temperature. Neither made much of a difference. You can see <a href="https://github.com/apsteinmetz/oyster">these experiments on GitHub</a>. They include techniques for downloading tide and weather data from NOAA and matching NOAA stations to water sampling stations.</p>
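<p>Matching each sampling site to a NOAA station can be sketched as a nearest-neighbor lookup by great-circle distance. This is an illustration in base R, not the code from the repository. The helper <code>haversine_km()</code> is my own name, and the site names, station names and coordinates are made up.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Great-circle distance in kilometers between two latitude/longitude points
haversine_km &lt;- function(lat1, lon1, lat2, lon2) {
  to_rad &lt;- pi / 180
  dlat &lt;- (lat2 - lat1) * to_rad
  dlon &lt;- (lon2 - lon1) * to_rad
  a &lt;- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * 6371 * asin(pmin(1, sqrt(a)))
}

# Hypothetical coordinates, for illustration only
sampling_sites &lt;- data.frame(
  site = c("site_1", "site_2"),
  lat  = c(40.70, 40.80),
  lon  = c(-74.02, -73.78)
)
noaa_stations &lt;- data.frame(
  station = c("noaa_A", "noaa_B"),
  lat     = c(40.70, 40.81),
  lon     = c(-74.01, -73.76)
)

# Distance matrix: one row per sampling site, one column per NOAA station
dist_km &lt;- outer(
  seq_len(nrow(sampling_sites)),
  seq_len(nrow(noaa_stations)),
  function(i, j) {
    haversine_km(
      sampling_sites$lat[i], sampling_sites$lon[i],
      noaa_stations$lat[j], noaa_stations$lon[j]
    )
  }
)

# For each site, keep the station with the smallest distance
sampling_sites$nearest_station &lt;-
  noaa_stations$station[apply(dist_km, 1, which.min)]

sampling_sites</code></pre></div>
<p>With only a few dozen sites and stations, the full distance matrix is small, so there is no need for a spatial index.</p>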
</section>
<section id="conclusion" class="level2">
<h2 class="anchored" data-anchor-id="conclusion">Conclusion</h2>
<p>In this project we’ve seen how easy it is to build a powerful machine learning model using the <code>tidymodels</code> tools. As citizen scientists we were able to gather and use publicly available data to gain insight into the factors influencing water quality in New York Harbor. We’ve also learned about how important oysters were to the city in days past and can dream about a day (alas, not likely in our lifetimes) when oysters “as big as dinner plates” are once again living in the harbor.</p>
<p>Don’t forget to check out <a href="https://www.billionoysterproject.org/">The Billion Oyster Project</a>. They have lots of ways to contribute!</p>


</section>

 ]]></description>
  <guid>https://outsiderdata.netlify.app/posts/2024-12-05-predicting-water-quality-in-new-york-harbor/oyster.html</guid>
  <pubDate>Thu, 05 Dec 2024 05:00:00 GMT</pubDate>
  <media:content url="https://outsiderdata.netlify.app/posts/2024-12-05-predicting-water-quality-in-new-york-harbor/img/Oyster.jpg" medium="image" type="image/jpeg"/>
</item>
</channel>
</rss>
