The image above did not exist on the Web (or anywhere) until I went to the Stable Diffusion site & entered these nine words:
"A threedimensional model of shrimp in a frying pan."
After less than one minute, the AI agent produced that image.
This is the optional material for the post on AI/ML & aquaculture.
The first section describes the traditional way of solving problems — plugging data into equations to get answers — and its limitations for modeling complex systems like RAS.
The second section outlines the Machine Learning approach, which is data-driven.
The third section is a brief comparison of those two approaches.
The final section is the most important: It underlines the need for loads of data to build useful ML models. It also touches on a data hurdle that must be cleared to develop ML models for RAS.
ML & Aquaculture Table of Contents
 (optional) The Traditional Approach
 (optional) The Machine Learning Alternative
 (optional) The Tale of the Tape
 (optional) Data Augmentation
[Optional] Traditional Approach
The traditional approach to solving problems is to collect data and use rules — often equations — to produce answers.
An Example
Suppose we measure the concentration of Total Ammonia-Nitrogen (TAN) in an L. vannamei biofloc raceway.
We need to know the percentage of un-ionized ammonia-nitrogen (UIA-N) — the more toxic form — in our water sample.
Well, there's a mathematical formula (the rules), based on our understanding of chemistry, that tells us how to combine pH, temperature, & salinity (the data) to compute the percentage of un-ionized ammonia (the answer).
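That rules-based calculation can be sketched in a few lines of Python. This version uses the widely cited temperature-dependent pKa of Emerson et al. (1975), which is a freshwater approximation — the salinity correction mentioned above is omitted here for simplicity, so treat the numbers as illustrative rather than lab-grade:

```python
import math

def unionized_ammonia_fraction(ph: float, temp_c: float) -> float:
    """Fraction of TAN present as toxic un-ionized ammonia (NH3).

    Freshwater approximation (Emerson et al., 1975); a salinity
    correction would shift pKa slightly and is omitted here.
    """
    temp_k = temp_c + 273.15
    pka = 0.09018 + 2729.92 / temp_k          # temperature-dependent pKa
    return 1.0 / (1.0 + 10 ** (pka - ph))     # Henderson-Hasselbalch form

# Example: warm biofloc water at pH 7.8 and 28 degC
frac = unionized_ammonia_fraction(7.8, 28.0)
print(f"UIA-N: {frac * 100:.2f}% of TAN")
```

Note how this is the traditional approach in miniature: data in (pH, temperature), predefined rule applied, answer out.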
That approach is the workhorse of science & engineering.
It generally produces good answers for systems that are...

simple (one or few variables)

deterministic (no randomness)

homogeneous (not spatially distributed)

at steady-state (not time-varying)
The Limitation
When any of those conditions are relaxed — as they are in the real world — things get more complicated and answers suffer.
We just don’t understand complex systems well enough to adequately describe their behavior with equations.
RAS is a Complex System
Recirculating Aquaculture Systems (RAS) are such complex, real-world systems.
Recirc systems...

have many and diverse “moving parts” — biological, chemical, physical, and (not least) financial

are better viewed on some scales as stochastic (i.e., random) instead of deterministic

are 'homogeneous enough' in small-scale tanks, but not always in large-scale culture units

are dynamic over the timescale of a production cycle (i.e., they're not at steady-state)
The result is that the traditional approach has limited value for understanding and managing realworld systems.
Even if we could describe the complexities of RAS mathematically...
...we'd still be faced with solving a high-dimensional, nonlinear, dynamic system of equations.
Such systems generally have no analytic solution; they're solved numerically, and that involves nontrivial computational issues.
In the end, we’d have an impressive set of equations that represents our theoretical understanding of RAS...but little (or no) predictive advantage to assist us in enhancing sustainable seafood production.
[Optional] What's the alternative?
The traditional approach is woven so tightly into the fabric of science and engineering that we might well wonder if there really is any other way to solve problems.
An alternative is Machine Learning (ML).
The ML approach flips the script of the traditional approach.
In the ML approach, we have data and answers but
— unlike the traditional approach —
we do not have the rules which connect them.
Why don’t we have the rules (or a formula)?
Because the system we want to control is too complex to be described accurately by nice, neat equations: We just don’t know how to write (and solve) comprehensive rules that tell us how to calculate the output from the input.
OK. So...What do we do?
We train a machine learning model by passing it the data.
How?
In very general terms, there are four steps...

feed the model the input data (which we have)

compare the model's output (its prediction) with the actual result (which we also have)

calculate the error between the prediction and result

repeat until the error is so small that the model has “learned” how to map input data to the result with high accuracy
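The four steps above can be sketched as a minimal training loop. This is a deliberately tiny example — a one-parameter linear model fit by gradient descent to made-up data — not a production ML pipeline, but the feed / compare / compute-error / repeat cycle is exactly the one described:

```python
import numpy as np

# Toy "system": the true (unknown-to-the-model) rule is y = 3x + 1, plus noise.
rng = np.random.default_rng(42)
x = rng.uniform(0.0, 1.0, 200)                    # input data (which we have)
y = 3.0 * x + 1.0 + rng.normal(0.0, 0.05, 200)    # answers (which we also have)

w, b, lr = 0.0, 0.0, 0.5                          # model parameters, learning rate
for step in range(1000):
    pred = w * x + b                              # 1) feed the model the input
    err = pred - y                                # 2-3) compare & compute the error
    w -= lr * 2.0 * np.mean(err * x)              # 4) adjust parameters...
    b -= lr * 2.0 * np.mean(err)                  #    ...and repeat

print(f"learned w={w:.2f}, b={b:.2f}")            # should approach 3 and 1
```

The model was never told the rule "y = 3x + 1"; it recovered it by repeatedly shrinking the error between its predictions and the observed answers.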
In this way, ML models learn to identify patterns in the data that relate the input to the output.
Welltrained ML models can forecast the state of complex systems with accuracies exceeding those of traditional models.
[Optional] The Tale of the Tape
Here's a brief rundown that compares the traditional approach with the ML approach...
The Rules

Traditional models use predefined rules
 e.g., the rules of stoichiometry that quantify relationships among chemical substances in a reaction.

ML models "learn" the rules from the data
 e.g., the standard linear regression models you study in a Statistics course.
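To make the linear-regression comparison concrete, here's a minimal sketch (with made-up numbers) of "learning the rule from the data" via ordinary least squares — the closed-form cousin of the iterative training described earlier:

```python
import numpy as np

# Observed (input, answer) pairs; no rule is supplied to the model.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])        # roughly y = 2x

# Least squares "learns" the rule (slope & intercept) from the data.
A = np.column_stack([x, np.ones_like(x)])
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)
print(f"learned rule: y = {slope:.2f}*x + {intercept:.2f}")
```

The fitted coefficients are the "rules" — extracted from data rather than written down in advance.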
Interpretability
 Traditional models are interpretable
 They explicitly include physical, chemical, & biological parts of the system: temperature, dissociation constants, biomass, feeding rate, TDS, etc.
 ML models are NOT interpretable
 They’re 'black boxes' that hide causality.
Forecasting

Traditional models are built on semantic inference
 They're designed with an understanding of the relationships linking inputs to outputs. (e.g., shrimp eat pellets, use oxygen, produce ammonia...)

ML models are built on statistical inference
 They don't explicitly address the physical, chemical, & biological mechanisms that turn inputs into outputs.
The ML paradigm calls to mind a much-cited quote from iconic computer scientist Ken Thompson.
[Optional] Data Augmentation
Google's Machine Learning Rule #1:
"Machine learning is cool, but it requires data."
ML is a glutton for high-quality data, and data is the "critical infrastructure" at the core of robust machine-learning models.
But it’s not always feasible to collect enough data to satisfy ML's appetite, and without ample data, model predictions suffer.
This is generally the case for RAS, as it might take 3–4 months to collect a sufficient time-series dataset for a single crop.
Additionally, the cost of introducing realistic — and potentially crop-threatening — water-quality changes into a production tank to train an anomaly-detection model is unacceptably high.
Broad & Shallow vs. Narrow & Deep
Not all datasets are equal.
There's a fundamental difference between the data that satisfy the "Hello, World!" of ML applications — image classification that distinguishes, for example, dogs from cats — and the data needed to train a comprehensive RAS ML model.
Data to train the former are "broad and shallow": They have fewer data features to track (the feature set is shallow) and many instances (a broad set of examples) on which to train the model.
RAS data, on the other hand, will have fewer available instances (in general, each instance is the dataset of a growout cycle), and an instance comprises many features (e.g., temperature, salinity, pH, alkalinity, TSS, Ω, floc composition, biomass, size distribution, DO, TAN, morts, disease vectors, etc.)
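The shape difference is easy to picture as array dimensions. The numbers below are illustrative, not drawn from any real dataset — assume, say, one RAS instance is a grow-out cycle sampled daily for 100 days across 15 water-quality and production variables:

```python
import numpy as np

# "Broad & shallow": many instances, few features per instance.
broad_shallow = np.zeros((50_000, 4))            # 50k examples x 4 features

# "Narrow & deep": few instances, many features per instance.
days, variables = 100, 15                        # one grow-out cycle, flattened
narrow_deep = np.zeros((12, days * variables))   # only 12 crops x 1,500 values

print(broad_shallow.shape, narrow_deep.shape)
```

With 1,500 features and only a dozen instances, a model has far more ways to fit noise than the data can constrain — which is precisely why augmentation becomes attractive.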
That leads us to consider augmenting existing RAS data with generated synthetic data.
Synthetic Data
When you don’t have (or cannot collect) enough data to train a strong ML model, one remedy is data augmentation.
Similar to bootstrapping in Statistics, existing RAS datasets can be leveraged to produce synthetic data by using Deep Neural Networks (DNNs) configured as Variational Autoencoders (VAEs) or Generative Adversarial Networks (GANs).
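A VAE or GAN is a substantial build, but the underlying idea — manufacture plausible variants of the data you already have — can be sketched with a far simpler, bootstrap-flavored stand-in. The function name, the noise model, and the TAN numbers below are all illustrative assumptions, not a real generator:

```python
import numpy as np

def augment_timeseries(series, n_copies, noise_sd=0.05, rng=None):
    """Generate synthetic variants of a measured time series.

    A deliberately simple stand-in for VAE/GAN generators: each copy
    is the original series perturbed by small Gaussian noise, keeping
    the overall trend while adding plausible-looking variation.
    """
    rng = rng or np.random.default_rng(0)
    scale = noise_sd * np.std(series)            # noise relative to variability
    return np.stack([series + rng.normal(0.0, scale, series.shape)
                     for _ in range(n_copies)])

# One measured TAN trajectory (mg/L) over a short window (made-up values)
tan = np.array([0.5, 0.8, 1.4, 2.1, 1.6, 1.0, 0.7])
synthetic = augment_timeseries(tan, n_copies=10)
print(synthetic.shape)                           # ten synthetic trajectories
```

Real VAE/GAN generators go much further — they learn the joint structure across all features rather than jittering one series — but the goal is the same: more training instances that respect the patterns of the originals.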
You'll find overviews of synthetic data here and here.
Generating synthetic data is a major project step in itself, which, in this case, will rely critically on the domain expertise of experienced aquaculturists.
We'll end this section by repeating what we stated in the main ML blog post:
We need data. Good data. And a lot of it.