02 Feb 2022 ~ 9 MIN READ

[optional] Aquaculture & ML

AI-generated image from text with Stable Diffusion

The image above did not exist on the Web (or anywhere) until I went to the Stable Diffusion site & entered these nine words:

"A three-dimensional model of shrimp in a frying pan."

After less than one minute, the AI agent produced that image.

This is the optional material for the post on AI/ML & aquaculture.

The first section describes the traditional way of solving problems — plugging data into equations to get answers — and its limitations for modeling complex systems like RAS.

The second section outlines the Machine Learning approach, which is data-driven.

The third section is a brief comparison of those two approaches.

The final section is the most important: It underlines the need for loads of data to build useful ML models. It also touches on a data hurdle that must be cleared to develop ML models for RAS.


[Optional] Traditional Approach

The traditional approach to solving problems is to collect data and use rules — often equations — to produce answers.

Traditional Modeling

Fig. 3 - The Traditional Problem-solving Approach
(image from

An Example

Suppose we measure the concentration of Total Ammonia-Nitrogen (TA-N) in an L. vannamei biofloc raceway.

We need to know the percentage of un-ionized ammonia-nitrogen (UIA-N) — the more toxic form — in our water sample.

Well, there's a mathematical formula (the rules) based on our understanding of chemistry that tells us how to combine pH, temperature, & salinity (the data) to compute the percentage of un-ionized ammonia (the answer).
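Here is a minimal Python sketch of that "data + rules = answer" pattern. It is an illustration under stated assumptions, not necessarily the formula used for the raceway: it relies on the freshwater pKa relation of Emerson et al. (1975) and therefore takes only pH and temperature. Salinity shifts the pKa, so a salinity-corrected relation would be needed for real shrimp-culture water. The function name and the sample readings are hypothetical.

```python
def un_ionized_ammonia_percent(ph: float, temp_c: float) -> float:
    """Percent of total ammonia-N present as un-ionized NH3-N.

    Assumption: freshwater pKa from Emerson et al. (1975).
    Salinity is NOT accounted for in this sketch.
    """
    temp_k = temp_c + 273.15
    # The rule: pKa of the ammonium ion as a function of temperature (K).
    pka = 0.09018 + 2729.92 / temp_k
    # Acid-base equilibrium rearranged for the un-ionized fraction.
    fraction = 1.0 / (1.0 + 10.0 ** (pka - ph))
    return 100.0 * fraction


if __name__ == "__main__":
    # The data: hypothetical readings, for illustration only.
    tan_mg_per_l = 2.0
    pct = un_ionized_ammonia_percent(ph=8.0, temp_c=25.0)
    # The answer: percent un-ionized, and the matching concentration.
    print(f"UIA-N: {pct:.2f} % of TA-N")
    print(f"UIA-N: {tan_mg_per_l * pct / 100.0:.3f} mg/L")
```

At pH 8.0 and 25 °C the sketch returns roughly 5%, and the percentage climbs quickly as pH or temperature rises.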

That approach is the work-horse of science & engineering.

It generally produces good answers for systems that are...

  • simple (one or few variables)

  • deterministic (no randomness)

  • homogeneous (not spatially distributed)

  • at steady-state (not time-varying)

The Limitation

When any of those conditions is relaxed (as they all are in the real world), things get more complicated and the answers suffer.

We just don’t understand complex systems well enough to adequately describe their behavior with equations.

RAS is a Complex System

Recirculating Aquaculture Systems (RAS) are such complex real-world systems.

Recirc systems...

  • have many and diverse “moving parts” — biological, chemical, physical, and (not least) financial

  • are better viewed on some scales as stochastic (i.e., random) instead of deterministic

  • are 'homogeneous enough' in small-scale tanks, but not always in large-scale culture units

  • are dynamic over the time-scale of a production cycle (i.e., they're not at steady-state)

The result is that the traditional approach has limited value for understanding and managing real-world systems.

Even if we could describe the complexities of RAS mathematically...

...we'd still be faced with solving a high-dimensional, non-linear, dynamic system of equations.

Such systems generally have no analytic solution; they're solved numerically, and that involves non-trivial computational issues.
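To give a feel for what "solved numerically" means, here is a toy Python sketch. It steps a single non-linear equation (logistic growth, used here only as a stand-in and not as a RAS model) forward in time with the forward-Euler method, then compares the result with the known exact solution. All parameter values are hypothetical.

```python
import math


def euler_logistic(n0: float, r: float, k: float, dt: float, steps: int) -> float:
    """Forward-Euler integration of dN/dt = r * N * (1 - N / K)."""
    n = n0
    for _ in range(steps):
        # Advance one time step using the slope at the current point.
        n += dt * r * n * (1.0 - n / k)
    return n


def exact_logistic(n0: float, r: float, k: float, t: float) -> float:
    """Closed-form solution of the same equation, for comparison."""
    return k / (1.0 + (k / n0 - 1.0) * math.exp(-r * t))


if __name__ == "__main__":
    n0, r, k, t_end = 1.0, 0.1, 100.0, 50.0  # hypothetical values
    exact = exact_logistic(n0, r, k, t_end)
    for dt in (5.0, 0.5, 0.01):
        approx = euler_logistic(n0, r, k, dt, steps=round(t_end / dt))
        print(f"dt={dt:>5}: N={approx:8.3f}  error={abs(approx - exact):.3f}")
```

Even for this one-equation toy, the answer depends on the step size. A realistic RAS model would couple many such equations, which is where the computational issues multiply.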

In the end, we’d have an impressive set of equations that represents our theoretical understanding of RAS...but little (or no) predictive advantage to assist us in enhancing sustainable seafood production.


[Optional] What's the alternative?

The traditional approach is woven so tightly into the fabric of science and engineering that we might well wonder if there really is any other way to solve problems.

An alternative is Machine Learning (ML).

The ML approach flips the script of the traditional approach.

In the ML approach, we have data and answers but, unlike the traditional approach, we do not have the rules which connect them.

The ML approach

Fig. 4 - The ML Problem-solving Approach
(image from )

Why don’t we have the rules (or a formula)?

Because the system we want to control is too complex to be described accurately by nice, neat equations: We just don’t know how to write (and solve) comprehensive rules that tell us how to calculate the output from the input.

OK. So...What do we do?

We train a machine learning model by passing it the data along with the known answers.


In very general terms, there are four steps...

  1. feed the model the input data (which we have)

  2. compare the output with the actual result (which we have)

  3. calculate the error between the prediction and result

  4. adjust the model to shrink that error, then repeat until the error is so small that the model has “learned” how to map input data to the result with high accuracy
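The four steps above can be sketched in a few lines of Python. This is a deliberately tiny illustration, not how a production model is trained: the "model" is a straight line fitted by gradient descent, and the data are made up to follow y = 2x + 1. The function name, learning rate, and tolerance are all assumptions chosen for the example.

```python
def train_linear_model(xs, ys, learning_rate=0.05, max_epochs=10_000, tolerance=1e-6):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    mse = float("inf")
    for _ in range(max_epochs):
        # Step 1: feed the model the input data.
        preds = [w * x + b for x in xs]
        # Step 2: compare the output with the actual results.
        residuals = [p - y for p, y in zip(preds, ys)]
        # Step 3: calculate the error (here, the mean squared error).
        mse = sum(r * r for r in residuals) / n
        # Step 4: stop once the error is small enough...
        if mse < tolerance:
            break
        # ...otherwise nudge the parameters downhill and repeat.
        grad_w = 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = 2.0 * sum(residuals) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, mse


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]   # made-up inputs
    ys = [1.0, 3.0, 5.0, 7.0, 9.0]   # made-up answers (y = 2x + 1)
    w, b, mse = train_linear_model(xs, ys)
    print(f"learned rule: y = {w:.3f} * x + {b:.3f}  (mse = {mse:.2e})")
```

The model is never told the rule. It recovers a slope near 2 and an intercept near 1 purely from the data and the answers.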

(If you have some linear algebra under your belt and about 30 minutes, this video from Samson Zhang provides a concise overview of how a simple neural network does its 'magic'.)

In this way, ML models learn to identify patterns in the data that relate the input to the output.

Given enough good data, well-trained ML models can forecast the state of complex systems more accurately than traditional models.


[Optional] The Tale of the Tape

Here's a brief run-down that compares the traditional approach with the ML approach...

The Rules

  • Traditional models use pre-defined rules

    • e.g., the rules of stoichiometry that quantify relationships among chemical substances in a reaction.
  • ML models "learn" the rules from the data

    • e.g., the standard linear regression models you study in a Statistics course, which estimate their coefficients from the data.


  • Traditional models are interpretable
    • They explicitly include physical, chemical, & biological parts of the system: temperature, dissociation constants, biomass, feeding rate, TDS, etc.
  • ML models are generally NOT interpretable
    • Complex models such as deep neural networks are 'black boxes' that hide causality.


  • Traditional models are built on semantic inference

    • They're designed with an understanding of the relationships linking inputs to outputs. (e.g., shrimp eat pellets, use oxygen, produce ammonia...)
  • ML models are built on statistical inference

    • They don't explicitly address the physical, chemical, & biological mechanisms that turn inputs into outputs.

The ML paradigm calls to mind a much-cited quote from iconic computer scientist Ken Thompson:

"When in doubt, use brute force."


[Optional] Data Augmentation

From Rule #1 of Google's Rules of Machine Learning:
"Machine learning is cool, but it requires data."

ML is a glutton for high-quality data, and data is the "critical infrastructure" at the core of robust machine-learning models.

But it’s not always feasible to collect enough data to satisfy ML's appetite, and without ample data, model predictions suffer.

This is generally the case for RAS, where it can take 3 to 4 months to collect the time-series dataset for a single crop.

Additionally, the cost of introducing realistic — and potentially crop-threatening — water-quality changes into a production tank to train an anomaly-detection model is unacceptably high.

Broad & Shallow vs. Narrow & Deep

Not all datasets are equal.

There's a fundamental difference between the data that satisfy the "Hello, World!" of ML applications — image classification that distinguishes, for example, dogs from cats — and the data needed to train a comprehensive RAS ML model.

Data to train the former are "broad and shallow": They have few data features to track (the feature set is shallow) and many instances (a broad set of examples) on which to train the model.

RAS data, on the other hand, are "narrow and deep": Fewer instances are available (in general, each instance is the dataset of one grow-out cycle), and each instance comprises many features (e.g., temperature, salinity, pH, alkalinity, TSS, Ω, floc composition, biomass, size distribution, DO, TA-N, morts, disease vectors, etc.).

That leads us to consider augmenting existing RAS data with generated synthetic data.

Synthetic Data

When you don’t have (or cannot collect) enough data to train a strong ML model, one remedy is data augmentation.

Similar to bootstrapping in Statistics, existing RAS datasets can be leveraged to produce synthetic data by using Deep Neural Networks (DNN) configured as Variational Auto-Encoders (VAE) or Generative Adversarial Networks (GAN).

You'll find overviews of synthetic data here and here.
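To make the bootstrapping analogy concrete, here is a minimal Python sketch. Note that this is NOT a VAE or a GAN: it is plain resampling with replacement plus a little random jitter, shown only to illustrate the idea of stretching a small dataset. The function name, the feature names, the values, and the 2% noise level are all hypothetical.

```python
import random


def bootstrap_with_jitter(records, n_synthetic, noise_fraction=0.02, seed=0):
    """Resample records with replacement and perturb each value slightly.

    Assumption: every feature value is numeric, and small multiplicative
    noise is acceptable. A generative model (VAE/GAN) would instead learn
    the joint distribution of the features.
    """
    rng = random.Random(seed)  # seeded so the output is reproducible
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(records)  # draw one real record, with replacement
        synthetic.append({
            name: value * (1.0 + rng.gauss(0.0, noise_fraction))
            for name, value in base.items()
        })
    return synthetic


if __name__ == "__main__":
    # Hypothetical water-quality snapshots, for illustration only.
    real = [
        {"temp_c": 28.1, "ph": 7.9, "do_mg_l": 5.6},
        {"temp_c": 28.6, "ph": 8.0, "do_mg_l": 5.2},
        {"temp_c": 29.0, "ph": 7.8, "do_mg_l": 4.9},
    ]
    for row in bootstrap_with_jitter(real, n_synthetic=5):
        print({k: round(v, 2) for k, v in row.items()})
```

A sketch like this treats every record as independent, so it ignores the time-series structure of a grow-out cycle. That gap is one reason the heavier generative models, and the judgment of experienced aquaculturists, come into play.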

Generating synthetic data is a major project step in itself, which, in this case, will rely critically on the domain expertise of experienced aquaculturists.

We'll end this section by repeating what we stated in the main ML blog post:

We need data. Good data. And a lot of it.


⬅️ Return to ML post