LSTM Hybrid Model with EasyHybrid.jl

This tutorial demonstrates how to use EasyHybrid to train a hybrid model with an LSTM neural network on synthetic data, modeling respiration with Q10 temperature sensitivity. The code for this tutorial is in docs/src/literate/tutorials, in the file example_synthetic_lstm.jl.

1. Load Packages

Load the required packages. This assumes the project environment that provides them is already activated.

julia

using EasyHybrid
using AxisKeys
using DimensionalData
using CairoMakie

2. Data Loading and Preprocessing

Load the synthetic dataset from GitHub. The data is tabular.

julia

df = load_timeseries_netcdf("https://github.com/bask0/q10hybrid/raw/master/data/Synthetic4BookChap.nc");

Select a subset of data for faster execution

julia

df = df[1:20000, :];
first(df, 5);

3. Define Neural Network Architectures

Define a standard feedforward neural network

julia

NN = Chain(Dense(15, 15, Lux.sigmoid), Dense(15, 15, Lux.sigmoid), Dense(15, 1))

Chain(
    layer_(1-2) = Dense(15 => 15, σ),             # 480 (240 x 2) parameters
    layer_3 = Dense(15 => 1),                     # 16 parameters
)         # Total: 496 parameters,
          #        plus 0 states.

Define LSTM-based neural network with memory

TIP

When the Chain ends with a Recurrence layer, EasyHybrid automatically adds a RecurrenceOutputDense layer to handle the sequence output format. The user only needs to define the Recurrence layer itself!

julia

NN_Memory = Chain(
    Recurrence(LSTMCell(15 => 15), return_sequence = true),
)

Chain(
    layer_1 = Recurrence(
        cell = LSTMCell(15 => 15),                # 1_920 parameters, plus 1 non-trainable
    ),
)         # Total: 1_920 parameters,
          #        plus 1 states.
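
The parameter counts printed for both architectures can be checked by hand. The sketch below is plain Julia and does not call Lux. The helper names `dense_params` and `lstm_params` are hypothetical, and the LSTM formula assumes four gates, each with input weights, hidden weights, and two bias vectors, which is consistent with the 1_920 parameters shown above.

```julia
# Dense layer: one weight per input/output pair plus one bias per output.
dense_params(n_in, n_out) = n_in * n_out + n_out

# Assumption: four gates, each with input weights, hidden weights,
# and two bias vectors (input path and hidden path).
lstm_params(n_in, n_hidden) = 4 * (n_in * n_hidden + n_hidden * n_hidden + 2 * n_hidden)

# Feedforward network: two Dense(15 => 15) layers and one Dense(15 => 1) layer.
nn_total = 2 * dense_params(15, 15) + dense_params(15, 1)

# Recurrent network: a single LSTMCell(15 => 15).
lstm_total = lstm_params(15, 15)

println("feedforward: ", nn_total)   # 496
println("LSTM cell:   ", lstm_total) # 1920
```

The LSTM cell alone holds roughly four times as many parameters as the whole feedforward network, which is worth keeping in mind when comparing training times.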

4. Define the Process-Based Model

We define the process-based model, a classical Q10 model for respiration.

julia

"""
    RbQ10(; ta, Q10, rb, tref=15.0f0)

Respiration model with Q10 temperature sensitivity.

- `ta`: air temperature [°C]
- `Q10`: temperature sensitivity factor [-]
- `rb`: basal respiration rate [μmol/m²/s]
- `tref`: reference temperature [°C] (default: 15.0)
"""
function RbQ10(; ta, Q10, rb, tref = 15.0f0)
    reco = rb .* Q10 .^ (0.1f0 .* (ta .- tref))
    return (; reco, Q10, rb)
end
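
As a quick sanity check of the formula, the sketch below evaluates the model at the reference temperature and 10 °C above it. The function is repeated so the snippet runs on its own; the input values are arbitrary illustrations, not taken from the dataset.

```julia
# Same function as above, repeated so this snippet is self-contained.
function RbQ10(; ta, Q10, rb, tref = 15.0f0)
    reco = rb .* Q10 .^ (0.1f0 .* (ta .- tref))
    return (; reco, Q10, rb)
end

# At the reference temperature the exponent is zero, so reco equals rb.
at_ref = RbQ10(ta = 15.0f0, Q10 = 2.0f0, rb = 3.0f0)

# 10 °C warmer, respiration is multiplied by Q10.
warmer = RbQ10(ta = 25.0f0, Q10 = 2.0f0, rb = 3.0f0)

println(at_ref.reco)
println(warmer.reco)
println(warmer.reco / at_ref.reco)   # ratio is approximately Q10
```

This is the defining property of Q10: it is the factor by which respiration changes for every 10 °C of warming.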

5. Define Model Parameters

Parameter specification: (default, lower_bound, upper_bound)

julia

parameters = (
    rb = (3.0f0, 0.0f0, 13.0f0),  # Basal respiration [μmol/m²/s]
    Q10 = (2.0f0, 1.0f0, 4.0f0),   # Temperature sensitivity factor [-]
)

(rb = (3.0f0, 0.0f0, 13.0f0), Q10 = (2.0f0, 1.0f0, 4.0f0))
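
One common way to use such bounds is to squash an unbounded network output into the interval [lower, upper] with a logistic function. The sketch below illustrates that idea only. `logistic` and `scale_to_bounds` are hypothetical helpers, and this tutorial does not show how EasyHybrid actually applies the bounds when `scale_nn_outputs = true`.

```julia
# Logistic function mapping any real number into (0, 1).
logistic(x) = 1 / (1 + exp(-x))

# Hypothetical helper (not EasyHybrid API): map a raw value into [lower, upper].
scale_to_bounds(raw, lower, upper) = lower + (upper - lower) * logistic(raw)

# Bounds of rb as specified in the parameter tuple above.
rb_default, rb_lower, rb_upper = (3.0f0, 0.0f0, 13.0f0)

for raw in (-5.0f0, 0.0f0, 5.0f0)
    scaled = scale_to_bounds(raw, rb_lower, rb_upper)
    println(raw, " => ", scaled)
end
```

Whatever the raw value, the scaled result stays inside the physical range, so the process-based model never receives a negative basal respiration.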

6. Configure Hybrid Model Components

Define input variables

Forcing variables (temperature)

julia

forcing = [:ta]

1-element Vector{Symbol}:
 :ta

Predictor variables (solar radiation and its derivative)

julia

predictors = [:sw_pot, :dsw_pot]

2-element Vector{Symbol}:
 :sw_pot
 :dsw_pot

Target variable (respiration)

julia

target = [:reco]

1-element Vector{Symbol}:
 :reco

Parameter classification

Global parameters (same for all samples)

julia

global_param_names = [:Q10]

1-element Vector{Symbol}:
 :Q10

Neural network predicted parameters

julia

neural_param_names = [:rb]

1-element Vector{Symbol}:
 :rb
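
The difference between the two parameter classes shows up in the shapes that enter the process-based model. In the sketch below, `rb` is a hand-written vector standing in for per-sample network output and `Q10` is a single scalar shared by all samples. All values are made up for illustration.

```julia
ta  = Float32[5, 15, 25]        # forcing: one air temperature per sample [°C]
rb  = Float32[2.0, 3.0, 4.0]    # stand-in for a neural prediction, one value per sample
Q10 = 2.0f0                     # global parameter, one value for all samples

# Broadcasting applies the scalar Q10 to every sample.
reco = rb .* Q10 .^ (0.1f0 .* (ta .- 15.0f0))

println(reco)
```

During training the global parameter is updated as a single number, while the network weights that produce `rb` are updated so that `rb` can vary with the predictors.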

7. Construct LSTM Hybrid Model

Create LSTM hybrid model using the unified constructor

julia

hlstm = constructHybridModel(
    predictors,
    forcing,
    target,
    RbQ10,
    parameters,
    neural_param_names,
    global_param_names,
    hidden_layers = NN_Memory, # Neural network architecture
    scale_nn_outputs = true, # Scale neural network outputs
    input_batchnorm = false   # Apply batch normalization to inputs
)

Hybrid Model (Single NN)
Neural Network: 
  Chain(
      layer_1 = WrappedFunction(identity),
      layer_2 = Dense(2 => 15, tanh),               # 45 parameters
      layer_3 = Recurrence(
          cell = LSTMCell(15 => 15),                # 1_920 parameters, plus 1 non-trainable
      ),
      layer_4 = RecurrenceOutputDense(
          layer = Dense(15 => 15, tanh),            # 240 parameters
      ),
      layer_5 = Dense(15 => 1),                     # 16 parameters
  )         # Total: 2_221 parameters,
            #        plus 1 states.
Configuration:
  predictors = [:sw_pot, :dsw_pot]
  forcing = [:ta]
  targets = [:reco]
  mechanistic_model = RbQ10
  neural_param_names = [:rb]
  global_param_names = [:Q10]
  fixed_param_names = Symbol[]
  scale_nn_outputs = true
  start_from_default = true
  config = (; hidden_layers = Chain{@NamedTuple{layer_1::Recurrence{Static.True, LSTMCell{Static.False, Static.False, Int64, Int64, NTuple{4, Nothing}, NTuple{4, Nothing}, NTuple{4, Nothing}, typeof(WeightInitializers.zeros32), typeof(WeightInitializers.zeros32), Static.True}, Lux.BatchLastIndex}}, Nothing}((layer_1 = Recurrence{Static.True, LSTMCell{Static.False, Static.False, Int64, Int64, NTuple{4, Nothing}, NTuple{4, Nothing}, NTuple{4, Nothing}, typeof(WeightInitializers.zeros32), typeof(WeightInitializers.zeros32), Static.True}, Lux.BatchLastIndex}(LSTMCell(15 => 15), Lux.BatchLastIndex(), static(true), false, false),), nothing), activation = tanh, scale_nn_outputs = true, input_batchnorm = false, start_from_default = true,)

Parameters:
  ┌─────┬─────────┬───────┬───────┐
  │     │ default │ lower │ upper │
  ├─────┼─────────┼───────┼───────┤
  │  rb │     3.0 │   0.0 │  13.0 │
  │ Q10 │     2.0 │   1.0 │   4.0 │
  └─────┴─────────┴───────┴───────┘

8. Data Preparation Steps (Demonstration)

The following steps demonstrate what happens under the hood during training. In practice, you can skip to Section 9 and use the train function directly.

Both :KeyedArray and :DimArray are supported as array types.

julia

pref_array_type = :DimArray
x, y = prepare_data(hlstm, df, array_type = pref_array_type);

Convert a (single) time series into many training samples by windowing.

Each sample consists of:

input_window: number of past steps given to the model (sequence length)
output_window: number of steps to predict
output_shift: stride between consecutive windows (controls overlap)
lead_time: prediction lead (e.g. lead_time=1 predicts starting 1 step ahead)

This supports many-to-one or many-to-many forecasting, depending on output_window. The result is an array of shape (variable, time, batch_size), where variable is the feature, time is the input window, and batch_size indexes the samples 1:n (= full batch).
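
The windowing described above can be sketched in plain Julia for the simplest case, output_window = 1 and lead_time = 0. The helper `sliding_windows` is hypothetical and is not the EasyHybrid implementation. It only illustrates how one series turns into a (variable, time, batch_size) array.

```julia
# Hypothetical helper: cut a (variable, time) matrix into overlapping windows.
function sliding_windows(x::AbstractMatrix, y::AbstractVector;
        input_window, output_shift = 1)
    n_var, n_time = size(x)
    # First time index of every window; consecutive windows are output_shift apart.
    starts = 1:output_shift:(n_time - input_window + 1)
    xs = Array{eltype(x)}(undef, n_var, input_window, length(starts))
    ys = Vector{eltype(y)}(undef, length(starts))
    for (k, s) in enumerate(starts)
        xs[:, :, k] = x[:, s:(s + input_window - 1)]
        # Assumes lead_time = 0 and output_window = 1:
        # the target is the last time step of the window.
        ys[k] = y[s + input_window - 1]
    end
    return xs, ys
end

x = Float32.(reshape(1:40, 2, 20))   # 2 variables, 20 time steps
y = Float32.(1:20)

xs_demo, ys_demo = sliding_windows(x, y; input_window = 10, output_shift = 1)

println(size(xs_demo))                          # (2, 10, 11)
println(xs_demo[:, 2, 1] == xs_demo[:, 1, 2])   # consecutive windows are shifted by one step
```

The number of windows is n_time - input_window + 1 when output_shift = 1. For the 20000 rows used in this tutorial and input_window = 10, that gives 19991 samples.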

julia

output_shift = 1
output_window = 1
input_window = 10
(xs, forcings_s), ys = split_into_sequences(x, y; input_window = input_window, output_window = output_window, output_shift = output_shift, lead_time = 0);

First input_window/sample

julia

xs[:, :, 1]

┌ 2×10 DimArray{Float32, 2} ┐
├───────────────────────────┴─────────────────────────────────────── dims ┐
  ↓ variable Categorical{Symbol} [:sw_pot, :dsw_pot] Unordered,
  → time Categorical{Symbol} [:x9_to_x9, …, :x9_to_x0_y0] ReverseOrdered
└─────────────────────────────────────────────────────────────────────────┘
 ↓ →           :x9_to_x9     :x9_to_x8  …     :x9_to_x1     :x9_to_x0_y0
  :sw_pot   109.817       109.817          109.817       109.817
  :dsw_pot  115.595       115.595          115.595       115.595

Second input_window/sample

julia

xs[:, :, 2]

┌ 2×10 DimArray{Float32, 2} ┐
├───────────────────────────┴─────────────────────────────────────── dims ┐
  ↓ variable Categorical{Symbol} [:sw_pot, :dsw_pot] Unordered,
  → time Categorical{Symbol} [:x9_to_x9, …, :x9_to_x0_y0] ReverseOrdered
└─────────────────────────────────────────────────────────────────────────┘
 ↓ →           :x9_to_x9     :x9_to_x8  …     :x9_to_x1     :x9_to_x0_y0
  :sw_pot   109.817       109.817          109.817       109.817
  :dsw_pot  115.595       115.595          115.595       115.595

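Test the shift: with output_shift = 1, the second time step of the first window equals the first time step of the second window.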

julia

xs[:, output_shift + 1, 1] == xs[:, 1, 2]

true

First output_window/sample. The time label encodes the memory used for each prediction: a label like :x30_to_x5_y4 indicates that memory is accumulated from x30 to x5 to predict y4. Here the label is :x9_to_x0_y0.

julia

ys.reco

┌ 1×19991 DimArray{Float32, 2} ┐
├──────────────────────────────┴─────────────────────────────────────── dims ┐
  ↓ time Categorical{Symbol} [:x9_to_x0_y0] ForwardOrdered,
  → batch_size Sampled{Int64} [1, …, 19991] ForwardOrdered Irregular Points
└────────────────────────────────────────────────────────────────────────────┘
 ↓ →            1         2         3         …  19990         19991
  :x9_to_x0_y0  0.806911  0.803646  0.794573         0.211483      0.211106

Second output_window/sample

julia

ys.reco[:, 2]

forcings_s.ta

10×19991 Matrix{Float32}:
 2.1   1.98  1.89  2.06  2.09   1.75  …  10.7   11.11  11.26  10.29   9.68
 1.98  1.89  2.06  2.09  1.75   1.51     11.11  11.26  10.29   9.68   9.22
 1.89  2.06  2.09  1.75  1.51   1.36     11.26  10.29   9.68   9.22   9.63
 2.06  2.09  1.75  1.51  1.36   1.11     10.29   9.68   9.22   9.63   9.93
 2.09  1.75  1.51  1.36  1.11   0.97      9.68   9.22   9.63   9.93  11.11
 1.75  1.51  1.36  1.11  0.97   0.87  …   9.22   9.63   9.93  11.11  11.06
 1.51  1.36  1.11  0.97  0.87   0.59      9.63   9.93  11.11  11.06  10.96
 1.36  1.11  0.97  0.87  0.59   0.23      9.93  11.11  11.06  10.96  10.55
 1.11  0.97  0.87  0.59  0.23   0.06     11.11  11.06  10.96  10.55  11.06
 0.97  0.87  0.59  0.23  0.06  -0.23     11.06  10.96  10.55  11.06  10.91

Check whether any value of the first output_window also appears in the second output_window. Ideally the overlap is small.

julia

overlap = output_window - output_shift
overlap_length = sum(in(ys.reco[:, 1]), ys.reco[:, 2])

Split the (windowed) dataset into train/validation in the same way as train does.

julia

sdf = split_data(df, hlstm, sequence_kwargs = (; input_window = input_window, output_window = output_window, output_shift = output_shift, lead_time = 0), array_type = pref_array_type);

((x_train, f_train), y_train), ((x_val, f_val), y_val) = sdf;
x_train
y_train
f_train
y_train_nan = map(v -> .!isnan.(v), y_train)

(reco = Bool[1 1 … 1 1],)

Wrap the training windows/samples in a DataLoader to form batches.

WARNING

batchsize is the number of windows/samples used per gradient step to update the parameters. Processing 32 windows in one array is usually much faster than doing 32 separate forward/backward passes with batchsize = 1.
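
The sketch below shows the batching idea using only Base Julia: sample indices are cut into chunks of batchsize, and each chunk selects a slice along the last (batch) dimension. The array `xs_toy` and its sizes are made up. Whether a DataLoader keeps or drops a final partial batch depends on its settings, which this sketch does not model.

```julia
n_samples = 100
batchsize = 32

# Chunk the sample indices; the last chunk holds the remainder.
batches = collect(Iterators.partition(1:n_samples, batchsize))

println(length(batches))     # 4
println(length.(batches))    # [32, 32, 32, 4]

# Toy windowed input of shape (variable, time, batch_size).
xs_toy = rand(Float32, 2, 10, n_samples)

# One batch is a slice along the last dimension.
first_batch = xs_toy[:, :, batches[1]]
println(size(first_batch))   # (2, 10, 32)
```

Each gradient step then works on one such slice instead of a single window.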

julia

train_dl = EasyHybrid.DataLoader(((x_train, f_train), y_train); batchsize = 32);

Run the hybrid model forward on the first batch

julia

(x_first, forcings_first), y_first = first(train_dl)

ps, st = Lux.setup(Random.default_rng(), hlstm);
xf = (x_first, forcings_first)
frun = hlstm(xf, ps, st);

┌ Warning: `replicate` doesn't work for `TaskLocalRNG`. Returning the same `TaskLocalRNG`.
└ @ LuxCore ~/.julia/packages/LuxCore/kQC9S/src/LuxCore.jl:18
┌ Warning: `replicate` doesn't work for `TaskLocalRNG`. Returning the same `TaskLocalRNG`.
└ @ LuxCore ~/.julia/packages/LuxCore/kQC9S/src/LuxCore.jl:18

Extract predicted yhat

julia

reco_mod = frun[1].reco

10×32 Matrix{Float32}:
 3.05688  3.03156  3.01271  3.04842  …  3.35858  3.40359  3.30437  3.24218
 2.92547  2.90728  2.94174  2.94786     3.19484  3.05387  2.96275  2.75481
 2.86271  2.89664  2.90267  2.83506     3.00384  2.91657  2.7046   2.68965
 2.8976   2.90363  2.836    2.78921     2.91807  2.69534  2.66617  2.60472
 2.90769  2.83996  2.79311  2.76422     2.69578  2.65755  2.58537  2.49733
 2.84355  2.79664  2.76771  2.72016  …  2.6557   2.57664  2.48141  2.42136
 2.79844  2.76949  2.72191  2.69563     2.57442  2.47478  2.41004  2.42682
 2.76954  2.72196  2.69568  2.67706     2.47309  2.40546  2.41884  2.37386
 2.72077  2.6945   2.67588  2.62445     2.40434  2.41563  2.3683   2.41889
 2.69254  2.67394  2.62254  2.55791     2.41488  2.36603  2.41469  2.41111

Bring the observations into the same shape and build a mask of the non-NaN entries

julia

reco_obs = y_first.reco
reco_nan = .!isnan.(reco_obs);
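
A mask like this is typically used to exclude missing observations from the loss. The sketch below uses toy vectors and a mean squared error purely as an illustration. It is not the loss function EasyHybrid computes internally.

```julia
obs  = Float32[1.0, NaN, 3.0, 4.0]   # toy observations with one gap
pred = Float32[1.5, 2.0, 2.0, 4.0]   # toy predictions

# true where an observation exists, false where it is NaN
mask = .!isnan.(obs)

# Without the mask, the single NaN contaminates the whole sum.
naive_sse = sum(abs2, pred .- obs)

# With the mask, only valid pairs contribute.
masked_mse = sum(abs2, pred[mask] .- obs[mask]) / count(mask)

println(isnan(naive_sse))   # true
println(masked_mse)
```

Dividing by count(mask) rather than the full length keeps the loss comparable between batches with different numbers of gaps.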

Compute loss

julia

EasyHybrid.compute_loss(hlstm, ps, st, ((x_train, f_train), (y_train, y_train_nan)), logging = LoggingLoss(train_mode = true))

(10.420438f0, (st_nn = (layer_1 = NamedTuple(), layer_2 = NamedTuple(), layer_3 = (rng = TaskLocalRNG(),), layer_4 = NamedTuple(), layer_5 = NamedTuple()), fixed = NamedTuple()), NamedTuple())

9. Train LSTM Hybrid Model

julia

out_lstm = train(
    hlstm,
    df,
    ();
    nepochs = 100,           # Number of training epochs
    batchsize = 128,         # Batch size of training windows/samples
    opt = RMSProp(0.01),   # Optimizer and learning rate
    monitor_names = [:rb, :Q10], # Parameters to monitor during training
    yscale = identity,       # Scaling for outputs
    shuffleobs = false,
    training_loss = :nseLoss,
    loss_types = [:nse, :nseLoss],
    sequence_kwargs = (; input_window = input_window, output_window = output_window, output_shift = output_shift, lead_time = 0),
    plotting = true,
    show_progress = false,
    input_batchnorm = false,
    array_type = pref_array_type,
    model_name = "RbQ10_synthetic_lstm"
);

[ Info: Using split_into_sequences: (input_window = 10, output_window = 1, output_shift = 1, lead_time = 0)
┌ Warning: `replicate` doesn't work for `TaskLocalRNG`. Returning the same `TaskLocalRNG`.
└ @ LuxCore ~/.julia/packages/LuxCore/kQC9S/src/LuxCore.jl:18
[ Info: Training outputs will be saved to: /Users/runner/work/EasyHybrid.jl/EasyHybrid.jl/docs/build
[ Info: Animation (dashboard) saved to /Users/runner/work/EasyHybrid.jl/EasyHybrid.jl/docs/build/training_history_RbQ10_synthetic_lstm.mp4
[ Info: Returning best model from epoch 78 with validation loss: -1.0728831
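
The loss names used above refer to the Nash-Sutcliffe efficiency (NSE). As a reminder, the sketch below implements the standard textbook definition. The function name `nse_sketch` is hypothetical, and how EasyHybrid defines :nse and :nseLoss internally, including any sign convention, is not shown in this tutorial.

```julia
using Statistics

# Standard NSE: 1 minus the squared error relative to the variance of the observations.
nse_sketch(obs, pred) = 1 - sum(abs2, obs .- pred) / sum(abs2, obs .- mean(obs))

obs_toy = Float32[1, 2, 3, 4]

perfect = nse_sketch(obs_toy, obs_toy)                    # perfect prediction
as_mean = nse_sketch(obs_toy, fill(mean(obs_toy), 4))     # always predicting the mean
biased  = nse_sketch(obs_toy, obs_toy .+ 2)               # constant offset of 2

println(perfect)
println(as_mean)
println(biased)
```

Under this definition, 1 is a perfect fit, 0 is no better than predicting the mean of the observations, and negative values are worse than that baseline.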

julia

first(out_lstm.val_obs_pred, 5)

5×2 DataFrame

Row	reco	reco_pred
	Float32	Float32
1	1.40187	2.21723
2	1.38034	2.17916
3	1.42572	2.26685
4	1.45769	2.32988
5	1.50553	2.42363

10. Train Single NN Hybrid Model (Optional)

For comparison, we can also train a hybrid model with a standard feedforward neural network.

julia

hm = constructHybridModel(
    predictors,
    forcing,
    target,
    RbQ10,
    parameters,
    neural_param_names,
    global_param_names,
    hidden_layers = NN, # Neural network architecture
    scale_nn_outputs = true, # Scale neural network outputs
    input_batchnorm = false,   # Apply batch normalization to inputs
);

Train the hybrid model

julia

single_nn_out = train(
    hm,
    df,
    ();
    nepochs = 100,           # Number of training epochs
    batchsize = 128,         # Batch size for training
    opt = RMSProp(0.01),   # Optimizer and learning rate
    monitor_names = [:rb, :Q10], # Parameters to monitor during training
    yscale = identity,       # Scaling for outputs
    shuffleobs = false,
    training_loss = :nseLoss,
    loss_types = [:nse, :nseLoss],
    array_type = :DimArray,
    plotting = true,
    show_progress = false,
    model_name = "RbQ10_synthetic_single_nn"
);

[ Info: Training outputs will be saved to: /Users/runner/work/EasyHybrid.jl/EasyHybrid.jl/docs/build
[ Info: Animation (dashboard) saved to /Users/runner/work/EasyHybrid.jl/EasyHybrid.jl/docs/build/training_history_RbQ10_synthetic_single_nn.mp4
[ Info: Returning best model from epoch 1 with validation loss: -1.3012946

Compare the best validation losses of the LSTM and the single NN model. They are close enough.

julia

out_lstm.best_loss, single_nn_out.best_loss

(-1.0728831f0, -1.3012946f0)
