EasyHybrid.jl
Documentation for EasyHybrid.jl.
EasyHybrid.EasyHybrid Module
EasyHybrid
EasyHybrid is a Julia package for hybrid machine learning models, combining neural networks and traditional statistical methods. It provides tools for data preprocessing, model training, and evaluation, making it easier to build and deploy hybrid models.
EasyHybrid.BulkDensitySOC Type
BulkDensitySOC(NN, predictors, targets, oBD)
A hybrid model with a neural network NN, predictors and one global parameter oBD.
EasyHybrid.BulkDensitySOC Method
BulkDensitySOC(NN, predictors, oBD)(ds_k)
Hybrid model for bulk density based on the Federer (1993) paper http://dx.doi.org/10.1139/x93-131 plus SOC concentrations, density and coarse fraction
EasyHybrid.FluxPartModelQ10Lux Type
FluxPartModelQ10Lux(RUE_NN, Rb_NN, RUE_predictors, Rb_predictors, forcing, targets, Q10)
A flux partitioning model with separate neural networks for RUE (Radiation Use Efficiency) and Rb (basal respiration), using Q10 temperature sensitivity for respiration calculations.
EasyHybrid.FluxPartModelQ10Lux Method
FluxPartModelQ10Lux(RUE_NN, Rb_NN, RUE_predictors, Rb_predictors, forcing, targets, Q10)(ds_k, ps, st)
Model definition
GPP = SW_IN * RUE(αᵢ(t)) / 12.011 # µmol/m²/s = J/s/m² * g/MJ / g/mol
Reco = Rb(βᵢ(t)) * Q10^((T(t) - T_ref)/10)
NEE = Reco - GPP
where:
RUE(αᵢ(t)) is the radiation use efficiency predicted by neural network
Rb(βᵢ(t)) is the basal respiration predicted by neural network
SW_IN is incoming shortwave radiation
T(t) is air temperature
T_ref is reference temperature (15°C)
Q10 is temperature sensitivity factor
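The three equations above can be sketched with plain Julia functions. This is only an illustration of the arithmetic: the names gpp, reco and nee and the input values are hypothetical, and fixed numbers stand in for the neural network outputs RUE and Rb.

```julia
# Stand-alone sketch of the flux equations above; no EasyHybrid or Lux needed.
# `rue` and `rb` are plain numbers here, standing in for the neural network outputs.

# Gross primary production; 12.011 g/mol is the molar mass of carbon
gpp(sw_in, rue) = sw_in * rue / 12.011f0

# Ecosystem respiration: basal respiration scaled by the Q10 temperature response
reco(rb, q10, ta; t_ref = 15.0f0) = rb * q10^((ta - t_ref) / 10)

# Net ecosystem exchange: respiration minus uptake
nee(sw_in, ta, rue, rb, q10) = reco(rb, q10, ta) - gpp(sw_in, rue)

sw_in = Float32[0, 200, 600]   # hypothetical incoming shortwave radiation
ta    = Float32[5, 15, 25]     # hypothetical air temperature in °C

# Broadcast over the time series with constant RUE = 0.1, Rb = 2.0, Q10 = 1.5
nee_values = nee.(sw_in, ta, 0.1f0, 2.0f0, 1.5f0)
```

At the reference temperature the Q10 factor equals 1, so Reco reduces to Rb; ten degrees above it, Reco is Rb multiplied by Q10.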
EasyHybrid.FluxPartModel_NEE_ET2 Method
FluxPartModel_NEE_ET2(RUE_predictors::AbstractArray{Symbol}, Rb_predictors::AbstractArray{Symbol}, WUE_predictors::AbstractArray{Symbol}, Ecoeff_predictors::AbstractArray{Symbol}; forcing=[:SW_IN, :TA], Q10=[1.5f0], neurons=15)
EasyHybrid.FluxPartModel_Q10 Method
FluxPartModel_Q10(RUE_predictors, Rb_predictors; Q10=[1.5f0])
EasyHybrid.HybridParams Type
HybridParams{M<:Function}
A little parametric stub for “the params of function M.” All of your function-based models become HybridParams{typeof(f)}.
EasyHybrid.LinearHM Type
LinearHM(NN, predictors, forcing, targets, β)
A linear hybrid model with a neural network NN, predictors, forcing and targets terms.
EasyHybrid.LinearHM Method
LinearHM(NN, predictors, forcing, β)(ds_k)
Model definition ŷ = α x + β
Apply the linear hybrid model to the input data ds_k (a KeyedArray with proper dimensions). The model uses the neural network NN to compute new α's based on the predictors and then computes ŷ using the forcing term x.
Returns:
A tuple containing the predicted values ŷ and a named tuple with the computed values of α and the state st.
Example:
using Lux
using EasyHybrid
using Random
using AxisKeys
ds_k = KeyedArray(rand(Float32, 3,4); data=[:a, :b, :c], sample=1:4)
m = Lux.Chain(Dense(2, 5), Dense(5, 1))
# Instantiate the model
# Note: The model is initialized with a neural network and the predictors and forcing terms
lh_model = LinearHM(m, (:a, :b), (:c,), 1.5f0)
rng = Random.default_rng()
Random.seed!(rng, 0)
ps, st = LuxCore.setup(rng, lh_model)
# Apply the model to the data
ŷ, αst = LuxCore.apply(lh_model, ds_k, ps, st)
EasyHybrid.LinearHybridModel Method
LinearHybridModel(predictors::AbstractArray{Symbol}, forcing::AbstractArray{Symbol}, out_dim::Int, neurons::Int; b=[1.5f0])
EasyHybrid.LinearHybridModel_2outputs Method
LinearHybridModel_2outputs(predictors::AbstractArray{Symbol}, forcing::AbstractArray{Symbol}, out_dim::Int, neurons::Int, nn_chain; a=[1.0f0], b=[1.5f0])
- nn_chain :: DenseNN, a Dense neural network
EasyHybrid.LoggingLoss Type
LoggingLoss
A structure to define a logging loss function for hybrid models. It allows for multiple loss types and an aggregation function to be specified.
Arguments:
loss_types::Vector{Symbol}: A vector of loss types to compute, e.g., [:mse, :mae].
training_loss::Symbol: The loss type to use during training, e.g., :mse.
agg::Function: A function to aggregate the losses, e.g., sum or mean.
train_mode::Bool: A flag indicating whether the model is in training mode. If true, it uses training_loss; otherwise, it uses loss_types.
EasyHybrid.RbQ10_2p Method
RbQ10_2p(NN, predictors, forcing, targets, Q10)(ds_k)
Model definition ŷ = Rb(αᵢ(t)) * Q10^((T(t) - T_ref)/10)
ŷ (respiration rate) is computed as a function of the neural network output Rb(αᵢ(t)) and the temperature T(t) adjusted by the reference temperature T_ref (default 15°C) using the Q10 temperature sensitivity factor.
EasyHybrid.RespirationRbQ10 Type
RespirationRbQ10(NN, predictors, forcing, targets, Q10)
A linear hybrid model with a neural network NN, predictors, targets and forcing terms.
EasyHybrid.RespirationRbQ10 Method
RespirationRbQ10(NN, predictors, forcing, targets, Q10)(ds_k)
Model definition ŷ = Rb(αᵢ(t)) * Q10^((T(t) - T_ref)/10)
ŷ (respiration rate) is computed as a function of the neural network output Rb(αᵢ(t)) and the temperature T(t) adjusted by the reference temperature T_ref (default 15°C) using the Q10 temperature sensitivity factor.
EasyHybrid.Rs_components Type
Rs_components(NN, predictors, forcing, targets, Q10_het, Q10_root, Q10_myc)
A linear hybrid model with a neural network NN, predictors, targets and forcing terms.
EasyHybrid.Rs_components Method
Rs_components(NN, predictors, forcing, targets, Q10)(ds_k)
Model definition ŷ = Rb(αᵢ(t)) * Q10^((T(t) - T_ref)/10)
ŷ (respiration rate) is computed as a function of the neural network output Rb(αᵢ(t)) and the temperature T(t) adjusted by the reference temperature T_ref (default 15°C) using the Q10 temperature sensitivity factor.
EasyHybrid.SinusHybridModel Method
SinusHybridModel(predictors, forcing, out_dim; neurons=15, b=[1.5f0])
EasyHybrid.WrappedTuples Type
WrappedTuples(vec::Vector{<:NamedTuple})
Wraps a vector of named tuples to allow dot-access to each field as a vector.
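The dot-access idea can be sketched in a few lines of plain Julia. The FieldVectors type below is a hypothetical stand-in written for illustration, not the package's implementation of WrappedTuples, and the history values are invented.

```julia
# Hypothetical stand-in for the idea behind WrappedTuples; not the package's code.
struct FieldVectors{T<:NamedTuple}
    data::Vector{T}
end

# Forward property names to the wrapped tuples, collecting one value per element.
# The name `data` is reserved for the wrapped vector itself.
function Base.getproperty(w::FieldVectors, name::Symbol)
    name === :data && return getfield(w, :data)
    return [getproperty(nt, name) for nt in getfield(w, :data)]
end

history = FieldVectors([(mse = 0.9, r2 = 0.1), (mse = 0.5, r2 = 0.4), (mse = 0.2, r2 = 0.8)])

mse_curve = history.mse   # one mse value per wrapped tuple
r2_curve  = history.r2    # one r2 value per wrapped tuple
```

A wrapper of this kind is convenient for per-epoch training logs, where each epoch contributes one named tuple and each metric is later read back as a single vector.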
Base.Multimedia.display Method
Base.display(io::IO, parameter_container::ParameterContainer)
Display a ParameterContainer containing parameter bounds in a formatted table.
This function creates a nicely formatted table showing parameter names as row labels and bound types (default, lower, upper) as column headers.
Arguments
io::IO: Output stream
parameter_container::ParameterContainer: A ParameterContainer with parameter bounds (typically created by build_parameter_matrix)
Returns
- Displays a formatted table using PrettyTables.jl
Example
# Create parameter defaults and bounds
parameter_defaults_and_bounds = (
θ_s = (0.464f0, 0.302f0, 0.700f0),
α = (log(0.103f0), log(0.01f0), log(7.874f0)),
n = (log(3.163f0 - 1), log(1.100f0 - 1), log(20.000f0 - 1)),
)
# Build ParameterContainer and display
parameter_container = ParameterContainer(parameter_defaults_and_bounds)
display(parameter_container) # or just parameter_container
Notes
Requires PrettyTables.jl to be loaded
The table shows parameter names as row labels and bound types as column headers
Default alignment is right-aligned for all columns
EasyHybrid.build_parameter_matrix Method
build_parameter_matrix(parameter_defaults_and_bounds::NamedTuple)
Build a ComponentArray matrix from a NamedTuple containing parameter defaults and bounds.
This function converts a NamedTuple where each value is a tuple of (default, lower, upper) bounds into a ComponentArray with named axes for easy parameter management in hybrid models.
Arguments
parameter_defaults_and_bounds::NamedTuple: A NamedTuple where each key is a parameter name and each value is a tuple of (default, lower, upper) for that parameter.
Returns
ComponentArray: A 2D ComponentArray with:
- Row axis: Parameter names (from the NamedTuple keys)
- Column axis: Bound types (:default, :lower, :upper)
- Data: The parameter values organized in a matrix format
Example
# Define parameter defaults and bounds
parameter_defaults_and_bounds = (
θ_s = (0.464f0, 0.302f0, 0.700f0), # Saturated water content [cm³/cm³]
h_r = (1500.0f0, 1500.0f0, 1500.0f0), # Pressure head at residual water content [cm]
α = (log(0.103f0), log(0.01f0), log(7.874f0)), # Shape parameter [cm⁻¹]
n = (log(3.163f0 - 1), log(1.100f0 - 1), log(20.000f0 - 1)), # Shape parameter [-]
)
# Build the ComponentArray
parameter_matrix = build_parameter_matrix(parameter_defaults_and_bounds)
# Access specific parameter bounds
parameter_matrix.θ_s.default # Get default value for θ_s
parameter_matrix[:, :lower] # Get all lower bounds
parameter_matrix[:, :upper] # Get all upper bounds
Notes
The function expects each value in the NamedTuple to be a tuple with exactly 3 elements
The order of bounds is always (default, lower, upper)
The resulting ComponentArray can be used for parameter optimization and constraint handling
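The row and column layout described above can be sketched without ComponentArrays, using a plain Matrix and two label vectors. The lookup helper is hypothetical and only illustrates the (parameter, bound) indexing; it is not part of the package.

```julia
# Stdlib-only sketch: a plain Matrix plus label vectors instead of a ComponentArray.
bounds = (
    θ_s = (0.464f0, 0.302f0, 0.700f0),
    α   = (log(0.103f0), log(0.01f0), log(7.874f0)),
)

param_names = collect(keys(bounds))         # row labels
bound_names = [:default, :lower, :upper]    # column labels

# One row per parameter, one column per bound type
values_matrix = [bounds[p][j] for p in param_names, j in 1:3]

# Hypothetical helper: look up one entry by parameter name and bound name
function lookup(M, rows, cols, p::Symbol, b::Symbol)
    return M[findfirst(==(p), rows), findfirst(==(b), cols)]
end

lower_θ_s = lookup(values_matrix, param_names, bound_names, :θ_s, :lower)
```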
EasyHybrid.build_parameters Method
build_parameters(parameters::NamedTuple, f::DataType) -> AbstractHybridModel
Constructs a parameter container from a named tuple of parameter bounds and wraps it in a user-defined subtype of AbstractHybridModel.
Arguments
parameters::NamedTuple: Named tuple where each entry is a tuple of (default, lower, upper) bounds for a parameter.
f::DataType: A constructor for a subtype of AbstractHybridModel that takes a ParameterContainer as its argument.
Returns
- An instance of the user-defined AbstractHybridModel subtype containing the parameter container.
EasyHybrid.compute_bulk_density Method
compute_bulk_density(SOCconc, oBD, mBD)
Model for bulk density based on the Federer (1993) paper http://dx.doi.org/10.1139/x93-131 plus SOC concentrations, density and coarse fraction.
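A minimal sketch of a Federer-style mixing rule follows, under stated assumptions: the organic and mineral fractions are combined as oBD * mBD / (f_om * mBD + (1 - f_om) * oBD), SOC is given as a mass fraction, and it is converted to organic matter with the conventional factor 1.724. The function name, the keyword and the conversion factor are assumptions for illustration; the package's own compute_bulk_density may differ in detail.

```julia
# Assumed mixing rule: organic (oBD) and mineral (mBD) bulk densities combine
# according to the organic matter mass fraction f_om.
# The 1.724 SOC-to-organic-matter factor is an assumption, not taken from the package.
function bulk_density_sketch(soc_conc, oBD, mBD; om_factor = 1.724)
    f_om = om_factor * soc_conc    # organic matter mass fraction, between 0 and 1
    return oBD * mBD / (f_om * mBD + (1 - f_om) * oBD)
end

bd_mineral = bulk_density_sketch(0.0, 0.2, 1.5)    # no organic matter
bd_mixed   = bulk_density_sketch(0.05, 0.2, 1.5)   # SOC mass fraction of 0.05
```

With no organic matter the expression collapses to the mineral density mBD, and with pure organic matter it collapses to oBD, which gives two quick sanity checks on any implementation.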
EasyHybrid.compute_loss Function
compute_loss(ŷ, y, y_nan, targets, training_loss::Symbol, agg::Function)
compute_loss(ŷ, y, y_nan, targets, loss_types::Vector{Symbol}, agg::Function)
Compute the loss for the given predictions and targets using the specified training loss (or vector of losses) type and aggregation function.
Arguments:
ŷ: Predicted values.
y: Target values.
y_nan: Mask for NaN values.
targets: The targets for which the loss is computed.
training_loss::Symbol: The loss type to use during training, e.g., :mse.
loss_types::Vector{Symbol}: A vector of loss types to compute, e.g., [:mse, :mae].
agg::Function: The aggregation function to apply to the computed losses, e.g., sum or mean.
Returns a single loss value if training_loss is provided, or a NamedTuple of losses for each type in loss_types.
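The difference between the two call forms can be sketched with stand-alone functions. The names sketch_loss and sketch_compute are hypothetical, and the sketch handles a single target only, so the per-target aggregation with agg is left out. The mask is assumed to be true where the target is observed.

```julia
using Statistics: mean

# Hypothetical stand-ins that mirror the Val-dispatch pattern described above
sketch_loss(y_pred, y, mask, ::Val{:mse}) = mean(abs2, y_pred[mask] .- y[mask])
sketch_loss(y_pred, y, mask, ::Val{:mae}) = mean(abs, y_pred[mask] .- y[mask])

# A single Symbol gives one number
sketch_compute(y_pred, y, mask, loss::Symbol) = sketch_loss(y_pred, y, mask, Val(loss))

# A vector of Symbols gives a NamedTuple keyed by loss name
function sketch_compute(y_pred, y, mask, losses::Vector{Symbol})
    vals = Tuple(sketch_loss(y_pred, y, mask, Val(l)) for l in losses)
    return NamedTuple{Tuple(losses)}(vals)
end

y      = [1.0, 2.0, NaN, 4.0]
y_pred = [1.5, 2.0, 3.0, 3.0]
mask   = .!isnan.(y)           # true where the target is observed

single = sketch_compute(y_pred, y, mask, :mse)
multi  = sketch_compute(y_pred, y, mask, [:mse, :mae])
```

Masking before the reduction keeps missing targets from turning the whole loss into NaN.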
EasyHybrid.constructNNModel Method
constructNNModel(predictors, targets; hidden_layers, activation, scale_nn_outputs)
Main constructor: hidden_layers can be either a Vector{Int} of sizes, or a Chain of hidden-layer Dense blocks.
EasyHybrid.display_parameter_bounds Method
display_parameter_bounds(parameter_container::ParameterContainer; alignment=:r)
Legacy function for displaying parameter bounds with custom alignment.
Arguments
parameter_container::ParameterContainer: A ParameterContainer with parameter bounds
alignment: Alignment for table columns (default: right-aligned for all columns)
Returns
- Displays a formatted table using PrettyTables.jl
EasyHybrid.get_predictions_targets Method
get_predictions_targets(HM, x, (y_t, y_nan), ps, st, targets)
Get predictions and targets from the hybrid model and return them along with the NaN mask.
EasyHybrid.initialize_plotting_observables Method
initialize_plotting_observables(init_ŷ_train, init_ŷ_val, y_train, y_val, l_init_train, l_init_val, training_loss, agg, monitor_names, target_names)
Initialize plotting observables for training visualization if the Makie extension is loaded.
EasyHybrid.load_timeseries_netcdf Method
load_timeseries_netcdf(path::AbstractString; timedim::AbstractString = "time") -> DataFrame
Reads a NetCDF file where all data variables are 1D over the specified timedim and returns a tidy DataFrame with one row per time step.
Only includes variables whose sole dimension is timedim.
Does not attempt to parse or convert time units; all columns are read as-is.
EasyHybrid.loss_fn Function
loss_fn(ŷ, y, y_nan, ::Val{:rmse})
loss_fn(ŷ, y, y_nan, ::Val{:mse})
loss_fn(ŷ, y, y_nan, ::Val{:mae})
loss_fn(ŷ, y, y_nan, ::Val{:pearson})
loss_fn(ŷ, y, y_nan, ::Val{:r2})
Compute the loss for the given predictions and targets using the specified loss type.
Arguments:
ŷ: Predicted values.
y: Target values.
y_nan: Mask for NaN values.
::Val{:rmse}: Root Mean Square Error, or
::Val{:mse}: Mean Square Error, or
::Val{:mae}: Mean Absolute Error, or
::Val{:pearson}: Pearson correlation coefficient, or
::Val{:r2}: R-squared.
You can define additional loss functions as needed by adding more methods to this function.
Example:
In your working script just do the following:
using Statistics: mean # mean is used below and is not part of Base
import EasyHybrid: loss_fn
function EasyHybrid.loss_fn(ŷ, y, y_nan, ::Val{:nse})
return 1 - sum((ŷ[y_nan] .- y[y_nan]).^2) / sum((y[y_nan] .- mean(y[y_nan])).^2)
end
EasyHybrid.lossfn Method
lossfn(HM::LuxCore.AbstractLuxContainerLayer, x, (y_t, y_nan), ps, st, logging::LoggingLoss)
Arguments:
HM::LuxCore.AbstractLuxContainerLayer: The hybrid model to compute the loss for.
x: Input data for the model.
(y_t, y_nan): Tuple containing the target values and a mask for NaN values.
ps: Parameters of the model.
st: State of the model.
logging::LoggingLoss: Logging configuration for the loss function.
EasyHybrid.prepare_data Method
prepare_data(hm, data)
Utility function to see if the data is already in the expected format or if further filtering and re-packing is needed.
Arguments:
hm: The Hybrid Model
data: either a Tuple of KeyedArrays or a single KeyedArray.
Returns a tuple of KeyedArrays
EasyHybrid.prepare_hidden_chain Method
prepare_hidden_chain(hidden_layers, in_dim, out_dim; activation, input_batchnorm=false)
Construct a neural network Chain for use in NN models.
Arguments
hidden_layers::Union{Vector{Int}, Chain}:
- If a Vector{Int}, specifies the sizes of each hidden layer. For example, [32, 16] creates two hidden layers with 32 and 16 units, respectively.
- If a Chain, the user provides a pre-built chain of hidden layers (excluding input/output layers).
in_dim::Int: Number of input features (input dimension).
out_dim::Int: Number of output features (output dimension).
activation: Activation function to use in hidden layers (default: tanh).
input_batchnorm::Bool: If true, applies a BatchNorm layer to the input (default: false).
Returns
- A Chain object representing the full neural network, with the following structure:
Optional input batch normalization (if input_batchnorm=true)
Input layer: Dense(in_dim, h₁, activation)
Hidden layers: either user-supplied Chain or constructed from hidden_layers
Output layer: Dense(hₖ, out_dim)
where h₁ is the first hidden size and hₖ the last.
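For the Vector{Int} case, the layer structure listed above can be sketched by computing the input and output size of each Dense layer. The helper dense_layer_sizes is hypothetical and builds no Lux layers; it only shows how in_dim, the hidden sizes and out_dim chain together.

```julia
# Stdlib-only sketch: list the (input => output) size of each Dense layer
# for the Vector{Int} case; no Lux layers are constructed here.
function dense_layer_sizes(hidden_layers::Vector{Int}, in_dim::Int, out_dim::Int)
    dims = [in_dim; hidden_layers; out_dim]
    return [dims[i] => dims[i + 1] for i in 1:length(dims) - 1]
end

# Four input features, hidden sizes 32 and 16, one output
layer_sizes = dense_layer_sizes([32, 16], 4, 1)
```

Here the first pair corresponds to the input layer Dense(in_dim, h₁), the middle pair to the hidden layer, and the last pair to the output layer Dense(hₖ, out_dim).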
EasyHybrid.scale_single_param Method
scale_single_param(name, raw_val, parameters)
Scale a single parameter using the sigmoid scaling function.
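A minimal sketch of what sigmoid scaling typically means, assuming the common form lower + (upper - lower) * sigmoid(raw). The function names and this exact form are assumptions for illustration; the source does not state the formula used by scale_single_param.

```julia
# Assumed form of sigmoid scaling: map an unbounded raw value into (lower, upper)
sigmoid(x) = 1 / (1 + exp(-x))
scale_to_bounds(raw, lower, upper) = lower + (upper - lower) * sigmoid(raw)

mid_value  = scale_to_bounds(0.0, 0.302, 0.700)    # raw value 0 lands at the midpoint
high_value = scale_to_bounds(10.0, 0.302, 0.700)   # large raw values approach the upper bound
low_value  = scale_to_bounds(-10.0, 0.302, 0.700)  # very negative raw values approach the lower bound
```

A mapping like this lets an optimizer work on an unconstrained value while the physical parameter stays inside its bounds.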
EasyHybrid.scale_single_param_minmax Method
scale_single_param_minmax(name, hm::AbstractHybridModel)
Scale a single parameter using the minmax scaling function.
EasyHybrid.split_data Method
split_data(data, split_by_id; shuffleobs=false, split_ratio=0.8)
Split data into training and validation sets, either randomly or by grouping by ID.
Arguments:
data: The data to split, typically a tuple of (x, y) KeyedArrays
split_by_id: Either nothing for random splitting, a Symbol for column-based splitting, or an AbstractVector for custom ID-based splitting
shuffleobs: Whether to shuffle observations during splitting (default: false)
split_ratio: Ratio of data to use for training (default: 0.8)
Returns:
(x_train, y_train): Training data tuple
(x_val, y_val): Validation data tuple
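The two splitting modes can be sketched on plain index vectors. Both helpers below are hypothetical and only illustrate the idea: a random split cuts the (optionally shuffled) observations at split_ratio, while an ID-based split assigns whole groups to one side. How split_data itself orders or shuffles groups is not stated in the source.

```julia
using Random

# Random split: the first `split_ratio` share of (optionally shuffled) indices is training
function split_indices(n::Int; split_ratio = 0.8, shuffleobs = false, rng = Random.default_rng())
    idx = shuffleobs ? randperm(rng, n) : collect(1:n)
    n_train = floor(Int, split_ratio * n)
    return idx[1:n_train], idx[n_train + 1:end]
end

# ID-based split: whole groups go to one side, so no ID appears in both sets
function split_indices_by_id(ids::AbstractVector; split_ratio = 0.8)
    unique_ids = unique(ids)
    n_train = floor(Int, split_ratio * length(unique_ids))
    train_ids = Set(unique_ids[1:n_train])
    train = findall(in(train_ids), ids)
    val = findall(!in(train_ids), ids)
    return train, val
end

train_idx, val_idx = split_indices(10)

site_ids = [:a, :a, :b, :b, :c, :c, :d, :d, :e, :e]   # hypothetical site labels
train_by_id, val_by_id = split_indices_by_id(site_ids)
```

Splitting by ID matters when observations from the same site or sequence are correlated, since a purely random split would leak information from training into validation.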
EasyHybrid.split_data Method
split_data(df::DataFrame, target, xvars, seqID; f=0.8, batchsize=32, shuffle=true, partial=true)
EasyHybrid.split_data Method
split_data(df::DataFrame, target, xvars; f=0.8, batchsize=32, shuffle=true, partial=true)
EasyHybrid.to_keyedArray Function
to_keyedArray(dfg::Union{Vector,GroupedDataFrame{DataFrame}}, vars=All())
EasyHybrid.train Method
train(hybridModel, data, save_ps; nepochs=200, batchsize=10, opt=Adam(0.01), patience=typemax(Int),
file_name=nothing, loss_types=[:mse, :r2], training_loss=:mse, agg=sum, train_from=nothing,
random_seed=161803, shuffleobs=false, yscale=log10, monitor_names=[], return_model=:best,
split_by_id=nothing, split_data_at=0.8, plotting=true, show_progress=true, hybrid_name=randstring(10))
Train a hybrid model using the provided data and save the training process to a file in JLD2 format. Default output file is trained_model.jld2 at the current working directory under output_tmp.
Arguments:
hybridModel: The hybrid model to be trained.
data: The training data, either a single DataFrame, a single KeyedArray, or a tuple of KeyedArrays.
save_ps: A tuple of physical parameters to save during training.
Core Training Parameters:
nepochs: Number of training epochs (default: 200).
batchsize: Size of the training batches (default: 10).
opt: The optimizer to use for training (default: Adam(0.01)).
patience: The number of epochs to wait before early stopping (default: typemax(Int) -> no early stopping).
Loss and Evaluation:
training_loss: The loss type to use during training (default: :mse).
loss_types: A vector of loss types to compute during training (default: [:mse, :r2]).
agg: The aggregation function to apply to the computed losses (default: sum).
Data Handling:
shuffleobs: Whether to shuffle the training data (default: false).
split_by_id: Column name or function to split data by ID (default: nothing -> no ID-based splitting).
split_data_at: Fraction of data to use for training when splitting (default: 0.8).
Training State and Reproducibility:
train_from: A tuple of physical parameters and state to start training from or an output of train (default: nothing -> new training).
random_seed: The random seed to use for training (default: 161803).
Output and Monitoring:
file_name: The name of the file to save the training process (default: nothing -> "trained_model.jld2").
hybrid_name: Name identifier for the hybrid model (default: randomly generated 10-character string).
return_model: The model to return: :best for the best model, :final for the final model (default: :best).
monitor_names: A vector of monitor names to track during training (default: []).
Visualization and UI:
plotting: Whether to generate plots during training (default: true).
show_progress: Whether to show progress bars during training (default: true).
yscale: The scale to apply to the y-axis for plotting (default: log10).
EasyHybrid.unpack_keyedarray Method
unpack_keyedarray(ka::KeyedArray, variable::Symbol)
Extract a single variable from a KeyedArray and return it as a vector.
Arguments:
ka: The KeyedArray to unpack
variable: Symbol representing the variable to extract
Returns:
- Vector containing the variable data
Example:
# Extract just SW_IN from a KeyedArray
sw_in = unpack_keyedarray(ds_keyed, :SW_IN)
EasyHybrid.unpack_keyedarray Method
unpack_keyedarray(ka::KeyedArray, variables::Vector{Symbol})
Extract specified variables from a KeyedArray and return them as a NamedTuple of vectors.
Arguments:
ka: The KeyedArray to unpack
variables: Vector of symbols representing the variables to extract
Returns:
- NamedTuple with variable names as keys and vectors as values
Example:
# Extract SW_IN and TA from a KeyedArray
data = unpack_keyedarray(ds_keyed, [:SW_IN, :TA])
sw_in = data.SW_IN
ta = data.TA
EasyHybrid.unpack_keyedarray Method
unpack_keyedarray(ka::KeyedArray)
Extract all variables from a KeyedArray and return them as a NamedTuple of vectors.
Arguments:
ka: The KeyedArray to unpack
Returns:
- NamedTuple with all variable names as keys and vectors as values
Example:
# Extract all variables from a KeyedArray
data = unpack_keyedarray(ds_keyed)
# Access individual variables
sw_in = data.SW_IN
ta = data.TA
nee = data.NEE
EasyHybrid.@hybrid Macro
@hybrid ModelName α β γ
Macro to define hybrid model structs with arbitrary numbers of physical parameters.
This defines a struct with:
Default fields: NN (neural network), predictors, forcing, targets.
Additional physical parameters, i.e., α β γ.
Examples
@hybrid MyModel α β γ
@hybrid FluidModel (:viscosity, :density)
@hybrid SimpleModel :a :b