Supervised Models

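All supervised models in prosemble follow the same fit / predict API. They differ in distance metric, loss function, and how prototypes are represented (points in the input space, Gaussian distributions, or points in a learned latent space).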

GLVQ — Baseline

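Generalized Learning Vector Quantization is the foundation model. It learns prototype positions that minimize the relative distance cost \(\mu = (d^+ - d^-) / (d^+ + d^-)\), summed over the training samples. Here \(d^+\) is the distance from a sample to the closest prototype of its own class and \(d^-\) the distance to the closest prototype of any other class. Since \(\mu \in [-1, 1]\) and \(\mu < 0\) exactly when \(d^+ < d^-\), minimizing it pushes samples toward correct classification.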
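To make the cost concrete, here is a minimal NumPy sketch that computes \(\mu\) for each sample. It assumes squared Euclidean distance, with \(d^+\) (\(d^-\)) taken to the closest prototype with the same (a different) label. `glvq_mu` is a hypothetical helper for illustration, not part of prosemble:

```python
import numpy as np


def glvq_mu(X, y, prototypes, proto_labels):
    """Relative distance cost mu for each sample (illustrative helper)."""
    # Squared Euclidean distances, shape (n_samples, n_prototypes)
    d = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    same = y[:, None] == proto_labels[None, :]
    # d+ : closest prototype carrying the sample's label
    d_plus = np.where(same, d, np.inf).min(axis=1)
    # d- : closest prototype carrying any other label
    d_minus = np.where(same, np.inf, d).min(axis=1)
    return (d_plus - d_minus) / (d_plus + d_minus)


X = np.array([[0.0, 0.0], [2.5, 0.0]])
y = np.array([0, 0])
prototypes = np.array([[1.0, 0.0], [3.0, 0.0]])
proto_labels = np.array([0, 1])
mu = glvq_mu(X, y, prototypes, proto_labels)
print(mu)  # negative entries mark correctly classified samples
```

The first sample lies closer to its own prototype and gets \(\mu = -0.8\). The second lies nearer the wrong-class prototype and gets \(\mu = 0.8\).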

from prosemble.models import GLVQ
from prosemble.datasets import load_iris_jax
from prosemble.core.utils import train_test_split_jax

dataset = load_iris_jax()
X, y = dataset.input_data, dataset.labels
X_train, X_test, y_train, y_test = train_test_split_jax(X, y)

model = GLVQ(
    n_prototypes_per_class=2,
    max_iter=100,
    lr=0.01,
    random_seed=42,
)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
probabilities = model.predict_proba(X_test)

Unequal numbers of prototypes per class (pass a dict mapping each class label to its prototype count):

model = GLVQ(
    n_prototypes_per_class={0: 3, 1: 2, 2: 1},
    prototypes_initializer='class_conditional_mean',
)

Prototype initializers (pass the name as a string to prototypes_initializer):

'stratified_random' — random samples per class (default)
'class_mean' — class centroids
'class_conditional_mean' — class centroids replicated
'stratified_noise' — random samples + Gaussian noise
'random_normal' — random normal initialization
'uniform' — random uniform
'zeros' — zero initialization
'ones' — ones initialization
'fill_value' — constant fill value (pass value= kwarg)
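The class-based initializers in the list above can each be sketched in a few lines of NumPy. `class_mean_init` and `stratified_noise_init` are hypothetical helpers. They only illustrate roughly what 'class_conditional_mean' (centroids replicated per prototype) and 'stratified_noise' (class samples plus Gaussian noise) compute. The noise scale and the sampling-with-replacement rule are assumptions, not prosemble's actual defaults:

```python
import numpy as np


def class_mean_init(X, y, n_per_class=1):
    """Class centroids, each repeated n_per_class times (illustrative)."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    prototypes = np.repeat(centroids, n_per_class, axis=0)
    labels = np.repeat(classes, n_per_class)
    return prototypes, labels


def stratified_noise_init(X, y, n_per_class=1, scale=0.1, seed=0):
    """Random samples of each class plus Gaussian noise (illustrative)."""
    rng = np.random.default_rng(seed)
    prototypes, labels = [], []
    for c in np.unique(y):
        Xc = X[y == c]
        # Sample with replacement only if the class has too few points
        idx = rng.choice(len(Xc), size=n_per_class, replace=len(Xc) < n_per_class)
        noise = scale * rng.standard_normal((n_per_class, X.shape[1]))
        prototypes.append(Xc[idx] + noise)
        labels.append(np.full(n_per_class, c))
    return np.concatenate(prototypes), np.concatenate(labels)


X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 10.0]])
y = np.array([0, 0, 1, 1])
protos, labels = class_mean_init(X, y, n_per_class=2)
print(protos)  # [[1, 0], [1, 0], [11, 10], [11, 10]]
print(labels)  # [0 0 1 1]
```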

For unsupervised models, selection_init and mean_init from prosemble.core.initializers provide classless alternatives.

You can also pass a callable for custom initialization (e.g., literal_init(values) from prosemble.core.initializers).

GRLVQ — Feature Relevances

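Adds per-feature relevance weights \(\lambda_j\) that reveal which features matter for classification. The distance becomes the relevance-weighted squared Euclidean distance \(d(x, w) = \sum_j \lambda_j (x_j - w_j)^2\).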

from prosemble.models import GRLVQ

model = GRLVQ(
    n_prototypes_per_class=2,
    max_iter=100,
    lr=0.01,
)
model.fit(X_train, y_train)

# Inspect learned relevances
print(model.relevances_)  # shape: (n_features,)

The relevance vector satisfies \(\lambda_j \geq 0\) and \(\sum_j \lambda_j = 1\). Features with large \(\lambda_j\) are important for classification.
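A small sketch of the weighted distance and of one way to enforce the two constraints follows. It clips negative weights to zero and rescales the rest to sum to 1. This clip-and-renormalize scheme is an assumption for illustration: prosemble may parameterize relevances differently (for example via a softmax). Both helper names are hypothetical:

```python
import numpy as np


def normalize_relevances(raw):
    """Enforce lam_j >= 0 and sum_j lam_j = 1 by clipping negatives and rescaling."""
    lam = np.clip(raw, 0.0, None)
    return lam / lam.sum()


def relevance_distance(x, w, lam):
    """Relevance-weighted squared Euclidean distance sum_j lam_j (x_j - w_j)^2."""
    return float(np.sum(lam * (x - w) ** 2))


lam = normalize_relevances(np.array([2.0, -1.0, 2.0]))
print(lam)  # [0.5 0.  0.5]
x = np.array([1.0, 5.0, 1.0])
w = np.zeros(3)
print(relevance_distance(x, w, lam))  # 1.0: the large gap in feature 1 is ignored
```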

GMLVQ — Matrix Metric Learning

Learns a full linear transformation \(\Omega\) such that the distance is \(d(x, w) = \|\Omega(x - w)\|^2\). This captures feature correlations.

from prosemble.models import GMLVQ

model = GMLVQ(
    n_prototypes_per_class=1,
    latent_dim=2,        # project to 2D
    max_iter=100,
    lr=0.001,
)
model.fit(X_train, y_train)

# Learned matrices
print(model.omega_matrix.shape)   # (latent_dim, n_features)
print(model.lambda_matrix)        # Omega^T @ Omega — feature importance

# Feature importance from diagonal
import jax.numpy as jnp
feature_importance = jnp.diag(model.lambda_matrix)

When latent_dim < n_features, \(\Omega\) also performs dimensionality reduction optimized for classification.
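The projected distance and the quadratic form with \(\Lambda = \Omega^T \Omega\) are the same quantity. When latent_dim < n_features, \(\Lambda\) has rank at most latent_dim. The NumPy sketch below checks both facts for a random \(\Omega\). It uses the (latent_dim, n_features) orientation from the formula, so \(\Omega(x - w)\) is a plain matrix-vector product. The random data is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, latent_dim = 4, 2

# Omega maps input space to latent space: shape (latent_dim, n_features)
omega = rng.standard_normal((latent_dim, n_features))
lam = omega.T @ omega                        # Lambda, shape (n_features, n_features)

x = rng.standard_normal(n_features)
w = rng.standard_normal(n_features)
diff = x - w

d_projected = np.sum((omega @ diff) ** 2)    # ||Omega (x - w)||^2
d_quadratic = diff @ lam @ diff              # (x - w)^T Lambda (x - w)
print(np.isclose(d_projected, d_quadratic))  # True

# diag(Lambda) is a non-negative per-feature relevance profile;
# rank(Lambda) <= latent_dim reflects the dimensionality reduction
print(np.diag(lam), np.linalg.matrix_rank(lam))
```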

LGMLVQ — Local Metrics

Each prototype gets its own \(\Omega_k\) matrix, allowing different regions of the feature space to use different metrics.
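With one matrix per prototype, every sample/prototype pair uses its own projection. The sketch below computes \(\|\Omega_k (x - w_k)\|^2\) for all pairs with a single `np.einsum`. The function name and the (n_prototypes, latent_dim, n_features) layout of `omegas` are assumptions for illustration. prosemble's stored attribute may use a transposed layout:

```python
import numpy as np


def local_omega_distances(X, prototypes, omegas):
    """d_k(x) = ||Omega_k (x - w_k)||^2 for every sample/prototype pair.

    X: (n, n_features); prototypes: (k, n_features);
    omegas: (k, latent_dim, n_features), one matrix per prototype.
    """
    diff = X[:, None, :] - prototypes[None, :, :]       # (n, k, n_features)
    proj = np.einsum("kld,nkd->nkl", omegas, diff)      # (n, k, latent_dim)
    return (proj ** 2).sum(axis=-1)                     # (n, k)


X = np.array([[1.0, 0.0]])
prototypes = np.zeros((2, 2))
omegas = np.stack([np.eye(2), 2.0 * np.eye(2)])         # prototype 1 stretches space by 2
print(local_omega_distances(X, prototypes, omegas))     # [[1. 4.]]
```

The same point at the same offset is four times farther from prototype 1, because that prototype's local metric scales distances by a factor of 2.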

from prosemble.models import LGMLVQ

model = LGMLVQ(
    n_prototypes_per_class=2,
    latent_dim=2,
    max_iter=100,
    lr=0.001,
)
model.fit(X_train, y_train)

GTLVQ — Tangent Distance

Learns per-prototype invariance subspaces. The tangent distance measures distance only in directions orthogonal to the invariance subspace: \(d(x, w_k) = \|(I - \Omega_k \Omega_k^T)(x - w_k)\|^2\).
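The sketch below evaluates the tangent distance for a basis \(\Omega_k\) with orthonormal columns, shape (n_features, tangent_dim). It subtracts the projection onto the subspace instead of forming \(I - \Omega_k \Omega_k^T\) explicitly. The helper name is hypothetical. Building the basis with a QR decomposition is just one convenient way to get orthonormal columns, not necessarily how prosemble keeps its bases orthonormal during training:

```python
import numpy as np


def tangent_distance(x, w, omega):
    """||(I - Omega Omega^T)(x - w)||^2 for orthonormal omega of shape (n_features, tangent_dim)."""
    diff = x - w
    residual = diff - omega @ (omega.T @ diff)   # drop the component inside the subspace
    return float(residual @ residual)


# 1D invariance subspace spanned by the first coordinate axis
omega = np.array([[1.0], [0.0]])
print(tangent_distance(np.array([3.0, 4.0]), np.zeros(2), omega))  # 16.0: only the y-offset counts

# A random orthonormal basis from a QR decomposition
rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.standard_normal((5, 2)))   # q: (5, 2) with orthonormal columns
w = rng.standard_normal(5)
x_shifted = w + q @ np.array([0.7, -1.3])          # move w only inside the subspace
print(tangent_distance(x_shifted, w, q))           # ~0.0: invariant direction
```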

from prosemble.models import GTLVQ

model = GTLVQ(
    n_prototypes_per_class=2,
    tangent_dim=1,        # 1D invariance subspace
    max_iter=100,
    lr=0.001,
)
model.fit(X_train, y_train)

Wasserstein Prototype LVQ

The Wasserstein LVQ family uses Gaussian distributional prototypes instead of point prototypes. Each prototype \(k\) is a diagonal Gaussian \(\mathcal{N}(\mu_k, \text{diag}(\sigma_k^2))\). The distance from a data point \(x\) (treated as a Dirac delta) to prototype \(k\) is the squared 2-Wasserstein distance:

\[W_2^2(\delta_x, \mathcal{N}(\mu_k, \text{diag}(\sigma_k^2))) = \|x - \mu_k\|^2 + \sum_j \sigma_{kj}^2\]

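The variance term acts as a per-prototype learned “spread”. Because it does not depend on \(x\), it adds the same offset to every distance to prototype \(k\). Prototypes with smaller variance are therefore more specific: they produce smaller distances and attract nearby points more strongly in the nearest-prototype competition.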
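A vectorized sketch of this distance follows. It stores variances as log-variances, matching the \(\log(\sigma^2)\) parameterization prosemble uses internally, so exponentiating keeps them positive. The function name and array layout are hypothetical:

```python
import numpy as np


def wasserstein_sq(X, means, log_vars):
    """Squared 2-Wasserstein distance from Dirac points to diagonal Gaussians.

    X: (n, d); means, log_vars: (k, d). Variances are exp(log_vars) > 0.
    """
    sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)   # ||x - mu_k||^2
    spread = np.exp(log_vars).sum(axis=-1)                           # sum_j sigma_kj^2
    return sq + spread[None, :]


X = np.array([[1.0, 2.0]])
means = np.zeros((2, 2))
log_vars = np.log(np.array([[0.5, 0.5], [2.0, 2.0]]))
print(wasserstein_sq(X, means, log_vars))  # [[6. 9.]]: same mean, larger spread -> larger distance
```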

WGLVQ — Wasserstein GLVQ (base variant):

from prosemble.models import WGLVQ

model = WGLVQ(
    n_prototypes_per_class=2,
    max_iter=100,
    lr=0.01,
)
model.fit(X_train, y_train)

# Learned prototype means and variances
print(model.prototype_means.shape)      # (n_prototypes, n_features)
print(model.prototype_variances.shape)   # (n_prototypes, n_features)

WGMLVQ — Adds a global \(\Omega\) matrix projection:

\[W_2^2(x, k) = \|\Omega(x - \mu_k)\|^2 + \sum_j \sigma_{kj}^2\]

from prosemble.models import WGMLVQ

model = WGMLVQ(
    n_prototypes_per_class=2,
    latent_dim=2,          # project to 2D
    max_iter=100,
    lr=0.001,
)
model.fit(X_train, y_train)

print(model.omega_matrix.shape)   # (n_features, latent_dim)
print(model.lambda_matrix)        # Omega^T @ Omega

WGRLVQ — Adds per-feature relevance weighting:

\[W_2^2(x, k) = \sum_j \lambda_j (x_j - \mu_{kj})^2 + \sum_j \sigma_{kj}^2\]

where \(\lambda_j \geq 0\) and \(\sum_j \lambda_j = 1\).
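The two extended distances differ from the WGLVQ distance only in the mean term. As the formulas above state, the variance sum is left unweighted. Below is a per-pair sketch with hypothetical helper names. \(\Omega\) is written in the orientation of the formula, (latent_dim, n_features); the stored omega_matrix attribute may use the transposed layout:

```python
import numpy as np


def w2_omega(x, mu, var, omega):
    """WGMLVQ-style distance: ||Omega (x - mu)||^2 + sum_j var_j."""
    return float(np.sum((omega @ (x - mu)) ** 2) + var.sum())


def w2_relevance(x, mu, var, lam):
    """WGRLVQ-style distance: sum_j lam_j (x_j - mu_j)^2 + sum_j var_j."""
    return float(np.sum(lam * (x - mu) ** 2) + var.sum())


x = np.array([1.0, 2.0])
mu = np.zeros(2)
var = np.array([0.5, 0.5])
print(w2_omega(x, mu, var, np.eye(2)))                 # 6.0: identity Omega reduces to WGLVQ
print(w2_relevance(x, mu, var, np.array([1.0, 0.0])))  # 2.0: second feature ignored
```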

from prosemble.models import WGRLVQ

model = WGRLVQ(
    n_prototypes_per_class=2,
    max_iter=100,
    lr=0.01,
)
model.fit(X_train, y_train)

# Learned relevance profile
print(model.relevance_profile)   # shape: (n_features,), sums to 1

Note

Variances are parameterized internally as \(\log(\sigma^2)\) to ensure positivity during gradient optimization. Access the actual variances via model.prototype_variances.

CELVQ — Cross-Entropy

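Replaces the GLVQ \(\mu\) cost with a cross-entropy loss over class probabilities derived from distances (a softmax over each class's closest-prototype distance). Because training maximizes the log-likelihood of the true class, the resulting probabilities are typically better calibrated than scores derived from \(\mu\).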
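A minimal sketch of this loss follows. It assumes the logit of class \(c\) is the negated squared distance to the closest prototype of \(c\) (hard per-class min pooling). The function name is hypothetical, and prosemble's exact logit scaling may differ:

```python
import numpy as np


def celvq_loss_and_proba(X, y, prototypes, proto_labels, n_classes):
    """Cross-entropy over a softmax of negated per-class closest distances (sketch)."""
    d = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)        # (n, k)
    # Hard per-class min pooling: distance to the closest prototype of each class
    class_d = np.stack([d[:, proto_labels == c].min(axis=1) for c in range(n_classes)], axis=1)
    logits = -class_d
    logits = logits - logits.max(axis=1, keepdims=True)                    # numerical stability
    log_proba = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    loss = -log_proba[np.arange(len(y)), y].mean()
    return loss, np.exp(log_proba)


X = np.array([[0.0, 0.0]])
y = np.array([0])
prototypes = np.array([[0.0, 0.0], [1.0, 0.0]])
proto_labels = np.array([0, 1])
loss, proba = celvq_loss_and_proba(X, y, prototypes, proto_labels, n_classes=2)
print(loss, proba)  # loss = log(1 + e^-1) ~ 0.313; proba ~ [[0.731, 0.269]]
```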

from prosemble.models import CELVQ

model = CELVQ(
    n_prototypes_per_class=2,
    max_iter=100,
    lr=0.01,
)
model.fit(X_train, y_train)
proba = model.predict_proba(X_test)  # softmax class probabilities

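LVQ1 and LVQ2.1 — Classic Non-Gradient

Simple competitive learning with heuristic update rules rather than gradients of an explicit cost. LVQ1 updates only the nearest prototype, moving it toward the sample if their labels match and away from it otherwise. LVQ2.1 updates the nearest correct and the nearest incorrect prototype together, attracting the former and repelling the latter.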
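Both update rules fit in a few lines. The LVQ2.1 step below applies Kohonen's classic window rule: update only when \(\min(d_i/d_j, d_j/d_i) > (1 - w)/(1 + w)\), that is, when the sample lies near the boundary between the two prototypes. The window width 0.3 and both helper names are assumptions for illustration. This is the textbook formulation, not necessarily what prosemble's LVQ21 implements:

```python
import numpy as np


def lvq1_step(x, label, prototypes, proto_labels, lr):
    """Move the nearest prototype toward x if its label matches, away otherwise."""
    protos = prototypes.copy()
    k = np.argmin(((protos - x) ** 2).sum(axis=1))
    sign = 1.0 if proto_labels[k] == label else -1.0
    protos[k] += sign * lr * (x - protos[k])
    return protos


def lvq21_step(x, label, prototypes, proto_labels, lr, window=0.3):
    """Attract nearest correct and repel nearest incorrect prototype inside the window."""
    protos = prototypes.copy()
    d = np.sqrt(((protos - x) ** 2).sum(axis=1))
    same = proto_labels == label
    i = np.argmin(np.where(same, d, np.inf))     # nearest correct prototype
    j = np.argmin(np.where(same, np.inf, d))     # nearest incorrect prototype
    lo, hi = sorted((d[i], d[j]))
    # Window rule: min(d_i/d_j, d_j/d_i) > (1 - window) / (1 + window)
    if hi > 0 and lo / hi > (1 - window) / (1 + window):
        protos[i] += lr * (x - protos[i])
        protos[j] -= lr * (x - protos[j])
    return protos


prototypes = np.array([[0.0, 0.0], [2.0, 0.0]])
proto_labels = np.array([0, 1])
x = np.array([0.8, 0.0])
print(lvq1_step(x, 0, prototypes, proto_labels, lr=0.5))   # nearest prototype moves to [0.4, 0]
print(lvq21_step(x, 0, prototypes, proto_labels, lr=0.5))  # [0.4, 0] and [2.6, 0]
```

A sample far from the boundary, such as [0.1, 0], falls outside the window and leaves both prototypes unchanged under LVQ2.1.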

from prosemble.models import LVQ1, LVQ21

model = LVQ1(
    n_prototypes_per_class=2,
    max_iter=100,
    lr=0.01,
)
model.fit(X_train, y_train)

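Probabilistic Models (SLVQ, RSLVQ)

Both treat prototypes as centers of Gaussian mixture components with a shared width sigma. RSLVQ (Robust Soft LVQ) trains by maximizing the log-likelihood ratio of the correct class against all classes, and the resulting class posteriors support confidence-based rejection.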
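A sketch of the RSLVQ objective follows. It assumes equal priors and a shared width \(\sigma\), so the Gaussian normalizing constants cancel and each component's posterior is a softmax of \(-d_k / (2\sigma^2)\). The loss is the negative log of the summed posterior of the correct-class components. The function name is hypothetical:

```python
import numpy as np


def rslvq_loss_and_posterior(X, y, prototypes, proto_labels, sigma=1.0):
    """Negative log-likelihood ratio of RSLVQ with equal-width Gaussians and equal priors (sketch)."""
    d = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)   # (n, k)
    logits = -d / (2.0 * sigma ** 2)
    logits = logits - logits.max(axis=1, keepdims=True)               # stabilize exp
    resp = np.exp(logits)
    resp = resp / resp.sum(axis=1, keepdims=True)   # p(k | x) for each component
    same = y[:, None] == proto_labels[None, :]
    p_correct = (resp * same).sum(axis=1)           # p(y | x)
    return -np.log(p_correct).mean(), resp


X = np.array([[0.0]])
y = np.array([0])
prototypes = np.array([[1.0], [-1.0]])
proto_labels = np.array([0, 1])
loss, resp = rslvq_loss_and_posterior(X, y, prototypes, proto_labels)
print(loss)  # log(2) ~ 0.693: the sample is equidistant from both classes
```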

from prosemble.models import RSLVQ

model = RSLVQ(
    n_prototypes_per_class=2,
    sigma=1.0,
    max_iter=100,
    lr=0.01,
)
model.fit(X_train, y_train)

MRSLVQ — Matrix RSLVQ with global Omega metric learning:

from prosemble.models import MRSLVQ

model = MRSLVQ(
    n_prototypes_per_class=1,
    latent_dim=2,
    sigma=1.0,
    max_iter=100,
    lr=0.001,
)
model.fit(X_train, y_train)
print(model.omega_matrix.shape)   # (n_features, latent_dim)
print(model.lambda_matrix)        # Omega^T @ Omega

LMRSLVQ — Local Matrix RSLVQ with per-prototype Omega matrices:

from prosemble.models import LMRSLVQ

model = LMRSLVQ(
    n_prototypes_per_class=1,
    latent_dim=2,
    sigma=1.0,
    max_iter=100,
    lr=0.001,
)
model.fit(X_train, y_train)

Probabilistic LVQ with Neural Gas (RSLVQ-NG Family)

The RSLVQ-NG family combines RSLVQ’s Gaussian mixture probabilistic assignment with Neural Gas rank-based neighborhood cooperation. All prototypes participate in the loss, weighted by both their Gaussian probability and NG rank.
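The Neural Gas part of these models has two ingredients: rank weights \(h_k = \exp(-\text{rank}_k / \gamma)\), and a neighborhood range \(\gamma\) that shrinks from gamma_init to gamma_final during training. The sketch below shows both. The exponential decay schedule is an assumption (a common Neural Gas choice), and how prosemble multiplies these weights with the Gaussian probabilities is not shown. Helper names are hypothetical:

```python
import numpy as np


def gamma_schedule(gamma_init, gamma_final, max_iter):
    """Exponential decay from gamma_init to gamma_final over max_iter steps (assumed schedule)."""
    t = np.arange(max_iter) / max(max_iter - 1, 1)
    return gamma_init * (gamma_final / gamma_init) ** t


def ng_rank_weights(distances, gamma):
    """h_k = exp(-rank_k / gamma), where rank 0 is the closest prototype."""
    ranks = np.argsort(np.argsort(distances))
    return np.exp(-ranks / gamma)


gammas = gamma_schedule(5.0, 0.01, 100)
print(gammas[0], gammas[-1])            # 5.0 ... 0.01
d = np.array([0.9, 0.1, 0.4])
print(ng_rank_weights(d, gammas[0]))    # broad cooperation early in training
print(ng_rank_weights(d, gammas[-1]))   # ~ one-hot on the closest prototype late in training
```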

RSLVQ_NG — Euclidean distance (base variant):

from prosemble.models import RSLVQ_NG

model = RSLVQ_NG(
    n_prototypes_per_class=3,
    sigma=1.0,
    gamma_init=5.0,
    gamma_final=0.01,
    max_iter=100,
    lr=0.01,
)
model.fit(X_train, y_train)

MRSLVQ_NG — Global Omega matrix metric learning:

from prosemble.models import MRSLVQ_NG

model = MRSLVQ_NG(
    n_prototypes_per_class=2,
    latent_dim=2,
    sigma=1.0,
    gamma_init=5.0,
    gamma_final=0.01,
    max_iter=100,
    lr=0.001,
)
model.fit(X_train, y_train)

# Learned metric matrices
print(model.omega_matrix.shape)   # (n_features, latent_dim)
print(model.lambda_matrix)        # Omega^T @ Omega

LMRSLVQ_NG — Per-prototype local Omega matrices:

from prosemble.models import LMRSLVQ_NG

model = LMRSLVQ_NG(
    n_prototypes_per_class=2,
    latent_dim=2,
    sigma=1.0,
    gamma_init=5.0,
    gamma_final=0.01,
    max_iter=100,
    lr=0.001,
)
model.fit(X_train, y_train)

Median LVQ

Constrains prototypes to be actual data points for maximum interpretability.
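To illustrate the constraint, the sketch below runs one greedy pass over the prototypes. Each prototype is re-placed on whichever sample of its own class lowers the summed GLVQ cost the most, so prototypes are always rows of X. This simple coordinate search is a hypothetical stand-in, not necessarily the optimization prosemble's MedianLVQ performs. `glvq_cost` and `median_sweep` are illustrative names:

```python
import numpy as np


def glvq_cost(X, y, prototypes, proto_labels):
    """Sum of the GLVQ relative distance cost mu over all samples."""
    d = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    same = y[:, None] == proto_labels[None, :]
    d_plus = np.where(same, d, np.inf).min(axis=1)
    d_minus = np.where(same, np.inf, d).min(axis=1)
    return float(((d_plus - d_minus) / (d_plus + d_minus)).sum())


def median_sweep(X, y, proto_idx, proto_labels):
    """One greedy pass: move each prototype to the same-class sample that lowers the cost most."""
    proto_idx = np.array(proto_idx)
    for k in range(len(proto_idx)):
        best_cost, best_i = np.inf, proto_idx[k]
        for i in np.flatnonzero(y == proto_labels[k]):    # candidates: samples of prototype k's class
            trial = proto_idx.copy()
            trial[k] = i
            cost = glvq_cost(X, y, X[trial], proto_labels)
            if cost < best_cost:
                best_cost, best_i = cost, i
        proto_idx[k] = best_i
    return proto_idx   # prototypes are X[proto_idx], i.e. actual training samples


X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y = np.array([0, 0, 0, 1, 1, 1])
idx = median_sweep(X, y, [0, 5], np.array([0, 1]))
print(idx, X[idx].ravel())  # [1 4] [ 1. 11.]: each prototype lands on its class median
```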

from prosemble.models import MedianLVQ

model = MedianLVQ(
    n_prototypes_per_class=1,
    max_iter=50,
    random_seed=42,
)
model.fit(X_train, y_train)
# model.prototypes_ are actual training examples

Deep Variants (LVQMLN, PLVQ)

Add a trainable MLP backbone for nonlinear feature extraction. Prototypes live in the latent space (not the input space).

from prosemble.models import LVQMLN

model = LVQMLN(
    n_prototypes_per_class=2,
    hidden_dims=[64, 32],  # MLP architecture
    max_iter=100,
    lr=0.001,
)
model.fit(X_train, y_train)

Siamese Variants

Like LVQMLN, but the prototypes live in the input space and pass through the same backbone as the input data, so they remain interpretable as input-space vectors.
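The difference between the two designs is where the prototypes sit relative to the backbone. In the NumPy sketch below, a tiny random MLP stands in for the trainable backbone. LVQMLN-style prototypes are free vectors in the latent space, while Siamese-style prototypes are input-space vectors embedded by the same network. The ReLU activation, the reading of hidden_dims=[64, 32] as layer widths ending in a 32-d latent space, and all helper names are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)


def init_mlp(sizes):
    """Random (W, b) pairs for layer sizes such as [4, 64, 32] (hypothetical helper)."""
    return [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp(params, X):
    """Forward pass with ReLU between layers (activation choice is an assumption)."""
    h = X
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)
    return h


def sq_dist(A, B):
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)


params = init_mlp([4, 64, 32])
X = rng.standard_normal((5, 4))

# LVQMLN-style: prototypes are free parameters in the 32-d latent space
latent_protos = rng.standard_normal((3, 32))
d_latent = sq_dist(mlp(params, X), latent_protos)

# Siamese-style: prototypes live in the 4-d input space and share the backbone
input_protos = X[:3].copy()
d_siamese = sq_dist(mlp(params, X), mlp(params, input_protos))
print(d_latent.shape, d_siamese.shape)  # (5, 3) (5, 3)
```

Because the Siamese prototypes here are copies of the first three samples, those samples sit at distance zero from them. Only the input-space prototypes can be read back as inputs.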

from prosemble.models import SiameseGLVQ, SiameseGMLVQ, SiameseGTLVQ

model = SiameseGLVQ(
    n_prototypes_per_class=2,
    hidden_dims=[64, 32],
    max_iter=100,
    lr=0.001,
)
model.fit(X_train, y_train)
# model.prototypes_ is in the original input space

Image LVQ

Siamese architecture with a CNN backbone for image classification.

from prosemble.models import ImageGLVQ, ImageGMLVQ, ImageGTLVQ

model = ImageGLVQ(
    n_prototypes_per_class=2,
    max_iter=50,
    lr=0.001,
)
# X_train should be image data: (n_samples, height, width, channels)
model.fit(X_train, y_train)

CBC — Classification-By-Components

Uses classless components with learned reasoning matrices for explainable classification.
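The reasoning step can be sketched as follows. Each component \(k\) has a detection probability for input \(x\). For each class, positive reasoning weights reward components that are detected, and negative weights reward components that are absent. The scores follow the positive/negative reasoning scheme of the original Classification-by-Components formulation. The Gaussian-style detection kernel, the helper names, and the hand-set reasoning matrices are assumptions, and prosemble may parameterize them differently:

```python
import numpy as np


def detections(X, components, sigma=1.0):
    """Detection probability of each component, exp(-||x - c_k||^2 / sigma) (assumed kernel)."""
    d = ((X[:, None, :] - components[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d / sigma)


def cbc_scores(det, r_pos, r_neg):
    """Class scores from positive/negative reasoning.

    det: (n, K) detections; r_pos, r_neg: (K, C), non-negative with r_pos + r_neg <= 1.
    score_c = sum_k [r_pos_kc * det_k + r_neg_kc * (1 - det_k)] / sum_k (r_pos_kc + r_neg_kc)
    """
    num = det @ r_pos + (1.0 - det) @ r_neg
    den = (r_pos + r_neg).sum(axis=0)
    return num / den


components = np.array([[0.0, 0.0], [5.0, 5.0]])
# Class 0 needs component 0 present; class 1 needs component 0 absent and component 1 present
r_pos = np.array([[1.0, 0.0], [0.0, 0.5]])
r_neg = np.array([[0.0, 0.5], [0.0, 0.0]])
scores = cbc_scores(detections(np.array([[0.0, 0.0]]), components), r_pos, r_neg)
print(scores)  # ~[[1, 0]]: component 0 is detected, so class 0 wins
```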

from prosemble.models import CBC

model = CBC(
    n_components=6,
    num_classes=3,
    max_iter=100,
    lr=0.001,
)
model.fit(X_train, y_train)

Supervised Neural Gas (SNG)

SNG is the simplest Supervised Neural Gas variant. It combines the GLVQ loss with Neural Gas neighborhood cooperation using squared Euclidean distance: for each sample, all same-class prototypes are updated, weighted by their distance rank. As \(\gamma \to 0\), only the closest same-class prototype keeps a nonzero weight, and SNG recovers standard GLVQ.
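A loop-based sketch of the SNG cost follows. For each sample, every same-class prototype contributes its own GLVQ \(\mu\) against the closest wrong-class prototype, weighted by \(\exp(-\text{rank}/\gamma)\). Normalizing the weights to sum to 1 is an assumption of this sketch, and `sng_cost` is a hypothetical name. The second call checks the \(\gamma \to 0\) limit:

```python
import numpy as np


def sng_cost(X, y, prototypes, proto_labels, gamma):
    """Rank-weighted GLVQ cost over all same-class prototypes (normalized weights assumed)."""
    d = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)   # (n, k)
    total = 0.0
    for i in range(len(X)):
        same = proto_labels == y[i]
        d_same, d_minus = d[i, same], d[i, ~same].min()
        ranks = np.argsort(np.argsort(d_same))           # 0 = closest same-class prototype
        h = np.exp(-ranks / gamma)
        mu = (d_same - d_minus) / (d_same + d_minus)      # mu for each same-class prototype
        total += np.sum(h * mu) / np.sum(h)
    return total


X = np.array([[0.0, 0.0], [2.5, 0.0]])
y = np.array([0, 0])
prototypes = np.array([[1.0, 0.0], [-2.0, 0.0], [3.0, 0.0]])
proto_labels = np.array([0, 0, 1])
print(sng_cost(X, y, prototypes, proto_labels, gamma=5.0))   # all class-0 prototypes contribute
print(sng_cost(X, y, prototypes, proto_labels, gamma=1e-6))  # ~ GLVQ: -0.8 + 0.8 = 0.0
```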

from prosemble.models import SNG

model = SNG(
    n_prototypes_per_class=3,
    gamma_init=5.0,         # initial neighborhood range
    gamma_final=0.01,       # final (narrower)
    lr_ratio=0.5,           # separate learning rates (Hammer et al. 2003)
    max_iter=100,
    lr=0.01,
)
model.fit(X_train, y_train)

Supervised Relevance Neural Gas (SRNG)

Extends SNG with per-feature relevance weights (like GRLVQ). It keeps the GLVQ loss with Neural Gas neighborhood cooperation, so all same-class prototypes are updated per sample, weighted by rank, but measures distance with a relevance-weighted squared Euclidean metric.

from prosemble.models import SRNG

model = SRNG(
    n_prototypes_per_class=3,
    lambda_init=5.0,       # initial neighborhood range
    lambda_final=0.01,     # final (narrower)
    lr_ratio=0.5,          # separate learning rates (Hammer et al. 2003)
    max_iter=100,
    lr=0.01,
)
model.fit(X_train, y_train)

# Inspect relevances (SRNG learns feature weights like GRLVQ)
print(model.relevances_)

Supervised Matrix Neural Gas (SMNG, SCMNG, SLNG, STNG)

The SMNG family extends SRNG with matrix metric adaptation while keeping Neural Gas neighborhood cooperation. All same-class prototypes participate in the loss weighted by rank.

All supervised NG models support a lr_ratio parameter that controls separate learning rates for correct-class and wrong-class prototypes (Hammer et al. 2003). Setting lr_ratio < 1.0 reduces the gradient magnitude flowing through the closest wrong-class prototype:

from prosemble.models import SMNG

model = SMNG(
    n_prototypes_per_class=5,
    lr_ratio=0.5,   # wrong-class lr = 0.5 * correct-class lr
    max_iter=200,
    lr=0.01,
)

When lr_ratio=1.0 (default), correct and wrong-class prototypes receive equal gradient magnitude.
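The effect of lr_ratio can be seen on a single GLVQ gradient step with one correct prototype \(w^+\) and one wrong-class prototype \(w^-\), with Neural Gas rank weighting left out for clarity. The gradients follow from \(\mu = (d^+ - d^-)/(d^+ + d^-)\) with squared Euclidean distances. Only the wrong-class step is scaled. `glvq_step` is a hypothetical helper, not prosemble's update code:

```python
import numpy as np


def glvq_step(x, w_plus, w_minus, lr, lr_ratio=1.0):
    """One gradient-descent step on mu = (d+ - d-)/(d+ + d-) with a scaled wrong-class rate."""
    d_plus = np.sum((x - w_plus) ** 2)
    d_minus = np.sum((x - w_minus) ** 2)
    denom = (d_plus + d_minus) ** 2
    # d mu / d w+ = -4 d- (x - w+) / denom ;  d mu / d w- = 4 d+ (x - w-) / denom
    grad_plus = -4.0 * d_minus * (x - w_plus) / denom
    grad_minus = 4.0 * d_plus * (x - w_minus) / denom
    new_plus = w_plus - lr * grad_plus                  # attracted toward x
    new_minus = w_minus - lr * lr_ratio * grad_minus    # repelled, with rate lr_ratio * lr
    return new_plus, new_minus


x = np.array([0.0, 0.0])
w_plus = np.array([1.0, 0.0])
w_minus = np.array([-2.0, 0.0])
print(glvq_step(x, w_plus, w_minus, lr=0.1, lr_ratio=1.0))  # w+ -> [0.936, 0], w- -> [-2.032, 0]
print(glvq_step(x, w_plus, w_minus, lr=0.1, lr_ratio=0.5))  # w- moves half as far
```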

SMNG — Global Omega matrix (like GMLVQ + Neural Gas):

from prosemble.models import SMNG

model = SMNG(
    n_prototypes_per_class=3,
    gamma_init=5.0,
    gamma_final=0.01,
    max_iter=200,
    lr=0.01,
)
model.fit(X_train, y_train)
print(model.omega_matrix.shape)   # (n_features, latent_dim)
print(model.lambda_matrix.shape)  # (n_features, n_features)

SCMNG — Per-class Omega matrices (like class-wise GMLVQ + Neural Gas):

from prosemble.models import SCMNG

model = SCMNG(
    n_prototypes_per_class=3,
    gamma_init=5.0,
    gamma_final=0.01,
    max_iter=200,
    lr=0.01,
)
model.fit(X_train, y_train)
print(model.omegas_.shape)  # (n_classes, n_features, latent_dim)

Each class shares a single \(\Omega_c\) matrix, so cooperating same-class prototypes contribute aligned gradients to the metric. This avoids the gradient dilution of a global \(\Omega\) (SMNG) while using fewer parameters than per-prototype matrices (SLNG).
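Class-wise sharing amounts to indexing one matrix per class by each prototype's label. The sketch below uses the (n_classes, n_features, latent_dim) layout printed for omegas_ above, so the projection is \((x - w_k)\,\Omega_{c(k)}\). It also compares metric parameter counts for the global, class-wise, and per-prototype options. The random data and helper name are illustrative only:

```python
import numpy as np


def classwise_distances(X, prototypes, proto_labels, omegas):
    """d_k(x) = ||(x - w_k) @ Omega_{c(k)}||^2, sharing Omega among prototypes of a class."""
    diff = X[:, None, :] - prototypes[None, :, :]          # (n, k, n_features)
    per_proto = omegas[proto_labels]                        # (k, n_features, latent_dim)
    proj = np.einsum("nkd,kdl->nkl", diff, per_proto)       # (n, k, latent_dim)
    return (proj ** 2).sum(axis=-1)


rng = np.random.default_rng(0)
n_features, latent_dim, n_classes, per_class = 4, 2, 3, 3
proto_labels = np.repeat(np.arange(n_classes), per_class)          # [0 0 0 1 1 1 2 2 2]
prototypes = rng.standard_normal((len(proto_labels), n_features))
omegas = rng.standard_normal((n_classes, n_features, latent_dim))   # one matrix per class

X = rng.standard_normal((5, n_features))
D = classwise_distances(X, prototypes, proto_labels, omegas)
print(D.shape)  # (5, 9)

# Metric parameter counts for the three options
n_global = n_features * latent_dim                        # SMNG-style: 8
n_classwise = n_classes * n_features * latent_dim         # SCMNG-style: 24
n_local = len(proto_labels) * n_features * latent_dim     # SLNG-style: 72
print(n_global, n_classwise, n_local)
```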

SLNG — Per-prototype Omega matrices (like LGMLVQ + Neural Gas):

from prosemble.models import SLNG

model = SLNG(
    n_prototypes_per_class=3,
    gamma_init=5.0,
    gamma_final=0.01,
    max_iter=200,
    lr=0.01,
)
model.fit(X_train, y_train)
print(model.omegas_.shape)  # (n_protos, n_features, latent_dim)

STNG — Per-prototype tangent subspaces (like GTLVQ + Neural Gas):

from prosemble.models import STNG

model = STNG(
    n_prototypes_per_class=3,
    subspace_dim=2,
    gamma_init=5.0,
    gamma_final=0.01,
    max_iter=200,
    lr=0.01,
)
model.fit(X_train, y_train)
print(model.omegas_.shape)  # (n_protos, n_features, subspace_dim)

Cross-Entropy Neural Gas (CELVQ-NG Family)

The CELVQ-NG family combines cross-entropy loss over all-class softmax logits with Neural Gas rank-based neighborhood cooperation. Unlike SRNG (which uses the pairwise GLVQ \(\mu\) cost), CELVQ-NG considers all classes simultaneously via softmax. This tends to give better-calibrated probabilities and lets gradients flow to all prototypes.

Neural Gas cooperation replaces the hard per-class min pooling in CELVQ with NG-weighted soft pooling: for each class, prototypes are ranked by distance and weighted by \(h_k = \exp(-\text{rank} / \gamma)\). When \(\gamma \to 0\), CELVQ-NG recovers standard CELVQ.
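The pooling step can be sketched as a rank-weighted average of each class's prototype distances. With a tiny \(\gamma\) it collapses to the per-class minimum used by plain CELVQ. Averaging distances (rather than, say, logits) with weights normalized to sum to 1 is an assumption of this sketch, and the helper names are hypothetical:

```python
import numpy as np


def ng_soft_pool(d, proto_labels, n_classes, gamma):
    """Per-class pooled distance: NG-weighted average of that class's prototype distances.

    d: (n, k) distances. Weights h = exp(-rank / gamma) are computed within each class
    and normalized to sum to 1 (the normalization is an assumption of this sketch).
    """
    pooled = np.empty((d.shape[0], n_classes))
    for c in range(n_classes):
        dc = d[:, proto_labels == c]                             # (n, k_c)
        ranks = np.argsort(np.argsort(dc, axis=1), axis=1)       # 0 = closest within class c
        h = np.exp(-ranks / gamma)
        pooled[:, c] = (h * dc).sum(axis=1) / h.sum(axis=1)
    return pooled


def softmax_neg(pooled):
    """Class probabilities from negated pooled distances."""
    z = -pooled - (-pooled).max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


d = np.array([[0.2, 1.0, 0.5, 3.0]])
proto_labels = np.array([0, 0, 1, 1])
soft = ng_soft_pool(d, proto_labels, 2, gamma=5.0)
hard = ng_soft_pool(d, proto_labels, 2, gamma=1e-6)
print(soft, hard)  # hard ~ [[0.2, 0.5]]: per-class min, i.e. plain CELVQ pooling
print(softmax_neg(soft))
```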

CELVQ_NG — Euclidean distance (base variant):

from prosemble.models import CELVQ_NG

model = CELVQ_NG(
    n_prototypes_per_class=3,
    gamma_init=5.0,        # initial neighborhood range
    gamma_final=0.01,      # final (narrower)
    max_iter=100,
    lr=0.01,
)
model.fit(X_train, y_train)
proba = model.predict_proba(X_test)  # calibrated probabilities

MCELVQ_NG — Global Omega matrix metric learning:

from prosemble.models import MCELVQ_NG

model = MCELVQ_NG(
    n_prototypes_per_class=3,
    latent_dim=2,          # project to 2D
    gamma_init=5.0,
    gamma_final=0.01,
    max_iter=100,
    lr=0.01,
)
model.fit(X_train, y_train)

# Learned metric matrices
print(model.omega_matrix.shape)   # (n_features, latent_dim)
print(model.lambda_matrix)        # Omega^T @ Omega — feature importance

LCELVQ_NG — Per-prototype local Omega matrices:

from prosemble.models import LCELVQ_NG

model = LCELVQ_NG(
    n_prototypes_per_class=2,
    latent_dim=2,
    gamma_init=5.0,
    gamma_final=0.01,
    max_iter=100,
    lr=0.01,
)
model.fit(X_train, y_train)

# Each prototype has its own Omega_k
print(model.omegas_.shape)  # (n_prototypes, n_features, latent_dim)

TCELVQ_NG — Tangent subspace distance:

from prosemble.models import TCELVQ_NG

model = TCELVQ_NG(
    n_prototypes_per_class=2,
    subspace_dim=1,        # 1D invariance subspace
    gamma_init=5.0,
    gamma_final=0.01,
    max_iter=100,
    lr=0.01,
)
model.fit(X_train, y_train)

# Learned orthogonal tangent bases
print(model.omegas_.shape)  # (n_prototypes, n_features, subspace_dim)

The tangent variant measures distance orthogonal to learned invariance subspaces: \(d(x, w_k) = \|(I - \Omega_k \Omega_k^T)(x - w_k)\|^2\). Best suited for high-dimensional data with invariance structure (images, spectra, signals).

CELVQ-NG Family Summary
Model	Distance Metric	Learnable Parameters	Best For
CELVQ_NG	Euclidean	Prototypes only	General-purpose, fast training
MCELVQ_NG	\(\|\Omega(x-w)\|^2\)	Global \(\Omega\) matrix	Feature selection, dimensionality reduction
LCELVQ_NG	\(\|\Omega_k(x-w_k)\|^2\)	Per-prototype \(\Omega_k\)	Heterogeneous feature spaces
TCELVQ_NG	\(\|(I-\Omega_k\Omega_k^T)(x-w_k)\|^2\)	Tangent bases \(\Omega_k\)	High-dimensional data with invariances

Common Patterns

Resume training:

model.fit(X_train, y_train, max_iter=50)
model.fit(X_train, y_train, resume=True, max_iter=50)  # continue from last state

Fitted attributes (available after fit):

model.prototypes_ — prototype positions
model.prototype_labels_ — class labels per prototype
model.n_iter_ — number of iterations run
model.loss_ — final loss value
model.loss_history_ — loss per iteration

All models support:

predict(X) — hard labels
predict_proba(X) — soft class probabilities
save(path) / Model.load(path) — persistence