Supervised Models¶
All supervised models in prosemble follow the same fit / predict API.
They differ in the distance metric, loss function, and representation.
GLVQ — Baseline¶
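Generalized Learning Vector Quantization is the foundation model. It learns prototype positions by minimizing the relative distance cost \(\mu = (d^+ - d^-) / (d^+ + d^-)\), where \(d^+\) is the distance to the nearest prototype of the correct class and \(d^-\) is the distance to the nearest prototype of any other class. The cost lies in \([-1, 1]\) and is negative when the sample is classified correctly.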
from prosemble.models import GLVQ
from prosemble.datasets import load_iris_jax
from prosemble.core.utils import train_test_split_jax
dataset = load_iris_jax()
X, y = dataset.input_data, dataset.labels
X_train, X_test, y_train, y_test = train_test_split_jax(X, y)
model = GLVQ(
n_prototypes_per_class=2,
max_iter=100,
lr=0.01,
random_seed=42,
)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
probabilities = model.predict_proba(X_test)
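To make the \(\mu\) cost concrete, here is a minimal NumPy sketch that evaluates it for a single sample. It is independent of prosemble's internals; the helper name `glvq_mu` and the choice of squared Euclidean distance are assumptions made for illustration.

```python
import numpy as np

def glvq_mu(x, prototypes, prototype_labels, y):
    """Relative distance cost mu for one sample x with true label y (illustrative helper)."""
    # Squared Euclidean distance from x to every prototype (assumed metric)
    d = np.sum((prototypes - x) ** 2, axis=1)
    d_plus = d[prototype_labels == y].min()    # nearest prototype of the correct class
    d_minus = d[prototype_labels != y].min()   # nearest prototype of any other class
    return (d_plus - d_minus) / (d_plus + d_minus)

prototypes = np.array([[0.0, 0.0], [4.0, 0.0]])
prototype_labels = np.array([0, 1])
x = np.array([1.0, 0.0])

mu_correct = glvq_mu(x, prototypes, prototype_labels, y=0)  # x lies near its own class
mu_wrong = glvq_mu(x, prototypes, prototype_labels, y=1)    # x lies near the other class
print(mu_correct, mu_wrong)
```

A negative value means the nearest prototype carries the correct label, so driving \(\mu\) toward -1 pushes samples deeper into their own class region.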
Unequal prototypes per class:
model = GLVQ(
n_prototypes_per_class={0: 3, 1: 2, 2: 1},
prototypes_initializer='class_conditional_mean',
)
Prototype initializers (pass as string):
- 'stratified_random' — random samples per class (default)
- 'class_mean' — class centroids
- 'class_conditional_mean' — class centroids replicated
- 'stratified_noise' — random samples + Gaussian noise
- 'random_normal' — random normal initialization
- 'uniform' — random uniform
- 'zeros' — zero initialization
- 'ones' — ones initialization
- 'fill_value' — constant fill value (pass value= kwarg)
For unsupervised models, selection_init and mean_init from
prosemble.core.initializers provide classless alternatives.
You can also pass a callable for custom initialization (e.g., literal_init(values)
from prosemble.core.initializers).
GRLVQ — Feature Relevances¶
Adds per-feature relevance weights \(\lambda_j\) that reveal which features matter for classification.
from prosemble.models import GRLVQ
model = GRLVQ(
n_prototypes_per_class=2,
max_iter=100,
lr=0.01,
)
model.fit(X_train, y_train)
# Inspect learned relevances
print(model.relevances_) # shape: (n_features,)
The relevance vector satisfies \(\lambda_j \geq 0\) and \(\sum_j \lambda_j = 1\). Features with large \(\lambda_j\) are important for classification.
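The effect of the relevance vector can be sketched with plain NumPy. The weighted squared distance \(\sum_j \lambda_j (x_j - w_j)^2\) used below is the usual GRLVQ form and is assumed here rather than taken from prosemble's source; `relevance_distance` is a hypothetical helper.

```python
import numpy as np

def relevance_distance(x, w, relevances):
    """Relevance-weighted squared distance: sum_j lambda_j * (x_j - w_j)^2."""
    return np.sum(relevances * (x - w) ** 2)

# Normalize non-negative raw weights so they sum to 1
raw = np.array([3.0, 1.0, 0.0, 0.0])
relevances = raw / raw.sum()

x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.zeros(4)
d = relevance_distance(x, w, relevances)
print(relevances, d)
```

Features three and four have zero relevance, so even their large differences contribute nothing to the distance.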
GMLVQ — Matrix Metric Learning¶
Learns a full linear transformation \(\Omega\) such that the distance is \(d(x, w) = \|\Omega(x - w)\|^2\). This captures feature correlations.
from prosemble.models import GMLVQ
model = GMLVQ(
n_prototypes_per_class=1,
latent_dim=2, # project to 2D
max_iter=100,
lr=0.001,
)
model.fit(X_train, y_train)
# Learned matrices
print(model.omega_matrix.shape) # (latent_dim, n_features)
print(model.lambda_matrix) # Omega^T @ Omega — feature importance
# Feature importance from diagonal
import jax.numpy as jnp
feature_importance = jnp.diag(model.lambda_matrix)
When latent_dim < n_features, \(\Omega\) also performs dimensionality
reduction optimized for classification.
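The relationship between \(\Omega\), the distance, and \(\Lambda = \Omega^T \Omega\) can be checked numerically. This sketch assumes \(\Omega\) has shape (latent_dim, n_features) so that it matches the formula \(\|\Omega(x - w)\|^2\); a library may store the transpose, in which case the products flip accordingly.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, latent_dim = 4, 2

# Assumed orientation: omega maps input space -> latent space
omega = rng.normal(size=(latent_dim, n_features))
x = rng.normal(size=n_features)
w = rng.normal(size=n_features)

diff = x - w
d_omega = np.sum((omega @ diff) ** 2)    # ||Omega (x - w)||^2

lam = omega.T @ omega                    # Lambda, shape (n_features, n_features)
d_lambda = diff @ lam @ diff             # (x - w)^T Lambda (x - w)

feature_importance = np.diag(lam)        # non-negative per-feature weights
print(d_omega, d_lambda, feature_importance)
```

Both expressions give the same distance, and the diagonal of \(\Lambda\) is the squared column norm of \(\Omega\), which is why it can be read as a per-feature importance.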
LGMLVQ — Local Metrics¶
Each prototype gets its own \(\Omega_k\) matrix, allowing different regions of the feature space to use different metrics.
from prosemble.models import LGMLVQ
model = LGMLVQ(
n_prototypes_per_class=2,
latent_dim=2,
max_iter=100,
lr=0.001,
)
model.fit(X_train, y_train)
GTLVQ — Tangent Distance¶
Learns per-prototype invariance subspaces. The tangent distance measures distance only in directions orthogonal to the invariance subspace: \(d(x, w_k) = \|(I - \Omega_k \Omega_k^T)(x - w_k)\|^2\).
from prosemble.models import GTLVQ
model = GTLVQ(
n_prototypes_per_class=2,
tangent_dim=1, # 1D invariance subspace
max_iter=100,
lr=0.001,
)
model.fit(X_train, y_train)
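A small NumPy sketch shows what the tangent distance ignores. It assumes the basis \(\Omega_k\) has orthonormal columns (obtained here via QR), which is what makes \(I - \Omega_k \Omega_k^T\) a projector; `tangent_distance` is a hypothetical helper, not a prosemble function.

```python
import numpy as np

def tangent_distance(x, w, basis):
    """||(I - B B^T)(x - w)||^2 for a basis B with orthonormal columns."""
    diff = x - w
    residual = diff - basis @ (basis.T @ diff)   # remove the in-subspace component
    return np.sum(residual ** 2)

rng = np.random.default_rng(0)
n_features, tangent_dim = 5, 1

# Orthonormalize a random direction to get a 1D invariance subspace
basis, _ = np.linalg.qr(rng.normal(size=(n_features, tangent_dim)))

w = rng.normal(size=n_features)
x = w + rng.normal(size=n_features)
x_shifted = x + 3.0 * basis[:, 0]    # move x along the invariant direction

d = tangent_distance(x, w, basis)
d_shifted = tangent_distance(x_shifted, w, basis)
print(d, d_shifted)
```

Shifting the sample along the learned direction leaves the distance unchanged, which is the invariance the model exploits.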
Wasserstein Prototype LVQ¶
The Wasserstein LVQ family uses Gaussian distributional prototypes instead of point prototypes. Each prototype \(k\) is a diagonal Gaussian \(\mathcal{N}(\mu_k, \text{diag}(\sigma_k^2))\). The distance from a data point \(x\) (treated as a Dirac delta) to prototype \(k\) is the squared 2-Wasserstein distance:
\[W_2^2\big(\delta_x, \mathcal{N}(\mu_k, \text{diag}(\sigma_k^2))\big) = \|x - \mu_k\|^2 + \sum_j \sigma_{k,j}^2\]
The variance term acts as a per-prototype learned “spread” — prototypes with smaller variance are more specific and attract nearby points more strongly.
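For a Dirac point and a diagonal Gaussian, the squared 2-Wasserstein distance has a standard closed form: the squared distance to the mean plus the sum of the variances. The sketch below evaluates it with NumPy; the function name is hypothetical and the code does not reflect prosemble's implementation.

```python
import numpy as np

def w2_squared_point_to_gaussian(x, mean, variance):
    """Squared 2-Wasserstein distance between a point mass at x and N(mean, diag(variance))."""
    return np.sum((x - mean) ** 2) + np.sum(variance)

x = np.array([1.0, 2.0])
mean = np.array([0.0, 0.0])

tight = np.array([0.1, 0.1])   # small spread
broad = np.array([2.0, 2.0])   # large spread

d_tight = w2_squared_point_to_gaussian(x, mean, tight)
d_broad = w2_squared_point_to_gaussian(x, mean, broad)
print(d_tight, d_broad)
```

With the same mean, the prototype with the smaller variance is always closer, so it wins the competition for the sample.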
WGLVQ — Wasserstein GLVQ (base variant):
from prosemble.models import WGLVQ
model = WGLVQ(
n_prototypes_per_class=2,
max_iter=100,
lr=0.01,
)
model.fit(X_train, y_train)
# Learned prototype means and variances
print(model.prototype_means.shape) # (n_prototypes, n_features)
print(model.prototype_variances.shape) # (n_prototypes, n_features)
WGMLVQ — Adds a global \(\Omega\) matrix projection:
from prosemble.models import WGMLVQ
model = WGMLVQ(
n_prototypes_per_class=2,
latent_dim=2, # project to 2D
max_iter=100,
lr=0.001,
)
model.fit(X_train, y_train)
print(model.omega_matrix.shape) # (n_features, latent_dim)
print(model.lambda_matrix) # Omega^T @ Omega
WGRLVQ — Adds per-feature relevance weighting with weights \(\lambda_j\), where \(\lambda_j \geq 0\) and \(\sum_j \lambda_j = 1\):
from prosemble.models import WGRLVQ
model = WGRLVQ(
n_prototypes_per_class=2,
max_iter=100,
lr=0.01,
)
model.fit(X_train, y_train)
# Learned relevance profile
print(model.relevance_profile) # shape: (n_features,), sums to 1
Note
Variances are parameterized internally as \(\log(\sigma^2)\) to ensure
positivity during gradient optimization. Access the actual variances via
model.prototype_variances.
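The benefit of the log parameterization can be seen in a toy gradient step. The gradient values and learning rate below are made up for illustration; only the idea of optimizing \(\log(\sigma^2)\) and exponentiating afterwards comes from the note above.

```python
import numpy as np

log_var = np.array([-2.0, 0.0, 1.5])    # unconstrained parameters log(sigma^2)
grad = np.array([10.0, -5.0, 3.0])      # hypothetical gradient values
lr = 0.5

# A step taken directly on the variances can leave the valid range
direct_step = np.exp(log_var) - lr * grad

# A step taken on log-variances stays valid after exponentiation
log_var_new = log_var - lr * grad
variance = np.exp(log_var_new)

print(direct_step)   # contains a negative entry
print(variance)      # strictly positive
```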
CELVQ — Cross-Entropy¶
Replaces the GLVQ \(\mu\) cost with cross-entropy loss over class probabilities derived from distances. Produces calibrated probabilities.
from prosemble.models import CELVQ
model = CELVQ(
n_prototypes_per_class=2,
max_iter=100,
lr=0.01,
)
model.fit(X_train, y_train)
proba = model.predict_proba(X_test) # calibrated class probabilities
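One way to derive class probabilities from distances is to pool each class to its nearest prototype and apply a softmax to the negative distances. The sketch below follows that recipe; using negative squared Euclidean distances as logits is an assumption of this example, and `celvq_loss` is a hypothetical helper.

```python
import numpy as np

def celvq_loss(x, prototypes, prototype_labels, y, n_classes):
    """Cross-entropy over softmax(-min distance per class) for one sample."""
    d = np.sum((prototypes - x) ** 2, axis=1)
    # Hard min pooling: each class is represented by its nearest prototype
    class_d = np.array([d[prototype_labels == c].min() for c in range(n_classes)])
    logits = -class_d
    logits = logits - logits.max()               # shift for numerical stability
    proba = np.exp(logits) / np.exp(logits).sum()
    return -np.log(proba[y]), proba

prototypes = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 0.0]])
prototype_labels = np.array([0, 0, 1])
x = np.array([1.0, 0.0])

loss, proba = celvq_loss(x, prototypes, prototype_labels, y=0, n_classes=2)
print(loss, proba)
```

Unlike the pairwise \(\mu\) cost, every class contributes a logit, so the loss depends on the distances to all classes at once.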
LVQ1 and LVQ2.1 — Classic Non-Gradient¶
Simple competitive learning without explicit gradients. LVQ1 updates only the nearest prototype; LVQ2.1 updates both the nearest correct prototype and the nearest incorrect prototype.
from prosemble.models import LVQ1, LVQ21
model = LVQ1(
n_prototypes_per_class=2,
max_iter=100,
lr=0.01,
)
model.fit(X_train, y_train)
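The classic LVQ1 rule is simple enough to write out in a few lines. This is a textbook sketch, not prosemble's code: the nearest prototype moves toward the sample when its label matches and away from it otherwise. The helper name `lvq1_step` is hypothetical.

```python
import numpy as np

def lvq1_step(x, y, prototypes, prototype_labels, lr):
    """Apply one LVQ1 update and return the new prototypes and the winner index."""
    d = np.sum((prototypes - x) ** 2, axis=1)
    k = int(np.argmin(d))                           # winner: nearest prototype
    sign = 1.0 if prototype_labels[k] == y else -1.0
    updated = prototypes.copy()
    updated[k] += sign * lr * (x - prototypes[k])   # attract or repel the winner
    return updated, k

prototypes = np.array([[0.0, 0.0], [4.0, 0.0]])
prototype_labels = np.array([0, 1])
x = np.array([1.0, 0.0])

attracted, k_a = lvq1_step(x, 0, prototypes, prototype_labels, lr=0.1)
repelled, k_r = lvq1_step(x, 1, prototypes, prototype_labels, lr=0.1)
print(attracted[k_a], repelled[k_r])
```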
Probabilistic Models (SLVQ, RSLVQ)¶
Treat prototypes as Gaussian components. RSLVQ provides robust log-likelihood training with confidence-based rejection.
from prosemble.models import RSLVQ
model = RSLVQ(
n_prototypes_per_class=2,
sigma=1.0,
max_iter=100,
lr=0.01,
)
model.fit(X_train, y_train)
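The Gaussian-mixture view and confidence-based rejection can be sketched as follows. The sketch assumes equal priors and one shared `sigma` for all components, and the rejection threshold of 0.6 is an arbitrary illustrative value; none of the names below are prosemble APIs.

```python
import numpy as np

def class_posteriors(x, prototypes, prototype_labels, sigma, n_classes):
    """Class posteriors from equally weighted isotropic Gaussian components."""
    d = np.sum((prototypes - x) ** 2, axis=1)
    logits = -d / (2.0 * sigma ** 2)
    comp = np.exp(logits - logits.max())   # stable unnormalized responsibilities
    comp = comp / comp.sum()
    # Sum the responsibilities of all components belonging to each class
    return np.array([comp[prototype_labels == c].sum() for c in range(n_classes)])

prototypes = np.array([[0.0, 0.0], [2.0, 0.0]])
prototype_labels = np.array([0, 1])
threshold = 0.6   # hypothetical confidence threshold

p_clear = class_posteriors(np.array([0.0, 0.0]), prototypes, prototype_labels, 1.0, 2)
p_border = class_posteriors(np.array([1.0, 0.0]), prototypes, prototype_labels, 1.0, 2)

reject_clear = p_clear.max() < threshold
reject_border = p_border.max() < threshold
print(p_clear, reject_clear)
print(p_border, reject_border)
```

The negative log of the true-class posterior is the per-sample training loss in this view, and a sample halfway between two classes is rejected because neither posterior is confident.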
MRSLVQ — Matrix RSLVQ with global Omega metric learning:
from prosemble.models import MRSLVQ
model = MRSLVQ(
n_prototypes_per_class=1,
latent_dim=2,
sigma=1.0,
max_iter=100,
lr=0.001,
)
model.fit(X_train, y_train)
print(model.omega_matrix.shape) # (n_features, latent_dim)
print(model.lambda_matrix) # Omega^T @ Omega
LMRSLVQ — Local Matrix RSLVQ with per-prototype Omega matrices:
from prosemble.models import LMRSLVQ
model = LMRSLVQ(
n_prototypes_per_class=1,
latent_dim=2,
sigma=1.0,
max_iter=100,
lr=0.001,
)
model.fit(X_train, y_train)
Probabilistic LVQ with Neural Gas (RSLVQ-NG Family)¶
The RSLVQ-NG family combines RSLVQ’s Gaussian mixture probabilistic assignment with Neural Gas rank-based neighborhood cooperation. All prototypes participate in the loss, weighted by both their Gaussian probability and NG rank.
RSLVQ_NG — Euclidean distance (base variant):
from prosemble.models import RSLVQ_NG
model = RSLVQ_NG(
n_prototypes_per_class=3,
sigma=1.0,
gamma_init=5.0,
gamma_final=0.01,
max_iter=100,
lr=0.01,
)
model.fit(X_train, y_train)
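This page does not state how the neighborhood range moves from `gamma_init` to `gamma_final`. Exponential decay is the schedule commonly used in Neural Gas, so the sketch below assumes it purely for illustration; `gamma_schedule` is a hypothetical helper.

```python
import numpy as np

def gamma_schedule(gamma_init, gamma_final, max_iter):
    """Exponentially interpolate from gamma_init to gamma_final over max_iter steps."""
    t = np.arange(max_iter) / max(max_iter - 1, 1)   # progress in [0, 1]
    return gamma_init * (gamma_final / gamma_init) ** t

gammas = gamma_schedule(5.0, 0.01, 100)
print(gammas[0], gammas[50], gammas[-1])
```

Early iterations use a wide neighborhood so many prototypes learn from each sample; late iterations narrow it so only the closest prototypes are updated.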
MRSLVQ_NG — Global Omega matrix metric learning:
from prosemble.models import MRSLVQ_NG
model = MRSLVQ_NG(
n_prototypes_per_class=2,
latent_dim=2,
sigma=1.0,
gamma_init=5.0,
gamma_final=0.01,
max_iter=100,
lr=0.001,
)
model.fit(X_train, y_train)
# Learned metric matrices
print(model.omega_matrix.shape) # (n_features, latent_dim)
print(model.lambda_matrix) # Omega^T @ Omega
LMRSLVQ_NG — Per-prototype local Omega matrices:
from prosemble.models import LMRSLVQ_NG
model = LMRSLVQ_NG(
n_prototypes_per_class=2,
latent_dim=2,
sigma=1.0,
gamma_init=5.0,
gamma_final=0.01,
max_iter=100,
lr=0.001,
)
model.fit(X_train, y_train)
Median LVQ¶
Constrains prototypes to be actual data points for maximum interpretability.
from prosemble.models import MedianLVQ
model = MedianLVQ(
n_prototypes_per_class=1,
max_iter=50,
random_seed=42,
)
model.fit(X_train, y_train)
# model.prototypes_ are actual training examples
Deep Variants (LVQMLN, PLVQ)¶
Add a trainable MLP backbone for nonlinear feature extraction. Prototypes live in the latent space (not the input space).
from prosemble.models import LVQMLN
model = LVQMLN(
n_prototypes_per_class=2,
hidden_dims=[64, 32], # MLP architecture
max_iter=100,
lr=0.001,
)
model.fit(X_train, y_train)
Siamese Variants¶
Like LVQMLN, but prototypes are in the input space and pass through the same backbone as input data. This makes prototypes interpretable.
from prosemble.models import SiameseGLVQ, SiameseGMLVQ, SiameseGTLVQ
model = SiameseGLVQ(
n_prototypes_per_class=2,
hidden_dims=[64, 32],
max_iter=100,
lr=0.001,
)
model.fit(X_train, y_train)
# model.prototypes_ is in the original input space
Image LVQ¶
Siamese architecture with a CNN backbone for image classification.
from prosemble.models import ImageGLVQ, ImageGMLVQ, ImageGTLVQ
model = ImageGLVQ(
n_prototypes_per_class=2,
max_iter=50,
lr=0.001,
)
# X_train should be image data: (n_samples, height, width, channels)
model.fit(X_train, y_train)
CBC — Classification-By-Components¶
Uses classless components with learned reasoning matrices for explainable classification.
from prosemble.models import CBC
model = CBC(
n_components=6,
num_classes=3,
max_iter=100,
lr=0.001,
)
model.fit(X_train, y_train)
Supervised Relevance Neural Gas (SRNG)¶
Combines GLVQ loss with Neural Gas neighborhood cooperation. All same-class prototypes are updated per sample, weighted by rank. SRNG also learns per-feature relevance weights (like GRLVQ).
from prosemble.models import SRNG
model = SRNG(
n_prototypes_per_class=3,
lambda_init=5.0, # initial neighborhood range
lambda_final=0.01, # final (narrower)
max_iter=100,
lr=0.01,
)
model.fit(X_train, y_train)
# Inspect relevances (SRNG learns feature weights like GRLVQ)
print(model.relevances_)
Supervised Matrix Neural Gas (SMNG, SLNG, STNG)¶
The SMNG family extends SRNG with matrix metric adaptation while keeping Neural Gas neighborhood cooperation. All same-class prototypes participate in the loss weighted by rank.
SMNG — Global Omega matrix (like GMLVQ + Neural Gas):
from prosemble.models import SMNG
model = SMNG(
n_prototypes_per_class=3,
gamma_init=5.0,
gamma_final=0.01,
max_iter=200,
lr=0.01,
)
model.fit(X_train, y_train)
print(model.omega_matrix.shape) # (n_features, latent_dim)
print(model.lambda_matrix.shape) # (n_features, n_features)
SLNG — Per-prototype Omega matrices (like LGMLVQ + Neural Gas):
from prosemble.models import SLNG
model = SLNG(
n_prototypes_per_class=3,
gamma_init=5.0,
gamma_final=0.01,
max_iter=200,
lr=0.01,
)
model.fit(X_train, y_train)
print(model.omegas_.shape) # (n_protos, n_features, latent_dim)
STNG — Per-prototype tangent subspaces (like GTLVQ + Neural Gas):
from prosemble.models import STNG
model = STNG(
n_prototypes_per_class=3,
subspace_dim=2,
gamma_init=5.0,
gamma_final=0.01,
max_iter=200,
lr=0.01,
)
model.fit(X_train, y_train)
print(model.omegas_.shape) # (n_protos, n_features, subspace_dim)
Cross-Entropy Neural Gas (CELVQ-NG Family)¶
The CELVQ-NG family combines cross-entropy loss over all-class softmax logits with Neural Gas rank-based neighborhood cooperation. Unlike SRNG, which uses the pairwise GLVQ \(\mu\) cost, CELVQ-NG considers all classes simultaneously via softmax, which provides better-calibrated probabilities and gradient flow to all prototypes.
Neural Gas cooperation replaces the hard per-class min pooling in CELVQ
with NG-weighted soft pooling: for each class, prototypes are ranked by
distance and weighted by \(h_k = \exp(-\text{rank} / \gamma)\). When
\(\gamma \to 0\), CELVQ-NG recovers standard CELVQ.
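The rank weighting can be tried on the distances of one class. In this sketch the weights \(h_k\) are normalized to sum to 1 before pooling, which is an assumption of the example rather than a documented detail; `ng_pooled_distance` is a hypothetical helper.

```python
import numpy as np

def ng_pooled_distance(class_distances, gamma):
    """Rank-weighted soft pooling of one class's prototype distances."""
    ranks = np.argsort(np.argsort(class_distances))   # rank 0 = nearest prototype
    h = np.exp(-ranks / gamma)
    h = h / h.sum()                                    # normalization assumed here
    return np.sum(h * class_distances)

d = np.array([4.0, 1.0, 9.0])   # distances to three prototypes of one class

wide = ng_pooled_distance(d, gamma=5.0)      # all prototypes contribute
narrow = ng_pooled_distance(d, gamma=0.01)   # effectively the minimum
print(wide, narrow, d.min())
```

With a tiny `gamma` the pooled value collapses to the minimum distance, which matches the hard min pooling of plain CELVQ.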
CELVQ_NG — Euclidean distance (base variant):
from prosemble.models import CELVQ_NG
model = CELVQ_NG(
n_prototypes_per_class=3,
gamma_init=5.0, # initial neighborhood range
gamma_final=0.01, # final (narrower)
max_iter=100,
lr=0.01,
)
model.fit(X_train, y_train)
proba = model.predict_proba(X_test) # calibrated probabilities
MCELVQ_NG — Global Omega matrix metric learning:
from prosemble.models import MCELVQ_NG
model = MCELVQ_NG(
n_prototypes_per_class=3,
latent_dim=2, # project to 2D
gamma_init=5.0,
gamma_final=0.01,
max_iter=100,
lr=0.01,
)
model.fit(X_train, y_train)
# Learned metric matrices
print(model.omega_matrix.shape) # (n_features, latent_dim)
print(model.lambda_matrix) # Omega^T @ Omega — feature importance
LCELVQ_NG — Per-prototype local Omega matrices:
from prosemble.models import LCELVQ_NG
model = LCELVQ_NG(
n_prototypes_per_class=2,
latent_dim=2,
gamma_init=5.0,
gamma_final=0.01,
max_iter=100,
lr=0.01,
)
model.fit(X_train, y_train)
# Each prototype has its own Omega_k
print(model.omegas_.shape) # (n_prototypes, n_features, latent_dim)
TCELVQ_NG — Tangent subspace distance:
from prosemble.models import TCELVQ_NG
model = TCELVQ_NG(
n_prototypes_per_class=2,
subspace_dim=1, # 1D invariance subspace
gamma_init=5.0,
gamma_final=0.01,
max_iter=100,
lr=0.01,
)
model.fit(X_train, y_train)
# Learned orthogonal tangent bases
print(model.omegas_.shape) # (n_prototypes, n_features, subspace_dim)
The tangent variant measures distance orthogonal to learned invariance subspaces: \(d(x, w_k) = \|(I - \Omega_k \Omega_k^T)(x - w_k)\|^2\). Best suited for high-dimensional data with invariance structure (images, spectra, signals).
| Model | Distance Metric | Learnable Parameters | Best For |
|---|---|---|---|
| CELVQ_NG | Euclidean | Prototypes only | General-purpose, fast training |
| MCELVQ_NG | \(\lVert\Omega(x-w)\rVert^2\) | Global \(\Omega\) matrix | Feature selection, dimensionality reduction |
| LCELVQ_NG | \(\lVert\Omega_k(x-w_k)\rVert^2\) | Per-prototype \(\Omega_k\) | Heterogeneous feature spaces |
| TCELVQ_NG | \(\lVert(I-\Omega_k\Omega_k^T)(x-w_k)\rVert^2\) | Tangent bases \(\Omega_k\) | High-dimensional data with invariances |
Common Patterns¶
Resume training:
model.fit(X_train, y_train, max_iter=50)
model.fit(X_train, y_train, resume=True, max_iter=50) # continue from last state
Fitted attributes (available after fit):
- model.prototypes_ — prototype positions
- model.prototype_labels_ — class labels per prototype
- model.n_iter_ — number of iterations run
- model.loss_ — final loss value
- model.loss_history_ — loss per iteration
All models support:
- predict(X) — hard labels
- predict_proba(X) — soft class probabilities
- save(path) / Model.load(path) — persistence