feat: manifold diagnostic toolkit — dimension selection, CV reconstruction/decoding, Procrustes alignment, within-session stability#5

Open
Copilot wants to merge 9 commits into main from copilot/add-manifold-diagnostic-toolkit

Conversation

Copilot AI commented Apr 27, 2026

Contributor

Adds a comprehensive set of best-practice diagnostics for neural latent spaces, integrated into the existing @NeuralEmbedding class and +-package structure. All methods accept a scalar or vector of NeuralEmbedding objects. The toolkit is self-sufficient: all diagnostic results are stored inside the object, aligned subspaces are persisted with a flag to toggle them, and all long-running loops print [Animal.Session] progress to the console.

New class methods (@NeuralEmbedding/)

  • selectDimension(dims, pars) — shuffle-based dimension selection (parallel analysis): independently permutes each neuron's time series to build a null eigenspectrum using the same projection pipeline as findEmbedding; returns dStar = largest component where real eigenvalue exceeds the (1−α) null quantile. Result auto-saved to M_.
  • crossValReconstruct(dim, pars) — k-fold CV PCA reconstruction; z-score and PCA fit are confined to training folds (no leakage); reports Pearson r and R² on held-out bins. Result auto-saved to M_.
  • crossValDecode(y, dim, pars) — k-fold CV nearest-centroid decoding + permutation test (global / blocked / circular-shift null); returns accuracy, balanced accuracy, p-value, and effect-size z-score. Result auto-saved to M_.
  • alignSessions(pars) — orthogonal Procrustes alignment of all sessions to a reference, applied independently for every area (including "AllNeurons"). Stores the rotation-transformed embeddings and projection matrices back in each object (E_aligned_, W_aligned_). Per-session alignment metrics saved to each object's M_ as 'SessionAlignment'.
  • crossValAlignment(pars) — within-session latent-space stability: randomly splits trials into two equal halves, fits orthogonal Procrustes alignment between the halves, and repeats nSplit times. Returns a disparity distribution (mean, median, std), principal angles, and distance correlation. Stored in M_ as 'IntraAlignment'. Reuses existing align_procrustes and alignment_metrics compute functions.
  • labelsFromEvents(obj, eventNames) — builds a per-time-bin categorical label vector from stored behavioral events, respecting the current cMask. Each bin takes the name of the most recent event from eventNames that has occurred up to that bin; pre-event bins are labelled '0'. Output aligns with obj.S / obj.E and is compatible directly with crossValDecode.
  • i_storeM(data, type) (private) — stores a diagnostic result struct in M_ using the same replace-or-append logic as computeMetrics; re-running any diagnostic with the same condition/area mask replaces the previous entry.
% Single session
res = NE.selectDimension(1:15);              % dStar; saved to NE.M
res = NE.crossValReconstruct(res.dStar);     % CV Pearson r; saved to NE.M

% Build per-time-bin labels from stored events
y = NE.labelsFromEvents({'Cue','Go'});       % categorical T x 1
res = NE.crossValDecode(y, res.dStar);       % acc, p, z; saved to NE.M

NE.M   % table with ParallelAnalysis, CVReconstruction, CVDecoding entries

% Within-session stability
resIA = NE.crossValAlignment();              % saved as 'IntraAlignment'
fprintf('Intra-session disparity = %.4f ± %.4f\n', resIA.disparityMean, resIA.disparityStd);

% Multi-session — pass object array, get struct array back
NEobjs = [NE1, NE2, NE3];
resA = NEobjs.selectDimension(1:15);         % numel == 3; each result saved to respective M_
resD = NEobjs.alignSessions();               % Procrustes per session & area ('SessionAlignment' in M_)

% Toggle aligned subspace (per object)
NE2.useAlignment = true;   % get.E and get.W now return the aligned subspace
NE2.useAlignment = false;  % restore original

Self-sufficient aligned subspace storage

alignSessions writes the rotated embeddings directly back into each object:

| Property | Type | Description |
| --- | --- | --- |
| E_aligned_ | private, Transient | Rotation-transformed per-trial embeddings (all areas) |
| W_aligned_ | private | Rotation-transformed projection matrices (all areas) |
| useAlignment | public logical, default false | When true, get.E / get.W return the aligned subspace |

The flag is independent per object and silently falls back to the original embedding if E_aligned_ is empty. Alignment is performed for all areas simultaneously.
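To make the storage scheme concrete, here is a minimal sketch of why rotating both the embedding and the projection matrix keeps them consistent, and of the fallback behaviour of the flag. It uses synthetic data and plain variables; `E_aligned`, `W_aligned` and `useAlignment` below are local stand-ins, not the class's actual private properties, and the random orthogonal matrix `R` only stands in for a fitted Procrustes rotation.

```matlab
% Sketch under stated assumptions: synthetic data, illustrative variable names.
rng(1);
X = randn(200, 12);            % T x N activity (time bins x neurons)
X = X - mean(X, 1);            % centre each neuron
[~, ~, V] = svd(X, 'econ');
d = 3;
W = V(:, 1:d);                 % N x d projection matrix
E = X * W;                     % T x d embedding

[R, ~] = qr(randn(d));         % any d x d orthogonal matrix stands in for the rotation

E_aligned = E * R;             % rotated embedding
W_aligned = W * R;             % rotated projection matrix

% Projecting the raw data with W_aligned reproduces E_aligned
errProj = norm(X * W_aligned - E_aligned, 'fro');
% An orthogonal rotation leaves the Frobenius norm unchanged
errNorm = abs(norm(E, 'fro') - norm(E_aligned, 'fro'));

useAlignment = true;           % stand-in for the per-object flag
if useAlignment && ~isempty(E_aligned)
    Eout = E_aligned;          % aligned subspace is returned
else
    Eout = E;                  % silent fallback to the original
end
```

Because `W_aligned = W * R`, new data projected through the aligned matrix lands directly in the aligned coordinate frame, which is presumably why both quantities are stored together.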

Diagnostic results stored in M_

All diagnostic methods auto-save their output to the object's M_ property via the private helper i_storeM, using the same replace-or-append pattern as computeMetrics:

| Method | M_ type field |
| --- | --- |
| selectDimension | 'ParallelAnalysis' |
| crossValReconstruct | 'CVReconstruction' |
| crossValDecode | 'CVDecoding' |
| alignSessions (per session) | 'SessionAlignment' |
| crossValAlignment | 'IntraAlignment' |

Note: The cross-session alignment type is 'SessionAlignment' (not 'Alignment') to avoid confusion with the pre-existing alignment metric from computeMetrics.
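The replace-or-append pattern can be sketched as follows. This is an illustration only: the real `M_` is described as a table, whereas this sketch uses a struct array keyed on a hypothetical `(type, area)` pair, and the stored values are made-up placeholders.

```matlab
% Hypothetical stand-in for M_: a struct array with one element per stored result.
M = struct('type', {}, 'area', {}, 'data', {});

% Three results arrive; the third repeats the (type, area) key of the first.
% The numeric payloads are placeholders, not real diagnostics.
incoming = { ...
    struct('type', 'ParallelAnalysis', 'area', 'AllNeurons', 'data', 4), ...
    struct('type', 'CVReconstruction', 'area', 'AllNeurons', 'data', 0.61), ...
    struct('type', 'ParallelAnalysis', 'area', 'AllNeurons', 'data', 5)};

for i = 1:numel(incoming)
    new = incoming{i};
    % Look for an existing entry with the same key
    hit = false(1, numel(M));
    for j = 1:numel(M)
        hit(j) = strcmp(M(j).type, new.type) && strcmp(M(j).area, new.area);
    end
    if any(hit)
        M(find(hit, 1)) = new;     % replace the stale entry in place
    else
        M(end + 1) = new;          % append a new entry
    end
end
```

After the loop `M` holds two entries, and the `'ParallelAnalysis'` entry carries the most recent payload, which mirrors the behaviour described for re-running a diagnostic with the same mask.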

Embedding-aware parallel analysis (selectDimension)

The null eigenspectrum in selectDimension is now built using the same projection pipeline as findEmbedding, ensuring apples-to-apples comparison:

  • PCA / SmoothPCA: covariance PCA (centre only, no extra z-score), matching embedding.PCA.reduce / MATLAB's pca() exactly. obj.S already applies any z-scoring configured on the object, so double-standardising is avoided.
  • GPFA / CCA / other: a warning is issued and covariance PCA on obj.S is used as a useful approximation. crossValReconstruct is recommended for rigorous dimension selection with non-PCA embeddings.

dim_parallel_analysis gains an optional projFcn argument. When provided it skips internal z-scoring (treating input as already preprocessed); when absent the legacy z-score + SVD behaviour is preserved for standalone/toolbox-free use. The result struct now includes embeddingMethod for reproducibility.
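A minimal sketch of parallel analysis with an injected projection function, assuming covariance PCA (centre only) as the pipeline. The handle name `projFcn` follows the description above, but its signature here (matrix in, sorted eigenvalues out) is an assumption, as are the synthetic data and all parameter values.

```matlab
rng(0);
T = 500; N = 21; dTrue = 3;
Z = randn(T, dTrue);                        % latent signals
A = kron(eye(dTrue), ones(1, N / dTrue));   % each latent drives its own block of neurons
X = 2 * Z * A + randn(T, N);                % T x N synthetic activity

% Assumed projFcn signature: T x N matrix in, descending eigenvalues out.
% Covariance PCA (centre only) via the singular values of the centred data.
projFcn = @(M) svd(M - mean(M, 1)) .^ 2 / (size(M, 1) - 1);

nShuffle = 100; alpha = 0.05;
eigReal = projFcn(X);                       % N x 1
eigNull = zeros(nShuffle, N);
for s = 1:nShuffle
    Xs = X;
    for n = 1:N
        Xs(:, n) = X(randperm(T), n);       % permute each neuron independently
    end
    eigNull(s, :) = projFcn(Xs).';
end

% (1 - alpha) empirical quantile of the null, per component, by sorting
sortedNull = sort(eigNull, 1);
qIdx = ceil((1 - alpha) * nShuffle);
thresh = sortedNull(qIdx, :).';

dStar = find(eigReal > thresh, 1, 'last');  % largest k whose eigenvalue beats the null
if isempty(dStar), dStar = 0; end
```

Passing the projection as a handle means the real and shuffled data go through identical preprocessing, which is the point of the change. Swapping in a different pipeline only requires a different handle.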

Progress output

All long-running loops print [Animal.Session] progress with a dynamic counter:

Parallel analysis [Rat1.S01]: shuffle 200/200 done.
Permutation test [Rat1.S01]: perm 500/500 done.
AlignSessions [area=AllNeurons]: Rat1.S02 → Rat1.S01
IntraAlignment [Rat1.S01]: split 100/100 done.

Suppress with pars.verbose = false (available on all pars structs).
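One way to produce such an in-place counter is to erase the previous count with backspaces before printing the next one. The sketch below assumes that approach; the tag string and loop body are placeholders, and `verbose` is a local variable standing in for `pars.verbose`.

```matlab
% Sketch of a backspace-based progress counter (illustrative names only).
tag = 'Rat1.S01'; nShuffle = 200; verbose = true;
nPrev = 0;                                   % characters printed by the last update
if verbose, fprintf('Parallel analysis [%s]: shuffle ', tag); end
for s = 1:nShuffle
    % ... one shuffle iteration would run here ...
    if verbose
        msg = sprintf('%d/%d', s, nShuffle);
        % Erase the previous counter, then print the new one
        fprintf([repmat('\b', 1, nPrev) '%s'], msg);
        nPrev = numel(msg);
    end
end
if verbose, fprintf(' done.\n'); end
```

Tracking `nPrev` rather than assuming a fixed width keeps the line intact when the counter grows from one digit to three.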

New +diagnostics/ package

Mirrors the existing +metrics/ layout (+compute/, +pars/, +shufflers/):

Subpackage Files
+compute/ dim_parallel_analysis, cv_reconstruction, cv_decoding, permutation_test, shuffle_neuronwise, circular_shift, align_procrustes, alignment_metrics, intra_alignment
+pars/ ParallelAnalysis, CVReconstruction, CVDecoding, ProcrustesAlignment, IntraAlignment
+shufflers/ global_permute, blocked_permute, circular_shift

All implementations are toolbox-free (base MATLAB SVD/linear algebra only). PCA fit uses economy SVD directly to avoid a redundant covariance-matrix step.

Other

  • docs/manifold_diagnostics.md — updated with within-session stability (E), event-based labels (F), progress output note, SessionAlignment renaming, embedding-aware parallel analysis, and M_ result table.
  • examples/demo_manifold_diagnostics.m — end-to-end demo covering all diagnostics, useAlignment, labelsFromEvents, crossValAlignment, and M_ inspection.
  • tests/smoke_test_diagnostics.m — 10 smoke tests: dStar near true dim, reconstruction improves with dim, decoding significance with real vs. random labels, alignment bounded disparity, SessionAlignment type in M_, useAlignment flag, multi-session dispatch, diagnostics in M_, crossValAlignment finite disparity + stored in M_, labelsFromEvents correct length and bin counts.
  • README.md updated with diagnostics section and quick-start.
Original prompt

Implement manifold diagnostic toolkit in MATLAB and integrate it into the existing folder/object structure of barbaLab/NeuralEmbedding, with support for multi-session workflows where each session is represented by one NeuralEmbedding object and multi-session operations accept a vector/array of such objects.

High-level goals

Add robust best-practice diagnostics for reconstructed neural manifolds / latent spaces, focusing on:

  1. Shuffle-based null models (parallel analysis, label shuffles, blocked shuffles, circular time-shifts)
  2. Permutation testing for decoding and other summary statistics
  3. Cross-validated (out-of-sample) reconstruction and decoding
  4. Making different projections comparable across sessions via alignment (orthogonal Procrustes) and reporting quantitative alignment metrics

All functionality must be integrated into the repo’s existing folder and class structure (do not create an arbitrary new top-level layout if the repo already has conventions). Create changes on a feature branch and open a PR.

Repository

  • Repo: barbaLab/NeuralEmbedding

Key constraint (object structure)

  • Each recording/session is encoded as a single NeuralEmbedding object.
  • Any operation working on multiple sessions must accept a vector/array of NeuralEmbedding objects as input (e.g., objs(1:nSessions)), rather than requiring the user to concatenate sessions manually.
  • The API should support both:
    • single-session: obj.methodName(...)
    • multi-session: methodName(objs, ...) or objs.methodName(...) depending on MATLAB class design; follow existing style in the repo.

What to implement (functional requirements)

A) Dimension selection via shuffle null (parallel analysis)

  • Provide a method/function that:
    • takes neural data from a NeuralEmbedding session (and optionally a vector of objects)
    • performs neuron-wise shuffle null (shuffle rows independently per neuron/feature)
    • computes eigen-spectrum / PCA spectrum for real and null
    • selects dimension d* as largest k where real eigenvalue exceeds the (1-alpha) quantile of null eigenvalues
  • Return a results struct including eigenvalues, null distribution summary, selected dimension, alpha, nShuffle, RNG seed.

B) Cross-validated reconstruction from latent embedding

  • Implement k-fold CV where fitting (embedding) happens on training fold only, then reconstruct held-out activity.
  • Score reconstruction on held-out data using at least Pearson correlation between X_test(:) and Xhat_test(:); optionally add R2.
  • Ensure no data leakage: any z-scoring parameters must be fit on train folds and applied to test.
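The leakage constraint in B can be sketched as follows: z-score parameters and the PCA basis are both estimated on the training bins, and the held-out bins are only projected and reconstructed. Data, fold assignment and variable names are illustrative assumptions, not the toolbox's implementation.

```matlab
rng(0);
T = 300; N = 15; dim = 3; kfold = 5;
X = randn(T, 3) * randn(3, N) + 0.5 * randn(T, N);   % synthetic low-rank signal + noise

foldId = mod(randperm(T), kfold) + 1;      % random fold label for every time bin
r = zeros(kfold, 1); R2 = zeros(kfold, 1);
for k = 1:kfold
    te = (foldId == k); tr = ~te;
    % z-score parameters come from the training bins only
    mu = mean(X(tr, :), 1);
    sd = std(X(tr, :), 0, 1); sd(sd == 0) = 1;
    Xtr = (X(tr, :) - mu) ./ sd;
    Xte = (X(te, :) - mu) ./ sd;           % test bins reuse the training parameters
    % PCA basis from the training bins only (economy SVD)
    [~, ~, V] = svd(Xtr, 'econ');
    W = V(:, 1:dim);
    Xhat = (Xte * W) * W.';                % project held-out bins, then reconstruct
    c = corrcoef(Xte(:), Xhat(:));
    r(k) = c(1, 2);                        % Pearson r on held-out bins
    R2(k) = 1 - sum((Xte(:) - Xhat(:)).^2) / sum((Xte(:) - mean(Xte(:))).^2);
end
meanR = mean(r); meanR2 = mean(R2);
```

Note that this scheme holds out time bins, not neurons, so it measures how well a basis fitted on one part of the session captures another part of the same session.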

C) Cross-validated decoding + permutation test

  • Implement decoding from latent space to labels/behavior variable:
    • classification baseline: nearest-centroid classifier in latent space (toolbox-free)
    • compute CV accuracy (and optionally balanced accuracy)
  • Permutation test:
    • compute null distribution by re-running full CV pipeline under permuted labels
    • p-value: p = (1 + sum(null >= real)) / (nPerm + 1)
    • report effect size z-score (real - mean(null))/std(null)
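A compact sketch of requirement C under simple assumptions: two well-separated synthetic classes, a nearest-centroid rule with squared Euclidean distance, fixed folds, and a global label permutation as the null. The whole CV loop is re-run for every permutation, with index 0 reserved for the real labels.

```matlab
rng(0);
nPer = 60; d = 3; kfold = 5; nPerm = 200;
Z = [randn(nPer, d) + 1.5; randn(nPer, d) - 1.5];   % T x d latents, two classes
y = [ones(nPer, 1); 2 * ones(nPer, 1)];              % integer class labels
T = numel(y); classes = unique(y);
foldId = mod(randperm(T), kfold).' + 1;              % folds shared by all runs

acc = zeros(nPerm + 1, 1);                           % acc(1) = real, rest = null
for p = 0:nPerm
    if p == 0, yp = y; else, yp = y(randperm(T)); end   % global label permutation
    nCorrect = 0;
    for k = 1:kfold
        te = (foldId == k); tr = ~te;
        % Class centroids from training bins only
        C = zeros(numel(classes), d);
        for c = 1:numel(classes)
            C(c, :) = mean(Z(tr & yp == classes(c), :), 1);
        end
        % Assign each held-out bin to the nearest centroid
        Zte = Z(te, :);
        D = zeros(size(Zte, 1), numel(classes));
        for c = 1:numel(classes)
            D(:, c) = sum((Zte - C(c, :)).^2, 2);
        end
        [~, idx] = min(D, [], 2);
        nCorrect = nCorrect + sum(classes(idx) == yp(te));
    end
    acc(p + 1) = nCorrect / T;
end
accReal = acc(1); accNull = acc(2:end);
pVal = (1 + sum(accNull >= accReal)) / (nPerm + 1);  % formula from the spec above
zEff = (accReal - mean(accNull)) / std(accNull);     % effect-size z-score
```

The `+1` terms in the p-value count the real statistic as one member of the null, so the smallest attainable p-value is `1 / (nPerm + 1)` and never exactly zero.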

D) Null models / shuffles (must include)

Implement and expose the following shuffles/permutations:

  1. Global label permutation
  2. Blocked label permutation within session blocks (where blocks may correspond to trials, conditions, or user-provided grouping)
  3. Circular time-shift null for time series:
    • shift each neuron independently by a random offset (preserve autocorrelation)

These should be usable in the decoding permutation test and also available as utilities.
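The three nulls can be sketched in a few lines of base MATLAB. Variable names, the block structure and the data are illustrative assumptions; the functions shipped in the PR may differ in interface.

```matlab
rng(0);
T = 12; N = 3;
y = repmat([1; 2; 3], 4, 1);               % T x 1 labels
block = kron((1:4).', ones(3, 1));         % T x 1 block id (e.g. trial index)
X = randn(T, N);                           % T x N time series

% 1. Global permutation: labels exchanged freely across the whole session
yGlobal = y(randperm(T));

% 2. Blocked permutation: labels exchanged only within each block
yBlocked = y;
for b = unique(block).'
    idx = find(block == b);
    yBlocked(idx) = y(idx(randperm(numel(idx))));
end

% 3. Circular shift: each neuron rotated in time by its own random offset.
%    This keeps each neuron's autocorrelation but breaks its timing
%    relative to the other neurons and to the labels.
Xshift = X;
for n = 1:N
    Xshift(:, n) = circshift(X(:, n), randi(T - 1));
end
```

The blocked variant preserves the label composition of every block, so it is the more conservative choice when labels drift slowly across a session.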

E) Multi-session comparability: alignment across sessions

  • Implement orthogonal Procrustes alignment of latent spaces from two sessions.
  • Provide quantitative alignment metrics:
    • Procrustes disparity
    • principal angles / subspace angle
    • optionally correlation of pairwise distances after alignment
  • Support multi-session inputs:
    • align session i to a reference session (e.g., session 1)
    • return per-session alignment transforms and metrics
  • If there is an existing alignment mechanism in the repo, integrate with/extend it rather than duplicating.
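A sketch of the alignment and the three metrics listed in E, on synthetic latents where the second session is a rotated, noisy copy of the first. The disparity normalisation (residual energy over reference energy) and the choice to compute principal angles between the column spaces of the latent matrices are assumptions of this sketch; the repository's `align_procrustes` and `alignment_metrics` may define them differently.

```matlab
rng(0);
T = 200; d = 3;
A = randn(T, d);  A = A - mean(A, 1);          % reference latents (T x d)
[Rtrue, ~] = qr(randn(d));                     % ground-truth orthogonal transform
B = A * Rtrue.' + 0.05 * randn(T, d);          % second session: rotated + noise
B = B - mean(B, 1);

% Orthogonal Procrustes: R = argmin ||B*R - A||_F subject to R'*R = I
[U, ~, V] = svd(B.' * A);
R = U * V.';
Baligned = B * R;

% Disparity: residual energy normalised by the reference energy (assumed definition)
disparity = norm(Baligned - A, 'fro')^2 / norm(A, 'fro')^2;

% Principal angles between the column spaces of A and B
[Qa, ~] = qr(A, 0);
[Qb, ~] = qr(B, 0);
sv = svd(Qa.' * Qb);
anglesDeg = acosd(min(sv, 1));                 % clamp to guard against round-off

% Correlation of pairwise distances between time bins
mask = triu(true(T), 1);
DA = sqrt(max(sum(A.^2, 2) + sum(A.^2, 2).' - 2 * (A * A.'), 0));
DB = sqrt(max(sum(Baligned.^2, 2) + sum(Baligned.^2, 2).' - 2 * (Baligned * Baligned.'), 0));
c = corrcoef(DA(mask), DB(mask));
distCorr = c(1, 2);
```

Since a rotation leaves pairwise distances unchanged, `distCorr` is the same before and after alignment; it measures whether the two sessions share geometry at all, while the disparity measures how well a single rotation maps one onto the other.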

F) End-to-end demo / example

  • Add an example script (or live script if the repo uses them) demonstrating:
    • running dimension selection
    • running reconstruction CV
    • running decoding permutation tests with both label shuffle and time-shift null
    • aligning two sessions (or more) and reporting alignment metrics
  • Example should work with either provided sample data in repo or synthetic data generated in-script if no sample data is available.

G) Documentation

  • Add docs describing:
    • what each diagnostic tests
    • what each null preserves/breaks
    • recommended defaults (kfold, nShuffle, nPerm, alpha)
    • interpretation of p-values/effect sizes
    • multi-session usage patterns with object arrays
  • Update README (or existing docs index) to link to the new diagnostics documentatio...

This pull request was created from Copilot chat.

Copilot AI and others added 3 commits April 27, 2026 17:25
…lReconstruct, crossValDecode, alignSessions)

Agent-Logs-Url: https://github.com/barbaLab/NeuralEmbedding/sessions/ae72e613-51c6-4045-9f89-d02f135f5a14

Co-authored-by: Nabarb <23075957+Nabarb@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add manifold diagnostic toolkit for reconstructed neural manifolds" to "feat: manifold diagnostic toolkit — dimension selection, CV reconstruction/decoding, Procrustes alignment" on Apr 27, 2026
Copilot AI requested a review from Nabarb April 27, 2026 17:30

@Nabarb Nabarb left a comment

Contributor


The proposed changes are well executed but miss the spirit of the toolbox. The object should store all results within itself so that it is self-sufficient. Therefore:

  • Once aligned, subspaces should be stored within the object. Create a flag to activate or deactivate the alignment: if the flag is set to one, the aligned subspace is returned by default; otherwise the original, unaligned one is returned. This means there should be a hidden W_aligned projection matrix stored somewhere. If it is empty, switching the flag should of course not be possible. When the flag is switched, both E and W should return the aligned manifold and projection matrix. This should also be performed for all areas, if active, and uniformly across objs.
  • The output of the diagnostics should probably be saved either to M or to a separate table within the object (choose whichever makes more sense to you). The logic should be the same as in M, so that updated computations replace the older ones for the same metric.

Copilot AI commented Apr 28, 2026

Contributor Author

The proposed changes are well executed but miss the spirit of the toolbox. It should store all the results within itself so that it can be self sufficient, therefore:

  • Once aligned, subspaces should ...

Done in commit 8bb3dee. Here's what was added:

Aligned-subspace storage (E_aligned_, W_aligned_, useAlignment):

  • alignSessions now loops over all areas (including "AllNeurons"), applies the orthogonal Procrustes rotation to every trial's embedding and to the projection matrix W, and writes the results into the private properties E_aligned_ (Transient) and W_aligned_ on each object.
  • New public flag useAlignment (default false): when set to true, get.E and get.W transparently return the aligned subspace. If no aligned data exists yet it silently falls back to the original. Flag is per-object, so you can activate it for a subset of sessions:
    res = alignSessions([NE1, NE2, NE3]);
    NE2.useAlignment = true;   % get.E / get.W now return aligned data for session 2
    NE2.useAlignment = false;  % restore original

Diagnostic results stored in M_ (same logic as computeMetrics):

  • New private helper i_storeM(obj, data, type) reuses the replace-or-append pattern from computeMetrics (checks M_ directly, coerces to string for reliable == comparison).
  • selectDimension → type 'ParallelAnalysis'; crossValReconstruct → 'CVReconstruction'; crossValDecode → 'CVDecoding'; alignSessions → per-session 'Alignment'. Re-running with the same condition/area mask replaces the previous entry.
    NE.selectDimension(1:15);
    NE.crossValReconstruct(5);
    NE.M   % table with ParallelAnalysis and CVReconstruction entries

Copilot AI requested a review from Nabarb April 28, 2026 09:11
@Nabarb Nabarb marked this pull request as ready for review April 28, 2026 12:45

@Nabarb Nabarb left a comment

Contributor


Great changes. What is missing now is some sort of progress indication, given that permutations can take a long time. A combination of animal and session should also be displayed when showing progress.

There should also be a way to create the time-bin labels used in crossValDecode from the 'events' in the events field.
Also, 'Alignment' should not be used as a name in M, since it can cause confusion with the disalignment metric. Use something like sessionAlignment.

Also add a new metric, similar in spirit to what we already have in alignment, that evaluates Procrustes alignment within a session by randomly splitting the trials in two, thus creating two dummy sessions, and repeating this random assignment multiple times. If possible, try to reuse some of the shuffling functions you have already built.

Copilot AI commented Apr 28, 2026

Contributor Author

Great changes. What is missing now is some sort of progress indication given permutations can take a long time. Also a combination of animal and session should be displayed when showing progress.

The...

Done in commit d0c1af1. Here's a summary of what was added:

Progress indication (pars.verbose = true by default, false to suppress):
All long-running loops now print [Animal.Session] progress using a dynamic backspace counter so the width stays correct as numbers grow:

Parallel analysis [SmokeTest.Test]: shuffle 200/200 done.
Permutation test [SmokeTest.Test]: perm 500/500 done.
AlignSessions [area=AllNeurons]: SmokeTest.Test2 → SmokeTest.Test
IntraAlignment [SmokeTest.Test]: split 100/100 done.

labelsFromEvents(obj, eventNames) – builds a per-time-bin categorical label vector from stored events, respecting the current cMask. Bins before the first event are labelled '0'; each subsequent bin takes the name of the most recent requested event. Compatible directly with crossValDecode:

NE.addEvents(evts);
y = NE.labelsFromEvents({'Cue','Go'});   % T x 1 categorical
res = NE.crossValDecode(y, dStar);
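The labelling rule (most recent requested event wins, bins before the first event get '0') can be sketched on plain arrays. The event names, times and bin grid below are made up for illustration, and the sketch returns a cell array of character vectors; in MATLAB, `categorical(labels)` would convert it to the categorical type mentioned above.

```matlab
% Hypothetical inputs: bin start times and a list of named events
binTimes = (0:0.1:1.9).';                       % T x 1, seconds
evtNames = {'Cue', 'Go', 'Reward'};             % all stored events
evtTimes = [0.45, 1.15, 1.6];                   % onset of each event (s)
wanted   = {'Cue', 'Go'};                       % events requested by the caller

% Keep only the requested events, sorted by onset time
keep = ismember(evtNames, wanted);
names = evtNames(keep); times = evtTimes(keep);
[times, order] = sort(times); names = names(order);

labels = repmat({'0'}, numel(binTimes), 1);     % pre-event bins are '0'
for e = 1:numel(times)
    % Later events overwrite earlier ones from their onset onward
    labels(binTimes >= times(e)) = names(e);
end

nPre = sum(strcmp(labels, '0'));
nCue = sum(strcmp(labels, 'Cue'));
nGo  = sum(strcmp(labels, 'Go'));
```

Because 'Reward' was not requested, bins after its onset keep the 'Go' label, which matches the "most recent event from eventNames" rule.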

'SessionAlignment' M_ type – renamed from 'Alignment' to avoid confusion with the existing computeMetrics alignment metric.

crossValAlignment(pars) – within-session stability metric: randomly splits trials into two equal halves, fits orthogonal Procrustes alignment, and repeats nSplit times. Returns a disparity distribution that can be compared against the cross-session disparity from alignSessions. Stored as 'IntraAlignment' in M_. Reuses the existing align_procrustes and alignment_metrics compute functions:

resIA = NE.crossValAlignment();
fprintf('Intra-session disparity = %.4f ± %.4f\n', resIA.disparityMean, resIA.disparityStd);
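A rough sketch of the split-half idea, assuming each half is summarised by its trial-averaged trajectory before alignment (the PR does not spell out this step, so treat it as an assumption). Data are synthetic: a shared trajectory plus independent noise on every trial.

```matlab
rng(0);
nTrials = 40; T = 50; d = 3; nSplit = 100;
t = linspace(0, 2 * pi, T).';
template = [cos(t), sin(t), t / (2 * pi)];            % shared T x d trajectory
E = repmat(template, 1, 1, nTrials) + 0.3 * randn(T, d, nTrials);

disparity = zeros(nSplit, 1);
for s = 1:nSplit
    % Random split of the trials into two equal halves
    perm = randperm(nTrials);
    h1 = perm(1:nTrials / 2);
    h2 = perm(nTrials / 2 + 1:end);
    % Trial-averaged, centred trajectory of each half (assumed summary)
    A = mean(E(:, :, h1), 3); A = A - mean(A, 1);
    B = mean(E(:, :, h2), 3); B = B - mean(B, 1);
    % Orthogonal Procrustes rotation of half 2 onto half 1
    [U, ~, V] = svd(B.' * A);
    R = U * V.';
    disparity(s) = norm(B * R - A, 'fro')^2 / norm(A, 'fro')^2;
end
disparityMean   = mean(disparity);
disparityMedian = median(disparity);
disparityStd    = std(disparity);
```

The resulting distribution gives a noise floor: a cross-session disparity well above the within-session values suggests a real change in the latent geometry, while one inside the distribution is compatible with sampling noise.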

Copilot AI changed the title from "feat: manifold diagnostic toolkit — dimension selection, CV reconstruction/decoding, Procrustes alignment" to "feat: manifold diagnostic toolkit — dimension selection, CV reconstruction/decoding, Procrustes alignment, within-session stability" on Apr 28, 2026
Copilot AI requested a review from Nabarb April 28, 2026 14:29

@Nabarb Nabarb left a comment

Contributor


In dim_parallel_analysis it would make more sense to project data using whatever method was used to get the embedding, with the same parameters and using the project utilities already built in the tool. This should make comparison more reasonable: if the data was manipulated in some ways to better identify the subspace your approach would yield confusing results.

Copilot AI commented Apr 29, 2026

Contributor Author

In dim_parallel_analysis it would make more sense to project data using whatever method was used to get the embedding, with the same parameters and using the project utilities already built in the too...

Done in commit 4755809.

selectDimension now injects a projFcn (projection function) into dim_parallel_analysis that matches the object's embedding pipeline exactly:

  • PCA / SmoothPCA: covariance PCA (centre-only, no extra z-score), matching embedding.PCA.reduce / MATLAB's pca(). obj.S already applies any z-scoring configured on the object, so double-standardising is avoided.
  • GPFA / CCA / other: a warning is issued and covariance PCA on obj.S is used as a useful approximation. For rigorous dimension selection with non-PCA methods, crossValReconstruct is recommended.

dim_parallel_analysis gains an optional 5th projFcn argument. When provided it skips its internal z-score step (data is treated as already preprocessed); when absent the old z-score+SVD behaviour is preserved for standalone use.

results.embeddingMethod is also included in the output for reproducibility.

Copilot AI requested a review from Nabarb April 29, 2026 17:07

2 participants