SNN N! Ordering Space

Proof-of-concept framework for exploiting the factorial representational space of spike temporal ordering in Spiking Neural Networks. This document records all findings, known issues, and open questions as of checkpoint v2.

June 5, 2026 · 14:37

§1 Symbol Table

M · Number of neurons · ℤ₊

Determines the size of the ordering space: $M$ neurons produce $M!$ distinct spike orderings. Tested range: $M \in \{4,6,8,10,12\}$.

T · Number of timesteps · ℤ₊

Simulation length. Should satisfy $T \gg M$ so that all neurons can fire at distinct timesteps; sweep experiments set $T = \max(16,\, 2M)$.

C · Number of classes · ℤ₊

Number of output categories. The ratio $M!/C$ sets the ordering space available per class. Capacity degrades when $M!/C \lesssim 8$; the system is over capacity when $M!/C < 1$.

$\beta$ · Membrane decay rate · (0,1)

LIF leakage coefficient. Higher $\beta$ means slower decay and a longer integration window. LIF₁: $\beta_1=0.9$; LIF₂: $\beta_2=0.95$. It sets the integration speed and the steady-state membrane potential under constant input, $\text{mem}_\infty = \text{cur}/(1-\beta)$.

$\theta$ · Firing threshold · ℝ₊

Membrane potential value that triggers a spike. Default $\theta=1.0$ (snntorch). Under constant input, a neuron eventually spikes iff $\text{cur} > (1-\beta)\theta$.

base · Base current · ℝ₊

Input current for the highest-rank neuron. Must satisfy $\text{base} < \theta$ so that the membrane integrates over several steps instead of spiking immediately at $t=0$. Default: $0.40$.

$\Delta$ · Current step (delta) · ℝ₊

Current decrement per rank: the neuron at rank $k$ receives current $\text{base} - k\Delta$. Controls the first-spike time separation between adjacent neurons. $\Delta < (\text{base} - (1-\beta_1)\theta)/(M-1)$ guarantees that every neuron eventually fires; firing within $T$ additionally requires $T$ to exceed the weakest neuron's first-spike time.

$\sigma$ · Input noise · ℝ₊

Standard deviation of the Gaussian noise added to each sample's current vector. Provides within-class sample diversity. Must stay small relative to $\Delta$ to preserve ordering: $\sigma \ll \Delta$.

$w$ · Temporal decay rate · ℝ₊

Exponential weighting of spike times in the soft first-spike score. Larger $w$ emphasises earlier spikes more strongly. Default $w=0.5$.

$\tau$ · Soft rank temperature · ℝ₊

Sigmoid temperature in the soft rank formula. $\tau\to0$ recovers the exact rank (with zero gradient); $\tau\to\infty$ gives uniform output. Default $\tau=0.01$ for OPS invariance.

$K$ · Monte Carlo samples · ℤ₊

Number of permutations sampled via Gumbel-max when $M>6$. Default $K=512$. In MC mode the readout dimension equals $K$.

$M_{i,t}$ · First-spike mask · {0,1}

$M_{i,t}=1$ iff neuron $i$ fires at $t$ and has not fired before. Isolates the first spike for differentiable scoring. (Not to be confused with the neuron count $M$.)

$a_i$ · Soft first-spike score · ℝ₊

Equals $e^{-w t_i^*}$ on a binary spike train ($0$ if neuron $i$ never fires) and is differentiable in $S$. Larger means an earlier first spike. The direct gradient term $\partial a_i/\partial S_{i,t} = e^{-wt}\prod_{s<t}(1-S_{i,s})$ is nonzero for every $t \le t_i^*$.

$\hat{a}_i$ · Soft rank score · [0,1]

Rank-normalised version of $a_i$. Carries only ordinal information; magnitude is discarded. Invariant to order-preserving rescaling of spike times in the limit $\tau\to0$, and approximately so (OPS-invariant) when $\tau$ is small.

$\mathbf{p}$ · Plackett-Luce distribution · $\Delta^{K-1}$

Probability distribution over the $M!$ permutations, parameterised by $\hat{\mathbf{a}}$, with entries $p_\pi = P(\pi|\hat{\mathbf{a}})$. Approximated by $K$ Gumbel-max samples for $M>6$.

fst-std · First-spike time std · ℝ₊

Average standard deviation of per-neuron first-spike times across classes. Higher means greater ordering separation between classes.

E-drop · Time-shuffle accuracy drop · [0,1]

$\text{acc(normal)} - \text{acc(time-shuffled)}$. A high E-drop confirms that temporal ordering is the primary information source.

OPS-drop · Order-preserving shuffle drop · [0,1]

$\text{acc(normal)} - \text{acc(OPS)}$. OPS re-samples spike times while preserving $\pi$. A low OPS-drop means the model uses ordering, not absolute timing values.

§2 Parameter Guide

How base, delta, beta, theta, and sigma interact, and the constraints they must satisfy for correct operation.

Guarantee that all neurons fire

Under constant input, a neuron eventually fires iff its steady-state membrane potential exceeds $\theta$:

$$\frac{\text{cur}}{1-\beta} > \theta \;\Longrightarrow\; \text{cur} > (1-\beta)\theta$$

For the weakest neuron (rank $M-1$, current $\text{base}-(M-1)\Delta$), using LIF₁'s $\beta=\beta_1$:

$$\Delta < \frac{\text{base} - (1-\beta)\theta}{M-1}$$
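
Before the first spike, reset plays no role, and with $\text{mem}[-1]=0$ (an assumption; the initial state is not stated in this section) the recurrence in §3 gives $\text{mem}[t]=\text{cur}\,(1-\beta^{t+1})/(1-\beta)$. The first-spike step therefore has a closed form, $t^* = \big\lfloor \ln(1-(1-\beta)\theta/\text{cur}) / \ln\beta \big\rfloor$ whenever $\text{cur} > (1-\beta)\theta$. A minimal sketch for LIF₁ alone, ignoring noise and the LIF₂ stage; `first_spike_time` and `simulate_first_spike` are illustrative helpers, not framework functions:

```python
import math


def first_spike_time(cur, beta=0.9, theta=1.0, eps=1e-12):
    """Closed-form first spike step of one LIF neuron under constant input.

    Assumes mem[-1] = 0 and mem[t] = beta * mem[t-1] + cur (no reset matters
    before the first spike), so mem[t] = cur * (1 - beta**(t+1)) / (1 - beta).
    Returns None if the steady state cur / (1 - beta) does not exceed theta.
    """
    floor_cur = (1.0 - beta) * theta  # minimum current that ever fires
    if cur <= floor_cur + eps:  # eps absorbs float error, e.g. 0.1 vs 0.09999999999999998
        return None
    # mem[t] > theta  <=>  beta**(t+1) < 1 - floor_cur / cur
    x = math.log(1.0 - floor_cur / cur) / math.log(beta)
    return math.floor(x)  # smallest integer t with t + 1 > x


def simulate_first_spike(cur, beta=0.9, theta=1.0, T=500):
    """Brute-force check: iterate the recurrence, return first t with mem > theta."""
    mem = 0.0
    for t in range(T):
        mem = beta * mem + cur
        if mem > theta:
            return t
    return None


currents = [0.40 - k * 0.05 for k in range(4)]  # ranks 0..3, no noise
print([first_spike_time(c) for c in currents])  # [2, 3, 3, 4]
print(first_spike_time(0.40 - 7 * 0.04))        # 17 (weakest of M=8 at delta=0.04)
```

Two consequences under these assumptions: ranks 1 and 2 (currents 0.35 and 0.30) cross threshold on the same LIF₁ step, and even with Δ=0.04, which satisfies the bound for M=8, the weakest neuron first fires at t=17, later than the $T=\max(16,2M)=16$ steps of the sweep window, before LIF₂ adds any further delay.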

Avoid immediate spiking at $t=0$

If $\text{cur} > \theta$, LIF₁ fires on the very first timestep, before any integration occurs, and the timing information is lost. Requirement:

$$\text{base} < \theta = 1.0$$

Preserve ordering under noise

Noise $\sigma$ must be small enough that adjacent neurons rarely swap rank:

$$\sigma \ll \Delta \quad\text{(rule of thumb: } \sigma \leq \Delta/3\text{)}$$
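
To put a number on "rarely swap", assume (this is an assumption; the text only says Gaussian noise with std $\sigma$) that the noise is i.i.d. per neuron. The current gap between adjacent ranks is then Gaussian with mean $\Delta$ and std $\sigma\sqrt2$, so one adjacent pair inverts its current order with probability $\Phi(-\Delta/(\sigma\sqrt2)) = \tfrac12\operatorname{erfc}(\Delta/(2\sigma))$. This counts current-order inversions only and ignores ties created by discrete timesteps. The helper names are illustrative:

```python
import math


def adjacent_swap_prob(delta, sigma):
    """P(rank k+1 receives more current than rank k) under i.i.d. N(0, sigma^2) noise.

    The gap (base - k*delta + n_k) - (base - (k+1)*delta + n_{k+1}) is
    N(delta, 2*sigma^2), so P(gap < 0) = Phi(-delta / (sigma*sqrt(2)))
    = 0.5 * erfc(delta / (2*sigma)).
    """
    return 0.5 * math.erfc(delta / (2.0 * sigma))


def any_swap_bound(delta, sigma, M):
    """Union bound on at least one inversion among the M-1 adjacent pairs."""
    return min(1.0, (M - 1) * adjacent_swap_prob(delta, sigma))


for sigma in (0.02, 0.05 / 3):
    print(f"sigma={sigma:.4f}: pair={adjacent_swap_prob(0.05, sigma):.4f}, "
          f"M=8 bound={any_swap_bound(0.05, sigma, 8):.3f}")
```

At the current setting (σ=0.02, Δ=0.05) this gives about 0.039 per adjacent pair, versus about 0.017 at σ=Δ/3.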

Current configuration

theta 1.0 ← snntorch default, fixed
beta1 0.9 ← LIF₁ decay rate
beta2 0.95 ← LIF₂ decay rate (smoother)
base 0.40 ← < theta, forces integration
delta 0.05 ← verified: M=4 first-spike times {0:5,1:6,2:8,3:10}
sigma 0.02 ← sigma/delta = 0.4 (above the Δ/3 rule of thumb); ordering observed stable
w 0.5 ← temporal decay in soft score
tau 0.01 ← near-exact rank, OPS-drop=0.017
K 512 ← MC samples for M>6
T max(16,2M) ← intended to let all neurons fire

Known issue with M=8, delta=0.05

With delta=0.05 and M=8, neurons 6 and 7 (currents 0.10 and 0.05) do not fire within T=50. The minimum viable current is $(1-0.9)\times1.0 = 0.10$: neuron 6 sits exactly at this limit, so its membrane approaches $\theta$ without ever exceeding it, and neuron 7 is below it. Neither fires for any $T$. See §10 Known Issues for discussion.

§3 Formula Sheet

LIF Dynamics · Membrane update & spike

$$\text{mem}_i[t] = \beta\cdot\text{mem}_i[t-1] + \text{cur}_i$$ $$S_{i,t} = \mathbf{1}[\text{mem}_i[t]>\theta]$$

A surrogate gradient approximates $\partial S/\partial\text{mem}$. Steady state: $\text{mem}_\infty = \text{cur}/(1-\beta)$.
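
This document does not name the surrogate it uses. As an illustration only, the sketch below uses a fast-sigmoid surrogate (the SuperSpike form): the forward pass keeps the Heaviside step, while the backward pass replaces $\partial S/\partial\text{mem}$ with $1/(1+k|\text{mem}-\theta|)^2$. The slope `k=25` is an assumed value, the function names are illustrative, and reset is omitted to match the update rule above:

```python
def heaviside(mem, theta=1.0):
    """Forward pass: S = 1[mem > theta]."""
    return 1.0 if mem > theta else 0.0


def fast_sigmoid_grad(mem, theta=1.0, k=25.0):
    """Backward-pass stand-in for dS/dmem: 1 / (1 + k*|mem - theta|)^2.

    Peaks at 1 when mem == theta and decays on both sides, so neurons that
    come close to threshold without crossing it still receive gradient.
    """
    return 1.0 / (1.0 + k * abs(mem - theta)) ** 2


def lif_forward(cur, beta=0.9, theta=1.0, T=8):
    """Run mem[t] = beta*mem[t-1] + cur (no reset); record S and surrogate dS/dmem."""
    mem, spikes, grads = 0.0, [], []
    for _ in range(T):
        mem = beta * mem + cur
        spikes.append(heaviside(mem, theta))
        grads.append(fast_sigmoid_grad(mem, theta))
    return spikes, grads


spikes, grads = lif_forward(0.40)
print(spikes)                        # first spike at t=2
print([round(g, 3) for g in grads])  # largest near the threshold crossing
```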

First-Spike Mask · Isolate first event

$$M_{i,t} = S_{i,t}\cdot\prod_{s=0}^{t-1}(1-S_{i,s})$$

$M_{i,t}=1$ iff neuron $i$ fires at $t$ and at no earlier step. Gradient: $\partial M_{i,t}/\partial S_{i,s}\neq0$.

Soft First-Spike Score · Differentiable timing

$$a_i = \sum_{t=0}^{T-1} M_{i,t}\cdot e^{-w\cdot t}$$

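Equals $e^{-w t_i^*}$ on a binary spike train (and $0$ if the neuron never fires). Direct gradient term: $\partial a_i/\partial S_{i,t} = e^{-wt}\cdot\prod_{s<t}(1-S_{i,s})$, which is nonzero for $t \le t_i^*$ and zero afterwards. The total derivative also includes indirect terms through the masks of later steps.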
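
Because the mask is a product of affine factors, $a_i$ is affine in each individual $S_{i,t}$, so a central finite difference on a relaxed spike train returns its exact partial derivatives. The plain-Python sketch below (illustrative names) shows that the gradient expression above is the direct term only: for a single spike at $t_i^*$, the full derivative is $e^{-wt}-e^{-wt_i^*}$ for $t<t_i^*$ (the second term comes through $M_{i,t_i^*}$), $e^{-wt_i^*}$ at $t=t_i^*$, and $0$ afterwards:

```python
import math


def first_spike_mask(S_row):
    """M[t] = S[t] * prod_{s<t} (1 - S[s]) for one neuron's spike train."""
    mask, survive = [], 1.0  # survive holds prod_{s<t} (1 - S[s])
    for s in S_row:
        mask.append(s * survive)
        survive *= 1.0 - s
    return mask


def soft_first_spike_score(S_row, w=0.5):
    """a = sum_t M[t] * exp(-w t); equals exp(-w t*) for a binary train."""
    return sum(m * math.exp(-w * t) for t, m in enumerate(first_spike_mask(S_row)))


def grad_fd(S_row, w=0.5, h=1e-6):
    """Central finite differences of a w.r.t. each S[t] (exact up to rounding,
    because a is affine in every individual S[t])."""
    grads = []
    for t in range(len(S_row)):
        up, dn = list(S_row), list(S_row)
        up[t] += h
        dn[t] -= h
        grads.append((soft_first_spike_score(up, w) - soft_first_spike_score(dn, w)) / (2 * h))
    return grads


S = [0.0, 0.0, 1.0, 0.0]          # single spike, t* = 2
print(first_spike_mask(S))        # [0.0, 0.0, 1.0, 0.0]
print(soft_first_spike_score(S))  # exp(-1.0)
print([round(g, 4) for g in grad_fd(S)])
```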

Soft Rank · Discard magnitude

$$\hat{a}_i = \frac{1}{M-1}\sum_{j\neq i}\sigma\!\left(\frac{a_i-a_j}{\tau}\right)$$

$\hat{a}_i\in[0,1]$, and $\sigma(\cdot)$ here is the logistic sigmoid, not the input noise. Approximately invariant to order-preserving transforms, and exactly so as $\tau\to0$, where $\hat{a}_i\to\text{rank}(a_i)/(M-1)$. Tied neurons ($a_i=a_j$) contribute $\sigma(0)=0.5$ to each other's sums, so $\hat{a}_i=\hat{a}_j$ and their relative order is unresolvable.
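
A direct transcription of the soft-rank formula (plain Python, illustrative names). The second call shows that a tied pair receives equal scores that are not in general 0.5, and the last line shows how $w$ and $\tau$ interact: with $w=0.5$, first spikes at $t=10$ and $t=11$ differ in $a$ by only about 0.0027, so at $\tau=0.01$ their pairwise sigmoid is about 0.57, far from the $\tau\to0$ limit of 1:

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def soft_rank(a, tau=0.01):
    """a_hat_i = 1/(M-1) * sum_{j != i} sigmoid((a_i - a_j) / tau)."""
    M = len(a)
    return [sum(sigmoid((a[i] - a[j]) / tau) for j in range(M) if j != i) / (M - 1)
            for i in range(M)]


# Well-separated scores: close to rank / (M - 1).
print([round(v, 3) for v in soft_rank([0.9, 0.5, 0.2, 0.0])])
# Tied pair: equal scores, but the other neurons still count toward the sum.
print([round(v, 3) for v in soft_rank([0.5, 0.5, 0.2, 0.1])])
# w = 0.5, first spikes at t = 10 and t = 11: tiny gap in a.
a10, a11 = math.exp(-0.5 * 10), math.exp(-0.5 * 11)
print(round(sigmoid((a10 - a11) / 0.01), 3))
```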

Plackett-Luce · Distribution over $S_M$

$$\log P(\pi|\hat{\mathbf{a}}) = \sum_{k=1}^{M}\!\left(\hat{a}_{\pi_k} - \log\sum_{j=k}^{M}e^{\hat{a}_{\pi_j}}\right)$$

Computed in log space. The final $\mathbf{p}=\text{softmax}([\log P(\pi_i|\hat{\mathbf{a}})])$ is taken over all $M!$ permutations (or over the $K$ MC samples).
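
For small $M$ the formula can be evaluated exhaustively. A plain-Python sketch (function names are illustrative) that enumerates all $M!$ permutations with `itertools.permutations`, computes each log-probability with a stable log-sum-exp, and normalises:

```python
import itertools
import math


def pl_log_prob(perm, a_hat):
    """log P(pi | a_hat) = sum_k [a_hat[pi_k] - log sum_{j>=k} exp(a_hat[pi_j])]."""
    total = 0.0
    for k in range(len(perm)):
        tail = [a_hat[i] for i in perm[k:]]
        m = max(tail)  # subtract the max inside log-sum-exp for stability
        total += a_hat[perm[k]] - (m + math.log(sum(math.exp(x - m) for x in tail)))
    return total


def pl_distribution(a_hat):
    """Softmax of log P over all M! permutations (feasible for M <= 6)."""
    perms = list(itertools.permutations(range(len(a_hat))))
    logp = [pl_log_prob(p, a_hat) for p in perms]
    m = max(logp)
    z = sum(math.exp(l - m) for l in logp)
    return perms, [math.exp(l - m) / z for l in logp]


perms, p = pl_distribution([1.0, 2 / 3, 1 / 3, 0.0])
best = max(range(len(p)), key=p.__getitem__)
print(len(p), perms[best], round(p[best], 4))
```

Because every $\hat a_i$ lies in [0,1], the log-weights differ by at most 1, so even the modal permutation (the sorted order) gets only about 0.10 here, against 1/24 ≈ 0.042 for uniform. Exact PL probabilities already sum to one, so the final softmax changes nothing in exhaustive mode; it matters only when renormalising over $K$ samples.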

Gumbel-Max Sampling · Monte Carlo PL

$$\pi^{(k)}=\text{argsort}(-(\hat{\mathbf{a}}+\mathbf{g}^{(k)})),\quad\mathbf{g}^{(k)}\sim\text{Gumbel}(0,1)^M$$

The argsort runs inside torch.no_grad(); gradients flow only through $\log P(\pi^{(k)}|\hat{\mathbf{a}})$.
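
Perturbing each score with independent Gumbel(0,1) noise and sorting in descending order yields an exact Plackett-Luce sample (the Gumbel-max trick applied repeatedly). The standard-library sketch below shows only the sampling step, which corresponds to the no_grad block; in the real pipeline gradients would come from evaluating $\log P(\pi^{(k)}|\hat{\mathbf{a}})$ on the sampled permutations. Names and the clipping constant are illustrative:

```python
import math
import random


def gumbel_perm(a_hat, rng):
    """One Plackett-Luce sample: argsort(-(a_hat + g)) with g_i ~ Gumbel(0, 1)."""
    keys = []
    for i, a in enumerate(a_hat):
        u = rng.uniform(1e-12, 1.0 - 1e-12)  # keep log(-log(u)) finite
        keys.append((a - math.log(-math.log(u)), i))
    return tuple(i for _, i in sorted(keys, reverse=True))  # descending perturbed score


def sample_perms(a_hat, K=512, seed=0):
    """K Monte Carlo permutations; the sampling itself carries no gradient."""
    rng = random.Random(seed)
    return [gumbel_perm(a_hat, rng) for _ in range(K)]


a_hat = [1.0, 2 / 3, 1 / 3, 0.0]
samples = sample_perms(a_hat, K=20000)
first = sum(p[0] == 0 for p in samples) / len(samples)
exact = math.exp(a_hat[0]) / sum(math.exp(a) for a in a_hat)
print(round(first, 3), round(exact, 3))  # empirical vs exact P(neuron 0 ranked first)
```

The empirical frequency of neuron 0 being ranked first should match $e^{\hat a_0}/\sum_j e^{\hat a_j} \approx 0.385$. The $K$ samples can contain repeated permutations.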

Orthogonal Layer · Cayley map

$$\mathbf{W}=(I+\mathbf{S})^{-1}(I-\mathbf{S}),\quad\mathbf{S}=-\mathbf{S}^\top$$ $$\mathbf{W}^\top\mathbf{W}=I\;\Rightarrow\;\|\mathbf{W}\mathbf{x}\|=\|\mathbf{x}\|$$

The learned parameters are the strict upper triangle of $\mathbf{S}$ ($M(M-1)/2$ unconstrained values; the lower triangle follows from $\mathbf{S}=-\mathbf{S}^\top$). The output is a pure rotation, with no scaling. Constraint: input and output must have the same dimension, $d\times d$.
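
A dependency-free sketch of the Cayley map: build the skew-symmetric $\mathbf{S}$ from the $d(d-1)/2$ free parameters, solve $(I+\mathbf{S})\mathbf{W} = I-\mathbf{S}$ instead of forming an explicit inverse, and check that norms are preserved. A small Gauss-Jordan routine stands in for a library linear solve; the function names and parameter values are arbitrary:

```python
def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [list(A[r]) + list(B[r]) for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def cayley(upper, d):
    """W = (I + S)^{-1} (I - S) from the d(d-1)/2 free entries of skew-symmetric S."""
    S = [[0.0] * d for _ in range(d)]
    it = iter(upper)
    for i in range(d):
        for j in range(i + 1, d):
            S[i][j] = next(it)
            S[j][i] = -S[i][j]
    I = [[float(i == j) for j in range(d)] for i in range(d)]
    plus = [[I[i][j] + S[i][j] for j in range(d)] for i in range(d)]
    minus = [[I[i][j] - S[i][j] for j in range(d)] for i in range(d)]
    return solve(plus, minus)


d = 4
W = cayley([0.3, -0.7, 0.1, 0.5, 0.2, -0.4], d)
x = [0.40, 0.35, 0.30, 0.25]
Wx = [sum(W[i][j] * x[j] for j in range(d)) for i in range(d)]
print(sum(v * v for v in x) ** 0.5, sum(v * v for v in Wx) ** 0.5)  # equal norms
```

$I+\mathbf{S}$ is always invertible because the eigenvalues of a real skew-symmetric matrix are purely imaginary. The Cayley map reaches only rotations ($\det\mathbf{W}=+1$) that have no eigenvalue $-1$.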

E-drop & OPS-drop · Diagnostic metrics

$$\text{E-drop} = \text{acc}_\text{normal} - \text{acc}_\text{time-shuffled}$$ $$\text{OPS-drop} = \text{acc}_\text{normal} - \text{acc}_\text{OPS}$$

Time-shuffle permutes the $T$ axis of $S$. OPS re-samples spike times while preserving $\pi$. If the model relies on ordering, accuracy after the time shuffle should fall to about chance, $1/C$.
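
A sketch of the two perturbations on a binary spike tensor, written as nested lists with `S[i][t]`. Two details are assumptions, not taken from the experiments: the time shuffle applies one random permutation of the T axis to all neurons of a sample, and OPS draws M fresh distinct first-spike times and assigns them by the original first-spike order (one spike per neuron). `time_shuffle`, `order_preserving_shuffle`, and `drop` are hypothetical names:

```python
import random


def first_spike_order(S):
    """Neuron indices sorted by first-spike time (silent neurons last)."""
    T = len(S[0])
    t_first = [next((t for t, s in enumerate(row) if s), T) for row in S]
    return sorted(range(len(S)), key=lambda i: (t_first[i], i))


def time_shuffle(S, rng):
    """Permute the T axis with one permutation shared by all neurons."""
    perm = list(range(len(S[0])))
    rng.shuffle(perm)
    return [[row[p] for p in perm] for row in S]


def order_preserving_shuffle(S, rng):
    """Resample first-spike times but keep the order pi (one spike per neuron)."""
    M, T = len(S), len(S[0])
    new_times = sorted(rng.sample(range(T), M))  # M distinct times, ascending
    out = [[0] * T for _ in range(M)]
    for rank, i in enumerate(first_spike_order(S)):
        out[i][new_times[rank]] = 1
    return out


def drop(acc_normal, acc_perturbed):
    """E-drop or OPS-drop: accuracy lost under the perturbation."""
    return acc_normal - acc_perturbed


rng = random.Random(0)
# M=4 neurons firing at t = 5, 6, 8, 10 (the verified M=4 pattern), T = 16.
S = [[int(t == ts) for t in range(16)] for ts in (5, 6, 8, 10)]
print(first_spike_order(S))                                 # [0, 1, 2, 3]
print(first_spike_order(order_preserving_shuffle(S, rng)))  # still [0, 1, 2, 3]
print(first_spike_order(time_shuffle(S, rng)))              # usually a different order
```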

§4 Full Pipeline

$$\underbrace{\mathbf{cur}}_{\mathbb{R}^M} \xrightarrow{\text{LIF}_1,\text{LIF}_2} \underbrace{S}_{\{0,1\}^{M\times T}} \xrightarrow{\text{mask}} \underbrace{a_i}_{\mathbb{R}_+^M} \xrightarrow{\text{rank}} \underbrace{\hat{a}_i}_{[0,1]^M} \xrightarrow{\text{PL}} \underbrace{\mathbf{p}}_{\Delta^{K-1}} \xrightarrow{\mathbf{W}_\text{out}} \underbrace{\text{logits}}_{\mathbb{R}^C} \xrightarrow{\mathcal{L}_\text{CE}} \text{loss}$$

Gradient path (all factors non-zero in general):

$$\frac{\partial\mathcal{L}}{\partial\mathbf{cur}} = \frac{\partial\mathcal{L}}{\partial\mathbf{p}}\cdot\frac{\partial\mathbf{p}}{\partial\hat{\mathbf{a}}}\cdot\frac{\partial\hat{\mathbf{a}}}{\partial\mathbf{a}}\cdot\frac{\partial\mathbf{a}}{\partial S}\cdot\underbrace{\frac{\partial S}{\partial\text{mem}}}_{\text{surrogate}}\cdot\frac{\partial\text{mem}}{\partial\mathbf{cur}}$$

Learnable parameters

W_out K × C ← sole learnable component in final POC architecture
S M(M-1)/2 ← Cayley generator, if orthogonal layer present
LIF₁,₂ none ← β and θ are fixed hyperparameters (learn_beta=False)
W_rec M × M ← removed in final architecture (see §9, F5)

§5 Component Notes

Why base must be below theta

When $\text{cur} > \theta$, LIF₁ fires on the very first timestep ($t=0$) without any membrane integration. All neurons with currents above threshold then fire simultaneously at $t=0$, destroying ordering information before it can form. Setting $\text{base} < \theta$ forces the membrane to integrate over multiple steps, so that timing differences emerge from differences in current magnitude.

State Leaky: two-stage smoothing

LIF₁ ($\beta=0.9$) receives the input current and generates an initial sparse spike train. LIF₂ ($\beta=0.95$; higher $\beta$ means longer memory) receives LIF₁'s spikes as input and integrates them over a wider window. Even if two neurons fire simultaneously in LIF₁, LIF₂'s independent integration may give them distinct timing. The chain increases temporal spread.
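
A plain-Python sketch of the two-stage chain. It assumes details the text does not state: membranes start at zero, both stages use snntorch-style subtractive reset (a spike at step t−1 subtracts θ at step t), noise is off, and LIF₂ is driven directly by LIF₁'s spikes with unit weight. Under those assumptions it reproduces the M=4 first-spike times quoted in the §2 configuration ({0:5, 1:6, 2:8, 3:10}); the function names are illustrative:

```python
def lif_run(inputs, beta, theta=1.0):
    """LIF with subtractive reset (assumed); returns the 0/1 spike train.

    mem[t] = beta * mem[t-1] + x[t] - theta * S[t-1];  S[t] = 1[mem[t] > theta]
    """
    mem, prev_spike, spikes = 0.0, 0, []
    for x in inputs:
        mem = beta * mem + x - theta * prev_spike
        prev_spike = 1 if mem > theta else 0
        spikes.append(prev_spike)
    return spikes


def first_spike(spikes):
    """Index of the first 1, or None if the neuron never fires."""
    return spikes.index(1) if 1 in spikes else None


def two_stage_first_spikes(currents, T=16, beta1=0.9, beta2=0.95):
    """First-spike steps of LIF1 and of LIF2, with LIF2 driven by LIF1 spikes."""
    lif1, lif2 = [], []
    for cur in currents:
        s1 = lif_run([cur] * T, beta1)
        s2 = lif_run(s1, beta2)  # assumed: LIF1 spikes feed LIF2 with unit weight
        lif1.append(first_spike(s1))
        lif2.append(first_spike(s2))
    return lif1, lif2


base, delta, M = 0.40, 0.05, 4
lif1, lif2 = two_stage_first_spikes([base - k * delta for k in range(M)])
print(lif1)                   # [2, 3, 3, 4]: ranks 1 and 2 tie in LIF1
print(dict(enumerate(lif2)))  # {0: 5, 1: 6, 2: 8, 3: 10}
```

In this reading, one LIF₁ spike lifts LIF₂ to exactly θ, which does not cross the strict threshold, so LIF₂'s first spike falls on LIF₁'s second spike; that is how the LIF₁ tie at t=3 separates into 6 and 8.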

Orthogonal layer: square matrix constraint

The Cayley map requires a square $d\times d$ generator $\mathbf{S}$ and produces a $d\times d$ orthogonal $\mathbf{W}$, so input and output dimensions must be equal. To use an orthogonal pre-SNN layer, the FFN encoder must first compress to exactly $M$ dimensions; the orthogonal layer then rotates within that $M$-dimensional space.

W_rec: timing pattern vs ordering tradeoff

W_rec isolation experiments showed that, when W_rec is trained freely, it learns to exploit timing patterns (absolute spike times) rather than pure ordering, reducing E-drop from 0.744 to 0.416. W_rec naturally learns 92% inhibitory connections, yet even a purely inhibitory W_rec lets the model take a timing shortcut. Removing W_rec yields the highest E-drop, so W_rec is excluded from the final POC architecture.

Tied spikes: what happens when two neurons fire at the same timestep

If neurons $i$ and $j$ both fire first at $t^*$, then $a_i = a_j = e^{-wt^*}$, so $\sigma((a_i-a_j)/\tau) = 0.5$ and $\hat{a}_i = \hat{a}_j$. The ordering between them is unresolvable; this is correct behaviour, not a bug. In practice, with $\text{delta}=0.05$ and noise $\sigma=0.02$, noise-induced ties are rare because adjacent neurons have a 0.05 current gap while the noise std is only 0.02. Ties can still arise from time discretisation when two currents reach threshold on the same step.

§6 Proof Chain

Four experiments establishing that spike ordering is the primary information carrier.

Temporal information is used: E-drop test

A random shuffle of the time axis of $S$ drops accuracy from 1.000 to 0.256. After the shuffle, accuracy is near random chance ($\approx 1/C$). Between/within ratio = 18.2. No non-temporal feature can compensate.

E-drop = 0.744 · ratio between/within = 18.2 · shuffle acc → 0.256

No other information source: acc ≈ E-drop across all C

Across $C \in \{24,100,500,1000,5000,10000\}$, accuracy and E-drop converge as $C$ grows: the residual $\text{acc} - \text{E-drop}$, which equals the time-shuffled accuracy, falls from 0.236 at $C=24$ to 0.002 at $C=10000$ (where chance is $1/C = 0.0001$).

acc ≈ E-drop for all C · shuffle acc ≈ 1/C confirmed

Ordering is learned, not absolute timing: OID test

The Ordering-Invariant Dataset (OID) fixes each class's ordering while re-sampling absolute spike times for every example. Within-class first-spike std is 1.96, yet ordering accuracy is 1.000.

ordering_acc = 1.000 · within_fst_std = 1.96 · acc = 0.793

Only ordering matters at evaluation: OPS + soft rank

The Order-Preserving Shuffle (OPS) changes all spike times while preserving $\pi$. Without soft rank, OPS-drop = 0.228; with soft rank ($\tau=0.01$), OPS-drop = 0.017, a 13× reduction. Accuracy stays at 1.000.

OPS-drop (τ=0.01) = 0.017 · acc = 1.000 · 13× improvement

§7 Key Experimental Results

Exp A: C scaling (M=8, M!=40,320)

C	M!/C	acc	E-drop	acc − E-drop (= shuffled acc)
24	1,680	1.000	0.764	+0.236
100	403	0.978	0.885	+0.093
500	81	0.856	0.830	+0.026
1,000	40	0.733	0.717	+0.016
5,000	8	0.310	0.307	+0.003
10,000	4	0.162	0.160	+0.002 ≈ 1/C

acc and E-drop converge as C grows (acc − E-drop falls from 0.236 to 0.002), consistent with ordering being the dominant information channel.

W_rec isolation

Condition	acc	E-drop	fst-std
Baseline (W_rec trained)	0.999	0.416	8.935
Test 1: W_rec = 0	0.999	0.768	0.954
Test 2: W_rec column shuffle	0.688	0.553	6.428
Test 3: W_rec frozen at init	1.000	0.744	0.954

Removing W_rec maximises E-drop. Trained W_rec exploits timing patterns, not ordering.

W_rec regularisation sweep

λ	acc	E-drop	pos_frac
0.0 (none)	0.999	0.416	0.077
0.01	0.999	0.430	0.018
0.1	0.999	0.283	0.000
1.0	0.999	0.159	0.000
10.0	0.999	0.070	0.000
W_rec=0 (reference)	1.000	0.744	—

Beyond λ=0.01, regularisation makes E-drop steadily worse. Even at λ=10, where no positive weights remain (pos_frac = 0), E-drop is 0.070, far below the W_rec=0 reference of 0.744.

OPS + soft rank: tau sweep

τ	acc	OPS-drop	vs baseline
baseline (no rank)	1.000	0.228	—
1.0	0.999	0.341	+0.113 (worse)
0.1	1.000	0.113	−0.115
0.01	1.000	0.017	−0.211 (13×)

§8 Capacity Matrix

Full sweep: SNN vs NN baseline across $M \in \{4,6,8,10,12\}$ and $C \in \{10,100,1000,10000\}$. N/A indicates $M! < C$ (theoretically infeasible).

SNN Accuracy

	C=10	C=100	C=1,000	C=10,000
M=4	0.600	N/A	N/A	N/A
M=6	0.900	0.463	N/A	N/A
M=8	1.000	0.973	0.741	0.163
M=10	1.000	1.000	0.997	0.964
M=12	1.000	1.000	1.000	0.999

NN Accuracy (same M-dim input)

	C=10	C=100	C=1,000	C=10,000
M=4	0.750	N/A	N/A	N/A
M=6	1.000	0.723	N/A	N/A
M=8	1.000	0.978	1.000	0.961
M=10	1.000	1.000	1.000	0.998
M=12	1.000	1.000	1.000	1.000

E-drop (SNN): ordering dependence

	C=10	C=100	C=1,000	C=10,000
M=8	0.675	0.900	0.719	0.162
M=10	0.400	0.745	0.931	0.951
M=12	0.025	0.500	0.738	0.909

High E-drop (bottom-right) marks the ordering-dominated regime; low E-drop (top-left) means the task is classifiable without ordering. The NN dominates because DirectCurrentDataset encodes the class in the current vector itself; see §10 and §12.

M! / C ratio

	C=10	C=100	C=1,000	C=10,000
M=4	2.4	0.24	0.024	0.002
M=6	72	7.2	0.72	0.072
M=8	4,032	403	40	4
M=10	363K	36K	3,629	363
M=12	47.9M	4.79M	479K	47.9K
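
The ratio table and the N/A pattern of the accuracy tables follow directly from $M!$ and $C$. A short sketch that regenerates both; `capacity_cells` is an illustrative name, and feasible cells print the ratio while cells with $M! < C$ print N/A as in the accuracy tables:

```python
import math


def capacity_cells(Ms=(4, 6, 8, 10, 12), Cs=(10, 100, 1_000, 10_000)):
    """Map (M, C) -> (M!/C, feasible); feasible is False where M! < C (N/A)."""
    return {(M, C): (math.factorial(M) / C, math.factorial(M) >= C)
            for M in Ms for C in Cs}


cells = capacity_cells()
for M in (4, 6, 8, 10, 12):
    row = [cells[(M, C)] for C in (10, 100, 1_000, 10_000)]
    print(f"M={M:>2}", "  ".join(f"{r:>12,.3f}" if ok else "N/A".rjust(12) for r, ok in row))
```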

§9 Confirmed Findings

F1 SNN can learn spike ordering

With $\text{base}<\theta$ and soft first-spike scoring, the SNN exploits temporal ordering as its primary information channel. E-drop = 0.744; a time shuffle reduces accuracy to near random chance.

F2 $M!$ is the real capacity bound

acc ≈ E-drop across all C. In Exp A, accuracy declines steadily as $M!/C$ shrinks (0.978 at 403, 0.733 at 40) and collapses once $M!/C \lesssim 8$ (0.310 at 8, 0.162 at 4). The bound is hard: at $M=8$, $C=10000$ ($M!/C=4$), training for 200 epochs gives the same acc=0.162 as 100 epochs.

F3 An unconstrained linear layer collapses ordering

An 8×8 standard linear layer before the SNN drops E-drop from 0.744 to 0.051. Even a small rescaling of currents lets the model use a binary on/off code instead of fine-grained ordering.

F4 Orthogonal layer preserves ordering

A Cayley-parameterised orthogonal layer keeps E-drop at 0.420 (vs 0.051 for a standard linear layer). Orthogonal acc (0.923) exceeds standard linear acc (0.860), suggesting that relying on ordering is beneficial. Constraint: the layer must be square ($M\times M$).

F5 W_rec is a timing shortcut, not an ordering amplifier

Trained W_rec reduces E-drop from 0.744 (W_rec=0) to 0.416. It naturally learns 92% inhibitory connections, but the W_rec system as a whole still enables timing-pattern exploitation. Regularisation beyond λ=0.01 makes E-drop steadily worse. Removing W_rec is the correct choice for ordering purity.

F6 Soft rank eliminates magnitude leakage

OPS-drop measures dependence on magnitude beyond ordinal structure. Without soft rank: 0.228. With $\tau=0.01$: 0.017 (13× reduction). Accuracy is maintained at 1.000.

F7 Geometric structure is partially learned

Geo-gen experiment: first-neuron accuracy on unseen permutations (fn-unseen) is 0.670 vs chance 0.250. The SNN identifies the first-firing neuron for most unseen permutations. The bottleneck is the PL → readout interface, not the SNN itself.

F8 $M!/C$ hard threshold confirmed

$M=8$, $C=10000$, $M!/C=4$: acc=0.163, with no improvement at 200 epochs. $M=10$, $C=10000$, $M!/C=363$: acc=0.952, converging in 58 epochs. The threshold is $M!/C \approx 8$.

F9 E-drop is highest in the ordering-dominated regime

The capacity matrix shows high E-drop toward the bottom-right, where $C$ is large but $M!/C$ is still well above the capacity limit: $M=10$, $C=10000$ gives E-drop=0.951, while $M=12$, $C=10$ gives E-drop=0.025. Ordering is only necessary, and therefore only used, under capacity pressure.

F10 base must be below theta for LIF integration to matter

Old setting (base=1.2 > θ=1.0): all neurons fire at $t=0$ and ordering collapses. New setting (base=0.4 < θ=1.0): the membrane integrates over multiple steps and timing differences emerge. Verified with sanity checks.

§10 Known Issues

—
Old dataset: base=1.2 > theta. All neurons with cur > θ fire at $t=0$, destroying ordering. Fixed by setting base=0.40 < θ=1.0.
resolved
—
M=8, delta=0.05: neurons 6 and 7 never fire. Neuron 6's current equals $(1-\beta)\theta = 0.10$ and neuron 7's current (0.05) is below it, so neither membrane ever exceeds θ. With delta=0.05 and M=8, the two weakest neurons cannot accumulate enough potential. Fix: use delta < $(0.40 - 0.10)/7 \approx 0.0429$ for M=8, or accept that these neurons always share the last rank (both have $a_i = 0$).
open
—
Simultaneous LIF₂ spikes (tie) cannot be resolved by W_rec. W_rec is one-step delayed: inhibition from a spike at $t^*$ only reaches other neurons at $t^*+1$, so a tie at $t^*$ has already occurred before W_rec can act. For tied neurons the pairwise soft-rank term is $\sigma(0)=0.5$, giving $\hat{a}_i = \hat{a}_j$; this is the correct information-theoretic response but loses the ordering detail between them.
open
—
DirectCurrentDataset allows an NN shortcut. The current vector itself encodes class identity (each class maps to a fixed current pattern), and the NN achieves acc ≥ SNN across most of the capacity matrix. E-drop confirms that the SNN uses ordering, but a fair SNN vs NN comparison requires a dataset where the current vector alone is uninformative. This is a Phase 2 problem.
phase 2
—
Cayley map is square-only. $\mathbf{W} \in \mathbb{R}^{M\times M}$, so an orthogonal layer cannot map a higher-dimensional FFN output directly to $M$ neurons; the FFN must first compress to exactly $M$ dimensions. A Stiefel-manifold parameterisation (semi-orthogonal matrix with orthonormal rows) is the generalisation but has not been implemented.
open
—
PL readout does not generalise to unseen permutations. W_out learns weights per permutation index, so permutations not seen during training have uninformed weights. fn-unseen=0.670 shows that the SNN part generalises, but the readout does not. A hierarchical readout or an ordering embedding is required.
open
—
tau=1.0 worsens OPS-drop vs baseline. A large tau pushes the soft rank output toward a uniform ~0.5, reducing both ordinal and magnitude information. This weakens the PL signal more than it helps OPS invariance, so acc and E-drop degrade slightly.
resolved: use tau=0.01
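
The one-step-delay argument in the item on simultaneous LIF₂ spikes can be illustrated with LIF neurons coupled by lateral inhibition that acts on the next step's input. The coupling form (an inhibitory weight times the number of other neurons that spiked at t−1, added to the input current) and the subtractive reset are assumptions made for illustration, not the exact W_rec used in the experiments; the function name is hypothetical:

```python
def first_spikes_with_lateral_inhibition(currents, w_inh=-0.5, beta=0.9,
                                         theta=1.0, T=20):
    """First-spike steps of LIF neurons with one-step-delayed lateral inhibition.

    Assumed coupling: neuron i's input at step t is currents[i] plus
    w_inh times the number of *other* neurons that spiked at t-1.
    Subtractive reset: a neuron's own spike at t-1 subtracts theta at t.
    """
    M = len(currents)
    mem, prev, first = [0.0] * M, [0] * M, [None] * M
    for t in range(T):
        n_prev = sum(prev)
        spikes = []
        for i in range(M):
            inhibition = w_inh * (n_prev - prev[i])  # others' spikes at t-1 only
            mem[i] = beta * mem[i] + currents[i] + inhibition - theta * prev[i]
            spikes.append(1 if mem[i] > theta else 0)
            if spikes[i] and first[i] is None:
                first[i] = t
        prev = spikes
    return first


print(first_spikes_with_lateral_inhibition([0.40, 0.40]))             # [2, 2]
print(first_spikes_with_lateral_inhibition([0.40, 0.35]))             # [2, 5]
print(first_spikes_with_lateral_inhibition([0.40, 0.35], w_inh=0.0))  # [2, 3]
```

Inhibition delays the weaker neuron (t=3 → t=5) but cannot separate two neurons that cross threshold on the same step, since neither has spiked when the other's membrane is updated.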

§11 Open Questions

Q1 Tie-breaking. What is the most principled way to resolve simultaneous LIF₂ spikes without introducing a non-ordering side channel? Options: per-neuron threshold offsets, a Stiefel pre-layer, or accepting ties as valid partial orderings. Open
Q2 Delta constraint for M=8. What is the optimal $(\text{base}, \Delta, T)$ triple for $M=8$ that guarantees all 8 neurons fire within $T$ while maximising first-spike time separation? Open
Q3 Hierarchical readout. Can replacing the flat PL readout with a hierarchical one (reading the first-firing neuron, then the second, and so on) allow generalisation to unseen permutations? fn-unseen=0.670 suggests the SNN is ready. Partial
Q4 Ordering embedding. Can a learned $\pi\mapsto\mathbf{e}_\pi\in\mathbb{R}^d$ encode geometric structure so that similar permutations are close, enabling $\mathbf{z}=\sum_\pi p_\pi\mathbf{e}_\pi$ to carry neighbourhood information? Open
Q5 Stiefel manifold layer. Can a semi-orthogonal matrix $\mathbf{W}\in\mathbb{R}^{M\times N}$ ($N>M$, $\mathbf{W}\mathbf{W}^\top=I_M$, i.e. orthonormal rows) allow an FFN encoder of arbitrary width to feed the SNN without scaling currents? Open
Q6 Optimal w and tau. What is the theoretical relationship between $w$, $T$, and the gradient magnitude reaching early vs late timesteps? Is there a principled choice of $\tau$ given $M$ and $\Delta$? Open
Q7 Ordering autoencoder. If a decoder can reconstruct the input $\mathbf{x}$ from ordering alone (no class supervision), that shows ordering carries the full input signal, which would be the strongest possible POC evidence. Open
Q8 W_rec with pre-spike inhibition. Is there a recurrent architecture that can break ties (acting before $t^*$) without enabling timing-pattern shortcuts? Options: inhibitory current injection before threshold crossing, or lateral inhibition acting directly on membrane potential. Open

§12 Phase 2 Directions

Dataset redesign

DirectCurrentDataset encodes class identity directly in the current vector. A fair comparison between SNN ordering and an NN requires a dataset where the instantaneous current vector alone is uninformative and only the temporal dynamics carry class information.

Candidate designs:

Time-series classification: class information spread across a sequence, not a single snapshot
Synthetic temporal XOR: the correct class depends on the order of events, not their magnitudes
Matched current distributions: all classes draw currents from the same distribution and only the assignment to neuron indices differs (though this can still leak via neuron identity)
UCR benchmark: real-world time series where the NN must process the full sequence

Architecture extensions

Stiefel manifold pre-layer: semi-orthogonal $\mathbf{W}\in\mathbb{R}^{M\times N}$ (orthonormal rows), allowing an FFN encoder of arbitrary width
Hierarchical readout: read the ordering level by level, enabling generalisation to unseen permutations
Ordering embedding: learnable $\pi\mapsto\mathbf{e}_\pi$ with geometric structure
Pre-spike inhibition: a W_rec variant that acts on membrane potential before threshold crossing

Scale

Real benchmark validation: UCR time series, genomics (DeepSEA), or event-based vision
Comparison against a rate-based SNN and an equivalent-parameter MLP under matched dataset conditions
M=16+ with a better MC approximation (K=2048+) once dataset issues are resolved