§1 Symbol Table

M · Number of neurons · ℤ₊
Determines the size of the ordering space: $M$ neurons produce $M!$ distinct spike orderings. Tested range: $M \in \{4,6,8,10,12\}$.
T · Number of timesteps · ℤ₊
Simulation length. Must satisfy $T \gg M$ so that all neurons can fire at distinct timesteps. Set to $T = \max(16,\, 2M)$ in sweep experiments.
C · Number of classes · ℤ₊
Output categories. The ratio $M!/C$ governs the available ordering space per class. Capacity degrades when $M!/C \lesssim 8$; the system is over capacity when $M!/C < 1$.
$\beta$ · Membrane decay rate · (0,1)
LIF leakage coefficient. Higher $\beta$ means slower decay and a longer integration window. LIF₁: $\beta_1=0.9$; LIF₂: $\beta_2=0.95$. Controls integration speed: under constant input and before any spike, the steady-state membrane potential is $\text{mem}_\infty = \text{cur}/(1-\beta)$.
$\theta$ · Firing threshold · ℝ₊
Membrane potential value that triggers a spike. Default $\theta=1.0$ (snntorch). A neuron will eventually spike iff $\text{cur} > (1-\beta)\theta$.
base · Base current · ℝ₊
Input current for the highest-rank neuron. Must satisfy $\text{base} < \theta$ to force membrane integration (rather than immediate spiking at $t=0$). Default: $0.40$.
$\Delta$ · Current step (delta) · ℝ₊
Current decrement per rank: the neuron at rank $k$ receives current $\text{base} - k\Delta$. Controls first-spike time separation between adjacent neurons. Must satisfy $\Delta < (\text{base} - (1-\beta)\theta)/(M-1)$ so that even the weakest neuron eventually fires; this is necessary, but not by itself sufficient, for every neuron to fire within $T$.
$\sigma$ · Input noise · ℝ₊
Standard deviation of the Gaussian noise added to each sample's current vector. Provides within-class sample diversity. Must remain small relative to $\Delta$ to preserve ordering: $\sigma \ll \Delta$.
$w$ · Temporal decay rate · ℝ₊
Exponential weighting of spike times in the soft first-spike score. Larger $w$ emphasises earlier spikes more strongly. Default $w=0.5$.
$\tau$ · Soft rank temperature · ℝ₊
Sigmoid sharpness in the soft rank formula. $\tau\to0$ recovers exact rank (zero gradient); $\tau\to\infty$ gives uniform output. Default $\tau=0.01$ for OPS invariance.
$K$ · Monte Carlo samples · ℤ₊
Permutations sampled via Gumbel-max when $M>6$. Default $K=512$. Readout dimension equals $K$ in MC mode.
$M_{i,t}$ · First-spike mask · {0,1}
$M_{i,t}=1$ iff neuron $i$ fires at $t$ and has not fired before. Isolates the first spike for differentiable scoring.
$a_i$ · Soft first-spike score · ℝ₊
Differentiable approximation of $e^{-w t_i^*}$. Larger values mean an earlier first spike. Gradient through the mask: $\partial a_i/\partial S_{i,t} = e^{-wt}\cdot\prod_{s<t}(1-S_{i,s})$, which is non-zero for every $t$ up to and including the first spike.
$\hat{a}_i$ · Soft rank score · [0,1]
Rank-normalised version of $a_i$. Contains only ordinal information; magnitude is discarded. Invariant to order-preserving rescaling of spike times (OPS-invariant when $\tau$ is small).
$\mathbf{p}$ · Plackett-Luce distribution · $\Delta^{K-1}$
Probability distribution over $M!$ permutations parameterised by $\hat{\mathbf{a}}$. Entry $p_\pi = P(\pi|\hat{\mathbf{a}})$. Approximated by $K$ Gumbel-max samples for $M>6$.
fst-std · First-spike time std · ℝ₊
Average std of per-neuron first-spike times across classes. Higher values mean greater ordering separation between classes.
E-drop · Time-shuffle accuracy drop · [0,1]
$\text{acc(normal)} - \text{acc(time-shuffled)}$. High E-drop confirms that temporal ordering is the primary information source.
OPS-drop · Order-preserving shuffle drop · [0,1]
$\text{acc(normal)} - \text{acc(OPS)}$. OPS re-samples spike times while preserving $\pi$. Low OPS-drop means the model uses ordering, not absolute timing values.

§2 Parameter Guide

How base, delta, beta, theta, and sigma interact, and the constraints they must satisfy for correct operation.

Guarantee all neurons fire within $T$

A neuron fires eventually iff its steady-state membrane potential exceeds $\theta$:

$$\frac{\text{cur}}{1-\beta} > \theta \;\Longrightarrow\; \text{cur} > (1-\beta)\theta$$

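For the weakest neuron (rank $M-1$, current $= \text{base}-(M-1)\Delta$), this gives an upper bound on $\Delta$. The bound is necessary for firing within $T$ but not sufficient: a current only just above $(1-\beta)\theta$ can take many timesteps to cross threshold.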

$$\Delta < \frac{\text{base} - (1-\beta)\theta}{M-1}$$
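The bound is easy to check numerically. The following is a minimal sketch, not the project's own code; the helper names `max_delta` and `neurons_that_can_fire` are hypothetical and only restate the inequality above.

```python
def max_delta(base: float, beta: float, theta: float, M: int) -> float:
    """Strict upper bound on delta so that the weakest neuron (rank M-1) can still fire."""
    return (base - (1.0 - beta) * theta) / (M - 1)


def neurons_that_can_fire(base, delta, beta, theta, M):
    """Ranks k whose current base - k*delta exceeds the minimum viable current (1 - beta)*theta."""
    floor = (1.0 - beta) * theta
    return [k for k in range(M) if base - k * delta > floor]


# Values taken from the current configuration, evaluated for M = 8.
bound = max_delta(base=0.40, beta=0.9, theta=1.0, M=8)
fire_ok = neurons_that_can_fire(base=0.40, delta=0.05, beta=0.9, theta=1.0, M=8)
print(f"delta must stay below {bound:.4f}")
print("ranks that can eventually fire with delta=0.05:", fire_ok)
```

With the default base and $\beta_1$, the bound for $M=8$ is about 0.043, so delta=0.05 is too large for the last rank.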

Avoid immediate spiking at $t=0$

If $\text{cur} \geq \theta$, LIF₁ fires on the very first timestep before any integration occurs, eliminating timing information. Requirement:

$$\text{base} < \theta = 1.0$$

Preserve ordering under noise

Noise $\sigma$ must be small enough that adjacent neurons rarely swap rank:

$$\sigma \ll \Delta \quad\text{(rule of thumb: } \sigma \leq \Delta/3\text{)}$$
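To put a number on "rarely", one can estimate how often the noisy currents of two adjacent ranks swap order. This sketch assumes the noise is independent and Gaussian per neuron, as described in the symbol table; the function name is hypothetical. It looks only at the order of the input currents, so the swap rate of the resulting first-spike times may differ after discretisation into timesteps.

```python
import math


def adjacent_swap_probability(delta: float, sigma: float) -> float:
    """P(noisy current at rank k+1 exceeds the one at rank k), for independent N(0, sigma^2) noise."""
    # The difference of two independent Gaussians has std sigma*sqrt(2), so
    # P(swap) = Phi(-delta / (sigma*sqrt(2))) = 0.5 * erfc(delta / (2*sigma)).
    return 0.5 * math.erfc(delta / (2.0 * sigma))


delta = 0.05
for sigma in (0.02, delta / 3, 0.01):
    p = adjacent_swap_probability(delta, sigma)
    print(f"sigma={sigma:.4f}  sigma/delta={sigma / delta:.2f}  P(swap)={p:.4f}")
```

Under these assumptions the configured sigma=0.02 gives a swap probability of a few percent per adjacent pair, and the $\Delta/3$ rule of thumb roughly halves it.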

Current configuration

theta   1.0          ← snntorch default, fixed
beta1   0.9          ← LIF₁ decay rate
beta2   0.95         ← LIF₂ decay rate (smoother)
base    0.40         ← < theta, forces integration
delta   0.05         ← verified: M=4 gives {0:5,1:6,2:8,3:10}
sigma   0.02         ← sigma/delta = 0.4 (slightly above the Δ/3 rule of thumb), ordering stable
w       0.5          ← temporal decay in soft score
tau     0.01         ← near-exact rank, OPS-drop=0.017
K       512          ← MC samples for M>6
T       max(16,2M)   ← chosen so that all neurons can fire

Known issue with M=8, delta=0.05

With delta=0.05 and M=8, neurons 6 and 7 (currents 0.10 and 0.05) do not fire within T=50. The minimum viable current is $(1-0.9)\times1.0 = 0.10$, and a spike requires the membrane potential to exceed $\theta$ strictly, so neuron 6 sits exactly on the boundary (its potential approaches $\theta$ without crossing it) and neuron 7 never fires. See §10 Known Issues for discussion.
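The issue can be reproduced with a few lines of simulation. The sketch below is an assumption-laden simplification: it models a single LIF layer with the update rule from §3, no reset, and no second layer, so its first-spike times need not match the values reported for the full pipeline. The function names are hypothetical.

```python
def first_spike_time(cur: float, beta: float, theta: float, T: int):
    """First t in [0, T) with mem[t] > theta under mem[t] = beta*mem[t-1] + cur, else None."""
    mem = 0.0
    for t in range(T):
        mem = beta * mem + cur
        if mem > theta:
            return t
    return None


def first_spike_times(M, base, delta, beta=0.9, theta=1.0, T=50):
    """First-spike time per rank k, where rank k receives current base - k*delta."""
    return {k: first_spike_time(base - k * delta, beta, theta, T) for k in range(M)}


times = first_spike_times(M=8, base=0.40, delta=0.05)
for k, t in times.items():
    print(f"neuron {k}: first spike at t={t}")
```

In this single-layer model, neurons 6 and 7 return `None` for T=50, while the lower ranks fire progressively later as their current approaches the minimum viable value.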

§3 Formula Sheet

LIF Dynamics · Membrane update & spike
$$\text{mem}_i[t] = \beta\cdot\text{mem}_i[t-1] + \text{cur}_i$$ $$S_{i,t} = \mathbf{1}[\text{mem}_i[t]>\theta]$$

A surrogate gradient approximates $\partial S/\partial\text{mem}$. Steady state under constant input, before any spike: $\text{mem}_\infty = \text{cur}/(1-\beta)$.

First-Spike Mask · Isolate first event
$$M_{i,t} = S_{i,t}\cdot\prod_{s=0}^{t-1}(1-S_{i,s})$$

$M_{i,t}=1$ iff neuron $i$ fires at $t$ and at no earlier timestep. Gradient: $\partial M_{i,t}/\partial S_{i,s}\neq0$.

Soft First-Spike Score · Differentiable timing
$$a_i = \sum_{t=0}^{T-1} M_{i,t}\cdot e^{-w\cdot t}$$

Approximates $e^{-w t_i^*}$. Gradient through the mask: $\partial a_i/\partial S_{i,t} = e^{-wt}\cdot\prod_{s<t}(1-S_{i,s})$, which is non-zero for every $t$ up to and including the first spike.
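The mask and the score can be traced on a single spike train. This is a plain-Python sketch of the two formulas above for one neuron, without autograd; the function names are hypothetical.

```python
import math


def first_spike_mask(spikes):
    """M[t] = S[t] * prod_{s<t} (1 - S[s]) for one neuron's 0/1 spike train."""
    mask, not_fired_yet = [], 1
    for s in spikes:
        mask.append(s * not_fired_yet)
        not_fired_yet *= (1 - s)  # becomes 0 after the first spike and stays 0
    return mask


def soft_first_spike_score(spikes, w=0.5):
    """a = sum_t M[t] * exp(-w * t); zero if the neuron never fires."""
    return sum(m * math.exp(-w * t) for t, m in enumerate(first_spike_mask(spikes)))


train = [0, 0, 1, 0, 1]                # first spike at t = 2, a later spike at t = 4
print(first_spike_mask(train))         # only the first spike survives
print(soft_first_spike_score(train))   # equals exp(-0.5 * 2)
print(soft_first_spike_score([0] * 5)) # silent neuron scores 0
```

A neuron that never fires gets $a_i = 0$, which places it below every neuron that does fire.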

Soft Rank · Discard magnitude
$$\hat{a}_i = \frac{1}{M-1}\sum_{j\neq i}\sigma\!\left(\frac{a_i-a_j}{\tau}\right)$$

Here $\sigma(\cdot)$ is the logistic sigmoid, not the noise level. $\hat{a}_i\in[0,1]$. Invariant to order-preserving transforms in the limit of small $\tau$. As $\tau\to0$, $\hat{a}_i\to\text{rank}(a_i)/(M-1)$, where rank counts the neurons with a smaller score. Tied neurons: the pairwise term is $\sigma(0)=0.5$, so $\hat{a}_i=\hat{a}_j$ (unresolvable).
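A direct transcription of the soft rank formula shows both the near-exact ranks at small $\tau$ and the behaviour on ties. This is an illustrative sketch in plain Python with hypothetical function names, not the training code.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def soft_rank(a, tau=0.01):
    """a_hat[i] = mean over j != i of sigmoid((a[i] - a[j]) / tau)."""
    M = len(a)
    return [
        sum(sigmoid((a[i] - a[j]) / tau) for j in range(M) if j != i) / (M - 1)
        for i in range(M)
    ]


print(soft_rank([0.9, 0.5, 0.1]))           # close to [1.0, 0.5, 0.0]
print(soft_rank([0.5, 0.5, 0.1]))           # the tied pair shares one value
print(soft_rank([0.9, 0.5, 0.1], tau=100))  # large tau: everything near 0.5
```

Two tied neurons always receive the same soft rank, but that shared value depends on how many other neurons they beat; in the second example it is about 0.75.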

Plackett-Luce · Distribution over $S_M$
$$\log P(\pi|\hat{\mathbf{a}}) = \sum_{k=1}^{M}\!\left(\hat{a}_{\pi_k} - \log\sum_{j=k}^{M}e^{\hat{a}_{\pi_j}}\right)$$

Computed in log-space. Final $\mathbf{p}=\text{softmax}([\log P(\pi_i|\hat{\mathbf{a}})])$ over all $M!$ permutations (or $K$ MC samples).
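For small $M$ the distribution can be enumerated exactly, which is a useful sanity check on any implementation. The sketch below evaluates the log-probability formula above for all $4! = 24$ permutations; the scores are made-up illustrative values and the function name is hypothetical.

```python
import math
from itertools import permutations


def pl_log_prob(perm, scores):
    """log P(perm | scores) = sum_k ( s[perm_k] - log sum_{j>=k} exp(s[perm_j]) )."""
    logp = 0.0
    for k in range(len(perm)):
        tail = [scores[i] for i in perm[k:]]
        m = max(tail)  # shift for a stable log-sum-exp
        logp += scores[perm[k]] - (m + math.log(sum(math.exp(s - m) for s in tail)))
    return logp


scores = [1.0, 0.667, 0.333, 0.0]  # illustrative soft ranks for M = 4
perms = list(permutations(range(len(scores))))
probs = [math.exp(pl_log_prob(p, scores)) for p in perms]
total = sum(probs)
best = max(zip(probs, perms))[1]
print(len(perms), round(total, 6), best)
```

The probabilities sum to one over all permutations, and the most likely permutation lists the neurons in descending score order.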

Gumbel-Max Sampling · Monte Carlo PL
$$\pi^{(k)}=\text{argsort}(-(\hat{\mathbf{a}}+\mathbf{g}^{(k)})),\quad\mathbf{g}^{(k)}\sim\text{Gumbel}(0,1)^M$$

The argsort runs inside torch.no_grad(). Gradients flow through $\log P(\pi^{(k)}|\hat{\mathbf{a}})$ only.
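The sampling step alone can be sketched without any tensor library. The code below is a stdlib illustration of the Gumbel-max formula, with hypothetical function names and made-up scores; it does not show the gradient path, which in the document goes through $\log P(\pi^{(k)}|\hat{\mathbf{a}})$. As a check, the empirical frequency of neuron 0 being ranked first should approach $e^{\hat{a}_0}/\sum_j e^{\hat{a}_j}$.

```python
import math
import random


def gumbel(rng):
    """One Gumbel(0, 1) sample via -log(-log(U)), U uniform on (0, 1)."""
    u = rng.random()
    while u == 0.0:  # guard against log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def sample_permutation(scores, rng):
    """argsort of -(scores + Gumbel noise): one Plackett-Luce sample."""
    noisy = [s + gumbel(rng) for s in scores]
    return tuple(sorted(range(len(scores)), key=lambda i: -noisy[i]))


rng = random.Random(0)
scores = [1.0, 0.5, 0.0]
samples = [sample_permutation(scores, rng) for _ in range(20000)]
freq_first = sum(p[0] == 0 for p in samples) / len(samples)
expected = math.exp(scores[0]) / sum(math.exp(s) for s in scores)
print(round(freq_first, 3), round(expected, 3))
```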

Orthogonal Layer · Cayley map
$$\mathbf{W}=(I+\mathbf{S})^{-1}(I-\mathbf{S}),\quad\mathbf{S}=-\mathbf{S}^\top$$ $$\mathbf{W}^\top\mathbf{W}=I\;\Rightarrow\;\|\mathbf{W}\mathbf{x}\|=\|\mathbf{x}\|$$

The learned parameter is the upper triangle of $\mathbf{S}$ (unconstrained). The output is a pure rotation with no scaling. Constraint: $\mathbf{W}$ is $d\times d$, so input and output must have the same dimension.
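As a small worked case, take $d=2$, where the skew-symmetric generator has a single free parameter. The symbols $s$ and $\varphi$ are introduced here for illustration only.

$$\mathbf{S}=\begin{pmatrix}0&s\\-s&0\end{pmatrix},\quad (I+\mathbf{S})^{-1}=\frac{1}{1+s^2}\begin{pmatrix}1&-s\\s&1\end{pmatrix},\quad I-\mathbf{S}=\begin{pmatrix}1&-s\\s&1\end{pmatrix}$$

$$\mathbf{W}=\frac{1}{1+s^2}\begin{pmatrix}1-s^2&-2s\\2s&1-s^2\end{pmatrix}=\begin{pmatrix}\cos\varphi&-\sin\varphi\\\sin\varphi&\cos\varphi\end{pmatrix},\quad\varphi=2\arctan s$$

Each column has norm $\sqrt{(1-s^2)^2+4s^2}/(1+s^2)=1$, so the map rotates the current vector by $\varphi$ without rescaling it. Any finite $s$ gives $|\varphi|<\pi$, so the half-turn $\mathbf{W}=-I$ is approached but not reached.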

E-drop & OPS-drop · Diagnostic metrics
$$\text{E-drop} = \text{acc}_\text{normal} - \text{acc}_\text{time-shuffled}$$ $$\text{OPS-drop} = \text{acc}_\text{normal} - \text{acc}_\text{OPS}$$

Time-shuffle permutes the $T$ axis of $S$. OPS re-samples spike times while preserving $\pi$. Expected when the model uses ordering: accuracy after time-shuffle $\approx 1/C$.
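The two perturbations can be sketched as follows. Several details are assumptions made for illustration: the time-shuffle applies one random permutation of the time axis to all neurons, OPS is expressed on first-spike times rather than on the full raster, and the input first-spike times are distinct with $M \le T$. The function names are hypothetical.

```python
import random


def time_shuffle(raster, rng):
    """Apply one random permutation of the time axis to every neuron's spike train."""
    T = len(raster[0])
    order = list(range(T))
    rng.shuffle(order)
    return [[row[t] for t in order] for row in raster]


def order_preserving_shuffle(first_times, T, rng):
    """Re-sample distinct first-spike times in [0, T) while keeping the firing order."""
    ranked = sorted(range(len(first_times)), key=lambda i: first_times[i])
    new_times = sorted(rng.sample(range(T), len(first_times)))
    out = [None] * len(first_times)
    for new_t, neuron in zip(new_times, ranked):
        out[neuron] = new_t  # earliest new time goes to the earliest-firing neuron
    return out


def argsort(ts):
    return sorted(range(len(ts)), key=lambda i: ts[i])


rng = random.Random(0)
raster = [[0, 1, 0, 1], [1, 0, 0, 0]]
shuffled = time_shuffle(raster, rng)
first_times = [5, 2, 9, 7]
resampled = order_preserving_shuffle(first_times, T=16, rng=rng)
print(shuffled)
print(resampled, argsort(resampled) == argsort(first_times))
```

Time-shuffle keeps each neuron's spike count but scrambles when the spikes occur; OPS changes the times but leaves the firing order intact.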

§4 Full Pipeline

$$\underbrace{\mathbf{cur}}_{\mathbb{R}^M} \xrightarrow{\text{LIF}_1,\text{LIF}_2} \underbrace{S}_{\{0,1\}^{M\times T}} \xrightarrow{\text{mask}} \underbrace{a_i}_{\mathbb{R}_+^M} \xrightarrow{\text{rank}} \underbrace{\hat{a}_i}_{[0,1]^M} \xrightarrow{\text{PL}} \underbrace{\mathbf{p}}_{\Delta^{K-1}} \xrightarrow{\mathbf{W}_\text{out}} \underbrace{\text{logits}}_{\mathbb{R}^C} \xrightarrow{\mathcal{L}_\text{CE}} \text{loss}$$

Gradient path (all non-zero):

$$\frac{\partial\mathcal{L}}{\partial\mathbf{cur}} = \frac{\partial\mathcal{L}}{\partial\mathbf{p}}\cdot\frac{\partial\mathbf{p}}{\partial\hat{\mathbf{a}}}\cdot\frac{\partial\hat{\mathbf{a}}}{\partial\mathbf{a}}\cdot\frac{\partial\mathbf{a}}{\partial S}\cdot\underbrace{\frac{\partial S}{\partial\text{mem}}}_{\text{surrogate}}\cdot\frac{\partial\text{mem}}{\partial\mathbf{cur}}$$

Learnable parameters

W_out    K × C      ← sole learnable component in final POC architecture
S        M(M-1)/2   ← Cayley generator, if orthogonal layer present
LIF₁,₂   none       ← β and θ are fixed hyperparameters (learn_beta=False)
W_rec    M × M      ← removed in final architecture (see §9)

§5 Component Notes

Why base must be below theta

When $\text{cur} \geq \theta$, LIF₁ fires on the very first timestep ($t=0$) without any membrane integration. Multiple neurons with currents above threshold all fire simultaneously at $t=0$, destroying ordering information before it can form. Setting $\text{base} < \theta$ forces the membrane to integrate over multiple steps, allowing timing differences to emerge from current magnitude differences.

State Leaky: two-stage smoothing

LIF₁ ($\beta=0.9$) receives the input current and generates an initial sparse spike train. LIF₂ ($\beta=0.95$; higher $\beta$ means longer memory) receives LIF₁'s spikes as input and integrates them over a wider window. Even if two neurons fire simultaneously at LIF₁, LIF₂'s independent integration may produce distinct timing. The chain increases temporal spread.

Orthogonal layer: square matrix constraint

The Cayley map requires a square $d\times d$ generator $\mathbf{S}$, producing a $d\times d$ orthogonal $\mathbf{W}$. This means input and output dimensions must be equal. To use an orthogonal pre-SNN layer, the FFN encoder must first compress to exactly $M$ dimensions; the orthogonal layer then rotates within that $M$-dimensional space.

W_rec: timing pattern vs ordering tradeoff

W_rec isolation experiments revealed that, when W_rec is trained freely, it learns to exploit timing patterns (absolute spike times) rather than pure ordering, reducing E-drop from 0.744 to 0.416. W_rec naturally learns 92% inhibitory connections, but even a purely inhibitory W_rec lets the model take a timing shortcut. Removing W_rec yields the highest E-drop. W_rec is excluded from the final POC architecture.

Tied spikes: what happens when two neurons fire at the same timestep

If neurons $i$ and $j$ both fire first at $t^*$: $a_i = a_j = e^{-wt^*}$, so $\sigma((a_i-a_j)/\tau) = 0.5$, giving $\hat{a}_i = \hat{a}_j$. The ordering between them is unresolvable; this is correct behaviour, not a bug. In practice, with $\text{delta}=0.05$ and $\sigma=0.02$, ties are rare because adjacent neurons have a 0.05 current gap while noise is only 0.02.

§6 Proof Chain

Four experiments establishing that spike ordering is the primary information carrier.

01
Temporal information is used: E-drop test

Random time-axis shuffle of $S$ drops accuracy from 1.000 to 0.256. After shuffle, acc $\approx 1/C$ (random chance). Ratio between/within = 18.2. No non-temporal feature can compensate.

E-drop = 0.744 · ratio between/within = 18.2 · shuffle acc → 0.256
02
No other information source: acc ≈ E-drop as C grows

Across $C \in \{24,100,500,1000,5000,10000\}$, the gap between accuracy and E-drop, which equals the accuracy remaining after time-shuffle, shrinks from 0.236 at $C=24$ to 0.002 at $C=10000$ (Exp A, §7). At $C=10000$ the shuffled accuracy of 0.002 is close to zero in absolute terms, though still above the chance level $1/C = 0.0001$.

acc ≈ E-drop at large C · shuffle acc falls towards 1/C
03
Ordering is learned, not absolute timing: OID test

The Ordering-Invariant Dataset fixes each class's ordering while re-sampling absolute spike times for every example. Within-class first-spike std = 1.96, yet ordering accuracy = 1.000.

ordering_acc = 1.000 · within_fst_std = 1.96 · acc = 0.793
04
Only ordering matters at evaluation: OPS + soft rank

Order-Preserving Shuffle changes all spike times while preserving $\pi$. Without soft rank: OPS-drop = 0.228. With soft rank ($\tau=0.01$): OPS-drop = 0.017 (13× reduction). Accuracy unchanged at 1.000.

OPS-drop (τ=0.01) = 0.017 · acc = 1.000 · 13× improvement

§7 Key Experimental Results

Exp A: C scaling (M=8, M!=40,320)

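C        M!/C    acc     E-drop   acc − E-drop
24       1,680   1.000   0.764    +0.236
100      403     0.978   0.885    +0.093
500      81      0.856   0.830    +0.026
1,000    40      0.733   0.717    +0.016
5,000    8       0.310   0.307    +0.003
10,000   4       0.162   0.160    +0.002
The gap acc − E-drop is the accuracy left after time-shuffle. It shrinks steadily as C grows (≤ 0.026 for C ≥ 500), which supports ordering as the dominant information channel under capacity pressure; at C=24 and C=100 a larger share of accuracy survives the shuffle.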
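The residual accuracy after time-shuffle can be compared against chance directly from the Exp A rows. This is plain arithmetic on the numbers in the table above; the variable names are hypothetical.

```python
# (C, acc, E-drop) rows copied from the Exp A table.
rows = [
    (24, 1.000, 0.764),
    (100, 0.978, 0.885),
    (500, 0.856, 0.830),
    (1000, 0.733, 0.717),
    (5000, 0.310, 0.307),
    (10000, 0.162, 0.160),
]

# E-drop = acc(normal) - acc(time-shuffled), so acc(time-shuffled) = acc - E-drop.
shuffled_acc = {C: round(acc - e_drop, 3) for C, acc, e_drop in rows}

for C, s in shuffled_acc.items():
    print(f"C={C:>6}  shuffled acc={s:.3f}  chance 1/C={1 / C:.4f}")
```

In every row the shuffled accuracy sits above $1/C$, so "chance level" holds only approximately; the absolute residual, however, becomes very small at large C.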

W_rec isolation

Condition                      acc     E-drop   fst-std
Baseline (W_rec trained)       0.999   0.416    8.935
Test 1: W_rec = 0              0.999   0.768    0.954
Test 2: W_rec col shuffle      0.688   0.553    6.428
Test 3: W_rec frozen at init   1.000   0.744    0.954
Removing W_rec maximises E-drop. Trained W_rec exploits timing patterns, not ordering.

W_rec regularisation sweep

λ                     acc     E-drop   pos_frac
0.0 (none)            0.999   0.416    0.077
0.01                  0.999   0.430    0.018
0.1                   0.999   0.283    0.000
1.0                   0.999   0.159    0.000
10.0                  0.999   0.070    0.000
W_rec=0 (reference)   1.000   0.744    -
Apart from a marginal rise at λ=0.01, E-drop falls as λ increases (0.416 → 0.070), moving away from the W_rec=0 reference of 0.744. Strong regularisation therefore does not reproduce the effect of removing W_rec.

OPS + soft rank: tau sweep

τ                    acc     OPS-drop   vs baseline
baseline (no rank)   1.000   0.228      -
1.0                  0.999   0.341      worse
0.1                  1.000   0.113      −0.115
0.01                 1.000   0.017      −0.211 (13×)

§8 Capacity Matrix

Full sweep: SNN vs NN baseline across $M \in \{4,6,8,10,12\}$ and $C \in \{10,100,1000,10000\}$. N/A indicates $M! < C$ (theoretically infeasible).

SNN Accuracy
        C=10    C=100   C=1,000   C=10,000
M=4     0.600   N/A     N/A       N/A
M=6     0.900   0.463   N/A       N/A
M=8     1.000   0.973   0.741     0.163
M=10    1.000   1.000   0.997     0.964
M=12    1.000   1.000   1.000     0.999
NN Accuracy (same M-dim input)
        C=10    C=100   C=1,000   C=10,000
M=4     0.750   N/A     N/A       N/A
M=6     1.000   0.723   N/A       N/A
M=8     1.000   0.978   1.000     0.961
M=10    1.000   1.000   1.000     0.998
M=12    1.000   1.000   1.000     1.000
E-drop (SNN): ordering dependence
        C=10    C=100   C=1,000   C=10,000
M=8     0.675   0.900   0.719     0.162
M=10    0.400   0.745   0.931     0.951
M=12    0.025   0.500   0.738     0.909
High E-drop (bottom-right) = ordering-dominated regime. Low E-drop (bottom-left, large M with small C) = trivially classifiable without ordering. NN dominates because DirectCurrentDataset encodes class in the current vector itself; see §10 and §12.
M! / C ratio
        C=10    C=100   C=1,000   C=10,000
M=4     2.4     0.24    0.024     0.002
M=6     72      7.2     0.72      0.072
M=8     4,032   403     40        4
M=10    363K    36K     3,629     363
M=12    47.9M   4.79M   479K      47.9K
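The ratio table follows directly from $M!$ and can be regenerated for other sweeps. A minimal sketch using only the standard library; the variable names are hypothetical.

```python
import math

Ms = (4, 6, 8, 10, 12)
Cs = (10, 100, 1_000, 10_000)

# Ordering-space size per class for every (M, C) pair in the sweep.
ratio = {(M, C): math.factorial(M) / C for M in Ms for C in Cs}

for M in Ms:
    cells = [
        "over capacity" if ratio[(M, C)] < 1 else f"{ratio[(M, C)]:,.1f}"
        for C in Cs
    ]
    print(f"M={M:<3}", *cells, sep="  ")
```

Cells with a ratio below 1 are exactly the N/A cells of the accuracy matrices.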

§9 Confirmed Findings

F1 · SNN can learn spike ordering
With $\text{base}<\theta$ and soft first-spike scoring, the SNN exploits temporal ordering as the primary information channel. E-drop = 0.744; time-shuffle reduces accuracy to near random chance.
F2 · $M!$ is the real capacity bound
acc ≈ E-drop at large C (gap ≤ 0.026 for C ≥ 500). Capacity degrades when $M!/C \lesssim 8$. The bound is hard: 200 epochs at $M=8$, $C=10000$ ($M!/C=4$) gives the same acc=0.162 as 100 epochs.
F3 · Any linear layer collapses ordering
An 8×8 standard linear layer before the SNN drops E-drop from 0.744 to 0.051. Even tiny scaling of currents allows the model to use a binary on/off code instead of fine-grained ordering.
F4 · Orthogonal layer preserves ordering
A Cayley-parameterised orthogonal layer retains an E-drop of 0.420, versus 0.051 for the standard linear layer. Orthogonal acc (0.923) exceeds standard linear (0.860), showing that ordering is beneficial. Constraint: must be square ($M\times M$).
F5 · W_rec is a timing shortcut, not an ordering amplifier
Trained W_rec reduces E-drop from 0.744 (W_rec=0) to 0.416. It naturally learns 92% inhibitory connections, but the W_rec system as a whole enables timing pattern exploitation. Regularisation makes it worse (E-drop falls to 0.070 at λ=10). Removing W_rec is the correct choice for ordering purity.
F6 · Soft rank eliminates magnitude leakage
OPS-drop measures magnitude dependence beyond ordinal structure. Without soft rank: 0.228. With $\tau=0.01$: 0.017 (13× reduction). Accuracy maintained at 1.000.
F7 · Geometric structure is partially learned
Geo-gen experiment: fn-unseen = 0.670 vs chance 0.250. The SNN correctly identifies the first-firing neuron for unseen permutations. The bottleneck is the PL → readout interface, not the SNN itself.
F8 · $M!/C$ hard threshold confirmed
$M=8$, $C=10000$, $M!/C=4$: acc=0.163, no improvement with 200 epochs. $M=10$, $C=10000$, $M!/C=363$: acc=0.952, converges in 58 epochs. The threshold is $M!/C \approx 8$.
F9 · E-drop is highest in the ordering-dominated regime
The capacity matrix shows high E-drop only when both $M$ and $C$ are large (bottom-right of the matrix). $M=10$, $C=10000$: E-drop=0.951. $M=12$, $C=10$: E-drop=0.025. Ordering is only necessary, and therefore only used, under capacity pressure.
F10 · base must be below theta for LIF integration to matter
Old setting (base=1.2 > θ=1.0): all neurons fire at $t=0$, ordering collapses. New setting (base=0.4 < θ=1.0): the membrane integrates over multiple steps and timing differences emerge. Verified with sanity checks.

§10 Known Issues

  • Old dataset: base=1.2 > theta. All neurons with cur > θ fire at $t=0$, destroying ordering. Fixed by setting base=0.40 < θ=1.0.
    resolved
  • M=8, delta=0.05: neurons 6 and 7 never fire. Current for neuron 7 = 0.05 < $(1-\beta)\theta = 0.10$, and neuron 6 sits exactly at 0.10. With delta=0.05 and M=8, the weakest two neurons cannot accumulate enough potential. Fix: use delta < $(0.40 - 0.10)/7 \approx 0.043$ for M=8, or accept that these neurons always rank last.
    open
  • Simultaneous LIF₂ spikes (tie) cannot be resolved by W_rec. W_rec is one-step delayed: inhibition from a spike at $t^*$ only reaches other neurons at $t^*+1$. A tie at $t^*$ has already occurred before W_rec can act. Soft rank treats the tied pair symmetrically (pairwise term $\sigma(0)=0.5$, hence $\hat{a}_i = \hat{a}_j$), which is the correct information-theoretic response but loses ordering detail between them.
    open
  • DirectCurrentDataset allows NN shortcut. The current vector itself encodes class identity (each class maps to a fixed current pattern). NN achieves acc ≥ SNN across most of the capacity matrix. E-drop confirms that the SNN uses ordering, but a fair SNN vs NN comparison requires a dataset where the current vector alone is uninformative. This is a Phase 2 problem.
    phase 2
  • Cayley map is square-only. $\mathbf{W} \in \mathbb{R}^{M\times M}$. An orthogonal layer cannot map a higher-dimensional FFN output directly to $M$ neurons; the FFN must first compress to exactly $M$ dimensions. Stiefel manifold parameterisation (semi-orthogonal tall matrix) is the generalisation but has not been implemented.
    open
  • PL readout does not generalise to unseen permutations. W_out learns weights per permutation index. Permutations not seen during training have uninformed weights. fn-unseen=0.670 shows that the SNN part generalises, but the readout does not. Hierarchical readout or ordering embedding required.
    open
  • tau=1.0 worsens OPS-drop vs baseline. Large tau makes the soft rank output uniformly ~0.5, reducing both ordinal and magnitude information. This weakens the PL signal more than it helps OPS invariance, causing acc and E-drop to degrade slightly.
    resolved: use tau=0.01

§11 Open Questions

  • Q1 Tie-breaking. What is the most principled way to resolve simultaneous LIF₂ spikes without introducing a non-ordering side channel? Options: per-neuron threshold offsets, Stiefel pre-layer, or accepting ties as valid partial orderings. · Open
  • Q2 Delta constraint for M=8. What is the optimal $(base, \Delta, T)$ triple for $M=8$ that guarantees all 8 neurons fire within $T$ while maximising first-spike time separation? · Open
  • Q3 Hierarchical readout. Can replacing the flat PL readout with a hierarchical one (reading the first-firing neuron, then the second, and so on) allow generalisation to unseen permutations? fn-unseen=0.670 suggests the SNN is ready. · Partial
  • Q4 Ordering embedding. Can a learned $\pi\mapsto\mathbf{e}_\pi\in\mathbb{R}^d$ encode geometric structure so that similar permutations are close, enabling $\mathbf{z}=\sum_\pi p_\pi\mathbf{e}_\pi$ to carry neighbourhood information? · Open
  • Q5 Stiefel manifold layer. Can a semi-orthogonal tall matrix $\mathbf{W}\in\mathbb{R}^{M\times N}$ ($N>M$, $\mathbf{W}\mathbf{W}^\top=I_M$) allow an FFN encoder of arbitrary width to feed the SNN without scaling currents? · Open
  • Q6 Optimal w and tau. What is the theoretical relationship between $w$, $T$, and the gradient magnitude reaching early vs late timesteps? Is there a principled choice of $\tau$ given $M$ and $\Delta$? · Open
  • Q7 Ordering autoencoder. If a decoder can reconstruct input $\mathbf{x}$ from ordering alone (no class supervision), it proves that ordering carries the full input signal. Strongest possible POC evidence. · Open
  • Q8 W_rec with pre-spike inhibition. Is there a recurrent architecture that can break ties (act before $t^*$) without enabling timing pattern shortcuts? Options: inhibitory current injection before threshold crossing, or lateral inhibition through membrane potential directly. · Open

§12 Phase 2 Directions

Dataset redesign

DirectCurrentDataset encodes class identity directly in the current vector. A fair comparison between SNN ordering and NN requires a dataset where the instantaneous current vector alone is uninformative and only the temporal dynamics carry class information.

Candidate designs:

  • Time-series classification: class information spread across a sequence, not a single snapshot
  • Synthetic temporal XOR: correct class depends on the order of events, not their magnitudes
  • Matched current distributions: all classes draw currents from the same distribution; only the assignment to neuron indices differs (but this still leaks via neuron identity)
  • UCR benchmark: real-world time-series where NN must process the full sequence

Architecture extensions

  • Stiefel manifold pre-layer: $\mathbf{W}\in\mathbb{R}^{M\times N}$ semi-orthogonal, allows arbitrary-width FFN encoder
  • Hierarchical readout: read ordering level-by-level, enabling unseen-permutation generalisation
  • Ordering embedding: learnable $\pi\mapsto\mathbf{e}_\pi$ with geometric structure
  • Pre-spike inhibition: W_rec variant that acts on membrane potential before threshold crossing

Scale

  • Real benchmark validation: UCR time-series, genomics (DeepSEA), or event-based vision
  • Comparison against rate-based SNN and equivalent-parameter MLP under matched dataset conditions
  • M=16+ with better MC approximation (K=2048+) once dataset issues are resolved