Multi-Agent Social Behavior Understanding via Ego-GAT-SqueezeNet

Quantifying social behavior in laboratory animals is fundamental to neuroscience but remains hindered by the subjectivity of manual annotation. The Multi-Agent Behavior (MABe) challenge addresses this by benchmarking automated behavior recognition from pose data, yet the task poses challenges such as extreme class imbalance, complex multi-agent spatial topology, and cross-laboratory domain shifts.

In this work, we propose Ego-GAT-SqueezeNet, a unified framework for multi-agent behavior understanding. First, we introduce an egocentric alignment strategy that renders agent features invariant to translation and rotation. Second, we employ a Graph Attention Network (GAT) to explicitly model the dynamic spatial topology among agents. Crucially, we integrate a Squeezeformer backbone that leverages efficient downsampling to capture long-range dependencies in high-frequency pose sequences. To handle environmental heterogeneity, we use Feature-wise Linear Modulation (FiLM) to dynamically recalibrate features conditioned on laboratory and subject identities. Our approach achieves an F1-score of 0.7702 on the validation set, outperforming baselines particularly on rare social actions across diverse experimental setups.
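The egocentric alignment step can be sketched as follows. This is a minimal illustration, not the paper's implementation: the keypoint indices used to define each agent's body centre and heading are assumptions.

```python
import numpy as np

def egocentric_align(poses, center_idx=0, head_idx=1):
    """Align each agent's keypoints into its own egocentric frame.

    poses: (n_agents, n_keypoints, 2) array of 2-D keypoints.
    center_idx / head_idx: hypothetical keypoint indices marking the body
    centre and head; the actual choice depends on the pose skeleton.
    """
    aligned = np.empty_like(poses)
    for i, pose in enumerate(poses):
        shifted = pose - pose[center_idx]       # translation: centre to origin
        dx, dy = shifted[head_idx]              # heading vector centre -> head
        theta = np.arctan2(dy, dx)
        c, s = np.cos(-theta), np.sin(-theta)   # rotate heading onto the +x axis
        rot = np.array([[c, -s], [s, c]])
        aligned[i] = shifted @ rot.T
    return aligned
```

After this transform, every agent faces the same canonical direction, so downstream features no longer depend on where in the arena the interaction happened.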

  

PastureNet - Cross-Domain Biomass Estimation

Accurate pasture biomass estimation is critical for precision grazing management yet remains challenged by the trade-off between the scalability of remote sensing and the reliability of manual sampling. To address this, we introduce PastureNet, a novel hierarchical ensemble framework that estimates biomass directly from high-resolution RGB images. Unlike traditional approaches, PastureNet combines complementary inductive biases by integrating three state-of-the-art Vision Transformers: DINOv3 (object-centric), SigLIP 2 (semantic-aligned), and EVA-02 (texture-sensitive). A key innovation is the integration of zero-shot semantic concept scores to inject explicit ecological domain knowledge (e.g., clover presence) into the regression pipeline, alongside a matrix reconciliation post-processing step that enforces biological consistency across biomass components. Evaluated on a heterogeneous Australian dataset, our method achieves a weighted R² of 0.70, significantly outperforming CNN baselines (0.47) and demonstrating robust generalization without requiring physical metadata at inference time.
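The reconciliation idea, making per-component predictions consistent with each other, can be illustrated with a toy version. The simple clip-and-rescale rule below is an assumption for illustration; the paper's actual matrix reconciliation step may use a different projection.

```python
import numpy as np

def reconcile(components, total):
    """Toy reconciliation: project raw per-component biomass predictions so
    they are non-negative and sum exactly to the predicted total.

    components: raw predictions for each biomass component (e.g. grass,
    clover, weeds -- hypothetical component names).
    total: the model's predicted total biomass.
    """
    comp = np.clip(np.asarray(components, dtype=float), 0.0, None)
    s = comp.sum()
    if s == 0.0:
        return np.full_like(comp, total / len(comp))  # degenerate case: split evenly
    return comp * (total / s)                          # rescale to match the total
```

Any reconciliation rule of this kind guarantees that the components add up, so downstream users never see a clover estimate exceeding the total biomass.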

  

From Classical Algorithms to Deep Learning: The Rebirth and Evolution of AdaBoost, PCA, Sparse Coding, and Particle Filtering

As attention-based large models run up against limits on data, compute, and power, and face rising demands for interpretability, controllability, and reasoning ability, a striking "retrospective" trend has emerged in deep learning: researchers are turning back to classical algorithmic ideas from the pre-deep-learning era. For example, in November 2025 OpenAI published "Understanding neural networks through sparse circuits"; and in December 2025, Sun Maosong's team at Tsinghua University released the paper "H-Neurons: the existence, role, and origin of hallucination-related neurons in large language models", which uses Lasso, an L1-regularized sparse linear regressor, to study how hallucination-related neurons are distributed across the network.

This article examines how the core ideas of four classical algorithms, AdaBoost, principal component analysis (PCA), sparse coding, and particle filtering, have been reborn and evolved in the large-model era of 2025. A survey of papers from the past three years leads to the conclusion that these classical ideas converge with modern large-model work on alignment, parameter-efficient fine-tuning (PEFT), interpretability, and complex reasoning. AdaBoost's margin theory and error-correction idea not only help explain "benign overfitting" in deep learning, but also address reward hacking in RLHF through Bayesian reward model ensembles (BRME). PCA's low-rank assumption and manifold view directly inspired efficient fine-tuning methods such as LoRA-XS as well as KV cache compression techniques, and reveal the essentially linear structure of model features. Sparse coding's basis-decomposition idea, realized in sparse autoencoders (SAEs), tackles the interpretability puzzle of neuron superposition and has driven the evolution of MoE architectures and Sparse-Linear Attention (SLA). Particle filtering's sequential state estimation provides a probabilistic framework for chain-of-thought (CoT) reasoning and equips video generation models, as physical-world simulators, to handle uncertainty. These classical ideas are becoming key building blocks in large models' leap from System 1 to System 2.
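The basis-decomposition idea behind sparse coding and SAEs can be made concrete with a minimal ISTA solver for the L1-penalized reconstruction problem min_h ||x - D h||² + λ||h||₁. This is a generic textbook sketch, not code from any of the works cited above.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the L1 norm: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(x, D, lam=0.1, n_iter=500):
    """Recover a sparse code h such that x ~= D @ h via ISTA.

    x: observed vector (e.g. a neuron-activation vector to decompose).
    D: dictionary of basis directions, one per column.
    """
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    h = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ h - x)           # gradient of the reconstruction term
        h = soft_threshold(h - grad / L, lam / L)
    return h
```

The same objective underlies both classical dictionary learning and the SAE decoder view of superposed neurons: a dense activation vector is explained as a sparse combination of many more basis directions than the vector has dimensions.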

Rethinking Architecture under the Manifold Hypothesis: Improvements to the JiT Diffusion Model and an Investigation of Its Mechanisms

Abstract: The JiT (Just image Transformers) architecture proposed by Li and He builds on the manifold hypothesis: by directly predicting the clean image (x-prediction), it shows that a simple linear layer combined with a ViT can effectively handle high-dimensional pixel data. However, JiT's minimalist linear patch embedding may be insufficient to capture the highly curved nonlinear manifold structure of natural images. This paper first introduces a SiLU activation into the embedding layer, building a nonlinear bottleneck to better fit the low-dimensional manifold embedding. It then investigates the inherent tension in the backbone between the manifold constraint (dimensionality reduction) and computational capacity (dimensionality expansion). Through comparative experiments that replace the interior of the Transformer block with bottleneck structures, the paper reveals a key precision-recall trade-off: explicit dimensionality-reducing compression filters out off-manifold noise and thus markedly improves the fidelity (precision) and FID of generated images, but such a strict manifold constraint also limits the model's ability to represent high-entropy random deviations, reducing sample diversity (recall). In addition, to address JiT's lack of semantic constraints, the paper introduces self-supervised auxiliary losses that predict time and rotation. Experiments on ImageNet 256×256 show that the nonlinear embedding and self-supervised signals effectively improve FID, while the block-level bottleneck experiments demonstrate, by counterexample, the necessity of computational capacity in a diffusion model's backbone.
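The nonlinear embedding idea, replacing a single linear patch projection with a small SiLU network, can be sketched in a few lines. The weight shapes and function names here are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

def linear_patch_embed(patches, W):
    """JiT-style baseline: a single linear projection of flattened patches."""
    return patches @ W

def nonlinear_patch_embed(patches, W1, W2):
    """Two-layer embedding with a SiLU nonlinearity in between.

    patches: (n_patches, patch_dim) flattened pixel patches.
    W1: (patch_dim, hidden), W2: (hidden, d_model) -- shapes are illustrative.
    """
    return silu(patches @ W1) @ W2
```

The extra layer costs little relative to the backbone but lets the embedding bend the patch space before the Transformer sees it, which is exactly the added fitting capacity the abstract argues a linear projection lacks for a curved manifold.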

Keywords: computer vision; diffusion models; JiT; nonlinear manifolds; bottleneck structures; high-dimensional data fitting; self-supervised learning

  
