RSS Bot to Hacker News · English · 4 hours ago

I unified convolution and attention into a single framework

zenodo.org

Window is Everything: A Grammar for Neural Operations
zenodo.org
The operational primitives of deep learning, primarily matrix multiplication and convolution, exist as a fragmented landscape of highly specialized tools. This paper introduces the Generalized Windowed Operation (GWO), a theoretical framework that unifies these operations by decomposing them into three orthogonal components: Path, defining operational locality; Shape, defining geometric structure and underlying symmetry assumptions; and Weight, defining feature importance.

We elevate this framework to a predictive theory grounded in two fundamental principles. First, we introduce the Principle of Structural Alignment, which posits that optimal generalization is achieved when the GWO's (P, S, W) configuration mirrors the data's intrinsic structure. Second, we show that this principle is a direct consequence of the Information Bottleneck (IB) principle. To formalize this, we define an Operational Complexity metric based on Kolmogorov complexity. However, we move beyond the simplistic view that lower complexity is always better. We argue that the nature of this complexity, whether it contributes to brute-force capacity or to adaptive regularization, is the true determinant of generalization. Our theory predicts that a GWO whose complexity is utilized to adaptively align with data structure will achieve a superior generalization bound. Canonical operations and their modern variants emerge as optimal solutions to the IB objective, and our experiments reveal that the quality, not just the quantity, of an operation's complexity governs its performance. The GWO theory thus provides a grammar for creating neural operations and a principled pathway from data properties to generalizable architecture design.
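The Path/Shape/Weight decomposition from the abstract can be sketched in code. The following is a minimal illustrative sketch, not the paper's actual formalism: the `gwo` function and all helper names are hypothetical, Shape (window geometry) is folded into Path for brevity, and the attention similarity score is a toy stand-in for learned query-key projections. It shows how 1D convolution and self-attention become two settings of the same windowed operation: convolution uses a local Path with a shared, input-independent Weight (the kernel), while attention uses a global Path with an input-dependent Weight (a softmax over similarities).

```python
import numpy as np

def gwo(x, path, weight):
    """Generalized windowed operation (sketch): for each output position i,
    Path selects which input indices i sees, Weight assigns each selected
    element an importance, and the output is the weighted sum."""
    n = len(x)
    out = np.zeros(n)
    for i in range(n):
        idx = path(i, n)            # Path: operational locality
        w = weight(i, idx, x)       # Weight: feature importance
        out[i] = np.dot(w, x[idx])
    return out

# Convolution as a GWO: local Path (3-wide window, edges clipped),
# position-independent shared Weight (the kernel).
kernel = np.array([0.25, 0.5, 0.25])
def conv_path(i, n):
    return np.clip(np.arange(i - 1, i + 2), 0, n - 1)
def conv_weight(i, idx, x):
    return kernel

# Self-attention as a GWO: global Path (every position attends to all
# inputs), input-dependent Weight (softmax over toy similarity scores).
def attn_path(i, n):
    return np.arange(n)
def attn_weight(i, idx, x):
    scores = x[i] * x[idx]          # toy dot-product similarity
    e = np.exp(scores - scores.max())
    return e / e.sum()

x = np.linspace(0.0, 1.0, 6)
y_conv = gwo(x, conv_path, conv_weight)
y_attn = gwo(x, attn_path, attn_weight)
```

Under this framing, swapping an operation means swapping its (P, S, W) configuration, not writing a new primitive, which is the sense in which the framework acts as a grammar.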

Comments


Hacker News

hackernews

Subscribe from Remote Instance

You are not logged in. However you can subscribe from another Fediverse account, for example Lemmy or Mastodon. To do this, paste the following into the search field of your instance: !hackernews@lemmy.bestiver.se
Community locked: only moderators can create posts. You can still comment on posts.

Posts from the RSS Feed of HackerNews.

The feed sometimes contains ads and posts that have been removed by the mod team at HN.

Visibility: Public

This community can be federated to other instances and be posted/commented in by their users.

  • 227 users / day
  • 1.36K users / week
  • 3.34K users / month
  • 9.56K users / 6 months
  • 2 local subscribers
  • 2.58K subscribers
  • 30.9K Posts
  • 12.6K Comments