Window is Everything: A Grammar for Neural Operations
zenodo.org

The operational primitives of deep learning, primarily matrix multiplication and convolution, exist as a fragmented landscape of highly specialized tools. This paper introduces the Generalized Windowed Operation (GWO), a theoretical framework that unifies these operations by decomposing them into three orthogonal components: Path, defining operational locality; Shape, defining geometric structure and underlying symmetry assumptions; and Weight, defining feature importance.

We elevate this framework to a predictive theory grounded in two fundamental principles. First, we introduce the Principle of Structural Alignment, which posits that optimal generalization is achieved when the GWO’s (P, S, W) configuration mirrors the data’s intrinsic structure. Second, we show that this principle is a direct consequence of the Information Bottleneck (IB) principle. To formalize this, we define an Operational Complexity metric based on Kolmogorov complexity. However, we move beyond the simplistic view that lower complexity is always better. We argue that the nature of this complexity, whether it contributes to brute-force capacity or to adaptive regularization, is the true determinant of generalization.

Our theory predicts that a GWO whose complexity is utilized to adaptively align with data structure will achieve a superior generalization bound. Canonical operations and their modern variants emerge as optimal solutions to the IB objective, and our experiments reveal that the quality, not just the quantity, of an operation’s complexity governs its performance. The GWO theory thus provides a grammar for creating neural operations and a principled pathway from data properties to generalizable architecture design.
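The abstract does not give the formal (P, S, W) definition, but a minimal sketch can illustrate the decomposition it describes. The function name gwo and its arguments below are hypothetical, assuming a 1-D input: Path selects where each output looks, Shape selects the window geometry around that location, and Weight scores the gathered window. The same loop then instantiates both a convolution and a fully connected (matmul) row.

    import numpy as np

    def gwo(x, path, shape, weight):
        # Hypothetical Generalized Windowed Operation (not the paper's code):
        # Path = locality, Shape = window geometry, Weight = feature importance.
        out = np.empty(len(path))
        for i, center in enumerate(path):
            window = x[[center + off for off in shape]]  # gather via Path + Shape
            out[i] = np.dot(weight, window)              # score via Weight
        return out

    x = np.arange(8, dtype=float)

    # 1-D convolution as a GWO: Path = every valid position,
    # Shape = a contiguous 3-tap window, Weight = the shared kernel.
    conv = gwo(x, path=range(1, 7), shape=[-1, 0, 1], weight=np.array([0.25, 0.5, 0.25]))

    # One matmul output as a GWO: Path = a single position,
    # Shape = the entire input, Weight = one row of the dense weight matrix.
    fc = gwo(x, path=[0], shape=list(range(8)), weight=np.ones(8) / 8.0)

Specializing (P, S, W) this way recovers the familiar primitives as points in one design space, which is the unification the abstract claims.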
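For reference, the IB objective the abstract appeals to is standardly written as the Lagrangian below (Tishby et al.), where X is the input, Y the target, Z the learned representation, and β trades compression against prediction. How the paper couples its Operational Complexity metric to this objective is not specified in the abstract.

    \min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)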