Lemmy: Bestiverse
RSS Bot to Hacker News · English · 1 hour ago

224× Compression of Llama-70B with Higher Accuracy (Paper and Code)

zenodo.org

Post-Transformer Inference: 224× Compression of Llama-70B with Improved Accuracy
This paper introduces the first verified method to eliminate transformers from inference while preserving, and in many cases improving, downstream accuracy. We show that a frozen 70-billion-parameter Llama-3.3-70B model can be replaced by a 256-dimensional meaning field extracted from seven internal activation layers. A lightweight compressor (AN1) reduces these fields by 224× with an average gain of +1.81 percentage points (pp) across classification tasks, including +3.25 pp on low-resource RTE (R² = 0.98 inverse-scaling fit, p < 0.01). A 30M-parameter student then learns to regenerate these fields directly from raw text, enabling fully transformer-free inference at 60× higher throughput with an average accuracy loss of only 0.35 pp. The core insight is that task-aligned semantics in modern transformers occupy a remarkably low-rank manifold: across layers, we observe 72–99 percent of the variance in the top one to three dimensions. Once this structure is extracted and learned, the transformer becomes unnecessary; it serves as a one-time sculptor of meaning rather than the permanent home of inference. This work establishes Field Processing Units (FPUs) as a post-transformer compute primitive that replaces deep matrix multiplication with shallow field operations. All results are averaged over five seeds, with statistical significance reported. Ablations isolate the causal contributions of field supervision, geometric regularization, and anchor-layer selection. This Zenodo release provides the complete scientific manuscript and the baseline reference implementation of the AN1 Core system. Proprietary optimizations (AN1-Turbo) have been removed to support independent verification and further research into post-transformer inference.
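The abstract's central empirical claim is that most of the variance in the extracted activations lies in the top one to three dimensions. One standard way to check a claim like this on any activation matrix is the PCA explained-variance ratio. The sketch below assumes activations are collected into an `(n_samples, d)` NumPy array. The function name `topk_variance_fraction` and the synthetic data are hypothetical illustrations. They are not taken from the AN1 reference implementation.

```python
import numpy as np


def topk_variance_fraction(X: np.ndarray, k: int) -> float:
    """Fraction of total variance captured by the top-k principal components of X.

    Hypothetical helper: X has shape (n_samples, d), one row per sample
    (e.g. one activation vector per input example).
    """
    if X.ndim != 2:
        raise ValueError("X must be a 2-D array of shape (n_samples, d)")
    if not 1 <= k <= min(X.shape):
        raise ValueError("k must be between 1 and min(n_samples, d)")
    # Center each feature so the singular values describe variance, not the mean.
    Xc = X - X.mean(axis=0, keepdims=True)
    # Squared singular values of the centered data are proportional to the
    # variance along each principal direction, sorted in descending order.
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2
    total = var.sum()
    if total == 0:
        raise ValueError("X has zero variance")
    return float(var[:k].sum() / total)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, rank = 2000, 256, 2
    # Synthetic stand-in for activations: a rank-2 signal in 256 dims plus small noise.
    latent = rng.normal(size=(n, rank))
    mixing = rng.normal(size=(rank, d))
    X = latent @ mixing + 0.05 * rng.normal(size=(n, d))
    for k in (1, 2, 3):
        print(f"top-{k} variance fraction: {topk_variance_fraction(X, k):.3f}")
```

The sketch applies an SVD to the centered matrix instead of forming the d×d covariance matrix explicitly. This is numerically more stable and yields the same ratios. The post does not say whether the paper computes its 72–99 percent figures this way.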

Hacker News

Posts from the RSS Feed of HackerNews.

The feed sometimes contains ads and posts that have been removed by the mod team at HN.