tta.wtf
  • Communities
  • Create Post
  • Create Community
  • heart
    Support Lemmy
  • search
    Search
  • Login
  • Sign Up
RSS Bot@lemmy.bestiver.seMB to Hacker News@lemmy.bestiver.seEnglish · 18 days ago

Demystifying ARM SME to Optimize General Matrix Multiplications

arxiv.org

external-link
message-square
0
link
fedilink
1
external-link

Demystifying ARM SME to Optimize General Matrix Multiplications

arxiv.org

RSS Bot@lemmy.bestiver.seMB to Hacker News@lemmy.bestiver.seEnglish · 18 days ago
message-square
0
link
fedilink
General Matrix Multiplication (GEMM) is a critical kernel in high-performance computing and deep learning. While modern architectures like ARM's Scalable Matrix Extension (SME) introduce dedicated hardware for matrix operations, existing linear algebra libraries fail to fully exploit its potential, particularly for large matrices. This paper presents MpGEMM, an open-source library that leverages key architectural features of SME to optimize GEMM across multiple precisions. Through a systematic characterization of SME, we derive optimization guidelines that inform our design. MpGEMM employs cache-aware partitioning, efficient data packing with on-the-fly transposition, and specialized micro-kernels that utilize multi-vector loads and all available tile registers. Evaluated on an Apple M4 Pro with real-world workloads from DeepSeek and LLaMA, MpGEMM achieves an average speedup of 1.23x over the vendor-optimized Apple Accelerate library and significantly outperforms other open-source alternatives.

Comments

alert-triangle
You must log in or # to comment.

Hacker News@lemmy.bestiver.se

hackernews@lemmy.bestiver.se

Subscribe from Remote Instance

You are not logged in. However you can subscribe from another Fediverse account, for example Lemmy or Mastodon. To do this, paste the following into the search field of your instance: !hackernews@lemmy.bestiver.se
lock
Community locked: only moderators can create posts. You can still comment on posts.

Posts from the RSS Feed of HackerNews.

The feed sometimes contains ads and posts that have been removed by the mod team at HN.

Visibility: Public
globe

This community can be federated to other instances and be posted/commented in by their users.

  • 0 users / day
  • 0 users / week
  • 0 users / month
  • 0 users / 6 months
  • 0 local subscribers
  • 3.71K subscribers
  • 454 Posts
  • 0 Comments
  • Modlog
  • mods:
  • patrick@lemmy.bestiver.se
  • RSS Bot@lemmy.bestiver.se
  • UI: unknown version
  • BE: 0.19.13
  • Modlog
  • Instances
  • Docs
  • Code
  • join-lemmy.org