The Thinking Toaster

Why? (Because We Can)
Let's get this out of the way: This is not practical.
For the price of eight Raspberry Pi 5s (8 GB), a gigabit switch, and SD cards, you could buy a used GPU that is 50x faster.
But that's not the point. The point is learning how distributed inference works.
The Setup: "Distributed Llama"
We used a project called Distributed Llama. It uses tensor parallelism to split each layer's weight tensors across devices, syncing activations over Ethernet.
- Head Node: Pi 5 (Orchestrator)
- Worker Nodes: 7x Pi 5s
- Network: Gigabit Switch (The bottleneck)
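To make "tensor parallelism" concrete, here's a toy NumPy sketch of the idea. This is not distributed-llama's actual code, and the sizes are just Llama-3-8B-ish numbers: each node holds a column slice of a weight matrix, computes its partial output, and the head node stitches the slices back together.

```python
# Toy tensor parallelism: shard one weight matrix column-wise across
# "workers" and stitch the partial outputs back together.
# (Illustrative only -- distributed-llama does this in C++ over sockets,
#  and the sizes below are just Llama-3-8B-ish numbers.)
import numpy as np

N_NODES = 8                     # one shard per Pi in the cluster
D_MODEL, D_FF = 4096, 14336     # hidden size / MLP width

rng = np.random.default_rng(0)
W = rng.standard_normal((D_MODEL, D_FF), dtype=np.float32)   # one MLP weight
x = rng.standard_normal(D_MODEL, dtype=np.float32)           # one activation

# Head node: split the weight columns, one shard per node.
shards = np.array_split(W, N_NODES, axis=1)

# Each node computes its slice of the output independently...
partials = [x @ shard for shard in shards]    # in reality: 8 Pis, in parallel

# ...and the head concatenates the slices -- this step is the network hop.
y = np.concatenate(partials)

assert np.allclose(y, x @ W, rtol=1e-3, atol=1e-3)   # same as one big matmul
```

Every one of those "stitch it back together" steps is a trip through the Ethernet cable. Which brings us to the next point.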
The Bottleneck is Physics
"Trying to run an LLM over Ethernet is like trying to drink a milkshake through a coffee stirrer."
The Results
We loaded Llama-3-8b-Instruct.
- Boot Time: 45 seconds (loading weights over the network).
- Inference Speed: ~14 tokens/second.
Wait. 14 t/s? That's... actually usable.
It turns out that for a small (8B) model, the Pi 5's memory bandwidth is just fast enough: the cluster generates text at roughly the speed a human reads it.
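A back-of-envelope check says both results are about what physics allows. The inputs are assumptions, not specs I measured: ~4.7 GB of ~4-bit weights, ~10 GB/s of usable memory bandwidth per Pi 5 (theoretical peak is higher), and ~110 MB/s of real throughput on a gigabit link.

```python
# Sanity check: is ~14 tok/s (and a 45 s boot) believable?
# Rough assumed numbers, not measurements.
MODEL_GB    = 4.7     # Llama-3-8B at ~4-bit quantization (rough)
N_NODES     = 8
PI_MEM_GBS  = 10.0    # usable LPDDR4X bandwidth per Pi 5 (assumed)
GIGABIT_MBS = 110     # realistic gigabit Ethernet throughput (assumed)

# Every node must stream its shard of the weights once per generated token.
shard_gb      = MODEL_GB / N_NODES
mem_s_per_tok = shard_gb / PI_MEM_GBS
print(f"memory-bandwidth ceiling: ~{1/mem_s_per_tok:.0f} tok/s")        # ~17 tok/s

# And boot time is mostly just pushing the weights over the wire.
print(f"weight load over gigabit: ~{MODEL_GB*1000/GIGABIT_MBS:.0f} s")  # ~43 s
```

A ~17 tokens/second ceiling from memory bandwidth alone, minus the network tax from the previous section, lands right around the ~14 t/s we measured. And ~43 seconds to ship the weights matches the 45-second boot.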
Verdict
Don't build this to do work. Build this to understand how Google's TPU pods work. It is a miniature, slow, educational supercomputer.
And it looks really cool on a shelf.