Tokens Per Second Per Watt: A Useful Metric for Edge AI
A systems-level perspective on performance and energy efficiency
In recent years, the NPU has emerged as a power-efficient solution for AI workloads.
In our previous discussion of NPUs (1, 2), we examined the performance measurement du jour: TOPS (trillions of operations per second). However, TOPS falls short because it doesn’t account for power consumption, which directly affects battery life.
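To make the metric in the title concrete, here is a minimal sketch of how tokens per second per watt could be computed from a single measured run. Because a watt is one joule per second, the figure is equivalent to tokens generated per joule of energy. The function names and the two device profiles are hypothetical and chosen only for illustration; they are not measurements from any real hardware.

```python
def tokens_per_second_per_watt(tokens: int, seconds: float, avg_watts: float) -> float:
    """Throughput (tokens/s) divided by average power draw (W)."""
    if seconds <= 0 or avg_watts <= 0:
        raise ValueError("seconds and avg_watts must be positive")
    return (tokens / seconds) / avg_watts


def tokens_per_joule(tokens: int, seconds: float, avg_watts: float) -> float:
    """Same quantity, expressed as tokens per unit of energy (J = W * s)."""
    if seconds <= 0 or avg_watts <= 0:
        raise ValueError("seconds and avg_watts must be positive")
    return tokens / (avg_watts * seconds)


# Hypothetical runs: (tokens generated, wall-clock seconds, average watts).
runs = {
    "device_a": (400, 10.0, 10.0),  # faster, but draws more power
    "device_b": (250, 10.0, 5.0),   # slower, but draws less power
}

for name, (tokens, seconds, watts) in runs.items():
    throughput = tokens / seconds
    efficiency = tokens_per_second_per_watt(tokens, seconds, watts)
    print(f"{name}: {throughput:.1f} tok/s, {efficiency:.2f} tok/s/W")
```

In this made-up comparison the slower device comes out ahead on efficiency, which is the kind of trade-off a throughput-only or TOPS-only figure would hide.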
For Small LLMs (SLMs) at the edge, performance efficiency t…