Abstract: We propose an embedded Neural Processing Unit (NPU) architecture for deep learning inference to address the stringent energy, area, and cost requirements of edge AI. This heterogeneous ...