I recently saw an interesting LinkedIn post from French startup ZML demonstrating vendor-agnostic Llama 2 inference. The demo leveraged pipeline parallelism to distribute inference across hardware from Nvidia, AMD, and Google. Each chip was housed in a separate location, and the inference results were streamed back to the Mac that kicked off the job…
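To make the idea concrete: pipeline parallelism splits a model's layers into sequential stages, with each stage assigned to a different device, and activations flowing from one stage to the next. Here's a minimal Python sketch of that pattern. The `Stage` type and `run_pipeline` function are illustrative names I'm using, not ZML's actual API, and the "devices" are simulated as plain functions; in the real demo each hop would be a network call to a remote accelerator.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Stage:
    name: str  # e.g. "nvidia-gpu", "amd-gpu", "google-tpu"
    forward: Callable[[List[float]], List[float]]  # this stage's slice of layers

def run_pipeline(stages: List[Stage], activations: List[float]) -> List[float]:
    """Run one input through every stage in order.

    In a real deployment, each call would be an RPC to a remote machine
    hosting that stage's share of the model weights.
    """
    for stage in stages:
        activations = stage.forward(activations)
    return activations

# Toy stages: each "device" just transforms its inputs, standing in for a
# block of transformer layers living on that vendor's hardware.
stages = [
    Stage("nvidia-gpu", lambda xs: [x * 2.0 for x in xs]),
    Stage("amd-gpu",    lambda xs: [x + 1.0 for x in xs]),
    Stage("google-tpu", lambda xs: [x * 0.5 for x in xs]),
]

print(run_pipeline(stages, [1.0, 2.0, 3.0]))  # -> [1.5, 2.5, 3.5]
```

The appeal of this layout is that no single stage needs to know what hardware its neighbors run on; it only has to accept activations in and hand activations out, which is what makes the vendor-agnostic demo possible.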