Scaling Deep Learning

Enabling 128-GPU workflows in the cloud and optimizing model latency for production.

Detailed breakdown of how we optimized CUDA kernels and used Ray to scale up deep learning workloads.