Silo: private training at <10% performance cost
Setup (model load and transfer to GPU; NCCL init; LoRA injection)
Training
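
The split into a setup phase and a training phase suggests a per-phase timing breakdown. The following is a minimal sketch of how such a breakdown and a relative cost figure could be computed. It is not Silo's actual measurement code: `PhaseTimer` and `relative_overhead` are hypothetical names, and defining the cost as the fractional slowdown relative to a non-private baseline run is an assumption.

```python
import time
from contextlib import contextmanager


class PhaseTimer:
    """Accumulates wall-clock seconds per named phase (hypothetical helper)."""

    def __init__(self):
        self.seconds = {}

    @contextmanager
    def phase(self, name):
        # Time the enclosed block and add it to the running total for `name`.
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.seconds[name] = self.seconds.get(name, 0.0) + elapsed

    def total(self):
        # Sum of all recorded phases, in seconds.
        return sum(self.seconds.values())


def relative_overhead(baseline_seconds, private_seconds):
    """Fractional slowdown of the private run relative to the baseline run.

    Assumed definition: (private - baseline) / baseline.
    """
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    return (private_seconds - baseline_seconds) / baseline_seconds
```

In use, the setup steps would be wrapped in `with timer.phase("setup"):` and the training loop in `with timer.phase("training"):`, once for each configuration. Under the assumed definition, a `relative_overhead` result below 0.10 corresponds to a cost under 10%.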