Idle thought: I wonder if we'll start seeing "training@home" training runs for open-source LLMs. Anyone care to run some numbers or sanity checks on whether this is possible in principle?

The folding@home project has been hugely successful, reaching at least an exaFLOPS of aggregate compute.

"Training@home" would have to efficiently do partial gradient updates on extremely heterogeneous hardware with widely varying network properties; I'm not sure if this has any chance of producing base models competitive with e.g. Llama. In terms of ops alone, a 1 exaFLOPS network would have taken 10^7 seconds = ~half a year to train Llama 70b, and I imagine the costs of distributing jobs to such a network and coordinating on weight updates would make this much more expensive. So, probably not going to be competitive?

in reply to Ben Weinstein-Raun

Just this month there was a proof of concept doing distributed training of a 15B-parameter model, using a new technique to reduce the amount of data that needs to be shared between GPUs, so that it's actually feasible for them to not be co-located. Which is neat! Buuuut they were still using H100s (80GB of memory) as their basic unit of compute. I don't think their technique lets you train models larger than would fit in memory on each GPU, which means any training@home project is going to be limited to single- or low-double-digit billions of parameters. Small models are neat and serve some purposes, but we already have a lot of pretty good ones (Llama, Phi, Gemma, NeMo, etc.) and it's not clear what the niche would be for a community-trained one. (I mean, porn, I guess, but there are already a lot of NSFW fine-tunes of those models.)
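For a sense of why that memory ceiling bites, here's a rough sketch. The 16-bytes-per-parameter figure assumes a standard mixed-precision Adam setup; memory-efficient optimizers and sharding within a node change the exact numbers, which is presumably how a 15B model was trained on 80GB GPUs at all:

```python
# Back-of-envelope: how many parameters fit on one 80GB GPU during training?
# Assumes a common mixed-precision setup: bf16 weights + bf16 grads,
# plus fp32 master weights and fp32 Adam moments. Activations and framework
# overhead are ignored, so the real ceiling is lower; 8-bit optimizers or
# sharding within a node push it higher.

bytes_per_param = (
    2        # bf16 weights
    + 2      # bf16 gradients
    + 4      # fp32 master copy of the weights
    + 4 + 4  # fp32 Adam first/second moments
)            # = 16 bytes per parameter

gpu_memory = 80e9  # H100: 80 GB

max_params = gpu_memory / bytes_per_param
print(f"~{max_params / 1e9:.0f}B params of training state fit in 80GB")  # ~5B

# For contrast, a 70B model's weight + optimizer state alone:
print(f"70B model: ~{70e9 * bytes_per_param / 1e12:.1f} TB of training state")
```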
in reply to Kevin Gibbons

I would guess that there will be reasons to at least want an LLM trained on an open corpus, whether it's community-trained or not.

Example reasons include ensuring that the model isn't secretly trying to get you to buy McDonald's, and hedging against the possibility that companies start releasing un-fine-tunable models.