Idle thought: I wonder if we'll start seeing "training@home" training runs for open-source LLMs. Anyone care to run some numbers or sanity checks on whether this is possible in principle?

The folding@home project has been hugely successful, reaching at least an exaFLOPS of aggregate compute.

"Training@home" would have to efficiently do partial gradient updates on extremely heterogeneous hardware with widely varying network properties; I'm not sure if this has any chance of producing base models competitive with e.g. Llama. In terms of ops alone, a 1 exaFLOPS network would have taken 10^7 seconds = ~half a year to train Llama 70b, and I imagine the costs of distributing jobs to such a network and coordinating on weight updates would make this much more expensive. So, probably not going to be competitive?

in reply to Ben Weinstein-Raun

Just this month there was a proof of concept doing distributed training of a 15B-parameter model, using a new technique to reduce the amount of data that needs to be shared between GPUs, so that it's actually feasible for them to not be co-located. Which is neat! Buuuut they were still using H100s (80GB of memory) as their basic unit of compute. I don't think their technique lets you train models larger than would fit in memory on each GPU, which means any training@home project is going to be limited to single- or low-double-digit billions of parameters. Small models are neat and serve some purposes, but we already have a lot of pretty good ones (Llama, Phi, Gemma, NeMo, etc.) and it's not clear what the niche would be for a community-trained one. (I mean, porn, I guess, but there are already a lot of NSFW fine-tunes of those models.)
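For a sense of why that memory ceiling bites, here's a rough sketch. The 16-bytes-per-parameter figure assumes a standard mixed-precision Adam setup; memory-efficient optimizers and sharding within a node change the exact numbers, which is presumably how a 15B model was trained on 80GB GPUs at all:

```python
# Back-of-envelope: how many parameters fit on one 80GB GPU during training?
# Assumes a common mixed-precision setup: bf16 weights + bf16 grads,
# plus fp32 master weights and fp32 Adam moments. Activations and framework
# overhead are ignored, so the real ceiling is lower; 8-bit optimizers or
# sharding within a node push it higher.

bytes_per_param = (
    2        # bf16 weights
    + 2      # bf16 gradients
    + 4      # fp32 master copy of the weights
    + 4 + 4  # fp32 Adam first/second moments
)            # = 16 bytes per parameter

gpu_memory = 80e9  # H100: 80 GB

max_params = gpu_memory / bytes_per_param
print(f"~{max_params / 1e9:.0f}B params of training state fit in 80GB")  # ~5B

# For contrast, a 70B model's weight + optimizer state alone:
print(f"70B model: ~{70e9 * bytes_per_param / 1e12:.1f} TB of training state")
```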
in reply to Kevin Gibbons

I would guess that there will be reasons to at least want an LLM trained on an open corpus, whether it's community-trained or not.

Example reasons include ensuring that the model isn't secretly trying to get you to buy McDonald's, and hedging against the possibility that companies start releasing un-fine-tunable models.