Investment really goes toward training models for ever more minuscule gains. I feel the current choices are enough to satisfy anyone interested in such services; what is really lacking now is more hardware dedicated to single-user sessions to improve the quality of output from the current models.
But I really want to see more development of offline services; right now that work is done only by hobbyists and, occasionally, by large companies with a little drip-feed (Facebook's Llama, the original DeepSeek model [the latter being pretty much useless, since no one has the hardware to run it]).
I remember watching the Samsung Galaxy Fold 7 presentation ("the first AI phone", an unironic quote) and listening to them talk about all the AI features instead of the phone's actual capabilities. I thought, "All of this is offline, right? A powerful smartphone… it makes sense to run local models for these tasks." But it later became abundantly clear that the entire presentation was just repackaged, always-online Gemini running on $2,000 of hardware.
what is really lacking now is more hardware dedicated to single-user sessions to improve the quality of output from the current models
That is the exact opposite of my opinion. They're throwing tons of compute at the current models, and it has produced little improvement. The vast majority of investment is in compute hardware rather than R&D. They need more R&D to improve the underlying models; more hardware isn't going to get us the significant gains we need.
The problem is that there is little continuous cash flow in on-prem personal services. Look at Samsung's home automation: it's nearly all online features, and when the internet is out you are SOL.
Having your own GitHub Copilot in a device with the size and power draw of a Raspberry Pi would be amazing. But then they wouldn't get subscriptions.
There is absolutely massive development of open-weight models that can be used offline/privately. MiniMax M2, the most recent one, has benchmark scores comparable to the proprietary US megatech models at 1/12th the cost, and at higher token throughput. Qwen, GLM, and DeepSeek have models comparable to M2, plus smaller models that run more easily on very modest hardware.
The closed megatech datacenter AI strategy is partnership with the US government/military for oppressive control of humanity. Spending 12x more per token while empowering big tech/the US empire to steal from and oppress you is not worth a small fractional improvement in benchmarks/quality.