Data usage and energy cost while running it - this is under your control when self-hosting
Training of the LLM: You have no control over how the LLM was trained - not over which data sources were used, the workforce that cleaned up the data, or the amount of energy used to do so
Criticism of LLMs has two angles: