(...)
So, strictly speaking, modifying an LLM does *not* require the
original training data. Recreating an LLM does. Developing a new LLM
with different training methods or training conditions also needs some
training data (ideally the original training data, especially to compare
end performance). But all in all, a developer on a desert island would be
better off with a "binary" model to modify than without it.
Say, for example, that an IDE saves its configuration state not in a common
text file, but as a binary memory dump. Say the maintainer of such a
package uses their experience with the IDE and years of development to
go through the GUI of this software and assemble a setup configuration
that is great for anyone starting to use the IDE, and that also leaves clues
about how to tailor it further to your needs. This configuration (a
binary memory dump of the software state) is then distributed to users
as the default configuration. What is "the source" of it? Isn't this binary
(which the GUI can both read and write) the preferred form for
modification? The maintainer can describe how they created the GUI state
(document the training process), but cannot really include all the relevant
experience (training data) that led them to believe this state is the
best for new users. So what is Llama if not a **very** complex nvim
config file focused on autocomplete? :D Quite a few of these questions also
apply to fonts (IMO).
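To make the analogy concrete, here is a minimal sketch (hypothetical config keys, with Python's pickle standing in for the IDE's binary dump format): the binary file is fully readable and writable by the tooling, so any user can modify it directly, even though the maintainer's reasoning behind the chosen values is nowhere inside it.

```python
import pickle

# Hypothetical IDE configuration state, normally assembled via the GUI.
config = {
    "theme": "solarized-dark",
    "keybindings": {"save": "Ctrl+S", "autocomplete": "Ctrl+Space"},
    "plugins": ["linter", "autocomplete"],
}

# The "default configuration" shipped to users: a binary dump of state.
with open("default_config.bin", "wb") as f:
    pickle.dump(config, f)

# Any user (or the GUI itself) can load, inspect, and modify the dump
# without access to the experience that produced these particular values.
with open("default_config.bin", "rb") as f:
    loaded = pickle.load(f)

loaded["theme"] = "gruvbox"  # a straightforward modification

with open("default_config.bin", "wb") as f:
    pickle.dump(loaded, f)
```

In this sense the binary dump really is the form the software reads and writes, which is the crux of the "preferred form for modification" question.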