
Re: DebGPT: how LLM can help debian development? demo available.



I have implemented the OpenAI API frontend, with streaming to the terminal
enabled. Just export your OPENAI_API_KEY into the environment if you have one,
and specify `-F openai` on the debgpt command line. It will work without
the self-hosted LLM inference backend.

That means the command `debgpt none -i -F openai` falls back to a general
terminal client for ChatGPT. I believe adding this frontend allows more
people to conveniently try LLMs on Debian-specific tasks.
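
For example, a quick interactive session looks like this (the key value
shown is just a placeholder):

  $ export OPENAI_API_KEY=sk-...
  $ debgpt none -i -F openai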

Since the openai frontend allows people to use this tool without pytorch
and a bunch of deep learning frameworks, I plan to upload debgpt to
experimental/non-free, following python-openai (unstable/non-free).
Packages in the Debian archive should be enough to run the openai frontend,
but not the self-hosted LLM backend.

The following are my replies to previous comments:



On 1/2/24 17:49, Jeremy Stanley wrote:
> but one of the most
> useful ways I've seen LLM leveraged is large open source projects
> feeding in their corpus of complex documentation, and then providing
> users with a human language interaction prompt where they can ask
> questions and get conversational responses based on that
> documentation. A continuously trained LLM backing a sort of "search"
> function for all of www.d.o/doc (and maybe also wiki.d.o) could be
> cool.
Yes. The `debgpt policy ...` and `debgpt devref ...` subcommands are two
examples of letting the LLM read a long document and answer questions. The
problem is that the full document is too long, while the supported context
length is only 4k ~ 16k tokens for the OpenAI API, or 8k tokens for the
self-hosted Mistral7B.

With prompt engineering alone, we of course cannot feed the whole Policy
document into the context: section 4.9.1 by itself already exceeds the
context length of the typical ChatGPT model, gpt-3.5-turbo.

That's why the interface is designed to feed in a specific section of a long
document. That said, with more work, I think it should be possible to feed
the LLM the table of contents first, and let it choose the section it wants
to consult based on your question.
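
A minimal sketch of that two-step idea, assuming the OpenAI Python client;
the function and variable names here are made up for illustration and are
not part of debgpt:

  from openai import OpenAI

  client = OpenAI()  # picks up OPENAI_API_KEY from the environment

  def ask(prompt: str) -> str:
      resp = client.chat.completions.create(
          model="gpt-3.5-turbo",
          messages=[{"role": "user", "content": prompt}],
      )
      return resp.choices[0].message.content

  def answer_from_policy(toc: str, sections: dict, question: str) -> str:
      # `sections` maps section numbers (e.g. "4.9.1") to their text.
      # Step 1: the table of contents is short enough for the context
      # window, so let the LLM pick the most relevant section number.
      sec = ask("Debian Policy table of contents:\n" + toc +
                "\nWhich single section number is most relevant to the "
                "question below? Answer with the number only.\n" +
                question).strip()
      # Step 2: feed only that section, which fits within 4k ~ 16k tokens.
      return ask(sections.get(sec, "") +
                 "\n\nBased on the section above: " + question)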



On 1/3/24 02:58, Andrius Merkys wrote:
> I find this pretty impressive. Thanks a lot for working on it.
Thanks. I had fun experimenting with this.

> To me the most time consuming task in Debian recently is the Python
> transitions. I wonder whether DebGPT could help with them. Maybe there
> are other, non-Debian-specific GPTs for this task, but I would prefer a
> Debian one.
This is in fact not Debian specific. Essentially, the current status of this
project is almost a prompt generator: it automatically gathers information
about the task you specify on the command line and sends all of it to the
LLM. It is no different from using the web-based ChatGPT and copy-pasting
the same information before asking the question.

But if the requirement is relatively loose, it can be seen as Debian
specific. In particular, for ChatGPT there is a Debian-specific system
prompt in debgpt/frontend.py which asks ChatGPT to play the role of a
Debian developer when responding.
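
In the OpenAI chat format, such a system prompt is simply the first message
of the conversation, followed by the gathered task information. The wording
below is illustrative only; the actual text lives in debgpt/frontend.py:

  def debian_messages(task_info: str, question: str) -> list:
      # Illustrative wording -- not the actual debgpt system prompt.
      return [
          {"role": "system",
           "content": "You are an experienced Debian developer. Answer "
                      "in that role."},
          {"role": "user", "content": task_info + "\n\n" + question},
      ]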



On 1/3/24 04:33, Andrius Merkys wrote:
> Mostly failing tests, and mostly due to API changes between subsequent
> Python 3.x versions.
My idea is to extract the failure from the BTS, append the "breaking
changes" section of the CPython changelog, and see what suggestions the LLM
can provide. I do not expect perfect bug-fixing suggestions, but it should
be able to conclude something. The corresponding CLI may look like this:

$ debgpt bts --id BUG_NUMBER --pychange 3.11-3.12 free -i   # to be implemented

Or possibly pull the list of recent upstream GitHub issues, and let the LLM
figure out which upstream bug or pull request is most relevant.

$ debgpt bts --id BUG_NUMBER --upstream_issues free -i  # to be implemented

The upstream issues webpage can be found in the debian/control or
debian/upstream/metadata file, and the source package name can typically be
found on the bug's page. This automatic process should save people some
time.
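
A rough sketch of how such a prompt could be assembled, using the standard
requests library; the function name is hypothetical and not the actual
debgpt code:

  import requests

  def pychange_prompt(bug: int, new: str = "3.12") -> str:
      # Fetch the bug log from the BTS as an mbox export.
      log = requests.get("https://bugs.debian.org/cgi-bin/bugreport.cgi",
                         params={"bug": bug, "mbox": "yes"}).text
      # The "what's new" notes contain the porting/breaking changes.
      notes = requests.get(
          f"https://docs.python.org/3/whatsnew/{new}.html").text
      # In practice both texts must be trimmed to fit the context window.
      return (f"Debian bug #{bug} log:\n{log}\n\n"
              f"Python {new} release notes:\n{notes}\n\n"
              "Based on the breaking changes above, suggest the likely "
              "cause of the failing tests and a fix.")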



On 1/3/24 07:06, Andrey Rakhmatullin wrote:
> So the solution is either find a patch in the upstream repo (committed or
> proposed in issues/PRs) or write one yourself. Not sure what can AI help
> here with.
We can ask the LLM to suggest a fix for the bug, or let the LLM check
whether the recent upstream issues list contains a title that might be
relevant to the content of the Debian bug report.

For instance, the LLM can tell that `Syntax error: ....,   print "hello world"`
is most relevant to an upstream pull request named, e.g., `[pull-request]
initial python3 support`, and directly give you the pull request link (you
may have to retry a couple of times to make it think correctly, though).
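
A hedged sketch of that matching step, using the public GitHub REST API to
list recent issue titles; again, the function name is made up:

  import requests

  def upstream_match_prompt(owner: str, repo: str, bug_excerpt: str) -> str:
      # List recent issues and pull requests via the public GitHub API.
      issues = requests.get(
          f"https://api.github.com/repos/{owner}/{repo}/issues",
          params={"state": "all", "per_page": 50}).json()
      titles = "\n".join(f"#{i['number']}: {i['title']}" for i in issues)
      return ("Debian bug excerpt:\n" + bug_excerpt +
              "\n\nUpstream issue titles:\n" + titles +
              "\n\nWhich issue number is most relevant? "
              "Answer with the number only.")

The resulting prompt would then be sent through the frontend as usual.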

I made up a test sample about this at:
  examples/e4d7fc9f-7469-4cad-959d-373b89498663-initmsg.txt

Complicated inference tasks are what we hope LLMs will eventually do very
well. Grasping the semantics of natural language is something traditional
software can never do.



On 1/3/24 03:09, PICCA Frederic-Emmanuel wrote:
> Is it planed to package transformers in Debian instead of using
> conda/mamba venv for this installation ?
transformers has nearly 100 build dependencies, and the pytorch dependency
tree is already big enough. It is planned; I filed the ITP a long time ago.

That said, if somebody is willing to use a commercial API, we should be able
to install debgpt from non-free soon (see the beginning of this email).
> * It would be great to help with the Debian patch workflow.
>    - upstream status
Do you mean uscan? Or letting the LLM read the home page to see whether the
project is still maintained?
>    - find upstream bug equivalent to a Debian bug report.
This is a very good proposal. I'll implement this. See my response to Andrius.
>    - prepare bug report for upstream.
LLMs are terribly good at generating boilerplate text, such as business
email templates, etc. But the core part, like extracting the core error
message from buildd logs, is still done by a human.

But if you are just tired of formatting the text, fixing grammar errors,
or polishing the language, LLMs are already nearly perfect at it.
>    - propose improved patch description.
This is already implemented, and you can ask the LLM to describe any file
of your interest.

For example, I can let debgpt explain its own code file:

  $ debgpt file -f debgpt/llm.py free -i

  me> what does this file do?
  LLM> This file is a Python script that implements a conversational
       chatbot based on a language model. The chatbot is designed to
       generate responses based on user inputs. The specific language model
       used in this script is Mistral7B, although there is also support for
       Mixtral8x7B. The script allows users to interact with the chatbot
       either through a command-line interface or through an IPython
       session. The script also saves the chat session log to a file for
       later reference.
  (the backend is openai gpt-3.5-turbo-16k)

As the author of the code, I'd say the summary is perfect. It will work for patches.
> * doing request on codesearch.net
What does a DD want when they go to codesearch.net? This is not concrete
enough to translate into something the LLM can understand well.



On 1/2/24 17:07, M. Zhou wrote:
> Following what has been discussed in d-project in an offtopic
> subthread, I prepared some demo on imagined use cases to
> leverage LLMs to help debian development.
> https://salsa.debian.org/deeplearning-team/debgpt

