On Sun, 2020-09-06 at 12:22 +0800, Paul Wise wrote:
> On Thu, 2020-09-03 at 14:12 -0400, Jeremiah C. Foster wrote:
>
> > I would like to add that I've recently learned that the Derivatives
> > Census can help determine programmatically the delta between Debian
> > and a Derivative (if things are correctly configured). For a
> > distribution such as ours which aims for binary compatibility and
> > wants to stay as close to Debian as possible, this is extremely
> > valuable.
>
> I think you are referring to the patch generation?
>
> https://wiki.debian.org/Derivatives/Integration#Patches
>
> The size of the metadata about the patches is what is causing the
> memory issues.
>
> The patch generation itself currently can only be run on the Debian
> servers at LeaseWeb because it relies on access to the snapshot.d.o
> database and hash-based filesystem. There is a TODO item about
> porting it to the snapshot.d.o API instead so that derivatives who
> have private apt repositories can also run it locally.

This sounds very useful - how can I follow along on the discussion? Is
there a separate email list for this topic?

> > I feel that it is our responsibility to contribute back to Debian
> > (which we try to do) everything we can and I think that
> > contributing time and effort is the least we can do.
>
> Excellent, please take a look at the census codebase and the wiki
> pages I have linked to and run the codebase locally to see how it
> works.

Will do!

> > The Debian package tracker will be of particular interest to me
> > because of the ability to understand the delta from Debian to a
> > derivative. I'm more than happy to contribute in any way I can and
> > will review those URLs to find some low-hanging fruit to get me
> > started.
>
> The main work needed on the package tracker is to replace the Ubuntu
> panel with a patches panel that links to available patches in various
> places including from the derivatives census.
>
> https://bugs.debian.org/779400

Super useful, I'll review to see where I can participate.

> > Is there a preferred channel for communication?
> > Is the mailing list preferred over IRC?
>
> This thread and the debian-derivatives mailing list and IRC channel
> are good places to discuss the census and I'll respond in either of
> them.

Great, thanks.

> > Regarding RAM and CPUs, I have a VM running Bullseye at Linode
> > which we can use for GitLab runners or the like. Perhaps this will
> > be of use?
>
> The RAM issue is mainly caused by part of the service not being
> written in a scalable way, since it just loads giant YAML files into
> memory. Throwing more RAM at the problem or making the memory storage
> more efficient would be the wrong approach, since eventually the
> patch metadata in YAML files will exceed the available RAM. A
> database would be a better way to do it. So we need changes to the
> codebase to store the data in a database instead plus a script to
> stream the YAML into the database without loading it all into RAM. A
> couple of links I gathered on the problem.
>
> https://habr.com/en/post/458518/
> https://news.ycombinator.com/item?id=20401055
> https://stackoverflow.com/questions/429162/how-to-process-a-yaml-stream-in-python

I'll review those links to find out more and see if I'm able to
contribute there.

Thanks again,

Jeremiah
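For reference, the "stream the YAML into a database" idea quoted above
could look roughly like the sketch below. It is not taken from the
census codebase. It assumes the patch metadata can be written as a
multi-document YAML stream (records separated by "---"), which PyYAML's
safe_load_all parses one document at a time. A single giant mapping
would need PyYAML's event-based parser instead. The table name, the
field names (package, version, patch_url) and the choice of SQLite are
all hypothetical.

```python
import sqlite3
from itertools import islice


def iter_yaml_documents(path):
    """Yield parsed documents one at a time from a multi-document YAML file."""
    # PyYAML is third-party; it is imported here so that store_patches()
    # below can still be used with any other source of records.
    import yaml

    with open(path, encoding="utf-8") as stream:
        # safe_load_all is lazy: each document is parsed only when requested,
        # so the whole file is never held in memory at once.
        yield from yaml.safe_load_all(stream)


def store_patches(conn, records, batch_size=1000):
    """Insert records into SQLite in fixed-size batches; return rows stored."""
    # Hypothetical schema, for illustration only.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS patches ("
        "package TEXT, version TEXT, patch_url TEXT)"
    )
    records = iter(records)
    total = 0
    while True:
        # Pull at most batch_size records from the stream.
        chunk = list(islice(records, batch_size))
        if not chunk:
            break
        # Skip empty documents, which arrive as None rather than a mapping.
        rows = [
            (r.get("package"), r.get("version"), r.get("patch_url"))
            for r in chunk
            if isinstance(r, dict)
        ]
        conn.executemany("INSERT INTO patches VALUES (?, ?, ?)", rows)
        conn.commit()
        total += len(rows)
    return total
```

With a connection from sqlite3.connect("patches.db"), calling
store_patches(conn, iter_yaml_documents("patches.yaml")) would keep at
most one batch of records in memory at a time. Both file names are
placeholders.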