Re: RFH: Debian derivatives census



On Sun, 2020-09-06 at 12:22 +0800, Paul Wise wrote:
> On Thu, 2020-09-03 at 14:12 -0400, Jeremiah C. Foster wrote: 
> 
> > I would like to add that I've recently learned that the Derivatives
> > Census can help determine programmatically the delta between Debian
> > and a Derivative (if things are correctly configured.) For a
> > distribution such as ours which aims for binary compatibility and
> > wants to stay as close to Debian as possible, this is extremely
> > valuable.
> 
> I think you are referring to the patch generation?
> 
> https://wiki.debian.org/Derivatives/Integration#Patches
> 
> The size of the metadata about the patches is what is causing the
> memory issues.
> 
> The patch generation itself currently can only be run on the Debian
> servers at LeaseWeb because it relies on access to the snapshot.d.o
> database and hash based filesystem. There is a TODO item about
> porting it to the snapshot.d.o API instead so that derivatives who
> have private apt repositories can also run it locally.

This sounds very useful. How can I follow the discussion? Is there a
separate mailing list for this topic?

> 
> > I feel that it is our responsibility to contribute back to Debian
> > (which we try to do) everything we can, and I think that
> > contributing time and effort is the least we can do.
> 
> Excellent, please take a look at the census codebase and the wiki
> pages I have linked to and run the codebase locally to see how it
> works.

Will do!

> > The Debian package tracker will be of particular interest to me
> > because of the ability to understand the delta from Debian to a
> > derivative. I'm more than happy to contribute in any way I can and
> > will review those URLs to find some low-hanging fruit to get me
> > started.
> 
> The main work needed on the package tracker is to replace the Ubuntu
> panel with a patches panel that links to available patches in various
> places including from the derivatives census.
> 
> https://bugs.debian.org/779400

Super useful, I'll review to see where I can participate.

> > Is there a preferred channel for communication?
> > Is the mailing list preferred over IRC?
> 
> This thread and the debian-derivatives mailing list and IRC channel
> are good places to discuss the census and I'll respond in either of
> them.

Great, thanks.

> > Regarding RAM and CPUs, I have a VM running Bullseye at Linode
> > which we can use for Gitlab runners or the like. Perhaps this will
> > be of use?
> 
> The RAM issue is mainly caused by part of the service not being
> written in a scalable way, since it just loads giant YAML files into
> memory. Throwing more RAM at the problem or making the memory storage
> more efficient would be the wrong approach, since eventually the
> patch metadata in YAML files will exceed the available RAM. A
> database would be a better way to do it. So we need changes to the
> codebase to store the data in a database instead, plus a script to
> stream the YAML into the database without loading it all into RAM.
> Here are a few links I gathered on the problem:
> 
> https://habr.com/en/post/458518/
> https://news.ycombinator.com/item?id=20401055
> https://stackoverflow.com/questions/429162/how-to-process-a-yaml-stream-in-python

I'll review those links to find out more and see if I'm able to
contribute there.
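
To make the "stream the YAML into a database" idea above concrete,
here is a minimal sketch. It is not taken from the census codebase:
the SQLite table, its column names, and the function names are all
hypothetical, and it assumes the metadata can be written as a
multi-document YAML file (records separated by `---`), since PyYAML's
safe_load_all() only yields one document at a time. A single giant
document would need event-based parsing instead.

```python
import itertools
import sqlite3
from typing import Iterable, Iterator

# Hypothetical column names; the real census metadata may differ.
COLUMNS = ("derivative", "package", "version", "patch_url")


def iter_yaml_documents(path: str) -> Iterator[dict]:
    """Yield one YAML document at a time from a multi-document file."""
    import yaml  # PyYAML, imported lazily so the DB code works without it

    with open(path, encoding="utf-8") as handle:
        # safe_load_all() returns a generator, so only the current
        # document is held in memory rather than the whole file.
        for document in yaml.safe_load_all(handle):
            if document is not None:
                yield document


def store_patches(conn: sqlite3.Connection,
                  records: Iterable[dict],
                  batch_size: int = 1000) -> int:
    """Insert records in fixed-size batches; return the number stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS patches "
        "(derivative TEXT, package TEXT, version TEXT, patch_url TEXT)"
    )
    iterator = iter(records)
    total = 0
    while True:
        # Pull at most batch_size records so memory use stays bounded.
        batch = list(itertools.islice(iterator, batch_size))
        if not batch:
            break
        rows = [tuple(rec.get(col) for col in COLUMNS) for rec in batch]
        with conn:  # one transaction per batch
            conn.executemany(
                "INSERT INTO patches VALUES (?, ?, ?, ?)", rows
            )
        total += len(rows)
    return total
```

It could be driven with something like
store_patches(sqlite3.connect("census.db"),
iter_yaml_documents("patches.yaml")), where both file names are
placeholders. Memory use then depends on the batch size rather than
on the size of the YAML file.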

Thanks again,

Jeremiah
