
Bug#963887: UDD: 'duck' importer broken since 2020-05-25



On Mon, 29 Jun 2020, Baptiste BEAUPLAT wrote:
> > Indeed, creating a dedicated service for this does not seem a good idea.
> 
> I would love to have this feature integrated directly with
> distro-tracker. However, I'm wondering about the load that would cause
> for the service.

Network requests do not generate much "load"; such processes spend the bulk
of their time waiting on the network.

> The duck worker has to process around 460000 urls (only counting
> Homepage) in less than 24h.

How do you get to that figure? We don't have that many source packages,
and even if you consider multiple URLs for each source package due to
changes over time (in multiple releases), that makes way too many URLs
per source package.

> I'm not sure that can be done properly using
> the distro-tracker tasks (parallel workers are needed to work around
> timeouts). Obviously that can be optimized (different check delays for
> different results) but those are still bulk network-related tasks.

Nothing forbids parallel workers and in any case, I welcome any
improvement to the task mechanism to make that kind of parallelism easier
to handle.
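
To illustrate the point, here is a minimal sketch of parallel URL checking
with a thread pool. It is not the distro-tracker task API and not how duck
works: check_url(), check_urls(), the HEAD request, the 10 second timeout
and the 32 workers are all assumptions made for the example. Since each
worker mostly sleeps on the network, threads are enough here.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.error
import urllib.request


def check_url(url, timeout=10):
    """Return (url, status): an HTTP status code, or an error string.

    Hypothetical helper for illustration; the timeout is an assumed value.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return url, response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (404, 500, ...).
        return url, exc.code
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # DNS failure, refused connection, timeout, malformed URL, ...
        return url, f"error: {exc}"


def check_urls(urls, checker=check_url, max_workers=32):
    """Run checker on every URL in parallel and return {url: status}.

    A slow or timing-out URL only blocks its own worker thread, so the
    others keep making progress. max_workers=32 is an arbitrary choice.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(checker, urls))
```

The checker is passed as a parameter so that a task could plug in a
different check (or a stub for tests) without touching the pool logic.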

There are other tasks that could benefit from this (and in general I want
to merge more of such features in distro-tracker to make them available to
derivatives too).

Cheers,
-- 
  ⢀⣴⠾⠻⢶⣦⠀   Raphaël Hertzog <hertzog@debian.org>
  ⣾⠁⢠⠒⠀⣿⡁
  ⢿⡄⠘⠷⠚⠋    The Debian Handbook: https://debian-handbook.info/get/
  ⠈⠳⣄⠀⠀⠀⠀   Debian Long Term Support: https://deb.li/LTS
