
Re: Preliminary launch of ci.rocm.debian.net



This is super awesome! Thank you for working on this!
I think this work will also benefit Debian's future support for
Intel and NVIDIA GPUs.

I'm thrilled to add pytorch-rocm once I finish the pytorch 2.0
migration. I'll help with the remaining rocm dependencies then.

:-)

On Wed, 2023-08-02 at 00:23 +0200, Christian Kastner wrote:
> Hi list,
> 
> I'm happy to announce that ci.rocm.debian.net [1] is up and running.
> 
> Currently, it has one worker, ckk01, which is my box with an RX 6800 XT
> (gfx1030). I've explained how this works in my previous email [2]. You
> can see an example result for rocrand here [3].
> 
> This is more of an alpha release. While performing end-to-end tests over
> the last two days, I noticed the following issues that still need to be
> addressed:
> 
>   (1) There's no automatic scheduling yet. Jobs can only be submitted
>       manually. This will be addressed very soon.
> 
>   (2) There seems to be a RabbitMQ issue in bookworm; both readers and
>       writers seem to block occasionally.
> 
>   (3) I've underestimated the resource requirements. Last Sunday,
>       I naively scheduled jobs for unstable+testing for all packages
>       with autopkgtests. Well, some tests run for hours, and some even
>       timed out (3h). I need to increase the limit.
> 
>   (4) I've not looked into getting tests done in experimental, which
>       debci treats as unstable + an extra APT source.
> 
>   (5) I've deliberately postponed stats (munin) and self-service (API)
>       to a later point in time.
> 
> I expect to address (1) to (4) sometime this week. This might lead to
> occasional outages while I debug stuff. This is also the reason why I've
> scheduled so few jobs, as they block my host.
> 
> (3) is going to be interesting mid-term. I honestly did not expect that,
> even though it's obvious in hindsight. An update of e.g. rocm-hipamd or
> any other common dependency will trigger a few days' worth of tests. But
> I'm confident we can optimize this.
> 
> Best,
> Christian
> 
> [1] https://ci.rocm.debian.net
> [2] https://lists.debian.org/debian-ai/2023/07/msg00162.html
> [3] https://ci.rocm.debian.net/packages/r/rocrand/
> 

