
Re: Idea: Reducing GDPR risk via automated log and data minimization



Thank you for your reply and for sharing your perspective.

I would like to clarify one point, because I may not have expressed myself clearly.

My concern is not about having AI “read” or analyze personal data as such. I fully understand that this can itself create additional GDPR and ethical risks. The point I was trying to raise comes more from an organizational angle.

Given that there are currently no dedicated people in a GDPR-focused role, my worry is that privacy-related work may end up being purely reactive, with someone having to act as a “firefighter” on top of their main responsibilities. I was thinking about whether there could be more proactive approaches to data minimization, so that fewer problematic records exist in the first place.

I am not claiming that my idea is the right solution, nor that Debian should use AI for this. I only wanted to express a concern about privacy, which I consider a very important value in Debian, and to share a possible angle for discussion.

I also noticed that there is a debian-ai mailing list, and since I am new to Debian mailing lists, it is possible that this was not the most appropriate list to bring up this idea. If so, I apologize for the noise and appreciate the guidance.

Thank you for taking the time to reply.

Best regards,
pipo


On Wed, 7 Jan 2026 at 14:11, Bart Martens (<bartm@debian.org>) wrote:
On Wed, Jan 07, 2026 at 01:33:55AM -0300, pedro vezzosi wrote:
> Hello,
>
> I would like to share a conceptual idea for discussion, not a concrete
> implementation proposal.
>
> One of the current challenges for large and long-lived projects like Debian
> is the accumulation of historical logs, archives, and public records that
> may contain personal data (IPs, emails, names), especially for oldstable
> and EOL releases.
>
> My idea is a layered approach to data minimization:
>
>    1. Strict retention periods for raw logs (for example 30–90 days).
>    2. Automatic sanitization and anonymization of historical public records.
>    3. Use of an AI-assisted classification step (human-in-the-loop), where:

I would rather make that: "protect personal data from artificial intelligence",
so the opposite of AI-assisted classification of personal data. Frankly, we
should start erasing personal data before we no longer can.

>       - Clear personal data is anonymized automatically.
>       - Ambiguous cases are isolated for human review.
>    4. Preservation of technical knowledge via summarized, signed incident
>       records, instead of keeping large volumes of raw personal data.
>
> The goal would be to reduce GDPR exposure while keeping technical value,
> without rewriting history or removing useful information.
>
> I am not proposing to implement this myself, only offering an idea that
> could be discussed or explored in the future.
>
> Thank you for your time.
>
> Best regards,
> pipo
