Re: Ideas for a dh-privacy-helper

On Thursday, 2 September 2021, 16:11:48 UTC, Jonas Smedegaard wrote:
> Quoting Bastien Roucariès (2021-09-02 17:53:18)
> > A few years ago I created the privacy-breach lintian checks in
> > order to detect trackers in our docs.
> > 
> > I think we are losing the battle here.
> > 
> > I believe that we need better tools than sed in order to fix this kind
> > of problem.
> > 
> > I have some ideas, like:
> > - read the HTML tree
> > - convert the DOM representation of the HTML tree to its XML
> >   serialization (so-called XHTML5 or polyglot)
> > - apply XSLT 2 rules to this XHTML5 to fix the privacy breach
> > 
> > The problem is which tools to use...
> >
> > I would like to use JavaScript for this kind of transformation, but
> > nodejs does not compile on armel, and for saxon-ce I need GWT, which
> > is not in Debian...
> >
> > I could use saxon2, but it would need Java.
> Perl is famous for its text juggling features, and sloppy parsing of
> html can be done e.g. with HTML::HTML5::Parser (i.e. Debian package
> libhtml-html5-parser-perl).
> Also, debhelper itself is written in perl, so is likely easier to
> integrate plugins written in perl as well.  If perl is an option at all,
> obviously...

Perl is an option: I implemented the privacy-breach test in Perl. The problem
is that I would prefer to drop a debian/package.privacy.xslt file into the
package instead of asking maintainers to code the removal of privacy problems
themselves...
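
As a rough illustration of the rule-driven approach (not the proposed
debian/package.privacy.xslt mechanism itself, whose format is not defined in
this thread), the sketch below uses a plain Python function in place of an
XSLT stylesheet. It assumes the document has already been serialized as
XHTML. The rule (drop script elements whose src is remote), the names
is_remote and strip_remote_scripts, and the sample URLs are all hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def is_remote(url):
    """True for URLs that would make a browser contact another host."""
    return url.startswith(("http://", "https://", "//"))


def strip_remote_scripts(xhtml_text):
    """Remove <script src=...> elements that load from a remote host.

    Hypothetical stand-in for one XSLT rule; returns (cleaned_xml, removed).
    """
    root = ET.fromstring(xhtml_text)
    removed = 0
    # ElementTree has no parent pointers, so visit each element as a
    # potential parent and inspect its direct children.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == f"{{{XHTML_NS}}}script" and is_remote(child.get("src", "")):
                parent.remove(child)
                removed += 1
    # Serialize with XHTML as the default namespace instead of an ns0: prefix.
    ET.register_namespace("", XHTML_NS)
    return ET.tostring(root, encoding="unicode"), removed


SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Manual</title>
<script src="https://tracker.example/analytics.js"></script>
<script src="local.js"></script>
</head><body><p>Hello</p></body></html>"""

cleaned, count = strip_remote_scripts(SAMPLE)
print(count)
print(cleaned)
```

A real helper would read such rules from a per-package file rather than
hard-coding them, which is the point of the declarative approach above.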

Generic one could be coded in perl, but for the end side I need something like 

> I am sure Python/Ruby/PHP/Haskell/Scheme/Rust/etc. folks will argue that
> their pet language is the right one for the task as well: I think it will
> help the conversation if you clarify what you are open to and what are
> constraints for you.
> E.g. do you mean that it *must* be JavaScript when you mention that?  Or
> are you perhaps asking if someone else wants to take over the challenge
> from you, so it does not matter how it is done?

No, it does not have to be JavaScript, but I would like to use V8 or something
like a browser's internals, so that I still get a DOM tree from a broken HTML
file, as a browser does. But maybe I am being overcautious.
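
To make the concern concrete, here is a minimal sketch of tolerant parsing
using only Python's standard library. It is an assumption-laden stand-in:
html.parser is lenient but does not implement the HTML5 error-recovery
algorithm that browsers use, so the resulting tree can differ from a
browser's DOM. The class name TreeBuilder, the function html_to_xml, and the
synthetic "root" wrapper element are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never have a closing tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TreeBuilder(HTMLParser):
    """Build an ElementTree from possibly broken HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        # Synthetic wrapper, since broken input may lack a single root.
        self.root = ET.Element("root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        elem = ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})
        if tag not in VOID:
            self.stack.append(elem)

    def handle_startendtag(self, tag, attrs):
        ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})

    def handle_endtag(self, tag):
        # Close everything up to the nearest matching open element;
        # a stray end tag with no match is ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        current = self.stack[-1]
        if len(current):
            last = current[-1]
            last.tail = (last.tail or "") + data
        else:
            current.text = (current.text or "") + data


def html_to_xml(text):
    """Return a well-formed XML serialization of the parsed HTML."""
    builder = TreeBuilder()
    builder.feed(text)
    builder.close()
    return ET.tostring(builder.root, encoding="unicode")


print(html_to_xml('<p>Unclosed <b>bold<br><img src="x.png"></p><p>next'))
```

The unclosed b and p elements end up closed in the output, which is the kind
of recovery needed before any XML-based rule can be applied.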

>  - Jonas
