
Re: Ideas for a dh-privacy-helper



On Fri, 3 Sep 2021 at 01:03, Jonas Smedegaard <jonas@jones.dk> wrote:
>
> Quoting Bastien Roucariès (2021-09-02 23:45:30)
> > Perl is an option; I implemented the privacy breach test in Perl. The
> > problem is that I would prefer to drop a debian/package.privacy.xslt
> > file into the package instead of asking the maintainer to code the
> > removal of privacy problems...
> >
> > Generic ones could be coded in Perl, but for the end side I need
> > something like XSLT2.
>
> If you are asking how to sloppily parse HTML5 files from upstream source
> and XSLT2 files provided by package maintainers, then with perl you
> could use HTML::HTML5::Parser for the first and XML::Saxon::XSLT2 for
> the second.

Unfortunately, HTML::HTML5::Parser has been RC-buggy for 4 years
because of a bug in its UTF-8 handling (#750946):
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=750946

Your suggestion will work fine, but we need a solution for this UTF-8
problem first...
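
To make the "generic" part of the check concrete, here is a minimal
sketch. It is not the existing Perl implementation and not a proposed
design: it is written in Python with only the standard library, it
assumes that a "privacy breach" means a resource the browser fetches
from a remote host at page load, and the attribute table and the names
RemoteResourceFinder, is_remote and find_remote_resources are all
hypothetical. Python's html.parser is lenient but is not a full HTML5
parser, so it will not build the same tree as a browser on broken input.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Tag -> attribute that triggers a fetch at page load.
# Illustrative and deliberately incomplete (an assumption, not a spec).
FETCHING_ATTRS = {
    "script": "src",
    "img": "src",
    "iframe": "src",
    "link": "href",
}


def is_remote(url):
    """True if the URL names a host, e.g. https://host/x or //host/x."""
    return bool(urlparse(url).netloc)


class RemoteResourceFinder(HTMLParser):
    """Collect (tag, url) pairs that point at remote hosts."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.found = []

    def handle_starttag(self, tag, attrs):
        wanted = FETCHING_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value and is_remote(value):
                self.found.append((tag, value))


def find_remote_resources(raw_bytes):
    # Decode explicitly as UTF-8 so non-ASCII content is handled
    # predictably; invalid bytes are replaced instead of raising.
    text = raw_bytes.decode("utf-8", errors="replace")
    finder = RemoteResourceFinder()
    finder.feed(text)
    finder.close()
    return finder.found
```

This only reports candidates. The removal itself would still be driven
by the per-package debian/package.privacy.xslt file described above.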

Bastien






>
> > > I am sure Python/Ruby/PHP/Haskell/Scheme/Rust/etc. folks will argue
> > > that their pet language is the right one for the task as well: I think
> > > it will help the conversation if you clarify what you are open to
> > > and what are constraints for you.
> > >
> > > E.g. do you mean that it *must* be JavaScript when you mention that?
> > > Or are you perhaps asking if someone else wants to take over the
> > > challenge from you, so it does not matter how it is done?
> >
> > No, it does not have to be JavaScript, but it should use V8 or
> > something like a browser's internals, in order to fail to get a DOM
> > tree for a broken HTML file the way a browser does. But maybe I am
> > being overcautious.
>
> If you are asking how to parse HTML5 files like a web browser, then with
> perl you could use Gtk3::WebKit2 for that.
>
>
>  - Jonas
>
> --
>  * Jonas Smedegaard - idealist & Internet-arkitekt
>  * Tlf.: +45 40843136  Website: http://dr.jones.dk/
>
>  [x] quote me freely  [ ] ask before reusing  [ ] keep private

