
Re: email backend for fedmsg

On Wed, 25 Mar 2020 at 23:11, Peter Silva <peter@bsqt.homeip.net> wrote:
Most Sarracenia stuff is tied to AMQP, but the next-gen messages, called v03 (version 3), use a JSON payload
for all the information, and that makes them somewhat protocol independent.  There is also a 500 line MQTT demo
that implements a file replication network, using the same JSON messages, and primed from an AMQP upstream.
The peer code there is just a demonstration prototype, but it processes the messages the same way as real Sarracenia.

That code has been run against mosquitto and EMQT, and I think another broker, I forget... It worked without issues on all of them. MQTT interop is flawless afaict.   Note: we were using MQTT v3.  Have not played with v5.

Sarracenia essentially defines a JSON payload for advertising that a file exists. That is a fairly popular problem, but if your problem isn't that, then you should define a different payload.  It could be used for file replication, or orchestration/workload co-ordination, or other things in the IFTTT style... but in the end, this is just one application of a message bus. It doesn't need to encompass all applications, but it is a good way to get a useful thing implemented with it, so people see that it is useful.   I think applications need to define their messages, and trying to be too general makes them harder to understand and apply.

Right, I think every application participating in communication through the bus could also provide a message schema (either JSON or YAML schema) on demand. I.e. when a message is sent, it would always include a reference to a particular schema, and if the recipient of the message doesn't have this schema stored locally, it would send a message to the sender asking for it and the sender would send it back.
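
A minimal sketch of that schema-on-demand exchange, using only the Python standard library. Everything named here is an assumption made up for illustration (the `Peer` class, the `schema` key, the `build.finished/1` reference), and `check_required` is a tiny stand-in for real JSON Schema validation, not a fedmsg or Sarracenia API. The "bus" is reduced to direct method calls so the sketch stays runnable.

```python
import json


def check_required(body, schema):
    """Stand-in for real schema validation: only checks the 'required' keys."""
    missing = [key for key in schema.get("required", []) if key not in body]
    if missing:
        raise ValueError(f"missing fields: {missing}")


class Peer:
    """Hypothetical bus participant that caches schemas by reference."""

    def __init__(self, name, schemas=None):
        self.name = name
        self.schemas = dict(schemas or {})  # schema reference -> schema dict
        self.schema_requests_sent = 0

    def make_message(self, schema_ref, body):
        # Every message carries a reference to the schema that describes it.
        return json.dumps({"sender": self.name, "schema": schema_ref, "body": body})

    def handle_schema_request(self, schema_ref):
        # The sender answers a request for one of its schemas.
        return self.schemas[schema_ref]

    def receive(self, raw, sender):
        msg = json.loads(raw)
        ref = msg["schema"]
        if ref not in self.schemas:
            # Schema not stored locally: ask the sender and cache the answer.
            self.schema_requests_sent += 1
            self.schemas[ref] = sender.handle_schema_request(ref)
        check_required(msg["body"], self.schemas[ref])
        return msg["body"]


builder = Peer("builder", {"build.finished/1": {"required": ["package", "status"]}})
listener = Peer("listener")
raw = builder.make_message("build.finished/1", {"package": "facter", "status": "ok"})
listener.receive(raw, builder)  # first message triggers one schema request
listener.receive(raw, builder)  # second one is validated from the local cache
```

On a real bus the schema request and reply would themselves be messages, so a recipient would have to queue or drop the original message until the schema arrives.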

I am an upstream maintainer of fedmsg now and this is an option that I see to make fedmsg a viable solution for a Linux distribution message bus. If somebody from the Debian community would like to cooperate on this, it would be great, I think we could create an awesome thing together. Needless to say, I also have a day job and quite a long TODO queue, but I could get to working on this if I can get somebody else interested.

If not, a confirmation from somebody in the Debian community that this is interesting, and that they would think about using it if something like this existed, would also help.

Best regards!

On Wed, Mar 25, 2020 at 5:57 PM clime <clime7@gmail.com> wrote:
<for what it is worth>
I work in telecom for meteorology, and we ended up with a general method for file copying (catchphrase: rsync on steroids*.) ( *every catchphrase is a distortion, no dis to rsync, but in certain cases we do work much faster, it just communicates the idea.) Sarracenia (https://github.com/MetPX/Sarracenia) is a GPL2 app (Python and C implementations) that uses the Mozilla Public License RabbitMQ broker, as well as openssh and/or any web server, to do fastish file syncing and/or processing/orchestration. The app is just JSON messages with file metadata sent through the broker. Then you daisy chain brokers through clients.  No centralization (every entity installs their own broker), no federated identity required (authentication is to each broker, but they can pass files/messages to each other.)
A firstish thing to do with it would be to sync the debian mirrors in real-time rather than periodically.  Each mirror has a broker, they get advertisements (AMQP messages containing JSON file metadata) download the corresponding file, and re-advertise (publish on the local broker with the local file URL) for downstream clients. You can then make a mesh of mirrors, where, if each mirror is subscribed to at least two others, then it can withstand the failure of any node.  If you add more connections, you increase redundancy.
Once you have that sort of anchor tenant for an AMQP message bus, people might want to use it to provide other forms of automation, but way quicker and in some ways much simpler than SMTP.  but yeah... SMTP is a lot more well-known/common. RabbitMQ is the industry dominant open solution for AMQP brokers. sounds like marketing bs, but if you look around it is what the vast majority are using, and there are thousands upon thousands of deployments. It's a much more viable starting point, for stability, and a lot less assembly required to get something going. Sarracenia makes it a bit easier again, but messages are kind of alien and different, so it takes a while to get used to them.
</for what it is worth>
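
The mirror flow described above (receive an advertisement, download the file, re-advertise it with the local URL) could be sketched roughly as follows. This is not Sarracenia code and not its actual message format: the `baseUrl`, `relPath` and `sha256` keys, the `readvertise` function and the example hosts are all assumptions for illustration, and the download is an injected callable so the sketch runs without a network or a broker.

```python
import hashlib
import json
from urllib.parse import urljoin


def readvertise(raw_advert, local_base_url, fetch):
    """Process one advertisement on a mirror.

    raw_advert     -- JSON text with the hypothetical keys baseUrl, relPath, sha256
    local_base_url -- URL under which this mirror serves its own copy
    fetch          -- callable(url) -> bytes, injected so no network is needed here
    Returns (file_bytes, advertisement_json_for_downstream_clients).
    """
    advert = json.loads(raw_advert)
    source_url = urljoin(advert["baseUrl"], advert["relPath"])
    data = fetch(source_url)

    # Refuse to pass on a file that does not match the advertised checksum.
    if hashlib.sha256(data).hexdigest() != advert["sha256"]:
        raise ValueError(f"checksum mismatch for {source_url}")

    # Same metadata, but pointing at this mirror instead of the upstream.
    local_advert = dict(advert, baseUrl=local_base_url)
    return data, json.dumps(local_advert)
```

A real mirror would also write the file to disk before publishing the new advertisement on its local broker, and would ignore advertisements for files it already has, so that a mesh with redundant links does not download everything twice.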

Peter, I like the solution and for the mirrors it sounds great but I have a few nitpicks: 

- the file syncing part makes perfect sense for the Debian mirrors, but in the general case you might only want to send a message and skip the file syncing part
- personally, I am currently more intrigued by technologies even more standard than RabbitMQ, and I believe that a good solution might lie there

What I particularly like about Sarracenia is that it is decentralized because each host has its own broker - that I think is cool and I would like to potentially do something similar...


On Wed, 25 Mar 2020 at 01:07, clime <clime7@gmail.com> wrote:
On Wed, 25 Mar 2020 at 01:00, clime <clime7@gmail.com> wrote:
> On Tue, 24 Mar 2020 at 22:45, Nicolas Dandrimont <olasd@debian.org> wrote:
> >
> > On Tue, Mar 24, 2020, at 21:51, clime wrote:
> > > On Tue, 24 Mar 2020 at 20:40, Nicolas Dandrimont <olasd@debian.org> wrote:
> > > >
> > > > Hi!
> > > >
> > > > On Sun, Mar 22, 2020, at 13:06, clime wrote:
> > > > > Hello!
> > > > >
> > > > > Ad. https://lists.debian.org/debian-devel/2016/07/msg00377.html -
> > > > > fedmsg usage in Debian.
> > > > >
> > > > > There is a note: "it seems that people actually like parsing emails"
> > > >
> > > > This was just a way to say that fedmsg never got much of a user base in the services that run on Debian infra, and that even the new services introduced at the time kept parsing emails.
> > >
> > > Hello Nicolas!
> > >
> > > Do you remember some such service and how it used email parsing specifically?
> >
> > I believe that tracker.debian.org was introduced around that time.
> >
> > At the point it was created, tracker.d.o was mostly consuming emails from packages.debian.org to update its data. These days tracker.d.o has replaced packages.d.o as "email router", in that it receives all the mails from services (e.g. the BTS, the archive maintenance software, buildds, salsa webhooks, ...) and forwards them to the public.
> >
> > > I am still a bit unclear how email parsing is used in Debian
> > > infrastructure, don't get me wrong, I find it elegant
> >
> > Ha. I find that it's a big mess.
> >
> > Here's the set of headers of a message I received today from tracker.d.o, which are supposed to make parsing these emails better:
> >
> > X-PTS-Approved: yes
> > X-Distro-Tracker-Package: facter
> > X-Distro-Tracker-Keyword: derivatives
> > X-Remote-Delivered-To: dispatch@tracker.debian.org
> > X-Loop: dispatch@tracker.debian.org
> > X-Distro-Tracker-Keyword: derivatives
> > X-Distro-Tracker-Package: facter
> > List-Id: <facter.tracker.debian.org>
> > X-Debian: tracker.debian.org
> > X-Debian-Package: facter
> > X-PTS-Package: facter
> > X-PTS-Keyword: derivatives
> > Precedence: list
> > List-Unsubscribe: <mailto:control@tracker.debian.org?body=unsubscribe%20facter>
> >
> > I'll leave you to judge whether this makes sense or not.
> >
> > (and it turns out that the actual useful payload was just plaintext with no real chance of automated parsing)
> >
> > > but from what I have found (e.g. reportbug), in the beginning there is an
> > > email being sent by some human which will then trigger some automatic
> > > action (e.g. putting the bug into db). So it's like you could do all
> > > your work simply by sending emails (some of them machine-parsable).
> > >
> > > So do you have the opposite? I do some clicking action somewhere and
> > > it will send an email to a certain mailing list to inform human
> > > beings? Or let's not just clicking but e.g. `git push` (something that
> > > you can still do from command line).
> > >
> > > Do you have: I do some clicking action somewhere and it will send an
> > > email to a certain mailing list where the email is afterward parsed by
> > > another service which will do an action (e.g. launch a build) based on
> > > it?
> >
> > Both of these are somewhat true.
> >
> > Some examples of email-based behaviors:
> >  - Our bug tracking system is fully controlled by email.
> >  - Closing a bug in reaction to an upload is done by an email from the archive maintenance system (dak) to the bug tracking system.
> >  - Salsa has a webhook service that reacts to UI clicks (e.g. "clicking the merge button") by sending an email to the BTS (e.g. to tag bugs as pending), or to tracker.d.o (for new commit notifications).
> >  - Some of our IRC bots are triggered by procmail rules.
> >  - At some point mentors.debian.net depended on an NNTP gateway to the debian-devel-changes mailing list to trigger removal of superseded packages (...)
> >  - etc. etc.
> >
> > I'm still not sure where your trail of questions is going? fedmsg in Debian has been dead for years at this point, and there still doesn't seem to be much interest to implement anything beyond email parsing in some of our core systems.
> Cool, so basically what I am thinking about is to create a free
> software from what you are describing. I.e. create reusable tooling
> out of the Debian messaging system. Something that a new linux
> distribution can easily start using to connect their services.
> I didn't know Debian infra works like this but I find it very
> elegant/efficient and I would like the solution you have to be
> reusable by others.
> So basically the tooling should contain:
> - unified email message format
> - library that is able to translate a message to a language data
> structure (e.g. dictionary in python)
> - email receiver that would be listening for emails coming from the
> bus and emitting events based on that (this could be part of the
> library so you would be able to attach a callback for an incoming
> message or just do blocking waits)
> - email publisher - something that can send a new message into the
> bus, i.e. to a preconfigured mail server (a "broker" or "hub")
> - mail server that would have an http API to manage topic
> subscriptions  (i.e. add/delete me from a given topic) - it would
> receive a message from a publisher for a given topic, find out who is
> subscribed to it, duplicate the email message for each consumer
> and send it to them
> For the mail server I am thinking about https://www.courier-mta.org/
> and using https://www.courier-mta.org/maildropgdbm.html for
> subscription management.
> Basically, this I thought could be a new "email backend" in fedmsg
> instead of zeromq one...
> I am not very familiar with email technology but I like the idea because:
> - if you do an email setup for people, you are going to already be
> technically skilled to do it for services or vice versa
> - one of communicating agents may be a human being that is watching
> what's going on in system by having dedicated inbox folders for each
> type of event (topic) - no amqp/zeromq/mqtt -> email translation is
> needed here - everything is just email (except for irc messages
> emitted based on those)
> - i think this can be optimized to work very reliably inside one
> infrastructure (e.g. debian.org) but at the same time it is easy for
> an outside listener to join in with his/her own service and start
> doing some stuff based on Debian events (if the subscription hub is
> public)
> - it uses the most standard and compatible protocol possible (SMTP) so
> shouldn't be an opinionated technology - theoretical message
> throughput will be limited because of that (i suspect SMTP is not
> extremely fast) but it should be still sufficient to handle all the
> distribution events

I forgot one large advantage - it is compatible with your way of
operating services by sending emails to them, it is just about making
the interface standardized across applications...
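
To make the first two items of the tooling list a bit more concrete (a unified email message format, plus a library that turns a message into a language data structure), here is a minimal sketch built on Python's standard `email` package. The `X-Bus-Topic` header, the function names, the addresses and the choice of a JSON text body are all assumptions for illustration, not an existing fedmsg or Debian convention.

```python
import json
from email import message_from_bytes, policy
from email.message import EmailMessage

TOPIC_HEADER = "X-Bus-Topic"  # hypothetical header carrying the topic


def build_event(topic, payload, sender, hub):
    """Publisher side: wrap a dict in an email addressed to the hub."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = hub
    msg["Subject"] = topic  # so a human can sort events into inbox folders
    msg[TOPIC_HEADER] = topic
    # JSON in a plain-text body stays readable in any mail client.
    msg.set_content(json.dumps(payload, indent=2))
    return msg


def parse_event(raw_bytes):
    """Receiver side: turn a raw email back into a Python dictionary."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    return {
        "topic": str(msg[TOPIC_HEADER]),
        "sender": str(msg["From"]),
        "payload": json.loads(msg.get_content()),
    }


outgoing = build_event(
    "org.example.build.finished",
    {"package": "facter", "status": "ok"},
    sender="buildd@services.example.org",
    hub="hub@bus.example.org",
)
event = parse_event(outgoing.as_bytes())
```

The `outgoing` object could be handed to `smtplib.SMTP(...).send_message()` to reach the hub. The receiver, the subscription API and the fan-out on the mail server are left out of this sketch.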

> I am still exploring ideas to do a federated message bus so this is one of them
> Please, take this as a wild brainstorming, maybe I should have given
> this more time to settle in my head but on the other hand, I won't
> mind being pwned too much here
> clime
> >
> > Bye,
> > --
> > Nicolas Dandrimont
