[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#823552: Endless "supply vcc not found, using dummy regulator"



On Mon, May 23, 2016 at 07:06:17PM +0200, Steinar H. Gunderson wrote:
> On Mon, May 23, 2016 at 05:24:55PM +0100, Mark Brown wrote:

> >>> Cc-ing Mark in case he has any insights (I hope I have the right email
> >>> address).

> > But nobody who works on probe deferral or made any of the suggestions I
> > mentioned in the message you linked to, nor anyone who works on the
> > driver you've identified a bug in...  :(

> As for the former, I have no idea who they would be, sorry. For the latter,

There's the people I mentioned in the e-mail you linked to for
exmaple...  you could also look at git annotate for the probe deferral
code and see who worked on it, or do whatever it is lead to you finding
me.

> isn't linux-samsung-soc@ the right place? MAINTAINERS said it was.

That's just the list, not people.  You really want to see who's working
on the driver and talk to them, you can't guarantee that everyone reads
the lists sufficiently well that they will notice some random post that
doesn't even mention the driver in the subject line (it's probably a
good idea to submit that patch BTW).  Note also that if MAINTAINERS has
multiple hits you want to pay attention to them.

> Then, there's the issue of why the messages come for each deferred probe
> attempt. It seems from your message this is about something in the
> declaration of the device tree; I don't understand the nuances here, but I
> suppose it's pretty easy?

No.  This is nothing to do with the device tree.  The dwc3-exynos driver
is continually creating new, partially specified devices.  Since each of
these new devices has the same problem they all get the same warning
reported for them.

The regulator API has no idea that this is anything to do with deferred
probe, it's printing a warning message because something asked for a
supply that not mapped at all which is a potential concern if we explode
later due to this being an error in the code.  All the regulator API can
realistically do is remove all diagnostics in case someone triggers them
with deferred probe but that's probably not enormously helpful to people
if they run into an error directly triggered at the regulator level who
might like some diagnostics to help them figure out what went wrong.

> Fourth, there's the question of why there are thousands of probe attempts;
> it shouldn't be even if the regulator takes a long time to come up. I guess
> this is what your original message was about, and fixing that would also
> reduce the amount of logging a ton (plus presumably speed up boot by wasting
> less CPU on repeated probing). But as I understand you, it's not strictly
> necessary to actually fix this issue?

No, this is the way probe deferral is supposed to work.  It is a bit
wasteful of CPU but realistically it's not that much and it's really
simple to implement robustly unlike anything that involves actually
looking at dependencies.  Some breakage in your system is triggering
vastly more retries than are normal, even a hundred rounds would be
*extremely* high for the initial boot.

We're doing repeated probes because the way deferred probe works is that
every time a new device binds we start a new retry of all deferred
devices to see if that new device unblocked anything (if multiple
devices progress in one run it should only schedule one new run).  The
reason you're seeing the spectacular volume is that every time we retry
we add more devices (notice that you're not complaining about the log
message generated by the underlying Designware controller failing to get
the supply it requests which will print once per probe but rather by the
PHY devices it spams in).

The fact that the Exynos driver is instantiating subdevices before it's
even acquired its own resources is probably not helping here of course,
it's more than likely that at least some of those resources need to be
passed on to the child devices and of course if the children did
actually instantiate that'd trigger continual probe deferral runs.
Looking at the code I've got a horrible feeling that might be the root
cause here, it's a single regulator that's being requested so the
diagnostics should be being printed in the caller but that just silently
passes back failures to get supplies (which is why it wasn't immediately
obvious that that was failing).

> > The patch you linked to was for a completely different error message
> > which is at least related to probe deferral

> Yes, it's a different error message, but points to the same issue as #4
> above, no?

Not from the point of the view of the regulator API and really as far as
I can tell this is just a driver specific issue.  The regulator API has
no idea this is anything to do with deferred probe.

> > though fundamentally unless
> > we just stop printing diagnostics (which is getting more and more
> > tempting to be honest) I'm not sure anything is going to fully resolve
> > leaks like this, the best chance you've got is something that explicitly
> > looks at the dependencies like Raphael was proposing.

> What do you mean by “leaks” here?

Resources that are not cleaned up are said to be leaks, as you identifed
the dwc3-exynos code isn't cleaning up the child devices it creates.

Attachment: signature.asc
Description: PGP signature


Reply to: