
Bug#914999: [libc6] Locking problems into libc6



Hello, Aurelien


On 12/30/18 7:49 PM, Aurelien Jarno wrote:
On 2018-12-12 17:11, Roman Savochenko wrote:
On 12/4/18 1:24 PM, Roman Savochenko wrote:
On 11/29/18 9:13 PM, Aurelien Jarno wrote:
1. For my program, I needed to add extra locking around the function
getaddrinfo(), but that resolved the problem only for my calls but for the
...
On the contrary, the first problem is a real one for glibc, since:

  * I have observed the difference twice; please see the included
    screenshot.
I indeed see two different IPs circled in red. Now I don't get what they
are, if they should be different or not and how that relates to glibc.

The lower IP is addr(), which is used as an argument to the function getaddrinfo():

        if(addr_[aOff] != '[') host = TSYS::strParse(addr_, 0, ":", &aOff);
        else { aOff++; host = TSYS::strParse(addr_, 0, "]:", &aOff); }  //Get IPv6
        port    = TSYS::strParse(addr_, 0, ":", &aOff);

        string aErr;
        sockFd = -1;
        for(int off = 0; (host_=TSYS::strParse(host,0,",",&off)).size(); ) {
            struct addrinfo hints, *res;
            memset(&hints, 0, sizeof(hints));
            hints.ai_socktype = (type == SOCK_TCP) ? SOCK_STREAM : SOCK_DGRAM;
            int error;

            if(logLen()) pushLogMess(TSYS::strMess(_("Resolving for '%s'"),host_.c_str()));

            MtxAlloc aRes(*SYS->commonLock("getaddrinfo"), true);
            if((error=getaddrinfo(host_.c_str(),(port.size()?port.c_str():"10005"),&hints,&res)))
                throw TError(nodePath().c_str(), _("Error the address '%s': '%s (%d)'"), addr_.c_str(), gai_strerror(error), error);
            vector<sockaddr_storage> addrs;
            for(struct addrinfo *iAddr = res; iAddr != NULL; iAddr = iAddr->ai_next) {
                static struct sockaddr_storage ss;
                if(iAddr->ai_addrlen > sizeof(ss)) { aErr = _("sockaddr to large."); continue; }
                memcpy(&ss, iAddr->ai_addr, iAddr->ai_addrlen);
                addrs.push_back(ss);
            }
            freeaddrinfo(res);
            aRes.unlock();

The top IP is the real one, taken from the connection address addrs[iA], i.e. the getaddrinfo() result:

                    //Create socket
                    if(type == SOCK_TCP) {
                        if((sockFd=socket((((sockaddr*)&addrs[iA])->sa_family==AF_INET6)?PF_INET6:PF_INET,SOCK_STREAM,0)) == -1)
                            throw TError(nodePath().c_str(), _("Error creating the %s socket: '%s (%d)'!"), "TCP", strerror(errno), errno);
                        int vl = 1; setsockopt(sockFd, SOL_SOCKET, SO_REUSEADDR, &vl, sizeof(int));
                        if(MSS()) { vl = MSS(); setsockopt(sockFd, IPPROTO_TCP, TCP_MAXSEG, &vl, sizeof(int)); }
                    }
                    else if(type == SOCK_UDP) {
                        if((sockFd=socket((((sockaddr*)&addrs[iA])->sa_family==AF_INET6)?PF_INET6:PF_INET,SOCK_DGRAM,0)) == -1)
                            throw TError(nodePath().c_str(), _("Error creating the %s socket: '%s (%d)'!"), "UDP", strerror(errno), errno);
                    }

                    //Get the connected address
                    if(((sockaddr*)&addrs[iA])->sa_family == AF_INET6) {
                        char aBuf[INET6_ADDRSTRLEN];
                        getnameinfo((sockaddr*)&addrs[iA], sizeof(addrs[iA]), aBuf, sizeof(aBuf), 0, 0, NI_NUMERICHOST);
                        connAddr = aBuf;
                    } else connAddr = inet_ntoa(((sockaddr_in*)&addrs[iA])->sin_addr);

So, without the lock "MtxAlloc aRes(*SYS->commonLock("getaddrinfo"), true);", the contents of &res get replaced, and the difference comes from a parallel connection in another thread running the same code. I have up to ten such parallel connections, which is why I observe this problem!
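
The workaround described above (serialising getaddrinfo() together with the copy of its result) can be sketched in a self-contained form. This is only an illustration under assumptions: std::mutex stands in for OpenSCADA's MtxAlloc and SYS->commonLock(), and the names resolveLock, resolveLocked() and addrToString() are hypothetical and are not part of OpenSCADA or glibc. The sketch also copies each result through a per-call buffer instead of the "static" one in the excerpt, because a function-local static object is shared by all threads; that is a choice made for the sketch, not a statement about the cause of the observed difference.

```cpp
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <cstring>
#include <mutex>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for SYS->commonLock("getaddrinfo").
static std::mutex resolveLock;

// Resolve host:port and copy every returned address into the caller's own vector.
// The whole resolve-and-copy step runs under one process-wide lock.
std::vector<sockaddr_storage> resolveLocked(const std::string &host, const std::string &port, int sockType)
{
    addrinfo hints;
    std::memset(&hints, 0, sizeof(hints));
    hints.ai_socktype = sockType;

    std::vector<sockaddr_storage> addrs;
    std::lock_guard<std::mutex> guard(resolveLock);

    addrinfo *res = nullptr;
    int error = getaddrinfo(host.c_str(), port.c_str(), &hints, &res);
    if(error) throw std::runtime_error(gai_strerror(error));

    for(addrinfo *ai = res; ai != nullptr; ai = ai->ai_next) {
        if(ai->ai_addrlen > sizeof(sockaddr_storage)) continue;
        sockaddr_storage ss;                 // per-call buffer, not shared between threads
        std::memset(&ss, 0, sizeof(ss));
        std::memcpy(&ss, ai->ai_addr, ai->ai_addrlen);
        addrs.push_back(ss);
    }
    freeaddrinfo(res);
    return addrs;
}

// Numeric text form of a stored address, for IPv4 and IPv6 alike.
// Returns an empty string if getnameinfo() fails.
std::string addrToString(const sockaddr_storage &ss)
{
    char buf[INET6_ADDRSTRLEN];
    socklen_t len = (ss.ss_family == AF_INET6) ? sizeof(sockaddr_in6) : sizeof(sockaddr_in);
    if(getnameinfo(reinterpret_cast<const sockaddr*>(&ss), len, buf, sizeof(buf), nullptr, 0, NI_NUMERICHOST))
        return "";
    return buf;
}
```

Comparing addrToString() of each stored entry with the host string passed to resolveLocked() gives the same kind of check as the two IPs circled on the screenshot, for numeric hosts at least.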


  * I have also once seen a very long lock in the function
    getaddrinfo()->poll() with a VPN (FortiClient in this case); see
    the crash report, obtained after terminating the program with SIGSEGV.
poll() has nothing to do with locking, it just hangs there waiting for an
answer to a DNS request sent by the functions called through
getaddrinfo(). According to the trace, the timeout is set to about 5
seconds. The other threads waiting for poll() are called from
libglib-2.0 and from libxcb.so.1.

Sometimes, on FortiVPN, the wait lasts forever, which gives me more time to catch this problem by sending the signal SIGSEGV. Moreover, once I close the FortiVPN connection, the program also finishes successfully.

I think the second poll() is a different case, since it is a generic function.

As for the segmentation fault, it happens in pthread_cond_timedwait.S
called directly from libQt5Core.so.5. Without more info, it's difficult
to say if it's due to a bug in glibc or if the arguments passed to this
function are corrupted, for example because the data pointed to by QMutex*
are corrupted. Do you have another way to reproduce the issue that is
actually easier than using openscada?

No, this is also a different lock, unrelated to the primary problem.

I have updated the package libc6 up to version 2.24 on Debian 8 and both of
the last two items, RTL8192eu and WIFI HotSpot, continue to work.

Where can I then move the problems with RTL8192eu and WIFI HotSpot on Debian
9?
The best would be to look at the logs in /var/log/syslog to check what
the issue is. It could be a dhcp issue, a network-manager issue or a
wpasupplicant issue depending on what you are using.

So, RTL8192eu does not work after upgrading NM on Debian 8 to version 1.6.2 from Debian 9.

And WIFI HotSpot does not work after upgrading "wpasupplicant" on Debian 8 to version 2.4 from Debian 9.

After I rebuilt "wpasupplicant" 2.3 for Debian 9, the WIFI HotSpot also works on Debian 9.

I am going to report both of these issues, against NM and "wpasupplicant" respectively.

Regards, Roman

