JDK-8273158 : Tests failing with "SocketException: No buffer space available" [macos-aarch64]
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.net
  • Affected Version: 17,18,19,20,23
  • Priority: P4
  • Status: In Progress
  • Resolution: Unresolved
  • OS: os_x
  • CPU: aarch64
  • Submitted: 2021-08-31
  • Updated: 2024-02-08
Other: tbd (Unresolved)
Sub Tasks
JDK-8286273 :  
Description
A mitigation was put in place in JDK-8269772 but the problem still crops up.

Note that this is the compilation that failed (javac) not the test itself.
Comments
This latest failure is within jtreg, so it is not the test per se that is failing. The netstat output shows the pathology: the drains, the delays and the mbuf memory allocations can all be seen, together with the kernel parameter kern.ipc.mb_memory_pressure_percentage: 80, which triggers an expansion of mbuf memory.

With more and more non-blocking network I/O likely to be used, ENOBUFS becomes more probable when network usage is heavy. There is nothing that can be done except to attempt to handle this condition, with retry logic, in the tests and in the jtreg framework. A derived SocketException, such as NoBufferSocketException, would assist in this respect.

----------------------------------------
[2024-02-07 20:17:16] [/usr/sbin/netstat, -mm] timeout=20000
----------------------------------------
class        buf   active   ctotal    total cache   cached uncached    memory
name        size     bufs     bufs     bufs state     bufs     bufs     usage
---------- ----- -------- -------- -------- ----- -------- -------- ---------
mbuf         256     6749    35745    35968 purge        0    29221    8.7 MB
cl          2048       64     6025     6088 purge        0     6024   11.8 MB
bigcl       4096      160     4564     4724 purge        0     4564   17.8 MB
16kcl      16384     5462        0     5462    on        0        0         0
mbuf_cl     2304       63       63       63 purge        0        0  141.8 KB
mbuf_bigcl  4352      160      160      160 purge        0        0  680.0 KB
mbuf_16kcl 16640     3109     5462     5462    on     2353        0   86.7 MB

4400/6749 mbufs in use:
        4384 mbufs allocated to data
        16 mbufs allocated to packet headers
        2349 mbufs allocated to caches
64/6088 mbuf 2KB clusters in use
160/4724 mbuf 4KB clusters in use
3109/5462 mbuf 16KB clusters in use
128821 KB allocated to network (41.1% in use)
0 KB returned to the system
0 requests for memory denied
2008 requests for memory delayed
115 calls to drain routines
----------------------------------------
[2024-02-07 20:17:16] exit code: 0 time: 4 ms
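The delay and drain counters referred to in this comment can be extracted from a captured netstat -mm report, so that a harness could log them alongside a failure. The following is only a minimal sketch: the class name MbufStats and its methods are hypothetical (they are not part of the JDK or jtreg), and it assumes the summary lines keep the wording seen in the captures in this report.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts summary counters from `netstat -mm` text. */
final class MbufStats {
    // Patterns mirror the summary lines seen in the captures in this report.
    private static final Pattern DENIED =
            Pattern.compile("(\\d+) requests for memory denied");
    private static final Pattern DELAYED =
            Pattern.compile("(\\d+) requests for memory delayed");
    private static final Pattern DRAINS =
            Pattern.compile("(\\d+) calls to drain routines");

    private MbufStats() { }

    static OptionalLong deniedRequests(String report)  { return first(DENIED, report); }
    static OptionalLong delayedRequests(String report) { return first(DELAYED, report); }
    static OptionalLong drainCalls(String report)      { return first(DRAINS, report); }

    /** Returns the first captured number, or empty if the line is absent. */
    private static OptionalLong first(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? OptionalLong.of(Long.parseLong(m.group(1)))
                        : OptionalLong.empty();
    }
}
```

A caller would pass the captured netstat text as a single string; an empty result means the expected summary line was not found, which is itself worth logging.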
08-02-2024

Here's a log file snippet from the jdk-23+9-635-tier2 sighting:

java/nio/channels/Channels/ReadByte.java

#section:main
----------messages:(7/218)----------
command: main ReadByte
reason: Assumed action based on file name: run main ReadByte
started: Wed Feb 07 20:16:49 GMT 2024
Mode: agentvm
Agent id: 13
finished: Wed Feb 07 20:16:50 GMT 2024
elapsed time (seconds): 1.13
----------configuration:(12/1537)----------
<snip>
result: Error. Agent communication error: java.net.SocketException: No buffer space available; check console log for any additional details
07-02-2024

sysctl kern.ipc.mb_memory_pressure_percentage: 80

----------------------------------------
[2022-12-05 02:50:02] [/usr/sbin/netstat, -mm] timeout=20000
----------------------------------------
class        buf   active   ctotal    total cache   cached uncached    memory
name        size     bufs     bufs     bufs state     bufs     bufs     usage
---------- ----- -------- -------- -------- ----- -------- -------- ---------
mbuf         256     5987    16280    16704    on     6107     4610    4.0 MB
cl          2048      423     2026     2448    on        0     2025    4.0 MB
bigcl       4096        2     5834     5836    on        0     5834   22.8 MB
16kcl      16384     5462        0     5462    on        0        0         0
mbuf_cl     2304       59      422      422    on      363        0  949.5 KB
mbuf_bigcl  4352        0        2        2    on        2        0    8.5 KB
mbuf_16kcl 16640        0     5462     5462    on     5462        0   86.7 MB

160/5987 mbufs in use:
        144 mbufs allocated to data
        16 mbufs allocated to packet headers
        5827 mbufs allocated to caches
60/2448 mbuf 2KB clusters in use
0/5836 mbuf 4KB clusters in use
0/5462 mbuf 16KB clusters in use
121173 KB allocated to network (1.3% in use)
0 KB returned to the system
0 requests for memory denied
519 requests for memory delayed
181 calls to drain routines
----------------------------------------
[2022-12-05 02:50:02] exit code: 0 time: 3 ms

----------System.err:(34/2558)----------
DNSServer: Error: java.net.SocketException: No buffer space available
java.net.SocketException: No buffer space available
        at java.base/sun.nio.ch.DatagramChannelImpl.send0(Native Method)
        at java.base/sun.nio.ch.DatagramChannelImpl.sendFromNativeBuffer(DatagramChannelImpl.java:935)
        at java.base/sun.nio.ch.DatagramChannelImpl.send(DatagramChannelImpl.java:897)
        at java.base/sun.nio.ch.DatagramChannelImpl.send(DatagramChannelImpl.java:855)
        at java.base/sun.nio.ch.DatagramChannelImpl.blockingSend(DatagramChannelImpl.java:887)
        at java.base/sun.nio.ch.DatagramSocketAdaptor.send(DatagramSocketAdaptor.java:220)
        at java.base/java.net.DatagramSocket.send(DatagramSocket.java:662)
        at DNSServer.sendResponse(DNSServer.java:189)
        at DNSServer.run(DNSServer.java:137)
javax.naming.CommunicationException: DNS error [Root exception is java.net.SocketTimeoutException]; remaining name 'sdffdfsfgsfsf.com'
        at jdk.naming.dns/com.sun.jndi.dns.DnsClient.query(DnsClient.java:341)
        at jdk.naming.dns/com.sun.jndi.dns.Resolver.query(Resolver.java:81)
        at jdk.naming.dns/com.sun.jndi.dns.DnsContext.c_getAttributes(DnsContext.java:434)
        at java.naming/com.sun.jndi.toolkit.ctx.ComponentDirContext.p_getAttributes(ComponentDirContext.java:235)
        at java.naming/com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.getAttributes(PartialCompositeDirContext.java:141)
        at java.naming/com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.getAttributes(PartialCompositeDirContext.java:129)
        at java.naming/javax.naming.directory.InitialDirContext.getAttributes(InitialDirContext.java:171)
        at ExhaustXIDs.runTest(ExhaustXIDs.java:58)
        at ExhaustXIDs.main(ExhaustXIDs.java:30)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
        at java.base/java.lang.reflect.Method.invoke(Method.java:578)
        at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:125)
        at java.base/java.lang.Thread.run(Thread.java:1599)
Caused by: java.net.SocketTimeoutException
        at jdk.naming.dns/com.sun.jndi.dns.DnsClient.doUdpQuery(DnsClient.java:472)
        at jdk.naming.dns/com.sun.jndi.dns.DnsClient.query(DnsClient.java:242)
        ... 12 more
05-12-2022

Unless the Socket/DatagramSocket and NIO network Channel frameworks are modified to detect and handle the ENOBUFS (no buffer space available) scenarios, the failure relating to java/nio/channels/Channels/ShortWrite.java is an issue for jtreg to handle. jtreg would need to execute some form of remedial or recovery strategy when a compile action has failed with:

Error. Agent communication error: java.net.SocketException: No buffer space available

This might be a "wait for a few seconds and retry" strategy for a jtreg Action.

Nonetheless, the handling of ENOBUFS within JDK networking warrants a fresh discussion of how best to provide relevant and appropriate exceptions for these failure scenarios. A possible user story is: "As a developer I would like the JDK networking APIs to provide an appropriate Exception abstraction that conveys operating system call errors such as ENOBUFS, so that my application code can invoke a recovery handling strategy."
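To make the user story concrete, here is a rough sketch of what such an abstraction might look like from application code. Both NoBufferSpaceException and SocketErrors.refine are hypothetical names and do not exist in the JDK. The sketch classifies by the message text seen in the logs in this report, which is brittle and platform dependent; a real solution would presumably map the errno inside the JDK's native layer.

```java
import java.net.SocketException;

/** Hypothetical subtype; the JDK does not define this class. */
class NoBufferSpaceException extends SocketException {
    private static final long serialVersionUID = 1L;

    NoBufferSpaceException(String message) {
        super(message);
    }
}

/** Hypothetical helper that narrows a SocketException by its message. */
final class SocketErrors {
    // Message text as it appears in the stack traces in this report.
    static final String ENOBUFS_TEXT = "No buffer space available";

    private SocketErrors() { }

    /** Returns a NoBufferSpaceException for ENOBUFS-like failures, else the input. */
    static SocketException refine(SocketException e) {
        String msg = e.getMessage();
        if (msg != null && msg.contains(ENOBUFS_TEXT)) {
            NoBufferSpaceException refined = new NoBufferSpaceException(msg);
            refined.initCause(e);   // keep the original exception reachable
            return refined;
        }
        return e;
    }
}
```

Application code could then place a catch of NoBufferSpaceException ahead of the general SocketException handler and apply its recovery strategy only in that branch.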
15-04-2022

The latest failure in this series is com/sun/jndi/dns/ExhaustXIDs.java, where the exception occurs in DNSServer, which is part of the test library. "No buffer space available" is typically a transient condition, in which the system cannot immediately provide an mbuf for a non-blocking memory request. In the test scenario the DNS interactions are over UDP. As this is a UDP send, memory requests tend to be M_DONTWAIT, as in XNU's udp_output:

    /*
     * Calculate data length and get a mbuf
     * for UDP and IP headers.
     */
    M_PREPEND(m, sizeof(struct udpiphdr), M_DONTWAIT, 1);
    if (m == 0) {
        error = ENOBUFS;
        goto abort;
    }

Thus, if there is memory pressure at a particular instant, there is a possibility that ENOBUFS will be returned. The socket (DatagramChannelImpl) send then throws a SocketException within DNSServer, and the test fails.

In this particular scenario it is possible to address the "no buffer space" condition with some send retry logic: assuming the condition is transient, apply a short wait and retry the send. The retry can be applied at two possible points in the call flow.

The first is the DNSServer sendResponse method. Apply a try/catch to the socket.send; if a SocketException is caught and it is the result of a no buffer space error, retry the socket.send:

    private void sendResponse(DatagramPacket reqPacket, int playbackIndex)
            throws IOException {
        byte[] payload = generateResponsePayload(reqPacket, playbackIndex);
        try {
            socket.send(new DatagramPacket(payload, payload.length,
                    reqPacket.getSocketAddress()));
        } catch (SocketException soEx) {
            String exMessage = soEx.getMessage();
            if ((exMessage != null)
                    && exMessage.contains("No buffer space available")) {
                // Assume the condition is transient: pause, then retry once.
                pauseAWhile();
                socket.send(new DatagramPacket(payload, payload.length,
                        reqPacket.getSocketAddress()));
            } else {
                throw soEx;
            }
        }
        System.out.println("DNSServer: send response message to "
                + reqPacket.getSocketAddress());
    }
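The wait-and-retry idea for the DNSServer send can also be written as a small standalone helper, which makes the retry policy testable without a real socket. This is a sketch only: EnobufsRetry, IoAction, the attempt limit and the linear back-off are all illustrative assumptions, not existing test library code.

```java
import java.io.IOException;
import java.net.SocketException;

/** Hypothetical retry helper for sends that may fail with ENOBUFS. */
final class EnobufsRetry {
    /** An I/O action such as {@code () -> socket.send(packet)}. */
    @FunctionalInterface
    interface IoAction {
        void run() throws IOException;
    }

    private EnobufsRetry() { }

    /**
     * Runs the action, retrying only while it fails with
     * "No buffer space available". Returns the number of attempts made.
     */
    static int run(IoAction action, int maxAttempts, long pauseMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts < 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                action.run();
                return attempt;
            } catch (SocketException e) {
                String msg = e.getMessage();
                boolean noBuffers = msg != null
                        && msg.contains("No buffer space available");
                if (!noBuffers || attempt >= maxAttempts) {
                    throw e;    // not the transient case, or out of attempts
                }
                // Linear back-off: wait a little longer after each failure.
                Thread.sleep(pauseMillis * attempt);
            }
        }
    }
}
```

In DNSServer this might be used as EnobufsRetry.run(() -> socket.send(packet), 3, 50), where the limit of 3 attempts and the 50 ms pause are arbitrary illustrative values.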
Alternatively, in DatagramChannelImpl::sendFromNativeBuffer, extend the existing try/catch on the native send0 method:

    private int sendFromNativeBuffer(FileDescriptor fd, ByteBuffer bb,
                                     InetSocketAddress target)
            throws IOException {
        int pos = bb.position();
        int lim = bb.limit();
        assert (pos <= lim);
        int rem = (pos <= lim ? lim - pos : 0);

        int written;
        try {
            int addressLen = targetSocketAddress(target);
            written = send0(fd, ((DirectBuffer)bb).address() + pos, rem,
                            targetSockAddr.address(), addressLen);
        } catch (PortUnreachableException pue) {
            if (isConnected())
                throw pue;
            written = rem;
        }
        if (written > 0)
            bb.position(pos + written);
        return written;
    }

Catch a SocketException, check that the exception message contains "No buffer space available", and retry the send:

        } catch (SocketException soEx) {
            String exMessage = soEx.getMessage();
            if ((exMessage != null)
                    && exMessage.contains("No buffer space available")) {
                // Proposed helpers, not shown: pause, then retry the send.
                waitForAWhile();
                written = retrySendFromNativeBuffer(fd, bb, target);
            } else {
                throw soEx;
            }
        }

The netstat capture shows delayed memory requests and calls to drain routines, which have been observed to be indicative of this transient no buffer space condition. There is also a kernel parameter, mb_memory_pressure_percentage, which has been observed, while trying to replicate this condition, to be influential: when mbuf usage reaches 80%, the probability of an ENOBUFS error on a network system call increases. Note also that kern.ipc.nmbclusters: 131072 corresponds to 256 MB, while current usage is at approximately 106 MB and a significant proportion of main memory remains unused (PhysMem: 6943M used (1275M wired), 8919M unused), so mbuf expansion is possible. The netstat -mm capture indicates that network memory pressure exists in this system.
----------------------------------------
[2022-04-13 16:30:21] [/usr/sbin/netstat, -mm] timeout=20000
----------------------------------------
class        buf   active   ctotal    total cache   cached uncached    memory
name        size     bufs     bufs     bufs state     bufs     bufs     usage
---------- ----- -------- -------- -------- ----- -------- -------- ---------
mbuf         256     6010    25080    25536    on    15342     4184    6.1 MB
cl          2048      456     2641     3096    on        6     2634    5.2 MB
bigcl       4096        1     1975     1976    on        0     1975    7.7 MB
16kcl      16384     5462        0     5462    on        0        0         0
mbuf_cl     2304       55      455      455    on      400        0 1023.8 KB
mbuf_bigcl  4352        0        1        1    on        1        0    4.2 KB
mbuf_16kcl 16640        0     5462     5462    on     5462        0   86.7 MB

147/6010 mbufs in use:
        131 mbufs allocated to data
        16 mbufs allocated to packet headers
        5863 mbufs allocated to caches
56/3096 mbuf 2KB clusters in use
0/1976 mbuf 4KB clusters in use
0/5462 mbuf 16KB clusters in use
109237 KB allocated to network (1.4% in use)
0 KB returned to the system
0 requests for memory denied
1577 requests for memory delayed
678 calls to drain routines

The kernel config parameter kern.ipc.mb_memory_pressure_percentage: 80 indicates that when current network memory is at 80% usage, either some shuffling of mbufs is activated or a memory expansion takes place. kern.ipc.nmbclusters: 131072 indicates a configuration of 256 MB allotted to network memory; currently less than that is allocated.

This "No buffer space available" type of failure should prompt a discussion on whether SocketException has sufficient semantics to encapsulate low-level error conditions in the OS networking stack, such as ENOBUFS, which can be set by a failed network system call. Should a more specialised derivation of SocketException be created to express the no buffer space condition (ENOBUFS), and thus give an application the capability to catch the condition and apply some appropriate remedial or recovery action?
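The memory figures quoted in this comment can be checked with a little arithmetic. The sketch below assumes that kern.ipc.nmbclusters counts 2 KB clusters, which is consistent with the 256 MB figure given above but is an assumption rather than something confirmed here; the class MbufBudget is hypothetical.

```java
/** Hypothetical worked arithmetic for the mbuf figures quoted in this comment. */
final class MbufBudget {
    // Assumption: nmbclusters is expressed in 2 KB ("cl") clusters.
    static final long CLUSTER_BYTES = 2048;

    private MbufBudget() { }

    /** Configured ceiling in bytes for a given kern.ipc.nmbclusters value. */
    static long ceilingBytes(long nmbclusters) {
        return nmbclusters * CLUSTER_BYTES;
    }

    /** Memory allocated to the network, as a percentage of the ceiling. */
    static double allocatedPercent(long allocatedKb, long nmbclusters) {
        return 100.0 * (allocatedKb * 1024) / ceilingBytes(nmbclusters);
    }

    public static void main(String[] args) {
        long nmbclusters = 131072;   // kern.ipc.nmbclusters from the comment
        long allocatedKb = 109237;   // "KB allocated to network" from netstat -mm
        System.out.println(ceilingBytes(nmbclusters) / (1024 * 1024) + " MB ceiling");
        System.out.printf("%.1f MB allocated, %.1f%% of ceiling%n",
                allocatedKb / 1024.0, allocatedPercent(allocatedKb, nmbclusters));
    }
}
```

Under that assumption the allocation sits well below the configured ceiling, which matches the observation that less than the configured 256 MB is currently allocated.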
15-04-2022

Spotted in the jdk-19+18-1211-tier2 CI job set:

com/sun/jndi/dns/ExhaustXIDs.java

https://mach5.us.oracle.com/mdash/jobs/mach5-one-jdk-19+18-1211-tier2-20220413-1624-31181934/tasks/mach5-one-jdk-19+18-1211-tier2-20220413-1624-31181934-closed_test_jdk_tier2-macosx-aarch64-176/results?search=status%3Afailed%20AND%20-state%3Ainvalid
https://mach5.us.oracle.com:10060/api/v1/results/mach5-one-jdk-19+18-1211-tier2-20220413-1624-31181934-closed_test_jdk_tier2-macosx-aarch64-176-1649868143-36/log

macosx-aarch64: jpg-mac-arm-707.oraclecorp.com

Here's a log file snippet:

DNSServer: send response message to /127.0.0.1:61920
DNSServer: received query message from /127.0.0.1:64605
Got exception: javax.naming.CommunicationException: DNS error [Root exception is java.net.SocketTimeoutException: Receive timed out]; remaining name 'sdffdfsfgsfsf.com'
retrying once
----------System.err:(38/2980)----------
DNSServer: Error: java.net.SocketException: No buffer space available
java.net.SocketException: No buffer space available
        at java.base/sun.nio.ch.DatagramChannelImpl.send0(Native Method)
        at java.base/sun.nio.ch.DatagramChannelImpl.sendFromNativeBuffer(DatagramChannelImpl.java:901)
        at java.base/sun.nio.ch.DatagramChannelImpl.send(DatagramChannelImpl.java:863)
        at java.base/sun.nio.ch.DatagramChannelImpl.send(DatagramChannelImpl.java:821)
        at java.base/sun.nio.ch.DatagramChannelImpl.blockingSend(DatagramChannelImpl.java:853)
        at java.base/sun.nio.ch.DatagramSocketAdaptor.send(DatagramSocketAdaptor.java:218)
        at java.base/java.net.DatagramSocket.send(DatagramSocket.java:665)
        at DNSServer.sendResponse(DNSServer.java:189)
        at DNSServer.run(DNSServer.java:137)
javax.naming.CommunicationException: DNS error [Root exception is java.net.SocketTimeoutException: Receive timed out]; remaining name 'sdffdfsfgsfsf.com'
        at jdk.naming.dns/com.sun.jndi.dns.DnsClient.query(DnsClient.java:321)
        at jdk.naming.dns/com.sun.jndi.dns.Resolver.query(Resolver.java:81)
        at jdk.naming.dns/com.sun.jndi.dns.DnsContext.c_getAttributes(DnsContext.java:434)
        at java.naming/com.sun.jndi.toolkit.ctx.ComponentDirContext.p_getAttributes(ComponentDirContext.java:235)
        at java.naming/com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.getAttributes(PartialCompositeDirContext.java:141)
        at java.naming/com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.getAttributes(PartialCompositeDirContext.java:129)
        at java.naming/javax.naming.directory.InitialDirContext.getAttributes(InitialDirContext.java:171)
        at ExhaustXIDs.runTest(ExhaustXIDs.java:58)
        at ExhaustXIDs.main(ExhaustXIDs.java:30)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
        at java.base/java.lang.reflect.Method.invoke(Method.java:578)
        at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127)
        at java.base/java.lang.Thread.run(Thread.java:828)
Caused by: java.net.SocketTimeoutException: Receive timed out
        at java.base/sun.nio.ch.DatagramChannelImpl.trustedBlockingReceive(DatagramChannelImpl.java:703)
        at java.base/sun.nio.ch.DatagramChannelImpl.blockingReceive(DatagramChannelImpl.java:633)
        at java.base/sun.nio.ch.DatagramSocketAdaptor.receive(DatagramSocketAdaptor.java:238)
        at java.base/java.net.DatagramSocket.receive(DatagramSocket.java:701)
        at jdk.naming.dns/com.sun.jndi.dns.DnsClient.doUdpQuery(DnsClient.java:430)
        at jdk.naming.dns/com.sun.jndi.dns.DnsClient.query(DnsClient.java:216)
        ... 12 more

JavaTest Message: Test threw exception: javax.naming.CommunicationException: DNS error [Root exception is java.net.SocketTimeoutException: Receive timed out]; remaining name 'sdffdfsfgsfsf.com'
JavaTest Message: shutting down test
13-04-2022

Here's a log file snippet from the jdk-19+13-757-tier2 sighting:

java/nio/channels/Channels/ShortWrite.java

#section:build
----------messages:(5/133)----------
command: build ShortWrite
reason: Named class compiled on demand
Test directory:
  compile: ShortWrite
elapsed time (seconds): 0.014
result: Error. Agent communication error: java.net.SocketException: No buffer space available; check console log for any additional details

So this test failure also happened in the build phase.
04-03-2022

OK, will organise that. :+1:
01-09-2021

[~msheppar] Maybe you should just push a simple changeset that changes this test to use /othervm (you could use a subtask of this bug for that). This way, the next time the test fails in the CI we might see something in the log for System.err/System.out. Most of the time these seem to be empty, and I am blaming the agent VM for that too! Incremental improvements to the diagnosability would be welcome :-)
01-09-2021

[~alanb] That is a good point and it could be possible. The ifconfig output is regularly checked for these types of failures, but we have not seen any evidence of such reconfigurations, unlike some of the Linux machines, which seem to retain deprecated autoconf IPv6 global address configuration (autoconf is typically used for global IPv6 address allocation). For the sibling bug JDK-8264385, which is the scenario where the receive is moribund, I have used othervm, which allows the address bindings used in the test to be output. The IPv6 address is the same as that shown in the ifconfig output. It is always worth checking for anomalies in the network interface configurations.
01-09-2021

[~msheppar] Is there any evidence that the system is being reconfigured while these tests run? I wonder if there is a DHCPv6 or something else that is changing.
01-09-2021