JDK-8318721 : Provide os wrapper for posix_memalign
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: runtime
  • Priority: P4
  • Status: Closed
  • Resolution: Not an Issue
  • Submitted: 2023-10-24
  • Updated: 2025-01-10
  • Resolved: 2025-01-10
Version: Other / tbd (Resolved)
Related Reports
Relates :  
Relates :  
Description
Used in ZGC. We should have a wrapper for that.

This is deceptively complex, since it introduces a gap between the address handed to the caller (who expects it to be aligned) and the address we get from libc.
Comments
Runtime Triage: According to Stefan's comment, "Nothing in the JVM is currently using posix_memalign." This is no longer an issue; closing.
10-01-2025

[~kbarrett] JDK-8342504 is not needed for this. I don't think we need to support alignments larger than the page size. If you need larger alignments, use os::reserve_memory; it can allocate with larger alignments. At the moment, we spend 8 bytes on storing the malloc size. This is immense. I doubt we ever malloc more than 4GB. But okay, let's be generous and say we never allocate more than 2^48 bytes. That is absurdly large: as large as or larger than the whole user address space on most platforms. 48 bits for the malloc size leaves us 16 bits to store the alignment, up to 64K. I don't know of a platform with a system page size larger than that.
02-12-2024
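The size argument above can be made concrete. The sketch below packs a 48-bit malloc size and a 16-bit alignment-related field into one 64-bit header word. It is a minimal sketch under assumptions: the names (`pack_size_and_extra`, `kSizeBits`, and so on) are hypothetical and do not describe HotSpot's actual MallocHeader layout.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical packed header word (not HotSpot's real layout):
// low 48 bits = malloc size, high 16 bits = an alignment-related value.
constexpr unsigned kSizeBits = 48;
constexpr uint64_t kSizeMask = (uint64_t(1) << kSizeBits) - 1;

inline uint64_t pack_size_and_extra(uint64_t size, uint16_t extra) {
  assert(size <= kSizeMask);  // sizes of 2^48 bytes or more cannot be represented
  return (uint64_t(extra) << kSizeBits) | size;
}

inline uint64_t unpack_size(uint64_t word) { return word & kSizeMask; }

inline uint16_t unpack_extra(uint64_t word) {
  return static_cast<uint16_t>(word >> kSizeBits);  // top 16 bits
}
```

Sixteen bits hold the values 0 to 65535. That is enough if the stored value is the padding in front of the header, because the padding is always smaller than the alignment, even for alignments up to 64K. Storing the value 65536 itself would need either a log2 encoding or a 17th bit.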

[~azafari] yes, your way is what I had in mind (header immediately preceding the user pointer). Minor correction: I think we cannot encode the alignment size as log2. With this:
```
|--(padding)----------|---(header)----|----(user payload)---->
^                                     ^
os_ptr                                user_ptr
```
os_ptr is not necessarily aligned (unless we use posix_memalign for the underlying allocation, but that would worsen memory waste). The distance from os_ptr to the aligned user_ptr therefore depends on where malloc placed the block, not only on the alignment. So we need to store the distance between the user pointer and the OS pointer directly as a number of bytes. Still, I think 16 bits should be enough, as I argued before.
02-12-2024
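Here is a minimal sketch of the layout above. It assumes the underlying block comes from plain `malloc` and that the header is a hypothetical 16-byte struct whose only meaningful field is a 16-bit padding length. Storing the padding, rather than padding plus header, keeps the value below the alignment, so it fits in 16 bits for alignments up to 64K. This is not HotSpot code; `aligned_malloc_sketch`, `aligned_free_sketch`, and `AlignedHeader` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for an NMT malloc header; only the padding field is modeled.
struct AlignedHeader {
  uint16_t padding;          // bytes between os_ptr and the start of the header
  unsigned char unused[14];  // placeholder for the other header fields
};
static_assert(sizeof(AlignedHeader) == 16, "sketch assumes a 16-byte header");

inline bool is_power_of_2(size_t x) { return x != 0 && (x & (x - 1)) == 0; }

void* aligned_malloc_sketch(size_t size, size_t alignment) {
  assert(is_power_of_2(alignment) && alignment <= 65536);
  const size_t H = sizeof(AlignedHeader);
  if (size > SIZE_MAX - H - (alignment - 1)) return nullptr;  // overflow guard
  // Worst case: H header bytes plus up to (alignment - 1) bytes of padding.
  char* os_ptr = static_cast<char*>(std::malloc(size + H + alignment - 1));
  if (os_ptr == nullptr) return nullptr;
  uintptr_t first = reinterpret_cast<uintptr_t>(os_ptr) + H;  // earliest possible user_ptr
  uintptr_t user = (first + alignment - 1) & ~uintptr_t(alignment - 1);
  char* user_ptr = reinterpret_cast<char*>(user);
  AlignedHeader hdr{};
  hdr.padding = static_cast<uint16_t>(user - first);  // always < alignment
  std::memcpy(user_ptr - H, &hdr, H);                 // header sits right before user_ptr
  return user_ptr;
}

void aligned_free_sketch(void* p) {
  if (p == nullptr) return;
  const size_t H = sizeof(AlignedHeader);
  char* user_ptr = static_cast<char*>(p);
  AlignedHeader hdr;
  std::memcpy(&hdr, user_ptr - H, H);
  std::free(user_ptr - H - hdr.padding);  // recover os_ptr from the stored distance
}
```

The price is up to (alignment - 1) bytes of padding per allocation on top of the header. That is the memory overhead discussed elsewhere in the comments.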

FTR, the bug description says "Used in ZGC. We should have a wrapper for that.", but note that ZGC removed its usage of posix_memalign in favor of rolling its own NMT-aware memory alignment mechanism. Nothing in the JVM is currently using posix_memalign.
02-12-2024

One possible way to avoid reading backward looking for a specific byte that marks the Header: if we write (A - H) bytes of zero padding followed by the H-byte header, the Header is adjacent to the pointer passed to the user (caller). Instead of this layout
```
<--Header--><---- Gap---><-----User Content---->
```
we can use this one:
```
<----Padding----><--Header--><-----User Content----->
^                            ^
os_ptr                       user_ptr
```
Padding and Gap are the same size. Then we don't need to search for the Header; it is at ((char*)user_ptr - H). Since A is a power of 2, 7 bits (so 1 byte) are enough for a field holding it in the Header.
```
void free(char* user_ptr) {
  Header* hp = (Header*)(user_ptr - H);
  // Shift a size_t so large exponents do not overflow int.
  size_t A = size_t(1) << hp->alignment_as_power_of_2;
  A = MAX2(A, H);               // alignments below H are rounded up to H
  char* os_ptr = user_ptr - A;  // Padding (A - H) + Header (H) == A bytes
  os::free(os_ptr);
}
```
02-12-2024
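A runnable POSIX sketch of the proposal above follows. It assumes a hypothetical 16-byte `Log2Header` whose last byte holds log2 of the requested alignment. The underlying block comes from `posix_memalign`, so os_ptr is aligned, and therefore user_ptr = os_ptr + MAX2(A, H) is aligned as well. As a result, `free` can recover os_ptr from the stored exponent alone. Relying on posix_memalign for the underlying block is the memory-waste trade-off raised elsewhere in the discussion. `log2_aligned_malloc` and `log2_aligned_free` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdlib.h>  // posix_memalign, free (POSIX)

// Hypothetical 16-byte header stored directly before user_ptr.
struct Log2Header {
  unsigned char unused[15];         // placeholder for the other NMT header fields
  uint8_t alignment_as_power_of_2;  // log2 of the requested alignment
};
constexpr size_t H = sizeof(Log2Header);
static_assert(H == 16, "sketch assumes a 16-byte header");

static unsigned log2_of(size_t x) {  // x must be a power of 2
  unsigned n = 0;
  while ((size_t(1) << n) < x) n++;
  return n;
}

void* log2_aligned_malloc(size_t size, size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  const size_t A = alignment < H ? H : alignment;  // MAX2(A, H)
  if (size > SIZE_MAX - A) return nullptr;         // overflow guard
  void* os_ptr = nullptr;
  // os_ptr is aligned to A, so os_ptr + A is aligned to A too.
  if (posix_memalign(&os_ptr, A, A + size) != 0) return nullptr;
  char* user_ptr = static_cast<char*>(os_ptr) + A;
  std::memset(os_ptr, 0, A - H);  // zero padding in front of the header
  Log2Header hdr{};
  hdr.alignment_as_power_of_2 = static_cast<uint8_t>(log2_of(alignment));
  std::memcpy(user_ptr - H, &hdr, H);
  return user_ptr;
}

void log2_aligned_free(void* p) {
  if (p == nullptr) return;
  char* user_ptr = static_cast<char*>(p);
  Log2Header hdr;
  std::memcpy(&hdr, user_ptr - H, H);
  size_t A = size_t(1) << hdr.alignment_as_power_of_2;
  if (A < H) A = H;  // MAX2(A, H)
  free(user_ptr - A);
}
```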

It would help to have more details about your plans for the header.
For this to work, I think A >= H must be true, or A must be normalized to make that true. That's fine, but it should be stated. It's like how the natural alignment for an allocation is implicitly rounded up to at least max_align_t alignment by malloc or global operator new. Maybe the aligned allocator should fall back to os::malloc when the requested alignment is sufficiently small.
This always allocates (A - H) more bytes than an ordinary NMT-supporting allocation would. That's on top of any space overhead imposed by posix_memalign itself. (Naively O(A), but I don't know what clever tricks implementations of posix_memalign and the like might use.) Depending on the values of A and N, and the number of affected objects, that could be quite significant. Maybe the expectation is that relatively few objects will be affected? Maybe that's correct.
I like the concept of a header + optional filler. It applies nicely to all calls to free, so we don't need an aligned_free and don't need to keep track of what kind of allocation was used. It does add a small amount of overhead to freeing an object allocated normally (rather than with special alignment), but I'm guessing it's not really measurable in the overall scheme of things. (See below.)
The description of the header, filler, and F seems to have some confusion about bits vs bytes? Also, s/F(char* outer)/F(char* inner)/ ?
I'd make the filler value 0 and put the marker in the header, to make scanning for the header simpler (just test for zero to skip filler). Even assuming the memory tag space were expanded to 2**32 from the current 256, this doesn't require taking a bit from the flag space. It's sufficient to just exclude values with zero in the appropriate byte, and that avoids any need for masking. Also, I think the scan for the header doesn't need to examine every byte in the filler, only every H bytes working backward from Inner - 1.
Or maybe even better: if the memory tag is 32 bits, avoid using the 0 tag and scan backward from Inner - 4, reading 4-byte values in 16-byte steps, looking for non-zero.
21-11-2024
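The suggested zero-filler scan can be sketched as follows. The sketch assumes a 16-byte header whose last 4 bytes hold a 32-bit memory tag that is never 0, followed by zero filler whose length is a multiple of 16 bytes (true when A >= 16 is a power of 2). `find_header`, `kHeaderSize`, and `kTagOffset` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical layout: [16-byte header][zero filler, multiple of 16 bytes][payload].
// The last 4 bytes of the header hold a 32-bit memory tag that is never 0.
constexpr size_t kHeaderSize = 16;
constexpr size_t kTagOffset = kHeaderSize - sizeof(uint32_t);  // 12

// Returns the header start (outer) given the payload pointer (inner).
// Precondition: a non-zero tag exists at one of the probed positions.
char* find_header(char* inner) {
  char* p = inner - sizeof(uint32_t);  // first candidate tag slot: Inner - 4
  uint32_t v;
  for (;;) {
    std::memcpy(&v, p, sizeof v);  // unaligned-safe 4-byte read
    if (v != 0) break;             // non-zero: this is the tag
    p -= kHeaderSize;              // skip one 16-byte step of filler
  }
  return p - kTagOffset;
}
```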

[~azafari], because the returned pointer isn't aligned to `alignment`: it is an `alignment`-aligned address offset by sizeof(MallocHeader).
20-11-2024

Why can't we do this?
```
void* allocate(size_t S, size_t alignment) {
  void* ptr;
  if (posix_memalign(&ptr, alignment, sizeof(MallocHeader) + S) != 0) {
    return nullptr;
  }
  // Arithmetic on void* is not valid C++, so step over the header via char*.
  return (char*)ptr + sizeof(MallocHeader);
}
//--------
MallocHeader* get_header(char* addr) {
  return (MallocHeader*)(addr - sizeof(MallocHeader));
}
```
20-11-2024
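The following sketch makes the answer above concrete. It runs the proposed allocation with a hypothetical 16-byte stand-in for MallocHeader and reports how far the returned pointer lies from an `alignment` boundary. It assumes a POSIX system for `posix_memalign`. `FakeMallocHeader`, `naive_allocate`, and `misalignment` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign, free (POSIX)

struct FakeMallocHeader { unsigned char bytes[16]; };  // hypothetical 16-byte stand-in

// The proposal: aligned block, header at its start, user pointer right after it.
char* naive_allocate(size_t size, size_t alignment) {
  void* ptr = nullptr;
  if (posix_memalign(&ptr, alignment, sizeof(FakeMallocHeader) + size) != 0) return nullptr;
  return static_cast<char*>(ptr) + sizeof(FakeMallocHeader);
}

// Distance of p from the previous multiple of alignment.
size_t misalignment(const void* p, size_t alignment) {
  return reinterpret_cast<uintptr_t>(p) % alignment;
}
```

Whenever the header is smaller than the alignment, `misalignment` returns `sizeof(FakeMallocHeader)`, because the block itself starts on an alignment boundary and the user pointer is shifted by exactly the header size.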

Hi,
There is surely no complexity to this if NMT is turned off? So, let's assume that NMT is turned on, and that the canaries are deleted. This means that there is a 16-byte header, the last 4 bytes of which are the memory tag.
Let N be an allocation of size n bytes, H a header of size 16, and A an alignment of size a.
Let char* P = posix_memalign(A, 16 + N + (A - 16)). P is aligned to A, and &P[16 + (A - 16)] is aligned to A.
Let Outer = P and Inner = &P[16 + (A - 16)].
Now we need a function F(Inner) -> Outer. The complexity appears here: how far do we need to slide back in order to find Outer? We can solve this by imposing a structure on the header and on the space between the header's last byte and Inner. I impose that the last byte in the header will have its most significant byte set to 0. I impose that the bytes between the last byte in the header and Inner will have their most significant byte set to 1. Now finding the header is easy.
```
char* F(char* outer) {
  char* o = outer - 1;
  // Skip filler bytes (top bit set); stop at the header's last byte (top bit clear).
  while (((unsigned char)*o & 0b10000000) == 0b10000000) o--;
  return o - 15;  // o points at the last of the 16 header bytes
}
```
The only loss is 1 bit in the memory tag space. I was planning to widen that space to 2**32; losing 1 bit there doesn't matter. We just need to remember to mask it out when parsing headers. We have to iterate over at most (A - 16) bytes when freeing an allocation; that's fine.
Am I missing something?
20-11-2024
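Below is a minimal sketch of the marker-bit scheme above, covering both the writer side and the reader side. It assumes a 16-byte header whose last byte has its top bit clear, with every filler byte set to 0x80. The reader takes the inner pointer, as the later review comment suggests. `mark_filler` and `find_outer` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr size_t kHeaderSize = 16;  // header size assumed in the comment above

// Writer side (hypothetical): set the top bit in every filler byte between the
// header and inner. The header's last byte must have its top bit clear.
void mark_filler(char* outer, char* inner) {
  assert((static_cast<unsigned char>(outer[kHeaderSize - 1]) & 0x80) == 0);
  std::memset(outer + kHeaderSize, 0x80, static_cast<size_t>(inner - (outer + kHeaderSize)));
}

// Reader side (hypothetical): walk back from inner - 1 over marker bytes until
// the header's last byte, then step back to the header start.
char* find_outer(char* inner) {
  char* o = inner - 1;
  while ((static_cast<unsigned char>(*o) & 0x80) != 0) o--;
  return o - (kHeaderSize - 1);
}
```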

We probably need something like posix_memalign in order to support C++17 dynamic allocation of overaligned types.
20-11-2024
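For context, here is a minimal C++17 example of dynamic allocation of an over-aligned type. Since C++17, `new` on a type whose alignment exceeds `__STDCPP_DEFAULT_NEW_ALIGNMENT__` calls `operator new(std::size_t, std::align_val_t)`. The standard library typically backs that with an aligned allocator such as posix_memalign or aligned_alloc. `OverAligned` is a made-up type; compile with `-std=c++17` or later.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical type whose alignment exceeds the default new alignment.
struct alignas(256) OverAligned {
  unsigned char data[256];
};
static_assert(alignof(OverAligned) > __STDCPP_DEFAULT_NEW_ALIGNMENT__,
              "type must be over-aligned for aligned new to kick in");

// Since C++17 this calls operator new(std::size_t, std::align_val_t).
OverAligned* make_over_aligned() { return new OverAligned(); }
```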

If the header/footer canaries are removed by JDK-8342504, this may become much easier.
20-11-2024