The code that made aligned_alloc work with the 4-byte heap headers was
requesting a block of the correctly padded size, and correctly
aligning the output buffer within that memory, but it was using the
UNALIGNED chunk size for the buffer as the final size of the block
with splitting off the unused suffix. So the final chunk in the
buffer was could be incorrectly returned to the heap and reused,
leading to overlap.
Compute the chunk size of the output buffer based on the
already-aligned output pointer instead.
Initial investigation and fix from Andy Ross <andrew.j.ross@intel.com>.
I reworked his fix, created a test case, and stolen his commit log.
Signed-off-by: Nicolas Pitre <npitre@baylibre.com>