The width of the allocated-length field stored at the end of the block depends on the block size:
A block of 16 to 256 bytes has an 8-bit length.
A block of 512 to pagesize/2 bytes has a 16-bit length.
For blocks >= pagesize, the length is a size_t and is stored at the beginning of the block. This is necessary because such a block can be extended into more pages, so a length stored at the end of the block cannot be trusted: the block might have just been extended. If we can prove in the future that the block is unshared, we may be able to change this, but I'm not sure it's important.
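The three cases above can be sketched as a small C helper. This is only an illustration, not the runtime's actual code: the name length_field_bytes and the constant ASSUMED_PAGESIZE (4096) are hypothetical, and the sketch assumes block sizes below a page are powers of two, so that "larger than 256 and smaller than a page" is the same range as 512 to pagesize/2.

```c
#include <stddef.h>

/* Assumption: a 4096-byte page. The real page size is platform dependent. */
#define ASSUMED_PAGESIZE 4096u

/* Hypothetical helper: number of bytes used to store the allocated length
 * for a block of the given size, following the three cases in the text. */
static size_t length_field_bytes(size_t block_size)
{
    if (block_size <= 256)
        return 1;               /* 8-bit length at the end of the block */
    if (block_size < ASSUMED_PAGESIZE)
        return 2;               /* 16-bit length at the end of the block */
    return sizeof(size_t);      /* size_t length at the start of the block */
}
```

A 256-byte block can hold at most 255 bytes of array data once the length byte is reserved, which is why 8 bits are enough for that class.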
In order to put the length at the front, we have to provide 16 bytes of buffer space in case the block has to be aligned properly. On x86, certain SSE instructions will only work if the data is 16-byte aligned. In addition, we need the sentinel byte to prevent accidental pointers to the next block. Because of the extra overhead, we only do this for blocks of page size and above, where the overhead is minimal compared to the block size.
So for those blocks, it looks like:
|N*elemsize|padding|elem0|elem1|...|elemN-1|emptyspace|sentinelbyte|
where elem0 starts 16 bytes after the first byte.
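As a rough sketch of the large-block layout in C: the length lives in the first sizeof(size_t) bytes, the elements start at offset 16, and the last byte is reserved as the sentinel. All names here (LARGE_PREFIX, LARGE_SENTINEL, and the large_* functions) are hypothetical and chosen for illustration; the block is treated as a plain byte buffer.

```c
#include <stddef.h>
#include <string.h>

enum {
    LARGE_PREFIX   = 16, /* space reserved at the front: length plus padding */
    LARGE_SENTINEL = 1   /* one byte reserved at the very end of the block */
};

/* Store the used length (N * elemsize) at the start of the block. */
static void large_set_length(unsigned char *block, size_t used_bytes)
{
    memcpy(block, &used_bytes, sizeof used_bytes);
}

/* Read the used length back from the start of the block. */
static size_t large_get_length(const unsigned char *block)
{
    size_t used_bytes;
    memcpy(&used_bytes, block, sizeof used_bytes);
    return used_bytes;
}

/* elem0 starts 16 bytes after the first byte of the block. */
static unsigned char *large_first_elem(unsigned char *block)
{
    return block + LARGE_PREFIX;
}

/* Bytes available for elements once the prefix and sentinel are excluded. */
static size_t large_capacity(size_t block_size)
{
    return block_size - LARGE_PREFIX - LARGE_SENTINEL;
}
```

For a 4096-byte block this leaves 4079 bytes for elements, so the fixed 17-byte overhead is well under half a percent of the block.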
Set the allocated length of the array block. This is called any time an array is appended to or its length is set.
The allocated block looks like this for blocks < PAGESIZE:
|elem0|elem1|elem2|...|elemN-1|emptyspace|N*elemsize|
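A minimal C sketch of the small-block layout, where the length sits in the final bytes of the block. The function names small_set_length and small_get_length are hypothetical, and the 256-byte threshold between the 8-bit and 16-bit forms is taken from the size classes described in this comment; the 16-bit value is stored in the host's native byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write the used length (N * elemsize) into the tail of a block that is
 * smaller than a page. Blocks up to 256 bytes use one byte, larger ones two. */
static void small_set_length(unsigned char *block, size_t block_size,
                             size_t used_bytes)
{
    if (block_size <= 256) {
        block[block_size - 1] = (unsigned char)used_bytes;
    } else {
        uint16_t len = (uint16_t)used_bytes;
        memcpy(block + block_size - sizeof len, &len, sizeof len);
    }
}

/* Read the used length back from the tail of the block. */
static size_t small_get_length(const unsigned char *block, size_t block_size)
{
    if (block_size <= 256)
        return block[block_size - 1];

    uint16_t len;
    memcpy(&len, block + block_size - sizeof len, sizeof len);
    return len;
}
```

Because the length occupies the last bytes, the elements can start at offset 0 and no front padding is needed for these sizes.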