zephyr/kernel/mempool.c

225 lines
4.7 KiB
C
Raw Normal View History

k_mem_pool: Complete rework This patch amounts to a mostly complete rewrite of the k_mem_pool allocator, which had been the source of historical complaints vs. the one easily available in newlib. The basic design of the allocator is unchanged (it's still a 4-way buddy allocator), but the implementation has made different choices throughout. Major changes: Space efficiency: The old implementation required ~2.66 bytes per "smallest block" in overhead, plus 16 bytes per log4 "level" of the allocation tree, plus a global tracking struct of 32 bytes and a very surprising 12 byte overhead (in struct k_mem_block) per active allocation on top of the returned data pointer. This new allocator uses a simple bit array as the only per-block storage and places the free list into the freed blocks themselves, requiring only ~1.33 bits per smallest block, 12 bytes per level, 32 byte globally and only 4 bytes of per-allocation bookeeping. And it puts more of the generated tree into BSS, slightly reducing binary sizes for non-trivial pool sizes (even as the code size itself has increased a tiny bit). IRQ safe: atomic operations on the store have been cut down to be at most "4 bit sets and dlist operations" (i.e. a few dozen instructions), reducing latency significantly and allowing us to lock against interrupts cleanly from all APIs. Allocations and frees can be done from ISRs now without limitation (well, obviously you can't sleep, so "timeout" must be K_NO_WAIT). Deterministic performance: there is no more "defragmentation" step that must be manually managed. Block coalescing is done synchronously at free time and takes constant time (strictly log4(num_levels)), as the detection of four free "partner bits" is just a simple shift and mask operation. Cleaner behavior with odd sizes. The old code assumed that the specified maximum size would be a power of four multiple of the minimum size, making use of non-standard buffer sizes problematic. This implementation re-aligns the sub-blocks at each level and can handle situations wehre alignment restrictions mean fewer than 4x will be available. If you want precise layout control, you can still specify the sizes rigorously. It just doesn't break if you don't. More portable: the original implementation made use of GNU assembler macros embedded inline within C __asm__ statements. Not all toolchains are actually backed by a GNU assembler even when the support the GNU assembly syntax. This is pure C, albeit with some hairy macros to expand the compile-time-computed values. Related changes that had to be rolled into this patch for bisectability: * The new allocator has a firm minimum block size of 8 bytes (to store the dlist_node_t). It will "work" with smaller requested min_size values, but obviously makes no firm promises about layout or how many will be available. Unfortunately many of the tests were written with very small 4-byte minimum sizes and to assume exactly how many they could allocate. Bump the sizes to match the allocator minimum. * The mbox and pipes API made use of the internals of k_mem_block and had to be ported to the new scheme. Blocks no longer store a backpointer to the pool that allocated them (it's an integer ID in a bitfield) , so if you want to "nullify" them you have to use the data pointer. * test_mbox_api had a bug were it was prematurely freeing k_mem_blocks that it sent through the mailbox. This worked in the old allocator because the memory wouldn't be touched when freed, but now we stuff list pointers in there and the bug was exposed. * Remove test_mpool_options: the options (related to defragmentation behavior) tested no longer exist. Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2017-05-09 19:42:39 +02:00
/*
* Copyright (c) 2017 Intel Corporation
*
* SPDX-License-Identifier: Apache-2.0
*/
#include <kernel.h>
#include <ksched.h>
#include <wait_q.h>
#include <init.h>
#include <string.h>
#include <misc/__assert.h>
#include <stdbool.h>
k_mem_pool: Complete rework This patch amounts to a mostly complete rewrite of the k_mem_pool allocator, which had been the source of historical complaints vs. the one easily available in newlib. The basic design of the allocator is unchanged (it's still a 4-way buddy allocator), but the implementation has made different choices throughout. Major changes: Space efficiency: The old implementation required ~2.66 bytes per "smallest block" in overhead, plus 16 bytes per log4 "level" of the allocation tree, plus a global tracking struct of 32 bytes and a very surprising 12 byte overhead (in struct k_mem_block) per active allocation on top of the returned data pointer. This new allocator uses a simple bit array as the only per-block storage and places the free list into the freed blocks themselves, requiring only ~1.33 bits per smallest block, 12 bytes per level, 32 byte globally and only 4 bytes of per-allocation bookeeping. And it puts more of the generated tree into BSS, slightly reducing binary sizes for non-trivial pool sizes (even as the code size itself has increased a tiny bit). IRQ safe: atomic operations on the store have been cut down to be at most "4 bit sets and dlist operations" (i.e. a few dozen instructions), reducing latency significantly and allowing us to lock against interrupts cleanly from all APIs. Allocations and frees can be done from ISRs now without limitation (well, obviously you can't sleep, so "timeout" must be K_NO_WAIT). Deterministic performance: there is no more "defragmentation" step that must be manually managed. Block coalescing is done synchronously at free time and takes constant time (strictly log4(num_levels)), as the detection of four free "partner bits" is just a simple shift and mask operation. Cleaner behavior with odd sizes. The old code assumed that the specified maximum size would be a power of four multiple of the minimum size, making use of non-standard buffer sizes problematic. This implementation re-aligns the sub-blocks at each level and can handle situations wehre alignment restrictions mean fewer than 4x will be available. If you want precise layout control, you can still specify the sizes rigorously. It just doesn't break if you don't. More portable: the original implementation made use of GNU assembler macros embedded inline within C __asm__ statements. Not all toolchains are actually backed by a GNU assembler even when the support the GNU assembly syntax. This is pure C, albeit with some hairy macros to expand the compile-time-computed values. Related changes that had to be rolled into this patch for bisectability: * The new allocator has a firm minimum block size of 8 bytes (to store the dlist_node_t). It will "work" with smaller requested min_size values, but obviously makes no firm promises about layout or how many will be available. Unfortunately many of the tests were written with very small 4-byte minimum sizes and to assume exactly how many they could allocate. Bump the sizes to match the allocator minimum. * The mbox and pipes API made use of the internals of k_mem_block and had to be ported to the new scheme. Blocks no longer store a backpointer to the pool that allocated them (it's an integer ID in a bitfield) , so if you want to "nullify" them you have to use the data pointer. * test_mbox_api had a bug were it was prematurely freeing k_mem_blocks that it sent through the mailbox. This worked in the old allocator because the memory wouldn't be touched when freed, but now we stuff list pointers in there and the bug was exposed. * Remove test_mpool_options: the options (related to defragmentation behavior) tested no longer exist. Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2017-05-09 19:42:39 +02:00
/* Linker-defined symbols bound the static pool structs */
extern struct k_mem_pool _k_mem_pool_list_start[];
extern struct k_mem_pool _k_mem_pool_list_end[];
static struct k_spinlock lock;
k_mem_pool: Complete rework This patch amounts to a mostly complete rewrite of the k_mem_pool allocator, which had been the source of historical complaints vs. the one easily available in newlib. The basic design of the allocator is unchanged (it's still a 4-way buddy allocator), but the implementation has made different choices throughout. Major changes: Space efficiency: The old implementation required ~2.66 bytes per "smallest block" in overhead, plus 16 bytes per log4 "level" of the allocation tree, plus a global tracking struct of 32 bytes and a very surprising 12 byte overhead (in struct k_mem_block) per active allocation on top of the returned data pointer. This new allocator uses a simple bit array as the only per-block storage and places the free list into the freed blocks themselves, requiring only ~1.33 bits per smallest block, 12 bytes per level, 32 byte globally and only 4 bytes of per-allocation bookeeping. And it puts more of the generated tree into BSS, slightly reducing binary sizes for non-trivial pool sizes (even as the code size itself has increased a tiny bit). IRQ safe: atomic operations on the store have been cut down to be at most "4 bit sets and dlist operations" (i.e. a few dozen instructions), reducing latency significantly and allowing us to lock against interrupts cleanly from all APIs. Allocations and frees can be done from ISRs now without limitation (well, obviously you can't sleep, so "timeout" must be K_NO_WAIT). Deterministic performance: there is no more "defragmentation" step that must be manually managed. Block coalescing is done synchronously at free time and takes constant time (strictly log4(num_levels)), as the detection of four free "partner bits" is just a simple shift and mask operation. Cleaner behavior with odd sizes. The old code assumed that the specified maximum size would be a power of four multiple of the minimum size, making use of non-standard buffer sizes problematic. This implementation re-aligns the sub-blocks at each level and can handle situations wehre alignment restrictions mean fewer than 4x will be available. If you want precise layout control, you can still specify the sizes rigorously. It just doesn't break if you don't. More portable: the original implementation made use of GNU assembler macros embedded inline within C __asm__ statements. Not all toolchains are actually backed by a GNU assembler even when the support the GNU assembly syntax. This is pure C, albeit with some hairy macros to expand the compile-time-computed values. Related changes that had to be rolled into this patch for bisectability: * The new allocator has a firm minimum block size of 8 bytes (to store the dlist_node_t). It will "work" with smaller requested min_size values, but obviously makes no firm promises about layout or how many will be available. Unfortunately many of the tests were written with very small 4-byte minimum sizes and to assume exactly how many they could allocate. Bump the sizes to match the allocator minimum. * The mbox and pipes API made use of the internals of k_mem_block and had to be ported to the new scheme. Blocks no longer store a backpointer to the pool that allocated them (it's an integer ID in a bitfield) , so if you want to "nullify" them you have to use the data pointer. * test_mbox_api had a bug were it was prematurely freeing k_mem_blocks that it sent through the mailbox. This worked in the old allocator because the memory wouldn't be touched when freed, but now we stuff list pointers in there and the bug was exposed. * Remove test_mpool_options: the options (related to defragmentation behavior) tested no longer exist. Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2017-05-09 19:42:39 +02:00
static struct k_mem_pool *get_pool(int id)
{
return &_k_mem_pool_list_start[id];
}
static int pool_id(struct k_mem_pool *pool)
{
return pool - &_k_mem_pool_list_start[0];
}
static void k_mem_pool_init(struct k_mem_pool *p)
k_mem_pool: Complete rework This patch amounts to a mostly complete rewrite of the k_mem_pool allocator, which had been the source of historical complaints vs. the one easily available in newlib. The basic design of the allocator is unchanged (it's still a 4-way buddy allocator), but the implementation has made different choices throughout. Major changes: Space efficiency: The old implementation required ~2.66 bytes per "smallest block" in overhead, plus 16 bytes per log4 "level" of the allocation tree, plus a global tracking struct of 32 bytes and a very surprising 12 byte overhead (in struct k_mem_block) per active allocation on top of the returned data pointer. This new allocator uses a simple bit array as the only per-block storage and places the free list into the freed blocks themselves, requiring only ~1.33 bits per smallest block, 12 bytes per level, 32 byte globally and only 4 bytes of per-allocation bookeeping. And it puts more of the generated tree into BSS, slightly reducing binary sizes for non-trivial pool sizes (even as the code size itself has increased a tiny bit). IRQ safe: atomic operations on the store have been cut down to be at most "4 bit sets and dlist operations" (i.e. a few dozen instructions), reducing latency significantly and allowing us to lock against interrupts cleanly from all APIs. Allocations and frees can be done from ISRs now without limitation (well, obviously you can't sleep, so "timeout" must be K_NO_WAIT). Deterministic performance: there is no more "defragmentation" step that must be manually managed. Block coalescing is done synchronously at free time and takes constant time (strictly log4(num_levels)), as the detection of four free "partner bits" is just a simple shift and mask operation. Cleaner behavior with odd sizes. The old code assumed that the specified maximum size would be a power of four multiple of the minimum size, making use of non-standard buffer sizes problematic. This implementation re-aligns the sub-blocks at each level and can handle situations wehre alignment restrictions mean fewer than 4x will be available. If you want precise layout control, you can still specify the sizes rigorously. It just doesn't break if you don't. More portable: the original implementation made use of GNU assembler macros embedded inline within C __asm__ statements. Not all toolchains are actually backed by a GNU assembler even when the support the GNU assembly syntax. This is pure C, albeit with some hairy macros to expand the compile-time-computed values. Related changes that had to be rolled into this patch for bisectability: * The new allocator has a firm minimum block size of 8 bytes (to store the dlist_node_t). It will "work" with smaller requested min_size values, but obviously makes no firm promises about layout or how many will be available. Unfortunately many of the tests were written with very small 4-byte minimum sizes and to assume exactly how many they could allocate. Bump the sizes to match the allocator minimum. * The mbox and pipes API made use of the internals of k_mem_block and had to be ported to the new scheme. Blocks no longer store a backpointer to the pool that allocated them (it's an integer ID in a bitfield) , so if you want to "nullify" them you have to use the data pointer. * test_mbox_api had a bug were it was prematurely freeing k_mem_blocks that it sent through the mailbox. This worked in the old allocator because the memory wouldn't be touched when freed, but now we stuff list pointers in there and the bug was exposed. * Remove test_mpool_options: the options (related to defragmentation behavior) tested no longer exist. Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2017-05-09 19:42:39 +02:00
{
z_waitq_init(&p->wait_q);
z_sys_mem_pool_base_init(&p->base);
k_mem_pool: Complete rework This patch amounts to a mostly complete rewrite of the k_mem_pool allocator, which had been the source of historical complaints vs. the one easily available in newlib. The basic design of the allocator is unchanged (it's still a 4-way buddy allocator), but the implementation has made different choices throughout. Major changes: Space efficiency: The old implementation required ~2.66 bytes per "smallest block" in overhead, plus 16 bytes per log4 "level" of the allocation tree, plus a global tracking struct of 32 bytes and a very surprising 12 byte overhead (in struct k_mem_block) per active allocation on top of the returned data pointer. This new allocator uses a simple bit array as the only per-block storage and places the free list into the freed blocks themselves, requiring only ~1.33 bits per smallest block, 12 bytes per level, 32 byte globally and only 4 bytes of per-allocation bookeeping. And it puts more of the generated tree into BSS, slightly reducing binary sizes for non-trivial pool sizes (even as the code size itself has increased a tiny bit). IRQ safe: atomic operations on the store have been cut down to be at most "4 bit sets and dlist operations" (i.e. a few dozen instructions), reducing latency significantly and allowing us to lock against interrupts cleanly from all APIs. Allocations and frees can be done from ISRs now without limitation (well, obviously you can't sleep, so "timeout" must be K_NO_WAIT). Deterministic performance: there is no more "defragmentation" step that must be manually managed. Block coalescing is done synchronously at free time and takes constant time (strictly log4(num_levels)), as the detection of four free "partner bits" is just a simple shift and mask operation. Cleaner behavior with odd sizes. The old code assumed that the specified maximum size would be a power of four multiple of the minimum size, making use of non-standard buffer sizes problematic. This implementation re-aligns the sub-blocks at each level and can handle situations wehre alignment restrictions mean fewer than 4x will be available. If you want precise layout control, you can still specify the sizes rigorously. It just doesn't break if you don't. More portable: the original implementation made use of GNU assembler macros embedded inline within C __asm__ statements. Not all toolchains are actually backed by a GNU assembler even when the support the GNU assembly syntax. This is pure C, albeit with some hairy macros to expand the compile-time-computed values. Related changes that had to be rolled into this patch for bisectability: * The new allocator has a firm minimum block size of 8 bytes (to store the dlist_node_t). It will "work" with smaller requested min_size values, but obviously makes no firm promises about layout or how many will be available. Unfortunately many of the tests were written with very small 4-byte minimum sizes and to assume exactly how many they could allocate. Bump the sizes to match the allocator minimum. * The mbox and pipes API made use of the internals of k_mem_block and had to be ported to the new scheme. Blocks no longer store a backpointer to the pool that allocated them (it's an integer ID in a bitfield) , so if you want to "nullify" them you have to use the data pointer. * test_mbox_api had a bug were it was prematurely freeing k_mem_blocks that it sent through the mailbox. This worked in the old allocator because the memory wouldn't be touched when freed, but now we stuff list pointers in there and the bug was exposed. * Remove test_mpool_options: the options (related to defragmentation behavior) tested no longer exist. Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2017-05-09 19:42:39 +02:00
}
int init_static_pools(struct device *unused)
{
ARG_UNUSED(unused);
struct k_mem_pool *p;
for (p = _k_mem_pool_list_start; p < _k_mem_pool_list_end; p++) {
k_mem_pool_init(p);
k_mem_pool: Complete rework This patch amounts to a mostly complete rewrite of the k_mem_pool allocator, which had been the source of historical complaints vs. the one easily available in newlib. The basic design of the allocator is unchanged (it's still a 4-way buddy allocator), but the implementation has made different choices throughout. Major changes: Space efficiency: The old implementation required ~2.66 bytes per "smallest block" in overhead, plus 16 bytes per log4 "level" of the allocation tree, plus a global tracking struct of 32 bytes and a very surprising 12 byte overhead (in struct k_mem_block) per active allocation on top of the returned data pointer. This new allocator uses a simple bit array as the only per-block storage and places the free list into the freed blocks themselves, requiring only ~1.33 bits per smallest block, 12 bytes per level, 32 byte globally and only 4 bytes of per-allocation bookeeping. And it puts more of the generated tree into BSS, slightly reducing binary sizes for non-trivial pool sizes (even as the code size itself has increased a tiny bit). IRQ safe: atomic operations on the store have been cut down to be at most "4 bit sets and dlist operations" (i.e. a few dozen instructions), reducing latency significantly and allowing us to lock against interrupts cleanly from all APIs. Allocations and frees can be done from ISRs now without limitation (well, obviously you can't sleep, so "timeout" must be K_NO_WAIT). Deterministic performance: there is no more "defragmentation" step that must be manually managed. Block coalescing is done synchronously at free time and takes constant time (strictly log4(num_levels)), as the detection of four free "partner bits" is just a simple shift and mask operation. Cleaner behavior with odd sizes. The old code assumed that the specified maximum size would be a power of four multiple of the minimum size, making use of non-standard buffer sizes problematic. This implementation re-aligns the sub-blocks at each level and can handle situations wehre alignment restrictions mean fewer than 4x will be available. If you want precise layout control, you can still specify the sizes rigorously. It just doesn't break if you don't. More portable: the original implementation made use of GNU assembler macros embedded inline within C __asm__ statements. Not all toolchains are actually backed by a GNU assembler even when the support the GNU assembly syntax. This is pure C, albeit with some hairy macros to expand the compile-time-computed values. Related changes that had to be rolled into this patch for bisectability: * The new allocator has a firm minimum block size of 8 bytes (to store the dlist_node_t). It will "work" with smaller requested min_size values, but obviously makes no firm promises about layout or how many will be available. Unfortunately many of the tests were written with very small 4-byte minimum sizes and to assume exactly how many they could allocate. Bump the sizes to match the allocator minimum. * The mbox and pipes API made use of the internals of k_mem_block and had to be ported to the new scheme. Blocks no longer store a backpointer to the pool that allocated them (it's an integer ID in a bitfield) , so if you want to "nullify" them you have to use the data pointer. * test_mbox_api had a bug were it was prematurely freeing k_mem_blocks that it sent through the mailbox. This worked in the old allocator because the memory wouldn't be touched when freed, but now we stuff list pointers in there and the bug was exposed. * Remove test_mpool_options: the options (related to defragmentation behavior) tested no longer exist. Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2017-05-09 19:42:39 +02:00
}
return 0;
}
SYS_INIT(init_static_pools, PRE_KERNEL_1, CONFIG_KERNEL_INIT_PRIORITY_OBJECTS);
int k_mem_pool_alloc(struct k_mem_pool *p, struct k_mem_block *block,
size_t size, s32_t timeout)
{
int ret;
k_mem_pool: Complete rework This patch amounts to a mostly complete rewrite of the k_mem_pool allocator, which had been the source of historical complaints vs. the one easily available in newlib. The basic design of the allocator is unchanged (it's still a 4-way buddy allocator), but the implementation has made different choices throughout. Major changes: Space efficiency: The old implementation required ~2.66 bytes per "smallest block" in overhead, plus 16 bytes per log4 "level" of the allocation tree, plus a global tracking struct of 32 bytes and a very surprising 12 byte overhead (in struct k_mem_block) per active allocation on top of the returned data pointer. This new allocator uses a simple bit array as the only per-block storage and places the free list into the freed blocks themselves, requiring only ~1.33 bits per smallest block, 12 bytes per level, 32 byte globally and only 4 bytes of per-allocation bookeeping. And it puts more of the generated tree into BSS, slightly reducing binary sizes for non-trivial pool sizes (even as the code size itself has increased a tiny bit). IRQ safe: atomic operations on the store have been cut down to be at most "4 bit sets and dlist operations" (i.e. a few dozen instructions), reducing latency significantly and allowing us to lock against interrupts cleanly from all APIs. Allocations and frees can be done from ISRs now without limitation (well, obviously you can't sleep, so "timeout" must be K_NO_WAIT). Deterministic performance: there is no more "defragmentation" step that must be manually managed. Block coalescing is done synchronously at free time and takes constant time (strictly log4(num_levels)), as the detection of four free "partner bits" is just a simple shift and mask operation. Cleaner behavior with odd sizes. The old code assumed that the specified maximum size would be a power of four multiple of the minimum size, making use of non-standard buffer sizes problematic. This implementation re-aligns the sub-blocks at each level and can handle situations wehre alignment restrictions mean fewer than 4x will be available. If you want precise layout control, you can still specify the sizes rigorously. It just doesn't break if you don't. More portable: the original implementation made use of GNU assembler macros embedded inline within C __asm__ statements. Not all toolchains are actually backed by a GNU assembler even when the support the GNU assembly syntax. This is pure C, albeit with some hairy macros to expand the compile-time-computed values. Related changes that had to be rolled into this patch for bisectability: * The new allocator has a firm minimum block size of 8 bytes (to store the dlist_node_t). It will "work" with smaller requested min_size values, but obviously makes no firm promises about layout or how many will be available. Unfortunately many of the tests were written with very small 4-byte minimum sizes and to assume exactly how many they could allocate. Bump the sizes to match the allocator minimum. * The mbox and pipes API made use of the internals of k_mem_block and had to be ported to the new scheme. Blocks no longer store a backpointer to the pool that allocated them (it's an integer ID in a bitfield) , so if you want to "nullify" them you have to use the data pointer. * test_mbox_api had a bug were it was prematurely freeing k_mem_blocks that it sent through the mailbox. This worked in the old allocator because the memory wouldn't be touched when freed, but now we stuff list pointers in there and the bug was exposed. * Remove test_mpool_options: the options (related to defragmentation behavior) tested no longer exist. Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2017-05-09 19:42:39 +02:00
s64_t end = 0;
__ASSERT(!(z_is_in_isr() && timeout != K_NO_WAIT), "");
k_mem_pool: Complete rework This patch amounts to a mostly complete rewrite of the k_mem_pool allocator, which had been the source of historical complaints vs. the one easily available in newlib. The basic design of the allocator is unchanged (it's still a 4-way buddy allocator), but the implementation has made different choices throughout. Major changes: Space efficiency: The old implementation required ~2.66 bytes per "smallest block" in overhead, plus 16 bytes per log4 "level" of the allocation tree, plus a global tracking struct of 32 bytes and a very surprising 12 byte overhead (in struct k_mem_block) per active allocation on top of the returned data pointer. This new allocator uses a simple bit array as the only per-block storage and places the free list into the freed blocks themselves, requiring only ~1.33 bits per smallest block, 12 bytes per level, 32 byte globally and only 4 bytes of per-allocation bookeeping. And it puts more of the generated tree into BSS, slightly reducing binary sizes for non-trivial pool sizes (even as the code size itself has increased a tiny bit). IRQ safe: atomic operations on the store have been cut down to be at most "4 bit sets and dlist operations" (i.e. a few dozen instructions), reducing latency significantly and allowing us to lock against interrupts cleanly from all APIs. Allocations and frees can be done from ISRs now without limitation (well, obviously you can't sleep, so "timeout" must be K_NO_WAIT). Deterministic performance: there is no more "defragmentation" step that must be manually managed. Block coalescing is done synchronously at free time and takes constant time (strictly log4(num_levels)), as the detection of four free "partner bits" is just a simple shift and mask operation. Cleaner behavior with odd sizes. The old code assumed that the specified maximum size would be a power of four multiple of the minimum size, making use of non-standard buffer sizes problematic. This implementation re-aligns the sub-blocks at each level and can handle situations wehre alignment restrictions mean fewer than 4x will be available. If you want precise layout control, you can still specify the sizes rigorously. It just doesn't break if you don't. More portable: the original implementation made use of GNU assembler macros embedded inline within C __asm__ statements. Not all toolchains are actually backed by a GNU assembler even when the support the GNU assembly syntax. This is pure C, albeit with some hairy macros to expand the compile-time-computed values. Related changes that had to be rolled into this patch for bisectability: * The new allocator has a firm minimum block size of 8 bytes (to store the dlist_node_t). It will "work" with smaller requested min_size values, but obviously makes no firm promises about layout or how many will be available. Unfortunately many of the tests were written with very small 4-byte minimum sizes and to assume exactly how many they could allocate. Bump the sizes to match the allocator minimum. * The mbox and pipes API made use of the internals of k_mem_block and had to be ported to the new scheme. Blocks no longer store a backpointer to the pool that allocated them (it's an integer ID in a bitfield) , so if you want to "nullify" them you have to use the data pointer. * test_mbox_api had a bug were it was prematurely freeing k_mem_blocks that it sent through the mailbox. This worked in the old allocator because the memory wouldn't be touched when freed, but now we stuff list pointers in there and the bug was exposed. * Remove test_mpool_options: the options (related to defragmentation behavior) tested no longer exist. Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2017-05-09 19:42:39 +02:00
if (timeout > 0) {
end = z_tick_get() + z_ms_to_ticks(timeout);
k_mem_pool: Complete rework This patch amounts to a mostly complete rewrite of the k_mem_pool allocator, which had been the source of historical complaints vs. the one easily available in newlib. The basic design of the allocator is unchanged (it's still a 4-way buddy allocator), but the implementation has made different choices throughout. Major changes: Space efficiency: The old implementation required ~2.66 bytes per "smallest block" in overhead, plus 16 bytes per log4 "level" of the allocation tree, plus a global tracking struct of 32 bytes and a very surprising 12 byte overhead (in struct k_mem_block) per active allocation on top of the returned data pointer. This new allocator uses a simple bit array as the only per-block storage and places the free list into the freed blocks themselves, requiring only ~1.33 bits per smallest block, 12 bytes per level, 32 byte globally and only 4 bytes of per-allocation bookeeping. And it puts more of the generated tree into BSS, slightly reducing binary sizes for non-trivial pool sizes (even as the code size itself has increased a tiny bit). IRQ safe: atomic operations on the store have been cut down to be at most "4 bit sets and dlist operations" (i.e. a few dozen instructions), reducing latency significantly and allowing us to lock against interrupts cleanly from all APIs. Allocations and frees can be done from ISRs now without limitation (well, obviously you can't sleep, so "timeout" must be K_NO_WAIT). Deterministic performance: there is no more "defragmentation" step that must be manually managed. Block coalescing is done synchronously at free time and takes constant time (strictly log4(num_levels)), as the detection of four free "partner bits" is just a simple shift and mask operation. Cleaner behavior with odd sizes. The old code assumed that the specified maximum size would be a power of four multiple of the minimum size, making use of non-standard buffer sizes problematic. This implementation re-aligns the sub-blocks at each level and can handle situations wehre alignment restrictions mean fewer than 4x will be available. If you want precise layout control, you can still specify the sizes rigorously. It just doesn't break if you don't. More portable: the original implementation made use of GNU assembler macros embedded inline within C __asm__ statements. Not all toolchains are actually backed by a GNU assembler even when the support the GNU assembly syntax. This is pure C, albeit with some hairy macros to expand the compile-time-computed values. Related changes that had to be rolled into this patch for bisectability: * The new allocator has a firm minimum block size of 8 bytes (to store the dlist_node_t). It will "work" with smaller requested min_size values, but obviously makes no firm promises about layout or how many will be available. Unfortunately many of the tests were written with very small 4-byte minimum sizes and to assume exactly how many they could allocate. Bump the sizes to match the allocator minimum. * The mbox and pipes API made use of the internals of k_mem_block and had to be ported to the new scheme. Blocks no longer store a backpointer to the pool that allocated them (it's an integer ID in a bitfield) , so if you want to "nullify" them you have to use the data pointer. * test_mbox_api had a bug were it was prematurely freeing k_mem_blocks that it sent through the mailbox. This worked in the old allocator because the memory wouldn't be touched when freed, but now we stuff list pointers in there and the bug was exposed. * Remove test_mpool_options: the options (related to defragmentation behavior) tested no longer exist. Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2017-05-09 19:42:39 +02:00
}
while (true) {
u32_t level_num, block_num;
/* There is a "managed race" in alloc that can fail
* (albeit in a well-defined way, see comments there)
* with -EAGAIN when simultaneous allocations happen.
* Retry exactly once before sleeping to resolve it.
* If we're so contended that it fails twice, then we
* clearly want to block.
*/
for (int i = 0; i < 2; i++) {
ret = z_sys_mem_pool_block_alloc(&p->base, size,
&level_num, &block_num,
&block->data);
if (ret != -EAGAIN) {
break;
}
}
if (ret == -EAGAIN) {
ret = -ENOMEM;
}
block->id.pool = pool_id(p);
block->id.level = level_num;
block->id.block = block_num;
k_mem_pool: Complete rework This patch amounts to a mostly complete rewrite of the k_mem_pool allocator, which had been the source of historical complaints vs. the one easily available in newlib. The basic design of the allocator is unchanged (it's still a 4-way buddy allocator), but the implementation has made different choices throughout. Major changes: Space efficiency: The old implementation required ~2.66 bytes per "smallest block" in overhead, plus 16 bytes per log4 "level" of the allocation tree, plus a global tracking struct of 32 bytes and a very surprising 12 byte overhead (in struct k_mem_block) per active allocation on top of the returned data pointer. This new allocator uses a simple bit array as the only per-block storage and places the free list into the freed blocks themselves, requiring only ~1.33 bits per smallest block, 12 bytes per level, 32 byte globally and only 4 bytes of per-allocation bookeeping. And it puts more of the generated tree into BSS, slightly reducing binary sizes for non-trivial pool sizes (even as the code size itself has increased a tiny bit). IRQ safe: atomic operations on the store have been cut down to be at most "4 bit sets and dlist operations" (i.e. a few dozen instructions), reducing latency significantly and allowing us to lock against interrupts cleanly from all APIs. Allocations and frees can be done from ISRs now without limitation (well, obviously you can't sleep, so "timeout" must be K_NO_WAIT). Deterministic performance: there is no more "defragmentation" step that must be manually managed. Block coalescing is done synchronously at free time and takes constant time (strictly log4(num_levels)), as the detection of four free "partner bits" is just a simple shift and mask operation. Cleaner behavior with odd sizes. The old code assumed that the specified maximum size would be a power of four multiple of the minimum size, making use of non-standard buffer sizes problematic. This implementation re-aligns the sub-blocks at each level and can handle situations wehre alignment restrictions mean fewer than 4x will be available. If you want precise layout control, you can still specify the sizes rigorously. It just doesn't break if you don't. More portable: the original implementation made use of GNU assembler macros embedded inline within C __asm__ statements. Not all toolchains are actually backed by a GNU assembler even when the support the GNU assembly syntax. This is pure C, albeit with some hairy macros to expand the compile-time-computed values. Related changes that had to be rolled into this patch for bisectability: * The new allocator has a firm minimum block size of 8 bytes (to store the dlist_node_t). It will "work" with smaller requested min_size values, but obviously makes no firm promises about layout or how many will be available. Unfortunately many of the tests were written with very small 4-byte minimum sizes and to assume exactly how many they could allocate. Bump the sizes to match the allocator minimum. * The mbox and pipes API made use of the internals of k_mem_block and had to be ported to the new scheme. Blocks no longer store a backpointer to the pool that allocated them (it's an integer ID in a bitfield) , so if you want to "nullify" them you have to use the data pointer. * test_mbox_api had a bug were it was prematurely freeing k_mem_blocks that it sent through the mailbox. This worked in the old allocator because the memory wouldn't be touched when freed, but now we stuff list pointers in there and the bug was exposed. * Remove test_mpool_options: the options (related to defragmentation behavior) tested no longer exist. Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2017-05-09 19:42:39 +02:00
if (ret == 0 || timeout == K_NO_WAIT ||
ret != -ENOMEM) {
k_mem_pool: Complete rework This patch amounts to a mostly complete rewrite of the k_mem_pool allocator, which had been the source of historical complaints vs. the one easily available in newlib. The basic design of the allocator is unchanged (it's still a 4-way buddy allocator), but the implementation has made different choices throughout. Major changes: Space efficiency: The old implementation required ~2.66 bytes per "smallest block" in overhead, plus 16 bytes per log4 "level" of the allocation tree, plus a global tracking struct of 32 bytes and a very surprising 12 byte overhead (in struct k_mem_block) per active allocation on top of the returned data pointer. This new allocator uses a simple bit array as the only per-block storage and places the free list into the freed blocks themselves, requiring only ~1.33 bits per smallest block, 12 bytes per level, 32 byte globally and only 4 bytes of per-allocation bookeeping. And it puts more of the generated tree into BSS, slightly reducing binary sizes for non-trivial pool sizes (even as the code size itself has increased a tiny bit). IRQ safe: atomic operations on the store have been cut down to be at most "4 bit sets and dlist operations" (i.e. a few dozen instructions), reducing latency significantly and allowing us to lock against interrupts cleanly from all APIs. Allocations and frees can be done from ISRs now without limitation (well, obviously you can't sleep, so "timeout" must be K_NO_WAIT). Deterministic performance: there is no more "defragmentation" step that must be manually managed. Block coalescing is done synchronously at free time and takes constant time (strictly log4(num_levels)), as the detection of four free "partner bits" is just a simple shift and mask operation. Cleaner behavior with odd sizes. The old code assumed that the specified maximum size would be a power of four multiple of the minimum size, making use of non-standard buffer sizes problematic. This implementation re-aligns the sub-blocks at each level and can handle situations wehre alignment restrictions mean fewer than 4x will be available. If you want precise layout control, you can still specify the sizes rigorously. It just doesn't break if you don't. More portable: the original implementation made use of GNU assembler macros embedded inline within C __asm__ statements. Not all toolchains are actually backed by a GNU assembler even when the support the GNU assembly syntax. This is pure C, albeit with some hairy macros to expand the compile-time-computed values. Related changes that had to be rolled into this patch for bisectability: * The new allocator has a firm minimum block size of 8 bytes (to store the dlist_node_t). It will "work" with smaller requested min_size values, but obviously makes no firm promises about layout or how many will be available. Unfortunately many of the tests were written with very small 4-byte minimum sizes and to assume exactly how many they could allocate. Bump the sizes to match the allocator minimum. * The mbox and pipes API made use of the internals of k_mem_block and had to be ported to the new scheme. Blocks no longer store a backpointer to the pool that allocated them (it's an integer ID in a bitfield) , so if you want to "nullify" them you have to use the data pointer. * test_mbox_api had a bug were it was prematurely freeing k_mem_blocks that it sent through the mailbox. This worked in the old allocator because the memory wouldn't be touched when freed, but now we stuff list pointers in there and the bug was exposed. * Remove test_mpool_options: the options (related to defragmentation behavior) tested no longer exist. Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2017-05-09 19:42:39 +02:00
return ret;
}
z_pend_curr_unlocked(&p->wait_q, timeout);
k_mem_pool: Complete rework This patch amounts to a mostly complete rewrite of the k_mem_pool allocator, which had been the source of historical complaints vs. the one easily available in newlib. The basic design of the allocator is unchanged (it's still a 4-way buddy allocator), but the implementation has made different choices throughout. Major changes: Space efficiency: The old implementation required ~2.66 bytes per "smallest block" in overhead, plus 16 bytes per log4 "level" of the allocation tree, plus a global tracking struct of 32 bytes and a very surprising 12 byte overhead (in struct k_mem_block) per active allocation on top of the returned data pointer. This new allocator uses a simple bit array as the only per-block storage and places the free list into the freed blocks themselves, requiring only ~1.33 bits per smallest block, 12 bytes per level, 32 byte globally and only 4 bytes of per-allocation bookeeping. And it puts more of the generated tree into BSS, slightly reducing binary sizes for non-trivial pool sizes (even as the code size itself has increased a tiny bit). IRQ safe: atomic operations on the store have been cut down to be at most "4 bit sets and dlist operations" (i.e. a few dozen instructions), reducing latency significantly and allowing us to lock against interrupts cleanly from all APIs. Allocations and frees can be done from ISRs now without limitation (well, obviously you can't sleep, so "timeout" must be K_NO_WAIT). Deterministic performance: there is no more "defragmentation" step that must be manually managed. Block coalescing is done synchronously at free time and takes constant time (strictly log4(num_levels)), as the detection of four free "partner bits" is just a simple shift and mask operation. Cleaner behavior with odd sizes. The old code assumed that the specified maximum size would be a power of four multiple of the minimum size, making use of non-standard buffer sizes problematic. This implementation re-aligns the sub-blocks at each level and can handle situations wehre alignment restrictions mean fewer than 4x will be available. If you want precise layout control, you can still specify the sizes rigorously. It just doesn't break if you don't. More portable: the original implementation made use of GNU assembler macros embedded inline within C __asm__ statements. Not all toolchains are actually backed by a GNU assembler even when the support the GNU assembly syntax. This is pure C, albeit with some hairy macros to expand the compile-time-computed values. Related changes that had to be rolled into this patch for bisectability: * The new allocator has a firm minimum block size of 8 bytes (to store the dlist_node_t). It will "work" with smaller requested min_size values, but obviously makes no firm promises about layout or how many will be available. Unfortunately many of the tests were written with very small 4-byte minimum sizes and to assume exactly how many they could allocate. Bump the sizes to match the allocator minimum. * The mbox and pipes API made use of the internals of k_mem_block and had to be ported to the new scheme. Blocks no longer store a backpointer to the pool that allocated them (it's an integer ID in a bitfield) , so if you want to "nullify" them you have to use the data pointer. * test_mbox_api had a bug were it was prematurely freeing k_mem_blocks that it sent through the mailbox. This worked in the old allocator because the memory wouldn't be touched when freed, but now we stuff list pointers in there and the bug was exposed. * Remove test_mpool_options: the options (related to defragmentation behavior) tested no longer exist. Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2017-05-09 19:42:39 +02:00
if (timeout != K_FOREVER) {
timeout = end - z_tick_get();
k_mem_pool: Complete rework This patch amounts to a mostly complete rewrite of the k_mem_pool allocator, which had been the source of historical complaints vs. the one easily available in newlib. The basic design of the allocator is unchanged (it's still a 4-way buddy allocator), but the implementation has made different choices throughout. Major changes: Space efficiency: The old implementation required ~2.66 bytes per "smallest block" in overhead, plus 16 bytes per log4 "level" of the allocation tree, plus a global tracking struct of 32 bytes and a very surprising 12 byte overhead (in struct k_mem_block) per active allocation on top of the returned data pointer. This new allocator uses a simple bit array as the only per-block storage and places the free list into the freed blocks themselves, requiring only ~1.33 bits per smallest block, 12 bytes per level, 32 byte globally and only 4 bytes of per-allocation bookeeping. And it puts more of the generated tree into BSS, slightly reducing binary sizes for non-trivial pool sizes (even as the code size itself has increased a tiny bit). IRQ safe: atomic operations on the store have been cut down to be at most "4 bit sets and dlist operations" (i.e. a few dozen instructions), reducing latency significantly and allowing us to lock against interrupts cleanly from all APIs. Allocations and frees can be done from ISRs now without limitation (well, obviously you can't sleep, so "timeout" must be K_NO_WAIT). Deterministic performance: there is no more "defragmentation" step that must be manually managed. Block coalescing is done synchronously at free time and takes constant time (strictly log4(num_levels)), as the detection of four free "partner bits" is just a simple shift and mask operation. Cleaner behavior with odd sizes. The old code assumed that the specified maximum size would be a power of four multiple of the minimum size, making use of non-standard buffer sizes problematic. This implementation re-aligns the sub-blocks at each level and can handle situations wehre alignment restrictions mean fewer than 4x will be available. If you want precise layout control, you can still specify the sizes rigorously. It just doesn't break if you don't. More portable: the original implementation made use of GNU assembler macros embedded inline within C __asm__ statements. Not all toolchains are actually backed by a GNU assembler even when the support the GNU assembly syntax. This is pure C, albeit with some hairy macros to expand the compile-time-computed values. Related changes that had to be rolled into this patch for bisectability: * The new allocator has a firm minimum block size of 8 bytes (to store the dlist_node_t). It will "work" with smaller requested min_size values, but obviously makes no firm promises about layout or how many will be available. Unfortunately many of the tests were written with very small 4-byte minimum sizes and to assume exactly how many they could allocate. Bump the sizes to match the allocator minimum. * The mbox and pipes API made use of the internals of k_mem_block and had to be ported to the new scheme. Blocks no longer store a backpointer to the pool that allocated them (it's an integer ID in a bitfield) , so if you want to "nullify" them you have to use the data pointer. * test_mbox_api had a bug were it was prematurely freeing k_mem_blocks that it sent through the mailbox. This worked in the old allocator because the memory wouldn't be touched when freed, but now we stuff list pointers in there and the bug was exposed. * Remove test_mpool_options: the options (related to defragmentation behavior) tested no longer exist. Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2017-05-09 19:42:39 +02:00
if (timeout < 0) {
break;
}
}
}
return -EAGAIN;
}
void k_mem_pool_free_id(struct k_mem_block_id *id)
k_mem_pool: Complete rework This patch amounts to a mostly complete rewrite of the k_mem_pool allocator, which had been the source of historical complaints vs. the one easily available in newlib. The basic design of the allocator is unchanged (it's still a 4-way buddy allocator), but the implementation has made different choices throughout. Major changes: Space efficiency: The old implementation required ~2.66 bytes per "smallest block" in overhead, plus 16 bytes per log4 "level" of the allocation tree, plus a global tracking struct of 32 bytes and a very surprising 12 byte overhead (in struct k_mem_block) per active allocation on top of the returned data pointer. This new allocator uses a simple bit array as the only per-block storage and places the free list into the freed blocks themselves, requiring only ~1.33 bits per smallest block, 12 bytes per level, 32 byte globally and only 4 bytes of per-allocation bookeeping. And it puts more of the generated tree into BSS, slightly reducing binary sizes for non-trivial pool sizes (even as the code size itself has increased a tiny bit). IRQ safe: atomic operations on the store have been cut down to be at most "4 bit sets and dlist operations" (i.e. a few dozen instructions), reducing latency significantly and allowing us to lock against interrupts cleanly from all APIs. Allocations and frees can be done from ISRs now without limitation (well, obviously you can't sleep, so "timeout" must be K_NO_WAIT). Deterministic performance: there is no more "defragmentation" step that must be manually managed. Block coalescing is done synchronously at free time and takes constant time (strictly log4(num_levels)), as the detection of four free "partner bits" is just a simple shift and mask operation. Cleaner behavior with odd sizes. The old code assumed that the specified maximum size would be a power of four multiple of the minimum size, making use of non-standard buffer sizes problematic. This implementation re-aligns the sub-blocks at each level and can handle situations wehre alignment restrictions mean fewer than 4x will be available. If you want precise layout control, you can still specify the sizes rigorously. It just doesn't break if you don't. More portable: the original implementation made use of GNU assembler macros embedded inline within C __asm__ statements. Not all toolchains are actually backed by a GNU assembler even when the support the GNU assembly syntax. This is pure C, albeit with some hairy macros to expand the compile-time-computed values. Related changes that had to be rolled into this patch for bisectability: * The new allocator has a firm minimum block size of 8 bytes (to store the dlist_node_t). It will "work" with smaller requested min_size values, but obviously makes no firm promises about layout or how many will be available. Unfortunately many of the tests were written with very small 4-byte minimum sizes and to assume exactly how many they could allocate. Bump the sizes to match the allocator minimum. * The mbox and pipes API made use of the internals of k_mem_block and had to be ported to the new scheme. Blocks no longer store a backpointer to the pool that allocated them (it's an integer ID in a bitfield) , so if you want to "nullify" them you have to use the data pointer. * test_mbox_api had a bug were it was prematurely freeing k_mem_blocks that it sent through the mailbox. This worked in the old allocator because the memory wouldn't be touched when freed, but now we stuff list pointers in there and the bug was exposed. * Remove test_mpool_options: the options (related to defragmentation behavior) tested no longer exist. Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2017-05-09 19:42:39 +02:00
{
int need_sched = 0;
struct k_mem_pool *p = get_pool(id->pool);
k_mem_pool: Complete rework This patch amounts to a mostly complete rewrite of the k_mem_pool allocator, which had been the source of historical complaints vs. the one easily available in newlib. The basic design of the allocator is unchanged (it's still a 4-way buddy allocator), but the implementation has made different choices throughout. Major changes: Space efficiency: The old implementation required ~2.66 bytes per "smallest block" in overhead, plus 16 bytes per log4 "level" of the allocation tree, plus a global tracking struct of 32 bytes and a very surprising 12 byte overhead (in struct k_mem_block) per active allocation on top of the returned data pointer. This new allocator uses a simple bit array as the only per-block storage and places the free list into the freed blocks themselves, requiring only ~1.33 bits per smallest block, 12 bytes per level, 32 byte globally and only 4 bytes of per-allocation bookeeping. And it puts more of the generated tree into BSS, slightly reducing binary sizes for non-trivial pool sizes (even as the code size itself has increased a tiny bit). IRQ safe: atomic operations on the store have been cut down to be at most "4 bit sets and dlist operations" (i.e. a few dozen instructions), reducing latency significantly and allowing us to lock against interrupts cleanly from all APIs. Allocations and frees can be done from ISRs now without limitation (well, obviously you can't sleep, so "timeout" must be K_NO_WAIT). Deterministic performance: there is no more "defragmentation" step that must be manually managed. Block coalescing is done synchronously at free time and takes constant time (strictly log4(num_levels)), as the detection of four free "partner bits" is just a simple shift and mask operation. Cleaner behavior with odd sizes. The old code assumed that the specified maximum size would be a power of four multiple of the minimum size, making use of non-standard buffer sizes problematic. This implementation re-aligns the sub-blocks at each level and can handle situations wehre alignment restrictions mean fewer than 4x will be available. If you want precise layout control, you can still specify the sizes rigorously. It just doesn't break if you don't. More portable: the original implementation made use of GNU assembler macros embedded inline within C __asm__ statements. Not all toolchains are actually backed by a GNU assembler even when the support the GNU assembly syntax. This is pure C, albeit with some hairy macros to expand the compile-time-computed values. Related changes that had to be rolled into this patch for bisectability: * The new allocator has a firm minimum block size of 8 bytes (to store the dlist_node_t). It will "work" with smaller requested min_size values, but obviously makes no firm promises about layout or how many will be available. Unfortunately many of the tests were written with very small 4-byte minimum sizes and to assume exactly how many they could allocate. Bump the sizes to match the allocator minimum. * The mbox and pipes API made use of the internals of k_mem_block and had to be ported to the new scheme. Blocks no longer store a backpointer to the pool that allocated them (it's an integer ID in a bitfield) , so if you want to "nullify" them you have to use the data pointer. * test_mbox_api had a bug were it was prematurely freeing k_mem_blocks that it sent through the mailbox. This worked in the old allocator because the memory wouldn't be touched when freed, but now we stuff list pointers in there and the bug was exposed. * Remove test_mpool_options: the options (related to defragmentation behavior) tested no longer exist. Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2017-05-09 19:42:39 +02:00
z_sys_mem_pool_block_free(&p->base, id->level, id->block);
k_mem_pool: Complete rework This patch amounts to a mostly complete rewrite of the k_mem_pool allocator, which had been the source of historical complaints vs. the one easily available in newlib. The basic design of the allocator is unchanged (it's still a 4-way buddy allocator), but the implementation has made different choices throughout. Major changes: Space efficiency: The old implementation required ~2.66 bytes per "smallest block" in overhead, plus 16 bytes per log4 "level" of the allocation tree, plus a global tracking struct of 32 bytes and a very surprising 12 byte overhead (in struct k_mem_block) per active allocation on top of the returned data pointer. This new allocator uses a simple bit array as the only per-block storage and places the free list into the freed blocks themselves, requiring only ~1.33 bits per smallest block, 12 bytes per level, 32 byte globally and only 4 bytes of per-allocation bookeeping. And it puts more of the generated tree into BSS, slightly reducing binary sizes for non-trivial pool sizes (even as the code size itself has increased a tiny bit). IRQ safe: atomic operations on the store have been cut down to be at most "4 bit sets and dlist operations" (i.e. a few dozen instructions), reducing latency significantly and allowing us to lock against interrupts cleanly from all APIs. Allocations and frees can be done from ISRs now without limitation (well, obviously you can't sleep, so "timeout" must be K_NO_WAIT). Deterministic performance: there is no more "defragmentation" step that must be manually managed. Block coalescing is done synchronously at free time and takes constant time (strictly log4(num_levels)), as the detection of four free "partner bits" is just a simple shift and mask operation. Cleaner behavior with odd sizes. The old code assumed that the specified maximum size would be a power of four multiple of the minimum size, making use of non-standard buffer sizes problematic. This implementation re-aligns the sub-blocks at each level and can handle situations wehre alignment restrictions mean fewer than 4x will be available. If you want precise layout control, you can still specify the sizes rigorously. It just doesn't break if you don't. More portable: the original implementation made use of GNU assembler macros embedded inline within C __asm__ statements. Not all toolchains are actually backed by a GNU assembler even when the support the GNU assembly syntax. This is pure C, albeit with some hairy macros to expand the compile-time-computed values. Related changes that had to be rolled into this patch for bisectability: * The new allocator has a firm minimum block size of 8 bytes (to store the dlist_node_t). It will "work" with smaller requested min_size values, but obviously makes no firm promises about layout or how many will be available. Unfortunately many of the tests were written with very small 4-byte minimum sizes and to assume exactly how many they could allocate. Bump the sizes to match the allocator minimum. * The mbox and pipes API made use of the internals of k_mem_block and had to be ported to the new scheme. Blocks no longer store a backpointer to the pool that allocated them (it's an integer ID in a bitfield) , so if you want to "nullify" them you have to use the data pointer. * test_mbox_api had a bug were it was prematurely freeing k_mem_blocks that it sent through the mailbox. This worked in the old allocator because the memory wouldn't be touched when freed, but now we stuff list pointers in there and the bug was exposed. * Remove test_mpool_options: the options (related to defragmentation behavior) tested no longer exist. Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2017-05-09 19:42:39 +02:00
/* Wake up anyone blocked on this pool and let them repeat
* their allocation attempts
*
* (Note that this spinlock only exists because z_unpend_all()
* is unsynchronized. Maybe we want to put the lock into the
* wait_q instead and make the API safe?)
k_mem_pool: Complete rework This patch amounts to a mostly complete rewrite of the k_mem_pool allocator, which had been the source of historical complaints vs. the one easily available in newlib. The basic design of the allocator is unchanged (it's still a 4-way buddy allocator), but the implementation has made different choices throughout. Major changes: Space efficiency: The old implementation required ~2.66 bytes per "smallest block" in overhead, plus 16 bytes per log4 "level" of the allocation tree, plus a global tracking struct of 32 bytes and a very surprising 12 byte overhead (in struct k_mem_block) per active allocation on top of the returned data pointer. This new allocator uses a simple bit array as the only per-block storage and places the free list into the freed blocks themselves, requiring only ~1.33 bits per smallest block, 12 bytes per level, 32 byte globally and only 4 bytes of per-allocation bookeeping. And it puts more of the generated tree into BSS, slightly reducing binary sizes for non-trivial pool sizes (even as the code size itself has increased a tiny bit). IRQ safe: atomic operations on the store have been cut down to be at most "4 bit sets and dlist operations" (i.e. a few dozen instructions), reducing latency significantly and allowing us to lock against interrupts cleanly from all APIs. Allocations and frees can be done from ISRs now without limitation (well, obviously you can't sleep, so "timeout" must be K_NO_WAIT). Deterministic performance: there is no more "defragmentation" step that must be manually managed. Block coalescing is done synchronously at free time and takes constant time (strictly log4(num_levels)), as the detection of four free "partner bits" is just a simple shift and mask operation. Cleaner behavior with odd sizes. The old code assumed that the specified maximum size would be a power of four multiple of the minimum size, making use of non-standard buffer sizes problematic. This implementation re-aligns the sub-blocks at each level and can handle situations wehre alignment restrictions mean fewer than 4x will be available. If you want precise layout control, you can still specify the sizes rigorously. It just doesn't break if you don't. More portable: the original implementation made use of GNU assembler macros embedded inline within C __asm__ statements. Not all toolchains are actually backed by a GNU assembler even when the support the GNU assembly syntax. This is pure C, albeit with some hairy macros to expand the compile-time-computed values. Related changes that had to be rolled into this patch for bisectability: * The new allocator has a firm minimum block size of 8 bytes (to store the dlist_node_t). It will "work" with smaller requested min_size values, but obviously makes no firm promises about layout or how many will be available. Unfortunately many of the tests were written with very small 4-byte minimum sizes and to assume exactly how many they could allocate. Bump the sizes to match the allocator minimum. * The mbox and pipes API made use of the internals of k_mem_block and had to be ported to the new scheme. Blocks no longer store a backpointer to the pool that allocated them (it's an integer ID in a bitfield) , so if you want to "nullify" them you have to use the data pointer. * test_mbox_api had a bug were it was prematurely freeing k_mem_blocks that it sent through the mailbox. This worked in the old allocator because the memory wouldn't be touched when freed, but now we stuff list pointers in there and the bug was exposed. * Remove test_mpool_options: the options (related to defragmentation behavior) tested no longer exist. Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2017-05-09 19:42:39 +02:00
*/
k_spinlock_key_t key = k_spin_lock(&lock);
k_mem_pool: Complete rework This patch amounts to a mostly complete rewrite of the k_mem_pool allocator, which had been the source of historical complaints vs. the one easily available in newlib. The basic design of the allocator is unchanged (it's still a 4-way buddy allocator), but the implementation has made different choices throughout. Major changes: Space efficiency: The old implementation required ~2.66 bytes per "smallest block" in overhead, plus 16 bytes per log4 "level" of the allocation tree, plus a global tracking struct of 32 bytes and a very surprising 12 byte overhead (in struct k_mem_block) per active allocation on top of the returned data pointer. This new allocator uses a simple bit array as the only per-block storage and places the free list into the freed blocks themselves, requiring only ~1.33 bits per smallest block, 12 bytes per level, 32 byte globally and only 4 bytes of per-allocation bookeeping. And it puts more of the generated tree into BSS, slightly reducing binary sizes for non-trivial pool sizes (even as the code size itself has increased a tiny bit). IRQ safe: atomic operations on the store have been cut down to be at most "4 bit sets and dlist operations" (i.e. a few dozen instructions), reducing latency significantly and allowing us to lock against interrupts cleanly from all APIs. Allocations and frees can be done from ISRs now without limitation (well, obviously you can't sleep, so "timeout" must be K_NO_WAIT). Deterministic performance: there is no more "defragmentation" step that must be manually managed. Block coalescing is done synchronously at free time and takes constant time (strictly log4(num_levels)), as the detection of four free "partner bits" is just a simple shift and mask operation. Cleaner behavior with odd sizes. The old code assumed that the specified maximum size would be a power of four multiple of the minimum size, making use of non-standard buffer sizes problematic. This implementation re-aligns the sub-blocks at each level and can handle situations wehre alignment restrictions mean fewer than 4x will be available. If you want precise layout control, you can still specify the sizes rigorously. It just doesn't break if you don't. More portable: the original implementation made use of GNU assembler macros embedded inline within C __asm__ statements. Not all toolchains are actually backed by a GNU assembler even when the support the GNU assembly syntax. This is pure C, albeit with some hairy macros to expand the compile-time-computed values. Related changes that had to be rolled into this patch for bisectability: * The new allocator has a firm minimum block size of 8 bytes (to store the dlist_node_t). It will "work" with smaller requested min_size values, but obviously makes no firm promises about layout or how many will be available. Unfortunately many of the tests were written with very small 4-byte minimum sizes and to assume exactly how many they could allocate. Bump the sizes to match the allocator minimum. * The mbox and pipes API made use of the internals of k_mem_block and had to be ported to the new scheme. Blocks no longer store a backpointer to the pool that allocated them (it's an integer ID in a bitfield) , so if you want to "nullify" them you have to use the data pointer. * test_mbox_api had a bug were it was prematurely freeing k_mem_blocks that it sent through the mailbox. This worked in the old allocator because the memory wouldn't be touched when freed, but now we stuff list pointers in there and the bug was exposed. * Remove test_mpool_options: the options (related to defragmentation behavior) tested no longer exist. Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2017-05-09 19:42:39 +02:00
need_sched = z_unpend_all(&p->wait_q);
k_mem_pool: Complete rework This patch amounts to a mostly complete rewrite of the k_mem_pool allocator, which had been the source of historical complaints vs. the one easily available in newlib. The basic design of the allocator is unchanged (it's still a 4-way buddy allocator), but the implementation has made different choices throughout. Major changes: Space efficiency: The old implementation required ~2.66 bytes per "smallest block" in overhead, plus 16 bytes per log4 "level" of the allocation tree, plus a global tracking struct of 32 bytes and a very surprising 12 byte overhead (in struct k_mem_block) per active allocation on top of the returned data pointer. This new allocator uses a simple bit array as the only per-block storage and places the free list into the freed blocks themselves, requiring only ~1.33 bits per smallest block, 12 bytes per level, 32 byte globally and only 4 bytes of per-allocation bookeeping. And it puts more of the generated tree into BSS, slightly reducing binary sizes for non-trivial pool sizes (even as the code size itself has increased a tiny bit). IRQ safe: atomic operations on the store have been cut down to be at most "4 bit sets and dlist operations" (i.e. a few dozen instructions), reducing latency significantly and allowing us to lock against interrupts cleanly from all APIs. Allocations and frees can be done from ISRs now without limitation (well, obviously you can't sleep, so "timeout" must be K_NO_WAIT). Deterministic performance: there is no more "defragmentation" step that must be manually managed. Block coalescing is done synchronously at free time and takes constant time (strictly log4(num_levels)), as the detection of four free "partner bits" is just a simple shift and mask operation. Cleaner behavior with odd sizes. The old code assumed that the specified maximum size would be a power of four multiple of the minimum size, making use of non-standard buffer sizes problematic. This implementation re-aligns the sub-blocks at each level and can handle situations wehre alignment restrictions mean fewer than 4x will be available. If you want precise layout control, you can still specify the sizes rigorously. It just doesn't break if you don't. More portable: the original implementation made use of GNU assembler macros embedded inline within C __asm__ statements. Not all toolchains are actually backed by a GNU assembler even when the support the GNU assembly syntax. This is pure C, albeit with some hairy macros to expand the compile-time-computed values. Related changes that had to be rolled into this patch for bisectability: * The new allocator has a firm minimum block size of 8 bytes (to store the dlist_node_t). It will "work" with smaller requested min_size values, but obviously makes no firm promises about layout or how many will be available. Unfortunately many of the tests were written with very small 4-byte minimum sizes and to assume exactly how many they could allocate. Bump the sizes to match the allocator minimum. * The mbox and pipes API made use of the internals of k_mem_block and had to be ported to the new scheme. Blocks no longer store a backpointer to the pool that allocated them (it's an integer ID in a bitfield) , so if you want to "nullify" them you have to use the data pointer. * test_mbox_api had a bug were it was prematurely freeing k_mem_blocks that it sent through the mailbox. This worked in the old allocator because the memory wouldn't be touched when freed, but now we stuff list pointers in there and the bug was exposed. * Remove test_mpool_options: the options (related to defragmentation behavior) tested no longer exist. Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2017-05-09 19:42:39 +02:00
if (need_sched) {
z_reschedule(&lock, key);
k_mem_pool: Complete rework This patch amounts to a mostly complete rewrite of the k_mem_pool allocator, which had been the source of historical complaints vs. the one easily available in newlib. The basic design of the allocator is unchanged (it's still a 4-way buddy allocator), but the implementation has made different choices throughout. Major changes: Space efficiency: The old implementation required ~2.66 bytes per "smallest block" in overhead, plus 16 bytes per log4 "level" of the allocation tree, plus a global tracking struct of 32 bytes and a very surprising 12 byte overhead (in struct k_mem_block) per active allocation on top of the returned data pointer. This new allocator uses a simple bit array as the only per-block storage and places the free list into the freed blocks themselves, requiring only ~1.33 bits per smallest block, 12 bytes per level, 32 byte globally and only 4 bytes of per-allocation bookeeping. And it puts more of the generated tree into BSS, slightly reducing binary sizes for non-trivial pool sizes (even as the code size itself has increased a tiny bit). IRQ safe: atomic operations on the store have been cut down to be at most "4 bit sets and dlist operations" (i.e. a few dozen instructions), reducing latency significantly and allowing us to lock against interrupts cleanly from all APIs. Allocations and frees can be done from ISRs now without limitation (well, obviously you can't sleep, so "timeout" must be K_NO_WAIT). Deterministic performance: there is no more "defragmentation" step that must be manually managed. Block coalescing is done synchronously at free time and takes constant time (strictly log4(num_levels)), as the detection of four free "partner bits" is just a simple shift and mask operation. Cleaner behavior with odd sizes. The old code assumed that the specified maximum size would be a power of four multiple of the minimum size, making use of non-standard buffer sizes problematic. This implementation re-aligns the sub-blocks at each level and can handle situations wehre alignment restrictions mean fewer than 4x will be available. If you want precise layout control, you can still specify the sizes rigorously. It just doesn't break if you don't. More portable: the original implementation made use of GNU assembler macros embedded inline within C __asm__ statements. Not all toolchains are actually backed by a GNU assembler even when the support the GNU assembly syntax. This is pure C, albeit with some hairy macros to expand the compile-time-computed values. Related changes that had to be rolled into this patch for bisectability: * The new allocator has a firm minimum block size of 8 bytes (to store the dlist_node_t). It will "work" with smaller requested min_size values, but obviously makes no firm promises about layout or how many will be available. Unfortunately many of the tests were written with very small 4-byte minimum sizes and to assume exactly how many they could allocate. Bump the sizes to match the allocator minimum. * The mbox and pipes API made use of the internals of k_mem_block and had to be ported to the new scheme. Blocks no longer store a backpointer to the pool that allocated them (it's an integer ID in a bitfield) , so if you want to "nullify" them you have to use the data pointer. * test_mbox_api had a bug were it was prematurely freeing k_mem_blocks that it sent through the mailbox. This worked in the old allocator because the memory wouldn't be touched when freed, but now we stuff list pointers in there and the bug was exposed. * Remove test_mpool_options: the options (related to defragmentation behavior) tested no longer exist. Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2017-05-09 19:42:39 +02:00
} else {
k_spin_unlock(&lock, key);
k_mem_pool: Complete rework This patch amounts to a mostly complete rewrite of the k_mem_pool allocator, which had been the source of historical complaints vs. the one easily available in newlib. The basic design of the allocator is unchanged (it's still a 4-way buddy allocator), but the implementation has made different choices throughout. Major changes: Space efficiency: The old implementation required ~2.66 bytes per "smallest block" in overhead, plus 16 bytes per log4 "level" of the allocation tree, plus a global tracking struct of 32 bytes and a very surprising 12 byte overhead (in struct k_mem_block) per active allocation on top of the returned data pointer. This new allocator uses a simple bit array as the only per-block storage and places the free list into the freed blocks themselves, requiring only ~1.33 bits per smallest block, 12 bytes per level, 32 byte globally and only 4 bytes of per-allocation bookeeping. And it puts more of the generated tree into BSS, slightly reducing binary sizes for non-trivial pool sizes (even as the code size itself has increased a tiny bit). IRQ safe: atomic operations on the store have been cut down to be at most "4 bit sets and dlist operations" (i.e. a few dozen instructions), reducing latency significantly and allowing us to lock against interrupts cleanly from all APIs. Allocations and frees can be done from ISRs now without limitation (well, obviously you can't sleep, so "timeout" must be K_NO_WAIT). Deterministic performance: there is no more "defragmentation" step that must be manually managed. Block coalescing is done synchronously at free time and takes constant time (strictly log4(num_levels)), as the detection of four free "partner bits" is just a simple shift and mask operation. Cleaner behavior with odd sizes. The old code assumed that the specified maximum size would be a power of four multiple of the minimum size, making use of non-standard buffer sizes problematic. This implementation re-aligns the sub-blocks at each level and can handle situations wehre alignment restrictions mean fewer than 4x will be available. If you want precise layout control, you can still specify the sizes rigorously. It just doesn't break if you don't. More portable: the original implementation made use of GNU assembler macros embedded inline within C __asm__ statements. Not all toolchains are actually backed by a GNU assembler even when the support the GNU assembly syntax. This is pure C, albeit with some hairy macros to expand the compile-time-computed values. Related changes that had to be rolled into this patch for bisectability: * The new allocator has a firm minimum block size of 8 bytes (to store the dlist_node_t). It will "work" with smaller requested min_size values, but obviously makes no firm promises about layout or how many will be available. Unfortunately many of the tests were written with very small 4-byte minimum sizes and to assume exactly how many they could allocate. Bump the sizes to match the allocator minimum. * The mbox and pipes API made use of the internals of k_mem_block and had to be ported to the new scheme. Blocks no longer store a backpointer to the pool that allocated them (it's an integer ID in a bitfield) , so if you want to "nullify" them you have to use the data pointer. * test_mbox_api had a bug were it was prematurely freeing k_mem_blocks that it sent through the mailbox. This worked in the old allocator because the memory wouldn't be touched when freed, but now we stuff list pointers in there and the bug was exposed. * Remove test_mpool_options: the options (related to defragmentation behavior) tested no longer exist. Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2017-05-09 19:42:39 +02:00
}
}
void k_mem_pool_free(struct k_mem_block *block)
{
k_mem_pool_free_id(&block->id);
}
void *k_mem_pool_malloc(struct k_mem_pool *pool, size_t size)
k_mem_pool: Complete rework This patch amounts to a mostly complete rewrite of the k_mem_pool allocator, which had been the source of historical complaints vs. the one easily available in newlib. The basic design of the allocator is unchanged (it's still a 4-way buddy allocator), but the implementation has made different choices throughout. Major changes: Space efficiency: The old implementation required ~2.66 bytes per "smallest block" in overhead, plus 16 bytes per log4 "level" of the allocation tree, plus a global tracking struct of 32 bytes and a very surprising 12 byte overhead (in struct k_mem_block) per active allocation on top of the returned data pointer. This new allocator uses a simple bit array as the only per-block storage and places the free list into the freed blocks themselves, requiring only ~1.33 bits per smallest block, 12 bytes per level, 32 byte globally and only 4 bytes of per-allocation bookeeping. And it puts more of the generated tree into BSS, slightly reducing binary sizes for non-trivial pool sizes (even as the code size itself has increased a tiny bit). IRQ safe: atomic operations on the store have been cut down to be at most "4 bit sets and dlist operations" (i.e. a few dozen instructions), reducing latency significantly and allowing us to lock against interrupts cleanly from all APIs. Allocations and frees can be done from ISRs now without limitation (well, obviously you can't sleep, so "timeout" must be K_NO_WAIT). Deterministic performance: there is no more "defragmentation" step that must be manually managed. Block coalescing is done synchronously at free time and takes constant time (strictly log4(num_levels)), as the detection of four free "partner bits" is just a simple shift and mask operation. Cleaner behavior with odd sizes. The old code assumed that the specified maximum size would be a power of four multiple of the minimum size, making use of non-standard buffer sizes problematic. This implementation re-aligns the sub-blocks at each level and can handle situations wehre alignment restrictions mean fewer than 4x will be available. If you want precise layout control, you can still specify the sizes rigorously. It just doesn't break if you don't. More portable: the original implementation made use of GNU assembler macros embedded inline within C __asm__ statements. Not all toolchains are actually backed by a GNU assembler even when the support the GNU assembly syntax. This is pure C, albeit with some hairy macros to expand the compile-time-computed values. Related changes that had to be rolled into this patch for bisectability: * The new allocator has a firm minimum block size of 8 bytes (to store the dlist_node_t). It will "work" with smaller requested min_size values, but obviously makes no firm promises about layout or how many will be available. Unfortunately many of the tests were written with very small 4-byte minimum sizes and to assume exactly how many they could allocate. Bump the sizes to match the allocator minimum. * The mbox and pipes API made use of the internals of k_mem_block and had to be ported to the new scheme. Blocks no longer store a backpointer to the pool that allocated them (it's an integer ID in a bitfield) , so if you want to "nullify" them you have to use the data pointer. * test_mbox_api had a bug were it was prematurely freeing k_mem_blocks that it sent through the mailbox. This worked in the old allocator because the memory wouldn't be touched when freed, but now we stuff list pointers in there and the bug was exposed. * Remove test_mpool_options: the options (related to defragmentation behavior) tested no longer exist. Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2017-05-09 19:42:39 +02:00
{
struct k_mem_block block;
/*
* get a block large enough to hold an initial (hidden) block
* descriptor, as well as the space the caller requested
*/
if (__builtin_add_overflow(size, sizeof(struct k_mem_block_id),
&size)) {
return NULL;
}
if (k_mem_pool_alloc(pool, &block, size, K_NO_WAIT) != 0) {
k_mem_pool: Complete rework This patch amounts to a mostly complete rewrite of the k_mem_pool allocator, which had been the source of historical complaints vs. the one easily available in newlib. The basic design of the allocator is unchanged (it's still a 4-way buddy allocator), but the implementation has made different choices throughout. Major changes: Space efficiency: The old implementation required ~2.66 bytes per "smallest block" in overhead, plus 16 bytes per log4 "level" of the allocation tree, plus a global tracking struct of 32 bytes and a very surprising 12 byte overhead (in struct k_mem_block) per active allocation on top of the returned data pointer. This new allocator uses a simple bit array as the only per-block storage and places the free list into the freed blocks themselves, requiring only ~1.33 bits per smallest block, 12 bytes per level, 32 byte globally and only 4 bytes of per-allocation bookeeping. And it puts more of the generated tree into BSS, slightly reducing binary sizes for non-trivial pool sizes (even as the code size itself has increased a tiny bit). IRQ safe: atomic operations on the store have been cut down to be at most "4 bit sets and dlist operations" (i.e. a few dozen instructions), reducing latency significantly and allowing us to lock against interrupts cleanly from all APIs. Allocations and frees can be done from ISRs now without limitation (well, obviously you can't sleep, so "timeout" must be K_NO_WAIT). Deterministic performance: there is no more "defragmentation" step that must be manually managed. Block coalescing is done synchronously at free time and takes constant time (strictly log4(num_levels)), as the detection of four free "partner bits" is just a simple shift and mask operation. Cleaner behavior with odd sizes. The old code assumed that the specified maximum size would be a power of four multiple of the minimum size, making use of non-standard buffer sizes problematic. This implementation re-aligns the sub-blocks at each level and can handle situations wehre alignment restrictions mean fewer than 4x will be available. If you want precise layout control, you can still specify the sizes rigorously. It just doesn't break if you don't. More portable: the original implementation made use of GNU assembler macros embedded inline within C __asm__ statements. Not all toolchains are actually backed by a GNU assembler even when the support the GNU assembly syntax. This is pure C, albeit with some hairy macros to expand the compile-time-computed values. Related changes that had to be rolled into this patch for bisectability: * The new allocator has a firm minimum block size of 8 bytes (to store the dlist_node_t). It will "work" with smaller requested min_size values, but obviously makes no firm promises about layout or how many will be available. Unfortunately many of the tests were written with very small 4-byte minimum sizes and to assume exactly how many they could allocate. Bump the sizes to match the allocator minimum. * The mbox and pipes API made use of the internals of k_mem_block and had to be ported to the new scheme. Blocks no longer store a backpointer to the pool that allocated them (it's an integer ID in a bitfield) , so if you want to "nullify" them you have to use the data pointer. * test_mbox_api had a bug were it was prematurely freeing k_mem_blocks that it sent through the mailbox. This worked in the old allocator because the memory wouldn't be touched when freed, but now we stuff list pointers in there and the bug was exposed. * Remove test_mpool_options: the options (related to defragmentation behavior) tested no longer exist. Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2017-05-09 19:42:39 +02:00
return NULL;
}
/* save the block descriptor info at the start of the actual block */
(void)memcpy(block.data, &block.id, sizeof(struct k_mem_block_id));
k_mem_pool: Complete rework This patch amounts to a mostly complete rewrite of the k_mem_pool allocator, which had been the source of historical complaints vs. the one easily available in newlib. The basic design of the allocator is unchanged (it's still a 4-way buddy allocator), but the implementation has made different choices throughout. Major changes: Space efficiency: The old implementation required ~2.66 bytes per "smallest block" in overhead, plus 16 bytes per log4 "level" of the allocation tree, plus a global tracking struct of 32 bytes and a very surprising 12 byte overhead (in struct k_mem_block) per active allocation on top of the returned data pointer. This new allocator uses a simple bit array as the only per-block storage and places the free list into the freed blocks themselves, requiring only ~1.33 bits per smallest block, 12 bytes per level, 32 byte globally and only 4 bytes of per-allocation bookeeping. And it puts more of the generated tree into BSS, slightly reducing binary sizes for non-trivial pool sizes (even as the code size itself has increased a tiny bit). IRQ safe: atomic operations on the store have been cut down to be at most "4 bit sets and dlist operations" (i.e. a few dozen instructions), reducing latency significantly and allowing us to lock against interrupts cleanly from all APIs. Allocations and frees can be done from ISRs now without limitation (well, obviously you can't sleep, so "timeout" must be K_NO_WAIT). Deterministic performance: there is no more "defragmentation" step that must be manually managed. Block coalescing is done synchronously at free time and takes constant time (strictly log4(num_levels)), as the detection of four free "partner bits" is just a simple shift and mask operation. Cleaner behavior with odd sizes. The old code assumed that the specified maximum size would be a power of four multiple of the minimum size, making use of non-standard buffer sizes problematic. This implementation re-aligns the sub-blocks at each level and can handle situations wehre alignment restrictions mean fewer than 4x will be available. If you want precise layout control, you can still specify the sizes rigorously. It just doesn't break if you don't. More portable: the original implementation made use of GNU assembler macros embedded inline within C __asm__ statements. Not all toolchains are actually backed by a GNU assembler even when the support the GNU assembly syntax. This is pure C, albeit with some hairy macros to expand the compile-time-computed values. Related changes that had to be rolled into this patch for bisectability: * The new allocator has a firm minimum block size of 8 bytes (to store the dlist_node_t). It will "work" with smaller requested min_size values, but obviously makes no firm promises about layout or how many will be available. Unfortunately many of the tests were written with very small 4-byte minimum sizes and to assume exactly how many they could allocate. Bump the sizes to match the allocator minimum. * The mbox and pipes API made use of the internals of k_mem_block and had to be ported to the new scheme. Blocks no longer store a backpointer to the pool that allocated them (it's an integer ID in a bitfield) , so if you want to "nullify" them you have to use the data pointer. * test_mbox_api had a bug were it was prematurely freeing k_mem_blocks that it sent through the mailbox. This worked in the old allocator because the memory wouldn't be touched when freed, but now we stuff list pointers in there and the bug was exposed. * Remove test_mpool_options: the options (related to defragmentation behavior) tested no longer exist. Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2017-05-09 19:42:39 +02:00
/* return address of the user area part of the block to the caller */
return (char *)block.data + sizeof(struct k_mem_block_id);
k_mem_pool: Complete rework This patch amounts to a mostly complete rewrite of the k_mem_pool allocator, which had been the source of historical complaints vs. the one easily available in newlib. The basic design of the allocator is unchanged (it's still a 4-way buddy allocator), but the implementation has made different choices throughout. Major changes: Space efficiency: The old implementation required ~2.66 bytes per "smallest block" in overhead, plus 16 bytes per log4 "level" of the allocation tree, plus a global tracking struct of 32 bytes and a very surprising 12 byte overhead (in struct k_mem_block) per active allocation on top of the returned data pointer. This new allocator uses a simple bit array as the only per-block storage and places the free list into the freed blocks themselves, requiring only ~1.33 bits per smallest block, 12 bytes per level, 32 byte globally and only 4 bytes of per-allocation bookeeping. And it puts more of the generated tree into BSS, slightly reducing binary sizes for non-trivial pool sizes (even as the code size itself has increased a tiny bit). IRQ safe: atomic operations on the store have been cut down to be at most "4 bit sets and dlist operations" (i.e. a few dozen instructions), reducing latency significantly and allowing us to lock against interrupts cleanly from all APIs. Allocations and frees can be done from ISRs now without limitation (well, obviously you can't sleep, so "timeout" must be K_NO_WAIT). Deterministic performance: there is no more "defragmentation" step that must be manually managed. Block coalescing is done synchronously at free time and takes constant time (strictly log4(num_levels)), as the detection of four free "partner bits" is just a simple shift and mask operation. Cleaner behavior with odd sizes. The old code assumed that the specified maximum size would be a power of four multiple of the minimum size, making use of non-standard buffer sizes problematic. This implementation re-aligns the sub-blocks at each level and can handle situations wehre alignment restrictions mean fewer than 4x will be available. If you want precise layout control, you can still specify the sizes rigorously. It just doesn't break if you don't. More portable: the original implementation made use of GNU assembler macros embedded inline within C __asm__ statements. Not all toolchains are actually backed by a GNU assembler even when the support the GNU assembly syntax. This is pure C, albeit with some hairy macros to expand the compile-time-computed values. Related changes that had to be rolled into this patch for bisectability: * The new allocator has a firm minimum block size of 8 bytes (to store the dlist_node_t). It will "work" with smaller requested min_size values, but obviously makes no firm promises about layout or how many will be available. Unfortunately many of the tests were written with very small 4-byte minimum sizes and to assume exactly how many they could allocate. Bump the sizes to match the allocator minimum. * The mbox and pipes API made use of the internals of k_mem_block and had to be ported to the new scheme. Blocks no longer store a backpointer to the pool that allocated them (it's an integer ID in a bitfield) , so if you want to "nullify" them you have to use the data pointer. * test_mbox_api had a bug were it was prematurely freeing k_mem_blocks that it sent through the mailbox. This worked in the old allocator because the memory wouldn't be touched when freed, but now we stuff list pointers in there and the bug was exposed. * Remove test_mpool_options: the options (related to defragmentation behavior) tested no longer exist. Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2017-05-09 19:42:39 +02:00
}
void k_free(void *ptr)
{
if (ptr != NULL) {
/* point to hidden block descriptor at start of block */
ptr = (char *)ptr - sizeof(struct k_mem_block_id);
k_mem_pool: Complete rework This patch amounts to a mostly complete rewrite of the k_mem_pool allocator, which had been the source of historical complaints vs. the one easily available in newlib. The basic design of the allocator is unchanged (it's still a 4-way buddy allocator), but the implementation has made different choices throughout. Major changes: Space efficiency: The old implementation required ~2.66 bytes per "smallest block" in overhead, plus 16 bytes per log4 "level" of the allocation tree, plus a global tracking struct of 32 bytes and a very surprising 12 byte overhead (in struct k_mem_block) per active allocation on top of the returned data pointer. This new allocator uses a simple bit array as the only per-block storage and places the free list into the freed blocks themselves, requiring only ~1.33 bits per smallest block, 12 bytes per level, 32 byte globally and only 4 bytes of per-allocation bookeeping. And it puts more of the generated tree into BSS, slightly reducing binary sizes for non-trivial pool sizes (even as the code size itself has increased a tiny bit). IRQ safe: atomic operations on the store have been cut down to be at most "4 bit sets and dlist operations" (i.e. a few dozen instructions), reducing latency significantly and allowing us to lock against interrupts cleanly from all APIs. Allocations and frees can be done from ISRs now without limitation (well, obviously you can't sleep, so "timeout" must be K_NO_WAIT). Deterministic performance: there is no more "defragmentation" step that must be manually managed. Block coalescing is done synchronously at free time and takes constant time (strictly log4(num_levels)), as the detection of four free "partner bits" is just a simple shift and mask operation. Cleaner behavior with odd sizes. The old code assumed that the specified maximum size would be a power of four multiple of the minimum size, making use of non-standard buffer sizes problematic. This implementation re-aligns the sub-blocks at each level and can handle situations wehre alignment restrictions mean fewer than 4x will be available. If you want precise layout control, you can still specify the sizes rigorously. It just doesn't break if you don't. More portable: the original implementation made use of GNU assembler macros embedded inline within C __asm__ statements. Not all toolchains are actually backed by a GNU assembler even when the support the GNU assembly syntax. This is pure C, albeit with some hairy macros to expand the compile-time-computed values. Related changes that had to be rolled into this patch for bisectability: * The new allocator has a firm minimum block size of 8 bytes (to store the dlist_node_t). It will "work" with smaller requested min_size values, but obviously makes no firm promises about layout or how many will be available. Unfortunately many of the tests were written with very small 4-byte minimum sizes and to assume exactly how many they could allocate. Bump the sizes to match the allocator minimum. * The mbox and pipes API made use of the internals of k_mem_block and had to be ported to the new scheme. Blocks no longer store a backpointer to the pool that allocated them (it's an integer ID in a bitfield) , so if you want to "nullify" them you have to use the data pointer. * test_mbox_api had a bug were it was prematurely freeing k_mem_blocks that it sent through the mailbox. This worked in the old allocator because the memory wouldn't be touched when freed, but now we stuff list pointers in there and the bug was exposed. * Remove test_mpool_options: the options (related to defragmentation behavior) tested no longer exist. Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2017-05-09 19:42:39 +02:00
/* return block to the heap memory pool */
k_mem_pool_free_id(ptr);
k_mem_pool: Complete rework This patch amounts to a mostly complete rewrite of the k_mem_pool allocator, which had been the source of historical complaints vs. the one easily available in newlib. The basic design of the allocator is unchanged (it's still a 4-way buddy allocator), but the implementation has made different choices throughout. Major changes: Space efficiency: The old implementation required ~2.66 bytes per "smallest block" in overhead, plus 16 bytes per log4 "level" of the allocation tree, plus a global tracking struct of 32 bytes and a very surprising 12 byte overhead (in struct k_mem_block) per active allocation on top of the returned data pointer. This new allocator uses a simple bit array as the only per-block storage and places the free list into the freed blocks themselves, requiring only ~1.33 bits per smallest block, 12 bytes per level, 32 byte globally and only 4 bytes of per-allocation bookeeping. And it puts more of the generated tree into BSS, slightly reducing binary sizes for non-trivial pool sizes (even as the code size itself has increased a tiny bit). IRQ safe: atomic operations on the store have been cut down to be at most "4 bit sets and dlist operations" (i.e. a few dozen instructions), reducing latency significantly and allowing us to lock against interrupts cleanly from all APIs. Allocations and frees can be done from ISRs now without limitation (well, obviously you can't sleep, so "timeout" must be K_NO_WAIT). Deterministic performance: there is no more "defragmentation" step that must be manually managed. Block coalescing is done synchronously at free time and takes constant time (strictly log4(num_levels)), as the detection of four free "partner bits" is just a simple shift and mask operation. Cleaner behavior with odd sizes. The old code assumed that the specified maximum size would be a power of four multiple of the minimum size, making use of non-standard buffer sizes problematic. This implementation re-aligns the sub-blocks at each level and can handle situations wehre alignment restrictions mean fewer than 4x will be available. If you want precise layout control, you can still specify the sizes rigorously. It just doesn't break if you don't. More portable: the original implementation made use of GNU assembler macros embedded inline within C __asm__ statements. Not all toolchains are actually backed by a GNU assembler even when the support the GNU assembly syntax. This is pure C, albeit with some hairy macros to expand the compile-time-computed values. Related changes that had to be rolled into this patch for bisectability: * The new allocator has a firm minimum block size of 8 bytes (to store the dlist_node_t). It will "work" with smaller requested min_size values, but obviously makes no firm promises about layout or how many will be available. Unfortunately many of the tests were written with very small 4-byte minimum sizes and to assume exactly how many they could allocate. Bump the sizes to match the allocator minimum. * The mbox and pipes API made use of the internals of k_mem_block and had to be ported to the new scheme. Blocks no longer store a backpointer to the pool that allocated them (it's an integer ID in a bitfield) , so if you want to "nullify" them you have to use the data pointer. * test_mbox_api had a bug were it was prematurely freeing k_mem_blocks that it sent through the mailbox. This worked in the old allocator because the memory wouldn't be touched when freed, but now we stuff list pointers in there and the bug was exposed. * Remove test_mpool_options: the options (related to defragmentation behavior) tested no longer exist. Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2017-05-09 19:42:39 +02:00
}
}
#if (CONFIG_HEAP_MEM_POOL_SIZE > 0)
/*
* Heap is defined using HEAP_MEM_POOL_SIZE configuration option.
*
* This module defines the heap memory pool and the _HEAP_MEM_POOL symbol
* that has the address of the associated memory pool struct.
*/
K_MEM_POOL_DEFINE(_heap_mem_pool, CONFIG_HEAP_MEM_POOL_MIN_SIZE,
CONFIG_HEAP_MEM_POOL_SIZE, 1, 4);
#define _HEAP_MEM_POOL (&_heap_mem_pool)
void *k_malloc(size_t size)
{
return k_mem_pool_malloc(_HEAP_MEM_POOL, size);
}
void *k_calloc(size_t nmemb, size_t size)
{
void *ret;
size_t bounds;
if (__builtin_mul_overflow(nmemb, size, &bounds)) {
return NULL;
}
ret = k_malloc(bounds);
if (ret != NULL) {
(void)memset(ret, 0, bounds);
}
return ret;
}
void k_thread_system_pool_assign(struct k_thread *thread)
{
thread->resource_pool = _HEAP_MEM_POOL;
}
k_mem_pool: Complete rework This patch amounts to a mostly complete rewrite of the k_mem_pool allocator, which had been the source of historical complaints vs. the one easily available in newlib. The basic design of the allocator is unchanged (it's still a 4-way buddy allocator), but the implementation has made different choices throughout. Major changes: Space efficiency: The old implementation required ~2.66 bytes per "smallest block" in overhead, plus 16 bytes per log4 "level" of the allocation tree, plus a global tracking struct of 32 bytes and a very surprising 12 byte overhead (in struct k_mem_block) per active allocation on top of the returned data pointer. This new allocator uses a simple bit array as the only per-block storage and places the free list into the freed blocks themselves, requiring only ~1.33 bits per smallest block, 12 bytes per level, 32 byte globally and only 4 bytes of per-allocation bookeeping. And it puts more of the generated tree into BSS, slightly reducing binary sizes for non-trivial pool sizes (even as the code size itself has increased a tiny bit). IRQ safe: atomic operations on the store have been cut down to be at most "4 bit sets and dlist operations" (i.e. a few dozen instructions), reducing latency significantly and allowing us to lock against interrupts cleanly from all APIs. Allocations and frees can be done from ISRs now without limitation (well, obviously you can't sleep, so "timeout" must be K_NO_WAIT). Deterministic performance: there is no more "defragmentation" step that must be manually managed. Block coalescing is done synchronously at free time and takes constant time (strictly log4(num_levels)), as the detection of four free "partner bits" is just a simple shift and mask operation. Cleaner behavior with odd sizes. The old code assumed that the specified maximum size would be a power of four multiple of the minimum size, making use of non-standard buffer sizes problematic. This implementation re-aligns the sub-blocks at each level and can handle situations wehre alignment restrictions mean fewer than 4x will be available. If you want precise layout control, you can still specify the sizes rigorously. It just doesn't break if you don't. More portable: the original implementation made use of GNU assembler macros embedded inline within C __asm__ statements. Not all toolchains are actually backed by a GNU assembler even when the support the GNU assembly syntax. This is pure C, albeit with some hairy macros to expand the compile-time-computed values. Related changes that had to be rolled into this patch for bisectability: * The new allocator has a firm minimum block size of 8 bytes (to store the dlist_node_t). It will "work" with smaller requested min_size values, but obviously makes no firm promises about layout or how many will be available. Unfortunately many of the tests were written with very small 4-byte minimum sizes and to assume exactly how many they could allocate. Bump the sizes to match the allocator minimum. * The mbox and pipes API made use of the internals of k_mem_block and had to be ported to the new scheme. Blocks no longer store a backpointer to the pool that allocated them (it's an integer ID in a bitfield) , so if you want to "nullify" them you have to use the data pointer. * test_mbox_api had a bug were it was prematurely freeing k_mem_blocks that it sent through the mailbox. This worked in the old allocator because the memory wouldn't be touched when freed, but now we stuff list pointers in there and the bug was exposed. * Remove test_mpool_options: the options (related to defragmentation behavior) tested no longer exist. Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2017-05-09 19:42:39 +02:00
#endif
void *z_thread_malloc(size_t size)
{
void *ret;
if (_current->resource_pool != NULL) {
ret = k_mem_pool_malloc(_current->resource_pool, size);
} else {
ret = NULL;
}
return ret;
}