-
Notifications
You must be signed in to change notification settings - Fork 0
License
Xarepo/libmc-open-source
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Xarepo Micro Containers (MC) IN A NUTSHELL ------------- The Xarepo micro containers is a set of containers (lists, trees, hashes etc) for C with an API similar to the popular STL containers in C++. Unlike for C++ there are for C no widespread standard all-around containers that can be used in any program. This library intends to fill this gap. While being accessible for the casual user, a lot of effort has been put in to make the containers perform well since performance is often a key aspect when using C. Through inlining, preprocessor code generation and compact and aligned memory management the performance is at peak of what the given data structure type allows. Memory management can optionally be adapted per container to fit special needs, for example static allocation for critical realtime sections. By making use of the C preprocessor typed containers can be generated, rather than just using void * for everything, overcoming the somewhat weak typing of the C language. This also allows for customizing the containers to exactly fit a special purpose. Due to the nature of the code, it requires C99 support from the compiler, and C11 for the parts using <stdatomic.h>. Builtins are used in places (bit operations etc), and the code is primarily developed and tested for GCC and Clang on Intel or ARM 32 or 64 bit, but should be easily portable to other compilers / hardware as needed. Note: currently only little endian mode is supported. LICENSE ------- This code is distributed under the ISC license which is a simplified BSD license. This means that you can use this code basically any way you want in both open source and commercial closed source projects, just keep the copyright statements intact. The twist with the ISC licence compared to the BSD license is that you do not need to put credits in the documentation, which makes it easier to include the code in any project. For your convenience you may also choose the more well-known MIT license if that suits your project better. For any other types of licensing, contact us. As usual there is no warranty. EXAMPLES -------- Want to start using key-to-value mappings, lists etc in the simplest possible way? See src/examples/example_basic.c Do you want to generate custom versions of the data types? See src/examples/example_advanced.c BENCHMARKS ---------- See separate file "BENCHMARKS" to get some performance numbers and comparisons for some of the provided containers. LIMITATIONS ----------- C does not have polymorphism so every function for each type needs to be unique with a unique name. This library uses the pre-processor to generate this from common code. If you refactor code and change container type you will unlike in C++ need to replace function names too. In other words, it's not practical to write generic code that can accept and work on any container type (like C++ algorithms library). This library makes it easier to be a C programmer in many situations, but it does not transform C into C++. (It would be possible to extend this library with an algorithms template header generating code using same techniques like the containers. Maybe in a future release.) USER DOCUMENTATION ------------------ MC can be used in two modes, either link to it as a static lib or just include it as source code in the project. The less complex data structures are implemented 100% through inline functions directly in the included header file and thus do not need a separate library or object file. Micro container types are defined in template files, 'mxx_tmpl.h'. The template contains inline functions which implement the container type. If the implementation is large there are references to helper functions in a corresponding 'mxx_base.h' that do the major part of the work). The inline functions in the 'mxx_tmpl.h' files are just templates, to make them complete a set of pre-processor #define directives need to be filled in before #include <mxx_tmpl.h>. This allows for customizing the type to exactly fit the user's needs. However, if you leave out the directives a default datatype will be generated, usually an integer key (intptr_t) and a void * value. Overview of containers ---------------------- mrb - red-black tree, associative container with all-around properties. In nearly all cases you need a key to value mapping, this is the container you should use. Only when you have special performance needs you should look at other types. A property of this type of tree is that keys are always sorted, which can be handy at times. mrx - radix tree, associative container very good at searches especially for dynamically sized keys for example strings. Ideal for large dictionaries with low change rates and high search rates. Not suitable for small data sets as the initial memory overhead is quite large. The radix tree is typically faster in searching than the red-black tree also for fixed size keys. When there are many keys in the tree and many share the same prefix it can be much more space efficient than for example a red-black tree, since the prefix is stored in only once and shared by all keys, while in a red-black tree the keys are always stored in full length regardless if they share prefix with other keys. A drawback of this property is that the iterator is not just a pointer but an allocated structure which must be freed if not used in mrx_next() until it returns mrx_end(). The radix tree sticks out by being the most complex data type in the library by a wide margin as in not "micro" in terms of code. It's however very fast and is included for that reason. mht - hash table which is single hashing, associative container with extremely high performance in the best case, but if used in an unsuitable way the performance can be very poor. Should only be used by those familiar with hash table properties. Since it is single hashing you need to match the hash function with the keys to avoid collisions. You also need good control of the keys, or else malicious key input could cause performance to plunge. mv - a dynamically resized array. Suitable for lists where insert and erase can be done from the back. It can also be used with a static pre-allocated array, and static size too in which case it becomes an array view (span). mld - double-linked list, sequence container. The most flexible list and the typical to use when there are no special needs. mls - single-linked list, sequence container. Since it is single-linked it is not as flexible as the double-linked list, but has the advantage that it is more cache efficient to add and remove elements since there is only one link that needs to be updated. Use this instead of the double-linked list if there is a need for the extra performance, and it works having only single links (not possible to traverse backwards in the list etc). mq - lifo/fifo queue, sequence container. Supports only static allocation (fixed size). Implemented as an array with head and tail index, and is thus very efficient. In short, 90% of the time you will be using the mrb if you need and associative container and mld or mv if you need a sequence container. The header files contain some documentation as well, together with some design notes of how the containers are implemented and which trade-offs there are. Additional components --------------------- bitops.h - a self-contained header with common (and not so common) bit operations like bit scan forward/reverse, bit count and swap etc. Uses builtins (ie purpose-made hardware instructions) when possible. buddyalloc.h - buddy allocator which is used as backing to the node pool by the containers in performance memory management mode. This buddy allocator supports lock-free multi-threaded allocations. nodepool.h - node allocator to be run on top of buddyalloc, used by the containers in performance management mode. May be useful when making own higd performance data structures. Configure syntax ---------------- The micro containers are configured by setting up a few defines before including the header file. The same header can be included several times in the same file with different configuration settings. All the defines are undefined after used in the included header, so you don't need to undefine them manually before defining them again for a new include. /* no defines before the include - default datatype will be generated */ #include <mld_tmpl.h> /* The following example sets up the double-linked list as a list of constant strings, with copies. Note the use of the deconst macro. Since the prefix is set to 'strls' functions will be named 'strls_*()' and will thus not collide with the default prefix for this container which is 'mld'. */ #define MC_PREFIX strls #define MC_VALUE_T const char * #define MC_COPY_VALUE(dest, src) dest = strdup(src) #define MC_FREE_VALUE(val) free(MC_DECONST(void *, val)) #include <mld_tmpl.h> See the example code for more examples, src/examples/. Choosing container properties ----------------------------- Here follows a step-by-step howto of how to pick a container and how to configure it to best suit your needs: 1. Choose a suitable container type. Should it be a sequence (only values in some order, a list for example) or associative (key-to-value mapping)? Most associative containers can be configured to use only keys, that this to be a set of keys/values that can be looked up. Choose a proper container for your needs, based on what operations you need to perform, and what type of performance you need. Concerning performance, this requires some understanding of how hashtables, trees etc work and what their properties are. Each data structure has pros and cons, there's no single data structure that is best at everything. If performance is important and/or there are realtime requirements it is important to make an adequate choice. Before going to the next step and configuring the container, consider if you can use the default settings. For a sequence container this will be void * values, and for associative containers the key will (typically) be intptr_t and values void *. It may be tempting to create specific types for all situations, especially if you're used to a language with strong typing, but experience tells us that creating many similar types does not make the code easier to follow. The "least bad solution" is typically to use the default void pointer types when it fits the needs to avoid creating a huge amount of types. 2. Choose memory management model. If maximum performance is not of importance than you should use the compact mode, which uses the least memory (at least for smaller data sets). For all containers that supports it the compact mode is the default and then you don't need to give any directives. The other modes are "performance" and "static", and this normally only used in real-time software or where performance is very important. The performance mode makes block allocations of nodes using a fast buddy allocator. Since it allocates in blocks there is some overhead, but if there are a few large instances instead of many small it will actually consume less memory than the compact mode in most cases since there is no header overhead per node. Memory debuggers may work a bit less good together with the buddy allocator since it does not use libc malloc. The performance compared to compact mode is typically quite small, since libc malloc (used im compact mode) is already highly tuned. But if you need the fastest, performance mode is what you should choose. There is also a static allocation mode, which means that all memory is pre-allocated when the container is created. This gives both high performance and good real-time properties, but the obvious drawback is that you need to pre-allocate for the maximum amount of nodes the container should be able to hold. The mode is typically only used in real-time software, or for those containers that only support this mode (such as the LIFO/FIFO queue). 3. If associative container - choose key type. Some containers may only support a specific type of key (for example only strings). Here we assume the generic case. Pointer to key, or direct storage? Direct storage require fixed size keys (examples: 'unsigned int', 'struct my_key_struct'). Note that since functions are inline (and key argument is constant), there will be no stack copy of the key even though the value rather than the reference is given as argument. In other words, you don't not need to worry about performance in argument passing when using direct storage. Pointer to keys should only be used when the situation requires. Such a situation is when keys have dynamic size such as a string ('const char *'), or if the storage of keys is handled outside of the container, for example when several containers use the same keys. Concerning using the same keys in several containers, it can be messy and is generally not recommended. If keys are say 64 bits or smaller, there is no significant space (=performance) to save by re-using keys in several containers - then use direct storage instead. If the type is a pointer to the key then there are some extra considerations. First, should it be constant or non-constant? Since keys should not be changed when owned by the container, it is recommended to use const whenever possible. Const is however only strictly required when the key points to read-only memory, such as static strings. Compile time warnings may occur when ..._key() and other functions that return keys will return pointers to constant values, for example if keys need to be passed to a function which does not take a constant argument (despite not changing the contents). In some of these cases a deconst macro (MC_DECONST()) can be used as a work-around. Second, should the key be copied or not? Let the container copy/free the key itself unless there is some strong reason for handling key memory management externally. Reasons for not copying could be that the same keys are used in many containers or keys refer to static memory (such as constant strings). Do not use copying just to work around a const issue, if it can be solved safely with MC_DECONST() instead. 3. Choose settings for container value. Perhaps no value at all? Value-less associative containers = sets. Fixed size values stored directly in the container nodes is more efficient than same values allocated separately. A small value type is more efficient than large. What type should mxx_insert(), mxx_val(), mxx_find() etc return? The default this to return the same type as the value is, which is often suitable if value is a pointer type (example: value is 'char *', returns 'char *'). An undefined value may have to be specified (default is 0/NULL), which is used for indicating failures mxx_find() miss for example. In some cases it may not be possible to define an undefined value, for example if the value is an integer and the whole integer range represent valid values. Then it may be suitable to instead return a reference to the value, which is often suitable for non-pointer type of values where there may be no natural "undefined" value. Example: value is 'int', returns 'int *'. Returning the reference to the value also allows direct manipulation of it in the container (without using mxx_setval()). Normally the insert functions take an argument for the value, but it is possible to skip that if the container is configured to return the reference to the value. Then the insert function will return a reference to the value which the user can fill in. This may be suitable for large fixed values (structs). Since the insert functions are inline, there will be no extra stack copy of the value even if it is given as an argument. Thus, to gain an advantage by skipping the insert argument the user should not make a stack copy of the value at all, but only directly fill in the value in the referenced position. Should the values be copied and freed by the container? This is only sensible when the value is of pointer type. If the value type is a pointer type (example 'char *') should it be const or not (that is 'const char *' or just 'char *')? If it is const, the container functions will return const types too, in other words - const in => const out. In some cases it is a problem if the container returns const values, but do not copy values just to get around that if it can be safely solved with MC_DECONST() instead. Iterators --------- The containers use the iterator concept to traverse them. Example: for (mld_it_t *it = mld_begin(tt); it != mld_end(); it = mld_next(it)) { ... } In practice the iterator is typically a pointer to a node in the container, but it depends, for some containers the iterator can be a structure totally unrelated to a node. It also depends on the container if an iterator is still valid after the container has been modified. For the two most common containers, the double-linked list and the red-black tree, the iterators are indeed pointers to nodes and they are still valid after other nodes have been added or removed. The iterators can thus be stored to be used later as direct pointers into the container. For example, the function *_iterase() can be used instead of *_erase() which means that there is no search involved in the erase so it will be much more efficient. You need to know the properties of the iterator for a specific container before you know if you can store it till later and it can therefore be considered risky, but the gain is so large that we still recommend use of it. A typical use is to store the iterator to the node so it can be later erased without searching or scanning the container. To get the first iterator you call the mxx_begin() function, and to test if you have reached the end you compare with mxx_end(), as seen in the for loop above. The mxx_end() function is most often a macro for NULL, but can also point out a position in memory specific for the given container, typically the position past the last element in an array. Variations between interfaces to different containers ----------------------------------------------------- The functions provided with the red-black tree gives a good example of what can be done by a associative container, and the double-linked list for a sequence container. More specialised containers are typically more limited in what functions they can provide, which means that not all of functions are available for all containers, and some have specialised functions made possible through the special type of algorithm used. For example, while the double-linked list has an erase() function to delete a specific node, the single-linked list instead has an erase_after() function because the list does not have any link to the previous node. The parameters required for the same function may also vary between containers. The design is that no more parameters than required for the underlying data structure should be passed onto the user, accepting the drawback that there will be some minor differences between containers. For example one container may be able to get the next iterator by only getting the pointer to the current while an other container may need a pointer to that container object as well. Multithreading -------------- The micro containers does not contain any locks, so it is the responsibility of the user to handle contention. However, any global data structures (if any) used by the implementation is thread safe. The containers are thus designed to be used in a multithreaded environment. DESIGNER'S NOTES ---------------- Actual container implementation can in whole or parts be generated by the C pre-processor from templates, this makes other types than 'void *' as values for containers is possible. In addition to better type checking, the compiler can often do better optimization with the code visible in headers. Large generic code blocks are put in helper functions to avoid too much code bloat through extensive inlining. There will however still be some more code in the binary than typical container implementations lead to. Compare callback functions (=some overhead) are avoided, macros are used instead. There is little or no encapsulation, but it is obvious which functions/data that should be used and not so mistakingly using internal data is not seen as a problem. When a container is generated, the function names are pre-pended with a user-specified prefix, so several containers with different key/value types can be generated from the same template. If no directives are given when including a container template a default type is generated (typically intptr_t key and void * value) with a default prefix. Function names are generally borrowed from C++ STL where applicable to make the API more familiar to STL users. Each function uses as few parameters as possible, despite that it can lead to that paramaters for the same function may differ between containers (example: mxx_next() may require both type and iterator for one container but only iterator for the other) Standard names for iterator functions: - mxx_begin(), mxx_rbegin() - mxx_end(), mxx_rend() - mxx_next() - mxx_prev() - mxx_val() - mxx_setval() - mxx_key() If there is only an iterator version, no iterator prefix is used (example: mxx_next() instead of mxx_itnext()). If there are both iterator and non-iterator versions, the 'it' prefix is used for the iterator versions (example: mxx_erase() and mxx_iterase()). The 'it' prefix is without underscore to make function name more compact. New / delete: - mxx_new() new using malloc(). - mxx_delete() delete using free(). - mxx_init() init pre-allocated area. Symbols which are not intended to be used by the API user but still visible in header files have names ending with underscore (examples: mxx_link_node_(), MXX_BLOCK_SIZE_). Depending on container type it may or may not be possible to delete while traversing, there is no strive to make all containers behave exactly the same, since it is not possible when striving for highest performance implementation of a specific data structure. Node allocation is a key aspect of container performance and there is therefore various memory management models. There is a compact mode, which focuses on low memory consumption rather than performance (uses libc malloc directly), a static allocation mode for pre-allocated data structures (can be useful in realtime contexts) and a performance block-based allocation scheme for maximum performance. It differs between containers which models that are supported, some support all, some only a subset. Regarding 'const': if the value type is made const (example: 'const char *'), also the return value will be const. This was a design choice. It is technically possible to do const in and non-const out, but we do not see a large enough need for that. Use the macro MC_DECONST() if a special case arises.
About
No description, website, or topics provided.
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published