# How Bloaty Works At a high level, Bloaty's goal is to create a map of the binary where every byte has a label attached to it. Every byte starts out as unknown (unattributed). As we scan the binary we assign labels to different ranges of the file. For example, if the user selected the "sections" data source we scan the section table and use the section name as the label for each range. Ideally these labeled ranges will cover the entire file by the time we are done. In practice we usually can't achieve perfect 100% coverage. To compensate for this, we have various kinds of "fallback" labels we attach to mystery regions of the file. This is how we guarantee an important invariant of Bloaty: the totals given in Bloaty's output will always match the total size of the file. This ensures that we always account for the entire file, even if we don't have detailed information for every byte. The ELF/Mach-O/etc. data structures we are traversing were not designed to enable size profiling. They were designed to assist linkers, loaders, debuggers, stack unwinders, etc. to run and debug the binary. This means that Bloaty's size analysis is inherently an unconventional use of ELF/Mach-O metadata. Bloaty has to be clever about how to use the available information to achieve its goal. This can pose a challenge, but also makes Bloaty fun to work on. Getting the coverage close to 100% requires a lot of ingenuity (and some heuristics). ## Range Map RangeMap (as defined in [range_map.h](https://github.com/google/bloaty/blob/master/src/range_map.h)) is the core data structure of Bloaty. It is a sparse map of `[start, end) -> std::string` that associates regions of VM or file space to a label. By the time Bloaty is finished, it has built a complete map of both VM and file space for the binary. You can view these maps by running Bloaty with '-v': ``` $ ./bloaty bloaty -v -d sections FILE MAP: 0000000-00002e0 736 [LOAD #2 [R]] 00002e0-00002fc 28 .interp 00002fc-0000320 36 .note.gnu.build-id 0000320-0000340 32 .note.ABI-tag 0000340-0000510 464 .gnu.hash 0000510-0001db8 6312 .dynsym 0001db8-0003c8d 7893 .dynstr 0003c8d-0003c8e 1 [LOAD #2 [R]] 0003c8e-0003e9c 526 .gnu.version 0003e9c-0003ea0 4 [LOAD #2 [R]] 0003ea0-0004020 384 .gnu.version_r 0004020-0066f30 405264 .rela.dyn 0066f30-00680b8 4488 .rela.plt 00680b8-0069000 3912 [Unmapped] 0069000-0069017 23 .init 0069017-0069020 9 [LOAD #3 [RX]] 0069020-0069be0 3008 .plt 0069be0-0069c40 96 .plt.got 0069c40-02874d1 2218129 .text [...] VM MAP: 000000-0002e0 736 [LOAD #2 [R]] 0002e0-0002fc 28 .interp 0002fc-000320 36 .note.gnu.build-id 000320-000340 32 .note.ABI-tag 000340-000510 464 .gnu.hash 000510-001db8 6312 .dynsym 001db8-003c8d 7893 .dynstr 003c8d-003c8e 1 [LOAD #2 [R]] 003c8e-003e9c 526 .gnu.version 003e9c-003ea0 4 [LOAD #2 [R]] 003ea0-004020 384 .gnu.version_r 004020-066f30 405264 .rela.dyn 066f30-0680b8 4488 .rela.plt 0680b8-069000 3912 [-- Nothing mapped --] 069000-069017 23 .init 069017-069020 9 [LOAD #3 [RX]] [...] ``` The file map refers to file offsets, and these always run from 0 to the size of the file. The VM map refers to VM addresses; these start at 0 for shared libraries and position-independent binaries, but these will start at some non-zero address if the binary was linked to be loaded at a fixed address. Note that some of the regions in the map have labels like `[LOAD #2 [R]]` instead of a true section name. This is because the section table does not always cover every byte of the file. Bloaty gives these regions a fallback label that contains the segment name instead. We must attach some kind of label to every byte of the file, otherwise Bloaty's totals would not match the file size. Also notice that there is an entry in the VM map that says `[-- Nothing mapped --]`. This is calling attention to the fact that there is a gap in the address space here. Since nothing is mapped, these regions of the VM space don't actually need to accessible in the target process image. However, unless this unused space aligns with page boundaries, it will probably end up getting mapped anyway. Sometimes we know a region's start but not its end. For example, Mach-O symbols have an address but not a size (whereas ELF symbols have both). To support this case, `RangeMap` supports adding an address with `kUnknownSize`. A range with unknown size will automatically extend to the beginning of the next region, even if the next region is added later. If we try to add a label to a range of the binary that has already been assigned a label, the first label assigned takes precedence. This means that the order in which we scan data structures is significant. So our general strategy is to scan our most granular and detailed information first. We scan generic information as a last resort, to give at least some information for parts of the binary that we couldn't find any more specific information about. ## VM Space and File Space Loadable binaries have two fundamental domains of space we are trying to map: *VM space* and *file space*. File space is the bytes of the input file. VM space is the bytes of memory when the executable is loaded at runtime. Some regions of the binary exist only in file space (like debug info) and some regions exist only in VM space (like `.bss`, zero-initialized data). Even entities that exist in both spaces can have different sizes in each. We create two separate `RangeMap` structures for these two domains. For convenience, we put them together into a single structure called `DualMap`: ```cpp struct DualMap { RangeMap vm_map; RangeMap file_map; }; ``` We populate these two maps simultaneously as we scan the file. We must populate both maps even if we only care about one of them, because most of the metadata we're scanning gives us VM addresses *or* file offsets, not both. For example, debug info always refers to VM addresses, because it's intended for debugging at runtime. Even if we only care about file size, we still have to scan VM addresses and translate them to file offsets. Bloaty's overall analysis algorithm (in pseudo-code) is: ```c++ for (auto f : files) { // Always start by creating the base map. DualMap base_map = ScanBaseMap(f); // Scan once for every data source the user selected with '-d'. std::vector maps; for (auto s : data_sources) { maps.push_back(ScanDataSource(f, s)); } } ``` ## Base Map To translate between VM and file space, we always begin by creating a "base map." The base map is just a `DualMap` like any other, but we give it special meaning: * It defines what ranges of file and VM space constitute "the entire binary" (ie. the "TOTALS" row of the final report). * We use it to translate between VM space and File space. This means that the base map must be exhaustive, and must also provide translation for any entity that exists in both VM and file space. For example, suppose we are scanning the "symbols" data source and we see in the symbol table that address `0x12345` corresponds to symbol `foo`. We will add that to VM map immediately, but we will also use the base map to translate address `0x12355` to a file offset so we can add that range to the file map. How does the base map store translation info? I left one thing out about `RangeMap` above. In addition to storing a label for every region, it can also (optionally) store a member called `other_start`. This stores the corresponding offset in the other space, and lets you translate addresses from one to the other. The `other_start` member is only used in the base map. We build the base map by scanning either the segments (program headers) or sections of the binary. These give both VM address and file offset for regions of the binary that are loaded into memory. To make sure we cover the entire file space, we use `[Unmapped]` as a last ditch fallback for any regions of the on-disk binary that didn't have any segment/section data associated with them. This ensures that Bloaty always accounts for the entire physical binary, even if we can't find any information about it. ## Scanning Data Sources Once we have built the base map, we can get on to the meat of Bloaty's work. We can now scan the binary according to whatever data source(s) the user has selected. ### Segments and Sections The `segments` and `sections` data sources are relatively straightforward. For the most part we can simply scan the segments/sections table and call it a day. For ELF, segments and sections have distinct tables in the binary that can be scanned independently. This means that technically a section could span multiple segments, but in practice segments/sections form a 1:many relationship, where each section is contained entirely within a single segment. Currently Bloaty only reports `PT_LOAD` and `PT_TLS` segments. We scan `PT_LOAD` segments first, so if there is overlap with `PT_TLS` the `PT_LOAD` label will win. In the future It may make sense to scan `PT_TLS` first, as this is more granular data that can give insight into the per-thread runtime overhead of TLS variables. It may also make sense to scan other segment types, to give more granular info. ELF segments do not have names. To distinguish between different `PT_LOAD` segments, we include both a segment offset and the segment flags in the label, eg. `LOAD #2 [R]`. For Mach-O, segments are contained within a file-level table of "load commands." Each load command has a type, and technically speaking, segments are a subset of all load commands. However Bloaty's `segments` data source reports many non-segment load commands such as the symbol table (`LC_SYMTAB`, `LC_DYSYMTAB`), code signature (`LC_CODE_SIGNATURE`), and more. Segments can have zero or more sections, so in Mach-O files the 1:many nature of segments and sections is enforced by the file format. For `segments` and `sections` we have to decide how to attribute the regions of the file that correspond to the segment/section headers themselves. Bloaty's general philosophy is to include the metadata with the data, so each label shows the true weight of everything associated with that label. This would suggest that the `.text` label should include the `.text` section as well as the section header entry for the `.text` section. However this would hide the overhead of the ELF headers, which can be significant if there are many sections. Bloaty currently has no higher-level data source that could show the ELF headers separately from the ELF data, and even if there was such a data source it would have narrow usefulness so people would probably not think to use it very often. There is not an easy answer to this question. At the moment Bloaty will include section headers with the corresponding section, but will *not* include segment headers with the corresponding segment. This may or may not be the best solution to this problem, and this may change if another solution proves to work better. ### Symbols The `symbols` data source is where Bloaty's deep parsing of the binary delivers the most benefit, as it provides detailed information that you cannot get from a linker map or symbol table. For example, take the following data from running Bloaty on itself: ``` $ ./bloaty bloaty -d symbols,sections FILE SIZE VM SIZE -------------- -------------- [...] 0.2% 116Ki 1.6% 116Ki AArch64_printInst 84.9% 98.8Ki 84.9% 98.8Ki .text 14.9% 17.4Ki 14.9% 17.4Ki .rodata 0.1% 156 0.1% 156 .eh_frame 0.0% 24 0.0% 0 .symtab 0.0% 18 0.0% 0 .strtab 0.0% 8 0.0% 8 .eh_frame_hdr [...] 0.1% 50.1Ki 0.7% 49.8Ki reg_name_maps 59.6% 29.8Ki 59.8% 29.8Ki .rela.dyn 40.0% 20.0Ki 40.2% 20.0Ki .data.rel.ro 0.4% 216 0.0% 0 .symtab 0.0% 14 0.0% 0 .strtab ``` I excerpted two symbols from the report. Between these two symbols, Bloaty has found seven distinct kinds of data that contributed to these two symbols. If you wrote a tool that naively just parsed the symbol table, you would only find the first of these seven: 1. `.text.`/`.data.rel.ro`: this is the data we obtain by simply following the symbol table entry. This is the primary code or data emitted by the function or variable. 2. `.eh_frame`: this is the "unwind information" for a function. [It is used for many things](https://stackoverflow.com/a/26302715/77070), including C++ exceptions and stack traces when no frame pointer is available. 3. `.eh_frame_hdr`: this is metadata about the `.eh_frame` section. 4. `.symtab`: this is the function/variable's symbol table entry itself. It is a fixed size for every entry. The fact that `reg_name_maps` above has a `.symtab` size of 216 indicates that there must actually be 9 different symbols being represented by this entry. Bloaty has combined them because they all have the same name. We can break them apart if we want using: ``` $ ./bloaty bloaty -d compileunits,symbols --source-filter=reg_name_maps$ FILE SIZE VM SIZE -------------- -------------- 20.3% 10.2Ki 20.3% 10.1Ki ../third_party/capstone/arch/AArch64/AArch64Mapping.c 100.0% 10.2Ki 100.0% 10.1Ki reg_name_maps 18.9% 9.45Ki 18.9% 9.43Ki ../third_party/capstone/arch/X86/X86Mapping.c 100.0% 9.45Ki 100.0% 9.43Ki reg_name_maps 16.4% 8.20Ki 16.4% 8.18Ki ../third_party/capstone/arch/PowerPC/PPCMapping.c 100.0% 8.20Ki 100.0% 8.18Ki reg_name_maps 10.7% 5.35Ki 10.7% 5.33Ki ../third_party/capstone/arch/Mips/MipsMapping.c 100.0% 5.35Ki 100.0% 5.33Ki reg_name_maps 9.1% 4.57Ki 9.1% 4.55Ki ../third_party/capstone/arch/SystemZ/SystemZMapping.c 100.0% 4.57Ki 100.0% 4.55Ki reg_name_maps 8.7% 4.35Ki 8.7% 4.31Ki ../third_party/capstone/arch/ARM/ARMMapping.c 100.0% 4.35Ki 100.0% 4.31Ki reg_name_maps 7.0% 3.52Ki 7.0% 3.49Ki ../third_party/capstone/arch/TMS320C64x/TMS320C64xMapping.c 100.0% 3.52Ki 100.0% 3.49Ki reg_name_maps 6.9% 3.44Ki 6.9% 3.41Ki ../third_party/capstone/arch/Sparc/SparcMapping.c 100.0% 3.44Ki 100.0% 3.41Ki reg_name_maps 2.0% 1.02Ki 2.0% 1016 ../third_party/capstone/arch/XCore/XCoreMapping.c 100.0% 1.02Ki 100.0% 1016 reg_name_maps 100.0% 50.1Ki 100.0% 49.8Ki TOTAL Filtering enabled (source_filter); omitted file = 46.7Mi, vm = 7.08Mi of entries ``` 5. `.strtab`: this is the text of the function/variables's name itself in the string table. Longer names take up more space in the binary, and Bloaty's analysis here reflects that (though the symbol table is not loaded at runtime, so it's not costing RAM). 6. `.rela.dyn`: these are relocations embedded into the executable. Normally we would associate relocations with `.o` files and not the final linked binary. However shared objects and position-independent executables must also emit relocations for any global variable that is initialized to an address of some other data. These relocations can take up a significant amount of space, indeed more space than the data itself in this case! Without this deep analysis of the binary, this cost would be invisible. Bloaty scans all relocation tables and "charges" each relocation entry to the function/data that requires the relocation (*not* the function being pointed to). 7. `.rodata`: Bloaty has found some data associated with this function. Sometimes data doesn't get its own symbol table entry, for whatever reason. Bloaty can attribute anonymous data to the function that uses it by disassembling the binary looking for instructions that reference a different part of the binary. If the same anonymous data is used by more than one function, then the first one scanned will "win" and assume the whole cost, as Bloaty has no concept of sharing the cost. Every byte of the file must have exactly one label associated with it. Note that this is more granular information than you can get from a linker map file. A linker map file will break down some of these sections by compile unit, but the symbol-level granularity is limited to the primary code/data for each symbol (#1 in the list above). ### Compile Units Like symbols, we can see that Bloaty is capable of breaking down lots of sections by compile unit: ``` $ ./bloaty bloaty -d compileunits,sections FILE SIZE VM SIZE -------------- -------------- 37.9% 17.7Mi 49.4% 3.52Mi [160 Others] 15.0% 7.04Mi 3.4% 246Ki ../third_party/protobuf/src/google/protobuf/descriptor.cc 33.9% 2.38Mi 0.0% 0 .debug_info 32.6% 2.29Mi 0.0% 0 .debug_loc 17.2% 1.21Mi 0.0% 0 .debug_str 6.5% 468Ki 0.0% 0 .debug_ranges 5.3% 381Ki 0.0% 0 .debug_line 2.8% 204Ki 83.1% 204Ki .text 1.0% 70.9Ki 0.0% 0 .strtab 0.4% 25.7Ki 10.4% 25.7Ki .eh_frame 0.2% 13.3Ki 0.0% 0 .symtab 0.1% 10.6Ki 4.3% 10.6Ki .rodata 0.1% 3.97Ki 1.6% 3.97Ki .eh_frame_hdr 0.0% 1.03Ki 0.4% 1.03Ki .rela.dyn 0.0% 368 0.1% 368 .data.rel.ro 0.0% 0 0.0% 81 .bss [...] ``` To attribute all of the different `.debug_*` sections, Bloaty includes parsers for all of the different DWARF formats that live in these sections. We also use the DWARF data to find which symbols belong to which compile units. The `compileunits` data source contains much of the same data that you could get from a linker map. Since each compile unit generally comes from a separate `.o` file, a linker map can often give good data about which parts of the binary came from which translation units. However Bloaty is able to derive this data without needing a linker map file, which may be tricky to obtain. The `compileunits` data source is also useful when combined with other data sources in hierarchical profiles.