The following are used to encode binding information.
The following are used on the flags byte of a terminal node in the export information.
An indirect symbol table entry is simply a 32-bit index into the symbol table to the symbol that the pointer or stub is referring to, unless it is for a non-lazy symbol pointer section for a defined symbol which strip(1) has removed, in which case it has the value INDIRECT_SYMBOL_LOCAL. If the symbol was also absolute, INDIRECT_SYMBOL_ABS is or'ed with that.
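The classification described above can be sketched in C as follows. The two constants carry the values from <mach-o/loader.h>; the helper names are hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in <mach-o/loader.h>. */
#define INDIRECT_SYMBOL_LOCAL 0x80000000u
#define INDIRECT_SYMBOL_ABS   0x40000000u

/* Hypothetical helper: true when the entry is a plain symbol table index. */
static bool indirect_entry_is_index(uint32_t entry)
{
    return (entry & (INDIRECT_SYMBOL_LOCAL | INDIRECT_SYMBOL_ABS)) == 0;
}

/* Hypothetical helper: true when strip(1) removed the defined symbol. */
static bool indirect_entry_is_local(uint32_t entry)
{
    return (entry & INDIRECT_SYMBOL_LOCAL) != 0;
}

/* Hypothetical helper: true when the symbol was absolute. */
static bool indirect_entry_is_abs(uint32_t entry)
{
    return (entry & INDIRECT_SYMBOL_ABS) != 0;
}
```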
Load a dynamically linked shared library that is allowed to be missing (all symbols are weak imported).
After Mac OS X 10.1, when a new load command is added that is required to be understood by the dynamic linker for the image to execute properly, the LC_REQ_DYLD bit will be or'ed into the load command constant. If the dynamic linker sees such a load command that it does not understand, it will issue an "unknown load command required for execution" error and refuse to use the image. Other load commands without this bit that are not understood will simply be ignored.
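A minimal sketch of that policy, assuming the caller already knows whether it recognizes a given cmd value. LC_REQ_DYLD has the value from <mach-o/loader.h>; the function name is hypothetical and this is not dyld's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value as defined in <mach-o/loader.h>. */
#define LC_REQ_DYLD 0x80000000u

/*
 * Hypothetical helper: returns true when an image must be refused because
 * it carries a command the loader does not understand and that command has
 * the LC_REQ_DYLD bit set. Unknown commands without the bit are ignorable.
 */
static bool must_reject_command(uint32_t cmd, bool understood)
{
    return !understood && (cmd & LC_REQ_DYLD) != 0;
}
```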
Constants for the cmd field of all load commands (the command type).
Constant for the magic field of the mach_header (32-bit architectures)
Constant for the magic field of the mach_header_64 (64-bit architectures)
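As an illustration, a reader can classify a file from its first four bytes using the two magic constants. The sketch below also checks the byte-swapped forms (MH_CIGAM and MH_CIGAM_64, which the same header defines); the enum and function names are hypothetical.

```c
#include <stdint.h>

/* Values as defined in <mach-o/loader.h>. */
#define MH_MAGIC    0xfeedfaceu
#define MH_CIGAM    0xcefaedfeu /* MH_MAGIC read with swapped byte order */
#define MH_MAGIC_64 0xfeedfacfu
#define MH_CIGAM_64 0xcffaedfeu /* MH_MAGIC_64 read with swapped byte order */

/* Hypothetical result type for this sketch. */
typedef enum {
    MACHO_NONE,
    MACHO_32,
    MACHO_32_SWAPPED,
    MACHO_64,
    MACHO_64_SWAPPED
} macho_kind;

/* Classify a magic value read in host byte order from offset 0. */
static macho_kind classify_magic(uint32_t magic)
{
    switch (magic) {
    case MH_MAGIC:    return MACHO_32;
    case MH_CIGAM:    return MACHO_32_SWAPPED;
    case MH_MAGIC_64: return MACHO_64;
    case MH_CIGAM_64: return MACHO_64_SWAPPED;
    default:          return MACHO_NONE;
    }
}
```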
Constants for the flags field of the mach_header
The layout of the file depends on the filetype. For all but the MH_OBJECT file type the segments are padded out and aligned on a segment alignment boundary for efficient demand paging. The MH_EXECUTE, MH_FVMLIB, MH_DYLIB, MH_DYLINKER and MH_BUNDLE file types also have the headers included as part of their first segment.
Known values for the platform field above.
The following are used to encode rebasing information.
Constants for the section attributes part of the flags field of a section structure.
The flags field of a section structure is separated into two parts: a section type and section attributes. The section types are mutually exclusive (it can only have one type) but the section attributes are not (it may have more than one attribute).
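The split can be sketched with the two masks that <mach-o/loader.h> defines for this purpose, SECTION_TYPE and SECTION_ATTRIBUTES. The helper names are hypothetical.

```c
#include <stdint.h>

/* Values as defined in <mach-o/loader.h>. */
#define SECTION_TYPE       0x000000ffu /* low 8 bits: exactly one type */
#define SECTION_ATTRIBUTES 0xffffff00u /* high 24 bits: zero or more attributes */

/* Hypothetical helper: extract the mutually exclusive section type. */
static uint32_t section_type(uint32_t flags)
{
    return flags & SECTION_TYPE;
}

/* Hypothetical helper: extract the attribute bits, which may be combined. */
static uint32_t section_attributes(uint32_t flags)
{
    return flags & SECTION_ATTRIBUTES;
}
```

For example, a flags value of 0x80000400 has type 0 (S_REGULAR) and two attribute bits set (S_ATTR_PURE_INSTRUCTIONS and S_ATTR_SOME_INSTRUCTIONS).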
The names of segments and the sections in them are mostly meaningless to the link-editor, but there are a few things to support traditional UNIX executables that require the link-editor and assembler to use some names agreed upon by convention.
Constants for the flags field of the segment_command.
Constants for the type of a section.
Known values for the tool field above.
The build_version_command contains the min OS version on which this binary was built to run for its platform. The lists of known platform and tool values follow it.
The LC_DATA_IN_CODE load command uses a linkedit_data_command to point to an array of data_in_code_entry entries. Each entry describes a range of data in a code section.
The dyld_info_command contains the file offsets and sizes of the new compressed form of the information dyld needs to load the image. This information is used by dyld on Mac OS X 10.6 and later. All information pointed to by this command is encoded using byte streams, so no endian swapping is needed to interpret it.
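Within these byte streams, variable-length integers are stored as ULEB128 values (the opcode names in the header, such as REBASE_OPCODE_ADD_ADDR_ULEB, refer to this encoding). A minimal decoder sketch follows; the function name and its error convention are assumptions made for this example, not part of the header.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode one ULEB128 value from p[0..len).
 * Returns the number of bytes consumed, or 0 if the input is truncated
 * or the value needs more than 64 bits of shift.
 */
static size_t read_uleb128(const uint8_t *p, size_t len, uint64_t *out)
{
    uint64_t value = 0;
    unsigned shift = 0;

    for (size_t i = 0; i < len; i++) {
        uint8_t byte = p[i];
        if (shift >= 64)
            return 0;                       /* too many continuation bytes */
        value |= (uint64_t)(byte & 0x7f) << shift;
        if ((byte & 0x80) == 0) {           /* high bit clear: last byte */
            *out = value;
            return i + 1;
        }
        shift += 7;
    }
    return 0;                               /* ran out of input */
}
```

Because each byte is interpreted on its own, the same decoder works regardless of host byte order, which is why no endian swapping is needed.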
Dynamically linked shared libraries are identified by two things: the pathname (the name of the library as found for execution), and the compatibility version number. The pathname must match and the compatibility number in the user of the library must be greater than or equal to the library being used. The time stamp is used to record the time a library was built and copied into the user, so it can be used to determine if the library used at runtime is exactly the same as the one used to build the program.
A dynamically linked shared library (filetype == MH_DYLIB in the mach header) contains a dylib_command (cmd == LC_ID_DYLIB) to identify the library. An object that uses a dynamically linked shared library also contains a dylib_command (cmd == LC_LOAD_DYLIB, LC_LOAD_WEAK_DYLIB, or LC_REEXPORT_DYLIB) for each library it uses.
A module table entry.
A 64-bit module table entry.
The entries in the reference symbol table are used when loading the module (both by the static and dynamic link editors) and if the module is unloaded or replaced. Therefore all external symbols (defined and undefined) are listed in the module's reference table. The flags describe the type of reference that is being made. The constants for the flags are defined in <mach-o/nlist.h> as they are also used for symbol table entries.
A table of contents entry.
A program that uses a dynamic linker contains a dylinker_command to identify the name of the dynamic linker (LC_LOAD_DYLINKER), and a dynamic linker contains a dylinker_command to identify the dynamic linker (LC_ID_DYLINKER). A file can have at most one of these. This struct is also used for the LC_DYLD_ENVIRONMENT load command, where it contains a string for dyld to treat like an environment variable.
This is the second set of the symbolic information, which is used to support the data structures for the dynamic link editor.
The encryption_info_command contains the file offset and size of an encrypted segment.
The encryption_info_command_64 contains the file offset and size of an encrypted segment (for use in x86_64 targets).
The entry_point_command is a replacement for thread_command. It is used for main executables to specify the location (file offset) of main(). If -stack_size was used at link time, the stacksize field will contain the stack size needed for the main thread.
The fvmfile_command contains a reference to a file to be loaded at the specified virtual address. (Presently, this command is reserved for internal use. The kernel ignores this command when loading a program into memory).
Fixed virtual memory shared libraries are identified by two things. The target pathname (the name of the library as found for execution), and the minor version number. The address of where the headers are loaded is in header_addr. (THIS IS OBSOLETE and no longer supported).
A fixed virtual shared library (filetype == MH_FVMLIB in the mach header) contains a fvmlib_command (cmd == LC_IDFVMLIB) to identify the library. An object that uses a fixed virtual shared library also contains a fvmlib_command (cmd == LC_LOADFVMLIB) for each library it uses. (THIS IS OBSOLETE and no longer supported).
The ident_command contains a free format string table following the ident_command structure. The strings are null terminated and the size of the command is padded out with zero bytes to a multiple of 4 bytes. (THIS IS OBSOLETE and no longer supported).
The linkedit_data_command contains the offsets and sizes of a blob of data in the __LINKEDIT segment.
The load commands directly follow the mach_header. The total size of all of the commands is given by the sizeofcmds field in the mach_header. All load commands must have as their first two fields cmd and cmdsize. The cmd field is filled in with a constant for that command type. Each command type has a structure specifically for it. The cmdsize field is the size in bytes of the particular load command structure plus anything that follows it that is a part of the load command (i.e. section structures, strings, etc.). To advance to the next load command, the cmdsize can be added to the offset or pointer of the current load command. The cmdsize for 32-bit architectures MUST be a multiple of 4 bytes and for 64-bit architectures MUST be a multiple of 8 bytes (these are forever the maximum alignment of any load commands). The padded bytes must be zero. All tables in the object file must also follow these rules so the file can be memory mapped. Otherwise the pointers to these tables will not work well or at all on some machines. With all padding zeroed, like objects will compare byte for byte.
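The walk over the load commands can be sketched as below. This is a minimal sketch that assumes the buffer is already in host byte order and holds exactly the sizeofcmds bytes that follow the header; the function name and the -1 error convention are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Common prefix of every load command, as in <mach-o/loader.h>. */
struct load_command {
    uint32_t cmd;     /* type of load command */
    uint32_t cmdsize; /* total size of the command in bytes */
};

/*
 * Hypothetical helper: count the load commands in cmds[0..sizeofcmds).
 * `align` is 4 for 32-bit files and 8 for 64-bit files.
 * Returns -1 if any cmdsize is too small, misaligned, or overruns the buffer.
 */
static int count_load_commands(const uint8_t *cmds, uint32_t sizeofcmds,
                               uint32_t align)
{
    uint32_t offset = 0;
    int count = 0;

    while (offset < sizeofcmds) {
        struct load_command lc;

        if (sizeofcmds - offset < sizeof lc)
            return -1;
        memcpy(&lc, cmds + offset, sizeof lc); /* avoid unaligned access */
        if (lc.cmdsize < sizeof lc || lc.cmdsize % align != 0 ||
            lc.cmdsize > sizeofcmds - offset)
            return -1;
        offset += lc.cmdsize;                  /* advance to the next command */
        count++;
    }
    return count;
}
```

Rejecting a cmdsize smaller than the eight-byte prefix also guarantees the loop makes progress on malformed input.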
The 32-bit mach header appears at the very beginning of the object file for 32-bit architectures.
The 64-bit mach header appears at the very beginning of object files for 64-bit architectures.
LC_NOTE commands describe a region of arbitrary data included in a Mach-O file. Their initial use is to record extra data in MH_CORE files.
The prebind_cksum_command contains the value of the original check sum for prebound files or zero. When a prebound file is first created or modified for other than updating its prebinding information, the value of the check sum is set to zero. When the file has its prebinding re-done and the value of the check sum is zero, the original check sum is calculated and stored in the cksum field of this load command in the output file. If the cksum field is non-zero when the prebinding is re-done, it is left unchanged from the input file.
A program (filetype == MH_EXECUTE) that is prebound to its dynamic libraries has one of these for each library that the static linker used in prebinding. It contains a bit vector for the modules in the library. The bits indicate which modules are bound (1) and which are not (0) from the library. The bit for module 0 is the low bit of the first byte. So the bit for the Nth module is: (linked_modules[N/8] >> N%8) & 1
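The bit lookup given above translates directly to C. The function name and the bounds check against a module count are assumptions added for this sketch.

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if module n is bound, 0 if it is not,
 * and -1 if n is outside the vector. Module 0 is the low bit of byte 0.
 */
static int module_is_bound(const uint8_t *linked_modules, uint32_t nmodules,
                           uint32_t n)
{
    if (n >= nmodules)
        return -1;
    return (linked_modules[n / 8] >> (n % 8)) & 1;
}
```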
The routines command contains the address of the dynamic shared library initialization routine and an index into the module table for the module that defines the routine. Before any modules are used from the library the dynamic linker fully binds the module that defines the initialization routine and then calls it. This gets called before any module initialization routines (used for C++ static constructors) in the library.
The 64-bit routines command. Same use as above.
The rpath_command contains a path which at runtime should be added to the current run path used to find @rpath prefixed dylibs.
A segment is made up of zero or more sections. Non-MH_OBJECT files have all of their segments with the proper sections in each, and padded to the specified segment alignment when produced by the link editor. The first segment of a MH_EXECUTE and MH_FVMLIB format file contains the mach_header and load commands of the object file before its first section. The zero fill sections are always last in their segment (in all formats). This allows the zeroed segment padding to be mapped into memory where zero fill sections might be. The gigabyte zero fill sections, those with the section type S_GB_ZEROFILL, can only be in a segment with sections of this type. These segments are then placed after all other segments.
The segment load command indicates that a part of this file is to be mapped into the task's address space. The size of this segment in memory, vmsize, may be equal to or larger than the amount to map from this file, filesize. The file is mapped starting at fileoff to the beginning of the segment in memory, vmaddr. The rest of the memory of the segment, if any, is allocated zero fill on demand. The segment's maximum virtual memory protection and initial virtual memory protection are specified by the maxprot and initprot fields. If the segment has sections then the section structures directly follow the segment command and their size is reflected in cmdsize.
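Two consequences of this layout can be sketched for the 64-bit variant: how much of a segment is zero fill, and what cmdsize to expect for a given section count. The struct layouts mirror segment_command_64 and section_64 from <mach-o/loader.h> (with vm_prot_t written as int32_t); both helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

struct segment_command_64 {
    uint32_t cmd, cmdsize;
    char     segname[16];
    uint64_t vmaddr, vmsize, fileoff, filesize;
    int32_t  maxprot, initprot; /* vm_prot_t in the real header */
    uint32_t nsects, flags;
};

struct section_64 {
    char     sectname[16];
    char     segname[16];
    uint64_t addr, size;
    uint32_t offset, align, reloff, nreloc, flags;
    uint32_t reserved1, reserved2, reserved3;
};

/* Hypothetical helper: bytes of the segment that are zero fill on demand.
 * Returns false when filesize exceeds vmsize, which this sketch rejects. */
static bool segment_zero_fill(uint64_t vmsize, uint64_t filesize,
                              uint64_t *zero_fill)
{
    if (filesize > vmsize)
        return false;
    *zero_fill = vmsize - filesize;
    return true;
}

/* Hypothetical helper: cmdsize of a segment command carrying nsects sections. */
static uint64_t expected_segment_cmdsize_64(uint32_t nsects)
{
    return sizeof(struct segment_command_64) +
           (uint64_t)nsects * sizeof(struct section_64);
}
```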
The source_version_command is an optional load command containing the version of the sources used to build the binary.
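The header stores this version as a single 64-bit value, A.B.C.D.E, packed as a24.b10.c10.d10.e10. A sketch of unpacking it follows; the function name is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: split a packed source version into its five
 * components. A occupies the top 24 bits; B, C, D and E take 10 bits each.
 */
static void unpack_source_version(uint64_t version, uint32_t out[5])
{
    out[0] = (uint32_t)(version >> 40) & 0xffffffu;
    out[1] = (uint32_t)(version >> 30) & 0x3ffu;
    out[2] = (uint32_t)(version >> 20) & 0x3ffu;
    out[3] = (uint32_t)(version >> 10) & 0x3ffu;
    out[4] = (uint32_t)version & 0x3ffu;
}
```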
For dynamically linked shared libraries that are subframeworks of an umbrella framework, they can allow clients other than the umbrella framework or other subframeworks in the same umbrella framework. To do this the subframework is built with "-allowable_client client_name" and an LC_SUB_CLIENT load command is created for each -allowable_client flag. The client_name is usually a framework name. It can also be a name used for bundle clients where the bundle is built with "-client_name client_name".
A dynamically linked shared library may be a subframework of an umbrella framework. If so it will be linked with "-umbrella umbrella_name" where "umbrella_name" is the name of the umbrella framework. A subframework can only be linked against by its umbrella framework or other subframeworks that are part of the same umbrella framework. Otherwise the static link editor produces an error and states to link against the umbrella framework. The name of the umbrella framework for subframeworks is recorded in the following structure.
A dynamically linked shared library may be a sub_library of another shared library. If so it will be linked with "-sub_library library_name" where "library_name" is the name of the sub_library shared library. When statically linking when -twolevel_namespace is in effect, a twolevel namespace shared library will only cause its subframeworks and those frameworks listed as sub_umbrella frameworks and libraries listed as sub_libraries to be implicitly linked in. Any other dependent dynamic libraries will not be linked in when -twolevel_namespace is in effect. The primary library recorded by the static linker when resolving a symbol in these libraries will be the umbrella framework (or dynamic library). Zero or more sub_library shared libraries may be used by an umbrella framework (or dynamic library). The name of a sub_library framework is recorded in the following structure. For example /usr/lib/libobjc_profile.A.dylib would be recorded as "libobjc".
A dynamically linked shared library may be a sub_umbrella of an umbrella framework. If so it will be linked with "-sub_umbrella umbrella_name" where "umbrella_name" is the name of the sub_umbrella framework. When statically linking when -twolevel_namespace is in effect, a twolevel namespace umbrella framework will only cause its subframeworks and those frameworks listed as sub_umbrella frameworks to be implicitly linked in. Any other dependent dynamic libraries will not be linked in when -twolevel_namespace is in effect. The primary library recorded by the static linker when resolving a symbol in these libraries will be the umbrella framework. Zero or more sub_umbrella frameworks may be used by an umbrella framework. The name of a sub_umbrella framework is recorded in the following structure.
The symseg_command contains the offset and size of the GNU style symbol table information as described in the header file <symseg.h>. The symbol roots of the symbol segments must also be aligned properly in the file. So the requirement of keeping the offsets aligned to a multiple of 4 bytes translates to the length field of the symbol roots also being a multiple of a long. Also the padding must again be zeroed. (THIS IS OBSOLETE and no longer supported).
The symtab_command contains the offsets and sizes of the link-edit 4.3BSD "stab" style symbol table information as described in the header files <nlist.h> and <stab.h>.
Thread commands contain machine-specific data structures suitable for use in the thread state primitives. The machine-specific data structures follow the struct thread_command as follows. Each flavor of machine-specific data structure is preceded by a uint32_t constant for the flavor of that data structure and a uint32_t that is the count of uint32_t's of the size of the state data structure, and then the state data structure follows. This triple may be repeated for many flavors. The constants for the flavors, counts and state data structure definitions are expected to be in the header file <machine/thread_status.h>. These machine-specific data structure sizes must be multiples of 4 bytes. The cmdsize reflects the total size of the thread_command and all of the sizes of the constants for the flavors, counts and state data structures.
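The flavor/count/state triples can be walked without knowing any particular machine's state layout, since the count gives the state size in 32-bit words. The sketch below assumes the whole command is available as host-order words; the function name and the -1 error convention are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: count the (flavor, count, state) triples in a thread
 * command. `words` points at the start of the command, so words[0] is cmd
 * and words[1] is cmdsize. Returns -1 if the command is malformed.
 */
static int count_thread_states(const uint32_t *words, uint32_t cmdsize)
{
    if (cmdsize < 8 || cmdsize % 4 != 0)
        return -1;

    uint32_t nwords = cmdsize / 4;
    uint32_t i = 2;                      /* skip cmd and cmdsize */
    int states = 0;

    while (i < nwords) {
        if (nwords - i < 2)
            return -1;                   /* no room for flavor and count */
        uint32_t count = words[i + 1];   /* state size in uint32_t units */
        if (count > nwords - i - 2)
            return -1;                   /* state would overrun cmdsize */
        i += 2 + count;
        states++;
    }
    return states;
}
```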
Sections of type S_THREAD_LOCAL_VARIABLES contain an array of tlv_descriptor structures.
The entries in the two-level namespace lookup hints table are twolevel_hint structs. These provide hints to the dynamic link editor where to start looking for an undefined symbol in a two-level namespace image. The isub_image field is an index into the sub-images (sub-frameworks and sub-umbrellas list) that made up the two-level image that the undefined symbol was found in when it was built by the static link editor. If isub_image is 0 the symbol is expected to be defined in the library and not in the sub-images. If isub_image is non-zero it is an index into the array of sub-images for the umbrella, with the first index in the sub-images being 1. The array of sub-images is the ordered list of sub-images of the umbrella that would be searched for a symbol that has the umbrella recorded as its primary library. The table of contents index is an index into the library's table of contents. This is used as the starting point of the binary search or a directed linear search.
The twolevel_hints_command contains the offset and number of hints in the two-level namespace lookup hints table.
The uuid load command contains a single 128-bit unique random number that identifies an object produced by the static link editor.
The version_min_command contains the min OS version on which this binary was built to run.
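The header encodes such versions as X.Y.Z packed into one 32-bit value, with nibbles laid out as xxxx.yy.zz. A sketch of decoding and printing one follows; all function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical accessors for a version packed as xxxx.yy.zz. */
static unsigned version_major(uint32_t v) { return (unsigned)(v >> 16); }
static unsigned version_minor(uint32_t v) { return (unsigned)((v >> 8) & 0xffu); }
static unsigned version_patch(uint32_t v) { return (unsigned)(v & 0xffu); }

/* Hypothetical helper: write "X.Y.Z" into buf; returns snprintf's result. */
static int format_packed_version(uint32_t v, char *buf, size_t len)
{
    return snprintf(buf, len, "%u.%u.%u",
                    version_major(v), version_minor(v), version_patch(v));
}
```

For example, the value 0x000A0F00 decodes to 10.15.0.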
A variable length string in a load command is represented by an lc_str union. The strings are stored just after the load command structure and the offset is from the start of the load command structure. The size of the string is reflected in the cmdsize field of the load command. Once again any padded bytes to bring the cmdsize field to a multiple of 4 bytes must be zero.
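Resolving such a string amounts to adding the offset to the start of the command, with bounds checks against cmdsize. A minimal sketch, assuming the command bytes are in memory; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: return the string that an lc_str offset refers to,
 * or NULL if the offset lies outside the command or the string is not
 * null terminated within cmdsize bytes.
 */
static const char *lc_str_get(const uint8_t *cmd, uint32_t cmdsize,
                              uint32_t offset)
{
    if (offset >= cmdsize)
        return NULL;
    if (memchr(cmd + offset, '\0', cmdsize - offset) == NULL)
        return NULL;
    return (const char *)(cmd + offset);
}
```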
This file describes the format of Mach-O object files.
D header file for mach-o/loader.h from the macOS 10.15 SDK.