Diagnosing a Failed Kernel Link
When you recompile a kernel and the linker refuses to cooperate, it is easy to feel stuck. The output is a long list of cryptic symbols, and nothing in it says plainly where the problem lies. The trick is to treat the build like a detective case: start from the cleanest possible state and work outward. A reliable habit is to run a kernel link before making any changes. If that link fails, you know the configuration was already broken before you touched it, and your own change is not the cause.
Assume you’re using a script that drives the entire build chain: compile, link, package. When the script stops with errors, exit it cleanly (CTRL-D or the appropriate kill command) and grab the temporary error file the linker writes to /tmp/linkerr. The file may look like a jumble of hexadecimal and non-ASCII characters, but the symbol names buried in it are the real clues. For instance, a reference to a symbol like clnopen with no other context usually points to a driver that should have been linked in but was not.
One of the simplest sanity checks uses the prompt that asks whether the kernel you’re about to build should boot by default. Answer “N”, and the kernel you just built is saved under a distinct image file name. Compare that image to the one currently configured to boot. If no code or configuration changed, a comparison of the two files should show no differences. Any discrepancy indicates that a file changed or a driver was switched off somewhere.
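Because kernel images are binary, cmp is usually easier to script than diff for this comparison. The following is a minimal sketch; the function name and the image paths in the usage note are placeholders, not names the build system is known to use.

```shell
# Compare two kernel images byte for byte.
# Prints "identical" or "different"; returns 2 if either file is missing.
compare_images() {
    new_image=$1
    boot_image=$2
    if [ ! -f "$new_image" ] || [ ! -f "$boot_image" ]; then
        echo "missing file" >&2
        return 2
    fi
    if cmp -s "$new_image" "$boot_image"; then
        echo identical
    else
        echo different
        return 1
    fi
}
```

Typical use would be something like compare_images /path/to/new-image /path/to/boot-image, substituting whatever names your build actually produces.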
Now imagine you’re at a client site, and the user says, “I increased a kernel variable but the link failed.” The first thing to do is look at the symbol list in /tmp/linkerr. Pick the first symbol the linker complains about, such as str, and search the entire /etc/conf tree for it. On the source machine you might run grep -R str /etc/conf or use strings on object files to locate it. If you find /etc/conf/sdevice.d/str marked with an N, it’s off. Turn it to Y and re-run the linker. The error list will shrink, but you may still have other missing symbols.
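Checking each sdevice.d file by hand gets tedious when several symbols are missing. The sketch below lists every driver marked N and reports the state of a single driver. It assumes each entry is a whitespace-separated line whose first field is the driver name and whose second field is Y or N, and that lines starting with # or * are comments; the function names are illustrative.

```shell
# List drivers whose sdevice.d entry has N in the second field.
# Assumed line layout: name Y|N ... (remaining fields are ignored).
list_disabled_drivers() {
    conf_dir=${1:-/etc/conf/sdevice.d}
    for f in "$conf_dir"/*; do
        [ -f "$f" ] || continue
        awk '!/^[#*]/ && $2 == "N" { print $1 }' "$f"
    done | sort -u
}

# Print Y, N, or "absent" for one named driver.
driver_state() {
    name=$1
    conf_dir=${2:-/etc/conf/sdevice.d}
    if [ ! -f "$conf_dir/$name" ]; then
        echo absent
        return 0
    fi
    awk -v n="$name" '!/^[#*]/ && $1 == n { print $2; exit }' "$conf_dir/$name"
}
```

Running driver_state str before and after editing the file is a quick way to confirm the flag really changed.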
For a deeper look, your build system likely has the nm utility. Run nm against a driver’s object file, for example nm /etc/conf/pack.d/clone/Driver.o | grep clnopen, to see whether that driver contains the needed symbol. If you discover that clone is the driver providing clnopen and it is also disabled in /etc/conf/sdevice.d, enable it. Once all referenced symbols are accounted for, the linker will finish successfully. If you don’t have nm, you can use strings as a fallback, though it is less precise. Either way, the strategy is the same: locate the missing symbol, trace it back to its driver, and enable that driver in the configuration.
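When you do not yet know which driver owns a symbol, the nm search can be looped over every driver object. This sketch assumes a layout in which each driver has its own subdirectory containing a Driver.o; adjust the glob if your objects are stored differently. The NM variable is an addition for illustration, so that strings can stand in when nm is not installed.

```shell
# Print every Driver.o under pack_dir whose nm output mentions the symbol.
# Set NM=strings to use the less precise fallback.
find_symbol_provider() {
    symbol=$1
    pack_dir=${2:-/etc/conf/pack.d}
    for obj in "$pack_dir"/*/Driver.o; do
        [ -f "$obj" ] || continue
        if ${NM:-nm} "$obj" 2>/dev/null | grep -F -- "$symbol" >/dev/null; then
            echo "$obj"
        fi
    done
}
```

The match is a plain substring search, like the grep in the text, so a hit may be either a definition or a reference. Inspect the nm output for the reported file before drawing conclusions.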
After the kernel links, the system will replace the old sdevice file only if the new image is built successfully. This means that a failed link will leave the previous working set intact, so you can safely roll back or re‑enable drivers without corrupting the build environment. The diff trick becomes very powerful when you suspect a driver has been accidentally turned off during a previous change. If you see that the old sdevice differs from the one you just built, investigate the specific driver lines that differ.
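To see only the drivers whose Y/N flag changed between two sdevice files, a short awk comparison is often easier to read than raw diff output. This is a sketch under the same assumed line layout (name in field one, Y or N in field two, comments starting with # or *); the function name is illustrative, and a driver with several lines is compared against its last line in the old file.

```shell
# Print "name: OLD -> NEW" for each driver whose Y/N flag differs
# between an old and a new sdevice file.
changed_drivers() {
    old_file=$1
    new_file=$2
    awk '
        /^[#*]/ || NF < 2 { next }          # skip comments and short lines
        NR == FNR { old[$1] = $2; next }    # first file: remember each flag
        ($1 in old) && old[$1] != $2 { print $1 ": " old[$1] " -> " $2 }
    ' "$old_file" "$new_file"
}
```

An empty result means no driver present in both files changed state. Drivers that exist in only one of the files are not reported, so a plain diff is still worth running afterwards.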
In practice, the most common mistake is enabling or disabling a driver without realizing the dependency chain it creates. For example, turning off the str driver breaks the network stack, because networking depends on the Streams subsystem that str provides. Likewise, disabling the clone driver removes clnopen, which many device interfaces require. The lesson is to check the linker’s symbol list each time you change a driver flag. The build system’s own scripts are designed to catch these errors early; you just need to interpret the output correctly.
When you’re troubleshooting, always keep a copy of the previous working kernel image. It’s the quickest way to confirm whether a new change introduced a failure or if the failure was already present. If you suspect a driver file is corrupted, you can copy a fresh version from a known good build environment or from the vendor’s distribution. By following this methodical approach, you’ll avoid many of the frustrations that come with kernel linking.
Common Linking Errors and How to Fix Them
Kernel linking errors rarely arise from a single source. Often they are the result of a cascade of small misconfigurations. Below are some recurring patterns that you’ll encounter and practical ways to resolve them.
One frequent culprit is a half‑installed or partially removed device driver. When you add a new device to the system, the build process writes its definition to /etc/conf/cf.d/mdevice. If the driver is removed but the entry remains, the linker will try to include it, fail, and emit a slew of errors. To catch this, examine the tail of mdevice - the most recent changes usually sit at the bottom. If you suspect an entry is wrong, comment it out with a hash (#) at the start of the line and re‑link. If the build succeeds, you’ve found the culprit. Then decide whether to delete the entry completely or restore the missing driver files.
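The comment-out-and-relink test is safer with a backup in place. The sketch below comments out every line whose first field matches a driver name and keeps the original as a .bak copy. The function name and the .bak suffix are choices made for this example, and it relies on the # comment convention described above.

```shell
# Comment out every mdevice line whose first field is the given driver name.
# The untouched original is saved next to the file as <file>.bak.
comment_out_entry() {
    name=$1
    mdevice=${2:-/etc/conf/cf.d/mdevice}
    cp "$mdevice" "$mdevice.bak" || return 1
    awk -v n="$name" '$1 == n { print "#" $0; next } { print }' \
        "$mdevice.bak" > "$mdevice"
}
```

Before picking a name, tail /etc/conf/cf.d/mdevice shows the most recent entries. If the re-link still fails, copy the .bak file back and try the next suspect.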
Another source of trouble is a mismatch between the driver object files and the configuration you’ve set. For instance, you might have a driver built for a previous release but still present in /etc/conf/pack.d. The linker will treat it as a valid object, but the symbols inside may not match the current kernel expectations, leading to errors such as “undefined reference” or “multiple definition.” The best way to spot this is to run a checksum comparison of the driver objects against a known good set. If the checksums differ, you likely have a stale object file.
Sometimes the issue is not a missing symbol at all but a duplicated one. A message like “multiple definition of foo” means two driver objects are providing the same symbol. This can happen when you inadvertently include a duplicate driver file or when a driver is included through two different configuration paths. Note the offending symbol in the error output, then run nm on the candidate drivers to find the two that define it. Once you locate both drivers, decide which one should stay in the build. You might need to adjust /etc/conf/sdevice.d to enable only the correct driver.
Driver order matters as well. The linker processes drivers in the order they appear in the configuration files. If a later driver depends on a symbol defined by an earlier one, the order must be preserved. An easy way to test this is to temporarily move a driver line up or down and observe whether the link succeeds. However, be cautious - changing order can affect runtime behavior, so only adjust after confirming the symbol dependencies.
When the linker reports that it cannot open a file with an implausible name, such as “cannot open file Wed” or “cannot open file ELF,” suspect corruption or a file that is not in the expected format. For example, the linker expects all driver object files to be in COFF format. If you mistakenly place an ELF file or a text file in the driver directory, the build will fail. The fix is simple: replace the offending file with the correct binary or delete it if it’s not needed.
To quickly locate the file that caused the corruption, use the file command on each driver object in the /etc/conf/pack.d directory. This will tell you whether the file is a valid COFF, ELF, or something else. Once you find the misformatted file, you can either recover it from backup, download a fresh copy, or delete it if it’s no longer relevant.
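If the file command is unavailable, or you want a check that is easy to script, the first few bytes of each object give a rough classification. This is a heuristic sketch, not a replacement for file: it recognizes only the ELF magic number (7f 45 4c 46) and the i386 COFF magic number (0x014c, stored as 4c 01), and the function name is made up for the example.

```shell
# Classify a file by its leading bytes: ELF, i386 COFF, or other.
object_format() {
    magic=$(od -An -tx1 -N4 "$1" 2>/dev/null | tr -d ' \n')
    case $magic in
        7f454c46) echo ELF ;;
        4c01*)    echo COFF ;;
        *)        echo other ;;
    esac
}
```

Looping this over the driver objects and printing any file that comes back as "other" narrows the search to text files and truncated or overwritten objects.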
In many environments, the kernel build system provides an optional “dry run” mode. Run the linker with the --dry-run flag (if supported) to see which driver objects it intends to link without actually producing the final image. This gives you a quick overview of the dependency tree and can reveal missing or duplicated drivers before you commit to a full build.
Finally, remember that the build environment itself can introduce problems. Make sure your toolchain, especially the assembler and linker, are up to date and match the kernel version you’re compiling. Mismatched versions can produce subtle errors that are hard to diagnose. If you’ve recently upgraded your toolchain, consider rolling back or rebuilding the entire toolchain to the same release as the kernel source.
Dealing with Corrupted Drivers
Corruption in driver object files is one of the hardest errors to spot, because the linker’s error messages can be generic. The key is to test the integrity of the driver files before the linker even starts.
A simple experiment is to run file /etc/conf/pack.d/*.o. All proper driver objects should be reported as “COFF” or “ELF”, depending on your platform. If any file shows “ASCII text” or another format, that file is corrupted or the wrong type. Another quick check is to run md5sum /etc/conf/pack.d/*.o and compare the checksums against those from a known good build. A mismatch signals corruption or a different version of the file.
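The checksum comparison can be automated by recording a manifest on a known good system and verifying against it later. This sketch assumes the objects sit directly in the directory passed in and that md5sum supports the -c option; the function names and the manifest file are illustrative.

```shell
# Record checksums for every .o file in a known good directory.
# The manifest path must be absolute.
make_manifest() {
    ( cd "$1" && md5sum ./*.o ) > "$2"
}

# Print the objects in obj_dir that are missing or do not match the manifest.
find_mismatches() {
    ( cd "$1" && LC_ALL=C md5sum -c "$2" 2>/dev/null ) |
        awk -F': ' '$2 != "OK" { print $1 }'
}
```

An empty result from find_mismatches means every object listed in the manifest is present and unchanged. Files that exist only in the directory being checked are not reported.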
Sometimes corruption manifests as a “cannot open file Wed” error. A likely cause is that a file where an object is expected actually holds text, such as output that begins with a day name, and the linker ends up treating a word from it as a file name. Running strings /etc/conf/pack.d/*.o | grep Wed can reveal such content. If strings finds the word “Wed,” you’ve probably got a file that was overwritten, edited, or transferred incorrectly.
When the linker complains about “cannot open file ELF,” it means the driver object contains ELF headers where COFF headers are expected. In a SCO or older Unix environment, the kernel build expects COFF. The simplest fix is to replace the offending driver with a fresh copy from the vendor distribution or from a clean build environment.




