Demystifying Unix Error Codes
When a program crashes or refuses to open a file, the most common clue a developer sees is a simple numeric value, often shown as “Error 5” or “errno 13”. That number is a direct window into why the kernel rejected the request. The kernel does not send back a human‑readable message; on failure, the system call wrapper in the C library returns –1 and stores an integer code in errno. Understanding how that number maps to a real meaning is the first step toward writing reliable Unix code.
The relationship between the raw number and its symbolic name is defined in the header file errno.h, which every C standard library provides. The header defines macros that give each numeric code a symbolic name, such as EPERM for “Operation not permitted” or EACCES for “Permission denied”. When you compile a program, the preprocessor replaces each macro with the numeric value used on the target system, so code written against the names works wherever those names are defined. In practice, most modern Linux, BSD, and System V derivatives keep the first 32 or so error codes largely identical, but subtle differences exist that can trip up portable code.
Because failure is reported only through the return value, the calling process must check the result of every system call. In C, the idiom is to test the result against –1 and report the failure before doing anything else, typically with perror().
perror() prints its argument followed by the message associated with the current value of errno. If you prefer to use the shell, you can examine the exit status with echo $? after a failed command, but that only gives you the command’s exit status; it does not reveal the underlying error number. Utilities like strace, on the other hand, show each system call together with its return value and symbolic error code, which is invaluable when debugging.
It is important to remember that errno is thread‑local: each thread has its own instance of the variable, so another thread cannot overwrite it. It is still fragile, though. Any later library call in the same thread, or a signal handler that runs on it, may change the value, so you must capture it immediately after a failure. In C, the standard approach is to assign errno to a local variable before calling anything else, and to pass that copy to strerror().
The set of error codes is large, but the first 32 or so are the ones that almost every program will encounter at least once. Even so, developers often overlook the subtle differences between them. Some codes are reserved for system call failures, while others are intended for library functions that may not invoke the kernel directly. A systematic understanding of each code’s typical use case will reduce the time spent chasing elusive bugs.
In the following sections, we’ll walk through the most common numeric errors, compare how different Unix variants implement them, and give practical advice for turning these numbers into actionable information. The goal is to help you move from vague “Error 13” messages to concrete, reproducible diagnostics.
Common Error Codes that Every Unix Developer Should Know
When you first start writing programs that touch the file system, process control, or network stack, the first handful of error codes will appear repeatedly. Below is a quick reference that pairs each code with a typical scenario and a short example. Knowing the patterns will let you anticipate failures before they crash your application.
EPERM (1) – Operation not permitted. The kernel refuses to execute a privileged action. For instance, calling setuid() to switch to another user’s ID while you are not root triggers this error. Similarly, attempting to change the owner of a file you do not own will also yield EPERM. In a shell script, you might see this when kill targets a process owned by another user. Binding to a port below 1024 as a non‑privileged user looks similar but is reported as EACCES.
ENOENT (2) – No such file or directory. The path you supplied does not exist. This is the most common failure when opening a file: either the file was misspelled or a directory in the path is missing. The same code can appear when connecting to a non‑existent Unix‑domain socket path or trying to delete an IPC object that has already been removed.
ESRCH (3) – No such process. You asked the kernel to act on a process that isn’t alive. The classic example is kill(pid, SIGKILL) where pid has already exited and been reaped. Closing a descriptor that is already closed, by contrast, fails with EBADF rather than ESRCH.
EINTR (4) – Interrupted system call. A blocking operation was interrupted by a signal. If your program is blocked in read() or nanosleep() and a signal handler runs, the call returns –1 and sets errno to EINTR. The typical fix is to retry the operation in a loop that checks for this condition. Many I/O calls are restarted automatically, but only if the handler was installed with sigaction() and the SA_RESTART flag.
EIO (5) – I/O error. Hardware or device problems cause this catch‑all error. For example, reading from a disk that has a bad sector may return EIO. Reading from a pipe whose write end is closed is not an example: that read returns 0 for end of file. A background process in an orphaned process group that tries to read from its controlling terminal, however, does see EIO.
ENXIO (6) – No such device or address. You asked the kernel to access a device that does not exist or is not attached. Opening a FIFO for writing with O_NONBLOCK and no reader present will generate ENXIO. Trying to talk to a nonexistent serial port also falls under this category.
E2BIG (7) – Argument list too long. When an exec‑family call such as execve() receives too many arguments or environment variables, it fails with E2BIG. This often surfaces when a shell expands a glob that yields thousands of file names. In interprocess communication, msgrcv() reports E2BIG when the queued message is longer than the caller’s buffer and MSG_NOERROR is not set.
These seven examples illustrate a pattern: the code tells you what part of the system you touched and why it rejected your request. When you learn how each code is generated, you can anticipate failures in other parts of your program as well.
Beyond these first seven codes, the list covers more specialized situations. Codes like EACCES (Permission denied) and EBADF (Bad file descriptor) appear in almost every file‑handling routine. EMFILE (Too many open files) warns you when a process hits its per‑process limit, while ENOSPC (No space left on device) tells you the file system is full. Even seemingly obscure codes such as ENOTTY (“Not a typewriter”) surface when you call terminal‑specific ioctl operations on a regular file.
By mapping these error numbers to real world scenarios, you can write error handlers that are both precise and useful. Instead of a generic “Permission denied” message, you can advise the user to check ownership or use sudo. For EAGAIN on a non‑blocking socket, you can instruct the program to wait or use epoll. The goal is to turn a single integer into actionable advice.
Why the Same Number Can Mean Different Things on Different Unix Systems
At first glance, it may seem that the error numbers are universal, and that errno.h provides a one‑to‑one mapping across all Unix flavors. In practice, subtle differences can creep in. The first 32 error codes are largely preserved across Linux, BSD, and System V derivatives, but the textual comments, ordering, and even the meaning of a few codes can vary. For developers who target multiple platforms, this nuance is critical.
Take EPERM as an example. On Linux, the header defines it as “Operation not permitted”, while on SCO Unix it is listed as “Not owner”. The semantic difference may seem minor, but the comment can guide developers toward the right debugging path. If a process receives EPERM on Linux after a setuid attempt, the developer knows the kernel denied the privilege change. On SCO, seeing “Not owner” might prompt a review of file ownership rather than the setuid logic.
Another illustration involves EAGAIN. On Linux, the header comment reads “Try again”, and the code is returned by non‑blocking I/O when data isn’t ready. Early Unix releases documented the same code as “No more processes”, because fork() returned it when no process slot was available. Even the number differs: EAGAIN is 11 on Linux but 35 on BSD‑derived systems such as FreeBSD and macOS, where 11 is EDEADLK. Code that compares errno against the macro is unaffected, but older codebases with hard‑coded numbers can still exhibit divergent behavior.
Beyond textual comments, the numeric range itself can shift. Some vendor‑specific systems introduce additional codes beyond 32, while others reserve certain values for proprietary use. For instance, HP-UX historically defined ENOPKG (value 65) for “Package not installed”. Linux defines ENOPKG as well, but the code is not part of POSIX, so other systems may not provide it or may assign that number to something else. If your program expects a standard set of error numbers and runs on a system that defines different codes in that range, it can misinterpret errors entirely.
When you need to verify the exact mapping on a target system, the most reliable method is to inspect the header file directly. On a Linux system, errno.h pulls in a chain of other headers, and the definitions usually end up in /usr/include/asm-generic/errno-base.h and /usr/include/asm-generic/errno.h, which you can search with grep. On a BSD, the path may be /usr/include/sys/errno.h. Systems that ship a different C library, such as Bionic on Android, keep their own copies of the headers, so check the include directory of the toolchain you are building with. If you are working with kernel source, the definitions live under include/uapi/asm-generic/errno-base.h and include/uapi/asm-generic/errno.h for Linux, and similar directories for other kernels.
Once you know the local mapping, you can write portable code by checking errno against the macros provided by the system’s own errno.h. Avoid hard‑coding numeric values in your source; instead, let the preprocessor fill them in. If you need to support a particular vendor’s extended codes, you can conditionally include them using #ifdef guards, either on the error macro itself or on platform macros such as __GLIBC__ or __APPLE__.
In short, the most effective way to handle cross‑platform error codes is to treat errno.h as the authoritative source, query it at compile time, and then translate the resulting integer into a message at runtime using strerror() or a custom lookup table. That practice keeps your error handling accurate no matter which Unix variant your code runs on.
Debugging Strategy: Turning Numbers into Meaningful Messages
Having understood what each error code means and how it can vary across systems, the next challenge is to turn that knowledge into a systematic debugging workflow. The core of this workflow is a consistent pattern of checking return values, capturing errno immediately, and using helper functions to produce readable output. Below are several concrete tactics that can be applied in both C/C++ programs and shell scripts.
1. Always inspect the return value. Most POSIX system calls return –1 to indicate failure. Do not run the rest of the function body until you have verified success. For example, when opening a file, do not proceed to read or write unless open() returned a non‑negative file descriptor. This simple guard prevents cascading errors that are harder to trace.
2. Capture errno early. The value of errno can change as soon as you call another function. Immediately after a failure, store it in a local variable: int err = errno;. This guarantees that any later debugging output refers to the correct error. For multi‑threaded programs, remember that errno is thread‑local, so you do not need a lock, but use the saved copy rather than errno itself for all subsequent checks.
3. Use perror() or strerror(). These functions translate an errno value into a human‑readable string. In C, perror() prints the string prefixed with a custom message. For scripts, printf '%s: %s\n' "operation" "$(errno_str $?)" can emulate the same behavior, where errno_str is a hypothetical helper you would have to write yourself. Keep in mind that $? holds the command’s exit status, not an errno value.
4. Implement retry loops for EINTR. System calls interrupted by signals often return EINTR. The usual response is to retry the call unless the signal handler set a flag asking the program to stop. A typical pattern wraps the call in a loop that repeats while it returns –1 and errno equals EINTR.
5. Use strace for low‑level insight. If an unexpected error occurs, run the program under strace -e trace=all -p PID or strace -o log -f ./program. The trace will show every system call, its arguments, and return values. Look for the call that produced the failure and confirm that the arguments match what you intended.
6. Add comprehensive logging around error paths. In production code, log the operation name, the arguments, and the error string. For example, before a connect() call, log the destination IP and port. If it fails, log errno and the human‑readable message. This contextual information speeds up post‑mortem analysis.
7. Leverage compile‑time diagnostics. Compile with -Wall -Wextra -Werror to catch uninitialized variables and other subtle bugs that could lead to misleading errno values. The -D_GNU_SOURCE flag often seen alongside them only enables GNU extensions and adds no diagnostics of its own. Tools like cppcheck or clang-tidy can flag suspicious patterns.
8. Validate assumptions with assertions. If you expect a file descriptor to be valid before a read, assert that fd >= 0. If the assertion fails, it indicates that an earlier code path didn’t check the result of open() or a similar call, and you can investigate where the guard is missing.




