File Formats Recovered by PhotoRec

Introduction

PhotoRec is a free and open‑source data recovery utility distributed as part of the TestDisk suite. Unlike traditional file system–based recovery tools, PhotoRec ignores file system metadata and relies on known file signatures to locate and reconstruct data. It can recover a wide range of file formats from storage media such as hard drives, USB flash drives, and memory cards. Its recovery capabilities are largely determined by its internal database of file signatures, which lists the magic numbers and patterns that identify specific file types. This article surveys the file formats that PhotoRec can recover, outlines the categories of files supported, and discusses the mechanisms that enable the recovery of each format.

History and Development of PhotoRec

Origins and Evolution

PhotoRec was created by Christophe Grenier in the early 2000s as a companion to TestDisk, an application designed for partition recovery. While TestDisk works by repairing partition tables and file system structures, PhotoRec was conceived to address the limitations of file system–based recovery, especially in cases where the file system is heavily corrupted or completely erased. Over time, PhotoRec has evolved through successive releases, with each version expanding the signature database and improving the robustness of file detection algorithms.

Signature Database

The core of PhotoRec’s functionality lies in its signature database, which contains patterns for a broad spectrum of file types. The database is maintained by a community of contributors and updated regularly to include new formats and revisions of existing formats. Because the database is open, developers can add custom signatures to extend support for niche or proprietary file types.

Cross‑Platform Availability

PhotoRec is written in C, which makes it portable across operating systems. It is available for Windows, macOS, Linux, FreeBSD, and several other Unix‑like systems. The text‑mode interface and the graphical front end (QPhotoRec) share the same recovery engine, so results should be consistent across platforms.

How PhotoRec Recovers Data

Signature‑Based Detection

PhotoRec scans raw disk sectors for patterns that match entries in the signature database. When a match is found, the utility extracts the data from the sector where the signature begins to the next signature or to the end of the file, whichever occurs first. This approach allows the recovery of files even when the file system structures are corrupted or absent.
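The matching step described above can be sketched in a few lines of Python. This is a minimal illustration, not PhotoRec's own code: the `SIGNATURES` table and the `identify` function are hypothetical names, and the byte values are simply the magic numbers quoted elsewhere in this article.

```python
# Hypothetical signature table: file type -> leading magic bytes.
# The layout is an assumption made for illustration only.
SIGNATURES = {
    "jpg": bytes.fromhex("FFD8FF"),
    "png": bytes.fromhex("89504E470D0A1A0A"),
    "zip": bytes.fromhex("504B0304"),
    "elf": bytes.fromhex("7F454C46"),
}


def identify(block: bytes):
    """Return the file type whose signature starts the block, or None."""
    for file_type, magic in SIGNATURES.items():
        if block.startswith(magic):
            return file_type
    return None
```

A real signature database also has to handle signatures that sit at a non‑zero offset and formats that share a prefix, which this sketch ignores.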

Block Size and Sector Alignment

During a scan, PhotoRec reads sectors of a specified block size, typically 512 or 4096 bytes. It then aligns the scan to sector boundaries, reducing the chance of misidentifying file starts. After locating a signature, PhotoRec attempts to read the file in continuous blocks until a terminating signature or a file size limit is reached.
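To illustrate why alignment matters, the sketch below checks for a signature only at block boundaries. The function name `scan_aligned` and its parameters are assumptions for this example; the 512‑byte default mirrors the typical sector size mentioned above.

```python
def scan_aligned(image: bytes, magic: bytes, block_size: int = 512):
    """Return offsets of blocks that begin with `magic`.

    Only block boundaries are tested, so a matching byte pattern that
    happens to occur in the middle of a block is not reported.
    """
    hits = []
    for offset in range(0, len(image), block_size):
        if image.startswith(magic, offset):
            hits.append(offset)
    return hits
```

Because files are normally stored starting at a block boundary, skipping unaligned positions removes many accidental matches inside unrelated data at the cost of missing files embedded at odd offsets.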

Handling of File System Variants

Because PhotoRec disregards file system metadata, it can recover files from a variety of file systems, including FAT, NTFS, ext2/3/4, HFS+, exFAT, and many others. It can also recover data from partitions that have been reformatted or whose file system has been removed entirely, provided the underlying data blocks have not themselves been overwritten.

Categories of File Formats Recovered by PhotoRec

PhotoRec’s database is organized into logical groups based on the type of data the files contain. The following sections enumerate the main categories and list representative file formats within each.

Image Formats

  • JPEG (.jpg, .jpeg)
  • Portable Network Graphics (.png)
  • Graphics Interchange Format (.gif)
  • Windows Bitmap (.bmp)
  • TIFF (.tif, .tiff)
  • Raw Image Files (CR2, NEF, ARW, DNG)
  • WebP (.webp)
  • HEIF/HEIC (.heif, .heic)
  • JPEG 2000 (.jp2, .j2k)
  • Adobe Photoshop (.psd)

Audio Formats

  • MP3 (.mp3)
  • WAV (.wav)
  • AIFF (.aiff, .aif)
  • FLAC (.flac)
  • Ogg Vorbis (.ogg)
  • WMA (.wma)
  • ALAC (.alac)
  • M4A/MP4 audio (.m4a)
  • AAC (.aac)
  • Opus (.opus)

Video Formats

  • MPEG‑4 (.mp4, .m4v)
  • AVI (.avi)
  • MKV (.mkv)
  • WebM (.webm)
  • WMV (.wmv)
  • FLV (.flv)
  • MOV (.mov)
  • 3GP (.3gp)
  • MPG (.mpg, .mpeg)
  • H.264 (.h264, .264)

Document Formats

  • Microsoft Word (.doc, .docx)
  • Microsoft Excel (.xls, .xlsx)
  • Microsoft PowerPoint (.ppt, .pptx)
  • Portable Document Format (.pdf)
  • Rich Text Format (.rtf)
  • Plain Text (.txt)
  • OpenDocument Text (.odt)
  • OpenDocument Spreadsheet (.ods)
  • OpenDocument Presentation (.odp)
  • HTML (.html, .htm)
  • XML (.xml)

Archive and Compressed Formats

  • ZIP (.zip)
  • RAR (.rar)
  • 7‑Zip (.7z)
  • GZIP (.gz)
  • BZIP2 (.bz2)
  • TAR (.tar)
  • XZ (.xz)
  • ISO (.iso)
  • DMG (.dmg)
  • RAR5 (.rar)

System and Executable Formats

  • Windows Executable (.exe, .dll, .sys)
  • Portable Executable (.pe)
  • Mac OS X Binary (.app, .dylib)
  • Linux ELF (.elf, .out)
  • Android APK (.apk)
  • iOS IPA (.ipa)
  • Java Archive (.jar)
  • Python Bytecode (.pyc)
  • JavaScript (.js)
  • PHP (.php)

Database and Email Formats

  • Microsoft Access (.mdb, .accdb)
  • SQLite (.sqlite, .db)
  • MySQL Dump (.sql)
  • Microsoft Outlook (.pst, .ost)
  • Mozilla Thunderbird (.eml)
  • Exchange (.msg)
  • PostScript (.ps)

Specialized and Proprietary Formats

  • Adobe Lightroom (.xmp)
  • Autodesk AutoCAD (.dwg, .dxf)
  • CorelDraw (.cdr)
  • Sketch (.sketch)
  • QuarkXPress (.qxd)
  • 3D Mesh (.obj, .fbx)
  • CAD (.stl)
  • Game Assets (various proprietary formats)

Miscellaneous Formats

  • Log Files (.log)
  • Configuration Files (.conf, .cfg)
  • Font Files (.ttf, .otf)
  • Video Game Saves (.sav)
  • Electronic Health Records (.ehr)
  • Geographic Data (.kml, .shp)
  • CAD drawings (.dgn)
  • Financial Data (.csv, .xlsx)

Mechanisms for Recovering Specific File Types

Image File Recovery

PhotoRec identifies image files by matching start‑of‑file signatures such as the JPEG “FF D8 FF” marker or the PNG “89 50 4E 47 0D 0A 1A 0A” header. Raw image formats are usually TIFF‑based, so their signatures combine the TIFF header with a camera‑specific pattern (e.g., CR2 begins with the little‑endian TIFF header “49 49 2A 00”, followed by the ASCII marker “CR” at offset 8). After detecting a header, PhotoRec reads successive blocks until it encounters an end marker or reaches a size threshold defined for that format.
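Reading until an end marker can be made concrete with PNG, whose layout is a sequence of chunks (4‑byte big‑endian length, 4‑byte type, payload, 4‑byte CRC) ending in an IEND chunk. The sketch below walks that structure to find the file length; `png_length` is a hypothetical helper, and it skips CRC checking for brevity.

```python
import struct

PNG_MAGIC = bytes.fromhex("89504E470D0A1A0A")


def png_length(data: bytes, start: int = 0):
    """Return the byte length of the PNG at `start`, or None if incomplete."""
    if not data.startswith(PNG_MAGIC, start):
        return None
    pos = start + len(PNG_MAGIC)
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack_from(">I4s", data, pos)
        pos += 8 + length + 4  # chunk header, payload, CRC
        if pos > len(data):
            return None  # chunk runs past the available data
        if chunk_type == b"IEND":
            return pos - start
    return None
```

A carver that follows the chunk structure can stop exactly at the end of the image instead of relying on the next signature, which keeps trailing unrelated data out of the recovered file.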

Audio File Recovery

MP3 files are identified via frame‑sync bytes such as “FF FB” or “FF F3”, or via a leading ID3v2 tag (“49 44 33”, ASCII “ID3”), whereas FLAC files begin with “66 4C 61 43” (ASCII “fLaC”). After the header is matched, PhotoRec continues extraction for as long as the data still looks like a valid stream. For formats built from variable‑length units (e.g., Ogg), the utility follows the logical page structure defined by the format’s specification.
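Following the Ogg page structure can be sketched as below. Each Ogg page starts with the capture pattern “OggS” and a 27‑byte header whose last byte gives the number of entries in a segment table; the payload length is the sum of those entries. The helper name `ogg_stream_length` is hypothetical, and the sketch assumes a single logical stream and does not verify page checksums.

```python
def ogg_stream_length(data: bytes, start: int = 0) -> int:
    """Walk consecutive Ogg pages from `start`; return the bytes they cover."""
    pos = start
    while pos + 27 <= len(data) and data.startswith(b"OggS", pos):
        n_segments = data[pos + 26]
        table = data[pos + 27 : pos + 27 + n_segments]
        if len(table) < n_segments:
            break  # segment table is truncated
        page_len = 27 + n_segments + sum(table)
        if pos + page_len > len(data):
            break  # payload is truncated
        end_of_stream = bool(data[pos + 5] & 0x04)  # header-type flag
        pos += page_len
        if end_of_stream:
            break
    return pos - start
```

The walk stops at the first position that is not a complete page, so a fragmented file yields only its leading contiguous part.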

Video File Recovery

Video recovery is more complex because container files are large and frequently fragmented. PhotoRec relies on container signatures such as “52 49 46 46” (ASCII “RIFF”, followed by the form type “AVI ” at offset 8) for AVI or “1A 45 DF A3” (the EBML header) for MKV and WebM. Once a signature is found, the utility attempts to read through the container structure until it reaches a terminator or the file’s reported length. For raw video streams (e.g., H.264), detection is based on the NAL unit start codes (00 00 00 01).
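For RIFF‑based containers such as AVI, the reported length is easy to read: bytes 4 to 7 hold a little‑endian size that counts everything after the first 8 bytes. The following sketch, with the hypothetical helper `riff_total_size`, shows the arithmetic; it says nothing about how PhotoRec itself handles AVI files larger than a single RIFF chunk.

```python
import struct


def riff_total_size(header: bytes):
    """Return (form_type, total_bytes) for a RIFF header, or None."""
    if len(header) < 12 or header[:4] != b"RIFF":
        return None
    (chunk_size,) = struct.unpack_from("<I", header, 4)
    form_type = header[8:12].decode("ascii", errors="replace")
    # The size field excludes the 8-byte "RIFF" + size prefix.
    return form_type, chunk_size + 8
```

The same check distinguishes AVI from other RIFF files such as WAV, since only the form type at offset 8 differs.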

Document and Spreadsheet Recovery

Microsoft Office Open XML files (.docx, .xlsx, .pptx) are ZIP archives containing XML parts. PhotoRec detects the ZIP header “50 4B 03 04” and then recovers the entire archive. For legacy binary formats, the signature patterns include the “D0 CF 11 E0 A1 B1 1A E1” marker used by Microsoft’s Compound File Binary Format.
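Because Office Open XML files share the ZIP signature, telling them apart requires looking inside the archive. One plausible approach, sketched here with Python's standard `zipfile` module, is to check for the top‑level folder each package type uses. The mapping and the `classify_zip` helper are assumptions for illustration, not a description of PhotoRec's internal logic.

```python
import io
import zipfile

# Top-level folder used by each Office Open XML package type.
OOXML_FOLDERS = {"word/": "docx", "xl/": "xlsx", "ppt/": "pptx"}


def classify_zip(blob: bytes) -> str:
    """Guess a more specific extension for a recovered ZIP blob."""
    try:
        with zipfile.ZipFile(io.BytesIO(blob)) as archive:
            names = archive.namelist()
    except zipfile.BadZipFile:
        return "zip"  # damaged or truncated: keep the generic extension
    for folder, extension in OOXML_FOLDERS.items():
        if any(name.startswith(folder) for name in names):
            return extension
    return "zip"
```

This kind of check is also useful after recovery, for renaming files that were carved with a generic .zip extension.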

Archive Recovery

Archive formats begin with self‑describing headers. For example, ZIP files start with “50 4B 03 04”, RAR 4.x files with “52 61 72 21 1A 07 00”, RAR 5 files with “52 61 72 21 1A 07 01 00”, and 7‑Zip files with “37 7A BC AF 27 1C”. PhotoRec does not decompress the archive; it uses the header and, where the format allows, the internal record structure to estimate where the archive ends, and then copies the data sequentially. A nested archive can show up as a separate hit only when it is stored uncompressed, so that its own header is visible in the raw data.
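For ZIP, the end of the archive is marked by the end‑of‑central‑directory record, which starts with “50 4B 05 06”, is 22 bytes long, and is followed by an optional comment whose length is stored in its last two bytes. The sketch below uses that record to find where a carved ZIP should stop. `zip_end_offset` is a hypothetical helper; it ignores ZIP64 and assumes the marker bytes do not occur by chance inside compressed data.

```python
import struct

EOCD_MAGIC = b"PK\x05\x06"  # end-of-central-directory record
EOCD_FIXED_SIZE = 22


def zip_end_offset(data: bytes, start: int = 0):
    """Return the offset just past the first complete EOCD record, or None."""
    pos = data.find(EOCD_MAGIC, start)
    if pos < 0 or pos + EOCD_FIXED_SIZE > len(data):
        return None
    (comment_len,) = struct.unpack_from("<H", data, pos + 20)
    end = pos + EOCD_FIXED_SIZE + comment_len
    return end if end <= len(data) else None
```

Cutting the recovered file at this offset keeps unrelated trailing sectors out of the archive, which many decompression tools would otherwise warn about.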

Executable and System File Recovery

Executable files often contain magic numbers that uniquely identify the format. The Windows Portable Executable format starts with the DOS header “4D 5A” (MZ). For Linux ELF files, the header begins with “7F 45 4C 46”. After matching these signatures, PhotoRec extracts the binary data as a continuous stream.
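The two‑byte “MZ” marker alone is weak evidence, so a stricter check follows the pointer stored at offset 0x3C of the DOS header to the “PE\0\0” signature. The sketch below shows that check alongside the ELF magic; `executable_kind` and its return labels are hypothetical names chosen for this example.

```python
import struct


def executable_kind(data: bytes):
    """Classify a buffer as 'elf', 'pe', 'mz' (DOS header only), or None."""
    if data[:4] == b"\x7fELF":
        return "elf"
    if data[:2] != b"MZ":
        return None
    if len(data) >= 0x40:
        # e_lfanew: little-endian offset of the PE signature.
        (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
        if data[pe_offset : pe_offset + 4] == b"PE\x00\x00":
            return "pe"
    return "mz"
```

Requiring the second signature reduces false positives from text or data that merely happens to start with the letters “MZ”.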

Database Recovery

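SQLite databases start with the 16‑byte header “53 51 4C 69 74 65 20 66 6F 72 6D 61 74 20 33 00” (ASCII “SQLite format 3” followed by a null byte). PhotoRec identifies this header and recovers the database file. MySQL dumps have no binary signature, so recovery relies on textual markers near the beginning of the file, such as the “-- MySQL dump” comment written by mysqldump or early “CREATE TABLE” statements. In many cases, such files are simply recovered as plain text.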
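The SQLite header also records enough information to estimate the file size: the page size is a 2‑byte big‑endian value at offset 16 (with 1 standing for 65536), and the database size in pages is a 4‑byte big‑endian value at offset 28. The sketch below multiplies the two. `sqlite_expected_size` is a hypothetical helper, and it treats a zero page count as unknown because older SQLite versions did not maintain that field.

```python
import struct

SQLITE_MAGIC = b"SQLite format 3\x00"


def sqlite_expected_size(header: bytes):
    """Return the database size in bytes implied by the header, or None."""
    if len(header) < 32 or not header.startswith(SQLITE_MAGIC):
        return None
    (page_size,) = struct.unpack_from(">H", header, 16)
    (page_count,) = struct.unpack_from(">I", header, 28)
    if page_size == 1:
        page_size = 65536  # special encoding for the largest page size
    if page_count == 0:
        return None  # field not maintained by the writer
    return page_size * page_count
```

Comparing this value with the size of a recovered file is a quick way to spot truncated databases.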

Limitations and Challenges

Fragmentation and Overwrites

When data is fragmented across non‑contiguous sectors or partially overwritten, PhotoRec may recover only portions of the file. The recovery process stops at the first sector that does not match the expected pattern, which can truncate the resulting file.

Large Files and Storage Constraints

Recovering very large files may consume significant disk space and processing time. PhotoRec imposes a maximum file size limit configurable by the user to prevent resource exhaustion. Files larger than this limit are typically discarded unless the limit is increased.

False Positives

Because PhotoRec matches only the beginning of files, it may occasionally extract data that is not a complete file but begins with a recognizable signature. These false positives can result in corrupted or incomplete recovered files, requiring manual inspection.

Encrypted and Proprietary Formats

Encrypted files, such as encrypted ZIP archives or password‑protected documents, can be carved like any other file as long as their outer container has a recognizable signature, but their contents remain unreadable without the key. PhotoRec recovers the container; it makes no attempt to decrypt the data inside.

Usage and Best Practices

Choosing the Destination Directory

PhotoRec writes recovered files to a specified output directory. This directory should be on a separate storage device, not the one being scanned, to avoid overwriting data that may still be recoverable.

Scanning Parameters

Users can specify the block size, indicate the type of file system the partition held, and select the set of file types to recover. Limiting the scan to relevant file types can reduce scan time and output volume.

Post‑Recovery Verification

Recovered files should be verified against known checksums when possible. For documents and images, opening the files in an editor or viewer confirms integrity. For binary executables, checksum comparison or signature verification can identify incomplete recoveries.
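Checksum verification can be scripted with Python's standard `hashlib` module. The sketch below hashes a recovered file in chunks and compares the result with a previously recorded SHA‑256 digest; the function names and the choice of SHA‑256 are assumptions for this example.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large recovered files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_known(path: str, expected_hex: str) -> bool:
    """Compare a recovered file's digest with a previously recorded one."""
    return sha256_of(path) == expected_hex.strip().lower()
```

A match confirms that the recovered file is byte‑identical to the original; a mismatch only shows that something differs, not where.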

Community and Extensions

Adding Custom Signatures

The built‑in signatures are part of the program’s source code, but users can supply additional signatures in a plain‑text file for formats not already included. Each entry gives the file extension, the offset at which the signature appears, and the signature itself, typically written as a sequence of hexadecimal bytes or a quoted string. Community members contribute signatures via mailing lists and forums.
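To show what a text‑based signature entry might look like in practice, the sketch below parses lines of the form "extension offset hex‑bytes". This layout is a simplified assumption made for illustration; the exact syntax PhotoRec accepts should be taken from its documentation, and `parse_signature_line` is a hypothetical helper.

```python
def parse_signature_line(line: str):
    """Parse 'ext offset hexbytes' into (ext, offset, magic).

    Blank lines and '#' comments yield None. The line layout is an
    assumed, simplified format, not PhotoRec's documented syntax.
    """
    text = line.split("#", 1)[0].strip()
    if not text:
        return None
    extension, offset, hex_bytes = text.split(None, 2)
    return extension, int(offset, 0), bytes.fromhex(hex_bytes)
```

For example, the line "7z 0 37 7A BC AF 27 1C" would yield the extension "7z", offset 0, and the six signature bytes quoted earlier for 7‑Zip.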

Integration with Other Tools

PhotoRec’s output can be processed by other forensic tools. For example, recovered SQLite databases can be imported into SQLite browsers, while recovered ZIP archives can be opened with standard decompression utilities. In forensic workflows, PhotoRec is often combined with metadata‑analysis tools to provide context for the recovered data.

Open Source Governance

TestDisk and PhotoRec are distributed under the GNU General Public License (GPL v2 or later). The project repository hosts source code, issue trackers, and documentation. Community contributions are reviewed by the core developers to ensure compatibility and stability.

Future Directions

Expanding Format Coverage

As new file formats emerge, especially in mobile and cloud storage ecosystems, the signature database will need periodic updates. Automated parsing of format specifications could accelerate the addition of new signatures.

Improved Fragmentation Handling

Research into advanced file reconstruction algorithms could allow PhotoRec to piece together fragmented files by analyzing inter‑sector metadata or using heuristics based on file system patterns.

Integration with Machine Learning

Applying machine learning techniques to detect file boundaries or to classify partially corrupted data may enhance the accuracy of recovered files, especially in challenging scenarios where signatures are insufficient.

Enhanced User Interface

While the command‑line interface remains the core, developing a more intuitive graphical front‑end could broaden PhotoRec’s accessibility to non‑technical users, while preserving the advanced options for experienced investigators.
