The PDF/A standard defines specific requirements and constraints on the PDF file format to ensure that PDF/A-compliant documents are self-contained, device-independent and suitable for long-term archiving (the “A” stands for archive). These requirements cover fonts, color, metadata, transparency and other aspects of PDF document creation and storage. One of the goals is to ensure that PDF/A-compliant documents render reliably in the same way regardless of system or software. This is why fonts and images, for example, must be embedded in PDF/A-compliant documents and encryption is not allowed.
Why use PDF/A for archiving?
Two main reasons are PDF/A’s advantages over other electronic formats and its industry acceptance.
The most widely used alternative for digital archiving is TIFF, a raster image format that promises the same guaranteed visual appearance of the document as PDF/A. However, TIFF does not support vector shapes, gradients or fonts, and it often takes up more disk space than PDF/A. Unlike TIFF, PDF/A supports Unicode, which makes text extractable and searchable, as well as digital signatures (PDF signing), which make any tampering with the content of a document detectable.
PDF/A also maintains a high level of industry acceptance. Since the standard was published in 2005, many institutions, especially in Europe, have mandated PDF/A as the required file format for archiving.
PDF/A Standard Versions
PDF/A-1
- The most restrictive version. Missing features: JPEG 2000, transparency, layers and attachments.
- Conformance levels: a & b
- Based on PDF 1.4
PDF/A-2
- New & permitted features: Improved image compression (JPEG 2000 and JBIG2), transparency, layers and attachments (only other PDF/A files)
- Conformance levels: a, b & u
- Based on PDF 1.7
PDF/A-3
PDF/A-3 is virtually identical to PDF/A-2, with a single difference: it permits any file type as an attachment.
- New & permitted features: Attachments (any filetype)
- Conformance levels: a, b & u
- Based on PDF 1.7
PDF/A-4
Sometimes referred to as PDF/A-NEXT, PDF/A-4 is the next iteration of the PDF/A standard, created to align with PDF 2.0, the latest version of the PDF ISO standard.
- New features: PDF 2.0 compatibility
- Conformance levels: e & f
- Based on PDF 2.0
Conformance levels a, b, and u are not used in PDF/A-4. Instead, PDF/A-4 encourages but does not require the addition of higher-level logical structures, and it requires Unicode mappings for all fonts.
Additionally, PDF/A-4 introduces two new conformance levels, e & f. PDF/A-4f allows files of any format to be embedded, whereas PDF/A-4e adds support for Rich Media and 3D annotations as well as embedded files, creating a PDF/A version compatible with modern geospatial, construction and engineering workflows. (The “e” stands for engineering.)
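The version and conformance-level combinations listed above can be captured in a small lookup table. The following is a minimal Python sketch, not part of any PDF library: the names `PDFA_PARTS`, `parse_flavor` and `is_valid_flavor` are hypothetical, and accepting a bare "PDF/A-4" label without a level letter is an assumption that goes beyond what the text above states.

```python
import re

# Base PDF version and conformance levels per PDF/A part, as listed above.
PDFA_PARTS = {
    1: {"base_pdf": "1.4", "levels": {"a", "b"}},
    2: {"base_pdf": "1.7", "levels": {"a", "b", "u"}},
    3: {"base_pdf": "1.7", "levels": {"a", "b", "u"}},
    4: {"base_pdf": "2.0", "levels": {"e", "f"}},
}

# Matches labels such as "PDF/A-1b" or "PDF/A-4e"; the level letter is optional.
_FLAVOR_RE = re.compile(r"^PDF/A-(\d)([a-z]?)$", re.IGNORECASE)


def parse_flavor(label):
    """Split a label like 'PDF/A-2u' into (part, level, base PDF version).

    Raises ValueError for unknown parts or part/level combinations.
    """
    match = _FLAVOR_RE.match(label.strip())
    if match is None:
        raise ValueError(f"not a PDF/A label: {label!r}")
    part = int(match.group(1))
    level = match.group(2).lower()
    if part not in PDFA_PARTS:
        raise ValueError(f"unknown PDF/A part: {part}")
    info = PDFA_PARTS[part]
    if level == "":
        # Assumption: only PDF/A-4 may appear without a level letter.
        if part != 4:
            raise ValueError(f"PDF/A-{part} needs a conformance level")
    elif level not in info["levels"]:
        raise ValueError(f"level {level!r} is not defined for PDF/A-{part}")
    return part, level, info["base_pdf"]


def is_valid_flavor(label):
    """Return True if the label names a known part/level combination."""
    try:
        parse_flavor(label)
    except ValueError:
        return False
    return True
```

For example, `parse_flavor("PDF/A-2u")` yields part 2, level "u" and base version "1.7", while `is_valid_flavor("PDF/A-1u")` is false because u-level does not exist in PDF/A-1.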
PDF/A Conformance Levels
Level b (Basic)
PDF/A-1b, PDF/A-2b, PDF/A-3b
B-level conformance requires only that documents conform with guidelines for reliable viewing and is therefore the easiest level to achieve.
Level a (Accessible)
PDF/A-1a, PDF/A-2a, PDF/A-3a
A-level conformance is a superset of b-level conformance. It adds requirements for information intended to preserve a document’s logical structure, semantic content, and natural reading order.
In other words, a-level conformance not only ensures documents will look the same in the future; it also helps machines and people better understand and re-purpose their content. A valid a-level PDF/A will have text that can be reliably searched and copied, and content that is more accessible to technologies like screen readers for the blind.
- Content must be tagged with a hierarchical structure tree, meaning elements such as reading order, figures and tables are explicitly identified through tags.
- The natural language of the document must be identified.
- Images and symbols must have alternative descriptive text.
- The file must include character mappings to Unicode for reliable search and copy.
Note: none of these requirements will change the visual appearance of a document.
Level u (Unicode)
PDF/A-2u, PDF/A-3u
Like a-level, u-level conformance requires character mapping to Unicode. However, it drops the other a-level requirements, including embedded logical structure (i.e. tags and a structure tree). Therefore, a PDF/A meeting u-level conformance will have text that can be reliably searched and copied, but the reading order will not be guaranteed.
U-level was introduced with PDF/A-2 to allow organizations to guarantee that document text can be reliably searched and copied, without the file having to conform to other a-level requirements.
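The relationship between the three levels can be illustrated with sets of requirements. This is a simplified Python sketch under stated assumptions: the requirement labels (`reliable_viewing`, `unicode_mapping` and so on) are made-up shorthand for the points described above, not terms from the standard, and u-level is modeled as b-level plus Unicode mapping, which the text implies but does not state outright.

```python
# Hypothetical shorthand labels for the requirements described above.
B_LEVEL = frozenset({"reliable_viewing"})
# Assumption: u-level builds on b-level and adds Unicode mapping.
U_LEVEL = B_LEVEL | {"unicode_mapping"}
# a-level adds logical structure, document language and alternative text.
A_LEVEL = U_LEVEL | {"structure_tree", "document_language", "alt_text"}

LEVELS = {"b": B_LEVEL, "u": U_LEVEL, "a": A_LEVEL}

# Levels defined for each part (PDF/A-4 uses a different scheme and is left out).
AVAILABLE = {1: {"a", "b"}, 2: {"a", "b", "u"}, 3: {"a", "b", "u"}}


def achievable_levels(satisfied, part):
    """List the levels a document could claim, most demanding first.

    `satisfied` is a collection of requirement labels the document meets.
    """
    if part not in AVAILABLE:
        raise ValueError(f"levels a, b and u are not defined for part {part}")
    met = set(satisfied)
    return [
        level
        for level in ("a", "u", "b")
        if level in AVAILABLE[part] and LEVELS[level] <= met
    ]
```

A document that renders reliably and has Unicode mappings but no tags would get `["u", "b"]` for PDF/A-2, and only `["b"]` for PDF/A-1, where u-level does not exist.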
ZUGFeRD as an implementation of PDF/A-3
ZUGFeRD is a specific implementation of the PDF/A-3 standard that was developed in Germany for electronic invoicing. A ZUGFeRD invoice is a PDF/A-3 document with an XML file containing the invoice data embedded in it as a PDF attachment. This allows the invoice data to be easily extracted and processed by automated systems. For the embedded XML files, ZUGFeRD uses the Cross Industry Invoice (CII) standard from UN/CEFACT.
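To give a rough idea of what automated processing of the embedded invoice data can look like, here is a minimal Python sketch that reads a few fields from a CII-style XML document using only the standard library. It assumes the XML has already been extracted from the PDF attachment. The sample invoice is invented, and the namespaces and element paths reflect the CII layout as used by recent ZUGFeRD versions to the best of our knowledge; they should be checked against the official schema before use. The function name `read_invoice_summary` is hypothetical.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Assumed CII namespaces; verify against the schema of your ZUGFeRD version.
NS = {
    "rsm": "urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100",
    "ram": "urn:un:unece:uncefact:data:standard:"
           "ReusableAggregateBusinessInformationEntity:100",
}

# Invented sample data, heavily shortened compared to a real invoice.
SAMPLE_XML = (
    '<rsm:CrossIndustryInvoice'
    ' xmlns:rsm="urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100"'
    ' xmlns:ram="urn:un:unece:uncefact:data:standard:'
    'ReusableAggregateBusinessInformationEntity:100">'
    "<rsm:ExchangedDocument><ram:ID>INV-0001</ram:ID></rsm:ExchangedDocument>"
    "<rsm:SupplyChainTradeTransaction>"
    "<ram:ApplicableHeaderTradeSettlement>"
    "<ram:InvoiceCurrencyCode>EUR</ram:InvoiceCurrencyCode>"
    "<ram:SpecifiedTradeSettlementHeaderMonetarySummation>"
    "<ram:GrandTotalAmount>119.00</ram:GrandTotalAmount>"
    "</ram:SpecifiedTradeSettlementHeaderMonetarySummation>"
    "</ram:ApplicableHeaderTradeSettlement>"
    "</rsm:SupplyChainTradeTransaction>"
    "</rsm:CrossIndustryInvoice>"
)


def read_invoice_summary(xml_text):
    """Return invoice number, currency and grand total from CII-style XML.

    Fields that are missing from the XML come back as None.
    """
    root = ET.fromstring(xml_text)
    settlement = (
        "rsm:SupplyChainTradeTransaction/ram:ApplicableHeaderTradeSettlement"
    )
    total_text = root.findtext(
        settlement
        + "/ram:SpecifiedTradeSettlementHeaderMonetarySummation"
        + "/ram:GrandTotalAmount",
        namespaces=NS,
    )
    return {
        "invoice_id": root.findtext(
            "rsm:ExchangedDocument/ram:ID", namespaces=NS
        ),
        "currency": root.findtext(
            settlement + "/ram:InvoiceCurrencyCode", namespaces=NS
        ),
        # Decimal avoids floating-point rounding on monetary amounts.
        "grand_total": Decimal(total_text) if total_text is not None else None,
    }
```

Calling `read_invoice_summary(SAMPLE_XML)` returns a dictionary with the invoice number, the currency code and the total as a `Decimal`. Because the data is structured XML rather than text scraped from the page, no OCR or layout parsing is needed.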