ONIX 3.0 Metadata: Why Ebooks Don't Sell Without It

By the Emayyam Infotech team

A publisher can spend weeks producing a clean, accessible ebook and still watch it sell nothing, because what the retailer indexes, ranks and displays is not the EPUB at all. It is the metadata feed. ONIX is the standard that carries that feed, and when the ONIX is thin, stale or malformed, the book is effectively invisible: it surfaces for the wrong searches, shows the wrong price, or never appears on the shelf the buyer is browsing. The production quality of a title and its commercial performance are decided by two different files, and most teams obsess over only one of them.

ONIX metadata for ebooks is unglamorous, rule-bound and easy to defer to a spreadsheet someone updates by hand, which is exactly why it goes wrong so often. This post covers what ONIX 3.0 is, how a product record is organized, which fields retailers act on, why accessibility metadata has become gating, and how a conversion house should deliver it so the data is as trustworthy as the file it describes.

What ONIX 3.0 actually is

ONIX for Books is the international XML standard for communicating book-trade product information, maintained by EDItEUR, which develops standards for the book and serials supply chain. An ONIX message is not a description of how a book looks; it is a structured statement of everything a trading partner needs to list, classify, price and sell a title: identifiers, contributors, subjects, descriptions, prices, availability and rights. Publishers send it to aggregators, wholesalers, retailers and library platforms, and each partner ingests it into systems the publisher never sees.

The version that matters now is ONIX 3.0. EDItEUR has retired the earlier 2.1 release, and the trade has largely standardized on 3.0, which restructured the record to handle digital products, complex rights and accessibility in ways 2.1 could not. For ebooks the move was not optional: digital product forms, usage constraints and accessibility declarations live in structures that only 3.0 provides, so a publisher still emitting the old version is excluded from precisely the metadata that digital retail now depends on.

Inside a product record: the blocks

A 3.0 message has a header identifying the sender and the feed, followed by one product record per title or saleable format. Within each record the data is grouped into blocks, each owning a different part of the commercial story, so a partner can pull subjects from one place, prices from another and availability from a third without guessing. Seeing the blocks is the quickest way to understand why a single missing field can break a listing while the rest looks complete.

At a high level the blocks cover the product's descriptive detail, the marketing collateral that sells it, the content it contains, the publishing detail behind it, related products, and the product-supply terms that make it buyable. The supply block is where many digital problems hide: it carries price, currency, market and availability, and it is the part most often left stale when a promotion ends or a title changes status.

  • Descriptive detail: product form, title, contributors, subjects (the record reference and product identifiers sit at the top of the record, just before this block)
  • Collateral: descriptions, author biographies, cover images, review quotes
  • Content: tables of contents and chapter-level detail where supplied
  • Publishing detail: publisher, imprint, publication dates, rights territories
  • Related material: other editions, formats and related works
  • Product supply: supplier, market, availability status, price and currency
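To make the block layout concrete, here is a minimal Python sketch that parses a cut-down product record and pulls a value or two from each part of it. The sample record, its sender name, record reference and ISBN are invented for illustration. It omits the XML namespace that production feeds usually declare and is nowhere near a complete or schema-valid message. `summarize` is a hypothetical helper, not part of any ONIX library.

```python
import xml.etree.ElementTree as ET

# Invented, heavily trimmed record: just enough to show where things live.
SAMPLE = """\
<ONIXMessage release="3.0">
  <Header>
    <Sender><SenderName>Example Press</SenderName></Sender>
    <SentDateTime>20250101</SentDateTime>
  </Header>
  <Product>
    <RecordReference>example-press-0001</RecordReference>
    <NotificationType>03</NotificationType>
    <ProductIdentifier>
      <ProductIDType>15</ProductIDType>
      <IDValue>9781234567897</IDValue>
    </ProductIdentifier>
    <DescriptiveDetail>
      <ProductComposition>00</ProductComposition>
      <ProductForm>ED</ProductForm>
      <ProductFormDetail>E101</ProductFormDetail>
      <Subject>
        <SubjectSchemeIdentifier>10</SubjectSchemeIdentifier>
        <SubjectCode>FIC000000</SubjectCode>
      </Subject>
    </DescriptiveDetail>
    <ProductSupply>
      <SupplyDetail>
        <ProductAvailability>20</ProductAvailability>
        <Price>
          <PriceType>02</PriceType>
          <PriceAmount>9.99</PriceAmount>
          <CurrencyCode>EUR</CurrencyCode>
        </Price>
      </SupplyDetail>
    </ProductSupply>
  </Product>
</ONIXMessage>
"""


def summarize(product):
    """Pull a few commercially important values out of one <Product>."""

    def text(path):
        node = product.find(path)
        return node.text.strip() if node is not None and node.text else None

    supply = "ProductSupply/SupplyDetail/"
    return {
        # Identifiers sit at the top of the record, ahead of the blocks.
        "isbn13": text("ProductIdentifier[ProductIDType='15']/IDValue"),
        # Product form and subjects come from the descriptive block.
        "product_form": text("DescriptiveDetail/ProductForm"),
        "subjects": [
            (s.findtext("SubjectSchemeIdentifier"), s.findtext("SubjectCode"))
            for s in product.findall("DescriptiveDetail/Subject")
        ],
        # Availability and price come from the supply block.
        "availability": text(supply + "ProductAvailability"),
        "price": (
            text(supply + "Price/PriceAmount"),
            text(supply + "Price/CurrencyCode"),
        ),
    }


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    for product in root.findall("Product"):
        print(summarize(product))
```

Delete the `ProductSupply` element from the sample and the summary still shows an identifier, a form and a subject, but no price or availability: the record looks complete while the part that makes it buyable is gone.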

The fields retailers actually require

Retailers do not act on every element a publisher can send; they act on the handful that drive discovery, display and transaction, and they reject or quarantine records that omit them. The non-negotiable spine is a valid identifier, almost always a 13-digit ISBN assigned to the specific digital format, because a print ISBN reused for the ebook corrupts sales reporting on both products. Title and subtitle, contributors with correctly coded roles, and a product form declaring a digital book in EPUB or PDF rather than a physical one complete the minimum a record needs to exist.

Discovery then depends on subject classification, and here ebooks live or die by their codes. Standard subject schemes — BISAC in North American retail and Thema for international trade — map a title onto the categories shoppers actually browse, so a vague or missing classification is the difference between appearing in a relevant list and appearing nowhere. The commercial fields then finish the job: price and currency per market, the availability status that decides whether a retailer shows a buy button, and the territorial rights that set where a title may be sold. Each is a controlled codelist value rather than free text, so an almost-right entry is treated as wrong, not approximately right.

  • ISBN-13 unique to the digital format, never shared with print
  • Title, subtitle and series, where applicable
  • Contributors with correctly coded contributor roles
  • Digital product form and format detail (EPUB, PDF)
  • Subject codes: BISAC and Thema
  • Price, currency, market and availability status
  • Territorial sales rights
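A pre-flight check for the list above might look like the following Python sketch. It assumes each title is held as a plain dictionary. The key names, the `REQUIRED` tuple and the `check_record` helper are hypothetical, chosen for this example rather than taken from any retailer's specification. The ISBN-13 check digit arithmetic is the standard one: weight the first twelve digits alternately by 1 and 3, and the thirteenth digit brings the total up to a multiple of ten.

```python
# Hypothetical internal field names, not ONIX element names.
REQUIRED = (
    "isbn13", "title", "contributors", "product_form",
    "subjects", "prices", "availability", "sales_rights",
)


def isbn13_is_valid(value: str) -> bool:
    """Check length, digits and the ISBN-13 check digit."""
    digits = value.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    total = sum(
        int(d) * (1 if i % 2 == 0 else 3)
        for i, d in enumerate(digits[:12])
    )
    return (10 - total % 10) % 10 == int(digits[12])


def check_record(record: dict, print_isbns: set) -> list:
    """Return human-readable problems; an empty list means none were found."""
    problems = [f"missing field: {name}" for name in REQUIRED if not record.get(name)]
    isbn = record.get("isbn13", "")
    if isbn and not isbn13_is_valid(isbn):
        problems.append(f"invalid ISBN-13: {isbn}")
    if isbn and isbn in print_isbns:
        problems.append(f"ISBN {isbn} is already assigned to a print edition")
    return problems


if __name__ == "__main__":
    ebook = {
        "isbn13": "9781234567897",
        "title": "An Example Title",
        "contributors": [{"name": "A. Writer", "role": "A01"}],
        "product_form": "ED",
        "subjects": [("10", "FIC000000")],
        "prices": [("9.99", "EUR")],
        "availability": "20",
        "sales_rights": [],  # left empty on purpose, so it is reported
    }
    print(check_record(ebook, print_isbns={"9781234567897"}))
```

A check like this catches only what is structurally absent or arithmetically wrong. Whether a subject code is the right one for the book remains an editorial judgment.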

Accessibility metadata now gates the listing

The most consequential recent change is that accessibility metadata has moved from a nicety to a condition of sale in important markets. With the European Accessibility Act applying to ebooks sold into the EU, retailers and library platforms increasingly require records to declare a title's accessibility characteristics, and a record that stays silent can be filtered out of the very catalogues it needs to reach. The metadata is no longer merely informative; it is gating.

ONIX carries these declarations using a controlled vocabulary aligned with the same schema.org accessibility properties referenced by the EPUB Accessibility specification: the access modes a reader can use, features such as structured navigation, alternative text or MathML, any hazards such as flashing content, and a plain-language summary of what the book offers an assistive-technology user. The data exists for discoverability: it lets a reader who needs reflowable text or read-aloud find a book that provides it, and lets a procurement team filter for conformance.

It is worth separating two things that are often conflated: the accessibility block describes the book; it does not make the book accessible. Producing a genuinely born-accessible EPUB is a separate production discipline, and the metadata must tell the truth about what that production delivered. Claiming features the file lacks is worse than silence, because it fails the reader at the moment of trust and invites a complaint.
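One way to keep the declaration honest is to test a claim against the content before the feed goes out. The Python sketch below is a deliberately narrow illustration that checks two claims only. It assumes the declared features have already been translated from their ONIX codes into the schema.org names `alternativeText` and `MathML`, and that the EPUB's content documents are available as well-formed XHTML strings. The function names are hypothetical, and passing this check is not evidence of accessibility conformance.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
MATHML = "{http://www.w3.org/1998/Math/MathML}"

# The only claims this sketch knows how to test.
CHECKABLE = {"alternativeText", "MathML"}


def observed_features(xhtml_documents):
    """Infer which of the checkable features the content appears to support."""
    images_missing_alt = 0
    has_mathml = False
    for doc in xhtml_documents:
        root = ET.fromstring(doc)
        for img in root.iter(XHTML + "img"):
            # alt="" is legitimate for decorative images; a missing attribute is not.
            if "alt" not in img.attrib:
                images_missing_alt += 1
        if next(root.iter(MATHML + "math"), None) is not None:
            has_mathml = True

    features = set()
    if images_missing_alt == 0:
        features.add("alternativeText")
    if has_mathml:
        features.add("MathML")
    return features


def unsupported_claims(declared, xhtml_documents):
    """Return declared, checkable features that the content does not bear out."""
    return sorted((set(declared) & CHECKABLE) - observed_features(xhtml_documents))


if __name__ == "__main__":
    chapter = (
        '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
        '<p>Figure 1</p><img src="fig1.png"/>'
        "</body></html>"
    )
    print(unsupported_claims({"alternativeText", "tableOfContents"}, [chapter]))
```

The attribute test only proves that alternative text exists, not that it is any good. Judging its quality still needs a human reviewer.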

Delivering ONIX from production, not a spreadsheet

The reliable way to get ONIX right is to generate it from the same structured source the book is produced from, rather than rekeying it into a spreadsheet downstream. When a title is held as structured content — the XML-first approach we use across conversion work — identifiers, contributors, subjects and format details are derived from the source and emitted as conformant ONIX, so the metadata and the file describe the same book by construction rather than by luck, and a correction made once flows into both.
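As a rough sketch of what "derived from the source" can mean in practice, the Python below builds the top of a product record from a structured title entry. The dictionary keys and the `build_product` helper are assumptions made for this example, not a description of any particular production system. The output is a fragment without a namespace, publishing detail or supply block, so it would not validate as a finished record. The fixed code values are common choices for a standalone EPUB download and should be confirmed against the current codelists.

```python
import xml.etree.ElementTree as ET


def add(parent, tag, text=None):
    """Append a child element, optionally with text, and return it."""
    child = ET.SubElement(parent, tag)
    if text is not None:
        child.text = str(text)
    return child


def build_product(title: dict) -> ET.Element:
    """Emit identifiers and descriptive detail for one ebook (partial record)."""
    product = ET.Element("Product")
    add(product, "RecordReference", title["record_reference"])
    add(product, "NotificationType", "03")  # assumed: notification confirmed on publication

    identifier = add(product, "ProductIdentifier")
    add(identifier, "ProductIDType", "15")  # ISBN-13
    add(identifier, "IDValue", title["isbn13"])

    detail = add(product, "DescriptiveDetail")
    add(detail, "ProductComposition", "00")  # single-component product
    add(detail, "ProductForm", "ED")         # digital download
    add(detail, "ProductFormDetail", "E101") # EPUB

    title_detail = add(detail, "TitleDetail")
    add(title_detail, "TitleType", "01")
    element = add(title_detail, "TitleElement")
    add(element, "TitleElementLevel", "01")
    add(element, "TitleText", title["title"])

    for sequence, person in enumerate(title["contributors"], start=1):
        contributor = add(detail, "Contributor")
        add(contributor, "SequenceNumber", sequence)
        add(contributor, "ContributorRole", person["role"])
        add(contributor, "PersonName", person["name"])

    for scheme, code in title["subjects"]:
        subject = add(detail, "Subject")
        add(subject, "SubjectSchemeIdentifier", scheme)
        add(subject, "SubjectCode", code)

    return product


if __name__ == "__main__":
    source = {
        "record_reference": "example-press-0001",  # invented values throughout
        "isbn13": "9781234567897",
        "title": "An Example Title",
        "contributors": [{"role": "A01", "name": "A. Writer"}],
        "subjects": [("10", "FIC000000")],
    }
    print(ET.tostring(build_product(source), encoding="unicode"))
```

Because the record is a function of the source entry, a corrected contributor name or subject code only has to be fixed in one place before the feed is regenerated.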

From there the discipline mirrors the one that governs the files themselves. Almost every field is a coded value from an EDItEUR codelist, and EDItEUR revises those lists on a regular cycle, so validation must run against the current lists: a feed that was clean two years ago can drift out of conformance by standing still. We validate EPUBs with EPUBCheck before they ship; metadata deserves an equivalent gate, with a QA and validation report on each batch and the accessibility block checked against the EPUB it describes. For a backlist the same pipeline runs in batch, which is how our publishing service handles volume without losing consistency.

  • Generate ONIX from the structured source, not by rekeying
  • Validate against the schema and current codelists before sending
  • Match accessibility declarations to the book's real features
  • Use correct, current codelist values for every coded field
  • Deliver a QA and validation report with each batch
  • Re-emit and resend when prices, status or rights change
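The codelist step of that gate can be sketched in a few lines of Python. The snapshot below is a tiny hand-typed subset used purely for illustration. A real gate would load the complete, current lists published by EDItEUR and run schema validation as well. `codelist_problems` is a hypothetical helper, and the sample record is an invented, namespace-free fragment.

```python
import xml.etree.ElementTree as ET

# Illustrative subset only: element name -> values accepted in this snapshot.
CODELIST_SNAPSHOT = {
    "ProductIDType": {"03", "15"},
    "ProductForm": {"EA", "EB", "ED"},
    "ContributorRole": {"A01", "B01"},
    "ProductAvailability": {"10", "20", "21", "40"},
}


def codelist_problems(product: ET.Element, codelists: dict) -> list:
    """Report every coded element whose value is not in the supplied lists."""
    problems = []
    for element in product.iter():
        allowed = codelists.get(element.tag)
        if allowed is None:
            continue  # not a coded element this snapshot knows about
        value = (element.text or "").strip()
        if value not in allowed:
            problems.append(f"{element.tag}: {value!r} is not in the codelist snapshot")
    return problems


if __name__ == "__main__":
    record = ET.fromstring(
        "<Product>"
        "<ProductIdentifier><ProductIDType>15</ProductIDType></ProductIdentifier>"
        "<DescriptiveDetail><ProductForm>ED</ProductForm></DescriptiveDetail>"
        "<ProductSupply><SupplyDetail>"
        "<ProductAvailability>Available</ProductAvailability>"
        "</SupplyDetail></ProductSupply>"
        "</Product>"
    )
    for problem in codelist_problems(record, CODELIST_SNAPSHOT):
        print(problem)
```

Here the free-text "Available" is rejected even though its meaning is obvious to a person, which is how an ingest system will treat it too. Keeping the lists in data rather than in code means a new codelist issue becomes a data refresh rather than a code change.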

The bottom line

Metadata is not the paperwork that follows the book; in digital retail it is the product the store actually sees. An ONIX 3.0 feed that is complete, correctly coded, accessibility-aware and current lets a well-made ebook be found, classified, priced and bought; a feed that is none of those things quietly cancels the investment in the book itself. The two files have to be equally good, because a reader can only buy the one the metadata lets them find.

If metadata has been an afterthought, start where the leverage is: fix the pipeline that generates ONIX from your source, validate against current codelists at the gate, and make accessibility declarations honest and complete. The work is unglamorous and invisible when it goes right — which is exactly the discipline that separates titles that sell from titles that merely exist.
