DITA Package Processor

The DITA Package Processor deterministically analyzes, plans, and transforms DITA 1.3 packages into a controlled, publication-ready structure.

It operates as a strict, multi-phase batch pipeline with explicit boundaries between:

  • Discovery – observe and describe what exists
  • Planning – derive a validated, auditable execution plan
  • Execution – dispatch deterministic actions
  • Materialization – finalize and persist execution results

Every run is explicit, repeatable, and explainable.

There is no inference.
There is no runtime guessing.
Nothing mutates unless explicitly permitted.

What It Does

Given a bulk-generated DITA package, the processor runs the four phases above in a fixed order.

Each phase:

  • has a single responsibility
  • produces a durable, machine-readable artifact
  • is independently testable
  • refuses to proceed on invalid or ambiguous input

The system is designed so that structure is proven before behavior is allowed.

Phase 1: Discovery (Read-only)

Discovery scans the input package without mutating any files.

It:

  • Locates maps, topics, and media
  • Classifies artifacts using declared patterns
  • Builds a dependency graph
  • Records structural relationships
  • Detects ambiguous or unsupported conditions

Discovery produces a Discovery Inventory, describing:

  • artifacts and their observable roles
  • dependency edges between artifacts
  • structural invariants
  • evidence used for classification

Discovery answers one question:

What is actually in this package?

If something cannot be observed safely, it is not acted upon later.

Discovery never mutates the filesystem.
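
As an illustration only, the read-only scan and pattern-based classification described above might look like the sketch below. The pattern table, the `Artifact` record, and the `classify` and `discover` functions are hypothetical names chosen for this example; they are not the processor's actual API or inventory schema.

```python
from __future__ import annotations

import fnmatch
from dataclasses import dataclass
from pathlib import Path

# Hypothetical declared patterns; the tool's real patterns are not shown here.
DECLARED_PATTERNS = {
    "map": ["*.ditamap"],
    "topic": ["*.dita", "*.xml"],
    "media": ["*.png", "*.jpg", "*.svg"],
}


@dataclass(frozen=True)
class Artifact:
    path: str      # path relative to the package root
    role: str      # "map", "topic", "media", "unclassified", or "ambiguous"
    evidence: str  # the pattern (or reason) that justified the role


def classify(name: str) -> tuple[str, str]:
    """Return (role, evidence); report ambiguity instead of guessing."""
    matches = [
        (role, pattern)
        for role, patterns in DECLARED_PATTERNS.items()
        for pattern in patterns
        if fnmatch.fnmatchcase(name.lower(), pattern)
    ]
    roles = {role for role, _ in matches}
    if len(roles) == 1:
        return matches[0]
    if not roles:
        return ("unclassified", "no declared pattern matched")
    return ("ambiguous", "multiple roles matched: " + ", ".join(sorted(roles)))


def discover(package_root: Path) -> list[Artifact]:
    """Walk the package without writing anything; return a sorted inventory."""
    inventory = []
    for path in sorted(package_root.rglob("*")):
        if path.is_file():
            role, evidence = classify(path.name)
            relative = path.relative_to(package_root).as_posix()
            inventory.append(Artifact(relative, role, evidence))
    return inventory
```

The sketch only reads directory entries and file names. Sorting the walk keeps the inventory order stable across runs, and each record carries the evidence that justified its classification.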

Phase 2: Planning (Deterministic, Non-destructive)

Planning converts discovery output into an explicit Plan.

The plan:

  • contains no executable logic
  • describes what actions would be taken
  • preserves deterministic ordering
  • is schema-validated
  • is serializable and reviewable

Planning does not touch the filesystem.

It does not infer new structure.
It does not “fix” ambiguity.

If planning cannot prove that an action is structurally valid, it refuses to produce a plan.

Planning answers a different question:

What would we do if execution were allowed?
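
A minimal sketch of the planning properties listed above, assuming the inventory is a list of dictionaries with `path` and `role` keys. `PlannedAction`, `PlanningError`, `build_plan`, `serialize`, and the single `copy` action kind are all hypothetical; the real plan schema is not reproduced here.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


class PlanningError(Exception):
    """Raised when a plan cannot be proven structurally valid."""


@dataclass(frozen=True)
class PlannedAction:
    order: int
    kind: str    # hypothetical action kind; only "copy" is used in this sketch
    source: str
    target: str


def build_plan(inventory: list[dict]) -> list[PlannedAction]:
    """Derive an ordered, data-only plan; refuse when anything is ambiguous."""
    ambiguous = sorted(i["path"] for i in inventory if i["role"] == "ambiguous")
    if ambiguous:
        raise PlanningError(f"refusing to plan; ambiguous artifacts: {ambiguous}")
    # Sorting by path makes the plan independent of the inventory's input order.
    ordered = sorted(inventory, key=lambda item: item["path"])
    return [
        PlannedAction(order=i, kind="copy", source=item["path"], target=item["path"])
        for i, item in enumerate(ordered)
    ]


def serialize(plan: list[PlannedAction]) -> str:
    """Render the plan as stable JSON so it can be reviewed and diffed."""
    return json.dumps([asdict(action) for action in plan], indent=2, sort_keys=True)
```

The plan holds only data, so the same inventory always serializes to the same text. An ambiguous entry stops planning entirely rather than being resolved by a guess.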

Phase 3: Execution (Explicit and Bounded)

Execution consumes a validated plan and dispatches actions using a concrete executor.

Execution properties:

  • Dry-run is the default
  • Filesystem mutation requires explicit --apply
  • All writes are sandboxed to a declared target directory
  • Actions execute exactly once, in order
  • Results are captured as structured data

Two executors exist:

DryRunExecutor (noop)

  • Simulates execution
  • Performs no mutation
  • Used when --apply is absent

FilesystemExecutor (filesystem)

  • Performs real filesystem operations
  • Enforces sandbox boundaries
  • Requires explicit --apply

Execution produces an immutable ExecutionReport containing:

  • execution identity
  • mode (dry-run or apply)
  • ordered action results
  • success, skip, and failure states

Logs are not the contract.
The ExecutionReport is.
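
The sketch below illustrates the execution properties above: dry-run by default, writes confined to a target directory, and results returned as structured data. It is an assumption-laden illustration, not the processor's executor code. The `execute` and `resolve_in_sandbox` functions, the dictionary shape of an action, and the choice to record simulated actions as "skipped" are all hypothetical.

```python
from __future__ import annotations

import shutil
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ActionResult:
    order: int
    status: str  # "success", "skipped", or "failed"
    detail: str


@dataclass(frozen=True)
class ExecutionReport:
    mode: str  # "dry-run" or "apply"
    results: tuple[ActionResult, ...]


def resolve_in_sandbox(target_root: Path, relative: str) -> Path:
    """Resolve a write path and reject anything outside the target root."""
    root = target_root.resolve()
    candidate = (root / relative).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError(f"write outside target directory refused: {relative}")
    return candidate


def execute(actions: list[dict], source_root: Path,
            target_root: Path | None = None, apply: bool = False) -> ExecutionReport:
    """Run each action once, in order; mutate only when apply is True."""
    if apply and target_root is None:
        raise ValueError("apply mode requires a target directory")
    results = []
    for action in sorted(actions, key=lambda a: a["order"]):
        if not apply:
            # Assumption: a simulated action is recorded as "skipped".
            results.append(ActionResult(action["order"], "skipped", "dry-run"))
            continue
        try:
            destination = resolve_in_sandbox(target_root, action["target"])
            destination.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source_root / action["source"], destination)
            results.append(ActionResult(action["order"], "success", str(destination)))
        except (OSError, ValueError) as exc:
            results.append(ActionResult(action["order"], "failed", str(exc)))
    return ExecutionReport("apply" if apply else "dry-run", tuple(results))
```

Both dataclasses are frozen and the results are stored in a tuple, so a caller cannot alter the report after execution. A target such as `../escape.dita` resolves outside the sandbox and is recorded as a failed action instead of being written.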

Phase 4: Materialization

Materialization operates strictly on execution results.

It:

  • performs preflight validation
  • ensures target directory safety
  • finalizes output artifacts
  • writes execution reports when requested

Materialization never performs discovery or planning logic.

It is the final boundary before persistence.
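
As a rough sketch of what preflight validation and report writing could involve: the two rules checked here (the target must not overlap the source package, and no action may have failed) are assumptions made for this example, not the tool's documented rules. `preflight` and `materialize` are hypothetical names.

```python
from __future__ import annotations

import json
from pathlib import Path


def preflight(package_root: Path, target_root: Path, results: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means it is safe to proceed."""
    problems = []
    package = package_root.resolve()
    target = target_root.resolve()
    # Assumed safety rule: output must never land inside, or contain, the input.
    if target.is_relative_to(package) or package.is_relative_to(target):
        problems.append("target and package directories must not overlap")
    failed = [r["order"] for r in results if r["status"] == "failed"]
    if failed:
        problems.append(f"execution reported failed actions: {failed}")
    return problems


def materialize(package_root: Path, target_root: Path, results: list[dict],
                report_path: Path | None = None) -> None:
    """Validate first, then persist the report only when one was requested."""
    problems = preflight(package_root, target_root, results)
    if problems:
        raise RuntimeError("; ".join(problems))
    if report_path is not None:
        report_path.parent.mkdir(parents=True, exist_ok=True)
        report_path.write_text(
            json.dumps(results, indent=2, sort_keys=True), encoding="utf-8"
        )
```

The function takes execution results as its only view of the run. It never rescans the package or revisits the plan, which keeps it within the boundary described above.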

Quick Start

Run the full pipeline (safe by default)

dita_package_processor run \
  --package /path/to/dita/package \
  --docx-stem OutputDoc

This performs:

  • Discovery
  • Planning
  • Dry-run execution

No filesystem mutation occurs.

Apply changes explicitly

dita_package_processor run \
  --package /path/to/dita/package \
  --target build \
  --docx-stem OutputDoc \
  --apply

Mutation is:

  • explicit
  • bounded
  • sandboxed

--apply requires --target.
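
For readers wiring a similar rule into their own wrapper scripts, the "--apply requires --target" constraint can be expressed with the standard library's `argparse` roughly as follows. This is not the tool's actual argument parser. The flags come from the commands above, while `build_parser` and `parse` are hypothetical helpers, and which flags are truly required is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="dita_package_processor")
    commands = parser.add_subparsers(dest="command", required=True)
    run = commands.add_parser("run")
    run.add_argument("--package", required=True)
    run.add_argument("--target")      # only needed when --apply is given
    run.add_argument("--docx-stem")   # assumed optional in this sketch
    run.add_argument("--apply", action="store_true")  # absent means dry-run
    return parser


def parse(argv: list[str]) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.apply and args.target is None:
        # parser.error prints a usage message and exits with status 2.
        parser.error("--apply requires --target")
    return args
```

Because `--apply` is a `store_true` flag, omitting it leaves `args.apply` as `False`, which matches the safe-by-default behavior.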

Execute an existing plan

dita_package_processor execute \
  --plan plan.json \
  --target build \
  --apply

This:

  • skips discovery
  • skips planning
  • executes the validated plan

Configuration Model

Runtime behavior can be configured in pyproject.toml, under this table:

[tool.dita_package_processor]

Configuration may control:

  • optional planning behaviors
  • naming conventions
  • feature enablement

Precedence:

CLI arguments > pyproject.toml > defaults

Configuration never overrides structural validation.
It only enables predefined, safe behaviors.

Execution Model

The pipeline is linear and explicit:

Discovery → Planning → Execution → Materialization

There is:

  • no implicit branching
  • no hidden retries
  • no heuristic repair
  • no runtime inference

Each boundary is enforced by schema and contract validation.
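
One way to picture a linear pipeline with enforced boundaries is a loop that validates each phase's artifact before handing it to the next phase. This is a simplified sketch: `run_pipeline` and the shape of a phase (name, step function, validator) are hypothetical, and plain predicate functions stand in for schema validation.

```python
from typing import Any, Callable

# A phase is (name, step, validator); this shape is an assumption of the sketch.
Phase = tuple[str, Callable[[Any], Any], Callable[[Any], bool]]


def run_pipeline(phases: list[Phase], initial: Any) -> Any:
    """Run phases strictly in order; stop at the first invalid artifact."""
    artifact = initial
    for name, step, is_valid in phases:
        artifact = step(artifact)
        if not is_valid(artifact):
            # No retry and no repair: the run ends at the failed boundary.
            raise ValueError(f"{name} produced an invalid artifact; pipeline stopped")
    return artifact
```

The loop has no branches other than the validity check, so the order of phases in the list is the only control flow.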

Why This Exists

This tool exists because real DITA corpora are inconsistent.

In practice:

  • Bulk exports contain structural ambiguity
  • Scripts mutate content without proof
  • Assumptions fail silently
  • Errors surface too late

The DITA Package Processor favors:

  • Observation before action
  • Plans before mutation
  • Determinism over cleverness
  • Explicit boundaries over implicit behavior
  • Systems that survive hostile corpora

If you want deterministic batch processing over real DITA packages, this tool exists for you.

If you want inference, auto-repair, or magic, this is not the tool for you.

Documentation Map

  • Getting Started
  • CLI Reference
  • Discovery Architecture
  • Planning Contracts
  • Execution Model
  • Materialization
  • Configuration (pyproject.toml)
  • Design Rationale
  • Extension Guide
  • Testing Strategy

Each document is intentionally narrow and non-overlapping.

Summary

The DITA Package Processor is conservative by design:

  • Discovery before planning
  • Planning before execution
  • Dry-run before mutation
  • Explicit permission required
  • Deterministic ordering
  • Schema-validated artifacts
  • No hidden behavior

This system is built to survive real-world DITA packages, not ideal ones.