DITA Package Processor¶
The DITA Package Processor deterministically analyzes, plans, and transforms DITA 1.3 packages into a controlled, publication-ready structure.
It operates as a strict, multi-phase batch pipeline with explicit boundaries between:
- Discovery – observe and describe what exists
- Planning – derive a validated, auditable execution plan
- Execution – dispatch deterministic actions
- Materialization – finalize and persist execution results
Every run is explicit, repeatable, and explainable.
There is no inference.
There is no runtime guessing.
Nothing mutates unless explicitly permitted.
What It Does¶
Given a bulk-generated DITA package, the processor executes a deterministic workflow.
Each phase:
- has a single responsibility
- produces a durable, machine-readable artifact
- is independently testable
- refuses to proceed on invalid or ambiguous input
The system is designed so that structure is proven before behavior is allowed.
Phase 1: Discovery (Read-only)¶
Discovery scans the input package without mutating any files.
It:
- Locates maps, topics, and media
- Classifies artifacts using declared patterns
- Builds a dependency graph
- Records structural relationships
- Detects ambiguous or unsupported conditions
Discovery produces a Discovery Inventory, describing:
- artifacts and their observable roles
- dependency edges between artifacts
- structural invariants
- evidence used for classification
Discovery answers one question:
What is actually in this package?
If something cannot be observed safely, it is not acted upon later.
Discovery never mutates the filesystem.
Phase 2: Planning (Deterministic, Non-destructive)¶
Planning converts discovery output into an explicit Plan.
The plan:
- contains no executable logic
- describes what actions would be taken
- preserves deterministic ordering
- is schema-validated
- is serializable and reviewable
Planning does not touch the filesystem.
It does not infer new structure.
It does not “fix” ambiguity.
If planning cannot prove that an action is structurally valid, it refuses to produce a plan.
Planning answers a different question:
What would we do if execution were allowed?
Phase 3: Execution (Explicit and Bounded)¶
Execution consumes a validated plan and dispatches actions using a concrete executor.
Execution properties:
- Dry-run is the default
- Filesystem mutation requires explicit
--apply - All writes are sandboxed to a declared target directory
- Actions execute exactly once, in order
- Results are captured as structured data
Two executors exist:
DryRunExecutor (noop)¶
- Simulates execution
- Performs no mutation
- Used when
--applyis absent
FilesystemExecutor (filesystem)¶
- Performs real filesystem operations
- Enforces sandbox boundaries
- Requires explicit
--apply
Execution produces an immutable ExecutionReport containing:
- execution identity
- mode (dry-run or apply)
- ordered action results
- success, skip, and failure states
Logs are not the contract.
The ExecutionReport is.
Phase 4: Materialization¶
Materialization operates strictly on execution results.
It:
- performs preflight validation
- ensures target directory safety
- finalizes output artifacts
- writes execution reports when requested
Materialization never performs discovery or planning logic.
It is the final boundary before persistence.
Quick Start¶
Run the full pipeline (safe by default)¶
dita_package_processor run \
--package /path/to/dita/package \
--docx-stem OutputDoc
This performs:
- Discovery
- Planning
- Dry-run execution
No filesystem mutation occurs.
Apply changes explicitly¶
dita_package_processor run \
--package /path/to/dita/package \
--target build \
--docx-stem OutputDoc \
--apply
Mutation is:
- explicit
- bounded
- sandboxed
--apply requires --target.
Execute an existing plan¶
dita_package_processor execute \
--plan plan.json \
--target build \
--apply
This:
- skips discovery
- skips planning
- executes the validated plan
Configuration Model¶
Runtime behavior may be influenced via pyproject.toml:
[tool.dita_package_processor]
Configuration may control:
- optional planning behaviors
- naming conventions
- feature enablement
Precedence:
CLI arguments > pyproject.toml > defaults
Configuration never overrides structural validation.
It only enables predefined, safe behaviors.
Execution Model¶
The pipeline is linear and explicit:
Discovery → Planning → Execution → Materialization
There is:
- no implicit branching
- no hidden retries
- no heuristic repair
- no runtime inference
Each boundary is enforced by schema and contract validation.
Why This Exists¶
This tool exists because real DITA corpora are inconsistent.
In practice:
- Bulk exports contain structural ambiguity
- Scripts mutate content without proof
- Assumptions fail silently
- Errors surface too late
The DITA Package Processor favors:
- Observation before action
- Plans before mutation
- Determinism over cleverness
- Explicit boundaries over implicit behavior
- Systems that survive hostile corpora
If you want deterministic batch processing over real DITA packages, this tool exists for you.
If you want inference, auto-repair, or magic, it does not.
Documentation Map¶
- Getting Started
- CLI Reference
- Discovery Architecture
- Planning Contracts
- Execution Model
- Materialization
- Configuration (
pyproject.toml) - Design Rationale
- Extension Guide
- Testing Strategy
Each document is intentionally narrow and non-overlapping.
Summary¶
The DITA Package Processor is conservative by design:
- Discovery before planning
- Planning before execution
- Dry-run before mutation
- Explicit permission required
- Deterministic ordering
- Schema-validated artifacts
- No hidden behavior
This system is built to survive real-world DITA packages, not ideal ones.