Understanding the DITA Package Processor Pipeline

A Practical Narrative for Users

When you use the DITA Package Processor, you are not running “a script that fixes DITA.”
You are running a pipeline with clearly separated responsibilities.

Understanding those responsibilities is the key to using the tool confidently, safely, and repeatably.

This document explains what each stage is responsible for, what it produces, and how you should interpret its output as a user.

The Big Picture

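The processor is intentionally divided into four stages:

Discovery → Normalize → Plan → Execute

Execute runs in two modes, dry-run (the default) and `--apply`, which are described separately below as Stages 4 and 5.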

Each stage answers a different question.
No stage tries to answer more than one.

This separation is what makes the system predictable instead of surprising.

Stage 1: Discovery

“What is actually in this package?”

Discovery is read-only. It does not change files. Ever.

When you run discovery, the tool:

scans maps, topics, and media
records what files exist
extracts relationships (map references, topic references, images)
builds a dependency graph

What discovery does not do:

guess intent
fix problems
rename files
restructure content
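
The read-only scan described above can be sketched in a few lines. This is a minimal illustration, not the tool's implementation: the function names `extract_references` and `build_dependency_graph` and the in-memory sample package are assumptions, and the only relationship it recognizes is an `href` attribute.

```python
import xml.etree.ElementTree as ET


def extract_references(xml_text: str) -> list[str]:
    """Return every href attribute found in a map or topic, in document order."""
    root = ET.fromstring(xml_text)
    return [el.attrib["href"] for el in root.iter() if "href" in el.attrib]


def build_dependency_graph(files: dict[str, str]) -> dict[str, list[str]]:
    """Map each file name to the files it references. Nothing is written or renamed."""
    return {name: extract_references(text) for name, text in sorted(files.items())}


# Hypothetical in-memory package: one map, two topics, one image reference.
package = {
    "main.ditamap": '<map><topicref href="intro.dita"/><topicref href="setup.dita"/></map>',
    "intro.dita": '<topic id="intro"><title>Intro</title></topic>',
    "setup.dita": '<topic id="setup"><body><image href="img/diagram.png"/></body></topic>',
}

graph = build_dependency_graph(package)
for source, targets in graph.items():
    print(source, "->", targets)
```

The sketch only reads and records. A reference it cannot resolve would simply appear in the graph as a target with no matching file, leaving any judgment about it to later stages.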

How to think about discovery output

Discovery produces a JSON report that answers:

What files are present?
How are they connected?
Which map appears to be the main entry point?
Are there patterns the tool recognizes?
Are there things it cannot safely interpret?

As a user, you should read discovery output as a factual inventory, not a decision.

If discovery cannot prove something safely, it records uncertainty instead of guessing.

That restraint is deliberate.
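
To make "records uncertainty instead of guessing" concrete, here is a small sketch. The report shape, the field names (`artifacts`, `relationships`, `main_map_candidates`, `uncertainties`), and the rule "a map that nothing else references is a candidate entry point" are all assumptions made for illustration; the tool's actual JSON schema may differ.

```python
import json

# Hypothetical report shape; field names are illustrative, not the tool's schema.
report = {
    "artifacts": [
        {"path": "main.ditamap", "type": "map"},
        {"path": "appendix.ditamap", "type": "map"},
        {"path": "intro.dita", "type": "topic"},
    ],
    "relationships": [
        {"source": "main.ditamap", "target": "intro.dita", "kind": "topicref"},
    ],
}


def find_main_map_candidates(report: dict) -> list[str]:
    """Assumed rule: maps that no other artifact references are candidate entry points."""
    referenced = {rel["target"] for rel in report["relationships"]}
    return sorted(
        a["path"]
        for a in report["artifacts"]
        if a["type"] == "map" and a["path"] not in referenced
    )


candidates = find_main_map_candidates(report)
report["main_map_candidates"] = candidates
# Record uncertainty instead of guessing when the entry point is ambiguous.
report["uncertainties"] = (
    []
    if len(candidates) == 1
    else [{"question": "main_map", "candidates": candidates}]
)
print(json.dumps(report["uncertainties"], indent=2))
```

With two unreferenced maps in the sample package, the sketch lists both candidates and leaves the choice open rather than picking one.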

Stage 2: Normalize

“Is this discovery result safe to plan from?”

Normalization is a quiet but important step.

It takes the raw discovery report and:

validates it against a strict contract
ensures a single, unambiguous main map exists
confirms that artifacts and relationships are internally consistent

If normalization fails, the pipeline stops.
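
The three checks above, and the "stop on failure" behavior, might look like the following sketch. The `NormalizationError` class, the `normalize` function, and the report fields are hypothetical names chosen for this example, not the tool's actual contract.

```python
class NormalizationError(Exception):
    """Raised when a discovery report is not safe to plan from."""


def normalize(report: dict) -> dict:
    """Validate a hypothetical discovery report and return a stable copy of it."""
    # 1. Contract check: required keys must be present.
    for key in ("artifacts", "relationships", "main_map_candidates"):
        if key not in report:
            raise NormalizationError(f"missing required key: {key}")

    # 2. Exactly one main map, no ambiguity allowed.
    candidates = report["main_map_candidates"]
    if len(candidates) != 1:
        raise NormalizationError(f"expected exactly one main map, found {len(candidates)}")

    # 3. Internal consistency: every relationship must point at known artifacts.
    paths = {a["path"] for a in report["artifacts"]}
    for rel in report["relationships"]:
        for end in ("source", "target"):
            if rel[end] not in paths:
                raise NormalizationError(f"relationship refers to unknown artifact: {rel[end]}")

    return {
        "main_map": candidates[0],
        "artifacts": sorted(report["artifacts"], key=lambda a: a["path"]),
        "relationships": sorted(report["relationships"], key=lambda r: (r["source"], r["target"])),
    }


good = {
    "artifacts": [
        {"path": "main.ditamap", "type": "map"},
        {"path": "intro.dita", "type": "topic"},
    ],
    "relationships": [{"source": "main.ditamap", "target": "intro.dita"}],
    "main_map_candidates": ["main.ditamap"],
}
normalized = normalize(good)

# A dangling reference stops the pipeline instead of being silently ignored.
bad = {**good, "relationships": [{"source": "main.ditamap", "target": "missing.dita"}]}
try:
    normalize(bad)
    stopped = False
except NormalizationError as exc:
    stopped = True
    print("normalization failed:", exc)
```

Raising an exception rather than returning a partial result mirrors the rule that later stages only ever see a report that passed every check.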

What normalization means for you

Normalization is the tool saying:

“I understand this package well enough to make decisions about it.”

If normalization succeeds, you know:

discovery was structurally coherent
downstream steps are operating on stable ground

You usually don’t need to read the normalized output unless something fails later.

Stage 3: Planning

“What would we do to this package?”

Planning is where intent is declared.

Based on the normalized discovery results, the planner produces a plan: an explicit, ordered list of actions.

Examples of actions:

copy a map
copy a topic
copy media
rename an artifact
wrap map content
refactor glossary entries
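
As a rough illustration of "an explicit list of actions, in order", the sketch below turns a normalized report into plain data. The `build_plan` function, the action fields (`id`, `type`, `source`, `target`, `reason`), the type names such as `copy_map`, and the maps-then-topics-then-media ordering are assumptions for this example only.

```python
import json


def build_plan(normalized: dict, target_dir: str) -> list[dict]:
    """Turn a hypothetical normalized report into an ordered list of actions."""
    # Assumed ordering: maps first, then topics, then media, each sorted by path.
    order = {"map": 0, "topic": 1, "media": 2}
    artifacts = sorted(normalized["artifacts"], key=lambda a: (order[a["type"]], a["path"]))
    return [
        {
            "id": f"action-{index:03d}",
            "type": f"copy_{artifact['type']}",
            "source": artifact["path"],
            "target": f"{target_dir}/{artifact['path']}",
            "reason": f"{artifact['type']} discovered in package",
        }
        for index, artifact in enumerate(artifacts, start=1)
    ]


normalized = {
    "artifacts": [
        {"path": "img/diagram.png", "type": "media"},
        {"path": "intro.dita", "type": "topic"},
        {"path": "main.ditamap", "type": "map"},
    ]
}

plan = build_plan(normalized, "out")
print(json.dumps(plan, indent=2))
```

Because the actions are sorted before they are numbered, the same input yields the same plan regardless of the order in which artifacts were discovered. The plan is data only: it contains no code that copies anything.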

What a plan is (and is not)

A plan is:

deterministic
explicit
reviewable
safe to inspect
free of execution logic

A plan is not:

execution
a script
a guess
a best effort

Planning answers one question:

“If we were allowed to change things, what exactly would we do, and why?”

As a user, this is your decision checkpoint.

You can:

inspect the plan
version it
diff it
approve it
reject it

Nothing has been mutated yet.
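
Because a plan is plain data, reviewing a change to it can be as simple as a text diff. The sketch below uses Python's standard `difflib` and `json` modules; the plan contents and the file labels `plan.approved.json` and `plan.proposed.json` are made up for the example.

```python
import difflib
import json


def plan_to_lines(plan: list[dict]) -> list[str]:
    """Serialize a plan with sorted keys so equal plans always produce equal text."""
    return json.dumps(plan, indent=2, sort_keys=True).splitlines()


def diff_plans(old: list[dict], new: list[dict]) -> list[str]:
    """Return a unified diff between two plans, suitable for review."""
    return list(
        difflib.unified_diff(
            plan_to_lines(old),
            plan_to_lines(new),
            fromfile="plan.approved.json",
            tofile="plan.proposed.json",
            lineterm="",
        )
    )


approved = [{"type": "copy_topic", "source": "intro.dita", "target": "out/intro.dita"}]
proposed = [{"type": "copy_topic", "source": "intro.dita", "target": "out/topics/intro.dita"}]

for line in diff_plans(approved, proposed):
    print(line)
```

An empty diff means the proposed plan is identical to the approved one, which is a cheap way to confirm that a rerun did not change intent.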

Stage 4: Execute (Dry-Run)

“Can this plan be carried out?”

Execution defaults to dry-run mode.

In dry-run:

every action is dispatched
handlers are resolved
ordering is tested
failures surface early
no files are written
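
A dry-run of this kind can be pictured as a dispatcher that looks up a handler for each action and records the outcome without touching the filesystem. The sketch below covers only dispatch and handler resolution; the `HANDLERS` registry, the `describe_copy` handler, and the action type names are hypothetical.

```python
def describe_copy(action: dict) -> str:
    """Hypothetical handler: describes the copy instead of performing it."""
    return f"would copy {action['source']} -> {action['target']}"


# Hypothetical registry mapping action types to handlers.
HANDLERS = {"copy_map": describe_copy, "copy_topic": describe_copy}


def dry_run(plan: list[dict]) -> list[dict]:
    """Dispatch every action in plan order and record the outcome. No files are written."""
    results = []
    for action in plan:
        handler = HANDLERS.get(action["type"])
        if handler is None:
            results.append({
                "id": action["id"],
                "status": "failed",
                "detail": f"no handler registered for {action['type']}",
            })
        else:
            results.append({"id": action["id"], "status": "ok", "detail": handler(action)})
    return results


plan = [
    {"id": "action-001", "type": "copy_map", "source": "main.ditamap", "target": "out/main.ditamap"},
    {"id": "action-002", "type": "copy_topic", "source": "intro.dita", "target": "out/intro.dita"},
    {"id": "action-003", "type": "refactor_glossary", "source": "glossary.dita", "target": "out/glossary.dita"},
]

results = dry_run(plan)
for result in results:
    print(result["id"], result["status"], result["detail"])
```

The third action has no registered handler, so the problem surfaces during the rehearsal rather than halfway through a real run.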

What dry-run tells you

A successful dry-run means:

the plan is internally coherent
required handlers exist
execution order is valid
nothing obvious will fail immediately

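Dry-run does not guarantee filesystem success: problems that only arise when files are actually written, such as missing write permissions or a full disk, cannot show up in a run that writes nothing.
It guarantees logical executability.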

Think of it as a rehearsal, not the performance.

Stage 5: Execute (`--apply`)

“Now actually do it.”

When you rerun execution with `--apply`, the tool switches to a filesystem executor.

At this point:

files are copied
directories are created
content is materialized into the target location
every mutation is logged and reported

Execution produces an Execution Report, which records:

which actions ran
which handler handled them
whether each action succeeded or failed
whether the run was dry-run or apply
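
The relationship between the two modes and the report can be sketched as one function with an `apply` switch. This is an assumption-laden illustration: the `execute` function, the report fields (`mode`, `results`, `handler`, `status`, `error`), and the choice to continue after a failed action are all invented for the example and may not match the tool's behavior.

```python
import shutil
import tempfile
from pathlib import Path


def execute(plan: list[dict], source_root: Path, target_root: Path, apply: bool = False) -> dict:
    """Run copy actions in order and return a hypothetical execution report."""
    results = []
    for action in plan:
        source = source_root / action["source"]
        target = target_root / action["target"]
        try:
            if apply:
                # Only apply mode touches the filesystem.
                target.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(source, target)
            status, error = "succeeded", None
        except OSError as exc:
            status, error = "failed", str(exc)
        results.append({"id": action["id"], "handler": "copy", "status": status, "error": error})
    return {"mode": "apply" if apply else "dry-run", "results": results}


with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp) / "package", Path(tmp) / "out"
    src.mkdir()
    (src / "intro.dita").write_text("<topic id='intro'/>", encoding="utf-8")

    plan = [
        {"id": "action-001", "type": "copy_topic", "source": "intro.dita", "target": "topics/intro.dita"},
        {"id": "action-002", "type": "copy_topic", "source": "missing.dita", "target": "topics/missing.dita"},
    ]

    rehearsal = execute(plan, src, dst, apply=False)
    rehearsal_wrote_files = dst.exists()
    report = execute(plan, src, dst, apply=True)
    copied = (dst / "topics" / "intro.dita").exists()

print(rehearsal["mode"], [r["status"] for r in rehearsal["results"]])
print(report["mode"], [r["status"] for r in report["results"]])
```

In this sketch the rehearsal reports both actions as succeeded, while the apply run records the second as failed because its source file does not exist. That gap is the difference between logical executability and filesystem success.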

What execution means for you

Execution is intentionally boring.

It does exactly what the plan said.
Nothing more. Nothing less.

If execution surprises you, the plan is where the explanation lives.

Why This Separation Matters (For You)

Most tools blur these steps together. That makes them convenient, but fragile.

This tool does not.

Because concerns are separated, you can:

inspect reality before decisions are made
approve intent before files are touched
dry-run before risking mutation
explain why a change happened after the fact

If something goes wrong, you know where to look:

Problem	Where to look
Missing files	Discovery
Wrong assumptions	Discovery / Normalize
Wrong intent	Plan
Wrong mutation	Execute
Unexpected change	Plan (not execution)

That clarity is the real feature.

How to Use This as a User

A safe, recommended workflow is:

1. Run discovery. Learn what the tool sees.
2. Generate a plan. Learn what the tool intends.
3. Run execute without `--apply`. Confirm the plan is executable.
4. Run execute with `--apply`. Materialize the result.

You are always in control of when reality changes.

Final Takeaway

The DITA Package Processor is not optimized for speed or cleverness.

It is optimized for:

predictability
explainability
repeatability
safety at scale

Once you understand the pipeline as a set of separate concerns, the tool stops feeling strict and starts feeling reliable.

That reliability is what lets you trust it with real documentation.