Core¶

The Core module defines the shared abstractions and contracts used across the entire system. This includes base models, shared utilities, registries, and cross-cutting concerns that must remain stable across pipeline stages.

Core code is intentionally minimal and conservative. It does not implement discovery, planning, or execution logic directly. Instead, it provides the structural glue that allows those layers to interoperate without tight coupling.

This layer exists to prevent duplication, enforce consistency, and provide a stable foundation for extension. Changes here have wide impact and are treated as schema-level decisions, not implementation details.

`dita_package_processor.config` ¶

Configuration loading utilities for the DITA Package Processor.

This module is responsible for loading runtime configuration from pyproject.toml and extracting the tool-specific configuration namespace.

`load_config(path)` ¶

Load and return the DITA Package Processor configuration.

The configuration is read from a TOML file and must contain a [tool.dita_package_processor] section. Only that subsection is returned to the caller.

Parameters:

Name	Type	Description	Default
`path`	`Path`	Path to the `pyproject.toml` configuration file.	required

Returns:

Type	Description
`Dict[str, Any]`	Parsed configuration dictionary for the processor.

Raises:

Type	Description
`FileNotFoundError`	If the configuration file does not exist.
`KeyError`	If the required tool configuration section is missing.

`dita_package_processor.dita_xml` ¶

DITA XML helper utilities built on top of lxml.

This module provides a small, opinionated set of helpers for reading, writing, and transforming DITA XML safely and consistently.

`XmlDocument` `dataclass` ¶

Representation of an XML document on disk.

This class couples an lxml ElementTree with its filesystem path and provides convenience access to the document root.

`root` `property` ¶

Return the root XML element.

Returns:

Type	Description
`_Element`	Root element of the document.

`create_concept_topic_xml(path, topic_id, title)` ¶

Create a minimal DITA concept topic XML document.

Parameters:

Name	Type	Description	Default
`path`	`Path`	Destination file path for the topic.	required
`topic_id`	`str`	Value for the `id` attribute.	required
`title`	`str`	Topic title text.	required

Returns:

Type	Description
`XmlDocument`	New `XmlDocument` instance.

`find_first_topicref_href(doc)` ¶

Find the first topicref element with an href attribute.

Parameters:

Name	Type	Description	Default
`doc`	`XmlDocument`	Map XML document.	required

Returns:

Type	Description
`Optional[str]`	`href` value if found, otherwise `None`.

`first_href_to_map(doc)` ¶

Find the first href pointing to a .ditamap file.

Intended for resolving the main map referenced by index.ditamap.

Parameters:

Name	Type	Description	Default
`doc`	`XmlDocument`	Index map document.	required

Returns:

Type	Description
`Optional[str]`	`href` value if found, otherwise `None`.
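
The two `href` lookups above (`find_first_topicref_href` and `first_href_to_map`) can be sketched as follows. The module itself is built on lxml; this sketch swaps in the standard library's `xml.etree.ElementTree` so it runs without extra dependencies, and it operates on a root element rather than an `XmlDocument`. The `_sketch` function names, the document-order search, and the case-insensitive `.ditamap` check are assumptions, not confirmed behavior.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def find_first_topicref_href_sketch(root: ET.Element) -> Optional[str]:
    """Return the href of the first <topicref> that has one, in document order."""
    for element in root.iter("topicref"):
        href = element.get("href")
        if href:
            return href
    return None


def first_href_to_map_sketch(root: ET.Element) -> Optional[str]:
    """Return the first href ending in .ditamap (assumed: on any element)."""
    for element in root.iter():
        href = element.get("href", "")
        if href.lower().endswith(".ditamap"):
            return href
    return None


index_map = ET.fromstring(
    "<map>"
    "<topicref navtitle='Group'/>"
    "<topicref href='topics/intro.dita'/>"
    "<mapref href='Main.ditamap'/>"
    "</map>"
)
print(find_first_topicref_href_sketch(index_map))  # topics/intro.dita
print(first_href_to_map_sketch(index_map))         # Main.ditamap
```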

`get_map_title(doc)` ¶

Extract a human-readable map title.

Title resolution order:

1. `<title>`
2. `<topicmeta><navtitle>`

Parameters:

Name	Type	Description	Default
`doc`	`XmlDocument`	Map XML document.	required

Returns:

Type	Description
`str`	Title text, or an empty string if not found.
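
The title resolution order for `get_map_title` can be sketched as below. This is an illustration using the standard library's `xml.etree.ElementTree` instead of lxml. It assumes both candidates are looked up as direct children of the map root and that an empty `<title>` falls through to the next candidate. The name `get_map_title_sketch` is hypothetical.

```python
import xml.etree.ElementTree as ET


def get_map_title_sketch(root: ET.Element) -> str:
    """Try <title> first, then <topicmeta><navtitle>; return "" if neither has text."""
    for candidate in ("title", "topicmeta/navtitle"):
        node = root.find(candidate)
        if node is not None:
            # itertext() also collects text nested in inline markup.
            text = "".join(node.itertext()).strip()
            if text:
                return text
    return ""


with_title = ET.fromstring("<map><title>User Guide</title></map>")
with_navtitle = ET.fromstring(
    "<map><topicmeta><navtitle>Reference</navtitle></topicmeta></map>"
)
print(get_map_title_sketch(with_title))     # User Guide
print(get_map_title_sketch(with_navtitle))  # Reference
```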

`get_top_level_topicrefs(doc)` ¶

Return direct child topicref or mapref elements of a map.

Parameters:

Name	Type	Description	Default
`doc`	`XmlDocument`	Map XML document.	required

Returns:

Type	Description
`List[_Element]`	List of top-level topicref/mapref elements.
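
A short sketch shows what "direct child" means for `get_top_level_topicrefs`: nested references are not returned. It uses the standard library's `xml.etree.ElementTree` in place of lxml, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List


def get_top_level_topicrefs_sketch(root: ET.Element) -> List[ET.Element]:
    """Return only the immediate topicref/mapref children of the map root."""
    return [child for child in root if child.tag in ("topicref", "mapref")]


ditamap = ET.fromstring(
    "<map>"
    "<title>Guide</title>"
    "<topicref href='a.dita'><topicref href='a1.dita'/></topicref>"
    "<mapref href='sub.ditamap'/>"
    "</map>"
)
hrefs = [el.get("href") for el in get_top_level_topicrefs_sketch(ditamap)]
print(hrefs)  # ['a.dita', 'sub.ditamap']
```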

`read_xml(path)` ¶

Read and parse an XML file from disk.

Parameters:

Name	Type	Description	Default
`path`	`Path`	Path to the XML file.	required

Returns:

Type	Description
`XmlDocument`	Parsed `XmlDocument` instance.

`transform_to_glossentry(doc)` ¶

Transform a topic into a minimal glossentry topic in place.

Heuristic mapping rules:

- `glossentry/@id`: existing topic `@id` (fallback: `gloss`)
- `glossterm`: existing `<title>`
- `glossdef/p`: derived from `<shortdesc>` or body text

Parameters:

Name	Type	Description	Default
`doc`	`XmlDocument`	Topic XML document to transform.	required

Returns:

Type	Description
`XmlDocument`	Updated `XmlDocument` instance.
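
The mapping rules for `transform_to_glossentry` can be illustrated with a rough sketch. It differs from the documented function in two labeled ways: it uses the standard library's `xml.etree.ElementTree` rather than lxml, and it builds a new element instead of transforming the document in place. Treating any child whose tag ends in `body` as the topic body is also an assumption, as is the function name.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def _flat_text(node: Optional[ET.Element]) -> str:
    """Collect all text under a node and normalize whitespace."""
    if node is None:
        return ""
    return " ".join("".join(node.itertext()).split())


def to_glossentry_sketch(topic: ET.Element) -> ET.Element:
    """Build a minimal <glossentry> from a topic using the documented rules."""
    # glossentry/@id: existing topic @id, falling back to "gloss".
    entry = ET.Element("glossentry", {"id": topic.get("id") or "gloss"})

    # glossterm: existing <title>.
    ET.SubElement(entry, "glossterm").text = _flat_text(topic.find("title"))

    # glossdef/p: <shortdesc> if present, otherwise body text.
    definition = _flat_text(topic.find("shortdesc"))
    if not definition:
        # Assumption: body, conbody, refbody and similar all count as the body.
        body = next((c for c in topic if c.tag.endswith("body")), None)
        definition = _flat_text(body)

    glossdef = ET.SubElement(entry, "glossdef")
    ET.SubElement(glossdef, "p").text = definition
    return entry


topic = ET.fromstring(
    "<concept id='api'><title>API</title>"
    "<conbody><p>Application programming interface.</p></conbody></concept>"
)
print(ET.tostring(to_glossentry_sketch(topic), encoding="unicode"))
```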

`write_xml(doc, path=None)` ¶

Write an XML document back to disk.

Parameters:

Name	Type	Description	Default
`doc`	`XmlDocument`	`XmlDocument` to write.	required
`path`	`Optional[Path]`	Optional destination path. Defaults to `doc.path`.	`None`
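
The relationship between `XmlDocument`, `read_xml`, and `write_xml` can be sketched as a small round trip. This illustration uses the standard library's `xml.etree.ElementTree` rather than lxml. The field name `tree` and all `_sketch` and `Sketch` names are assumptions; only `path`, `root`, and the default of writing to `doc.path` come from the descriptions above.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional
import xml.etree.ElementTree as ET


@dataclass
class XmlDocumentSketch:
    """Couples a parsed tree with the filesystem path it came from."""

    path: Path
    tree: ET.ElementTree  # assumed field name

    @property
    def root(self) -> ET.Element:
        return self.tree.getroot()


def read_xml_sketch(path: Path) -> XmlDocumentSketch:
    """Parse the file at `path` and remember where it lives."""
    return XmlDocumentSketch(path=path, tree=ET.parse(path))


def write_xml_sketch(doc: XmlDocumentSketch, path: Optional[Path] = None) -> None:
    """Write the document to `path`, defaulting to the document's own path."""
    target = path if path is not None else doc.path
    doc.tree.write(target, encoding="utf-8", xml_declaration=True)
```

Keeping the path on the document means callers can edit `doc.root` and call the writer without repeating the location.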

`dita_package_processor.orchestration` ¶

Orchestration layer.

Thin glue between:

discovery → planning → execution

Design principles¶

This layer is intentionally boring.

It:

- wires concrete implementations together
- performs zero business logic
- performs zero semantic inference
- does not reinterpret discovery internals
- delegates invariant enforcement to lower layers

If something changes, update this file explicitly. Do not make it smart.

`ExecutorProtocol` ¶

Bases: Protocol

Minimal execution contract exposed to orchestration.

`get_executor(name, *, apply, source_root, sandbox_root)` ¶

Resolve execution backend.

Parameters¶

- `name` : `str`. Executor name. Supported values: `"filesystem"`, `"noop"`.
- `apply` : `bool`. Whether filesystem mutation is allowed.
- `source_root` : `Path`. Source artifact root.
- `sandbox_root` : `Path`. Output root.

Returns¶

ExecutorProtocol
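
Name-based backend resolution of this kind can be sketched as follows. Only the signature and the two supported names come from the description above. The protocol's `describe` method, both executor classes, and the `ValueError` raised for unknown names are hypothetical, since the real protocol's members and error handling are not documented here.

```python
from pathlib import Path
from typing import Protocol


class ExecutorProtocolSketch(Protocol):
    # Hypothetical member: stands in for the undocumented execution contract.
    def describe(self) -> str: ...


class NoopExecutorSketch:
    """Does nothing; useful for dry runs and tests."""

    def describe(self) -> str:
        return "noop"


class FilesystemExecutorSketch:
    """Would read from source_root and write under sandbox_root."""

    def __init__(self, *, apply: bool, source_root: Path, sandbox_root: Path) -> None:
        self.apply = apply
        self.source_root = source_root
        self.sandbox_root = sandbox_root

    def describe(self) -> str:
        return "filesystem (apply)" if self.apply else "filesystem (dry-run)"


def get_executor_sketch(
    name: str, *, apply: bool, source_root: Path, sandbox_root: Path
) -> ExecutorProtocolSketch:
    """Resolve an executor by name; keyword-only arguments mirror the real signature."""
    if name == "noop":
        return NoopExecutorSketch()
    if name == "filesystem":
        return FilesystemExecutorSketch(
            apply=apply, source_root=source_root, sandbox_root=sandbox_root
        )
    # Assumption: unknown names are rejected rather than silently defaulted.
    raise ValueError(f"Unknown executor: {name!r}")
```

The resolver contains no processing logic of its own, which is consistent with the stated design principle that this layer only wires implementations together.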

`run_discovery(*, package_path)` ¶

Execute discovery phase.

Parameters¶

- `package_path` : `Path`. Root directory containing the DITA package.

Returns¶

DiscoveryInventory

`run_planning(*, discovery, package_path, definition_map, definition_navtitle, docx_stem)` ¶

Execute planning phase.

Explicit conversion:

DiscoveryInventory → PlanningInput → Plan

This function performs no semantic reasoning. All invariants must already be enforced by discovery.

`dita_package_processor.pipeline` ¶

pipeline.py¶

Pipeline orchestration for the DITA Package Processor.

Coordinates:

Discovery → Planning → Materialization → Execution

The pipeline is the ONLY execution boundary.

CLI layers must never directly instantiate executors.

`Pipeline` ¶

Orchestrates the full DITA processing lifecycle.

The pipeline owns:

- filesystem paths
- materialization
- executor wiring

CLI must remain thin and only call this boundary.

`execute_plan(*, plan_path, apply=None)` ¶

Execute an already-generated plan.

Skips discovery + planning.

Returns ExecutionReport.

`run(*, apply=None)` ¶

Execute full pipeline.

Returns ExecutionReport.

`dita_package_processor.utils` ¶

General utility functions for the DITA Package Processor.

This module contains small, reusable helpers that do not belong to any specific processing step.

`slugify(value, max_len=60)` ¶

Convert a string into a filesystem-friendly slug.

The transformation:

- Lowercases the input
- Removes non-alphanumeric characters (except whitespace and hyphens)
- Collapses whitespace, underscores, and hyphens into single underscores
- Trims leading and trailing underscores
- Truncates the result to `max_len` characters

Parameters:

Name	Type	Description	Default
`value`	`str`	Input string to convert.	required
`max_len`	`int`	Maximum length of the returned slug.	`60`

Returns:

Type	Description
`str`	Normalized slug string.
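
The transformation steps listed for `slugify` can be sketched with two regular expressions. This is one plausible reading of the description, not the project's actual code. In particular, it assumes underscores survive the removal step (they are collapsed in the next step), and it relies on Python's Unicode-aware `\w`, so non-ASCII letters are kept. The name `slugify_sketch` is hypothetical.

```python
import re


def slugify_sketch(value: str, max_len: int = 60) -> str:
    """Apply the documented steps in order: lowercase, strip, collapse, trim, truncate."""
    slug = value.lower()
    # Drop everything except word characters, whitespace, and hyphens.
    slug = re.sub(r"[^\w\s-]", "", slug)
    # Collapse runs of whitespace, underscores, and hyphens into one underscore.
    slug = re.sub(r"[\s_-]+", "_", slug)
    # Trim separators left at either end, then cap the length.
    slug = slug.strip("_")
    return slug[:max_len]


print(slugify_sketch("Hello, World!"))              # hello_world
print(slugify_sketch("  Release Notes -- v2.0  "))  # release_notes_v20
```

Because truncation is the last step, a slug cut at `max_len` can still end in an underscore under this reading.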
