Core

The Core module defines the shared abstractions and contracts used across the entire system. This includes base models, shared utilities, registries, and cross-cutting concerns that must remain stable across pipeline stages.

Core code is intentionally minimal and conservative. It does not implement discovery, planning, or execution logic directly. Instead, it provides the structural glue that allows those layers to interoperate without tight coupling.

This layer exists to prevent duplication, enforce consistency, and provide a stable foundation for extension. Changes here have wide impact and are treated as schema-level decisions, not implementation details.

dita_package_processor.config

Configuration loading utilities for the DITA Package Processor.

This module is responsible for loading runtime configuration from pyproject.toml and extracting the tool-specific configuration namespace.

load_config(path)

Load and return the DITA Package Processor configuration.

The configuration is read from a TOML file and must contain a [tool.dita_package_processor] section. Only that subsection is returned to the caller.

Parameters:

Name Type Description Default
path Path

Path to the pyproject.toml configuration file.

required

Returns:

Type Description
Dict[str, Any]

Parsed configuration dictionary for the processor.

Raises:

Type Description
FileNotFoundError

If the configuration file does not exist.

KeyError

If the required tool configuration section is missing.

dita_package_processor.dita_xml

DITA XML helper utilities built on top of lxml.

This module provides a small, opinionated set of helpers for reading, writing, and transforming DITA XML safely and consistently.

XmlDocument dataclass

Representation of an XML document on disk.

This class couples an lxml ElementTree with its filesystem path and provides convenience access to the document root.

root property

Return the root XML element.

Returns:

Type Description
_Element

Root element of the document.
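A rough sketch of how such a dataclass might be shaped is shown below. It uses `xml.etree.ElementTree` from the standard library as a stand-in for lxml so that it runs without extra dependencies. The class name `XmlDocumentSketch` and the field name `tree` are assumptions; only `path` and `root` are named on this page.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from pathlib import Path


@dataclass
class XmlDocumentSketch:
    """Hypothetical stand-in: an element tree coupled with its filesystem path."""

    path: Path
    tree: ET.ElementTree  # assumption: the real class holds an lxml tree

    @property
    def root(self) -> ET.Element:
        # Convenience accessor so callers do not need to call getroot().
        return self.tree.getroot()


# Build a document from an in-memory map; nothing is read from or written to disk.
doc = XmlDocumentSketch(
    path=Path("maps/index.ditamap"),
    tree=ET.ElementTree(ET.fromstring('<map id="m1"><title>Index</title></map>')),
)
print(doc.root.tag)  # map
```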

create_concept_topic_xml(path, topic_id, title)

Create a minimal DITA concept topic XML document.

Parameters:

Name Type Description Default
path Path

Destination file path for the topic.

required
topic_id str

Value for the id attribute.

required
title str

Topic title text.

required

Returns:

Type Description
XmlDocument

New XmlDocument instance.
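For illustration, a minimal concept topic could be produced along these lines. The sketch uses the standard library's `xml.etree.ElementTree` instead of lxml, returns a plain tree rather than an XmlDocument, and omits the DOCTYPE declaration. The exact elements the real function emits are not stated here, so the `<conbody>` child and the name `create_concept_topic_sketch` are assumptions.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def create_concept_topic_sketch(path: Path, topic_id: str, title: str) -> ET.ElementTree:
    """Hypothetical stand-in: write <concept id=...><title/><conbody/></concept>."""
    concept = ET.Element("concept", {"id": topic_id})
    ET.SubElement(concept, "title").text = title
    ET.SubElement(concept, "conbody")  # assumption: an empty body is included

    tree = ET.ElementTree(concept)
    # Write with an XML declaration; a real DITA file would also carry a DOCTYPE.
    tree.write(str(path), encoding="utf-8", xml_declaration=True)
    return tree
```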

find_first_topicref_href(doc)

Find the first topicref element with an href attribute.

Parameters:

Name Type Description Default
doc XmlDocument

Map XML document.

required

Returns:

Type Description
Optional[str]

href value if found, otherwise None.

first_href_to_map(doc)

Find the first href pointing to a .ditamap file.

Intended for resolving the main map referenced by index.ditamap.

Parameters:

Name Type Description Default
doc XmlDocument

Index map document.

required

Returns:

Type Description
Optional[str]

href value if found, otherwise None.
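The two href lookups above (find_first_topicref_href and first_href_to_map) can be approximated as shown here. The sketch works on a bare root element from `xml.etree.ElementTree` rather than an lxml-backed XmlDocument. The function names, the case-insensitive `.ditamap` suffix check, and the decision to inspect every element carrying an href are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Optional

INDEX_MAP = """<map>
  <title>Index</title>
  <topicref navtitle="No href here"/>
  <topicref href="topics/intro.dita"/>
  <mapref href="Main.ditamap"/>
</map>"""


def first_topicref_href_sketch(root: ET.Element) -> Optional[str]:
    # iter() walks the tree in document order, nested elements included.
    for element in root.iter("topicref"):
        href = element.get("href")
        if href:
            return href
    return None


def first_href_to_map_sketch(root: ET.Element) -> Optional[str]:
    # Assumption: any element with an href ending in .ditamap qualifies.
    for element in root.iter():
        href = element.get("href")
        if href and href.lower().endswith(".ditamap"):
            return href
    return None


root = ET.fromstring(INDEX_MAP)
print(first_topicref_href_sketch(root))  # topics/intro.dita
print(first_href_to_map_sketch(root))    # Main.ditamap
```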

get_map_title(doc)

Extract a human-readable map title.

Title resolution order:

1. <title>
2. <topicmeta><navtitle>

Parameters:

Name Type Description Default
doc XmlDocument

Map XML document.

required

Returns:

Type Description
str

Title text, or an empty string if not found.
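The resolution order for get_map_title can be illustrated with a short sketch. It uses `xml.etree.ElementTree` in place of lxml and takes a root element directly. Whether the real function falls through to the navtitle when `<title>` exists but is empty is not documented, so that behaviour is an assumption, as is the name `map_title_sketch`.

```python
import xml.etree.ElementTree as ET


def map_title_sketch(root: ET.Element) -> str:
    """Hypothetical stand-in: <title> first, then <topicmeta><navtitle>, else ''."""
    title = root.find("title")
    if title is not None:
        text = "".join(title.itertext()).strip()
        if text:
            return text

    # Fallback: a navtitle nested in the map's own topicmeta.
    navtitle = root.find("topicmeta/navtitle")
    if navtitle is not None:
        return "".join(navtitle.itertext()).strip()

    return ""


with_title = ET.fromstring("<map><title>User Guide</title></map>")
with_navtitle = ET.fromstring(
    "<map><topicmeta><navtitle>Fallback Title</navtitle></topicmeta></map>"
)
print(map_title_sketch(with_title))     # User Guide
print(map_title_sketch(with_navtitle))  # Fallback Title
```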

get_top_level_topicrefs(doc)

Return direct child topicref or mapref elements of a map.

Parameters:

Name Type Description Default
doc XmlDocument

Map XML document.

required

Returns:

Type Description
List[_Element]

List of top-level topicref/mapref elements.
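The "direct child" restriction is the important detail: nested references are not returned. A minimal sketch, again using `xml.etree.ElementTree` as a stand-in for lxml and a hypothetical function name:

```python
import xml.etree.ElementTree as ET
from typing import List

MAP_XML = """<map>
  <title>Main</title>
  <topicref href="a.dita">
    <topicref href="a-child.dita"/>
  </topicref>
  <mapref href="b.ditamap"/>
</map>"""


def top_level_topicrefs_sketch(root: ET.Element) -> List[ET.Element]:
    # Iterating an element yields only its direct children, so nested
    # topicrefs such as a-child.dita are skipped.
    return [child for child in root if child.tag in ("topicref", "mapref")]


refs = top_level_topicrefs_sketch(ET.fromstring(MAP_XML))
print([ref.get("href") for ref in refs])  # ['a.dita', 'b.ditamap']
```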

read_xml(path)

Read and parse an XML file from disk.

Parameters:

Name Type Description Default
path Path

Path to the XML file.

required

Returns:

Type Description
XmlDocument

Parsed XmlDocument instance.

transform_to_glossentry(doc)

Transform a topic into a minimal glossentry topic in place.

Heuristic mapping rules:

- glossentry/@id: existing topic @id (fallback: gloss)
- glossterm: existing <title>
- glossdef/p: derived from <shortdesc> or body text

Parameters:

Name Type Description Default
doc XmlDocument

Topic XML document to transform.

required

Returns:

Type Description
XmlDocument

Updated XmlDocument instance.
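The heuristic mapping for transform_to_glossentry might look roughly like the sketch below. It is built on `xml.etree.ElementTree` rather than lxml and operates on a root element. Several details are assumptions because the page does not specify them: the function name, whitespace normalisation of the definition, and treating any direct child whose tag ends in `body` as the body.

```python
import xml.etree.ElementTree as ET


def _flat_text(element: ET.Element) -> str:
    # Join all nested text and collapse runs of whitespace.
    return " ".join("".join(element.itertext()).split())


def to_glossentry_sketch(root: ET.Element) -> ET.Element:
    """Hypothetical stand-in: rewrite a topic root into a glossentry in place."""
    topic_id = root.get("id") or "gloss"  # documented fallback id
    title = root.find("title")
    term = _flat_text(title) if title is not None else ""

    shortdesc = root.find("shortdesc")
    if shortdesc is not None:
        definition = _flat_text(shortdesc)
    else:
        # Assumption: conbody, body, taskbody, and refbody all count as body.
        body = next((child for child in root if child.tag.endswith("body")), None)
        definition = _flat_text(body) if body is not None else ""

    # clear() drops children, attributes, and text but keeps the same object,
    # so existing references to the root stay valid.
    root.clear()
    root.tag = "glossentry"
    root.set("id", topic_id)
    ET.SubElement(root, "glossterm").text = term
    glossdef = ET.SubElement(root, "glossdef")
    ET.SubElement(glossdef, "p").text = definition
    return root


topic = ET.fromstring(
    '<concept id="widget"><title>Widget</title>'
    "<shortdesc>A small part.</shortdesc><conbody><p>Details.</p></conbody></concept>"
)
print(ET.tostring(to_glossentry_sketch(topic), encoding="unicode"))
```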

write_xml(doc, path=None)

Write an XML document back to disk.

Parameters:

Name Type Description Default
doc XmlDocument

XmlDocument to write.

required
path Optional[Path]

Optional destination path. Defaults to doc.path.

None

dita_package_processor.orchestration

Orchestration layer.

Thin glue between:

discovery → planning → execution

Design principles

This layer is intentionally boring.

It:

- wires concrete implementations together
- performs zero business logic
- performs zero semantic inference
- does not reinterpret discovery internals
- delegates invariant enforcement to lower layers

If something changes, update this file explicitly. Do not make it smart.

ExecutorProtocol

Bases: Protocol

Minimal execution contract exposed to orchestration.

get_executor(name, *, apply, source_root, sandbox_root)

Resolve execution backend.

Parameters

name : str
    Executor name. Supported:
    - "filesystem"
    - "noop"
apply : bool
    Whether filesystem mutation is allowed.
source_root : Path
    Source artifact root.
sandbox_root : Path
    Output root.

Returns

ExecutorProtocol
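One way to picture name-based executor resolution behind a Protocol is sketched below. Everything except the two supported names and the keyword-only parameters is an assumption: the `execute` method, the two executor classes, the dictionary they return, and the `ValueError` for unknown names are all hypothetical, since the real contract is not spelled out on this page.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol


class ExecutorSketch(Protocol):
    """Hypothetical minimal contract; the real ExecutorProtocol may differ."""

    def execute(self, plan: dict) -> dict: ...


class NoopExecutorSketch:
    def execute(self, plan: dict) -> dict:
        # Never touches the filesystem.
        return {"executor": "noop", "mutated": False}


@dataclass
class FilesystemExecutorSketch:
    apply: bool
    source_root: Path
    sandbox_root: Path

    def execute(self, plan: dict) -> dict:
        # A real backend would write under sandbox_root only when apply is True.
        return {"executor": "filesystem", "mutated": self.apply}


def get_executor_sketch(
    name: str, *, apply: bool, source_root: Path, sandbox_root: Path
) -> ExecutorSketch:
    if name == "noop":
        return NoopExecutorSketch()
    if name == "filesystem":
        return FilesystemExecutorSketch(apply, source_root, sandbox_root)
    # Assumption: unknown names are rejected explicitly.
    raise ValueError(f"Unknown executor: {name!r}")
```

Because the return type is a Protocol, orchestration code depends only on the method shape and never on a concrete executor class.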

run_discovery(*, package_path)

Execute discovery phase.

Parameters

package_path : Path
    Root directory containing DITA package.

Returns

DiscoveryInventory

run_planning(*, discovery, package_path, definition_map, definition_navtitle, docx_stem)

Execute planning phase.

Explicit conversion:

DiscoveryInventory → PlanningInput → Plan

This function performs no semantic reasoning. All invariants must already be enforced by discovery.

dita_package_processor.pipeline

pipeline.py

Pipeline orchestration for the DITA Package Processor.

Coordinates:

Discovery → Planning → Materialization → Execution

The pipeline is the ONLY execution boundary.

CLI layers must never directly instantiate executors.

Pipeline

Orchestrates the full DITA processing lifecycle.

The pipeline owns:

- filesystem paths
- materialization
- executor wiring

CLI must remain thin and only call this boundary.

execute_plan(*, plan_path, apply=None)

Execute an already-generated plan.

Skips discovery and planning.

Returns ExecutionReport.

run(*, apply=None)

Execute full pipeline.

Returns ExecutionReport.

dita_package_processor.utils

General utility functions for the DITA Package Processor.

This module contains small, reusable helpers that do not belong to any specific processing step.

slugify(value, max_len=60)

Convert a string into a filesystem-friendly slug.

The transformation:

- Lowercases the input
- Removes non-alphanumeric characters (except whitespace and hyphens)
- Collapses whitespace, underscores, and hyphens into single underscores
- Trims leading and trailing underscores
- Truncates the result to max_len characters

Parameters:

Name Type Description Default
value str

Input string to convert.

required
max_len int

Maximum length of the returned slug.

60

Returns:

Type Description
str

Normalized slug string.
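The listed steps translate fairly directly into two regular-expression passes. The sketch below is one plausible reading, not the module's source: the name `slugify_sketch` is hypothetical, and it assumes underscores survive the removal step (the `\w` class includes them), which is consistent with the later step that collapses underscores.

```python
import re


def slugify_sketch(value: str, max_len: int = 60) -> str:
    """Hypothetical stand-in that follows the documented steps in order."""
    value = value.lower()
    # Drop everything except word characters, whitespace, and hyphens.
    value = re.sub(r"[^\w\s-]", "", value)
    # Collapse runs of whitespace, underscores, and hyphens into one underscore.
    value = re.sub(r"[\s_-]+", "_", value)
    value = value.strip("_")
    # Truncation happens last, as documented.
    return value[:max_len]


print(slugify_sketch("Hello, World!"))                    # hello_world
print(slugify_sketch(" --Getting Started: Part_2 -- "))  # getting_started_part_2
```

Because truncation follows trimming in this order, a cut that lands on a separator can leave a trailing underscore: `slugify_sketch("ab cd", max_len=3)` returns `ab_`.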