Core
The Core module defines the shared abstractions and contracts used across the entire system. This includes base models, shared utilities, registries, and cross-cutting concerns that must remain stable across pipeline stages.
Core code is intentionally minimal and conservative. It does not implement discovery, planning, or execution logic directly. Instead, it provides the structural glue that allows those layers to interoperate without tight coupling.
This layer exists to prevent duplication, enforce consistency, and provide a stable foundation for extension. Changes here have wide impact and are treated as schema-level decisions, not implementation details.
dita_package_processor.config
Configuration loading utilities for the DITA Package Processor.
This module is responsible for loading runtime configuration from
pyproject.toml and extracting the tool-specific configuration
namespace.
load_config(path)
Load and return the DITA Package Processor configuration.
The configuration is read from a TOML file and must contain a
[tool.dita_package_processor] section. Only that subsection
is returned to the caller.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `Path` | Path to the | required |

Returns:

| Type | Description |
|---|---|
| `Dict[str, Any]` | Parsed configuration dictionary for the processor. |

Raises:

| Type | Description |
|---|---|
| `FileNotFoundError` | If the configuration file does not exist. |
| `KeyError` | If the required tool configuration section is missing. |
dita_package_processor.dita_xml
DITA XML helper utilities built on top of lxml.
This module provides a small, opinionated set of helpers for reading, writing, and transforming DITA XML safely and consistently.
XmlDocument (dataclass)
Representation of an XML document on disk.
This class couples an lxml ElementTree with its filesystem path
and provides convenience access to the document root.
root (property)
Return the root XML element.
Returns:

| Type | Description |
|---|---|
| `_Element` | Root element of the document. |
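A rough sketch of the shape described above, for orientation only. The field names `path` and `tree` are assumptions, and the standard library's `xml.etree.ElementTree` is used as a stand-in for lxml so the example runs without extra dependencies.

```python
import xml.etree.ElementTree as ET  # stand-in for lxml.etree
from dataclasses import dataclass
from pathlib import Path


@dataclass
class XmlDocumentSketch:
    """Hypothetical pairing of a parsed tree with its file path."""

    path: Path            # assumed field name
    tree: ET.ElementTree  # assumed field name

    @property
    def root(self) -> ET.Element:
        # Convenience accessor for the document's root element.
        return self.tree.getroot()


# Build a document from an in-memory map instead of reading from disk.
doc = XmlDocumentSketch(
    path=Path("index.ditamap"),
    tree=ET.ElementTree(ET.fromstring("<map><title>Guide</title></map>")),
)
```

Keeping the path next to the tree means a later write can default to the original location without the caller passing it again.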
create_concept_topic_xml(path, topic_id, title)
Create a minimal DITA concept topic XML document.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `Path` | Destination file path for the topic. | required |
| `topic_id` | `str` | Value for the | required |
| `title` | `str` | Topic title text. | required |

Returns:

| Type | Description |
|---|---|
| `XmlDocument` | New |
find_first_topicref_href(doc)
Find the first topicref element with an href attribute.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `doc` | `XmlDocument` | Map XML document. | required |

Returns:

| Type | Description |
|---|---|
| `Optional[str]` | |
first_href_to_map(doc)
Find the first href pointing to a .ditamap file.
Intended for resolving the main map referenced by index.ditamap.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `doc` | `XmlDocument` | Index map document. | required |

Returns:

| Type | Description |
|---|---|
| `Optional[str]` | |
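The two href lookups (`find_first_topicref_href` and `first_href_to_map`) could work roughly as sketched below. The function names here are hypothetical, they take a root element rather than an `XmlDocument`, and `xml.etree.ElementTree` stands in for lxml. Returning `None` when nothing matches, and which elements are searched for map hrefs, are both assumptions based only on the `Optional[str]` return type.

```python
import xml.etree.ElementTree as ET  # stand-in for lxml.etree
from typing import Optional


def first_topicref_href_sketch(root: ET.Element) -> Optional[str]:
    # iter() walks matching elements in document order.
    for element in root.iter("topicref"):
        href = element.get("href")
        if href:
            return href
    return None  # assumed behaviour when no topicref carries an href


def first_map_href_sketch(root: ET.Element) -> Optional[str]:
    # Assumption: any element's href counts, as long as it names a .ditamap.
    for element in root.iter():
        href = element.get("href")
        if href and href.lower().endswith(".ditamap"):
            return href
    return None


index_map = ET.fromstring(
    '<map><topicref navtitle="x"/>'
    '<topicref href="a.dita"/>'
    '<mapref href="Main.ditamap"/></map>'
)
```

With this sample map, the first helper skips the `topicref` that has no `href`, and the second ignores `a.dita` because it is not a map.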
get_map_title(doc)
Extract a human-readable map title.
Title resolution order:
1. <title>
2. <topicmeta><navtitle>
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `doc` | `XmlDocument` | Map XML document. | required |

Returns:

| Type | Description |
|---|---|
| `str` | Title text, or an empty string if not found. |
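The resolution order can be sketched as follows. `map_title_sketch` is a hypothetical name, `xml.etree.ElementTree` stands in for lxml, and two details are assumptions: both elements are looked up as children of the map root, and an empty `<title>` falls through to the navtitle.

```python
import xml.etree.ElementTree as ET  # stand-in for lxml.etree


def map_title_sketch(root: ET.Element) -> str:
    # 1. Prefer a <title> child of the map.
    title = root.find("title")
    if title is not None:
        text = "".join(title.itertext()).strip()
        if text:
            return text
    # 2. Fall back to <topicmeta><navtitle>.
    navtitle = root.find("topicmeta/navtitle")
    if navtitle is not None:
        return "".join(navtitle.itertext()).strip()
    # Neither element exists: return an empty string, as documented.
    return ""
```

Using `itertext()` rather than `.text` keeps titles with inline markup (for example a `<ph>` inside the title) intact.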
get_top_level_topicrefs(doc)
Return direct child topicref or mapref elements of a map.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `doc` | `XmlDocument` | Map XML document. | required |

Returns:

| Type | Description |
|---|---|
| `List[_Element]` | List of top-level topicref/mapref elements. |
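To make "direct child" concrete, here is a small sketch with a hypothetical function name, again using `xml.etree.ElementTree` in place of lxml. It assumes that nested references and other map children are simply skipped.

```python
import xml.etree.ElementTree as ET  # stand-in for lxml.etree
from typing import List


def top_level_refs_sketch(root: ET.Element) -> List[ET.Element]:
    # Iterating an element yields only its direct children.
    return [child for child in root if child.tag in ("topicref", "mapref")]


sample_map = ET.fromstring(
    '<map>'
    '<topicref href="a.dita"><topicref href="b.dita"/></topicref>'
    '<mapref href="c.ditamap"/>'
    '<reltable/>'
    '</map>'
)
top_level_hrefs = [el.get("href") for el in top_level_refs_sketch(sample_map)]
```

Here the nested `b.dita` reference and the `reltable` are not included in the result.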
read_xml(path)
Read and parse an XML file from disk.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `Path` | Path to the XML file. | required |

Returns:

| Type | Description |
|---|---|
| `XmlDocument` | Parsed |
transform_to_glossentry(doc)
Transform a topic into a minimal glossentry topic in place.
Heuristic mapping rules:
- glossentry/@id: existing topic @id (fallback: gloss)
- glossterm: existing <title>
- glossdef/p: derived from <shortdesc> or body text
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `doc` | `XmlDocument` | Topic XML document to transform. | required |

Returns:

| Type | Description |
|---|---|
| `XmlDocument` | Updated |
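The heuristic mapping rules can be sketched on a bare root element. This is an illustration under assumptions, not the library's implementation: the function name is hypothetical, `xml.etree.ElementTree` replaces lxml, the body is taken to be any direct child whose tag ends in `body`, and all other attributes and children are discarded.

```python
import xml.etree.ElementTree as ET  # stand-in for lxml.etree


def to_glossentry_sketch(root: ET.Element) -> ET.Element:
    # glossterm: text of the existing <title>.
    title = root.find("title")
    term = "".join(title.itertext()).strip() if title is not None else ""

    # glossdef/p: <shortdesc> if present, otherwise the body text.
    shortdesc = root.find("shortdesc")
    if shortdesc is not None:
        definition = "".join(shortdesc.itertext()).strip()
    else:
        # Assumption: body, conbody, refbody, etc. all count as "body".
        body = next((c for c in root if c.tag.endswith("body")), None)
        raw = "".join(body.itertext()) if body is not None else ""
        definition = " ".join(raw.split())  # normalise whitespace

    # glossentry/@id: existing @id, falling back to "gloss".
    topic_id = root.get("id") or "gloss"

    # Mutate the same element object so the change is in place.
    root.clear()  # drops children, attributes, and text; the tag is kept
    root.tag = "glossentry"
    root.set("id", topic_id)
    ET.SubElement(root, "glossterm").text = term
    glossdef = ET.SubElement(root, "glossdef")
    ET.SubElement(glossdef, "p").text = definition
    return root
```

Because the original element is rewritten rather than replaced, any tree that already holds a reference to it sees the transformed content.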
write_xml(doc, path=None)
Write an XML document back to disk.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `doc` | `XmlDocument` | | required |
| `path` | `Optional[Path]` | Optional destination path. Defaults to | `None` |
dita_package_processor.orchestration
Orchestration layer.
Thin glue between:
discovery → planning → execution
Design principles
This layer is intentionally boring.
It:
- wires concrete implementations together
- performs zero business logic
- performs zero semantic inference
- does not reinterpret discovery internals
- delegates invariant enforcement to lower layers
If something changes, update this file explicitly. Do not make it smart.
ExecutorProtocol
¶
Bases: Protocol
Minimal execution contract exposed to orchestration.
get_executor(name, *, apply, source_root, sandbox_root)
run_discovery(*, package_path)
run_planning(*, discovery, package_path, definition_map, definition_navtitle, docx_stem)
Execute planning phase.
Explicit conversion:
DiscoveryInventory → PlanningInput → Plan
This function performs no semantic reasoning. All invariants must already be enforced by discovery.
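The "thin glue" idea behind this module can be illustrated with a generic sketch. Everything here is hypothetical: the `run` method, the `DryRunExecutor` class, and the callables passed in are illustrative stand-ins, not the signatures of `ExecutorProtocol`, `get_executor`, `run_discovery`, or `run_planning`.

```python
from typing import Any, Callable, List, Protocol


class ExecutorSketch(Protocol):
    """Hypothetical minimal contract: an executor only needs run()."""

    def run(self, plan: List[str]) -> Any: ...


class DryRunExecutor:
    """Satisfies the protocol structurally, without inheriting from it."""

    def run(self, plan: List[str]) -> Any:
        return {"applied": False, "steps": list(plan)}


def orchestrate_sketch(
    package_path: str,
    discover: Callable[[str], List[str]],
    plan: Callable[[List[str]], List[str]],
    executor: ExecutorSketch,
) -> Any:
    # Pure wiring: each stage's output is handed to the next unchanged.
    inventory = discover(package_path)
    built_plan = plan(inventory)
    return executor.run(built_plan)


outcome = orchestrate_sketch(
    "pkg",
    discover=lambda path: ["a.dita", "b.dita"],
    plan=lambda inventory: [f"copy {name}" for name in inventory],
    executor=DryRunExecutor(),
)
```

The orchestrating function contains no branching on the data it passes along, which is the property the design principles above ask for.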
dita_package_processor.pipeline
pipeline.py
Pipeline orchestration for the DITA Package Processor.
Coordinates:
Discovery → Planning → Materialization → Execution
The pipeline is the ONLY execution boundary.
CLI layers must never directly instantiate executors.
Pipeline
Orchestrates the full DITA processing lifecycle.
The pipeline owns:
- filesystem paths
- materialization
- executor wiring
CLI must remain thin and only call this boundary.
dita_package_processor.utils
General utility functions for the DITA Package Processor.
This module contains small, reusable helpers that do not belong to any specific processing step.
slugify(value, max_len=60)
Convert a string into a filesystem-friendly slug.
The transformation:
- Lowercases the input
- Removes non-alphanumeric characters (except whitespace and hyphens)
- Collapses whitespace, underscores, and hyphens into single underscores
- Trims leading and trailing underscores
- Truncates the result to max_len characters
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `value` | `str` | Input string to convert. | required |
| `max_len` | `int` | Maximum length of the returned slug. | `60` |

Returns:

| Type | Description |
|---|---|
| `str` | Normalized slug string. |
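The listed transformation steps map naturally onto a few regular-expression passes. The sketch below is one plausible reading, not the module's source: the function name and the exact patterns are assumptions. In particular, `\w` is Unicode-aware in Python and also matches underscores, so accented letters and underscores survive the removal step.

```python
import re


def slugify_sketch(value: str, max_len: int = 60) -> str:
    # Lowercase the input.
    slug = value.lower()
    # Remove everything except word characters, whitespace, and hyphens.
    slug = re.sub(r"[^\w\s-]", "", slug)
    # Collapse runs of whitespace, underscores, and hyphens into one "_".
    slug = re.sub(r"[\s_-]+", "_", slug)
    # Trim leading and trailing underscores.
    slug = slug.strip("_")
    # Truncate last, following the documented order of steps.
    return slug[:max_len]
```

Under these assumptions, `slugify_sketch("Hello, World!")` yields `hello_world`. Because truncation is the final step, a slug cut at `max_len` may end in an underscore.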