Discovery¶

The Discovery module performs a read-only structural scan of a DITA package. Its job is to observe what exists, not to interpret intent or make decisions.

Discovery identifies artifacts, extracts relationships, and produces a graph-based representation of the package. It does not normalize structure, enforce rules, or infer transformations. All findings are recorded with evidence and confidence where applicable.

The output of discovery is a durable, schema-validated artifact that represents the structural truth of the package at a point in time. All downstream stages depend on this output and must treat it as authoritative.

`dita_package_processor.discovery.classifiers` ¶

Discovery-time classifiers for DITA packages.

This module adapts declarative pattern evaluation into concrete classification outcomes used during discovery.

It performs no transformation and no inference beyond deterministic resolution of emitted pattern evidence.

Contract (Iteration 7 – locked):

• Evidence means something was observed.
• No evidence means nothing was observed.
• Fallback evidence is not evidence and must never appear in output.
• If classification is None:
  - confidence must be None
  - evidence must be []
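The last rule of the contract can be read as an invariant on every classification result. The following is a minimal sketch of such a check, not code from the module: the helper name `check_classification_contract` and its argument layout are hypothetical.

```python
from typing import Any, Optional, Sequence


def check_classification_contract(
    classification: Optional[str],
    confidence: Optional[float],
    evidence: Sequence[Any],
) -> None:
    """Raise ValueError if an unclassified result carries confidence or evidence.

    Hypothetical helper for illustration; not part of the documented API.
    """
    if classification is not None:
        return  # this sketch only covers the "classification is None" rule
    if confidence is not None:
        raise ValueError("confidence must be None when classification is None")
    if list(evidence) != []:
        raise ValueError("evidence must be [] when classification is None")


# An unclassified artifact with nothing observed satisfies the contract.
check_classification_contract(None, None, [])
```

A check of this kind would typically run wherever classifier output is assembled, so that a violation fails loudly instead of leaking into the discovery output.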

`classify_map(*, path, metadata)` ¶

Classify a DITA map using declarative pattern evaluation.

Map classifications are returned as MapType enum values.

`classify_topic(*, path, metadata)` ¶

Classify a DITA topic using declarative pattern evaluation.

Topics are classified using TopicType.

`dita_package_processor.discovery.graph` ¶

Dependency graph data structures for discovery.

This module defines the read-only graph model derived from discovery output.

Discovery is authoritative. The graph is a computed structure.

Schema contract:

- discovery.relationships use: from / to / type / pattern_id
- graph edges use: source / target / type / pattern_id

This module:

- consumes discovery relationships
- emits a stable graph contract
- never invents structure

`DependencyEdge` `dataclass` ¶

Directed relationship between two artifacts.

Parameters:

Name	Type	Description	Default
`source`	`str`	Source artifact path.	required
`target`	`str`	Target artifact path.	required
`edge_type`	`str`	Relationship type (topicref, image, xref, etc).	required
`pattern_id`	`str`	Discovery pattern identifier.	required

`from_dict(data)` `classmethod` ¶

Build edge from graph serialization.

Expected keys:

- source
- target
- type
- pattern_id

`from_relationship(data)` `classmethod` ¶

Build edge from discovery.relationship entry.

Expected keys:

- from
- to
- type
- pattern_id

`to_dict()` ¶

Serialize edge to graph contract.

Output: { "source": "...", "target": "...", "type": "...", "pattern_id": "..." }
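The difference between the two key sets is easiest to see in a round trip. The sketch below is an illustrative stand-in, not the real `DependencyEdge`: the class name `EdgeSketch`, the `frozen=True` setting, and the sample values (including the pattern identifier) are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)  # frozen is an assumption; the source only says "read-only"
class EdgeSketch:
    """Illustrative stand-in for DependencyEdge; not the real class."""

    source: str
    target: str
    edge_type: str
    pattern_id: str

    @classmethod
    def from_relationship(cls, data: Dict[str, Any]) -> "EdgeSketch":
        # Discovery relationships use from / to / type / pattern_id.
        return cls(
            source=data["from"],
            target=data["to"],
            edge_type=data["type"],
            pattern_id=data["pattern_id"],
        )

    def to_dict(self) -> Dict[str, str]:
        # Graph edges use source / target / type / pattern_id.
        return {
            "source": self.source,
            "target": self.target,
            "type": self.edge_type,
            "pattern_id": self.pattern_id,
        }


# Hypothetical relationship entry, shaped like the discovery contract.
relationship = {
    "from": "main.ditamap",
    "to": "topics/a.dita",
    "type": "topicref",
    "pattern_id": "example_pattern",
}
edge_dict = EdgeSketch.from_relationship(relationship).to_dict()
```

Note that the `type` key maps to the `edge_type` attribute, since `type` shadows a Python builtin.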

`DependencyGraph` `dataclass` ¶

Derived dependency graph.

Nodes are artifact paths. Edges are DependencyEdge instances.

`from_dict(data)` `classmethod` ¶

Deserialize from graph serialization, not discovery JSON.

`from_discovery(*, artifacts, relationships)` `classmethod` ¶

Build a graph from discovery JSON.

Discovery is authoritative. Graph must not contain unknown nodes.

Parameters:

Name	Type	Description	Default
`artifacts`	`Iterable[Dict[str, Any]]`	discovery["artifacts"]	required
`relationships`	`Iterable[Dict[str, Any]]`	discovery["relationships"]	required
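The "no unknown nodes" rule can be sketched as follows. This is a simplified illustration that works on plain dicts, not the actual `from_discovery` implementation. Two assumptions are made: each artifact record exposes its path under a `"path"` key, and a relationship that references an unknown artifact is rejected with `ValueError` (the real method may handle that case differently).

```python
from typing import Any, Dict, Iterable, List


def build_graph_sketch(
    *,
    artifacts: Iterable[Dict[str, Any]],
    relationships: Iterable[Dict[str, Any]],
) -> Dict[str, List[Any]]:
    """Build a {"nodes": [...], "edges": [...]} dict from discovery-shaped input."""
    # Assumption: artifact records expose their path under the "path" key.
    nodes = sorted({str(artifact["path"]) for artifact in artifacts})
    known = set(nodes)

    edges: List[Dict[str, str]] = []
    for rel in relationships:
        for endpoint in (rel["from"], rel["to"]):
            if endpoint not in known:
                # Assumption: unknown endpoints are rejected rather than skipped.
                raise ValueError(f"relationship references unknown artifact: {endpoint}")
        edges.append(
            {
                "source": rel["from"],
                "target": rel["to"],
                "type": rel["type"],
                "pattern_id": rel["pattern_id"],
            }
        )
    return {"nodes": nodes, "edges": edges}


# Hypothetical discovery output with two artifacts and one relationship.
discovery = {
    "artifacts": [{"path": "main.ditamap"}, {"path": "topics/a.dita"}],
    "relationships": [
        {"from": "main.ditamap", "to": "topics/a.dita", "type": "topicref", "pattern_id": "p1"}
    ],
}
graph = build_graph_sketch(
    artifacts=discovery["artifacts"], relationships=discovery["relationships"]
)
```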

`incoming(node)` ¶

Edges that point to node.

`outgoing(node)` ¶

Edges that originate from node.
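As a rough illustration of the two lookups, the sketch below filters a list of serialized edges by endpoint. The free functions and the sample edges are hypothetical; the real methods live on `DependencyGraph` and return `DependencyEdge` instances.

```python
from typing import Dict, List

Edge = Dict[str, str]


def incoming(edges: List[Edge], node: str) -> List[Edge]:
    """Edges whose target is the given node."""
    return [edge for edge in edges if edge["target"] == node]


def outgoing(edges: List[Edge], node: str) -> List[Edge]:
    """Edges whose source is the given node."""
    return [edge for edge in edges if edge["source"] == node]


# Hypothetical edges in the graph serialization format.
edges = [
    {"source": "main.ditamap", "target": "topics/a.dita", "type": "topicref", "pattern_id": "p1"},
    {"source": "topics/a.dita", "target": "media/fig.png", "type": "image", "pattern_id": "p2"},
]

used_by = [edge["source"] for edge in incoming(edges, "topics/a.dita")]
depends_on = [edge["target"] for edge in outgoing(edges, "topics/a.dita")]
```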

`to_dict()` ¶

Serialize graph.

{ "nodes": [...], "edges": [...] }

`dita_package_processor.discovery.models` ¶

Discovery data models.

These models represent strictly observational records of what was found during DITA package discovery.

Design Principles¶

No inference
No transformation
No mutation of semantic meaning
Deterministic structure
Explicit invariants

Discovery records facts. Planning interprets them.

`DiscoveryArtifact` `dataclass` ¶

Observational record of a discovered filesystem artifact.

Rules¶

Media artifacts are structural only.
classification requires confidence.
evidence requires classification.
No semantic inference is performed here.

`classification_label()` ¶

Return normalized string label for classification.

Returns¶

Optional[str]

`to_dict()` ¶

Serialize artifact for contract transfer.

`DiscoveryInventory` `dataclass` ¶

Aggregate container of discovered artifacts.

Mutable during discovery.

`add_artifact(artifact=None, **kwargs)` ¶

Add a discovered artifact.

`resolve_main_map()` ¶

Resolve the single MAIN map.

Returns¶

Path

Raises¶

ValueError
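One plausible reading of "single MAIN map" is that resolution fails unless exactly one map carries the MAIN classification. The sketch below illustrates that reading only. The function name, the dict-based artifact records, the `"classification"` key, and the `"MAIN"` label are all assumptions, and the real method may raise under different conditions.

```python
from pathlib import Path
from typing import Dict, List


def resolve_main_map_sketch(artifacts: List[Dict[str, str]]) -> Path:
    """Return the path of the only artifact classified as MAIN."""
    # Assumption: the main map is identified by the label "MAIN".
    candidates = [a["path"] for a in artifacts if a.get("classification") == "MAIN"]
    if len(candidates) != 1:
        # Assumption: both "none found" and "more than one found" are errors.
        raise ValueError(f"expected exactly one MAIN map, found {len(candidates)}")
    return Path(candidates[0])


# Hypothetical inventory contents.
inventory = [
    {"path": "main.ditamap", "classification": "MAIN"},
    {"path": "topics/a.dita"},
]
main_map = resolve_main_map_sketch(inventory)
```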

`DiscoveryResult` `dataclass` ¶

Immutable result of discovery.

`main_map()` ¶

Return resolved MAIN map.

`DiscoverySummary` `dataclass` ¶

High-level summary of discovery.

`dita_package_processor.discovery.path_normalizer` ¶

Path normalization utilities for discovery.

This module normalizes relative and absolute references found in DITA documents into deterministic, package-root–relative paths.

It ensures that the dependency graph is stable and free from duplicate edges caused by path variation such as:

- ./topics/a.dita
- ../topics/a.dita
- topics/../topics/a.dita

All of these must resolve to the same canonical path string.

Single responsibility:

Input:

- source file path
- raw reference string
- package root

Output:

- normalized package-root–relative POSIX path string

This module performs:

- no filesystem mutation
- no semantic validation
- no file existence checks

`normalize_reference_path(*, source_path, reference, package_root)` ¶

Normalize a referenced path relative to the source file and package root.

The returned value is always:

- relative to the package root
- using POSIX-style separators
- fully normalized (no .. or . segments)
- stable across platforms

Absolute references are interpreted as package-root–anchored paths.

Parameters:

Name	Type	Description	Default
`source_path`	`Path`	Path of the file containing the reference.	required
`reference`	`str`	Raw reference value (e.g. from `href` or `data`).	required
`package_root`	`Path`	Root directory of the DITA package.	required

Returns:

Type	Description
`str`	Normalized, package-root–relative path using POSIX separators.

Raises:

Type	Description
`ValueError`	If the normalized path escapes the package root.
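The behaviour described above can be approximated with the standard library. The following is a minimal sketch, not the module's implementation: the name `normalize_reference_sketch` is hypothetical, it assumes `source_path` lies under `package_root`, and it does not strip URL fragments or query strings from the reference.

```python
import posixpath
from pathlib import PurePath, PurePosixPath


def normalize_reference_sketch(
    *, source_path: PurePath, reference: str, package_root: PurePath
) -> str:
    """Resolve a reference to a package-root-relative POSIX path string."""
    if reference.startswith("/"):
        # Absolute references are treated as anchored at the package root.
        combined = reference.lstrip("/")
    else:
        # Relative references resolve against the directory of the source file.
        # relative_to raises ValueError if source_path is outside package_root.
        source_dir = source_path.relative_to(package_root).parent.as_posix()
        combined = posixpath.join(source_dir, reference)

    # Purely lexical normalization: collapses "." and ".." without touching disk.
    normalized = posixpath.normpath(combined)
    if normalized == ".." or normalized.startswith("../"):
        raise ValueError(f"reference escapes package root: {reference!r}")
    return normalized


root = PurePosixPath("/pkg")
source = PurePosixPath("/pkg/maps/main.ditamap")
result = normalize_reference_sketch(
    source_path=source, reference="../topics/a.dita", package_root=root
)
```

Because `posixpath.normpath` works on strings only, the sketch performs no file existence checks, which matches the module's stated scope.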

`dita_package_processor.discovery.patterns` ¶

Pattern evaluation for DITA discovery.

This module defines the declarative pattern model and the evaluation engine that converts observed discovery signals into evidence.

It performs:

No resolution
No ranking
No inference
No mutation

It only answers:

“Given this artifact and these signals, what evidence exists?”

Fallback patterns are classification helpers only. They fire only when explicitly enabled.

`Evidence` `dataclass` ¶

Evidence emitted when a pattern matches an artifact.

`Pattern` `dataclass` ¶

Declarative structural pattern.

Parameters¶

id : str
    Unique pattern identifier.
applies_to : str
    Artifact type this pattern applies to.
signals : Dict[str, Any]
    Signal requirements.
asserts : Dict[str, Any]
    Assertion payload. Must contain:
    - role
    - confidence
rationale : List[str]
    Human-readable reasoning.

`PatternEvaluator` ¶

Evaluate patterns against a single discovery artifact.

Modes¶

Observation mode (default)

- fallback patterns ignored

Classification mode (allow_fallback=True)

- fallback patterns fire only if no semantic match

`evaluate(artifact, *, allow_fallback=False)` ¶

Evaluate all patterns against an artifact.

Parameters¶

artifact : DiscoveryArtifact
    Artifact to evaluate.
allow_fallback : bool
    Enable fallback emission if no semantic match.

Returns¶

List[Evidence]
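The two modes can be illustrated with a small stand-alone sketch. It is not the real `PatternEvaluator`: the `fallback` flag on the pattern, the equality-based signal matching, the dict-shaped evidence, and the signal and role names are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class PatternSketch:
    """Illustrative pattern record; the fallback flag is hypothetical."""

    id: str
    applies_to: str
    signals: Dict[str, Any]
    asserts: Dict[str, Any]  # must contain "role" and "confidence"
    rationale: List[str] = field(default_factory=list)
    fallback: bool = False


def evaluate_sketch(
    artifact_type: str,
    signals: Dict[str, Any],
    patterns: List[PatternSketch],
    *,
    allow_fallback: bool = False,
) -> List[Dict[str, Any]]:
    def matches(p: PatternSketch) -> bool:
        # Assumption: a pattern matches when every required signal has the exact value.
        return p.applies_to == artifact_type and all(
            signals.get(key) == value for key, value in p.signals.items()
        )

    def emit(p: PatternSketch) -> Dict[str, Any]:
        return {"pattern_id": p.id, "role": p.asserts["role"], "confidence": p.asserts["confidence"]}

    semantic = [emit(p) for p in patterns if not p.fallback and matches(p)]
    if semantic or not allow_fallback:
        return semantic  # observation mode, or a semantic match already exists
    return [emit(p) for p in patterns if p.fallback and matches(p)]


patterns = [
    PatternSketch("main_map", "map", {"has_maprefs": True}, {"role": "MAIN", "confidence": 0.9}),
    PatternSketch("any_map", "map", {}, {"role": "UNKNOWN", "confidence": 0.1}, fallback=True),
]
observed = evaluate_sketch("map", {"has_maprefs": False}, patterns)
classified = evaluate_sketch("map", {"has_maprefs": False}, patterns, allow_fallback=True)
```

In this sketch the fallback pattern contributes only when no semantic pattern matched and the caller opted in, which mirrors the mode description above.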

`dita_package_processor.discovery.relationships` ¶

Relationship extraction for DITA discovery.

This module extracts explicit, syntactic relationships between already-discovered artifacts by parsing DITA XML files.

It does NOT:

- classify artifacts
- mutate files
- infer semantic intent

It ONLY records factual dependencies expressed in XML:

- map → topic
- map → map
- topic → media

