Knowledge

The Knowledge module provides pattern definitions and classification logic used during discovery and planning. It encodes reusable heuristics about DITA structures, naming conventions, and common package layouts.

This layer is descriptive, not prescriptive. It does not execute transformations or mutate data. Instead, it supplies structured knowledge that other stages can reference when making decisions.

By isolating heuristics in this module, the system avoids hard-coding assumptions into discovery or planning logic and makes those assumptions explicit, testable, and replaceable.

dita_package_processor.knowledge.invariants

Invariant definitions for DITA package processing.

This module defines invariants: conditions that must hold true for a DITA package to be safely processed by the pipeline.

Invariants differ from validation rules:

  • Validation answers: "Is this document well-formed or schema-valid?"
  • Invariants answer: "Is this package structurally processable?"

Invariant violations are always fatal and must halt execution.

InvariantViolation dataclass

Represents a violation of a structural invariant.

Violations are immutable and descriptive. They do not attempt to recover or suggest fixes.
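An immutable, purely descriptive violation record can be sketched as a frozen dataclass. The field names below (`invariant`, `message`, `path`) are illustrative assumptions, not the module's documented attributes.

```python
from dataclasses import dataclass, FrozenInstanceError
from pathlib import Path


@dataclass(frozen=True)
class InvariantViolationSketch:
    """Hypothetical shape of a violation record (field names are assumed)."""

    invariant: str  # identifier of the invariant that failed
    message: str    # human-readable description of what was observed
    path: Path      # location the violation refers to


violation = InvariantViolationSketch(
    invariant="contains_ditamap",
    message="No .ditamap file found in package",
    path=Path("package"),
)

# Frozen dataclasses reject attribute assignment after construction.
try:
    violation.message = "changed"
except FrozenInstanceError:
    mutated = False
else:
    mutated = True
```

The record only describes what was observed; it carries no recovery logic or suggested fix, which matches the description above.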

assert_invariants(package_dir)

Assert that all filesystem invariants hold for the DITA package.

Parameters:

Name         Type  Description                          Default
package_dir  Path  Root directory of the DITA package.  required

Raises:

Type          Description
RuntimeError  If any invariant is violated.

evaluate_invariants(package_dir)

Evaluate all filesystem-level invariants for a DITA package.

Parameters:

Name         Type  Description                          Default
package_dir  Path  Root directory of the DITA package.  required

Returns:

Type                      Description
List[InvariantViolation]  List of invariant violations.
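The split between evaluating and asserting might look like the sketch below: one function collects every violation without raising, and a second turns any non-empty result into a fatal RuntimeError. All names here are hypothetical stand-ins, and violations are reduced to plain strings for brevity.

```python
from pathlib import Path
from typing import Callable, List

# A check takes the package root and returns zero or more violation messages.
Check = Callable[[Path], List[str]]


def root_exists(package_dir: Path) -> List[str]:
    if package_dir.is_dir():
        return []
    return [f"package root is not a directory: {package_dir}"]


def topics_present(package_dir: Path) -> List[str]:
    if (package_dir / "topics").is_dir():
        return []
    return [f"missing topics/ directory under: {package_dir}"]


CHECKS: List[Check] = [root_exists, topics_present]


def evaluate_sketch(package_dir: Path) -> List[str]:
    """Run every check and collect all violations without raising."""
    violations: List[str] = []
    for check in CHECKS:
        violations.extend(check(package_dir))
    return violations


def assert_sketch(package_dir: Path) -> None:
    """Fatal variant: raise if evaluation reports anything."""
    violations = evaluate_sketch(package_dir)
    if violations:
        raise RuntimeError("; ".join(violations))
```

Keeping evaluation side-effect free lets callers report every problem at once, while the asserting wrapper enforces the rule that violations halt execution.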

invariant_contains_ditamap(package_dir)

Ensure the package contains at least one .ditamap file.

Parameters:

Name         Type  Description                          Default
package_dir  Path  Root directory of the DITA package.  required

Returns:

Type                      Description
List[InvariantViolation]  List of invariant violations (empty if none).
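A minimal sketch of this kind of check, assuming that only the package root itself is searched (the source does not say whether the real search is recursive) and using plain strings in place of InvariantViolation objects:

```python
from pathlib import Path
from typing import List


def contains_ditamap_sketch(package_dir: Path) -> List[str]:
    """Return an empty list if at least one .ditamap file is present."""
    # Assumption: non-recursive search. Path.rglob("*.ditamap") would
    # search subdirectories as well.
    if any(package_dir.glob("*.ditamap")):
        return []
    return [f"no .ditamap file found in: {package_dir}"]
```

Returning a list rather than a boolean keeps the function's shape consistent with the other invariant checks, so results can be concatenated directly.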

invariant_package_root_exists(package_dir)

Ensure the package root directory exists and is a directory.

Parameters:

Name         Type  Description                          Default
package_dir  Path  Root directory of the DITA package.  required

Returns:

Type                      Description
List[InvariantViolation]  List of invariant violations (empty if none).

invariant_topics_directory_present(package_dir)

Ensure the topics/ directory exists under the package root.

Parameters:

Name         Type  Description                          Default
package_dir  Path  Root directory of the DITA package.  required

Returns:

Type                      Description
List[InvariantViolation]  List of invariant violations (empty if none).

validate_single_main_map(inventory)

Validate that exactly one main map exists in the discovery inventory.

Accepts either:

  • MapType enum values
  • string contract values ("MAIN_MAP")

This keeps the invariant tolerant of both the internal (enum) and the serialized (string) representation of a classification.
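One way to accept both representations is to reduce each value to its raw string before comparing. The sketch below assumes the inventory has already been reduced to a flat sequence of classification values, and it raises ValueError on failure; the inventory shape, the exception type, and every name other than "MAIN_MAP" are assumptions.

```python
from enum import Enum
from typing import Iterable, Union


class MapTypeSketch(str, Enum):
    MAIN_MAP = "MAIN_MAP"  # value taken from the text above
    OTHER = "OTHER"        # hypothetical second member for illustration


def is_main_map(value: Union[MapTypeSketch, str]) -> bool:
    # Enum members expose .value; plain strings are compared directly.
    raw = value.value if isinstance(value, Enum) else value
    return raw == "MAIN_MAP"


def validate_single_main_map_sketch(
    classifications: Iterable[Union[MapTypeSketch, str]],
) -> None:
    count = sum(1 for c in classifications if is_main_map(c))
    if count != 1:
        raise ValueError(f"expected exactly one main map, found {count}")
```

The same check therefore passes whether the inventory was built in memory or reloaded from a serialized form.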

dita_package_processor.knowledge.known_patterns

Known structural patterns for DITA discovery.

This module loads and validates declarative discovery patterns defined in known_patterns.yaml. Patterns are normalized into immutable Pattern objects suitable for deterministic evaluation.

This module performs:

  • no classification
  • no inference
  • no filesystem inspection beyond loading the YAML file

Its responsibility is limited to: YAML → validated data → normalized Pattern objects

load_normalized_patterns()

Load, validate, and normalize all discovery patterns.

This is the canonical entry point used by discovery classifiers.

Returns:

Type           Description
List[Pattern]  List of normalized Pattern instances.

Raises:

Type        Description
ValueError  If any pattern is invalid.
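Normalization of one raw pattern entry into an immutable object could be sketched as follows. The dataclass fields mirror the keys of the YAML structure documented for load_patterns(), but the field types, the omission of `signals`, and the treatment of `rationale` as optional are assumptions; the real Pattern class may differ.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class PatternSketch:
    """Hypothetical normalized pattern; not the module's actual class."""

    id: str
    applies_to: str
    role: str
    confidence: Any             # type not specified by the source
    rationale: Tuple[str, ...]  # stored as a tuple so it cannot be mutated


def normalize_sketch(raw: Dict[str, Any]) -> PatternSketch:
    """Convert one raw entry, raising ValueError if a required key is missing."""
    try:
        asserts = raw["asserts"]
        return PatternSketch(
            id=str(raw["id"]),
            applies_to=str(raw["applies_to"]),
            role=str(asserts["role"]),
            confidence=asserts["confidence"],
            rationale=tuple(raw.get("rationale", [])),
        )
    except (KeyError, TypeError) as exc:
        raise ValueError(f"invalid pattern: {raw!r}") from exc
```

Converting lists to tuples and freezing the dataclass means a normalized pattern cannot change between evaluations, which supports deterministic results.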

load_patterns()

Load declarative discovery patterns from known_patterns.yaml.

Expected YAML structure:

version: 1
patterns:
  - id: ...
    applies_to: ...
    signals: ...
    asserts:
      role: ...
      confidence: ...
    rationale:
      - ...

This function only loads and validates the raw YAML structure. Pattern normalization is handled by load_normalized_patterns().

Returns:

Type            Description
Dict[str, Any]  Parsed YAML document.

Raises:

Type        Description
ValueError  If structure is invalid.
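Structural validation of the raw document might resemble the sketch below. It assumes the YAML file has already been parsed into Python objects by a YAML library, and the specific checks (a `version` key, a `patterns` list, an `id` on each entry) are inferred from the structure shown above rather than taken from the actual implementation.

```python
from typing import Any, Dict


def validate_document_sketch(doc: Any) -> Dict[str, Any]:
    """Check the top-level shape of a parsed patterns document."""
    if not isinstance(doc, dict):
        raise ValueError("top level must be a mapping")
    if "version" not in doc:
        raise ValueError("missing 'version'")
    patterns = doc.get("patterns")
    if not isinstance(patterns, list):
        raise ValueError("'patterns' must be a list")
    for index, entry in enumerate(patterns):
        if not isinstance(entry, dict) or "id" not in entry:
            raise ValueError(f"pattern #{index} must be a mapping with an 'id'")
    return doc
```

The function returns the document unchanged on success, leaving conversion into pattern objects to a separate normalization step.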

dita_package_processor.knowledge.map_types

Canonical DITA artifact classification types.

This module defines the authoritative classification enums used during discovery and planning.

Design Principles
  • Deterministic
  • Explicit
  • Stable string values
  • No inference logic
  • No transformation logic
  • No mutation

These types represent observed structural intent only. They do not imply correctness.

ArtifactCategory

Bases: str, Enum

High-level artifact categories recognized by the processor.

Categories distinguish maps, topics, and other structural artifacts during discovery and inventory construction.

__str__()

Return stable string value.

MapType

Bases: str, Enum

Canonical DITA map classifications.

MapType values are assigned during discovery based on observed structural patterns.

A map may match multiple patterns during discovery, but must resolve to exactly one MapType.

__str__()

Return stable string value.
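The `str, Enum` bases combined with a `__str__` override can be sketched as below. Only "MAIN_MAP" appears in this document; the second member and the class name are hypothetical.

```python
from enum import Enum


class MapTypeSketch(str, Enum):
    MAIN_MAP = "MAIN_MAP"  # value referenced elsewhere in this document
    UNKNOWN = "UNKNOWN"    # hypothetical member for illustration

    def __str__(self) -> str:
        # Return the raw value rather than "MapTypeSketch.MAIN_MAP".
        return self.value


label = str(MapTypeSketch.MAIN_MAP)
restored = MapTypeSketch("MAIN_MAP")  # lookup by value yields the same member
```

Because members are also `str` instances, they compare equal to their serialized form, and the override keeps `str()` output identical to that form.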

TopicType

Bases: str, Enum

Canonical DITA topic classifications.

TopicType values are derived from root element inspection and contextual usage.

__str__()

Return stable string value.