Skip to content

Testing and Validation

The DITA Package Processor is validated using pytest with a focus on structural correctness, observability, and deterministic behavior.

Tests are designed to confirm not just that transformations work, but that:

  • the input corpus is correctly understood
  • assumptions are made explicit
  • unsafe situations are detected before mutation
  • failures are loud, early, and explainable

This is not a mock-heavy unit test suite.
It is a system-level validation of how the processor behaves against real data.


Running the Test Suite

From the repository root:

pytest -q

This runs the full test suite, including:

  • discovery and classification
  • knowledge invariants
  • CLI contract validation
  • end-to-end pipeline execution

There is no separate “unit-only” mode.
The system is tested the way it is actually used.


Testing Strategy Overview

The project follows a layered testing strategy aligned with the architecture:

Discovery  →  Knowledge  →  Transformation  →  CLI

Each layer has its own tests and failure semantics.


Discovery Tests

Discovery tests validate observation, not mutation.

Located under:

tests/discovery/

What Discovery Tests Assert

  • Maps and topics are correctly scanned from disk
  • Structural signatures are detected reliably
  • Artifacts are classified deterministically
  • Unknown or ambiguous structures are reported, not guessed
  • Discovery reports are complete and internally consistent

These tests ensure that the processor understands what is in the package before any transformation is attempted.

Key Properties

  • No files are modified
  • No assumptions are inferred
  • Classification is explainable and repeatable

Discovery is allowed to be incomplete, but never incorrect.


Knowledge Layer Tests

Knowledge tests validate encoded domain assumptions.

Located under:

tests/knowledge/

What Knowledge Tests Assert

  • Known map patterns are loaded correctly
  • Structural invariants are enforced consistently
  • Unsupported patterns are rejected explicitly
  • Knowledge rules remain stable over time

This layer is intentionally conservative.

If a rule exists here, it must be: - documented - tested - justified by real corpus evidence

There are no “soft” rules in the knowledge layer.


Transformation and Pipeline Tests

Transformation behavior is validated through integration and end-to-end tests.

Primary coverage lives in:

tests/test_end_to_end.py

What the End-to-End Test Covers

The end-to-end test constructs a realistic DITA package on disk and verifies:

  • index.ditamap is resolved and removed
  • The main map is renamed to the configured DOCX stem
  • Abstract content is injected into the renamed main map
  • Remaining maps are numbered deterministically
  • Wrapper concept topics are generated correctly
  • Existing topicref elements are reparented
  • Glossary topics are refactored into glossentry when configured

Assertions validate semantic structure, not fragile ordering or formatting details.

If the structure is wrong, the test fails.


CLI Contract Tests

CLI behavior is validated separately to ensure user-facing stability.

Covered in:

tests/test_cli_contract.py

These tests assert that:

  • Required arguments are enforced
  • -h and -v behave correctly
  • Invalid flags fail fast
  • Logging overrides are respected

The CLI is treated as a public contract, not an implementation detail.


Error Condition Coverage

Explicit tests exist for known failure scenarios, including:

  • Missing index.ditamap
  • Unresolvable main map references
  • Missing or invalid definition maps
  • Missing glossary navtitles
  • Unsupported or ambiguous structures

These tests assert that:

  • Structural impossibilities fail immediately
  • Recoverable issues emit warnings, not crashes
  • The pipeline never proceeds on invalid assumptions

Silent failure is considered a bug.


Fixtures and Test Organization

Reusable fixtures live under:

tests/fixtures/

Fixtures provide:

  • minimal realistic DITA packages
  • controlled variations for edge cases
  • shared setup logic without hiding behavior

Fixtures support clarity.
They do not replace assertions.


Logging Expectations

Logging is always enabled.

  • Default verbosity is configured in pyproject.toml
  • CLI flags may override logging for one-off runs
  • Tests do not suppress logs unless explicitly required

For diagnostics:

pytest -q --log-cli-level=DEBUG

Logs are part of the behavioral contract, not incidental output.


What Is Not Tested (By Design)

The test suite intentionally avoids:

  • Mocked XML trees detached from the filesystem
  • Snapshot-based output comparisons
  • Performance benchmarking
  • External schema validation (DTD/RNG)

The goal is correctness, clarity, and survivability, not theoretical coverage.


Summary

The testing strategy enforces the project’s core principles:

  • Observe before transforming
  • Encode knowledge explicitly
  • Fail loudly on invalid assumptions
  • Test real behavior against real data

If a behavior matters, it is tested structurally.
If an assumption exists, it is documented and enforced.