Testing and Validation¶
The DITA Package Processor is validated using pytest with a focus on structural correctness, observability, and deterministic behavior.
Tests are designed to confirm not just that transformations work, but that:
- the input corpus is correctly understood
- assumptions are made explicit
- unsafe situations are detected before mutation
- failures are loud, early, and explainable
This is not a mock-heavy unit test suite.
It is a system-level validation of how the processor behaves against real data.
Running the Test Suite¶
From the repository root:
pytest -q
This runs the full test suite, including:
- discovery and classification
- knowledge invariants
- CLI contract validation
- end-to-end pipeline execution
There is no separate “unit-only” mode.
The system is tested the way it is actually used.
Testing Strategy Overview¶
The project follows a layered testing strategy aligned with the architecture:
Discovery → Knowledge → Transformation → CLI
Each layer has its own tests and failure semantics.
Discovery Tests¶
Discovery tests validate observation, not mutation.
Located under:
tests/discovery/
What Discovery Tests Assert¶
- Maps and topics are correctly scanned from disk
- Structural signatures are detected reliably
- Artifacts are classified deterministically
- Unknown or ambiguous structures are reported, not guessed
- Discovery reports are complete and internally consistent
These tests ensure that the processor understands what is in the package before any transformation is attempted.
Key Properties¶
- No files are modified
- No assumptions are inferred
- Classification is explainable and repeatable
Discovery is allowed to be incomplete, but never incorrect.
Knowledge Layer Tests¶
Knowledge tests validate encoded domain assumptions.
Located under:
tests/knowledge/
What Knowledge Tests Assert¶
- Known map patterns are loaded correctly
- Structural invariants are enforced consistently
- Unsupported patterns are rejected explicitly
- Knowledge rules remain stable over time
This layer is intentionally conservative.
If a rule exists here, it must be: - documented - tested - justified by real corpus evidence
There are no “soft” rules in the knowledge layer.
Transformation and Pipeline Tests¶
Transformation behavior is validated through integration and end-to-end tests.
Primary coverage lives in:
tests/test_end_to_end.py
What the End-to-End Test Covers¶
The end-to-end test constructs a realistic DITA package on disk and verifies:
index.ditamapis resolved and removed- The main map is renamed to the configured DOCX stem
- Abstract content is injected into the renamed main map
- Remaining maps are numbered deterministically
- Wrapper concept topics are generated correctly
- Existing
topicrefelements are reparented - Glossary topics are refactored into
glossentrywhen configured
Assertions validate semantic structure, not fragile ordering or formatting details.
If the structure is wrong, the test fails.
CLI Contract Tests¶
CLI behavior is validated separately to ensure user-facing stability.
Covered in:
tests/test_cli_contract.py
These tests assert that:
- Required arguments are enforced
-hand-vbehave correctly- Invalid flags fail fast
- Logging overrides are respected
The CLI is treated as a public contract, not an implementation detail.
Error Condition Coverage¶
Explicit tests exist for known failure scenarios, including:
- Missing
index.ditamap - Unresolvable main map references
- Missing or invalid definition maps
- Missing glossary navtitles
- Unsupported or ambiguous structures
These tests assert that:
- Structural impossibilities fail immediately
- Recoverable issues emit warnings, not crashes
- The pipeline never proceeds on invalid assumptions
Silent failure is considered a bug.
Fixtures and Test Organization¶
Reusable fixtures live under:
tests/fixtures/
Fixtures provide:
- minimal realistic DITA packages
- controlled variations for edge cases
- shared setup logic without hiding behavior
Fixtures support clarity.
They do not replace assertions.
Logging Expectations¶
Logging is always enabled.
- Default verbosity is configured in
pyproject.toml - CLI flags may override logging for one-off runs
- Tests do not suppress logs unless explicitly required
For diagnostics:
pytest -q --log-cli-level=DEBUG
Logs are part of the behavioral contract, not incidental output.
What Is Not Tested (By Design)¶
The test suite intentionally avoids:
- Mocked XML trees detached from the filesystem
- Snapshot-based output comparisons
- Performance benchmarking
- External schema validation (DTD/RNG)
The goal is correctness, clarity, and survivability, not theoretical coverage.
Summary¶
The testing strategy enforces the project’s core principles:
- Observe before transforming
- Encode knowledge explicitly
- Fail loudly on invalid assumptions
- Test real behavior against real data
If a behavior matters, it is tested structurally.
If an assumption exists, it is documented and enforced.