Worked Example: From Raw DITA to Discovery Evidence¶
This document walks through a concrete example of how discovery works, from raw DITA files to emitted evidence and final report output.
This is not a tutorial. It is an execution trace.
Example Input: Raw DITA Package¶
Assume the following package structure:
dita/
├── index.ditamap
├── Main.ditamap
└── topics/
└── definition.dita
index.ditamap¶
<map>
<title>Index</title>
<mapref href="Main.ditamap"/>
</map>
Main.ditamap¶
<map>
<title>Main Content</title>
<topicref href="topics/definition.dita"/>
</map>
topics/definition.dita¶
<glossentry>
<glossterm>Widget</glossterm>
<glossdef>Example definition</glossdef>
</glossentry>
Step 1: Discovery Scanner¶
The scanner walks the filesystem and identifies artifacts.
Discovered Artifacts¶
| Path | Type |
|---|---|
| index.ditamap | map |
| Main.ditamap | map |
| topics/definition.dita | topic |
No classification occurs at this stage.
Step 2: Metadata Extraction¶
Structural facts are extracted and stored as metadata.
Artifact: index.ditamap¶
{
"path": "index.ditamap",
"artifact_type": "map",
"metadata": {
"filename": "index.ditamap",
"contains_mapref": true,
"referenced_extensions": [".ditamap"]
}
}
Artifact: Main.ditamap¶
{
"path": "Main.ditamap",
"artifact_type": "map",
"metadata": {
"filename": "Main.ditamap",
"contains_topicref": true,
"referenced_extensions": [".dita"]
}
}
Artifact: topics/definition.dita¶
{
"path": "topics/definition.dita",
"artifact_type": "topic",
"metadata": {
"root_element": "glossentry"
}
}
Step 3: Pattern Evaluation¶
Patterns are evaluated independently for each artifact.
Evaluating index.ditamap¶
Pattern: main_map_by_index¶
Signals checked:
- filename equals
index.ditamap→ ✅ - contains
<mapref>with.ditamaphref → ✅
Result: Pattern matches.
Evidence emitted:
{
"pattern_id": "main_map_by_index",
"artifact_path": "index.ditamap",
"asserted_role": "MAIN",
"confidence": 0.9,
"rationale": [
"File is index.ditamap",
"Contains mapref to another map"
]
}
Evaluating Main.ditamap¶
No MAIN patterns match.
Fallback or content patterns may emit lower-confidence evidence later.
Evaluating topics/definition.dita¶
Pattern: glossary_topic_by_root¶
Signals checked:
- root_element equals
glossentry→ ✅
Evidence emitted:
{
"pattern_id": "glossary_topic_by_root",
"artifact_path": "topics/definition.dita",
"asserted_role": "GLOSSARY",
"confidence": 1.0,
"rationale": [
"Topic root element is <glossentry>"
]
}
Step 4: Evidence Collection¶
At the end of evaluation, evidence looks like this:
[
{
"artifact": "index.ditamap",
"role": "MAIN",
"confidence": 0.9
},
{
"artifact": "topics/definition.dita",
"role": "GLOSSARY",
"confidence": 1.0
}
]
No conflicts were resolved yet.
Step 5: Report Generation¶
The DiscoveryReport summarizes observed facts and evidence.
Example Summary Output¶
{
"maps": 2,
"topics": 1,
"main_maps": 1,
"glossary_topics": 1,
"unknown_artifacts": 0
}
This report is:
- Machine-readable
- Auditable
- Safe to archive
- Safe to feed into later pipeline stages
Key Takeaways¶
- Discovery does not decide, it records
- Patterns emit evidence, not truth
- Conflicts are expected and visible
- Transformation logic comes later
Discovery exists to replace guesswork with facts.