API Segmentation
The segmentation layer is the structural front-end of Novel Testbed. Its job is brutally simple: take whatever prose you wrote and make it executable. That means inserting explicit chapter and module boundaries so the parser and inference system have something real to work with.
It does not analyze meaning.
It does not judge quality.
It only creates joints.
Markdown segmentation utilities.
This module is responsible for discovering and inserting structural boundaries into raw prose Markdown so that downstream parsers can operate.
This is the first phase of the narrative compiler:
segment → parse → infer → assess
The base ModuleSegmenter is deterministic and conservative. It guarantees:
- A top-level chapter header
- At least one module header
- Correct chapter → module ordering
- Idempotency when input is already well-formed
LLMSegmenter
Bases: ModuleSegmenter
LLM-powered semantic segmenter.
Uses an OpenAI-backed inferencer to insert:
- Chapter titles
- Scene boundaries
- Exposition blocks
- Transition modules
This replaces naive segmentation with semantic awareness.
segment_markdown
segment_markdown(text, title)
Use an LLM to infer module boundaries and return fully annotated Markdown.
:param text: Raw Markdown prose.
:param title: Novel title.
:return: Structurally valid annotated Markdown.
ModuleSegmenter
Deterministic Markdown segmenter.
Guarantees a structurally valid document:
- One top-level chapter heading: `# Title`
- At least one module heading: `## Scene ...`
- Chapter must appear before any module
- Idempotent for already-correct Markdown
segment_markdown
segment_markdown(text, title)
Segment raw Markdown into structurally annotated Markdown.
Structural invariants enforced:
- Chapter must come before any module
- At least one chapter exists
- At least one module exists
:param text: Raw or partially annotated Markdown.
:param title: Novel title used for synthetic chapter heading.
:return: Valid annotated Markdown.
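The invariants above can be checked mechanically. The following is a minimal sketch, assuming a heading is any line that starts with a literal `# ` (chapter) or `## ` (module); `find_violations` is a hypothetical helper for illustration, not part of the documented API, and it does not try to skip headings that appear inside fenced code.

```python
def find_violations(markdown: str) -> list[str]:
    """Report which structural invariants an annotated document breaks."""
    chapter_lines = []  # 1-based line numbers of "# " headings
    module_lines = []   # 1-based line numbers of "## " headings
    for number, line in enumerate(markdown.splitlines(), start=1):
        if line.startswith("## "):
            module_lines.append(number)
        elif line.startswith("# "):
            chapter_lines.append(number)

    violations = []
    if not chapter_lines:
        violations.append("no chapter heading")
    if not module_lines:
        violations.append("no module heading")
    # Ordering only makes sense when both kinds of heading exist.
    if chapter_lines and module_lines and module_lines[0] < chapter_lines[0]:
        violations.append("module appears before first chapter")
    return violations


print(find_violations("# Title\n\n## Scene\n\nRain fell."))  # []
```

An empty list means all three invariants hold; anything else names the invariant that failed.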
Core Class
class ModuleSegmenter:
    def segment_markdown(self, text: str, title: str) -> str:
        """
        Convert raw prose into annotated Markdown with structural markers.

        Returns Markdown that contains:
        - A chapter header: # <title>
        - At least one module: ## Scene / ## Exposition / ## Transition
        """
Behavior
| Input | Output |
|---|---|
| Raw prose | Markdown with # and ## headers inserted |
| Already structured Markdown | Returned unchanged (idempotent) |
| Empty input | Still returns a valid minimal structure |
Segmentation guarantees that downstream systems always receive valid structural input. The parser never has to guess. The inferencer never has to hallucinate boundaries. Structure is always explicit.
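The idempotency row in the table is the easiest guarantee to test as a property: run the segmenter twice and compare. This is a sketch under assumptions; `is_idempotent`, `toy_segment` and `naive_segment` are hypothetical names, and the two toy functions stand in for a real segmenter only so the check has something to run against.

```python
from typing import Callable

# (text, title) -> annotated Markdown; mirrors segment_markdown's signature.
SegmentFn = Callable[[str, str], str]


def is_idempotent(segment: SegmentFn, text: str, title: str) -> bool:
    """Return True if a second pass leaves the first pass's output unchanged."""
    once = segment(text, title)
    twice = segment(once, title)
    return once == twice


def toy_segment(text: str, title: str) -> str:
    """Hypothetical stand-in: wraps prose once, then leaves it alone."""
    if text.startswith("# "):
        return text  # treat anything that opens with a chapter as structured
    return f"# {title}\n\n## Scene\n\n{text}"


def naive_segment(text: str, title: str) -> str:
    """Deliberately broken: prepends a chapter on every call."""
    return f"# {title}\n\n{text}"


print(is_idempotent(toy_segment, "Rain fell.", "Novel"))    # True
print(is_idempotent(naive_segment, "Rain fell.", "Novel"))  # False
```

Passing a bound method such as `segmenter.segment_markdown` in place of the toy functions should work the same way, since only the call signature matters.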
Default Strategy
The base ModuleSegmenter is deterministic and conservative:
- Adds a chapter using the provided title
- Creates a single `## Scene` if no modules exist
- Preserves original text verbatim inside the new structure
- Never destroys author-defined headings
It is intentionally dumb. That is a feature.
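One way the default strategy could be realized is sketched below. This is not the project's implementation: `SketchSegmenter` is a hypothetical stand-in, headings are detected by a plain `# ` / `## ` prefix, and the exact blank-line layout is an assumption.

```python
class SketchSegmenter:
    """Hypothetical stand-in for the default strategy; not the real class."""

    def segment_markdown(self, text: str, title: str) -> str:
        lines = text.splitlines()
        headings = [l for l in lines if l.startswith(("# ", "## "))]
        chapter_first = bool(headings) and headings[0].startswith("# ")
        has_module = any(h.startswith("## ") for h in headings)

        if chapter_first and has_module:
            return text  # already well-formed: return unchanged (idempotent)

        body = text.strip("\n")
        if not chapter_first:
            # No chapter leads the document: add one from the title.
            if has_module:
                return f"# {title}\n\n{body}\n"
            # Raw prose (or empty input): add both chapter and one Scene.
            return f"# {title}\n\n## Scene\n\n{body}".rstrip() + "\n"

        # Author supplied a chapter but no module: add a single Scene
        # directly after the first chapter heading, keeping that heading.
        idx = next(i for i, l in enumerate(lines) if l.startswith("# "))
        lines[idx + 1 : idx + 1] = ["", "## Scene"]
        return "\n".join(lines).rstrip("\n") + "\n"


segmenter = SketchSegmenter()
print(segmenter.segment_markdown("Rain fell.", "Novel"))
```

The sketch only ever adds headings. Author-written lines are never rewritten or removed, which is what keeps a second pass a no-op.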
Optional LLM Segmenter
You may implement:
class LLMSegmenter(ModuleSegmenter):
    ...
which uses an LLM to infer:
- Scene boundaries
- Exposition vs action
- Transitions
- Structural pacing
This mirrors the design of the inference layer: strategy objects, not hard wiring.
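The strategy-object idea can be sketched without committing to any particular LLM client. In the example below the model call is injected as a plain callable, so a canned function can replace it in tests. `Segmenter`, `PromptingSegmenter`, `annotate` and the prompt wording are all assumptions made for illustration; the real LLMSegmenter's constructor and prompt are not documented here.

```python
from typing import Callable, Protocol


class Segmenter(Protocol):
    """Anything with a segment_markdown(text, title) method is a strategy."""

    def segment_markdown(self, text: str, title: str) -> str: ...


class PromptingSegmenter:
    """Hypothetical LLM-backed strategy; the model call is injected."""

    def __init__(self, complete: Callable[[str], str]) -> None:
        self._complete = complete  # prompt in, completion text out

    def segment_markdown(self, text: str, title: str) -> str:
        prompt = (
            f"Insert a '# {title}' chapter heading and '## Scene', "
            "'## Exposition' or '## Transition' module headings into the "
            "prose below. Do not change the prose itself.\n\n" + text
        )
        return self._complete(prompt)


def annotate(segmenter: Segmenter, text: str, title: str) -> str:
    """Callers depend on the protocol, not on a concrete segmenter."""
    return segmenter.segment_markdown(text, title)


def canned_completion(prompt: str) -> str:
    """Stand-in for a model response, used only for this demonstration."""
    return "# Novel\n\n## Scene\n\nRain fell.\n"


print(annotate(PromptingSegmenter(canned_completion), "Rain fell.", "Novel"))
```

Because a model's output is not guaranteed to be well-formed, a caller would likely still want to verify the result against the structural invariants before parsing.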
Role in the Pipeline
Segmentation is now the guaranteed first phase:
Markdown → Segment → Parse → Infer → Assess
If segmentation fails, nothing downstream is trustworthy.
This is not decoration.
It is compilation.
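A sketch of the phase ordering, with a fail-fast guard between segmentation and parsing. Every stage here is a caller-supplied callable: `compile_novel` and the stub stages are hypothetical, and the real parse, infer and assess interfaces are not described in this section.

```python
from typing import Any, Callable


def compile_novel(
    text: str,
    title: str,
    segment: Callable[[str, str], str],
    parse: Callable[[str], Any],
    infer: Callable[[Any], Any],
    assess: Callable[[Any], Any],
) -> Any:
    """Run segment -> parse -> infer -> assess, refusing bad structure."""
    annotated = segment(text, title)
    lines = annotated.splitlines()
    has_chapter = any(line.startswith("# ") for line in lines)
    has_module = any(line.startswith("## ") for line in lines)
    if not (has_chapter and has_module):
        # Stop here: later phases cannot be trusted on unstructured input.
        raise ValueError("segmentation produced no chapter or module heading")
    return assess(infer(parse(annotated)))


# Stub stages, for demonstration only.
def stub_segment(text: str, title: str) -> str:
    return f"# {title}\n\n## Scene\n\n{text}\n"


def stub_parse(markdown: str) -> list:
    return markdown.splitlines()


def stub_infer(lines: list) -> list:
    return [line for line in lines if line.startswith("## ")]


module_count = compile_novel(
    "Rain fell.", "Novel", stub_segment, stub_parse, stub_infer, len
)
print(module_count)  # 1
```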