Model Module
model.py
Shared in-memory representations used throughout the SOP → DITA pipeline.
These objects capture semantic structure but avoid writer-level concerns. The classifier creates these models; the DITA writer consumes them.
The models must remain stable because multiple pipeline layers depend on both attribute presence and ordering guarantees.
- class dita_sop_converter.model.Block(*, text)
Bases: object
Base class for block-level document content.
Parameters
- text : str or None
Raw text content when applicable. Structured blocks use None.
- text: str | None
- Parameters:
text (str | None)
- class dita_sop_converter.model.ImageBlock(*, raw_bytes, filename, is_vector, alt=None)
Bases: Block
Represents an extracted embedded image prior to conversion.
Attributes
- raw_bytes : bytes
Original binary payload.
- filename : str
Canonical or synthesized filename.
- is_vector : bool
True when the file extension indicates a vector format.
- alt : str or None
Optional natural-language alternative text.
- alt: str | None = None
- filename: str = ''
- is_vector: bool = False
- raw_bytes: bytes = b''
- Parameters:
raw_bytes (bytes)
filename (str)
is_vector (bool)
alt (str | None)
- class dita_sop_converter.model.ImageConvertResult(output_filename, width_px, height_px)
Bases: object
Result of a media conversion operation by DitaWriter.
Attributes
- output_filename : str
Final filename inside the media/ directory.
- width_px : int or None
Pixel width after conversion.
- height_px : int or None
Pixel height after conversion.
- height_px: int | None
- output_filename: str
- width_px: int | None
- Parameters:
output_filename (str)
width_px (int | None)
height_px (int | None)
- class dita_sop_converter.model.ImageModel(*, output_filename, width_px, height_px, alt, title=None)
Bases: object
Represents a converted/rasterized image ready for DITA emission.
Attributes
- output_filename : str
Relative reference path for the image inside media/.
- width_px : int
Pixel width.
- height_px : int
Pixel height.
- alt : str or None
Alt text for accessibility.
- title : str or None
Optional title/label derived from the figure caption.
- alt: str | None
- height_px: int
- output_filename: str
- title: str | None = None
- width_px: int
- Parameters:
output_filename (str)
width_px (int)
height_px (int)
alt (str | None)
title (str | None)
- class dita_sop_converter.model.MapEntry(*, href, navtitle, topic_type=TopicType.TOPIC)
Bases: object
Represents a topic reference within a DITA map.
Attributes
- href : str
Relative reference to a topic file beneath topics/.
- navtitle : str
Human-readable text label.
- topic_type : TopicType
Used in @type attributes for navigation semantics.
- href: str
- Parameters:
href (str)
navtitle (str)
topic_type (TopicType)
- class dita_sop_converter.model.MapModel(*, id, title, entries=<factory>)
Bases: object
Represents a DITA map document and its topicrefs.
Attributes
- id : str
Normalized map identifier.
- title : str
Visible map title.
- entries : list[MapEntry]
Ordered map entry references.
- id: str
- title: str
- Parameters:
id (str)
title (str)
entries (List[MapEntry])
- class dita_sop_converter.model.RawNoteBlock(*, text, note_type=None)
Bases: Block
Represents a NOTE/CAUTION/WARNING marker detected by the raw reader.
Attributes
- note_type : str or None
Explicit note type when derivable; otherwise inferred downstream.
- note_type: str | None = None
- Parameters:
text (str | None)
note_type (str | None)
- class dita_sop_converter.model.StepBlock(*, text, cmd=None, info=None, image=None)
Bases: Block
Represents an imperative procedure step for TASK topics.
Attributes
- cmd : str or None
Imperative command text, emitted in <cmd>.
- info : str or None
Optional subordinate continuation text merged into the rendered step.
- image : ImageBlock or None
Inline or associated image attached to a step produced from a 3-column (task-3col) table.
- cmd: str | None = None
- image: ImageBlock | None = None
- info: str | None = None
- Parameters:
text (str | None)
cmd (str | None)
info (str | None)
image (ImageBlock | None)
- class dita_sop_converter.model.TableBlock(*, text, rows=<factory>, title=None, kind=None)
Bases: Block
Represents a structured table extracted from the DOCX.
Parameters
- text : str or None
Placeholder (unused) to maintain Block compatibility.
- rows : list[TableRowBlock]
Structured row instances.
- title : str or None
Optional caption/title when heuristically detected.
- kind : str or None
Semantic classification assigned by the classifier:
layout-2col : heading/value metadata tables
task-3col : step/action/media tables
data-table : general multi-column table
- kind: str | None = None
- rows: List[TableRowBlock]
- title: str | None = None
- Parameters:
text (str | None)
rows (List[TableRowBlock])
title (str | None)
kind (str | None)
- class dita_sop_converter.model.TableRowBlock(*, text, cells=<factory>, is_header=False)
Bases: Block
Represents a single table row in document order.
Parameters
- text : str or None
Fallback row text composed from cell contents.
- cells : list[str]
Visible cell values in presentation order.
- is_header : bool
True when the row belongs to the header section.
- cells: List[str]
- is_header: bool = False
- Parameters:
text (str | None)
cells (List[str])
is_header (bool)
- class dita_sop_converter.model.TopicModel(*, id, title, topic_type, shortdesc=None, blocks=<factory>)
Bases: object
In-memory representation of a DITA topic prior to serialization.
Attributes
- id : str
Filename/DITA @id seed, normalized later by the writer.
- title : str
Title/heading text.
- topic_type : TopicType
Determines root tag and body wrapper.
- shortdesc : str or None
Optional extracted short description (shortdesc).
- blocks : list[Block]
Ordered block instances representing body content.
- id: str
- shortdesc: str | None = None
- title: str