Model Module

model.py

Shared in-memory representations used throughout the SOP → DITA pipeline.

These objects capture semantic structure but avoid writer-level concerns. The classifier creates these models; the DITA writer consumes them.

The models must remain stable because multiple pipeline layers depend on both attribute presence and ordering guarantees.

class dita_sop_converter.model.Block(*, text)

Bases: object

Base class for block-level document content.

Parameters

text (str or None): Raw text content when applicable. Structured blocks use None.

text: str | None

Parameters: text (str | None)

class dita_sop_converter.model.ImageBlock(*, raw_bytes, filename, is_vector, alt=None)

Bases: Block

Represents an extracted embedded image prior to conversion.

Attributes

raw_bytes (bytes): Original binary payload.
filename (str): Canonical or synthesized filename.
is_vector (bool): True when the filename extension indicates a vector format.
alt (str or None): Optional natural-language alternative text.

alt: str | None = None

filename: str = ''

is_vector: bool = False

raw_bytes: bytes = b''

Parameters:

raw_bytes (bytes)
filename (str)
is_vector (bool)
alt (str | None)
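
As a minimal sketch of how is_vector could be derived from the filename, the example below uses a local stand-in dataclass that mirrors the documented fields rather than the package's own class. The helper make_image_block and the VECTOR_EXTENSIONS set are hypothetical; the extensions the real reader treats as vector are not stated here.

```python
from dataclasses import dataclass
from pathlib import PurePath
from typing import Optional

# Assumption: illustrative extension list, not taken from the converter.
VECTOR_EXTENSIONS = {".svg", ".emf", ".wmf"}


@dataclass
class ImageBlock:
    """Stand-in mirroring the documented ImageBlock fields and defaults."""
    raw_bytes: bytes = b""
    filename: str = ""
    is_vector: bool = False
    alt: Optional[str] = None


def make_image_block(raw_bytes: bytes, filename: str,
                     alt: Optional[str] = None) -> ImageBlock:
    """Hypothetical helper: set is_vector from the lowercased file extension."""
    suffix = PurePath(filename).suffix.lower()
    return ImageBlock(
        raw_bytes=raw_bytes,
        filename=filename,
        is_vector=suffix in VECTOR_EXTENSIONS,
        alt=alt,
    )


diagram = make_image_block(b"<svg/>", "Diagram.SVG", alt="Valve diagram")
photo = make_image_block(b"\x89PNG", "photo.png")
```

The stand-in also accepts positional arguments; the documented constructor is keyword-only, so real call sites would pass every field by name.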

class dita_sop_converter.model.ImageConvertResult(output_filename, width_px, height_px)

Bases: object

Result of a media conversion operation by DitaWriter.

Attributes

output_filename (str): Final filename inside media/ directory.
width_px (int or None): Pixel width after conversion.
height_px (int or None): Pixel height after conversion.

height_px: int | None

output_filename: str

width_px: int | None

Parameters:

output_filename (str)
width_px (int | None)
height_px (int | None)

class dita_sop_converter.model.ImageModel(*, output_filename, width_px, height_px, alt, title=None)

Bases: object

Represents a converted/rasterized image ready for DITA emission.

Attributes

output_filename (str): Relative reference path for image inside media/.
width_px (int): Pixel width.
height_px (int): Pixel height.
alt (str or None): Alt text for accessibility.
title (str or None): Optional title/label derived from figure caption.

alt: str | None

height_px: int

output_filename: str

title: str | None = None

width_px: int

Parameters:

output_filename (str)
width_px (int)
height_px (int)
alt (str | None)
title (str | None)
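
ImageConvertResult allows None for both dimensions, while ImageModel declares them as int. One way to bridge the two is sketched below with local stand-in dataclasses. The helper to_image_model and its choice to raise ValueError are assumptions for illustration; how the pipeline actually handles missing dimensions is not documented here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageConvertResult:
    """Stand-in mirroring the documented conversion result."""
    output_filename: str
    width_px: Optional[int]
    height_px: Optional[int]


@dataclass
class ImageModel:
    """Stand-in mirroring the documented emission-ready image."""
    output_filename: str
    width_px: int
    height_px: int
    alt: Optional[str]
    title: Optional[str] = None


def to_image_model(result: ImageConvertResult, alt: Optional[str] = None,
                   title: Optional[str] = None) -> ImageModel:
    """Hypothetical helper: refuse results whose dimensions are unknown."""
    if result.width_px is None or result.height_px is None:
        raise ValueError(f"missing dimensions for {result.output_filename}")
    return ImageModel(
        output_filename=result.output_filename,
        width_px=result.width_px,
        height_px=result.height_px,
        alt=alt,
        title=title,
    )


model = to_image_model(ImageConvertResult("fig1.png", 640, 480), alt="Pump housing")
```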

class dita_sop_converter.model.MapEntry(*, href, navtitle, topic_type=TopicType.TOPIC)

Bases: object

Represents a topic reference within a DITA map.

Attributes

href (str): Relative reference to a topic file beneath topics/.
navtitle (str): Human-readable text label.
topic_type (TopicType): Used in @type attributes for navigation semantics.

href: str

navtitle: str

topic_type: TopicType = 'topic'

Parameters:

href (str)
navtitle (str)
topic_type (TopicType)

class dita_sop_converter.model.MapModel(*, id, title, entries=<factory>)

Bases: object

Represents a DITA map document and its topicrefs.

Attributes

id (str): Normalized map identifier.
title (str): Visible map title.
entries (list[MapEntry]): Ordered map entry references.

add_entry(entry)

Append a map entry.

Return type: None
Parameters: entry (MapEntry)

entries: List[MapEntry]

id: str

title: str

Parameters:

id (str)
title (str)
entries (List[MapEntry])
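
The following sketch shows a map being assembled in document order. It uses local stand-ins that mirror the documented MapEntry and MapModel signatures, not the package's own classes. The ids, titles, and href values are made up for illustration, and whether href includes the topics/ prefix is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class TopicType(str, Enum):
    CONCEPT = "concept"
    REFERENCE = "reference"
    TASK = "task"
    TOPIC = "topic"


@dataclass
class MapEntry:
    """Stand-in for one topic reference in a map."""
    href: str
    navtitle: str
    topic_type: TopicType = TopicType.TOPIC


@dataclass
class MapModel:
    """Stand-in for a map; default_factory gives each map its own list."""
    id: str
    title: str
    entries: List[MapEntry] = field(default_factory=list)

    def add_entry(self, entry: MapEntry) -> None:
        # Appending keeps entries in the order they were added.
        self.entries.append(entry)


sop_map = MapModel(id="sop-example", title="Example SOP")
sop_map.add_entry(MapEntry(href="topics/overview.dita", navtitle="Overview",
                           topic_type=TopicType.CONCEPT))
sop_map.add_entry(MapEntry(href="topics/procedure.dita", navtitle="Procedure",
                           topic_type=TopicType.TASK))
sop_map.add_entry(MapEntry(href="topics/notes.dita", navtitle="Notes"))

hrefs = [entry.href for entry in sop_map.entries]
```

Because entries comes from a factory, two MapModel instances never share the same list, which matters when several maps are built in one run.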

class dita_sop_converter.model.RawNoteBlock(*, text, note_type=None)

Bases: Block

Represents a NOTE/CAUTION/WARNING marker detected in the raw reader.

Attributes

note_type (str or None): Explicit note type when derivable; otherwise inferred downstream.

note_type: str | None = None

Parameters:

text (str | None)
note_type (str | None)

class dita_sop_converter.model.StepBlock(*, text, cmd=None, info=None, image=None)

Bases: Block

Represents an imperative procedure step for TASK topics.

Attributes

cmd (str or None): Imperative command text, emitted in <cmd>.
info (str or None): Optional subordinate continuation text merged into the rendered step.
image (ImageBlock or None): Inline or associated image attached to the step; populated for steps produced from 3-col tables.

cmd: str | None = None

image: ImageBlock | None = None

info: str | None = None

Parameters:

text (str | None)
cmd (str | None)
info (str | None)
image (ImageBlock | None)
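
To illustrate how cmd and info relate to the rendered step, here is a minimal sketch that serializes a stand-in StepBlock with the standard library's ElementTree. The function render_step is hypothetical and is not the DitaWriter implementation. Falling back to text when cmd is None and emitting info as a separate <info> element are both assumptions. The image field is omitted for brevity.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class StepBlock:
    """Stand-in mirroring the documented text, cmd, and info fields."""
    text: Optional[str]
    cmd: Optional[str] = None
    info: Optional[str] = None


def render_step(step: StepBlock) -> str:
    """Hypothetical renderer: <step> containing <cmd> and an optional <info>."""
    step_el = ET.Element("step")
    # Assumption: use the raw text when no explicit command was extracted.
    ET.SubElement(step_el, "cmd").text = step.cmd or step.text or ""
    if step.info:
        ET.SubElement(step_el, "info").text = step.info
    return ET.tostring(step_el, encoding="unicode")


full = render_step(StepBlock(text="Open the valve.", cmd="Open the valve.",
                             info="Turn it counterclockwise."))
bare = render_step(StepBlock(text="Close the lid."))
```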

class dita_sop_converter.model.TableBlock(*, text, rows=<factory>, title=None, kind=None)

Bases: Block

Represents a structured table extracted from the DOCX.

Parameters

text (str or None): Placeholder (unused) to maintain Block compatibility.
rows (list[TableRowBlock]): Structured row instances.
title (str or None): Optional caption/title when heuristically detected.
kind (str or None): Semantic classification assigned by classifier:

layout-2col : heading/value metadata tables
task-3col : step/action/media tables
data-table : general multi-column table

kind: str | None = None

rows: List[TableRowBlock]

title: str | None = None

Parameters:

text (str | None)
rows (List[TableRowBlock])
title (str | None)
kind (str | None)

class dita_sop_converter.model.TableRowBlock(*, text, cells=<factory>, is_header=False)

Bases: Block

Represents a single table row in document order.

Parameters

text (str or None): Fallback row text composed from cell contents.
cells (list[str]): Visible cell values in presentation order.
is_header (bool): True when row belongs to header section.

cells: List[str]

is_header: bool = False

Parameters:

text (str | None)
cells (List[str])
is_header (bool)
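
The sketch below builds a small table from stand-in TableBlock and TableRowBlock dataclasses and separates header rows from body rows. The helper make_row and its " | " separator for the fallback text are assumptions; the documentation says only that the fallback text is composed from the cell contents. The table data is invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class TableRowBlock:
    """Stand-in for a single row."""
    text: Optional[str]
    cells: List[str] = field(default_factory=list)
    is_header: bool = False


@dataclass
class TableBlock:
    """Stand-in for a table; text stays None because it is unused."""
    text: Optional[str]
    rows: List[TableRowBlock] = field(default_factory=list)
    title: Optional[str] = None
    kind: Optional[str] = None


def make_row(cells: Sequence[str], is_header: bool = False) -> TableRowBlock:
    """Hypothetical helper: build the fallback text by joining the cells."""
    return TableRowBlock(text=" | ".join(cells), cells=list(cells),
                         is_header=is_header)


table = TableBlock(
    text=None,
    title="Torque settings",
    kind="data-table",
    rows=[
        make_row(["Bolt", "Torque"], is_header=True),
        make_row(["M6", "10 Nm"]),
        make_row(["M8", "25 Nm"]),
    ],
)

header_rows = [row for row in table.rows if row.is_header]
body_rows = [row for row in table.rows if not row.is_header]
```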

class dita_sop_converter.model.TopicModel(*, id, title, topic_type, shortdesc=None, blocks=<factory>)

Bases: object

In-memory representation of a DITA topic prior to serialization.

Attributes

id (str): Filename/DITA @id seed, normalized later by the writer.
title (str): Title/heading text.
topic_type (TopicType): Determines root tag and body wrapper.
shortdesc (str or None): Optional shortdesc extraction.
blocks (list[Block]): Ordered block instances representing body content.

add_block(block)

Append a block to the topic.

Return type: None
Parameters: block (Block)

blocks: List[Block]

id: str

shortdesc: str | None = None

title: str

topic_type: TopicType

Parameters:

id (str)
title (str)
topic_type (TopicType)
shortdesc (str | None)
blocks (List[Block])
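
As a rough sketch of how a classifier might populate a topic, the example below uses local stand-ins for Block, TopicType, and TopicModel that mirror the documented signatures. The topic id, title, and block text are invented for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class TopicType(str, Enum):
    CONCEPT = "concept"
    REFERENCE = "reference"
    TASK = "task"
    TOPIC = "topic"


@dataclass
class Block:
    """Stand-in base block carrying optional raw text."""
    text: Optional[str]


@dataclass
class TopicModel:
    """Stand-in topic; blocks defaults to a fresh list per instance."""
    id: str
    title: str
    topic_type: TopicType
    shortdesc: Optional[str] = None
    blocks: List[Block] = field(default_factory=list)

    def add_block(self, block: Block) -> None:
        # Blocks are appended, so body content keeps document order.
        self.blocks.append(block)


topic = TopicModel(id="replace-filter", title="Replace the filter",
                   topic_type=TopicType.TASK)
topic.add_block(Block(text="Switch off the pump."))
topic.add_block(Block(text="Remove the housing cover."))

body_text = [block.text for block in topic.blocks]
```

Consumers that rely on ordering can iterate over blocks directly; nothing in the model reorders them.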

class dita_sop_converter.model.TopicType(*values)

Bases: str, Enum

Supported DITA topic types.

Values correspond to the root element name emitted by the writer.

CONCEPT = 'concept'

REFERENCE = 'reference'

TASK = 'task'

TOPIC = 'topic'
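
Because TopicType derives from both str and Enum, its members compare equal to plain strings and can be looked up by value. The short sketch below re-declares the enum locally to show this; root_element_name is a hypothetical helper, not part of the module.

```python
from enum import Enum


class TopicType(str, Enum):
    """Local re-declaration of the documented members."""
    CONCEPT = "concept"
    REFERENCE = "reference"
    TASK = "task"
    TOPIC = "topic"


def root_element_name(topic_type: TopicType) -> str:
    """Hypothetical helper: the member's value doubles as the root tag name."""
    return topic_type.value


parsed = TopicType("task")               # lookup by value
same_as_string = TopicType.TASK == "task"  # str mixin makes this comparison True
root = root_element_name(TopicType.REFERENCE)
```

Using .value rather than str() keeps the result independent of how a given Python version formats mixed-in enum members.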