Model Module

model.py

Shared in-memory representations used throughout the SOP → DITA pipeline.

These objects capture semantic structure but avoid writer-level concerns. The classifier creates these models; the DITA writer consumes them.

The models must remain stable because multiple pipeline layers depend on both attribute presence and ordering guarantees.

class dita_sop_converter.model.Block(*, text)

Bases: object

Base class for block-level document content.

Parameters

text : str or None

Raw text content when applicable. Structured blocks use None.

text: str | None
Parameters:

text (str | None)

class dita_sop_converter.model.ImageBlock(*, raw_bytes, filename, is_vector, alt=None)

Bases: Block

Represents an extracted embedded image prior to conversion.

Attributes

raw_bytes : bytes

Original binary payload.

filename : str

Canonical or synthesized filename.

is_vector : bool

True when the file extension indicates a vector format.

alt : str or None

Optional natural-language alternative text.

alt: str | None = None
filename: str = ''
is_vector: bool = False
raw_bytes: bytes = b''
Parameters:
  • raw_bytes (bytes)

  • filename (str)

  • is_vector (bool)

  • alt (str | None)

class dita_sop_converter.model.ImageConvertResult(output_filename, width_px, height_px)

Bases: object

Result of a media conversion performed by DitaWriter.

Attributes

output_filename : str

Final filename inside the media/ directory.

width_px : int or None

Pixel width after conversion.

height_px : int or None

Pixel height after conversion.

height_px: int | None
output_filename: str
width_px: int | None
Parameters:
  • output_filename (str)

  • width_px (int | None)

  • height_px (int | None)

class dita_sop_converter.model.ImageModel(*, output_filename, width_px, height_px, alt, title=None)

Bases: object

Represents a converted/rasterized image ready for DITA emission.

Attributes

output_filename : str

Relative reference path to the image inside media/.

width_px : int

Pixel width.

height_px : int

Pixel height.

alt : str or None

Alt text for accessibility.

title : str or None

Optional title or label derived from the figure caption.

alt: str | None
height_px: int
output_filename: str
title: str | None = None
width_px: int
Parameters:
  • output_filename (str)

  • width_px (int)

  • height_px (int)

  • alt (str | None)

  • title (str | None)

class dita_sop_converter.model.MapEntry(*, href, navtitle, topic_type=TopicType.TOPIC)

Bases: object

Represents a topic reference within a DITA map.

Attributes

href : str

Relative reference to a topic file beneath topics/.

navtitle : str

Human-readable navigation label.

topic_type : TopicType

Type of the referenced topic, emitted in the topicref @type attribute for navigation semantics.

href: str
navtitle: str
topic_type: TopicType = TopicType.TOPIC
Parameters:
  • href (str)

  • navtitle (str)

  • topic_type (TopicType)

class dita_sop_converter.model.MapModel(*, id, title, entries=<factory>)

Bases: object

Represents a DITA map document and its topicrefs.

Attributes

id : str

Normalized map identifier.

title : str

Visible map title.

entries : list[MapEntry]

Ordered map entry references.

add_entry(entry)

Append a map entry.

Return type:

None

Parameters:

entry (MapEntry)

entries: List[MapEntry]
id: str
title: str
Parameters:
  • id (str)

  • title (str)

  • entries (List[MapEntry])

class dita_sop_converter.model.RawNoteBlock(*, text, note_type=None)

Bases: Block

Represents a NOTE/CAUTION/WARNING marker detected by the raw reader.

Attributes

note_type : str or None

Explicit note type when derivable; otherwise inferred downstream.

note_type: str | None = None
Parameters:
  • text (str | None)

  • note_type (str | None)

class dita_sop_converter.model.StepBlock(*, text, cmd=None, info=None, image=None)

Bases: Block

Represents an imperative procedure step for TASK topics.

Attributes

cmd : str or None

Imperative command text, emitted in <cmd>.

info : str or None

Optional subordinate continuation text merged into the rendered step.

image : ImageBlock or None

Inline or associated image attached to the step; populated for steps produced from three-column tables.

cmd: str | None = None
image: ImageBlock | None = None
info: str | None = None
Parameters:
  • text (str | None)

  • cmd (str | None)

  • info (str | None)

  • image (ImageBlock | None)

class dita_sop_converter.model.TableBlock(*, text, rows=<factory>, title=None, kind=None)

Bases: Block

Represents a structured table extracted from the DOCX.

Parameters

text : str or None

Placeholder (unused) to maintain Block compatibility.

rows : list[TableRowBlock]

Structured row instances.

title : str or None

Optional caption/title when heuristically detected.

kind : str or None

Semantic classification assigned by the classifier:

  • layout-2col: heading/value metadata tables
  • task-3col: step/action/media tables
  • data-table: general multi-column tables

kind: str | None = None
rows: List[TableRowBlock]
title: str | None = None
Parameters:
  • text (str | None)

  • rows (List[TableRowBlock])

  • title (str | None)

  • kind (str | None)

class dita_sop_converter.model.TableRowBlock(*, text, cells=<factory>, is_header=False)

Bases: Block

Represents a single table row in document order.

Parameters

text : str or None

Fallback row text composed from cell contents.

cells : list[str]

Visible cell values in presentation order.

is_header : bool

True when the row belongs to the table header section.

cells: List[str]
is_header: bool = False
Parameters:
  • text (str | None)

  • cells (List[str])

  • is_header (bool)

class dita_sop_converter.model.TopicModel(*, id, title, topic_type, shortdesc=None, blocks=<factory>)

Bases: object

In-memory representation of a DITA topic prior to serialization.

Attributes

id : str

Filename/DITA @id seed, normalized later by the writer.

title : str

Title/heading text.

topic_type : TopicType

Determines root tag and body wrapper.

shortdesc : str or None

Optional short description, when one can be extracted from the source.

blocks : list[Block]

Ordered block instances representing body content.

add_block(block)

Append a block to the topic.

Return type:

None

Parameters:

block (Block)

blocks: List[Block]
id: str
shortdesc: str | None = None
title: str
topic_type: TopicType
Parameters:
  • id (str)

  • title (str)

  • topic_type (TopicType)

  • shortdesc (str | None)

  • blocks (List[Block])

class dita_sop_converter.model.TopicType(*values)

Bases: str, Enum

Supported DITA topic types.

Values correspond to the root element name emitted by the writer.

CONCEPT = 'concept'
REFERENCE = 'reference'
TASK = 'task'
TOPIC = 'topic'
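Because `TopicType` mixes in `str`, its members compare equal to their string values. This is why a default of `TopicType.TOPIC` can also be rendered as `'topic'`. The stand-in below mirrors the documented members. When the root element name is needed, `.value` is the unambiguous choice, since `str()` and f-string output for str-mixin enums has varied across Python versions.

```python
from enum import Enum


class TopicType(str, Enum):
    """Stand-in mirroring the documented members."""
    CONCEPT = "concept"
    REFERENCE = "reference"
    TASK = "task"
    TOPIC = "topic"


# Members compare equal to their string values...
assert TopicType.TASK == "task"
# ...lookup by value returns the member itself...
assert TopicType("concept") is TopicType.CONCEPT
# ...and .value yields the root element name for the writer.
root_tag = TopicType.REFERENCE.value
```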