Markup Parser API

The markup parser module converts HTML-subset and Markdown-subset rich text into flat lists of TextSpan objects.

Module Overview

Export            Type       Responsibility
TextSpan          dataclass  Contiguous run of text with inline style overrides
parse_markup()    function   Unified entry point; selects the parser by markup mode
parse_html()      function   HTML-subset markup parser
parse_markdown()  function   Markdown-subset markup parser

TextSpan

from latticesvg.markup.parser import TextSpan

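# Field names follow the TextSpan signature in the auto-generated docs:
# styles are string overrides such as font_weight, not boolean flags,
# and spans are flat (there is no children field).
spans = [
    TextSpan(text="Hello", font_weight="bold", color="#ff0000"),
    TextSpan(text=" World", font_style="italic"),  # "italic" value is assumed
]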

Parsing Examples

from latticesvg.markup.parser import parse_markup

# HTML
spans = parse_markup("<b>Bold</b> text", "html")

# Markdown
spans = parse_markup("**Bold** text", "markdown")

Auto-generated API Docs

parser

HTML-subset and Markdown markup parser for rich text spans.

Supports a minimal set of inline tags/syntax to produce List[TextSpan]. This is not a full HTML renderer — only inline-level elements are recognised; block-level tags (<div>, <p>, …) are silently ignored.

TextSpan dataclass

TextSpan(text: str = '', font_weight: Optional[str] = None, font_style: Optional[str] = None, font_family: Optional[str] = None, font_size: Optional[float] = None, color: Optional[str] = None, background_color: Optional[str] = None, baseline_shift: Optional[str] = None, text_decoration: Optional[str] = None, is_line_break: bool = False, is_math: bool = False)

A contiguous run of text sharing the same inline style overrides.

Fields set to None mean "inherit from the parent TextNode style".
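
The inheritance rule above can be illustrated with a small sketch. It is not part of latticesvg: `SpanStyle` and `resolve_style` are hypothetical names, and the parent style is modelled as a plain dict purely for illustration. The idea is that a non-None field on the span overrides the parent value, while None leaves the parent value in place.

```python
from dataclasses import dataclass, fields
from typing import Dict, Optional


@dataclass
class SpanStyle:
    """Hypothetical stand-in carrying a few of TextSpan's optional style fields."""
    font_weight: Optional[str] = None
    font_style: Optional[str] = None
    color: Optional[str] = None


def resolve_style(span: SpanStyle, parent: Dict[str, str]) -> Dict[str, str]:
    """Return the effective style: span overrides win, None falls back to parent."""
    resolved = dict(parent)  # copy so the parent style is never mutated
    for f in fields(span):
        value = getattr(span, f.name)
        if value is not None:
            resolved[f.name] = value
    return resolved


# Assumed parent values, chosen only for this example.
parent_style = {"font_weight": "normal", "font_style": "normal", "color": "#000000"}
effective = resolve_style(SpanStyle(font_weight="bold"), parent_style)
print(effective)
```

Only `font_weight` is overridden here; the other two fields keep the parent's values because the span leaves them as None.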

parse_html

parse_html(text: str) -> List[TextSpan]

Parse an HTML-subset string into a list of TextSpan.

Supported tags: <b>/<strong>, <i>/<em>, <code>, <span style="...">, <br>, <sub>, <sup>, <u>, <s>/<del>, <math> (inline LaTeX formula).

Unsupported tags are silently ignored (their text content is still included as plain text).

Example:

>>> spans = parse_html('Hello <b>world</b>!')
>>> [(s.text, s.font_weight) for s in spans]
[('Hello ', None), ('world', 'bold'), ('!', None)]

Source code in src/latticesvg/markup/parser.py
def parse_html(text: str) -> List[TextSpan]:
    """Parse an HTML-subset string into a list of ``TextSpan``.

    Supported tags: ``<b>``/``<strong>``, ``<i>``/``<em>``, ``<code>``,
    ``<span style="...">``, ``<br>``, ``<sub>``, ``<sup>``, ``<u>``,
    ``<s>``/``<del>``, ``<math>`` (inline LaTeX formula).

    Unsupported tags are silently ignored (their text content is still
    included as plain text).

    Example::

        >>> spans = parse_html('Hello <b>world</b>!')
        >>> [(s.text, s.font_weight) for s in spans]
        [('Hello ', None), ('world', 'bold'), ('!', None)]
    """
    parser = _RichHTMLParser()
    parser.feed(text)
    return parser.spans
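
The internals of `_RichHTMLParser` are not shown on this page. As a rough illustration of how a parser of this kind might work, the sketch below builds on the standard library's `html.parser.HTMLParser` and keeps a stack of open style tags. `MiniSpan`, `MiniHTMLParser`, `mini_parse_html`, and the `"italic"` value are all assumptions made for the example; only two tag families are handled.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import List, Optional, Tuple


@dataclass
class MiniSpan:
    """Hypothetical stand-in for TextSpan with only two style fields."""
    text: str = ""
    font_weight: Optional[str] = None
    font_style: Optional[str] = None


# Tag -> (field, value). The "italic" value is an assumption.
TAG_STYLES = {
    "b": ("font_weight", "bold"),
    "strong": ("font_weight", "bold"),
    "i": ("font_style", "italic"),
    "em": ("font_style", "italic"),
}


class MiniHTMLParser(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.spans: List[MiniSpan] = []
        # Open styled tags, innermost last: (tag, field, value).
        self._stack: List[Tuple[str, str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag in TAG_STYLES:
            self._stack.append((tag, *TAG_STYLES[tag]))
        # Unknown tags add no style; their text still reaches handle_data.

    def handle_endtag(self, tag):
        # Close the most recently opened tag with this name, if any.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                del self._stack[i]
                break

    def handle_data(self, data):
        span = MiniSpan(text=data)
        for _tag, field, value in self._stack:
            setattr(span, field, value)
        self.spans.append(span)


def mini_parse_html(text: str) -> List[MiniSpan]:
    parser = MiniHTMLParser()
    parser.feed(text)
    parser.close()  # flush any text still buffered by HTMLParser
    return parser.spans


spans = mini_parse_html("Hello <b>world</b>!")
print([(s.text, s.font_weight) for s in spans])
```

Each text run becomes one span that carries whatever styles are open at that point, which is why unsupported tags such as `<div>` contribute text but no styling.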

parse_markdown

parse_markdown(text: str) -> List[TextSpan]

Parse a Markdown-subset string into a list of TextSpan.

Supported syntax: **bold**, *italic*, `code`, ~~strikethrough~~, $latex$ (inline math).

Example:

>>> spans = parse_markdown('Hello **world**!')
>>> [(s.text, s.font_weight) for s in spans]
[('Hello ', None), ('world', 'bold'), ('!', None)]

Source code in src/latticesvg/markup/parser.py
def parse_markdown(text: str) -> List[TextSpan]:
    """Parse a Markdown-subset string into a list of ``TextSpan``.

    Supported syntax: ``**bold**``, ``*italic*``, `` `code` ``,
    ``~~strikethrough~~``, ``$latex$`` (inline math).

    Example::

        >>> spans = parse_markdown('Hello **world**!')
        >>> [(s.text, s.font_weight) for s in spans]
        [('Hello ', None), ('world', 'bold'), ('!', None)]
    """
    html = _markdown_to_html(text)
    return parse_html(html)
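
As the source shows, Markdown is first rewritten to HTML and then passed to `parse_html`. The body of `_markdown_to_html` is not shown here, so the following is only a guess at the general approach: a few ordered regular-expression substitutions. `sketch_markdown_to_html` is a hypothetical name, and the sketch deliberately skips `$latex$`, HTML escaping, and protection of code-span contents.

```python
import re

# (pattern, replacement) pairs applied in order. Bold comes before italic
# so that "**" is not consumed as two single asterisks.
_RULES = [
    (re.compile(r"\*\*(.+?)\*\*"), r"<b>\1</b>"),
    (re.compile(r"\*(.+?)\*"), r"<i>\1</i>"),
    (re.compile(r"~~(.+?)~~"), r"<s>\1</s>"),
    (re.compile(r"`(.+?)`"), r"<code>\1</code>"),
]


def sketch_markdown_to_html(text: str) -> str:
    """Rewrite a small Markdown subset to the HTML subset (illustrative only)."""
    for pattern, replacement in _RULES:
        text = pattern.sub(replacement, text)
    return text


print(sketch_markdown_to_html("Hello **world**!"))
```

Routing Markdown through the HTML path means both syntaxes share one span-building implementation, at the cost of having to care about characters such as `<` in the Markdown input.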

parse_markup

parse_markup(text: str, markup: str = 'none') -> List[TextSpan]

Parse text according to the markup mode.

Parameters:

text (str)
    The source text, possibly containing inline markup.

markup (str, default: 'none')
    "none": return a single plain-text span (default).
    "html": parse HTML-subset tags.
    "markdown": parse Markdown-subset syntax.
    Any other value is treated like "none" and yields a single plain-text span.

Returns:

List[TextSpan]
    Ordered list of styled text segments.

Source code in src/latticesvg/markup/parser.py
def parse_markup(text: str, markup: str = "none") -> List[TextSpan]:
    """Parse *text* according to the *markup* mode.

    Parameters
    ----------
    text : str
        The source text, possibly containing inline markup.
    markup : str
        ``"none"`` — return a single plain-text span (default).
        ``"html"`` — parse HTML subset tags.
        ``"markdown"`` — parse Markdown subset syntax.

    Returns
    -------
    List[TextSpan]
        Ordered list of styled text segments.
    """
    if markup == "html":
        return parse_html(text)
    if markup == "markdown":
        return parse_markdown(text)
    # "none" or anything else — plain text
    return [TextSpan(text=text)]
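
Because any unrecognised mode falls through to plain text, a misspelled mode such as "HTML" or "md" would not raise an error. Callers who prefer to fail loudly could wrap the call, as in the sketch below. `strict_parse` and `KNOWN_MODES` are hypothetical helpers, not part of latticesvg, and `fake_parse` is a stand-in for `parse_markup` so the example runs on its own.

```python
from typing import Callable, List

# The three modes documented for parse_markup.
KNOWN_MODES = ("none", "html", "markdown")


def strict_parse(parse: Callable[[str, str], List], text: str, markup: str = "none") -> List:
    """Call parse(text, markup), rejecting modes that would silently become plain text."""
    if markup not in KNOWN_MODES:
        raise ValueError(f"unknown markup mode {markup!r}; expected one of {KNOWN_MODES}")
    return parse(text, markup)


def fake_parse(text: str, markup: str) -> List:
    """Stand-in for parse_markup; records which mode was requested."""
    return [(markup, text)]


print(strict_parse(fake_parse, "<b>Bold</b> text", "html"))
```

In real use the first argument would be `parse_markup` itself, for example `strict_parse(parse_markup, "**Bold** text", "markdown")`.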