
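Markup Parsing API

The markup parsing module parses HTML and Markdown rich text into a list of TextSpan objects.

Module overview

| Export | Kind | Responsibility |
| --- | --- | --- |
| TextSpan | dataclass | A rich-text run carrying inline style overrides |
| parse_markup() | function | Unified entry point; selects a parser based on the markup mode |
| parse_html() | function | Parses HTML-subset markup |
| parse_markdown() | function | Parses Markdown-subset markup |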

TextSpan

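from latticesvg.markup.parser import TextSpan

# TextSpan is a flat run of text; styling uses the dataclass fields
# (font_weight, font_style, color, ...). Fields left as None inherit
# from the parent TextNode style.
spans = [
    TextSpan(text="Hello", font_weight="bold", color="#ff0000"),
    # "italic" is assumed to be the value used for italic text.
    TextSpan(text=" World", font_style="italic"),
]

Parsing examples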

from latticesvg.markup.parser import parse_markup

# HTML
spans = parse_markup("<b>Bold</b> text", "html")

# Markdown
spans = parse_markup("**Bold** text", "markdown")

Auto-generated API documentation

parser

HTML-subset and Markdown markup parser for rich text spans.

Supports a minimal set of inline tags/syntax to produce List[TextSpan]. This is not a full HTML renderer — only inline-level elements are recognised; block-level tags (<div>, <p>, …) are silently ignored.

TextSpan dataclass

TextSpan(text: str = '', font_weight: Optional[str] = None, font_style: Optional[str] = None, font_family: Optional[str] = None, font_size: Optional[float] = None, color: Optional[str] = None, background_color: Optional[str] = None, baseline_shift: Optional[str] = None, text_decoration: Optional[str] = None, is_line_break: bool = False, is_math: bool = False)

A contiguous run of text sharing the same inline style overrides.

Fields set to None mean "inherit from the parent TextNode style".
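The inheritance rule above can be illustrated with a small sketch. It does not use the library: `SpanStyle` is a hypothetical stand-in carrying three of the TextSpan fields, and `resolve_style` is an assumed helper name showing one way a renderer might merge span overrides with a parent style.

```python
from dataclasses import dataclass, fields
from typing import Dict, Optional


@dataclass
class SpanStyle:
    """Hypothetical stand-in for a subset of TextSpan's style fields."""
    font_weight: Optional[str] = None
    font_style: Optional[str] = None
    color: Optional[str] = None


def resolve_style(span: SpanStyle, parent: Dict[str, str]) -> Dict[str, str]:
    """Span overrides win; fields left as None fall back to the parent."""
    resolved = dict(parent)
    for f in fields(span):
        value = getattr(span, f.name)
        if value is not None:
            resolved[f.name] = value
    return resolved


parent_style = {"font_weight": "normal", "font_style": "normal", "color": "#000000"}
effective = resolve_style(SpanStyle(font_weight="bold"), parent_style)
print(effective)
# {'font_weight': 'bold', 'font_style': 'normal', 'color': '#000000'}
```

Only `font_weight` is overridden here, so the other two values come from the parent unchanged.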

parse_html

parse_html(text: str) -> List[TextSpan]

Parse an HTML-subset string into a list of TextSpan.

Supported tags: <b>/<strong>, <i>/<em>, <code>, <span style="...">, <br>, <sub>, <sup>, <u>, <s>/<del>, <math> (inline LaTeX formula).

Unsupported tags are silently ignored (their text content is still included as plain text).

Example::

>>> spans = parse_html('Hello <b>world</b>!')
>>> [(s.text, s.font_weight) for s in spans]
[('Hello ', None), ('world', 'bold'), ('!', None)]
Source code in src/latticesvg/markup/parser.py
def parse_html(text: str) -> List[TextSpan]:
    """Parse an HTML-subset string into a list of ``TextSpan``.

    Supported tags: ``<b>``/``<strong>``, ``<i>``/``<em>``, ``<code>``,
    ``<span style="...">``, ``<br>``, ``<sub>``, ``<sup>``, ``<u>``,
    ``<s>``/``<del>``, ``<math>`` (inline LaTeX formula).

    Unsupported tags are silently ignored (their text content is still
    included as plain text).

    Example::

        >>> spans = parse_html('Hello <b>world</b>!')
        >>> [(s.text, s.font_weight) for s in spans]
        [('Hello ', None), ('world', 'bold'), ('!', None)]
    """
    parser = _RichHTMLParser()
    parser.feed(text)
    return parser.spans
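
The internals of `_RichHTMLParser` are not shown on this page. As a rough illustration of how a subset parser of this kind could be built on the standard library's `html.parser.HTMLParser`, here is a minimal sketch that keeps a stack of active style overrides. `Span`, `SketchParser`, `parse_html_sketch`, the tag table, and the `"italic"` value are all assumptions made for the example; this is not the library's implementation.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Dict, List, Optional


@dataclass
class Span:
    """Hypothetical stand-in for TextSpan with two style fields."""
    text: str = ""
    font_weight: Optional[str] = None
    font_style: Optional[str] = None


# Assumed tag table; the real parser supports more tags than this.
_TAG_STYLES = {
    "b": ("font_weight", "bold"),
    "strong": ("font_weight", "bold"),
    "i": ("font_style", "italic"),
    "em": ("font_style", "italic"),
}
_VOID_TAGS = {"br"}  # tags with no closing tag; ignored in this sketch


class SketchParser(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.spans: List[Span] = []
        # Style overrides currently in effect, innermost last.
        self._stack: List[Dict[str, str]] = [{}]

    def handle_starttag(self, tag, attrs):
        if tag in _VOID_TAGS:
            return
        style = dict(self._stack[-1])
        if tag in _TAG_STYLES:
            key, value = _TAG_STYLES[tag]
            style[key] = value
        # Unknown tags push an unchanged copy so end tags stay balanced.
        self._stack.append(style)

    def handle_endtag(self, tag):
        if tag in _VOID_TAGS:
            return
        if len(self._stack) > 1:
            self._stack.pop()

    def handle_data(self, data):
        if data:
            self.spans.append(Span(text=data, **self._stack[-1]))


def parse_html_sketch(text: str) -> List[Span]:
    parser = SketchParser()
    parser.feed(text)
    parser.close()  # flush any buffered trailing text
    return parser.spans


spans = parse_html_sketch("Hello <b>world</b>!")
print([(s.text, s.font_weight) for s in spans])
# [('Hello ', None), ('world', 'bold'), ('!', None)]
```

Because unknown tags push an unchanged style, text inside a tag such as `<div>` still comes out as a plain span, which mirrors the documented behaviour for unsupported tags.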

parse_markdown

parse_markdown(text: str) -> List[TextSpan]

Parse a Markdown-subset string into a list of TextSpan.

Supported syntax: **bold**, *italic*, `code`, ~~strikethrough~~, $latex$ (inline math).

Example::

>>> spans = parse_markdown('Hello **world**!')
>>> [(s.text, s.font_weight) for s in spans]
[('Hello ', None), ('world', 'bold'), ('!', None)]
Source code in src/latticesvg/markup/parser.py
def parse_markdown(text: str) -> List[TextSpan]:
    """Parse a Markdown-subset string into a list of ``TextSpan``.

    Supported syntax: ``**bold**``, ``*italic*``, `` `code` ``,
    ``~~strikethrough~~``, ``$latex$`` (inline math).

    Example::

        >>> spans = parse_markdown('Hello **world**!')
        >>> [(s.text, s.font_weight) for s in spans]
        [('Hello ', None), ('world', 'bold'), ('!', None)]
    """
    html = _markdown_to_html(text)
    return parse_html(html)
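
`parse_markdown` delegates to a private `_markdown_to_html` helper whose code is not shown here. The sketch below is one plausible way such a conversion could work using ordered regular-expression substitutions. The function name `markdown_to_html_sketch`, the rule list, and the choice of target tags (`<s>` for strikethrough, `<math>` for inline math) are assumptions based on the tags listed for `parse_html`, not the library's actual code.

```python
import html
import re

# Order matters: `**bold**` must be rewritten before `*italic*`.
_RULES = [
    (re.compile(r"`([^`]+)`"), r"<code>\1</code>"),
    (re.compile(r"\*\*(.+?)\*\*"), r"<b>\1</b>"),
    (re.compile(r"\*(.+?)\*"), r"<i>\1</i>"),
    (re.compile(r"~~(.+?)~~"), r"<s>\1</s>"),
    (re.compile(r"\$([^$]+)\$"), r"<math>\1</math>"),
]


def markdown_to_html_sketch(text: str) -> str:
    """Convert a small Markdown subset to HTML-subset markup."""
    # Escape literal <, > and & first so they cannot be read as tags later.
    converted = html.escape(text, quote=False)
    for pattern, replacement in _RULES:
        converted = pattern.sub(replacement, converted)
    return converted


print(markdown_to_html_sketch("Hello **world**!"))
# Hello <b>world</b>!
```

This sketch does not protect the contents of code spans or math from the later rules, so input like `` `a*b*c` `` would be mangled; a more careful converter would tokenize those regions first.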

parse_markup

parse_markup(text: str, markup: str = 'none') -> List[TextSpan]

Parse text according to the markup mode.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| text | str | required | The source text, possibly containing inline markup. |
| markup | str | 'none' | "none": return a single plain-text span (default). "html": parse HTML subset tags. "markdown": parse Markdown subset syntax. |

Returns

| Type | Description |
| --- | --- |
| List[TextSpan] | Ordered list of styled text segments. |

Source code in src/latticesvg/markup/parser.py
def parse_markup(text: str, markup: str = "none") -> List[TextSpan]:
    """Parse *text* according to the *markup* mode.

    Parameters
    ----------
    text : str
        The source text, possibly containing inline markup.
    markup : str
        ``"none"`` — return a single plain-text span (default).
        ``"html"`` — parse HTML subset tags.
        ``"markdown"`` — parse Markdown subset syntax.

    Returns
    -------
    List[TextSpan]
        Ordered list of styled text segments.
    """
    if markup == "html":
        return parse_html(text)
    if markup == "markdown":
        return parse_markdown(text)
    # "none" or anything else — plain text
    return [TextSpan(text=text)]
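
As the source above shows, any `markup` value other than `"html"` or `"markdown"` falls through to plain text, so a typo such as `"HTML"` or `"md"` does not raise. If stricter behaviour is wanted, a caller could validate the mode first. The helper below is a hypothetical addition (`check_markup_mode` and `SUPPORTED_MODES` are not part of the library) and is only a sketch of that idea.

```python
from typing import Tuple

# The three modes documented for parse_markup.
SUPPORTED_MODES: Tuple[str, ...] = ("none", "html", "markdown")


def check_markup_mode(markup: str) -> str:
    """Return the mode unchanged, or raise ValueError if it is not documented."""
    if markup not in SUPPORTED_MODES:
        raise ValueError(
            f"unknown markup mode {markup!r}; expected one of {SUPPORTED_MODES}"
        )
    return markup


print(check_markup_mode("markdown"))
# markdown
```

It could then be used as `parse_markup(text, check_markup_mode(mode))`, which keeps the library's lenient default while making mistakes visible at the call site.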