.. _sync-filters:

=============================
MathJax Pre- and Post-Filters
=============================

Another means of hooking into MathJax's typesetting pipeline is via
pre- and post-filters associated with MathJax's input and output jax.
These are prioritized lists of functions that run either before or
after the jax processes a :data:`MathItem`, and they can be used to
pre-process or post-process MathJax's compiling and typesetting
functions. Input and output jax have both pre- and post-filters, and
the MathML input jax has an extra set of filters for the parsed MathML
as well.

When using the :ref:`MathJax Components framework `, you can use the
MathJax configuration object to specify input and output jax filters.
The :data:`preFilters` and :data:`postFilters` configuration options in
the :data:`tex`, :data:`mathml`, :data:`output`, :data:`chtml`, or
:data:`svg` blocks allow you to specify arrays of filters (or filters
together with their priorities). See the :ref:`configuring-mathjax`
section for details.

When using direct access to the MathJax modules in node applications,
to add a pre- or post-filter to an input jax use

.. js:function:: InputJax.preFilters.add(fn, priority)
                 InputJax.postFilters.add(fn, priority)

   :param (arg)=>boolean|void fn: The filter function to be called. The
       :data:`arg` argument is an object with three keys:
       :data:`math`, :data:`document`, and :data:`data`. The values
       for these keys are the :data:`MathItem` being processed, the
       :data:`MathDocument` containing that math item, and
       jax-specific additional data. If the function returns `false`,
       any additional filters are cancelled.
   :param priority: The numeric priority of the filter, where lower
       numbers are executed first. This lets you insert functions
       anywhere in the filter list.

For the TeX input jax, the :data:`data` item is the
:data:`ParseOptions` object for the input jax, which holds
configuration data about the TeX input jax.

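
The priority and cancellation rules described above can be illustrated
with a small self-contained model. This is only a sketch: the
``FilterList`` class, its default priority of 5, and the sample filters
are hypothetical names invented for this illustration, not MathJax's
own implementation. It assumes only what is stated above, namely that
lower priority numbers run first and that a filter returning `false`
cancels the remaining filters.

.. code-block:: js

   // Toy model of a prioritized filter list (hypothetical, not MathJax's class).
   class FilterList {
     constructor() {
       this.items = [];
     }

     // Insert fn so that lower priority numbers run first; filters with
     // equal priority keep their insertion order.  The default of 5 is an
     // arbitrary choice for this model.
     add(fn, priority = 5) {
       let i = this.items.length;
       while (i > 0 && this.items[i - 1].priority > priority) i--;
       this.items.splice(i, 0, {fn, priority});
       return fn;
     }

     // Run the filters in order, stopping as soon as one returns false.
     execute(arg) {
       for (const {fn} of this.items) {
         if (fn(arg) === false) return false;
       }
       return true;
     }
   }

   const filters = new FilterList();

   // Stand-in for the {math, document, data} argument described above.
   const arg = {math: {math: ' x + 1 '}, document: null, data: null};

   filters.add(({math}) => { math.math = math.math + '!'; }, 20);   // runs second
   filters.add(({math}) => { math.math = math.math.trim(); }, 10);  // runs first
   filters.add(() => false, 30);                                    // cancels the rest
   filters.add(({math}) => { math.math = 'never reached'; }, 40);   // skipped

   const completed = filters.execute(arg);
   console.log(arg.math.math, completed);   // prints "x + 1! false"

The filters were added out of order, but they run by ascending
priority, and the one with priority 40 never runs because the
priority-30 filter returned `false`.
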
For the MathML input jax, the pre-filter only runs when the MathML is
a serialized MathML string, as it is when converting a MathML string,
or when the :ref:`forceReparse ` option is true. The post-filter's
:data:`data` is the root ``<math>`` element of the internal MathML
tree of the MathML expression.

For the MathML input jax, there is also a third filter:

.. js:function:: InputJax.mmlFilters.add(fn, priority)

This runs on the MathML DOM tree, either from the document itself, or
the one obtained by parsing a serialized MathML string, before the
input jax converts the MathML into MathJax's internal format. The
:data:`data` in this case is the MathML DOM tree.

The AsciiMath input jax does not currently execute any pre- or
post-filters.

For an output jax, the pre- and post-filters can be added via

.. js:function:: OutputJax.preFilters.add(fn, priority)
                 OutputJax.postFilters.add(fn, priority)

with arguments as above. In this case, the :data:`data` is the
``mjx-container`` node in which the output DOM elements have been
placed. This will become the :data:`MathItem.typesetRoot` value, but
it has not yet been set when the filters run.

In an application that is using MathJax Components, the input jax can
be obtained from :data:`MathJax.startup.document.inputJax.tex` or
:data:`MathJax.startup.document.inputJax.mml`, and the output jax from
:data:`MathJax.startup.document.outputJax`. For applications using
direct access to the MathJax modules, the input and output jax will
have been instantiated by hand, so you should already have access to
them; if not, they can be obtained from the :data:`MathDocument`
instance returned by :js:meth:`mathjax.document()` by using that in
place of :data:`MathJax.startup.document` above.

-----

.. _filter-number-space:

Allowing Spaces in Numbers
==========================

Here is an example of using a TeX input filter to allow numbers to be
entered that contain spaces, but where the spaces are removed in the
output.
That is, ``$12 345$`` will be parsed as a single number and displayed
as ``12345``.

.. code-block:: js

   MathJax = {
     tex: {
       numberPattern: /^(?:[0-9]+(?:(?: +|\{,\})[0-9]+)*(?:\.[0-9]*)?|\.[0-9]+)/,
       postFilters: [
         ({data}) => {
           for (const mn of data.getList('mn')) {
             const textNode = mn.childNodes[0];
             textNode.text = textNode.text.replace(/ /g, '');
           }
         }
       ],
     },
   };

We set the :data:`numberPattern` option to allow spaces within the
number, and then use a post-filter to remove the spaces from the text
of any ``mn`` elements that were produced during the TeX processing.

-----

.. _filter-fullwidth:

Converting Full-Width Characters to ASCII Equivalents
=====================================================

This filter converts any character in the Unicode Full-Width character
range (U+FF01 -- U+FF5E) to its ASCII equivalent, leading to better
quality output.

.. code-block:: js

   MathJax = {
     tex: {
       preFilters: [
         ({math}) => {
           math.math = math.math.replace(/[\uFF01-\uFF5E]/g,
             (c) => String.fromCodePoint(c.codePointAt(0) - 0xFF00 + 0x20));
         }
       ]
     }
   };

This uses a pre-filter to replace characters in the full-width range
by the equivalent ones in the usual ASCII character range. This allows
numbers to be properly combined by TeX, for example, where the
full-width versions would be treated as individual characters.

-----

.. _filter-number-scripts:

Converting Unicode Numeric Superscripts to TeX Ones
===================================================

The following filter converts Unicode pseudo-script numbers (like
those in the Superscripts and Subscripts block) to actual TeX super-
and subscripts.

.. code-block:: js

   MathJax = {
     //
     //  The pseudoscript numbers 0 through 9, and a pattern for plus-or-minus a number
     //
     scripts: '\u2070\u00B9\u00B2\u00B3\u2074\u2075\u2076\u2077\u2078\u2079',
     scriptRE: /([\u207A\u207B])?([\u2070\u00B9\u00B2\u00B3\u2074-\u2079]+)/g,
     tex: {
       preFilters: [
         ({math}) => {
           math.math = math.math.replace(MathJax.config.scriptRE, (match, pm, n) => {
             const N = n.split('').map(c => MathJax.config.scripts.indexOf(c));  // convert digits
             pm === '\u207A' && N.unshift('+');   // add plus, if given
             pm === '\u207B' && N.unshift('-');   // add minus, if given
             return '^{' + N.join('') + '}';      // make it an actual power
           });
         }
       ]
     }
   };

This uses a TeX input jax pre-filter to scan the TeX expression for
Unicode superscript numerals, with optional plus or minus signs, and
replace them with ASCII numerals inside braces with a ``^`` to make
them actual TeX superscripts.

The filter could be extended to process subscripts in a similar
fashion.

-----

.. _filter-svg-size:

Converting SVG Size from Ex to Px units
=======================================

The SVG output jax sets the ``<svg>`` element :attr:`width` and
:attr:`height` attributes using `ex` units, so the SVG will scale to
the size of the surrounding font automatically. This filter converts
those measurements to `px` units instead.

.. code-block:: js

   MathJax = {
     svg: {
       postFilters: [
         ({data}) => {
           const fixed = MathJax.startup.document.outputJax.fixed;
           const svg = data.querySelector('svg');
           if (svg?.hasAttribute('viewBox')) {
             const [ , , w, h] = svg.getAttribute('viewBox').split(/ /);
             const em = MathJax.startup.document.outputJax.pxPerEm / 1000;
             svg.setAttribute('width', fixed(w * em) + 'px');
             svg.setAttribute('height', fixed(h * em) + 'px');
           }
         }
       ]
     }
   };

We use an output jax post-filter to modify the ``svg`` element's
attributes, taking advantage of the output jax's :meth:`fixed()`
method to obtain a limited number of decimal places.
The width and height are determined from the :attr:`viewBox`
attribute, whose values are in thousandths of an ``em`` in the SVG
output, which is why the code divides :data:`pxPerEm` by 1000.

-----

.. _filter-autobold:

An Autobold Filter
==================

This configuration implements a substitute for the v2 `autobold`
extension.

.. code-block:: js

   MathJax = {
     tex: {
       preFilters: [
         ({math}) => {
           const styles = window.getComputedStyle(math.start.node.parentNode);
           if (styles.fontWeight >= 700 && !math.inputData.bolded) {
             math.math = '\\boldsymbol{' + math.math + '}';
             math.inputData.bolded = true;
           }
         }
       ]
     }
   };

It uses a TeX input jax pre-filter that tests whether the parent
element of the math string has CSS with ``font-weight`` of 700 or more
(the usual ``bold`` value), and if so, it wraps the TeX code in
``\boldsymbol{...}`` to make it bold. Note, however, that if the
expression itself includes bold notation, that does not become extra
bold, so it may not be distinguishable from the rest of the
expression.

We track the fact that bolding has been added using the
:data:`inputData` object of the :data:`math` object. That way, if the
expression needs to be reparsed (e.g., for a ``\require`` command, or
other dynamic data being loaded), we won't add ``\boldsymbol`` more
than once.

-----

.. _filter-mathvariant:

Convert Mathvariant to Unicode
==============================

This example is more complex, and demonstrates a way to convert the
use of the :attr:`mathvariant` attribute on the internal MathML token
elements to their Unicode equivalents in the Mathematical
Alphanumerics block. Because MathML-Core (the version of MathML
implemented in most browsers) does not include support for
:attr:`mathvariant` (except as :attr:`mathvariant="normal"` on
single-character ``mi`` elements to prevent the automatic
italicization of the character), this may be useful for cases where
you want to produce MathML expressions for use with a browser's native
MathML-Core support.
Using this together with the :ref:`native MathML output ` example
would make that output more effective in browsers that implement
MathML-Core.

.. code-block:: js

   MathJax = {
     startup: {
       ready() {
         //
         // The numeric ranges for numbers, uppercase alphabet, lowercase alphabet,
         // uppercase Greek, and lowercase Greek, with optional remapping of some
         // characters into the (relative) positions used in the Math Alphanumeric block.
         //
         const ranges = [
           [0x30, 0x39],
           [0x41, 0x5A],
           [0x61, 0x7A],
           [0x391, 0x3A9, {0x3F4: 0x3A2, 0x2207: 0x3AA}],
           [0x3B1, 0x3C9, {0x2202: 0x3CA, 0x3F5: 0x3CB, 0x3D1: 0x3CC,
                           0x3F0: 0x3CD, 0x3D5: 0x3CE, 0x3F1: 0x3CF, 0x3D6: 0x3D0}],
         ];

         //
         // The starting values for numbers, Alpha, alpha, Greek, and greek for the variants
         //
         const variants = {
           bold: [0x1D7CE, 0x1D400, 0x1D41A, 0x1D6A8, 0x1D6C2],
           italic: [0, 0x1D434, 0x1D44E, 0x1D6E2, 0x1D6FC, {0x68: 0x210E}],
           'bold-italic': [0, 0x1D468, 0x1D482, 0x1D71C, 0x1D736],
           script: [0, 0x1D49C, 0x1D4B6, 0, 0, {
             0x42: 0x212C, 0x45: 0x2130, 0x46: 0x2131, 0x48: 0x210B, 0x49: 0x2110,
             0x4C: 0x2112, 0x4D: 0x2133, 0x52: 0x211B, 0x65: 0x212F, 0x67: 0x210A,
             0x6F: 0x2134,
           }],
           'bold-script': [0, 0x1D4D0, 0x1D4EA, 0, 0],
           fraktur: [0, 0x1D504, 0x1D51E, 0, 0, {
             0x43: 0x212D, 0x48: 0x210C, 0x49: 0x2111, 0x52: 0x211C, 0x5A: 0x2128,
           }],
           'bold-fraktur': [0, 0x1D56C, 0x1D586, 0, 0],
           'double-struck': [0x1D7D8, 0x1D538, 0x1D552, 0, 0, {
             0x43: 0x2102, 0x48: 0x210D, 0x4E: 0x2115, 0x50: 0x2119, 0x51: 0x211A,
             0x52: 0x211D, 0x5A: 0x2124, 0x393: 0x213E, 0x3A0: 0x213F, 0x3B3: 0x213D,
             0x3C0: 0x213C,
           }],
           'sans-serif': [0x1D7E2, 0x1D5A0, 0x1D5BA, 0, 0],
           'bold-sans-serif': [0x1D7EC, 0x1D5D4, 0x1D5EE, 0x1D756, 0x1D770],
           'sans-serif-italic': [0, 0x1D608, 0x1D622, 0, 0],
           'sans-serif-bold-italic': [0, 0x1D63C, 0x1D656, 0x1D790, 0x1D7AA],
           monospace: [0x1D7F6, 0x1D670, 0x1D68A, 0, 0],
           '-tex-calligraphic': [0, 0x1D49C, 0x1D4B6, 0, 0, {
             0x42: 0x212C, 0x45: 0x2130, 0x46: 0x2131, 0x48: 0x210B, 0x49: 0x2110,
             0x4C: 0x2112, 0x4D: 0x2133, 0x52: 0x211B, 0x65: 0x212F, 0x67: 0x210A,
             0x6F: 0x2134,
           }, '\uFE00'],
           '-tex-bold-calligraphic': [0, 0x1D4D0, 0x1D4EA, 0, 0, {}, '\uFE00'],
           '-tex-mathit': [0, 0x1D434, 0x1D44E, 0x1D6E2, 0x1D6FC, {0x68: 0x210E}],
         };

         //
         // Styles to use for characters that can't be translated.
         //
         const variantStyles = {
           bold: 'font-weight: bold',
           italic: 'font-style: italic',
           'bold-italic': 'font-weight: bold; font-style: italic',
           'script': 'font-family: cursive',
           'bold-script': 'font-family: cursive; font-weight: bold',
           'sans-serif': 'font-family: sans-serif',
           'bold-sans-serif': 'font-family: sans-serif; font-weight: bold',
           'sans-serif-italic': 'font-family: sans-serif; font-style: italic',
           'sans-serif-bold-italic': 'font-family: sans-serif; font-weight: bold; font-style: italic',
           'monospace': 'font-family: monospace',
           '-tex-mathit': 'font-style: italic',
         };

         //
         // The filter function
         //
         function unicodeVariants(root) {
           //
           // Walk the MathML tree for token nodes with mathvariant attributes
           //
           root.walkTree((node) => {
             if (!node.isToken || !node.attributes.isSet('mathvariant')) return;
             //
             // Get the variant and the unicode characters of the element
             //
             const variant = node.attributes.get('data-mjx-variant') ??
                             node.attributes.get('mathvariant');
             const text = [...node.getText()];
             //
             // Skip the only valid case in MathML-Core and any invalid variants
             //
             if (variant === 'normal' && node.isKind('mi') && text.length === 1) return;
             node.attributes.unset('mathvariant');
             node.attributes.unset('data-mjx-variant');  // same attribute name as read above
             if (!Object.hasOwn(variants, variant)) return;
             //
             // Get the variant data
             //
             const start = variants[variant];
             const remap = start[5] || {};
             const modifier = start[6] || '';
             //
             // Convert the text of the child nodes
             //
             let converted = true;
             for (const child of node.childNodes) {
               if (child.isKind('text')) {
                 converted &= convertText(child, start, remap, modifier);
               }
             }
             //
             // If not all characters were converted, add styles, if possible,
             //   but not when it would already be in italics.
             //
             if (!converted &&
                 !(['italic', '-tex-mathit'].includes(variant) &&
                   text.length === 1 && node.isKind('mi'))) {
               addStyles(node, variant);
             }
           });
         }

         //
         // Convert the content of a text node
         //
         function convertText(node, start, remap, modifier) {
           //
           // Get the text
           //
           const text = [...node.getText()];
           //
           // Loop through the characters in the text
           //
           let converted = 0;
           for (let i = 0; i < text.length; i++) {
             let C = text[i].codePointAt(0);
             //
             // Check if the character is in one of the ranges
             //
             for (const j of [0, 1, 2, 3, 4]) {
               const [m, M, map = {}] = ranges[j];
               if (!start[j]) continue;
               if (C < m) break;
               //
               // Set the new character based on the remappings and
               // starting location for the range
               //
               if (map[C]) {
                 text[i] = String.fromCodePoint(map[C] - m + start[j]) + modifier;
                 converted++;
                 break;
               } else if (remap[C] || C <= M) {
                 text[i] = String.fromCodePoint(remap[C] || C - m + start[j]) + modifier;
                 converted++;
                 break;
               }
             }
           }
           //
           // Put back the modified text content
           //
           node.setText(text.join(''));
           //
           // Return true if all characters were converted, false otherwise.
           //
           return converted === text.length;
         }

         //
         // Add styles when conversion isn't possible.
         //
         function addStyles(node, variant) {
           let styles = variantStyles[variant];
           if (styles) {
             // Keep any existing explicit style attribute and append ours
             if (node.attributes.hasExplicit('style')) {
               styles = node.attributes.get('style') + ' ' + styles;
             }
             node.attributes.set('style', styles);
           }
         }

         //
         // Add the post-filters to all input jax
         //
         MathJax.startup.defaultReady();
         for (const jax of MathJax.startup.document.inputJax) {
           jax.postFilters.add(({data}) => unicodeVariants(data.root || data));
         }
       }
     }
   };

This example adds a post-filter to each of the input jax that are
loaded (so it will work with MathML input as well as TeX input).
The filter walks the internal MathML tree looking for token elements
with :attr:`mathvariant` attributes, and then converts the content of
the child text nodes of those token nodes to use the proper Unicode
values for any alphabetic, numeric, or Greek characters that can be
represented using the Mathematical Alphanumeric Symbols and Letterlike
Symbols blocks. If any characters can't be converted to something in
these blocks, we use a :attr:`style` attribute, when possible, to
simulate the proper output.

The :data:`ranges` variable gives the character ranges that will be
converted, the :data:`variants` object gives the data needed to map
those ranges to the various Mathematical Alphanumerics characters for
the different :attr:`mathvariant` values, and the
:data:`variantStyles` object holds the styles that need to be applied
for each variant.

The special ``-tex-calligraphic`` and ``-tex-bold-calligraphic``
variants are used internally in MathJax to produce the Chancery
calligraphic variant (as opposed to the Roundhand script variant), but
Unicode does not distinguish between these two, and the result of the
``script`` and ``bold-script`` variants is font dependent. The
`current mechanism `__ to distinguish between these two in Unicode is
to use the Unicode variation selector codes U+FE00 and U+FE01. The
code here adds U+FE00 for the TeX calligraphic variants. You may wish
to add U+FE01 to the script variants to explicitly request the
Roundhand versions as well. Note, however, that not all fonts support
these variation selectors, so you may get the same characters in both
cases, and which you get will depend on the font. Some browsers may
also show unknown-character glyphs for these selector codes when they
don't understand how to process them.
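
To make the offset arithmetic behind :data:`ranges` and
:data:`variants` concrete, here is a minimal stand-alone sketch that
handles only the ``bold`` variant for ASCII digits and letters. The
names ``asciiRanges``, ``boldStart``, and ``toBold`` are hypothetical
helpers introduced for this illustration and are not part of MathJax;
the starting code points are the ones used for ``bold`` in the example
above.

.. code-block:: js

   // First and last code points of the ASCII digits, uppercase, and lowercase letters.
   const asciiRanges = [[0x30, 0x39], [0x41, 0x5A], [0x61, 0x7A]];

   // Starting code points of the bold digits, uppercase, and lowercase letters
   // in the Mathematical Alphanumeric Symbols block.
   const boldStart = [0x1D7CE, 0x1D400, 0x1D41A];

   // Map each character to its bold counterpart: the offset of the character
   // within its ASCII range is added to the start of the matching bold range.
   // Characters outside the three ranges are left unchanged.
   function toBold(text) {
     return [...text].map((ch) => {
       const C = ch.codePointAt(0);
       for (let j = 0; j < asciiRanges.length; j++) {
         const [m, M] = asciiRanges[j];
         if (C >= m && C <= M) return String.fromCodePoint(C - m + boldStart[j]);
       }
       return ch;
     }).join('');
   }

   const result = toBold('Ab3+');
   console.log([...result]
     .map((c) => 'U+' + c.codePointAt(0).toString(16).toUpperCase())
     .join(' '));   // prints "U+1D400 U+1D41B U+1D7D1 U+2B"

The full filter above adds two refinements to this idea: per-range and
per-variant remapping tables for characters whose Unicode positions
fall outside the regular sequence (such as the italic ``h`` at
U+210E), and an optional variation selector appended to each converted
character.
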