.. _sync-filters:

=============================
MathJax Pre- and Post-Filters
=============================

Another means of hooking into MathJax's typesetting pipeline is via
pre- and post-filters associated with MathJax's input and output jax.
These are prioritized lists of functions that run either before or
after the jax processes a :data:`MathItem`, and they can be used to
pre-process or post-process MathJax's compiling and typesetting
functions.  Input and output jax have both pre- and post-filters, and
the MathML input jax has an extra set of filters for the parsed MathML
as well.

When using the :ref:`MathJax Components framework <web-components>`, you
can use the MathJax configuration object to specify input and output
jax filters.  The :data:`preFilters` and :data:`postFilters`
configuration options in the :data:`tex`, :data:`mathml`,
:data:`output`, :data:`chtml`, or :data:`svg` blocks allow you to
specify arrays of filters (or filters together with their priorities).
See the :ref:`configuring-mathjax` section for details.

When using direct access to the MathJax modules in node applications,
to add a pre- or post-filter to an input jax use

.. js:function:: InputJax.preFilters.add(fn, priority)
                 InputJax.postFilters.add(fn, priority)

   :param (arg)=>boolean|void fn: The filter function to be called.
                                 The :data:`arg` argument is an object
                                 with three keys: :data:`math`,
                                 :data:`document`, and :data:`data`.
                                 The values for these keys are the
                                 :data:`MathItem` being processed, the
                                 :data:`MathDocument` containing that
                                 math item, and jax-specific
                                 additional data.  If the function
                                 returns `false`, then any additional
                                 filters are cancelled.

   :param priority: The numeric priority of the filter, where lower
                    numbers are executed first.  This lets you insert
                    functions anywhere in the filter list.
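
The ordering and cancellation rules described above can be illustrated
with a small stand-alone sketch.  The ``FilterList`` class here is a
hypothetical stand-in written for this example, not MathJax's own
implementation; it only mimics the documented behavior that lower
priorities run first and that a filter returning ``false`` cancels the
remaining ones.

.. code-block:: js

   //
   //  A hypothetical stand-in for a prioritized filter list
   //  (an assumption for illustration, not MathJax's own class).
   //
   class FilterList {
     constructor() {
       this.items = [];
     }
     //
     //  Insert the filter so that lower priorities come first;
     //  filters with equal priority keep their insertion order.
     //
     add(fn, priority) {
       let i = this.items.length;
       while (i > 0 && this.items[i - 1].priority > priority) i--;
       this.items.splice(i, 0, {fn, priority});
     }
     //
     //  Run the filters in order, stopping if one returns false.
     //
     execute(arg) {
       for (const {fn} of this.items) {
         if (fn(arg) === false) return false;
       }
       return true;
     }
   }

   const calls = [];
   const filters = new FilterList();
   filters.add(() => { calls.push('b'); }, 10);
   filters.add(() => { calls.push('a'); }, 1);
   filters.add(() => { calls.push('stop'); return false; }, 20);
   filters.add(() => { calls.push('never'); }, 30);

   //  The argument mirrors the three keys that a filter receives.
   const completed = filters.execute({math: {math: 'x+1'}, document: null, data: null});

Here :data:`calls` ends up as ``['a', 'b', 'stop']`` and
:data:`completed` is ``false``: the filters run in priority order
regardless of the order in which they were added, and the one with
priority 30 never runs.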

For the TeX input jax, the :data:`data` item is the
:data:`ParseOptions` object for the input jax, which holds
configuration data about the TeX input jax.

For the MathML input jax, the pre-filter runs only when the MathML is
given as a serialized string (as it is when you convert a MathML
string directly), or when the :ref:`forceReparse <mathml-forceReparse>`
option is true.  The post-filter's :data:`data` is the root ``<math>``
element of the internal MathML tree of the MathML expression.  For the
MathML input jax, there is also a third filter:

.. js:function:: InputJax.mmlFilters.add(fn, priority)

This runs on the MathML DOM tree, either from the document itself, or
the one obtained by parsing a serialized MathML string, before the
input jax converts the MathML into MathJax's internal format.  The
:data:`data` in this case is the MathML DOM tree.

The AsciiMath input jax does not currently execute any pre- or
post-filters.

For an output jax, the pre- and post-filters can be added via

.. js:function:: OutputJax.preFilters.add(fn, priority)
                 OutputJax.postFilters.add(fn, priority)

with arguments as above.  In this case, the :data:`data` is the
``mjx-container`` node in which the output DOM elements have been
placed.  This will become the :data:`MathItem.typesetRoot` value, but
it has not yet been set when the filters run.

In an application that is using MathJax Components, the input jax can
be obtained from :data:`MathJax.startup.document.inputJax.tex` or
:data:`MathJax.startup.document.inputJax.mml`, and the output jax
from :data:`MathJax.startup.document.outputJax`.  For applications
using direct access to the MathJax modules, the input and output jax
will have been instantiated by hand, so you should already have access
to them; if not, then they can be obtained from the
:data:`MathDocument` instance returned by
:js:meth:`mathjax.document()` by using that in place of
:data:`MathJax.startup.document` above.


-----

.. _filter-number-space:

Allowing Spaces in Numbers
==========================

Here is an example of using a TeX input filter to allow numbers to be
entered that contain spaces, but where the spaces are removed in the
output.  That is, ``$12 345$`` will be parsed as a single number and
displayed as ``12345``.

.. code-block:: js

   MathJax = {
     tex: {
       numberPattern: /^(?:[0-9]+(?:(?: +|\{,\})[0-9]+)*(?:\.[0-9]*)?|\.[0-9]+)/,
       postFilters: [
         ({data}) => {
           for (const mn of data.getList('mn')) {
             const textNode = mn.childNodes[0];
             textNode.text = textNode.text.replace(/ /g, '');
           }
         }
       ],
     },
   };

We set the :data:`numberPattern` option to allow spaces within the
number, and then use a post-filter to remove the spaces from the text
of any ``mn`` elements that were produced during the TeX processing.
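
To see what the pattern accepts, it can be tried on a few strings
outside of MathJax.  This is only a sketch: ``leadingNumber`` is a
hypothetical helper for this example, and it assumes that the pattern
is matched against the start of the remaining TeX input.

.. code-block:: js

   const numberPattern = /^(?:[0-9]+(?:(?: +|\{,\})[0-9]+)*(?:\.[0-9]*)?|\.[0-9]+)/;

   //
   //  Return the number at the start of a TeX string with its spaces
   //  removed, or null if the string does not start with a number.
   //
   function leadingNumber(tex) {
     const match = numberPattern.exec(tex);
     return match ? match[0].replace(/ /g, '') : null;
   }

   const spaced = leadingNumber('12 345 + x');   // digits separated by a space
   const comma = leadingNumber('1{,}234.5');     // braced comma and a decimal part
   const none = leadingNumber('x + 1');          // does not start with a number

The first call gives ``'12345'``, the second leaves ``'1{,}234.5'``
unchanged, and the third returns ``null``.  Note that the space before
the ``+`` is not absorbed, since a space only counts as part of the
number when digits follow it.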

-----

.. _filter-fullwidth:

Converting Full-Width Characters to ASCII Equivalents
=====================================================

This filter converts any character in the Unicode Full-Width character
range (U+FF01 -- U+FF5E) to its ASCII equivalent, leading
to better quality output.

.. code-block:: js

   MathJax = {
     tex: {
       preFilters: [
         ({math}) => {
           math.math = math.math.replace(/[\uFF01-\uFF5E]/g,
             (c) => String.fromCodePoint(c.codePointAt(0) - 0xFF00 + 0x20));
         }
       ]
     }
   };

This uses a pre-filter to replace characters in the full-width range
by an equivalent one in the usual ASCII character range.  This will
allow numbers to be properly combined by TeX, for example, where the
full-width versions would be treated as individual characters.
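
The replacement itself is plain string manipulation, so it can be
checked without MathJax.  In this sketch, ``fullWidthToAscii`` is a
hypothetical name for the same conversion the filter performs: each
full-width character lies ``0xFEE0`` above its ASCII counterpart.

.. code-block:: js

   //
   //  Map U+FF01..U+FF5E onto U+0021..U+007E.
   //
   function fullWidthToAscii(text) {
     return text.replace(/[\uFF01-\uFF5E]/g,
       (c) => String.fromCodePoint(c.codePointAt(0) - 0xFF00 + 0x20));
   }

   //  Full-width "1", "2", "+", "3", followed by an ordinary "x"
   const ascii = fullWidthToAscii('\uFF11\uFF12\uFF0B\uFF13x');

The result is the string ``'12+3x'``, in which ``12`` can now be
treated as a single number.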

-----

.. _filter-number-scripts:

Converting Unicode Numeric Superscripts to TeX Ones
===================================================

The following filter converts Unicode pseudo-script numbers (like
those in the Superscripts and Subscripts block) to actual TeX super-
and subscripts.

.. code-block:: js

   MathJax = {
     //
     // The pseudoscript numbers 0 through 9, and a pattern for plus-or-minus a number
     //
     scripts: '\u2070\u00B9\u00B2\u00B3\u2074\u2075\u2076\u2077\u2078\u2079',
     scriptRE: /([\u207A\u207B])?([\u2070\u00B9\u00B2\u00B3\u2074-\u2079]+)/g,

     tex: {
       preFilters: [
         ({math}) => {
           math.math = math.math.replace(MathJax.config.scriptRE, (match, pm, n) => {
             const N = n.split('').map(c => MathJax.config.scripts.indexOf(c));  // convert digits
             pm === '\u207A' && N.unshift('+');     // add plus, if given
             pm === '\u207B' && N.unshift('-');     // add minus, if given
             return '^{' + N.join('') + '}';        // make it an actual power
           });
         }
       ]
     }
   };

This uses a TeX input jax pre-filter to scan the TeX expression for
Unicode superscript numerals, with optional plus or minus signs, and
replace them with ASCII numerals inside braces with a ``^`` to make
them actual TeX superscripts.

The filter could be extended to process subscripts in a similar
fashion.
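
One possible shape for such an extension is sketched here as a
stand-alone function.  The names ``convert`` and ``convertScripts``
are hypothetical; the subscript digits U+2080 through U+2089 and the
subscript plus and minus signs U+208A and U+208B come from the same
Unicode block.

.. code-block:: js

   const superDigits = '\u2070\u00B9\u00B2\u00B3\u2074\u2075\u2076\u2077\u2078\u2079';
   const subDigits = '\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089';
   const superRE = /([\u207A\u207B])?([\u2070\u00B9\u00B2\u00B3\u2074-\u2079]+)/g;
   const subRE = /([\u208A\u208B])?([\u2080-\u2089]+)/g;

   //
   //  Replace runs of pseudo-script digits (with an optional sign)
   //  by the TeX operator op followed by a braced ASCII number.
   //
   function convert(tex, re, digits, plus, minus, op) {
     return tex.replace(re, (match, pm, n) => {
       const N = [...n].map((c) => digits.indexOf(c));
       if (pm === plus) N.unshift('+');
       if (pm === minus) N.unshift('-');
       return op + '{' + N.join('') + '}';
     });
   }

   function convertScripts(tex) {
     tex = convert(tex, superRE, superDigits, '\u207A', '\u207B', '^');
     return convert(tex, subRE, subDigits, '\u208A', '\u208B', '_');
   }

   //  x-squared, a-sub-10, and y-to-the-minus-1 written with pseudo-scripts
   const result = convertScripts('x\u00B2 + a\u2081\u2080 + y\u207B\u00B9');

The result is ``'x^{2} + a_{10} + y^{-1}'``.  In a pre-filter, the
body would then reduce to ``math.math = convertScripts(math.math)``.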

-----

.. _filter-svg-size:

Converting SVG Size from Ex to Px Units
=======================================

The SVG output jax sets the ``<svg>`` element :attr:`width` and
:attr:`height` attributes using `ex` units, so the SVG will scale to
the size of the surrounding font automatically.  This filter converts
those measurements to `px` units instead.

.. code-block:: js

   MathJax = {
     svg: {
       postFilters: [
         ({data}) => {
           const fixed = MathJax.startup.document.outputJax.fixed;
           const svg = data.querySelector('svg');
           if (svg?.hasAttribute('viewBox')) {
             const [ , , w, h] = svg.getAttribute('viewBox').split(/ /);
             const em = MathJax.startup.document.outputJax.pxPerEm / 1000;
             svg.setAttribute('width', fixed(w * em) + 'px');
             svg.setAttribute('height', fixed(h * em) + 'px');
           }
         }
       ]
     }
   };

We use an output jax post-filter to modify the ``svg`` element's
attributes, taking advantage of the output jax's :meth:`fixed()`
method to obtain a limited number of decimal places.  The width and
height are determined from the :attr:`viewBox` attribute, whose values
are given in thousandths of an ``em`` in the SVG output.
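
The arithmetic can be checked on its own.  This sketch assumes, as
the division by 1000 in the filter suggests, that the
:attr:`viewBox` values are in thousandths of an em; ``viewBoxToPx``
and its rounding helper are hypothetical names that stand in for the
filter body and the output jax's :meth:`fixed()` method.

.. code-block:: js

   //
   //  Convert the width and height of a viewBox string to px values,
   //  given the number of pixels per em.
   //
   function viewBoxToPx(viewBox, pxPerEm) {
     const [ , , w, h] = viewBox.split(/ /).map(Number);
     const scale = pxPerEm / 1000;
     //  Round to at most three decimals and drop trailing zeros
     const round = (x) => String(Number(x.toFixed(3)));
     return {width: round(w * scale) + 'px', height: round(h * scale) + 'px'};
   }

   const size = viewBoxToPx('0 -694 1500 900', 16);

With 16 pixels per em, a :attr:`viewBox` that is 1500 units wide and
900 units high gives a width of ``'24px'`` and a height of
``'14.4px'``.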

-----

.. _filter-autobold:

An Autobold Filter
==================

This configuration implements a substitute for the v2 `autobold` extension.

.. code-block:: js

   MathJax = {
     tex: {
       preFilters: [
         ({math}) => {
           const styles = window.getComputedStyle(math.start.node.parentNode);
           if (styles.fontWeight >= 700 && !math.inputData.bolded) {
             math.math = '\\boldsymbol{' + math.math + '}';
             math.inputData.bolded = true;
           }
         }
       ]
     }
   };

It uses a TeX input jax pre-filter that tests if the parent element of
the math string has CSS with ``font-weight`` of 700 or more (the
usual ``bold`` value), and if so, it wraps the TeX code in
``\boldsymbol{...}`` to make it bold.  Note, however, that any notation
in the expression that is already bold does not become extra bold, so
it may not be distinguishable from the rest of the expression.

We track the fact that bolding has been added using the
:data:`inputData` object of the :data:`math` object.  That way, if the
expression needs to be reparsed (e.g., for a ``\require`` command, or
other dynamic data being loaded), we won't add ``\boldsymbol`` more
than once.
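
The effect of the guard can be seen with a plain object standing in
for the math item.  This is a sketch only: ``autobold`` is a
hypothetical helper, and the ``isBold`` flag replaces the computed
style lookup so that the code runs without a DOM.

.. code-block:: js

   //
   //  Wrap the TeX in \boldsymbol{} once, recording that fact in
   //  inputData so that a second pass leaves the string alone.
   //
   function autobold(math, isBold) {
     if (isBold && !math.inputData.bolded) {
       math.math = '\\boldsymbol{' + math.math + '}';
       math.inputData.bolded = true;
     }
   }

   const item = {math: 'x+y', inputData: {}};
   autobold(item, true);   // first pass adds \boldsymbol
   autobold(item, true);   // a simulated reparse changes nothing

After both calls, :data:`item.math` is ``\boldsymbol{x+y}``, with a
single level of wrapping.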

-----

.. _filter-mathvariant:

Convert Mathvariant to Unicode
==============================

This example is more complex, and demonstrates a way to convert the
use of the :attr:`mathvariant` attribute on the internal MathML token
elements to their Unicode equivalents in the Mathematical
Alphanumerics block.  Because MathML-Core (the version of MathML
implemented in most browsers) does not include support for
:attr:`mathvariant` (except as :attr:`mathvariant="normal"` on
single-character ``mi`` elements to prevent the automatic
italicization of the character), this may be useful for cases where
you want to produce MathML expressions for use with a browser's native
MathML-Core support.  Using this together with the :ref:`native MathML
output <NativeMML>` example would make that output more effective in
browsers that implement MathML-Core.

.. code-block:: js

   MathJax = {
     startup: {
       ready() {
         //
         //  The numeric ranges for numbers, uppercase alphabet, lowercase alphabet,
         //  uppercase Greek, and lowercase Greek, with optional remapping of some
         //  characters into the (relative) positions used in the Math Alphanumeric block.
         //
         const ranges = [
           [0x30, 0x39],
           [0x41, 0x5A],
           [0x61, 0x7A],
           [0x391, 0x3A9, {0x3F4: 0x3A2, 0x2207: 0x3AA}],
           [0x3B1, 0x3C9, {0x2202: 0x3CA, 0x3F5: 0x3CB, 0x3D1: 0x3CC,
                           0x3F0: 0x3CD, 0x3D5: 0x3CE, 0x3F1: 0x3CF, 0x3D6: 0x3D0}],
         ];
         //
         //  The starting values for numbers, Alpha, alpha, Greek, and greek for the variants
         //
         const variants = {
           bold: [0x1D7CE, 0x1D400, 0x1D41A, 0x1D6A8, 0x1D6C2],
           italic: [0, 0x1D434, 0x1D44E, 0x1D6E2, 0x1D6FC, {0x68: 0x210E}],
           'bold-italic': [0, 0x1D468, 0x1D482, 0x1D71C, 0x1D736],
           script: [0, 0x1D49C, 0x1D4B6, 0, 0, {
             0x42: 0x212C, 0x45: 0x2130, 0x46: 0x2131, 0x48: 0x210B, 0x49: 0x2110,
             0x4C: 0x2112, 0x4D: 0x2133, 0x52: 0x211B, 0x65: 0x212F, 0x67: 0x210A,
             0x6F: 0x2134,
           }],
           'bold-script': [0, 0x1D4D0, 0x1D4EA, 0, 0],
           fraktur: [0, 0x1D504, 0x1D51E, 0, 0, {
             0x43: 0x212D, 0x48: 0x210C, 0x49: 0x2111, 0x52: 0x211C, 0x5A: 0x2128,
           }],
           'bold-fraktur': [0, 0x1D56C, 0x1D586, 0, 0],
           'double-struck': [0x1D7D8, 0x1D538, 0x1D552, 0, 0, {
             0x43: 0x2102, 0x48: 0x210D, 0x4E: 0x2115, 0x50: 0x2119, 0x51: 0x211A,
             0x52: 0x211D, 0x5A: 0x2124,
             0x393: 0x213E, 0x3A0: 0x213F, 0x3B3: 0x213D, 0x3C0: 0x213C,
           }],
           'sans-serif': [0x1D7E2, 0x1D5A0, 0x1D5BA, 0, 0],
           'bold-sans-serif': [0x1D7EC, 0x1D5D4, 0x1D5EE, 0x1D756, 0x1D770],
           'sans-serif-italic': [0, 0x1D608, 0x1D622, 0, 0],
           'sans-serif-bold-italic': [0, 0x1D63C, 0x1D656, 0x1D790, 0x1D7AA],
           monospace: [0x1D7F6, 0x1D670, 0x1D68A, 0, 0],
           '-tex-calligraphic': [0, 0x1D49C, 0x1D4B6, 0, 0, {
             0x42: 0x212C, 0x45: 0x2130, 0x46: 0x2131, 0x48: 0x210B, 0x49: 0x2110,
             0x4C: 0x2112, 0x4D: 0x2133, 0x52: 0x211B, 0x65: 0x212F, 0x67: 0x210A,
             0x6F: 0x2134,
           }, '\uFE00'],
           '-tex-bold-calligraphic': [0, 0x1D4D0, 0x1D4EA, 0, 0, {}, '\uFE00'],
           '-tex-mathit': [0, 0x1D434, 0x1D44E, 0x1D6E2, 0x1D6FC, {0x68: 0x210E}],
         };
         //
         // Styles to use for characters that can't be translated.
         //
         const variantStyles = {
           bold: 'font-weight: bold',
           italic: 'font-style: italic',
           'bold-italic': 'font-weight: bold; font-style: italic',
           'script': 'font-family: cursive',
           'bold-script': 'font-family: cursive; font-weight: bold',
           'sans-serif': 'font-family: sans-serif',
           'bold-sans-serif': 'font-family: sans-serif; font-weight: bold',
           'sans-serif-italic': 'font-family: sans-serif; font-style: italic',
           'sans-serif-bold-italic': 'font-family: sans-serif; font-weight: bold; font-style: italic',
           'monospace': 'font-family: monospace',
           '-tex-mathit': 'font-style: italic',
         };
         //
         //  The filter function
         //
         function unicodeVariants(root) {
           //
           //  Walk the MathML tree for token nodes with mathvariant attributes
           //
           root.walkTree((node) => {
             if (!node.isToken || !node.attributes.isSet('mathvariant')) return;
             //
             //  Get the variant and the unicode characters of the element
             //
             const variant =
               node.attributes.get('data-mjx-variant') ?? node.attributes.get('mathvariant');
             const text = [...node.getText()];
             //
             //  Skip the only valid case in MathML-Core and any invalid variants
             //
             if (variant === 'normal' && node.isKind('mi') && text.length === 1) return;
             node.attributes.unset('mathvariant');
             node.attributes.unset('data-mjx-variant');
             if (!Object.hasOwn(variants, variant)) return;
             //
             //  Get the variant data
             //
             const start = variants[variant];
             const remap = start[5] || {};
             const modifier = start[6] || '';
             //
             //  Convert the text of the child nodes
             //
             let converted = true;
             for (const child of node.childNodes) {
               if (child.isKind('text')) {
                 converted &= convertText(child, start, remap, modifier);
               }
             }
             //
             // If not all characters were converted, add styles, if possible,
             // but not when it would already be in italics.
             //
             if (!converted &&
                 !(['italic', '-tex-mathit'].includes(variant) && text.length === 1 && node.isKind('mi'))) {
               addStyles(node, variant);
             }
           });
         }
         //
         //  Convert the content of a text node
         //
         function convertText(node, start, remap, modifier) {
           //
           //  Get the text
           //
           const text = [...node.getText()];
           //
           //  Loop through the characters in the text
           //
           let converted = 0;
           for (let i = 0; i < text.length; i++) {
             let C = text[i].codePointAt(0);
             //
             //  Check if the character is in one of the ranges
             //
             for (const j of [0, 1, 2, 3, 4]) {
               const [m, M, map = {}] = ranges[j];
               if (!start[j]) continue;
               if (C < m) break;
               //
               //  Set the new character based on the remappings and
               //  starting location for the range
               //
               if (map[C]) {
                 text[i] = String.fromCodePoint(map[C] - m + start[j]) + modifier;
                 converted++;
                 break;
               } else if (remap[C] || C <= M) {
                 text[i] = String.fromCodePoint(remap[C] || C - m + start[j]) + modifier;
                 converted++;
                 break;
               }
             }
           }
           //
           //  Put back the modified text content
           //
           node.setText(text.join(''));
           //
           // Return true if all characters were converted, false otherwise.
           //
           return converted === text.length;
         }
         //
         // Add styles when conversion isn't possible.
         //
         function addStyles(node, variant) {
           let styles = variantStyles[variant];
           if (styles) {
             if (node.attributes.hasExplicit('style')) {
               styles = node.attributes.get('style') + '; ' + styles;
             }
             node.attributes.set('style', styles);
           }
         }

         //
         //  Add the post-filters to all input jax
         //
         MathJax.startup.defaultReady();
         for (const jax of MathJax.startup.document.inputJax) {
           jax.postFilters.add(({data}) => unicodeVariants(data.root || data));
         }
       }
     }
   };

This example adds a post-filter to each of the input jax that are
loaded (so it will work with both MathML and TeX input).  The filter
walks the internal MathML tree looking for token
elements with :attr:`mathvariant` attributes, and then converts the
content of the child text nodes of those token nodes to use the proper
Unicode values for any alphabetic, numeric, or Greek characters that
can be represented using the Mathematical Alphanumeric and Letterlike
Symbols blocks.  If any characters can't be converted to something in
these blocks, we use a :attr:`style` attribute, when possible, to
simulate the proper output.

The :data:`ranges` variable gives the character ranges that will be
converted, the :data:`variants` object gives the data needed to map
those ranges to the various Mathematical Alphanumerics characters for
the different :attr:`mathvariant` values, and the
:data:`variantStyles` object holds the styles that are applied for
each variant when some characters can't be converted.
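
The core of the mapping is an offset calculation: a character's
position within its source range is added to the starting code point
for the variant.  The following stand-alone sketch shows this for the
``bold`` variant only, using the first three ranges and the bold
starting values from the example; ``toBold`` is a hypothetical
helper, and the remapping tables are left out.

.. code-block:: js

   //  Digits, uppercase letters, and lowercase letters
   const ranges = [[0x30, 0x39], [0x41, 0x5A], [0x61, 0x7A]];
   //  Starting code points of the bold digits and letters
   const bold = [0x1D7CE, 0x1D400, 0x1D41A];

   //
   //  Convert one character to its bold equivalent, or return it
   //  unchanged if it is not in one of the ranges.
   //
   function toBold(char) {
     const C = char.codePointAt(0);
     for (let j = 0; j < ranges.length; j++) {
       const [m, M] = ranges[j];
       if (C >= m && C <= M) return String.fromCodePoint(C - m + bold[j]);
     }
     return char;
   }

   const boldText = [...'x2A+'].map(toBold).join('');
   const codePoints = [...boldText].map((c) => c.codePointAt(0));

The resulting code points are U+1D431, U+1D7D0, U+1D400, and U+002B:
the letter and digit characters move into the Mathematical
Alphanumerics block, while the ``+`` is left alone.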

The special ``-tex-calligraphic`` and ``-tex-bold-calligraphic``
variants are used internally in MathJax to produce the Chancery
calligraphic variant (as opposed to the Roundhand script variant), but
Unicode does not distinguish between these two, and the result of the
``script`` and ``bold-script`` variants is font dependent.  The
`current mechanism <https://w3c.github.io/xml-entities/script.html>`__
to distinguish between these two in Unicode is to use the Unicode
variation selectors U+FE00 and U+FE01.  The code here adds U+FE00
for the TeX calligraphic variants.  You may wish to add U+FE01 to the
script variants to explicitly request the Roundhand versions as well.
Note, however, that not all fonts support these variation selectors, so
you may get the same characters in both cases, and which you get will
depend on the font.  Some browsers may also show unknown-character
glyphs for these selectors when they don't understand how to
process them.
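
As a small illustration of what the modifier does, the following
sketch builds the two forms of a script capital A by hand.  The
constant names are hypothetical; in the filter above, the same effect
comes from the optional seventh entry of a :data:`variants` array,
which is appended after each converted character.

.. code-block:: js

   const CHANCERY = '\uFE00';    // selector used for the TeX calligraphic variants
   const ROUNDHAND = '\uFE01';   // selector that could be added to the script variants

   const scriptA = String.fromCodePoint(0x1D49C);   // MATHEMATICAL SCRIPT CAPITAL A
   const chanceryA = scriptA + CHANCERY;
   const roundhandA = scriptA + ROUNDHAND;

   //  Each result is the base character followed by its selector
   const roundhandCodes = [...roundhandA].map((c) => c.codePointAt(0));

Requesting Roundhand forms in the filter would then amount to adding
``'\uFE01'`` as the seventh entry of the ``script`` and
``bold-script`` arrays (with an empty remapping object as the sixth
entry where there is none).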


|-----|