MathJax Pre- and Post-Filters

Another means of hooking into MathJax’s typesetting pipeline is via pre- and post-filters associated with MathJax’s input and output jax. These are prioritized lists of functions that run either before or after the jax processes a MathItem, and they can be used to pre-process or post-process MathJax’s compiling and typesetting functions. Input and output jax have both pre- and post-filters, and the MathML input jax has an extra set of filters for the parsed MathML as well.

When using the MathJax Components framework, you can use the MathJax configuration object to specify input and output jax filters. The preFilters and postFilters configuration options in the tex, mathml, output, chtml, or svg blocks allow you to specify arrays of filters (or filters together with their priorities). See the MathJax Configuration Options section for details.

When using direct access to the MathJax modules in node applications, to add a pre- or post-filter to an input jax use

InputJax.preFilters.add(fn, priority)

InputJax.postFilters.add(fn, priority)

Arguments:

fn: (arg)=>boolean|void – The filter function to be called. The arg argument is an object with three keys: math, document, and data. The values for these keys are the MathItem being processed, the MathDocument containing that math item, and jax-specific additional data. If the function returns false, then any additional filters are cancelled.
priority – The numeric priority of the filter, where lower numbers are executed first. This lets you insert functions anywhere in the filter list.
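The ordering and cancellation rules described above can be illustrated with a small stand-alone model. This is only a sketch: MiniFilterList is a hypothetical class written for this example, not MathJax's own implementation, and the default priority of 5 and the handling of equal priorities (insertion order) are assumptions.

```javascript
// Hypothetical model of a prioritized filter list (not MathJax's actual class).
class MiniFilterList {
  constructor() {
    this.items = [];
  }

  // Insert fn so that lower priority numbers come first; filters with equal
  // priority keep the order in which they were added (an assumption).
  add(fn, priority = 5) {
    let i = this.items.findIndex((item) => item.priority > priority);
    if (i < 0) i = this.items.length;
    this.items.splice(i, 0, {fn, priority});
    return fn;
  }

  // Run the filters in order; a filter that returns false cancels the rest.
  execute(arg) {
    for (const {fn} of this.items) {
      if (fn(arg) === false) return false;
    }
    return true;
  }
}

const log = [];
const filters = new MiniFilterList();
filters.add(() => { log.push('late'); }, 10);
filters.add(() => { log.push('early'); }, 1);
filters.add(() => { log.push('stop'); return false; }, 5);
const completed = filters.execute({math: null, document: null, data: null});
```

Here the filter with priority 1 runs first, the one with priority 5 runs next and returns false, and the one with priority 10 is never called.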

For the TeX input jax, the data item is the ParseOptions object for the input jax, which holds configuration data about the TeX input jax.

For the MathML input jax, the pre-filter only runs in the case that the MathML is a serialized MathML string, as it is when converting a MathML string, or when the forceReparse option is true. The post-filter’s data is the root <math> element of the internal MathML tree of the MathML expression. For the MathML input jax, there is also a third filter:

InputJax.mmlFilters.add(fn, priority)

This runs on the MathML DOM tree, either from the document itself, or the one obtained by parsing a serialized MathML string, before the input jax converts the MathML into MathJax’s internal format. The data in this case is the MathML DOM tree.

The AsciiMath input jax does not currently execute any pre- or post-filters.

For an output jax, the pre- and post-filters can be added via

OutputJax.preFilters.add(fn, priority)
OutputJax.postFilters.add(fn, priority)

with arguments as above. In this case, the data is the mjx-container node in which the output DOM elements have been placed. This will become the MathItem.typesetRoot value, but it has not yet been set when the filters run.

In an application that is using MathJax Components, the input jax can be obtained from MathJax.startup.document.inputJax.tex or MathJax.startup.document.inputJax.mml, and the output jax from MathJax.startup.document.outputJax. For applications using direct access to the MathJax modules, the input and output jax will have been instantiated by hand, so you should already have access to them; if not, then they can be obtained from the MathDocument instance returned by mathjax.document() by using that in place of MathJax.startup.document above.
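As a rough sketch of how these pieces fit together, the function below adds one TeX pre-filter and one output post-filter at run time. The name addRuntimeFilters, the priorities 1 and 20, and the data-display attribute are all hypothetical choices made for this illustration; the post-filter also assumes a browser DOM, where the mjx-container node has a setAttribute() method.

```javascript
// Sketch: add filters at run time, given the MathJax global (or any object
// with the same startup.document shape). Names and priorities are illustrative.
function addRuntimeFilters(mathjax) {
  const doc = mathjax.startup.document;

  // The TeX input jax may not be loaded, so check before using it.
  const tex = doc.inputJax.tex;
  if (tex) {
    // Trim surrounding whitespace from the TeX source before it is parsed.
    tex.preFilters.add(({math}) => {
      math.math = math.math.trim();
    }, 1);
  }

  // Record on the container whether the expression is in display mode.
  doc.outputJax.postFilters.add(({math, data}) => {
    data.setAttribute('data-display', String(math.display));
  }, 20);
}

// One possible call site, assuming MathJax Components: inside a
// startup.ready() function, after MathJax.startup.defaultReady() has run:
//   addRuntimeFilters(MathJax);
```

Filters added this way only affect expressions that are processed after the call.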

Allowing Spaces in Numbers

Here is an example of using a TeX input filter to allow numbers to be entered that contain spaces, but where the spaces are removed in the output. That is, $12 345$ will be parsed as a single number and displayed as 12345.

MathJax = {
  tex: {
    numberPattern: /^(?:[0-9]+(?:(?: +|\{,\})[0-9]+)*(?:\.[0-9]*)?|\.[0-9]+)/,
    postFilters: [
      ({data}) => {
        for (const mn of data.getList('mn')) {
          const textNode = mn.childNodes[0];
          textNode.text = textNode.text.replace(/ /g, '');
        }
      }
    ],
  },
};

We set the numberPattern option to allow spaces within the number, and then use a post-filter to remove the spaces from the text of any mn elements that were produced during the TeX processing.
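To see what the pattern accepts, it can be tried outside of MathJax. The sketch below applies the same regular expression to a few strings; the helper names leadingNumber and stripSpaces are hypothetical and exist only for this illustration.

```javascript
// The same pattern as in the configuration above.
const numberPattern = /^(?:[0-9]+(?:(?: +|\{,\})[0-9]+)*(?:\.[0-9]*)?|\.[0-9]+)/;

// Return the number found at the start of a TeX string, or null if none.
function leadingNumber(tex) {
  const match = tex.match(numberPattern);
  return match ? match[0] : null;
}

// What the post-filter does to the text of each mn element.
function stripSpaces(text) {
  return text.replace(/ /g, '');
}

const examples = ['12 345 + x', '1{,}000.5', '3 x', '.75x', 'x1'].map(leadingNumber);
// examples: ['12 345', '1{,}000.5', '3', '.75', null]
```

Note that a space is only absorbed into the number when more digits follow it, so `3 x` still yields the number 3 followed by a separate x.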

Converting Full-Width Characters to ASCII Equivalents

This filter converts any character in the Unicode full-width ASCII range (U+FF01 – U+FF5E) to its ASCII equivalent, leading to better quality output.

MathJax = {
  tex: {
    preFilters: [
      ({math}) => {
        math.math = math.math.replace(/[\uFF01-\uFF5E]/g,
          (c) => String.fromCodePoint(c.codePointAt(0) - 0xFF00 + 0x20));
      }
    ]
  }
};

This uses a pre-filter to replace characters in the full-width range by an equivalent one in the usual ASCII character range. This will allow numbers to be properly combined by TeX, for example, where the full-width versions would be treated as individual characters.
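The character arithmetic can be checked on its own. In this sketch the replacement is pulled out into a stand-alone function (the name fullWidthToAscii is hypothetical) so that it can be run without MathJax.

```javascript
// Map U+FF01..U+FF5E onto U+0021..U+007E by shifting the code point.
function fullWidthToAscii(text) {
  return text.replace(/[\uFF01-\uFF5E]/g,
    (c) => String.fromCodePoint(c.codePointAt(0) - 0xFF00 + 0x20));
}

// Full-width "x=12", written with escapes so the source stays ASCII.
const fullWidthSample = '\uFF58\uFF1D\uFF11\uFF12';
const asciiResult = fullWidthToAscii(fullWidthSample);   // 'x=12'
```

Characters outside that range, such as the ideographic space U+3000, are left untouched by this function.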

Converting Unicode Numeric Superscripts to TeX Ones

The following filter converts Unicode pseudo-script numbers (like those in the Superscripts and Subscripts block) to actual TeX super- and subscripts.

MathJax = {
  //
  // The pseudoscript numbers 0 through 9, and a pattern for plus-or-minus a number
  //
  scripts: '\u2070\u00B9\u00B2\u00B3\u2074\u2075\u2076\u2077\u2078\u2079',
  scriptRE: /([\u207A\u207B])?([\u2070\u00B9\u00B2\u00B3\u2074-\u2079]+)/g,

  tex: {
    preFilters: [
      ({math}) => {
        math.math = math.math.replace(MathJax.config.scriptRE, (match, pm, n) => {
          const N = n.split('').map(c => MathJax.config.scripts.indexOf(c));  // convert digits
          pm === '\u207A' && N.unshift('+');     // add plus, if given
          pm === '\u207B' && N.unshift('-');     // add minus, if given
          return '^{' + N.join('') + '}';        // make it an actual power
        });
      }
    ]
  }
};

This uses a TeX input jax pre-filter to scan the TeX expression for Unicode superscript numerals, with optional plus or minus signs, and replace them with ASCII numerals inside braces with a ^ to make them actual TeX superscripts.

The filter could be extended to process subscripts in a similar fashion.
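One way such an extension might look is sketched below as a stand-alone function. It relies on the subscript digits occupying the contiguous range U+2080 to U+2089, with U+208A and U+208B as the subscript plus and minus signs; the names SUB_RE and subscriptsToTeX are hypothetical.

```javascript
// Optional subscript sign followed by one or more subscript digits.
const SUB_RE = /([\u208A\u208B])?([\u2080-\u2089]+)/g;

// Replace runs of Unicode subscript digits by TeX subscripts.
function subscriptsToTeX(tex) {
  return tex.replace(SUB_RE, (match, pm, digits) => {
    const sign = pm === '\u208A' ? '+' : pm === '\u208B' ? '-' : '';
    // The digits are contiguous, so the value is the offset from U+2080.
    const ascii = [...digits].map((c) => c.codePointAt(0) - 0x2080).join('');
    return '_{' + sign + ascii + '}';
  });
}

const subResult = subscriptsToTeX('x\u2081\u2082 + a\u208B\u2083');
// subResult: 'x_{12} + a_{-3}'
```

Inside a pre-filter, this would be applied as `math.math = subscriptsToTeX(math.math)`, alongside the superscript replacement shown above.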

Converting SVG Size from Ex to Px Units

The SVG output jax sets the <svg> element width and height attributes using ex units, so the SVG will scale to the size of the surrounding font automatically. This filter converts those measurements to px units instead.

MathJax = {
  svg: {
    postFilters: [
      ({data}) => {
        const fixed = MathJax.startup.document.outputJax.fixed;
        const svg = data.querySelector('svg');
        if (svg?.hasAttribute('viewBox')) {
          const [ , , w, h] = svg.getAttribute('viewBox').split(/ /);
          const em = MathJax.startup.document.outputJax.pxPerEm / 1000;
          svg.setAttribute('width', fixed(w * em) + 'px');
          svg.setAttribute('height', fixed(h * em) + 'px');
        }
      }
    ]
  }
};

We use an output jax post-filter to modify the svg element’s attributes, taking advantage of the output jax’s fixed() method to obtain a limited number of decimal places. The width and height are determined from the viewBox attribute, whose values are in thousandths of an em in the SVG output, which is why the code divides pxPerEm by 1000.
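The unit conversion itself is simple arithmetic, sketched here as a pure function. The name viewBoxToPx and the three-decimal rounding helper are hypothetical stand-ins (the filter above uses the output jax's own fixed() method), and the viewBox values used are made up for illustration.

```javascript
// Convert the width and height of a viewBox string to px strings, given
// the number of pixels per em. The scale is pxPerEm / 1000, as in the filter.
function viewBoxToPx(viewBox, pxPerEm) {
  const [, , w, h] = viewBox.trim().split(/\s+/).map(Number);
  const scale = pxPerEm / 1000;
  // Stand-in for fixed(): round to at most three decimal places.
  const round3 = (x) => String(Math.round(x * 1000) / 1000);
  return {width: round3(w * scale) + 'px', height: round3(h * scale) + 'px'};
}

const size = viewBoxToPx('0 -750 1000 500', 16);
// size: {width: '16px', height: '8px'}
```

With 16 pixels per em, a viewBox width of 1000 comes out as 16px and a height of 500 as 8px.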

An Autobold Filter

This configuration implements a substitute for the v2 autobold extension.

MathJax = {
  tex: {
    preFilters: [
      ({math}) => {
        const styles = window.getComputedStyle(math.start.node.parentNode);
        if (styles.fontWeight >= 700 && !math.inputData.bolded) {
          math.math = '\\boldsymbol{' + math.math + '}';
          math.inputData.bolded = true;
        }
      }
    ]
  }
};

It uses a TeX input jax pre-filter that tests whether the parent element of the math string has a CSS font-weight of 700 or more (the usual bold value), and if so, it wraps the TeX code in \boldsymbol{...} to make it bold. Note, however, that if the expression itself includes bold notation, that notation does not become extra bold, so it may not be distinguishable from the rest of the expression.

We track the fact that bolding has been added using the inputData object of the math object. That way, if the expression needs to be reparsed (e.g., for a \require command, or other dynamic data being loaded), we won’t add \boldsymbol more than once.
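The effect of that guard can be seen in isolation. In this sketch a plain object stands in for the MathItem, and the font weight is passed in as a string rather than read from the page; the function name autobold and the object shape are assumptions made for the example.

```javascript
// Wrap the TeX source in \boldsymbol{...} at most once per item.
function autobold(item, fontWeight) {
  if (Number(fontWeight) >= 700 && !item.inputData.bolded) {
    item.math = '\\boldsymbol{' + item.math + '}';
    item.inputData.bolded = true;   // remember that the wrapper was added
  }
  return item;
}

// Simulate the filter running twice on the same item, as in a reparse.
const boldItem = {math: 'x+y', inputData: {}};
autobold(boldItem, '700');
autobold(boldItem, '700');
// boldItem.math: '\\boldsymbol{x+y}'
```

Without the bolded flag, the second call would nest one \boldsymbol inside another.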

Convert Mathvariant to Unicode

This example is more complex, and demonstrates a way to convert the use of the mathvariant attribute on the internal MathML token elements to their Unicode equivalents in the Mathematical Alphanumerics block. Because MathML-Core (the version of MathML implemented in most browsers) does not include support for mathvariant (except as mathvariant="normal" on single-character mi elements to prevent the automatic italicization of the character), this may be useful for cases where you want to produce MathML expressions for use with a browser’s native MathML-Core support. Using this together with the native MathML output example would make that output more effective in browsers that implement MathML-Core.

MathJax = {
  startup: {
    ready() {
      //
      //  The numeric ranges for numbers, uppercase alphabet, lowercase alphabet,
      //  uppercase Greek, and lowercase Greek, with optional remapping of some
      //  characters into the (relative) positions used in the Math Alphanumeric block.
      //
      const ranges = [
        [0x30, 0x39],
        [0x41, 0x5A],
        [0x61, 0x7A],
        [0x391, 0x3A9, {0x3F4: 0x3A2, 0x2207: 0x3AA}],
        [0x3B1, 0x3C9, {0x2202: 0x3CA, 0x3F5: 0x3CB, 0x3D1: 0x3CC,
                        0x3F0: 0x3CD, 0x3D5: 0x3CE, 0x3F1: 0x3CF, 0x3D6: 0x3D0}],
      ];
      //
      //  The starting values for numbers, Alpha, alpha, Greek, and greek for the variants
      //
      const variants = {
        bold: [0x1D7CE, 0x1D400, 0x1D41A, 0x1D6A8, 0x1D6C2],
        italic: [0, 0x1D434, 0x1D44E, 0x1D6E2, 0x1D6FC, {0x68: 0x210E}],
        'bold-italic': [0, 0x1D468, 0x1D482, 0x1D71C, 0x1D736],
        script: [0, 0x1D49C, 0x1D4B6, 0, 0, {
          0x42: 0x212C, 0x45: 0x2130, 0x46: 0x2131, 0x48: 0x210B, 0x49: 0x2110,
          0x4C: 0x2112, 0x4D: 0x2133, 0x52: 0x211B, 0x65: 0x212F, 0x67: 0x210A,
          0x6F: 0x2134,
        }],
        'bold-script': [0, 0x1D4D0, 0x1D4EA, 0, 0],
        fraktur: [0, 0x1D504, 0x1D51E, 0, 0, {
          0x43: 0x212D, 0x48: 0x210C, 0x49: 0x2111, 0x52: 0x211C, 0x5A: 0x2128,
        }],
        'bold-fraktur': [0, 0x1D56C, 0x1D586, 0, 0],
        'double-struck': [0x1D7D8, 0x1D538, 0x1D552, 0, 0, {
          0x43: 0x2102, 0x48: 0x210D, 0x4E: 0x2115, 0x50: 0x2119, 0x51: 0x211A,
          0x52: 0x211D, 0x5A: 0x2124,
          0x393: 0x213E, 0x3A0: 0x213F, 0x3B3: 0x213D, 0x3C0: 0x213C,
        }],
        'sans-serif': [0x1D7E2, 0x1D5A0, 0x1D5BA, 0, 0],
        'bold-sans-serif': [0x1D7EC, 0x1D5D4, 0x1D5EE, 0x1D756, 0x1D770],
        'sans-serif-italic': [0, 0x1D608, 0x1D622, 0, 0],
        'sans-serif-bold-italic': [0, 0x1D63C, 0x1D656, 0x1D790, 0x1D7AA],
        monospace: [0x1D7F6, 0x1D670, 0x1D68A, 0, 0],
        '-tex-calligraphic': [0, 0x1D49C, 0x1D4B6, 0, 0, {
          0x42: 0x212C, 0x45: 0x2130, 0x46: 0x2131, 0x48: 0x210B, 0x49: 0x2110,
          0x4C: 0x2112, 0x4D: 0x2133, 0x52: 0x211B, 0x65: 0x212F, 0x67: 0x210A,
          0x6F: 0x2134,
        }, '\uFE00'],
        '-tex-bold-calligraphic': [0, 0x1D4D0, 0x1D4EA, 0, 0, {}, '\uFE00'],
        '-tex-mathit': [0, 0x1D434, 0x1D44E, 0x1D6E2, 0x1D6FC, {0x68: 0x210E}],
      };
      //
      // Styles to use for characters that can't be translated.
      //
      const variantStyles = {
        bold: 'font-weight: bold',
        italic: 'font-style: italic',
        'bold-italic': 'font-weight: bold; font-style: italic',
        'script': 'font-family: cursive',
        'bold-script': 'font-family: cursive; font-weight: bold',
        'sans-serif': 'font-family: sans-serif',
        'bold-sans-serif': 'font-family: sans-serif; font-weight: bold',
        'sans-serif-italic': 'font-family: sans-serif; font-style: italic',
        'sans-serif-bold-italic': 'font-family: sans-serif; font-weight: bold; font-style: italic',
        'monospace': 'font-family: monospace',
        '-tex-mathit': 'font-style: italic',
      };
      //
      //  The filter function
      //
      function unicodeVariants(root) {
        //
        //  Walk the MathML tree for token nodes with mathvariant attributes
        //
        root.walkTree((node) => {
          if (!node.isToken || !node.attributes.isSet('mathvariant')) return;
          //
          //  Get the variant and the unicode characters of the element
          //
          const variant =
            node.attributes.get('data-mjx-variant') ?? node.attributes.get('mathvariant');
          const text = [...node.getText()];
          //
          //  Skip the only valid case in MathML-Core and any invalid variants
          //
          if (variant === 'normal' && node.isKind('mi') && text.length === 1) return;
          node.attributes.unset('mathvariant');
          node.attributes.unset('data-mjx-variant');
          if (!Object.hasOwn(variants, variant)) return;
          //
          //  Get the variant data
          //
          const start = variants[variant];
          const remap = start[5] || {};
          const modifier = start[6] || '';
          //
          //  Convert the text of the child nodes
          //
          let converted = true;
          for (const child of node.childNodes) {
            if (child.isKind('text')) {
              converted &= convertText(child, start, remap, modifier);
            }
          }
          //
          // If not all characters were converted, add styles, if possible,
          // but not when it would already be in italics.
          //
          if (!converted &&
              !(['italic', '-tex-mathit'].includes(variant) && text.length === 1 && node.isKind('mi'))) {
            addStyles(node, variant);
          }
        });
      }
      //
      //  Convert the content of a text node
      //
      function convertText(node, start, remap, modifier) {
        //
        //  Get the text
        //
        const text = [...node.getText()];
        //
        //  Loop through the characters in the text
        //
        let converted = 0;
        for (let i = 0; i < text.length; i++) {
          let C = text[i].codePointAt(0);
          //
          //  Check if the character is in one of the ranges
          //
          for (const j of [0, 1, 2, 3, 4]) {
            const [m, M, map = {}] = ranges[j];
            if (!start[j]) continue;
            if (C < m) break;
            //
            //  Set the new character based on the remappings and
            //  starting location for the range
            //
            if (map[C]) {
              text[i] = String.fromCodePoint(map[C] - m + start[j]) + modifier;
              converted++;
              break;
            } else if (remap[C] || C <= M) {
              text[i] = String.fromCodePoint(remap[C] || C - m + start[j]) + modifier;
              converted++;
              break;
            }
          }
        }
        //
        //  Put back the modified text content
        //
        node.setText(text.join(''));
        //
        // Return true if all characters were converted, false otherwise.
        //
        return converted === text.length;
      }
      //
      // Add styles when conversion isn't possible.
      //
      function addStyles(node, variant) {
        let styles = variantStyles[variant];
        if (styles) {
          if (node.attributes.hasExplicit('style')) {
            styles = node.attributes.get('style') + '; ' + styles;
          }
          node.attributes.set('style', styles);
        }
      }

      //
      //  Add the post-filters to all input jax
      //
      MathJax.startup.defaultReady();
      for (const jax of MathJax.startup.document.inputJax) {
        jax.postFilters.add(({data}) => unicodeVariants(data.root || data));
      }
    }
  }
};

This example adds a post-filter to each of the input jax that are loaded (so it will work with both the MathML input as well as TeX input). The filter walks the internal MathML tree looking for token elements with mathvariant attributes, and then converts the content of the child text nodes of those token nodes to use the proper Unicode values for any alphabetic, numeric, or Greek characters that can be represented using the Mathematical Alphanumeric and Letterlike Symbols blocks. If any characters can’t be converted to something in these blocks, we use a style attribute, when possible, to simulate the proper output.

The ranges variable gives the character ranges that will be converted, the variants object gives the data needed to map those ranges to the various Mathematical Alphanumerics characters for the different mathvariant values, and the variantStyles object holds the styles that need to be applied for each variant.
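The core of the mapping is an offset calculation: a character's position within its range is added to the starting code point for the variant. The sketch below shows this for the bold variant only, using the first three ranges and the bold starting values from the tables above; the names RANGES, BOLD, and toBold are hypothetical, and the remapping tables and Greek ranges are left out to keep it short.

```javascript
// Digits, uppercase, and lowercase ranges, with the bold starting points.
const RANGES = [[0x30, 0x39], [0x41, 0x5A], [0x61, 0x7A]];
const BOLD = [0x1D7CE, 0x1D400, 0x1D41A];

// Convert each character to its bold Mathematical Alphanumerics equivalent,
// leaving characters outside the three ranges unchanged.
function toBold(text) {
  return [...text].map((ch) => {
    const C = ch.codePointAt(0);
    for (let j = 0; j < RANGES.length; j++) {
      const [m, M] = RANGES[j];
      if (C >= m && C <= M) return String.fromCodePoint(C - m + BOLD[j]);
    }
    return ch;
  }).join('');
}

const boldText = toBold('Ab1');
// boldText: U+1D400 (bold A), U+1D41B (bold b), U+1D7CF (bold 1)
```

The text is split with the spread operator rather than by index because the resulting characters lie outside the Basic Multilingual Plane and occupy two UTF-16 code units each.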

The special -tex-calligraphic and -tex-bold-calligraphic variants are used internally in MathJax to produce the Chancery calligraphic variant (as opposed to the Roundhand script variant), but Unicode does not distinguish between these two, and the result of the script and bold-script variants is font dependent. The current mechanism to distinguish between these two in Unicode is to use the Unicode variation selectors U+FE00 and U+FE01. The code here adds U+FE00 for the TeX calligraphic variants. You may wish to add U+FE01 to the script variants to explicitly request the Roundhand versions as well. Note, however, that not all fonts support these variation selectors, so you may get the same characters in both cases, and which you get will depend on the font. Some browsers may also show unknown-character glyphs for these selector codes when they don’t understand how to process them.