MathJax Pre- and Post-Filters
Another means of hooking into MathJax’s typesetting pipeline is via
pre- and post-filters associated with MathJax’s input and output jax.
These are prioritized lists of functions that run either before or
after the jax processes a MathItem, and they can be used to
pre-process or post-process MathJax’s compiling and typesetting
functions. Input and output jax have both pre- and post-filters, and
the MathML input jax has an extra set of filters for the parsed MathML
as well.
When using the MathJax Components framework, you
can use the MathJax configuration object to specify input and output
jax filters. The preFilters and postFilters
configuration options in the tex, mml,
output, chtml, or svg blocks allow you to
specify arrays of filters (or filters together with their priorities).
See the MathJax Configuration Options section for details.
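As a minimal sketch of these options, the configuration below registers two TeX pre-filters, one as a bare function and one paired with a priority. The filter names trimTeX and collapseSpaces are hypothetical, and the [filter, priority] pair form is an assumption based on the description above; the configuration options section is the place to confirm the exact syntax.

```javascript
// Hypothetical filter: strip leading and trailing whitespace from the TeX source.
const trimTeX = ({math}) => {
  math.math = math.math.trim();
};

// Hypothetical filter: collapse runs of whitespace into a single space.
const collapseSpaces = ({math}) => {
  math.math = math.math.replace(/\s+/g, ' ');
};

// Assumed form: a bare function gets the default priority, while a
// [filter, priority] pair sets it explicitly (lower numbers run first).
// globalThis is used so the snippet also runs outside a browser.
globalThis.MathJax = {
  tex: {
    preFilters: [
      collapseSpaces,
      [trimTeX, 1],
    ],
  },
};
```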
When using direct access to the MathJax modules in node applications, add a pre- or post-filter to an input jax using
- InputJax.preFilters.add(fn, priority)
- InputJax.postFilters.add(fn, priority)
- Arguments:
  - fn ((arg)=>boolean|void) – The filter function to be called. The arg argument is an object with three keys: math, document, and data. The values for these keys are the MathItem being processed, the MathDocument containing that math item, and jax-specific additional data. If the function returns false, any additional filters are cancelled.
  - priority (number) – The numeric priority of the filter, where lower numbers are executed first. This lets you insert functions anywhere in the filter list.
For the TeX input jax, the data item is the
ParseOptions object for the input jax, which holds
configuration data about the TeX input jax.
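A minimal sketch of the direct-access form, assuming you already hold an instantiated input jax. The helper name addEmptyMathGuard and the priorities 1 and 10 are illustrative only. The first filter returns false for empty input, which cancels the filters that would otherwise run after it.

```javascript
// Hypothetical helper: attach two pre-filters to an input jax instance.
function addEmptyMathGuard(inputJax) {
  // Priority 1 runs before priority 10 (lower numbers are executed first).
  inputJax.preFilters.add(({math}) => {
    math.math = math.math.trim();
    if (math.math === '') return false;  // cancel any remaining filters
  }, 1);
  inputJax.preFilters.add(({math}) => {
    math.math = math.math.replace(/\s+/g, ' ');
  }, 10);
}
```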
For the MathML input jax, the pre-filter only runs in the case that
the MathML is a serialized MathML string, as it is when converting a
MathML string, or when the forceReparse
option is true. The post-filter’s data is the root <math>
element of the internal MathML tree of the MathML expression. For the
MathML input jax, there is also a third filter:
- InputJax.mmlFilters.add(fn, priority)
This runs on the MathML DOM tree, either from the document itself, or
the one obtained by parsing a serialized MathML string, before the
input jax converts the MathML into MathJax’s internal format. The
data in this case is the MathML DOM tree.
The AsciiMath input jax does not currently execute any pre- or post-filters.
For an output jax, the pre- and post-filters can be added via
- OutputJax.preFilters.add(fn, priority)
- OutputJax.postFilters.add(fn, priority)
with arguments as above. In this case, the data is the
mjx-container node in which the output DOM elements have been
placed. This will become the MathItem.typesetRoot value, but
it has not yet been set when the filters run.
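As a sketch of an output post-filter, the function below records the original source string on the container node. It assumes the document's DOM adaptor is reachable as document.adaptor on the filter argument and that it provides setAttribute(node, name, value); the attribute name data-source and both function names are made up for illustration.

```javascript
// Hypothetical post-filter: copy the math source onto the mjx-container.
// `data` is the container node. The adaptor is used rather than direct DOM
// calls so the filter is not tied to a browser DOM.
function tagWithSource({math, document, data}) {
  document.adaptor.setAttribute(data, 'data-source', math.math);
}

// Hypothetical helper: register the filter on an output jax instance.
function addSourceTag(outputJax, priority = 10) {
  outputJax.postFilters.add(tagWithSource, priority);
}
```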
In an application that is using MathJax Components, the input jax can
be obtained from MathJax.startup.document.inputJax.tex or
MathJax.startup.document.inputJax.mml, and the output jax
from MathJax.startup.document.outputJax. For applications
using direct access to the MathJax modules, the input and output jax
will have been instantiated by hand, so you should already have access
to them; if not, then they can be obtained from the
MathDocument instance returned by
mathjax.document() by using that in place of
MathJax.startup.document above.
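For example, a filter can be added to a running MathJax Components application by looking up the jax on the startup document. This sketch takes the MathJax global as a parameter so that the lookup is explicit; the function name addTeXPreFilter is hypothetical.

```javascript
// Hypothetical helper: add a pre-filter to the TeX input jax of a running
// MathJax Components application. Pass in the global MathJax object.
function addTeXPreFilter(MathJax, fn, priority = 10) {
  const tex = MathJax.startup.document.inputJax.tex;
  if (!tex) {
    throw new Error('The TeX input jax is not loaded');
  }
  tex.preFilters.add(fn, priority);
  return tex;
}
```

The same pattern applies to inputJax.mml and to outputJax, with postFilters in place of preFilters where appropriate.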
Allowing Spaces in Numbers
Here is an example of using a TeX input filter to allow numbers to be
entered that contain spaces, but where the spaces are removed in the
output. That is, $12 345$ will be parsed as a single number and
displayed as 12345.
MathJax = {
  tex: {
    numberPattern: /^(?:[0-9]+(?:(?: +|\{,\})[0-9]+)*(?:\.[0-9]*)?|\.[0-9]+)/,
    postFilters: [
      ({data}) => {
        for (const mn of data.getList('mn')) {
          const textNode = mn.childNodes[0];
          textNode.text = textNode.text.replace(/ /g, '');
        }
      }
    ],
  },
};
We set the numberPattern option to allow spaces within the
number, and then use a post-filter to remove the spaces from the text
of any mn elements that were produced during the TeX processing.
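To see what the pattern accepts, it can be exercised on its own, outside of MathJax. The helper names and sample strings below are illustrative.

```javascript
// The numberPattern from the configuration above.
const numberPattern = /^(?:[0-9]+(?:(?: +|\{,\})[0-9]+)*(?:\.[0-9]*)?|\.[0-9]+)/;

// Return the number at the start of a TeX string, or null if there is none.
function leadingNumber(tex) {
  const match = tex.match(numberPattern);
  return match ? match[0] : null;
}

// What the post-filter does to the text of each mn element.
function stripSpaces(text) {
  return text.replace(/ /g, '');
}

console.log(leadingNumber('12 345 + x'));  // 12 345
console.log(stripSpaces('12 345'));        // 12345
console.log(leadingNumber('12 x'));        // 12
```

A space is only absorbed into the number when digits follow it, so the space before x in the last case is left for TeX to handle as usual.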
Converting Full-Width Characters to ASCII Equivalents
This filter converts any character in the Unicode full-width ASCII range (U+FF01 – U+FF5E) to its ASCII equivalent, leading to better quality output.
MathJax = {
  tex: {
    preFilters: [
      ({math}) => {
        math.math = math.math.replace(/[\uFF01-\uFF5E]/g,
          (c) => String.fromCodePoint(c.codePointAt(0) - 0xFF00 + 0x20));
      }
    ]
  }
};
This uses a pre-filter to replace characters in the full-width range by an equivalent one in the usual ASCII character range. This will allow numbers to be properly combined by TeX, for example, where the full-width versions would be treated as individual characters.
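The replacement itself is plain string arithmetic, so it can be checked in isolation. The function name toAscii is hypothetical.

```javascript
// Map each full-width character (U+FF01 to U+FF5E) to its ASCII counterpart
// (U+0021 to U+007E), which sits exactly 0xFEE0 code points lower.
function toAscii(tex) {
  return tex.replace(/[\uFF01-\uFF5E]/g,
    (c) => String.fromCodePoint(c.codePointAt(0) - 0xFF00 + 0x20));
}

// Full-width "1", "2", "+", and "x"
console.log(toAscii('\uFF11\uFF12\uFF0B\uFF58'));  // 12+x
```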
Converting Unicode Numeric Superscripts to TeX Ones
The following filter converts Unicode pseudo-script numbers (like those in the Superscripts and Subscripts block) to actual TeX super- and subscripts.
MathJax = {
  //
  // The pseudoscript numbers 0 through 9, and a pattern for plus-or-minus a number
  //
  scripts: '\u2070\u00B9\u00B2\u00B3\u2074\u2075\u2076\u2077\u2078\u2079',
  scriptRE: /([\u207A\u207B])?([\u2070\u00B9\u00B2\u00B3\u2074-\u2079]+)/g,
  tex: {
    preFilters: [
      ({math}) => {
        math.math = math.math.replace(MathJax.config.scriptRE, (match, pm, n) => {
          const N = n.split('').map(c => MathJax.config.scripts.indexOf(c));  // convert digits
          pm === '\u207A' && N.unshift('+');                                  // add plus, if given
          pm === '\u207B' && N.unshift('-');                                  // add minus, if given
          return '^{' + N.join('') + '}';                                     // make it an actual power
        });
      }
    ]
  }
};
This uses a TeX input jax pre-filter to scan the TeX expression for
Unicode superscript numerals, with optional plus or minus signs, and
replace them with ASCII numerals inside braces with a ^ to make
them actual TeX superscripts.
The filter could be extended to process subscripts in a similar fashion.
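One possible shape for that extension is sketched below as a standalone function, using the subscript digits U+2080 to U+2089 and the subscript plus and minus signs U+208A and U+208B. All names here are hypothetical; inside a pre-filter the function would be applied to math.math.

```javascript
// Superscript and subscript digits 0-9, indexed by their numeric value.
const SUPERS = '\u2070\u00B9\u00B2\u00B3\u2074\u2075\u2076\u2077\u2078\u2079';
const SUBS = '\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089';

// An optional sign followed by one or more digits, for each kind of script.
const SUPER_RE = /([\u207A\u207B])?([\u2070\u00B9\u00B2\u00B3\u2074-\u2079]+)/g;
const SUB_RE = /([\u208A\u208B])?([\u2080-\u2089]+)/g;

// Build a replacer that emits op{...} using the given digit table and signs.
function replacer(op, digits, plus, minus) {
  return (match, pm, n) => {
    const sign = pm === plus ? '+' : pm === minus ? '-' : '';
    const N = [...n].map((c) => digits.indexOf(c)).join('');
    return op + '{' + sign + N + '}';
  };
}

// Convert both superscript and subscript numerals in a TeX string.
function convertScripts(tex) {
  return tex
    .replace(SUPER_RE, replacer('^', SUPERS, '\u207A', '\u207B'))
    .replace(SUB_RE, replacer('_', SUBS, '\u208A', '\u208B'));
}

console.log(convertScripts('x\u2082\u00B2 + a\u207B\u00B9'));  // x_{2}^{2} + a^{-1}
```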
Converting SVG Size from Ex to Px Units
The SVG output jax sets the <svg> element width and
height attributes using ex units, so the SVG will scale to
the size of the surrounding font automatically. This filter converts
those measurements to px units instead.
MathJax = {
  svg: {
    postFilters: [
      ({data}) => {
        const fixed = MathJax.startup.document.outputJax.fixed;
        const svg = data.querySelector('svg');
        if (svg?.hasAttribute('viewBox')) {
          const [ , , w, h] = svg.getAttribute('viewBox').split(/ /);
          const em = MathJax.startup.document.outputJax.pxPerEm / 1000;
          svg.setAttribute('width', fixed(w * em) + 'px');
          svg.setAttribute('height', fixed(h * em) + 'px');
        }
      }
    ]
  }
};
We use an output jax post-filter to modify the svg element’s
attributes, taking advantage of the output jax’s fixed()
method to obtain a limited number of decimal places. The width and
height are determined from the viewBox attribute, whose values
are in units of 1/1000 of an em in the SVG output (hence the
division of pxPerEm by 1000).
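The unit conversion can be worked through on its own. In this sketch, the local fixed() function is only a stand-in for the output jax method (assumed here to round to three decimal places), and the viewBox string and the 16px-per-em value are sample inputs.

```javascript
// Stand-in for the output jax's fixed() method (an assumption for this
// sketch): round to at most three decimal places.
function fixed(x) {
  return String(Number(x.toFixed(3)));
}

// Convert the width and height of a viewBox string to px values.
// The filter above divides pxPerEm by 1000, so one viewBox unit is
// treated as pxPerEm / 1000 pixels.
function viewBoxToPx(viewBox, pxPerEm) {
  const [ , , w, h] = viewBox.split(/ /).map(Number);
  const unit = pxPerEm / 1000;
  return {width: fixed(w * unit) + 'px', height: fixed(h * unit) + 'px'};
}

// 2500 units wide and 1083 units high at 16px per em
console.log(viewBoxToPx('0 -883 2500 1083', 16));
// { width: '40px', height: '17.328px' }
```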
An Autobold Filter
This configuration implements a substitute for the v2 autobold extension.
MathJax = {
  tex: {
    preFilters: [
      ({math}) => {
        const styles = window.getComputedStyle(math.start.node.parentNode);
        if (styles.fontWeight >= 700 && !math.inputData.bolded) {
          math.math = '\\boldsymbol{' + math.math + '}';
          math.inputData.bolded = true;
        }
      }
    ]
  }
};
It uses a TeX input jax pre-filter that tests if the parent element of
the math string has CSS with font-weight of 700 or more (the
usual bold value), and if so, it wraps the TeX code in
\boldsymbol{...} to make it bold. Note, however, that if the
expression itself includes bold notation, that notation does not become
extra bold, so it may not be distinguishable from the rest of the expression.
We track the fact that bolding has been added using the
inputData object of the math object. That way, if the
expression needs to be reparsed (e.g., for a \require command, or
other dynamic data being loaded), we won’t add \boldsymbol more
than once.
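The effect of the guard can be seen by running the same logic twice on one item. In this sketch, a plain object stands in for the MathItem and the font weight is passed in directly instead of being read from the computed style; the function name autobold is hypothetical.

```javascript
// Wrap the TeX source in \boldsymbol{...} when the surrounding text is bold,
// using inputData.bolded as a guard so that a reparse does not wrap it twice.
// `fontWeight` stands in for the computed-style lookup in the filter above.
function autobold(math, fontWeight) {
  if (fontWeight >= 700 && !math.inputData.bolded) {
    math.math = '\\boldsymbol{' + math.math + '}';
    math.inputData.bolded = true;
  }
}

// A plain object standing in for a MathItem.
const item = {math: 'x+y', inputData: {}};
autobold(item, '700');   // first parse: wrapped
autobold(item, '700');   // reparse: left alone
console.log(item.math);  // \boldsymbol{x+y}
```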
Convert Mathvariant to Unicode
This example is more complex, and demonstrates a way to convert the
use of the mathvariant attribute on the internal MathML token
elements to their Unicode equivalents in the Mathematical
Alphanumerics block. Because MathML-Core (the version of MathML
implemented in most browsers) does not include support for
mathvariant (except as mathvariant="normal" on
single-character mi elements to prevent the automatic
italicization of the character), this may be useful for cases where
you want to produce MathML expressions for use with a browser’s native
MathML-Core support. Using this together with the native MathML
output example would make that output more effective in
browsers that implement MathML-Core.
MathJax = {
  startup: {
    ready() {
      //
      // The numeric ranges for numbers, uppercase alphabet, lowercase alphabet,
      // uppercase Greek, and lowercase Greek, with optional remapping of some
      // characters into the (relative) positions used in the Math Alphanumeric block.
      //
      const ranges = [
        [0x30, 0x39],
        [0x41, 0x5A],
        [0x61, 0x7A],
        [0x391, 0x3A9, {0x3F4: 0x3A2, 0x2207: 0x3AA}],
        [0x3B1, 0x3C9, {0x2202: 0x3CA, 0x3F5: 0x3CB, 0x3D1: 0x3CC,
                        0x3F0: 0x3CD, 0x3D5: 0x3CE, 0x3F1: 0x3CF, 0x3D6: 0x3D0}],
      ];
      //
      // The starting values for numbers, Alpha, alpha, Greek, and greek for the variants
      //
      const variants = {
        bold: [0x1D7CE, 0x1D400, 0x1D41A, 0x1D6A8, 0x1D6C2],
        italic: [0, 0x1D434, 0x1D44E, 0x1D6E2, 0x1D6FC, {0x68: 0x210E}],
        'bold-italic': [0, 0x1D468, 0x1D482, 0x1D71C, 0x1D736],
        script: [0, 0x1D49C, 0x1D4B6, 0, 0, {
          0x42: 0x212C, 0x45: 0x2130, 0x46: 0x2131, 0x48: 0x210B, 0x49: 0x2110,
          0x4C: 0x2112, 0x4D: 0x2133, 0x52: 0x211B, 0x65: 0x212F, 0x67: 0x210A,
          0x6F: 0x2134,
        }],
        'bold-script': [0, 0x1D4D0, 0x1D4EA, 0, 0],
        fraktur: [0, 0x1D504, 0x1D51E, 0, 0, {
          0x43: 0x212D, 0x48: 0x210C, 0x49: 0x2111, 0x52: 0x211C, 0x5A: 0x2128,
        }],
        'bold-fraktur': [0, 0x1D56C, 0x1D586, 0, 0],
        'double-struck': [0x1D7D8, 0x1D538, 0x1D552, 0, 0, {
          0x43: 0x2102, 0x48: 0x210D, 0x4E: 0x2115, 0x50: 0x2119, 0x51: 0x211A,
          0x52: 0x211D, 0x5A: 0x2124,
          0x393: 0x213E, 0x3A0: 0x213F, 0x3B3: 0x213D, 0x3C0: 0x213C,
        }],
        'sans-serif': [0x1D7E2, 0x1D5A0, 0x1D5BA, 0, 0],
        'bold-sans-serif': [0x1D7EC, 0x1D5D4, 0x1D5EE, 0x1D756, 0x1D770],
        'sans-serif-italic': [0, 0x1D608, 0x1D622, 0, 0],
        'sans-serif-bold-italic': [0, 0x1D63C, 0x1D656, 0x1D790, 0x1D7AA],
        monospace: [0x1D7F6, 0x1D670, 0x1D68A, 0, 0],
        '-tex-calligraphic': [0, 0x1D49C, 0x1D4B6, 0, 0, {
          0x42: 0x212C, 0x45: 0x2130, 0x46: 0x2131, 0x48: 0x210B, 0x49: 0x2110,
          0x4C: 0x2112, 0x4D: 0x2133, 0x52: 0x211B, 0x65: 0x212F, 0x67: 0x210A,
          0x6F: 0x2134,
        }, '\uFE00'],
        '-tex-bold-calligraphic': [0, 0x1D4D0, 0x1D4EA, 0, 0, {}, '\uFE00'],
        '-tex-mathit': [0, 0x1D434, 0x1D44E, 0x1D6E2, 0x1D6FC, {0x68: 0x210E}],
      };
      //
      // Styles to use for characters that can't be translated.
      //
      const variantStyles = {
        bold: 'font-weight: bold',
        italic: 'font-style: italic',
        'bold-italic': 'font-weight: bold; font-style: italic',
        'script': 'font-family: cursive',
        'bold-script': 'font-family: cursive; font-weight: bold',
        'sans-serif': 'font-family: sans-serif',
        'bold-sans-serif': 'font-family: sans-serif; font-weight: bold',
        'sans-serif-italic': 'font-family: sans-serif; font-style: italic',
        'sans-serif-bold-italic': 'font-family: sans-serif; font-weight: bold; font-style: italic',
        'monospace': 'font-family: monospace',
        '-tex-mathit': 'font-style: italic',
      };
      //
      // The filter function
      //
      function unicodeVariants(root) {
        //
        // Walk the MathML tree for token nodes with mathvariant attributes
        //
        root.walkTree((node) => {
          if (!node.isToken || !node.attributes.isSet('mathvariant')) return;
          //
          // Get the variant and the unicode characters of the element
          //
          const variant =
            node.attributes.get('data-mjx-variant') ?? node.attributes.get('mathvariant');
          const text = [...node.getText()];
          //
          // Skip the only valid case in MathML-Core and any invalid variants
          //
          if (variant === 'normal' && node.isKind('mi') && text.length === 1) return;
          node.attributes.unset('mathvariant');
          node.attributes.unset('data-mjx-variant');
          if (!Object.hasOwn(variants, variant)) return;
          //
          // Get the variant data
          //
          const start = variants[variant];
          const remap = start[5] || {};
          const modifier = start[6] || '';
          //
          // Convert the text of the child nodes (convertText() is always
          // called, so every child is processed even after a failure)
          //
          let converted = true;
          for (const child of node.childNodes) {
            if (child.isKind('text')) {
              converted = convertText(child, start, remap, modifier) && converted;
            }
          }
          //
          // If not all characters were converted, add styles, if possible,
          // but not when it would already be in italics.
          //
          if (!converted &&
              !(['italic', '-tex-mathit'].includes(variant) && text.length === 1 && node.isKind('mi'))) {
            addStyles(node, variant);
          }
        });
      }
      //
      // Convert the content of a text node
      //
      function convertText(node, start, remap, modifier) {
        //
        // Get the text
        //
        const text = [...node.getText()];
        //
        // Loop through the characters in the text
        //
        let converted = 0;
        for (let i = 0; i < text.length; i++) {
          let C = text[i].codePointAt(0);
          //
          // Check if the character is in one of the ranges
          //
          for (const j of [0, 1, 2, 3, 4]) {
            const [m, M, map = {}] = ranges[j];
            if (!start[j]) continue;
            if (C < m) break;
            //
            // Set the new character based on the remappings and
            // starting location for the range
            //
            if (map[C]) {
              text[i] = String.fromCodePoint(map[C] - m + start[j]) + modifier;
              converted++;
              break;
            } else if (remap[C] || C <= M) {
              text[i] = String.fromCodePoint(remap[C] || C - m + start[j]) + modifier;
              converted++;
              break;
            }
          }
        }
        //
        // Put back the modified text content
        //
        node.setText(text.join(''));
        //
        // Return true if all characters were converted, false otherwise.
        //
        return converted === text.length;
      }
      //
      // Add styles when conversion isn't possible.
      //
      function addStyles(node, variant) {
        let styles = variantStyles[variant];
        if (styles) {
          //
          // Append to any existing style attribute rather than replacing it
          //
          if (node.attributes.hasExplicit('style')) {
            styles = node.attributes.get('style') + '; ' + styles;
          }
          node.attributes.set('style', styles);
        }
      }
      //
      // Add the post-filters to all input jax
      //
      MathJax.startup.defaultReady();
      for (const jax of MathJax.startup.document.inputJax) {
        jax.postFilters.add(({data}) => unicodeVariants(data.root || data));
      }
    }
  }
};
This example adds a post-filter to each of the input jax that are
loaded (so it will work with both the MathML input as well as TeX
input). The filter walks the internal MathML tree looking for token
elements with mathvariant attributes, and then converts the
content of the child text nodes of those token nodes to use the proper
Unicode values for any alphabetic, numeric, or Greek characters that
can be represented using the Mathematical Alphanumeric and Letterlike
Symbols blocks. If any characters can’t be converted to something in
these blocks, we use a style attribute, when possible, to
simulate the proper output.
The ranges variable gives the character ranges that will be
converted, the variants object gives the data needed to map
those ranges to the various Mathematical Alphanumerics characters for
the different mathvariant values, and the
variantStyles object holds the styles that need to be
applied for each variant.
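The core of the conversion is an offset calculation: a character's position within its source range is added to the starting code point for the variant, unless a remapping applies. A minimal sketch of that arithmetic, using a hypothetical mapChar() helper and values taken from the tables above:

```javascript
// Shift a character from the range starting at `rangeStart` to the block of
// styled characters starting at `variantStart`, unless an exception applies.
function mapChar(char, rangeStart, variantStart, remap = {}) {
  const C = char.codePointAt(0);
  return String.fromCodePoint(remap[C] || C - rangeStart + variantStart);
}

// Format a character's code point as hexadecimal, for display.
function hex(char) {
  return char.codePointAt(0).toString(16);
}

// Bold capital A: 0x41 - 0x41 + 0x1D400 = U+1D400
console.log(hex(mapChar('A', 0x41, 0x1D400)));                  // 1d400
// Bold digit 7: 0x37 - 0x30 + 0x1D7CE = U+1D7D5
console.log(hex(mapChar('7', 0x30, 0x1D7CE)));                  // 1d7d5
// Italic h is remapped to U+210E instead of using the computed position
console.log(hex(mapChar('h', 0x61, 0x1D44E, {0x68: 0x210E})));  // 210e
```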
The special -tex-calligraphic and -tex-bold-calligraphic
variants are used internally in MathJax to produce the Chancery
calligraphic variant (as opposed to the Roundhand script variant), but
Unicode does not distinguish between these two, and the result of the
script and bold-script variants is font dependent. The
current mechanism to distinguish between these two in Unicode is to use
the Unicode variation selectors U+FE00 and U+FE01. The code here adds U+FE00
for the TeX calligraphic variants. You may wish to add U+FE01 to the
script variants to explicitly request the Roundhand versions as well.
Note, however, that not all fonts support these variation selectors, so
you may get the same characters in both cases, and which you get will
depend on the font. Some browsers may also show unknown character
glyphs for these selector codes when they don’t understand how to
process them.
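In the configuration above, the selector is the optional seventh entry of a variant's array (read as start[6]), so requesting Roundhand for script would mean adding '\uFE01' after its remapping object, and for bold-script adding an empty {} placeholder followed by '\uFE01'. The sketch below only illustrates what the resulting strings look like; the constant and function names are hypothetical.

```javascript
const CHANCERY = '\uFE00';   // variation selector 1, used above for calligraphic
const ROUNDHAND = '\uFE01';  // variation selector 2, suggested above for script

// Append a variation selector to every character of an already-converted string.
function withSelector(text, selector) {
  return [...text].map((c) => c + selector).join('');
}

const scriptA = String.fromCodePoint(0x1D49C);  // MATHEMATICAL SCRIPT CAPITAL A
const calligraphic = withSelector(scriptA, CHANCERY);
const roundhand = withSelector(scriptA, ROUNDHAND);

// Each result is the same base character followed by a different selector.
console.log([...calligraphic].map((c) => c.codePointAt(0).toString(16)));  // [ '1d49c', 'fe00' ]
console.log([...roundhand].map((c) => c.codePointAt(0).toString(16)));     // [ '1d49c', 'fe01' ]
```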