Key Concepts
Regular Expression (Regex)
RegexSolver supports a subset of regular expressions that adhere to the principles of regular languages. Here are the key characteristics and limitations of the regular expressions supported by RegexSolver:
- Anchored Expressions: All regular expressions in RegexSolver are anchored: a pattern must match the entire input, from its first character to its last. For example, the expression abc matches the string "abc" but not "xabc" or "abcx".
- Lookahead/Lookbehind: RegexSolver does not support lookahead ((?=...)) or lookbehind ((?<=...)) assertions. Using them returns an error.
- Greedy/Ungreedy Quantifiers: Ungreedy quantifiers (*?, +?, ??) are not supported; all quantifiers are treated as greedy. For example, a* and a*? both match the longest possible sequence of "a"s.
- Line Feed and Dot: RegexSolver handles every character the same way. The dot character (.) matches any Unicode character, including the line feed (\n).
- Pure Regular Expressions: RegexSolver focuses on pure regular expressions as defined in regular language theory. Features that extend beyond regular languages, such as backreferences (\1, \2, etc.), are not supported. Any use of a backreference returns an error.
- Empty Regular Expressions: An empty regular expression is denoted by [], which represents a pattern that matches no input, not even the empty string.
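The anchoring, quantifier, and dot rules above can be approximated locally with Python's standard re module. This is only a sketch for building intuition, not RegexSolver itself: the helper name accepts is hypothetical, and Python's engine differs in other respects (it supports lookaround and backreferences, and it rejects [] as a syntax error).

```python
import re


def accepts(pattern: str, text: str) -> bool:
    """Return True if `pattern` matches all of `text`.

    re.fullmatch anchors the pattern at both ends, and re.DOTALL lets
    the dot match a line feed, approximating the rules described above.
    """
    return re.fullmatch(pattern, text, flags=re.DOTALL) is not None


# Anchoring: only the exact string is accepted.
print(accepts("abc", "abc"), accepts("abc", "xabc"), accepts("abc", "abcx"))

# Once anchored, greedy and ungreedy forms accept the same strings.
print(accepts("a*", "aaa"), accepts("a*?", "aaa"))

# The dot also matches a line feed.
print(accepts(".", "\n"))
```

Because every pattern is matched against the whole input, the greedy/ungreedy distinction only affects how a match is found, not which strings are accepted, which is why treating a*? as a* loses nothing here.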
RegexSolver uses the Rust regex-syntax library for parsing expressions. As a result, features that regex-syntax accepts but RegexSolver does not support, such as ungreedy quantifiers, are parsed but ignored. This allows for some flexibility in writing regular expressions, but it is important to know which features are unsupported to avoid unexpected behavior.
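To catch these cases before sending a pattern, a small pre-check can flag the constructs listed above. The following is a rough heuristic sketch, not part of RegexSolver: the function name find_unsupported is hypothetical, negative lookaround forms are assumed to be rejected like the positive ones, and the scanner does not track character classes, so a class such as [(?=] would be flagged by mistake.

```python
from typing import List

# Assumption: negative lookaround is rejected the same way as positive.
LOOKAROUND_PREFIXES = ("(?=", "(?!", "(?<=", "(?<!")
UNGREEDY_QUANTIFIERS = ("*?", "+?", "??")


def find_unsupported(pattern: str) -> List[str]:
    """Heuristically list constructs that are rejected or ignored."""
    findings = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "\\":
            # An escape: a digit 1-9 after the backslash is a backreference.
            if i + 1 < len(pattern) and pattern[i + 1] in "123456789":
                findings.append(f"backreference at {i} (error)")
            i += 2  # skip the escaped character
            continue
        if pattern.startswith(LOOKAROUND_PREFIXES, i):
            findings.append(f"lookaround at {i} (error)")
        elif pattern.startswith(UNGREEDY_QUANTIFIERS, i):
            findings.append(f"ungreedy quantifier at {i} (treated as greedy)")
            i += 2  # skip both quantifier characters
            continue
        i += 1
    return findings


for candidate in ["abc", r"(a)\1", "a*?", "(?=a)b"]:
    print(candidate, find_unsupported(candidate))
```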
FAIR (Fast Automaton Internal Representation)
FAIR, or Fast Automaton Internal Representation, is an internal representation used by the RegexSolver engine when it cannot return a readable regular expression. This ensures that users can still leverage the results of operations for subsequent processing, even when a valid regex cannot be generated.
It consists of a string encoded in Z85 with padding, providing a compact and efficient format for the automaton representation. The length of this string is variable, depending on the complexity of the automaton.
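For readers unfamiliar with Z85, the sketch below shows the basic encoding step: each group of 4 bytes becomes 5 printable characters from an 85-symbol alphabet. It illustrates plain Z85 only. The bytes a FAIR string actually carries and the padding scheme RegexSolver applies are not described here, so this function simply rejects inputs whose length is not a multiple of 4, and it should not be used to build or parse FAIR values.

```python
# The 85-character Z85 alphabet.
Z85_ALPHABET = (
    "0123456789abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ.-:+=^!/*?&<>()[]{}@%$#"
)


def z85_encode(data: bytes) -> str:
    """Encode bytes as plain Z85 (length must be divisible by 4)."""
    if len(data) % 4 != 0:
        raise ValueError("plain Z85 needs a length divisible by 4")
    out = []
    for offset in range(0, len(data), 4):
        # Read 4 bytes as one big-endian 32-bit number.
        value = int.from_bytes(data[offset:offset + 4], "big")
        chunk = []
        for _ in range(5):
            # Peel off base-85 digits, least significant first.
            value, digit = divmod(value, 85)
            chunk.append(Z85_ALPHABET[digit])
        out.append("".join(reversed(chunk)))
    return "".join(out)


sample = bytes([0x86, 0x4F, 0xD2, 0x6F, 0xB5, 0x59, 0xF7, 0x5B])
print(z85_encode(sample))  # HelloWorld
```

The 4-to-5 ratio is why the encoded string grows with the size of the underlying data, which is consistent with the variable length noted above.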
Since the API is stateless, FAIR can be stored and reused in future operations. This avoids the need to recompute frequent operations, saving time and API calls.
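One way to take advantage of this is a small client-side cache keyed by the operation and its operands. The sketch below is an assumption-laden illustration: ResultCache, the compute callable, and the operation name "intersection" are hypothetical stand-ins, and fake_compute replaces a real API call so the example runs offline.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stand-in for one API call: takes an operation name and its
# operand strings (regex or FAIR) and returns the resulting regex or FAIR.
Compute = Callable[[str, Tuple[str, ...]], str]


class ResultCache:
    """Remember results so repeated operations cost no extra API call."""

    def __init__(self, compute: Compute) -> None:
        self._compute = compute
        self._results: Dict[Tuple[str, Tuple[str, ...]], str] = {}
        self.api_calls = 0

    def get(self, operation: str, *operands: str) -> str:
        key = (operation, operands)
        if key not in self._results:
            # Cache miss: perform the call once and store the result.
            self.api_calls += 1
            self._results[key] = self._compute(operation, operands)
        return self._results[key]


def fake_compute(operation: str, operands: Tuple[str, ...]) -> str:
    # Placeholder result; a real client would return a regex or a FAIR string.
    return f"{operation}({', '.join(operands)})"


cache = ResultCache(fake_compute)
first = cache.get("intersection", "(abc|de)", "a.*")
second = cache.get("intersection", "(abc|de)", "a.*")
print(first == second, cache.api_calls)  # True 1
```

Since a stored result does not depend on any server-side session, the same dictionary could be persisted to disk or a shared store and reused across runs.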