![union concatenation kleene star order union concatenation kleene star order](http://image2.slideserve.com/5332710/slide26-n.jpg)
In BRE syntax, all characters are literal except. This is the syntax used by the Unix commands grep and sed. Let's take a look at some of the common ones.īasic Regular Expressions (BRE). It's nearly impossible to determine what a given RE means without knowing which tool is supposed to use it. Now the bad news: there are a plethora of incompatible regular expression syntaxes and feature sets in common use. The character class ] is particularly useful: it matches any character which is displayed as whitespace (spaces, tabs, carriage returns, etc.). Never try to use a range of letters in a bracket expression like or unless you're operating in the C locale. Therefore, modern implementations of egrep provide class names instead: In the case of digits, there's not much danger but in the case of letters of the alphabet, ASCII ordering cannot be safely assumed. However, this relies on the ordering of characters. The resulting expression matches any single character that falls within the specified range. The syntax is called a character class or a bracket expression, and specifies an implicit union operation.
![union concatenation kleene star order union concatenation kleene star order](https://image3.slideserve.com/6177403/operace-kleene-star-l.jpg)
For example, in egrep, our previous example could be written: Most RE implementations have shortcuts to greatly reduce the length and ugliness of common expressions.
![union concatenation kleene star order union concatenation kleene star order](https://userpages.umbc.edu/~squire/images/re2.gif)
(The parentheses introduce a feature known as grouping.) Obviously, in order to have any practical use, these features must be combined together. RE a* matches the empty string, or an input string of a, or an input string of aa, etc. RE a|b matches an input string of a or an input string of b. Here are some examples of the three required features, using this syntax:Ĭoncatenation. We'll start with the syntax used by the Unix command egrep, because it's probably the most common.
![union concatenation kleene star order union concatenation kleene star order](https://image.slideserve.com/937867/role-of-various-representations-for-regular-languages-l.jpg)
The syntax by which these features are expressed varies widely across different RE implementations. If you need formal definitions, please consult a computer science textbook instead.) (I'm not using precise mathematical language here. The small expression may be "repeated" zero or more times in order to match the input. Also called "Kleene closure" (prounced "KLEE-nee"). The large expression will match the input if either of the small expressions matches the input.Ĭlosure. The resulting large expression will match the input string if and only if a part of the input that matches the first small expression is immediately followed by a part that matches the second small expression. Two regular expressions may be written next to each other. Let's start with the theory.Ī regular expression consists of three features:Ĭoncatenation. There are countless variations, including both syntactic and semantic changes. Shoenfield, Mathematical Logic (Addison-Wesley, Reading, MA, 1967).Regular expressions (RE) are a computer science construct, used to determine whether a string matches some sort of pattern. Papadimitriou, Computational Complexity (Addison-Wesley, Reading, MA, 1994). Marker, Model Theory: An Introduction (Springer, New York, 2002).Ĭ. Srebrny (IOS Press, Amsterdam, 2008), pp. Zdanowski, ‘‘Undecidability and Concatenation,’’ in Andrzej Mostowski and Foundational Studies, Ed. Grzegorczyk, ‘‘Undecidability without Arithmetization,’’ Studia Logica 79, 163–230 (2005).Ī. Karlov, ‘‘On decidability of regular languages theories,’’ in Computer Science-Theory and Applications, Ed. 1: Parsing (Prentice-Hall, Englewood Cliffs, NJ, 1972). Ullman, The Theory of Parsing, Translation and Compiling, Vol.