Regex in Emacs
Published on May 13, 2014 by Sachin.
Working with regular expressions in GNU Emacs is fun! Unlike conventional regex work in Perl or Bash, where one has to type the expression and execute it in order to test it, regex in Emacs is highly interactive. Emacs has a built-in regex builder which highlights the matches as we create the regular expression.
Figure 1: re-builder
This post explains the interactive re-builder function in Emacs, which I personally enjoyed a lot. As an example, I am going to take a few header lines from the Linux kernel source code (I altered some of them), for which we will create a regular expression.
Consider the following header lines:
    #include <stdio.h>
    #include <linux/stdio.h>
    #include <linux/stdio.h>
    #include <linux/module.h>
    #include<linux/slab.h>
    #include<linux/init.h>
    #include <linux/types.h>
    #include <linux/dmi.h>
    #include <linux/delay.h>
    #include <linux/platform_device.h>
    #include <linux/power_supply.h>
    #include "stdio.h"
    #include "linux/stdio.h"
    #include "linux/stdio.h"
    #include "linux/module.h"
Invoke re-builder
Call re-builder using M-x re-builder. This will open a buffer named *RE-Builder*, as shown below.
Figure 2: re-builder buffer
Build an expression
The header lines start with a #, so let's begin by typing ^#. Here ^ anchors the match to the beginning of a line (or of the string or buffer); it is followed by a literal # and the string include. Altogether the expression will be ^#include. This should highlight every region that contains #include at the start of a line.
Figure 3: Beginning of line, string or a buffer
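For readers who want to try the same anchor outside Emacs, here is a minimal sketch using Python's re module rather than Emacs's own regex engine. The sample text, including the last non-header line, is an assumption made for illustration. The re.MULTILINE flag is there so that ^ matches at the start of every line, which is how ^ behaves in an Emacs buffer.

```python
import re

# A few header lines from the post, plus one hypothetical non-header line
# (added for illustration) where "#include" does not start the line.
SAMPLE = "\n".join([
    "#include <stdio.h>",
    "#include<linux/slab.h>",
    '#include "linux/module.h"',
    "int x; /* #include appears mid-line here */",
])

# With re.MULTILINE, ^ matches at the beginning of each line,
# not only at the beginning of the whole string.
anchored = re.compile(r"^#include", re.MULTILINE)

anchored_hits = [m.group(0) for m in anchored.finditer(SAMPLE)]
print(len(anchored_hits), "lines start with #include")
```

Only the three real header lines are counted; the mid-line occurrence is skipped because of the ^ anchor.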
Next we match the white space after #include. Note that in some lines it does not exist, for example in the line #include<linux/slab.h>, so the white space must eventually be optional. As a first step we use square brackets [], which define a character set that matches any single one of the characters listed inside. Let's append [ ] (notice the space between the square brackets). The expression will be ^#include[ ]
Figure 4: Highlight white spaces
The problem with the above expression is that it skips lines like the ones below

    #include<linux/slab.h>
    #include<linux/init.h>

and does not highlight more than one space, as in the lines below

    #include  <linux/stdio.h>
    #include  "linux/module.h"
This can easily be handled using an asterisk (*), which matches the preceding expression zero or more times. So the modified expression will be ^#include[ ]*
Figure 5: Highlight zero or more white spaces
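The difference between [ ] and [ ]* can be checked with a small sketch in Python's re module (not Emacs's engine, though both behave the same for this simple pattern). The helper name matched_text and the two-space sample line are assumptions made for illustration.

```python
import re

lines = [
    "#include <stdio.h>",         # exactly one space
    "#include<linux/slab.h>",     # no space
    "#include  <linux/stdio.h>",  # two spaces (assumed sample)
]

one_space = re.compile(r"^#include[ ]")    # exactly one space required
any_spaces = re.compile(r"^#include[ ]*")  # zero or more spaces

def matched_text(pattern, line):
    """Return the text the pattern matched at the start of line, or None."""
    m = pattern.match(line)
    return m.group(0) if m else None

with_one = [matched_text(one_space, line) for line in lines]
with_star = [matched_text(any_spaces, line) for line in lines]

for line, a, b in zip(lines, with_one, with_star):
    print(repr(line), "->", repr(a), "|", repr(b))
```

With [ ] the line without a space is not matched at all and only the first of two spaces is covered, while [ ]* handles all three lines.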
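Now we have to match < or " (double quote). We can use another pair of square brackets to match either one. Of the two, only " really needs the \ (backslash) in front of it, because re-builder by default reads the expression as a Lisp string; < has no special meaning inside square brackets, so escaping it is harmless but not required. This makes the expression so far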
^#include[ ]*[\<\"]
Figure 6: Special characters
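As a rough cross-check outside Emacs, the same step can be sketched with Python's re module. This is an illustration under assumptions, not Emacs syntax: the backslashes are dropped because neither < nor " needs escaping inside a Python character class, and the last sample line is a hypothetical malformed header.

```python
import re

# Either < or " must follow the optional spaces.
opener = re.compile(r'^#include[ ]*[<"]')

samples = [
    "#include <stdio.h>",
    '#include "stdio.h"',
    "#include<linux/init.h>",
    "#include stdio.h",  # hypothetical malformed line: no < or "
]

opener_results = [bool(opener.match(line)) for line in samples]
print(opener_results)
```

The first three lines match and the malformed one does not, since the character set accepts exactly one of the two opening delimiters.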
Now we need to match the file name. This is achieved using [a-z], which will match any single character between 'a' and 'z'. The expression will be ^#include[ ]*[\<\"][a-z]
Figure 7: Match characters
Now we have the same problem as before: the above expression highlights just a single character. Appending a + sign matches the previous pattern one or more times, so the expression becomes ^#include[ ]*[\<\"][a-z]+. To make it more flexible, let's also match all the capital letters, which transforms the expression into ^#include[ ]*[\<\"][a-zA-Z]+
Figure 8: Match all characters
Now let's also match /, . and _. Here each of them is escaped with \ to be safe, although inside square brackets these characters are already literal. The expression will look like
^#include[ ]*[\<\"][a-zA-Z\/\.\_]+
Figure 9: Match special characters
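To see why the extra characters matter, here is a small sketch in Python's re module (an assumption for illustration; it is not Emacs's engine, and the backslashes inside the set are omitted because Python does not need them there). It compares the letters-only set with the extended one on a path taken from the header lines.

```python
import re

letters_only = re.compile(r"[a-zA-Z]+")
path_chars = re.compile(r"[a-zA-Z/._]+")  # letters plus /, . and _

header_path = "linux/platform_device.h"

letters_match = letters_only.match(header_path).group(0)
path_match = path_chars.match(header_path).group(0)

print(letters_match)
print(path_match)

# The set contains no digits, so a hypothetical name with a digit
# is only matched up to that digit.
digit_match = path_chars.match("linux/i2c.h").group(0)
print(digit_match)
```

The letters-only set stops at the first /, while the extended set covers the whole path. A name containing a digit would need 0-9 added to the set.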
Finally, the closing > or " (double quote) can be matched using [\>\"]. Our final expression will be
"^#include[ ]*[\<\"][a-zA-Z\/\.\_]+[\>\"]"
Figure 10: Match all
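The finished expression can also be checked against the header lines from the beginning of the post. The sketch below uses Python's re module as a stand-in for Emacs, with the redundant backslashes removed. The names FINAL and HEADER_LINES are assumptions made for illustration, and the header lines are copied as they appear in this post.

```python
import re

# Python rendering of the final expression.
FINAL = re.compile(r'^#include[ ]*[<"][a-zA-Z/._]+[>"]')

HEADER_LINES = """\
#include <stdio.h>
#include <linux/stdio.h>
#include <linux/stdio.h>
#include <linux/module.h>
#include<linux/slab.h>
#include<linux/init.h>
#include <linux/types.h>
#include <linux/dmi.h>
#include <linux/delay.h>
#include <linux/platform_device.h>
#include <linux/power_supply.h>
#include "stdio.h"
#include "linux/stdio.h"
#include "linux/stdio.h"
#include "linux/module.h"
""".splitlines()

# A line counts as fully matched when the match covers the whole line.
fully_matched = [
    line for line in HEADER_LINES
    if (m := FINAL.match(line)) and m.group(0) == line
]
print(len(fully_matched), "of", len(HEADER_LINES), "lines matched")

# The opening and closing sets are independent, so a mismatched pair
# such as <...", shown here as a hypothetical line, is accepted too.
mismatched_ok = FINAL.match('#include <stdio.h"') is not None
print(mismatched_ok)
```

All fifteen lines match in full. Because the two delimiter sets are independent, the expression does not check that an opening < is closed by > rather than by a double quote.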
Figure 11: Regex in an animated form
This ends the introduction to Emacs's re-builder. For more information, please visit Xah Lee's page on regex.