Sachin Patil

Free Software Developer | GNU Emacs Hacker

Regex in Emacs
Published on May 13, 2014 by Sachin.

Working with regular expressions in GNU Emacs is fun! With conventional regex tools such as Perl or Bash, one has to type the expression and run it in order to test it. Regex in Emacs is highly interactive: Emacs has a built-in regex builder that highlights the matches as we type the regular expression.

poster.png

Figure 1: re-builder

This post explains the interactive re-builder command in Emacs, which I personally enjoy a lot. As an example, I am going to take a few header lines from the Linux kernel source code (I altered some of them), for which we will create a regular expression.

Consider the following header lines:

    #include <stdio.h>
    #include <linux/stdio.h>
    #include  <linux/stdio.h>
    #include <linux/module.h>
    #include<linux/slab.h>
    #include<linux/init.h>
    #include <linux/types.h>
    #include <linux/dmi.h>
    #include <linux/delay.h>
    #include <linux/platform_device.h>
    #include <linux/power_supply.h>
    #include "stdio.h"
    #include "linux/stdio.h"
    #include "linux/stdio.h"
    #include  "linux/module.h"

Invoke re-builder

Call re-builder using

M-x re-builder

This will open a buffer named *RE-Builder*, as shown below.

snap_1.png

Figure 2: re-builder buffer

Build an expression

  • Every header line starts with a #, so let's begin by typing ^#. The ^ anchors the match at the beginning of a line (or of the string or buffer). Follow it with the string include, so that the whole expression is ^#include. This should highlight every #include that starts a line.

    snap_2.png

    Figure 3: Beginning of line, string or a buffer

  • Next, match the white space after #include. Note that some lines have none, for example #include<linux/slab.h>, so the white space will eventually have to be optional. Start with square brackets [], which define a set of characters, and append [ ] (notice the space between the square brackets), which matches exactly one space. The expression will be ^#include[ ].

    snap_3.png

    Figure 4: Highlight white spaces

  • The problem with the above expression is that it skips lines like these:

    #include<linux/slab.h>
    #include<linux/init.h>

    and highlights only the first space when there are several, as in:

    #include  <linux/stdio.h>
    #include  "linux/module.h"

    This is easily handled with an asterisk (*), which matches the preceding expression zero or more times. So the modified expression will be ^#include[ ]*

    snap_4.png

    Figure 5: Highlight zero or more white spaces

  • Now we have to match < or " (double quote). We can use another pair of square brackets to match either one. Note that re-builder's default syntax reads the expression as a Lisp string, so " must be escaped with a \ (backslash) in front of it. The backslash before < is harmless but not required inside square brackets. The expression becomes ^#include[ ]*[\<\"]

    snap_5.png

    Figure 6: Special characters

  • Now we need to match the file name. Start with [a-z], which matches any single character between ’a’ and ’z’. The expression will be ^#include[ ]*[\<\"][a-z]

    snap_6.png

    Figure 7: Match characters

    Now we have the same problem as before: the above expression highlights just a single character. Appending a + sign matches the previous pattern one or more times, so the expression becomes ^#include[ ]*[\<\"][a-z]+. To make it more flexible, let's also match capital letters, which turns the expression into ^#include[ ]*[\<\"][a-zA-Z]+

    snap_7.png

    Figure 8: Match all characters

  • Now let's also match /, . and _. Inside square brackets these characters have no special meaning, so escaping them with \ is optional. With the backslashes the expression looks like ^#include[ ]*[\<\"][a-zA-Z\/\.\_]+

    snap_8.png

    Figure 9: Match special characters

  • Finally, the closing > or " (double quote) can be matched using [\>\"]. Our final expression will be

    "^#include[ ]*[\<\"][a-zA-Z\/\.\_]+[\>\"]"
    

    snap_9.png

    Figure 10: Match all

    snap_10.gif

    Figure 11: Regex in an animated form
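
The steps above can also be replayed outside Emacs as a rough check. The sketch below uses Python's re module as a stand-in for the Emacs regex engine. That is an assumption: the two engines agree on the constructs used in this post, but not in general. The names HEADERS, STEPS, highlighted and full_matches are made up for illustration, and the patterns are written without the optional backslashes.

```python
import re

# Sample header lines from the post.
HEADERS = [
    '#include <stdio.h>',
    '#include <linux/stdio.h>',
    '#include  <linux/stdio.h>',
    '#include <linux/module.h>',
    '#include<linux/slab.h>',
    '#include<linux/init.h>',
    '#include <linux/types.h>',
    '#include <linux/dmi.h>',
    '#include <linux/delay.h>',
    '#include <linux/platform_device.h>',
    '#include <linux/power_supply.h>',
    '#include "stdio.h"',
    '#include "linux/stdio.h"',
    '#include "linux/stdio.h"',
    '#include  "linux/module.h"',
]

# The expression at each step of the build, in the order of the post.
STEPS = [
    r'^#include',
    r'^#include[ ]',
    r'^#include[ ]*',
    r'^#include[ ]*[<"]',
    r'^#include[ ]*[<"][a-z]',
    r'^#include[ ]*[<"][a-zA-Z]+',
    r'^#include[ ]*[<"][a-zA-Z/._]+',
    r'^#include[ ]*[<"][a-zA-Z/._]+[>"]',
]


def highlighted(pattern, line):
    """Return the part of `line` the pattern would highlight, or None."""
    m = re.match(pattern, line)
    return m.group(0) if m else None


def full_matches(pattern, lines):
    """Count the lines that the pattern covers from first to last character."""
    return sum(1 for line in lines if re.fullmatch(pattern, line))


if __name__ == '__main__':
    # Show how many sample lines each intermediate expression covers entirely.
    for step in STEPS:
        print(step, '->', full_matches(step, HEADERS), 'of', len(HEADERS))
```

Calling highlighted with an intermediate expression and one of the sample lines returns the text that re-builder would highlight at that step, which makes it easy to see why [ ] alone was not enough and why the + was needed.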

This ends the introduction to Emacs’s re-builder. For more information, please visit Xah Lee’s page on regex.