Sachin Patil

Free Software developer & Emacser

Regex in Emacs

Published on May 13, 2014 by Sachin.

Working with regular expressions in GNU Emacs is fun!! Unlike conventional regex work in Perl or Bash, where one has to type the expression and execute it in order to test it, regex in Emacs is highly interactive! Emacs has a built-in regex builder which highlights the matching text as we create the regular expression.

poster.png
Figure 1: re-builder

This post explains the interactive re-builder function in Emacs, which I personally enjoyed a lot. As an example, I am going to take a few header lines from the Linux kernel source code (I altered some of them), for which we will create a regular expression.

Consider the following header lines:

    #include <stdio.h>
    #include <linux/stdio.h>
    #include  <linux/stdio.h>
    #include <linux/module.h>
    #include<linux/slab.h>
    #include<linux/init.h>
    #include <linux/types.h>
    #include <linux/dmi.h>
    #include <linux/delay.h>
    #include <linux/platform_device.h>
    #include <linux/power_supply.h>
    #include "stdio.h"
    #include "linux/stdio.h"
    #include "linux/stdio.h"
    #include  "linux/module.h"

Invoke re-builder

Call re-builder using

M-x re-builder

This will open a buffer named *RE-Builder*, as shown below.

snap_1.png
Figure 2: re-builder buffer

Build an expression

  • The header lines start with a #, so let's begin by typing ^#. The ^ anchors the match to the beginning of a line (or of a string or buffer); it is followed by a literal # and the string include. Altogether the expression will be ^#include. This should highlight every #include that starts a line.

    snap_2.png
    Figure 3: Beginning of line, string or a buffer
  • Next comes the white space after #include. Note that in some lines it does not exist, for example in the line #include<linux/slab.h>, so eventually the white space has to be optional. As a first step, we match a single space with square brackets [], which match any one of the characters listed inside them. Let's append [ ] (notice the space between the square brackets). The expression will be ^#include[ ].

    snap_3.png
    Figure 4: Highlight white spaces
  • The problem with the above expression is that it skips lines like these

    #include<linux/slab.h>
    #include<linux/init.h>

    and highlights only the first space when there are several, as in

    #include  <linux/stdio.h>
    #include  "linux/module.h"

    Both cases are easily handled with an asterisk (*), which matches the preceding expression zero or more times. So the modified expression will be ^#include[ ]*

    snap_4.png
    Figure 5: Highlight zero or more white spaces
  • Now we have to match < or " (double quote). We can use another pair of square brackets to match either of them. Because re-builder's default input syntax reads the expression as a Lisp string, the double quote has to be escaped with a \ (backslash); the backslash before < is not required, but it is harmless. This makes the expression ^#include[ ]*[\<\"]

    snap_5.png
    Figure 6: Special characters
  • Now we need to match the file name. We start with [a-z], which matches any single lowercase letter from ’a’ to ’z’. The expression will be ^#include[ ]*[\<\"][a-z]

    snap_6.png
    Figure 7: Match characters

    Now we have the same problem as before: the above expression highlights just a single character. Appending a + sign matches the previous pattern one or more times, so the expression becomes ^#include[ ]*[\<\"][a-z]+. To make it more flexible, let's also match capital letters, which turns the expression into ^#include[ ]*[\<\"][a-zA-Z]+

    snap_7.png
    Figure 8: Match all characters
  • Now let's also match /, . and _ by adding them to the same set. Inside square brackets these characters are taken literally, so the backslashes used here are optional. The expression will look like ^#include[ ]*[\<\"][a-zA-Z\/\.\_]+

    snap_8.png
    Figure 9: Match special characters
  • Finally, the closing > or " (double quote) can be matched using [\>\"]. Our final expression will be

    "^#include[ ]*[\<\"][a-zA-Z\/\.\_]+[\>\"]"
    
    snap_9.png
    Figure 10: Match all
    snap_10.gif
    Figure 11: Regex in an animated form
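
The stages above can also be replayed outside Emacs as a quick sanity check. What follows is only a minimal sketch in Python, not part of re-builder. It assumes that Python's re module treats the few constructs used here (^, character sets, * and +) the same way Emacs does, which holds for this particular pattern but not for Emacs regex syntax in general. The names HEADERS, STAGES, highlighted and full_line_matches are made up for illustration, and the sets are written without backslashes because none of these characters needs escaping inside square brackets.

```python
import re

# Header lines from the post (the double spaces are intentional).
HEADERS = [
    '#include <stdio.h>',
    '#include <linux/stdio.h>',
    '#include  <linux/stdio.h>',
    '#include <linux/module.h>',
    '#include<linux/slab.h>',
    '#include<linux/init.h>',
    '#include <linux/types.h>',
    '#include <linux/dmi.h>',
    '#include <linux/delay.h>',
    '#include <linux/platform_device.h>',
    '#include <linux/power_supply.h>',
    '#include "stdio.h"',
    '#include "linux/stdio.h"',
    '#include "linux/stdio.h"',
    '#include  "linux/module.h"',
]

# Each stage of the expression, in the order the post builds it.
STAGES = [
    r'^#include',
    r'^#include[ ]',
    r'^#include[ ]*',
    r'^#include[ ]*[<"]',
    r'^#include[ ]*[<"][a-z]',
    r'^#include[ ]*[<"][a-z]+',
    r'^#include[ ]*[<"][a-zA-Z]+',
    r'^#include[ ]*[<"][a-zA-Z/._]+',
    r'^#include[ ]*[<"][a-zA-Z/._]+[>"]',
]


def highlighted(pattern, line):
    """Return the part of line the pattern matches, or None if it does not match."""
    m = re.search(pattern, line)
    return m.group(0) if m else None


def full_line_matches(pattern, lines):
    """Return the lines that the pattern covers from start to end."""
    return [ln for ln in lines if highlighted(pattern, ln) == ln]


if __name__ == "__main__":
    for pattern in STAGES:
        covered = len(full_line_matches(pattern, HEADERS))
        print(f"{covered:2d} of {len(HEADERS)} lines fully covered by {pattern}")
```

Running the script prints one line per stage; only the last stage covers every header line from start to end, which mirrors what the highlighting in the figures shows.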

This ends the introduction to Emacs’s re-builder. For more information, please visit Xah Lee’s page on regex.