python – Tkinter : Syntax highlighting for Text widget

This is an extension of tfpf’s answer.

When you call ic.make_pat(), it returns the entire regular expression used for Python highlighting. It may seem convenient to OR in some extra expressions on one side or the other, but that doesn’t give you much control, and it becomes cumbersome quickly. A potentially more useful and definitely more customizable approach is to print ic.make_pat(), copy the output, and break it up as shown below. As a bonus, you don’t have to worry about how ic.make_pat() behaves across Python versions, because once you have done this you aren’t going to use ic.make_pat() at all.

#syntax highlighter patterns
KEYWORD   = r"\b(?P<KEYWORD>False|None|True|and|as|assert|async|await|break|class|continue|def|del|elif|else|except|finally|for|from|global|if|import|in|is|lambda|nonlocal|not|or|pass|raise|return|try|while|with|yield)\b"
EXCEPTION = r"([^.'\"\\#]\b|^)(?P<EXCEPTION>ArithmeticError|AssertionError|AttributeError|BaseException|BlockingIOError|BrokenPipeError|BufferError|BytesWarning|ChildProcessError|ConnectionAbortedError|ConnectionError|ConnectionRefusedError|ConnectionResetError|DeprecationWarning|EOFError|Ellipsis|EnvironmentError|Exception|FileExistsError|FileNotFoundError|FloatingPointError|FutureWarning|GeneratorExit|IOError|ImportError|ImportWarning|IndentationError|IndexError|InterruptedError|IsADirectoryError|KeyError|KeyboardInterrupt|LookupError|MemoryError|ModuleNotFoundError|NameError|NotADirectoryError|NotImplemented|NotImplementedError|OSError|OverflowError|PendingDeprecationWarning|PermissionError|ProcessLookupError|RecursionError|ReferenceError|ResourceWarning|RuntimeError|RuntimeWarning|StopAsyncIteration|StopIteration|SyntaxError|SyntaxWarning|SystemError|SystemExit|TabError|TimeoutError|TypeError|UnboundLocalError|UnicodeDecodeError|UnicodeEncodeError|UnicodeError|UnicodeTranslateError|UnicodeWarning|UserWarning|ValueError|Warning|WindowsError|ZeroDivisionError)\b"
BUILTIN   = r"([^.'\"\\#]\b|^)(?P<BUILTIN>abs|all|any|ascii|bin|breakpoint|callable|chr|classmethod|compile|complex|copyright|credits|delattr|dir|divmod|enumerate|eval|exec|exit|filter|format|frozenset|getattr|globals|hasattr|hash|help|hex|id|input|isinstance|issubclass|iter|len|license|locals|map|max|memoryview|min|next|oct|open|ord|pow|print|quit|range|repr|reversed|round|set|setattr|slice|sorted|staticmethod|sum|type|vars|zip)\b"
DOCSTRING = r"(?P<DOCSTRING>(?i:r|u|f|fr|rf|b|br|rb)?'''[^'\\]*((\\.|'(?!''))[^'\\]*)*(''')?|(?i:r|u|f|fr|rf|b|br|rb)?\"\"\"[^\"\\]*((\\.|\"(?!\"\"))[^\"\\]*)*(\"\"\")?)"
STRING    = r"(?P<STRING>(?i:r|u|f|fr|rf|b|br|rb)?'[^'\\\n]*(\\.[^'\\\n]*)*'?|(?i:r|u|f|fr|rf|b|br|rb)?\"[^\"\\\n]*(\\.[^\"\\\n]*)*\"?)"
TYPES     = r"\b(?P<TYPES>bool|bytearray|bytes|dict|float|int|list|str|tuple|object)\b"
NUMBER    = r"\b(?P<NUMBER>((0x|0b|0o|#)[\da-fA-F]+)|((\d*\.)?\d+))\b"
CLASSDEF  = r"(?<=\bclass)[ \t]+(?P<CLASSDEF>\w+)[ \t]*[:(]" #recolor of DEFINITION for class definitions
DECORATOR = r"(^[ \t]*(?P<DECORATOR>@[\w\d.]+))"
INSTANCE  = r"\b(?P<INSTANCE>super|self|cls)\b"
COMMENT   = r"(?P<COMMENT>#[^\n]*)"
SYNC      = r"(?P<SYNC>\n)"

Then you can concat all of those patterns, in whatever order suits you, as below:

PROG   = rf"{KEYWORD}|{BUILTIN}|{EXCEPTION}|{TYPES}|{COMMENT}|{DOCSTRING}|{STRING}|{SYNC}|{INSTANCE}|{DECORATOR}|{NUMBER}|{CLASSDEF}"
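
The order is not purely cosmetic: when two alternatives can match at the same position, the one listed first wins. Here is a minimal sketch of that, using only the re module. HEXCOLOR is a hypothetical pattern I made up for the demonstration; it is not one of the patterns above.

```python
import re

# Two deliberately overlapping patterns: both can match "#ff0000".
COMMENT  = r"(?P<COMMENT>#[^\n]*)"
HEXCOLOR = r"(?P<HEXCOLOR>#[\da-fA-F]{6}\b)"  # hypothetical, for illustration only

def first_group(prog, text):
    """Return the name of the group that claims the first match in text."""
    m = re.compile(prog, re.S | re.M).search(text)
    return m.lastgroup if m else None

text = "#ff0000"
comment_first = first_group(rf"{COMMENT}|{HEXCOLOR}", text)  # COMMENT is tried first
hex_first     = first_group(rf"{HEXCOLOR}|{COMMENT}", text)  # HEXCOLOR is tried first

print(comment_first, hex_first)
```

So if you add a pattern that overlaps with an existing one, put the more specific pattern earlier in PROG.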

You may notice that DEFINITION is not present in any of the above patterns. That’s because the above patterns are for .prog, but the DEFINITION pattern is determined by .idprog. Below is mine. I wanted class definitions to be a different color, so my pattern ignores definitions that are preceded by class. If you don’t intend to make any exceptions to DEFINITION, you don’t have to touch it at all.

#original - r"\s+(\w+)"
IDPROG = r"(?<!class)\s+(\w+)"
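
To see what the negative lookbehind does, here is a small sketch that only uses re. It assumes, from my reading of idlelib, that ColorDelegator calls idprog.match() at the position just past a def or class keyword. The helper definition_name is hypothetical and only exists for this demonstration.

```python
import re

IDPROG = re.compile(r"(?<!class)\s+(\w+)", re.S)

def definition_name(line, keyword):
    """Return the name IDPROG would capture right after keyword, or None."""
    end = line.index(keyword) + len(keyword)  # position just past the keyword
    m = IDPROG.match(line, end)               # the lookbehind still sees text before end
    return m.group(1) if m else None

def_name   = definition_name("def spam(): pass", "def")
class_name = definition_name("class Eggs: pass", "class")

print(def_name, class_name)
```

The function name after def is captured, while the name after class is skipped, which leaves it free for the CLASSDEF pattern to color.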

The next thing to consider is tagdefs. Instead of adding or modifying keys line by line, you can just predefine tagdefs. Below is an example. Note that every regex group name used in the first set of patterns above (apart from SYNC, which is not styled here) is represented by a key in the object below. Also note that DEFINITION is included here. Each inner object becomes the options for tag_configure, and you can use any option that tag_configure accepts. The colors and fonts are my own, and including them is unnecessary to the example.

TAGDEFS   = {   'COMMENT'    : {'foreground': CHARBLUE  , 'background': None},
                'TYPES'      : {'foreground': CLOUD2    , 'background': None},
                'NUMBER'     : {'foreground': LEMON     , 'background': None},
                'BUILTIN'    : {'foreground': OVERCAST  , 'background': None},
                'STRING'     : {'foreground': PUMPKIN   , 'background': None},
                'DOCSTRING'  : {'foreground': STORMY    , 'background': None},
                'EXCEPTION'  : {'foreground': CLOUD2    , 'background': None, 'font':FONTBOLD},
                'DEFINITION' : {'foreground': SAILOR    , 'background': None, 'font':FONTBOLD},
                'DECORATOR'  : {'foreground': CLOUD2    , 'background': None, 'font':FONTITAL},
                'INSTANCE'   : {'foreground': CLOUD     , 'background': None, 'font':FONTITAL},
                'KEYWORD'    : {'foreground': DK_SEAFOAM, 'background': None, 'font':FONTBOLD},
                'CLASSDEF'   : {'foreground': PURPLE    , 'background': None, 'font':FONTBOLD},
            }

'''
#what literally happens to this data when it is applied
for tag, cfg in self.tagdefs.items():
    self.tag_configure(tag, **cfg)
'''
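
If you want to double-check that every group has a matching key, a compiled pattern exposes its group names through groupindex. The sketch below uses shortened stand-ins for the patterns and TAGDEFS, with placeholder colors, and missing_tags is a hypothetical helper name.

```python
import re

# Shortened stand-ins for the real patterns and TAGDEFS (colors are placeholders).
KEYWORD = r"\b(?P<KEYWORD>def|class|return)\b"
COMMENT = r"(?P<COMMENT>#[^\n]*)"
SYNC    = r"(?P<SYNC>\n)"
PROG    = rf"{KEYWORD}|{COMMENT}|{SYNC}"

TAGDEFS = {
    "KEYWORD": {"foreground": "#88c0a0", "background": None},
    "COMMENT": {"foreground": "#5577aa", "background": None},
}

def missing_tags(prog, tagdefs):
    """List the regex group names that have no key in tagdefs."""
    group_names = re.compile(prog, re.S | re.M).groupindex
    return sorted(name for name in group_names if name not in tagdefs)

missing = missing_tags(PROG, TAGDEFS)
print(missing)
```

In a real setup you would run this against the merged cd.tagdefs rather than TAGDEFS alone, since the defaults may already cover groups such as SYNC.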

Once you have that set up, you can easily plug everything in. If you make a custom text widget, you could put the below in __init__ and change YourTextWidget to self. Otherwise, just change YourTextWidget to the instance name of the text widget you want to connect this to (as in tfpf’s answer).

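import re
import idlelib.colorizer as ic   #same aliases as in tfpf's answer
import idlelib.percolator as ip

cd         = ic.ColorDelegator()
cd.prog    = re.compile(PROG, re.S|re.M)
cd.idprog  = re.compile(IDPROG, re.S)
cd.tagdefs = {**cd.tagdefs, **TAGDEFS}
ip.Percolator(YourTextWidget).insertfilter(cd)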

cd.tagdefs = {**cd.tagdefs, **TAGDEFS}

Why did I do it this way? We don’t lose any values with this method. What if KEYWORD was defined in tagdefs but not in TAGDEFS? If we didn’t first unpack tagdefs into the new dict, we would lose KEYWORD.
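
A quick illustration of that merge with plain dicts. The keys and colors here are made up for the example and are not the real defaults.

```python
# Stand-in for cd.tagdefs (the defaults) and for TAGDEFS (the overrides).
defaults = {"KEYWORD": {"foreground": "orange"}, "COMMENT": {"foreground": "red"}}
custom   = {"COMMENT": {"foreground": "blue"},   "CLASSDEF": {"foreground": "purple"}}

# Later unpacking wins on duplicate keys; keys only in defaults survive.
merged = {**defaults, **custom}

print(merged)
```

KEYWORD is kept from the defaults, COMMENT takes the custom value, and CLASSDEF is added. Note that the merge is shallow: a custom entry replaces the whole options dict for that tag rather than merging individual options.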

To sum up this end of the system: one big regex is run, and whatever regex group name matches becomes the name of the tag to apply. Any new regex group you create must have an identically named key in .tagdefs. If not, when your group matches, it will try to use a tag that does not exist.
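
Here is a rough, widget-free sketch of that idea. It is not the actual idlelib code, just the same principle with shortened patterns: run one combined regex and treat the name of whichever group matched as the tag name. tag_spans is a hypothetical helper.

```python
import re

# Shortened stand-ins for the real patterns.
KEYWORD = r"\b(?P<KEYWORD>def|return)\b"
NUMBER  = r"\b(?P<NUMBER>\d+)\b"
COMMENT = r"(?P<COMMENT>#[^\n]*)"
PROG    = re.compile(rf"{KEYWORD}|{NUMBER}|{COMMENT}", re.S | re.M)

def tag_spans(text):
    """Collect (tag_name, start, end, matched_text) for every named group that matched."""
    spans = []
    for m in PROG.finditer(text):
        for name, value in m.groupdict().items():
            if value:  # only the group that actually matched has a value
                start, end = m.span(name)
                spans.append((name, start, end, value))
    return spans

spans = tag_spans("def f(): return 42  # answer")
for span in spans:
    print(span)
```

In the real widget, each of those spans would become a tag_add call with the group name as the tag, which is why the name has to exist in .tagdefs.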
