re.Scanner is fun.

re.Scanner

It makes it easy to break a string up into tokens.

import re

class WordScanner(re.Scanner):
  def __init__(self):
    # super(WordScanner, self).__init__([  # this form does not work (on Python 2, re.Scanner is an old-style class)
    re.Scanner.__init__(self, [
        (r"\S+", lambda sc, s : s), # word
        (r"[\n\s]+",  lambda sc, s : None) # skip
        ], re.M) # multiline

print(WordScanner().scan("foo bar baz\n yee")[0])
# => ['foo', 'bar', 'baz', 'yee']

Of course, for something this simple str.split would do just as well.
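
Where re.Scanner starts to pay off is when each token needs a type, which str.split cannot give you. Here is a minimal sketch of that idea. The token kinds (INT, NAME, OP) are names made up for this example, and since re.Scanner is not described in the official re documentation, treat the details as behavior observed in CPython rather than a guaranteed API.

```python
import re

# Each rule pairs a pattern with an action. A callable action receives
# (scanner, matched_text); whatever it returns is appended to the result.
# An action of None means "match it, but drop it from the result".
token_scanner = re.Scanner([
    (r"\d+", lambda sc, s: ("INT", int(s))),
    (r"[A-Za-z_]\w*", lambda sc, s: ("NAME", s)),
    (r"[-+*/=()]", lambda sc, s: ("OP", s)),
    (r"\s+", None),  # skip whitespace
])

# scan() returns (tokens, remainder). The remainder is whatever is left
# from the first position where no rule matched.
tokens, remainder = token_scanner.scan("x = 10 + y2")
print(tokens)
# => [('NAME', 'x'), ('OP', '='), ('INT', 10), ('OP', '+'), ('NAME', 'y2')]
print(repr(remainder))
# => ''

# Input that no rule matches is not an error; scanning simply stops there.
partial_tokens, partial_rest = token_scanner.scan("a $ b")
print(partial_tokens, repr(partial_rest))
# => [('NAME', 'a')] '$ b'
```

Checking the remainder is the only way to notice unmatched input, so a real tokenizer would probably want to raise an error when it is not empty.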

A JSON parser (sort of)?

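Just for fun, I wrote a JSON parser (sort of). It parses a string and returns a dict.
While I was at it, I also tried out abc. It does not handle quoting inside strings, among other things, so I think of it as a subset of JSON.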
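
To give an idea of the approach, here is a minimal sketch of a JSON-subset parser built on re.Scanner. It is an illustration only and not the script that produced the output below: every name in it (scanner, parse, parse_value, parse_items and so on) is hypothetical, it assumes Python 3, it leaves out abc, and it supports only integers, true/false/null, strings without escape sequences, arrays, and objects with quoted keys.

```python
import re

# Tokens are (kind, value) pairs. Punctuation uses the character itself as kind.
scanner = re.Scanner([
    (r'"[^"\\]*"', lambda sc, s: ("STR", s[1:-1])),  # no escape sequences
    (r"-?\d+", lambda sc, s: ("NUM", int(s))),       # integers only
    (r"true", lambda sc, s: ("LIT", True)),
    (r"false", lambda sc, s: ("LIT", False)),
    (r"null", lambda sc, s: ("LIT", None)),
    (r"[{}\[\]:,]", lambda sc, s: (s, s)),
    (r"\s+", None),                                  # skip whitespace
])


def kind_at(tokens, pos):
    """Return the kind of tokens[pos], or None past the end."""
    return tokens[pos][0] if pos < len(tokens) else None


def expect(tokens, pos, kind):
    """Check that tokens[pos] has the given kind and step over it."""
    if kind_at(tokens, pos) != kind:
        raise ValueError("expected %r at token %d" % (kind, pos))
    return pos + 1


def parse_value(tokens, pos):
    """Parse one value starting at tokens[pos]; return (value, next_pos)."""
    kind = kind_at(tokens, pos)
    if kind in ("STR", "NUM", "LIT"):
        return tokens[pos][1], pos + 1
    if kind == "[":
        return parse_items(tokens, pos + 1, "]")
    if kind == "{":
        pairs, pos = parse_items(tokens, pos + 1, "}")
        return dict(pairs), pos
    raise ValueError("unexpected token at position %d" % pos)


def parse_items(tokens, pos, closer):
    """Parse comma-separated items up to closer; objects yield (key, value)."""
    items = []
    if kind_at(tokens, pos) == closer:  # empty container
        return items, pos + 1
    while True:
        if closer == "}":
            if kind_at(tokens, pos) != "STR":
                raise ValueError("expected a string key at token %d" % pos)
            key = tokens[pos][1]
            pos = expect(tokens, pos + 1, ":")
            value, pos = parse_value(tokens, pos)
            items.append((key, value))
        else:
            value, pos = parse_value(tokens, pos)
            items.append(value)
        if kind_at(tokens, pos) == closer:
            return items, pos + 1
        pos = expect(tokens, pos, ",")


def parse(text):
    """Tokenize with re.Scanner, then parse exactly one value."""
    tokens, rest = scanner.scan(text)
    if rest:
        raise ValueError("cannot tokenize: %r" % rest)
    value, pos = parse_value(tokens, 0)
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return value


print(parse('{"foo": 10, "voo": true}'))
# => {'foo': 10, 'voo': True}
```

The scanner only does the flat tokenizing; the nesting is handled by the small recursive functions on top of it, since a single regular-expression pass cannot match brackets by itself.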

code

output

% python re_json.py
> {}
> {'foo': 10, 'voo': True}
> {'foo': True, 'boo': 10}
> {'arr': [1, 2, 3, 4, 5, 6], 'obj': {'y': 20, 'x': 10, 'val': None}, 'val': 10}
> {'item': [{'itemPrice': 300, 'itemCode': 91, 'itemName': '\xe5\xa1\xa9\xe3\x83\xa9\xe3\x83\xbc\xe3\x83\xa1\xe3\x83\xb3'}, {'itemPrice': 290, 'itemCode': 94, 'itemName': '\xe5\x91\xb3\xe5\x99\x8c\xe3\x83\xa9\xe3\x83\xbc\xe3\x83\xa1\xe3\x83\xb3'}, {'itemPrice': 320, 'itemCode': 95, 'itemName': '\xe8\xb1\x9a\xe9\xaa\xa8\xe3\x83\xa9\xe3\x83\xbc\xe3\x83\xa1\xe3\x83\xb3'}]}

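The Japanese text comes out garbled >_<. (Strictly speaking the data is intact: printing a dict shows the repr of each string, so the UTF-8 bytes appear as escape sequences.)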
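
The escaped bytes can be turned back into readable text by decoding them as UTF-8. A small sketch, assuming Python 3 and using the first itemName from the output above; the variable names are made up for the example.

```python
import json

# The first itemName from the output, as raw UTF-8 bytes.
raw = b"\xe5\xa1\xa9\xe3\x83\xa9\xe3\x83\xbc\xe3\x83\xa1\xe3\x83\xb3"

# Decoding the bytes gives the original string back.
text = raw.decode("utf-8")
print(text)
# => 塩ラーメン

# json.dumps escapes non-ASCII characters unless ensure_ascii=False is passed.
item = {"itemName": text, "itemPrice": 300}
readable = json.dumps(item, ensure_ascii=False)
print(readable)
# => {"itemName": "塩ラーメン", "itemPrice": 300}
```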