在工作中,经常会遇到这种需求,常用的可能就是结合正则表达式搞了,简单点的,可能直接会使用 awk。这里要推荐一种比较小众的方式,使用 Pyparsing 来定制自己的解析器。而上面的场景,都可以用 Pyparsing 来实现。下面一起看看 Pyparsing 是什么。
那 Pyparsing 是什么呢?简单来说就是解析表达式使用标准的 python 类标记和符号表示。因为它的规则可读性很强,因此很方便维护以及拓展,在中等复杂的需求中,可以拿来使用。
复杂点说,Pyparsing 是:
可能这么说还是比较抽象,直接看下面的例子吧
看一个 traceview 的解析例子,假设源文件如下:
AsyncTask #3
pool-2-thread-5
pool-3-thread-1
uil-pool-2-thread-1
uil-pool-2-thread-2
uil-pool-2-thread-3
uil-pool-2-thread-4
Trace (threadID action usecs class.method signature):
xit 0 ..dalvik.system.VMDebug.startMethodTracingFilename (Ljava/lang/String;IIZI)V VMDebug.java
xit 0 ..com.android.org.conscrypt.NativeCrypto.EVP_DigestUpdate (Lcom/android/org/conscrypt/OpenSSLDigestContext;[BII)V NativeCrypto.java
xit 218 .dalvik.system.VMDebug.startMethodTracing (Ljava/lang/String;IIZI)V VMDebug.java
xit 225 android.os.Debug.startMethodTracing (Ljava/lang/String;II)V Debug.java
xit 230-android.os.Debug.startMethodTracing (Ljava/lang/String;I)V Debug.java
xit 266-java.lang.reflect.Method.invoke (Ljava/lang/Object;[Ljava/lang/Object;Z)Ljava/lang/Object; Method.java
ent 528 ..java.lang.ClassLoader.loadClass (Ljava/lang/String;)Ljava/lang/Class; ClassLoader.java
ent 543 ...java.lang.ClassLoader.loadClass (Ljava/lang/String;Z)Ljava/lang/Class; ClassLoader.java
ent 548 ....java.lang.ClassLoader.findLoadedClass (Ljava/lang/String;)Ljava/lang/Class; ClassLoader.java
ent 567 .....java.lang.BootClassLoader.getInstance ()Ljava/lang/BootClassLoader; ClassLoader.java
xit 576 .....java.lang.BootClassLoader.getInstance ()Ljava/lang/BootClassLoader; ClassLoader.java
xit 681 ....java.lang.ClassLoader.findLoadedClass (Ljava/lang/String;)Ljava/lang/Class; ClassLoader.java
ent 689 ....com.uc.base.aerie.hack.ClassLoaderSupport$a.loadClass (Ljava/lang/String;Z)Ljava/lang/Class; ProGuard
ent 704 .....java.lang.ClassLoader.getParent ()Ljava/lang/ClassLoader; ClassLoader.java
16804 ent 726 ......java.lang.BootClassLoader.loadClass (Ljava/lang/String;Z)Ljava/lang/Class; ClassLoader.java
ent 730 .......java.lang.ClassLoader.findLoadedClass (Ljava/lang/String;)Ljava/lang/Class; ClassLoader.java
ent 734 ........java.lang.BootClassLoader.getInstance ()Ljava/lang/BootClassLoader; ClassLoader.java
xit 740 ........java.lang.BootClassLoader.getInstance ()Ljava/lang/BootClassLoader; ClassLoader.java
xit 754 .......java.lang.ClassLoader.findLoadedClass (Ljava/lang/String;)Ljava/lang/Class; ClassLoader.java
xit 759 ......java.lang.BootClassLoader.loadClass (Ljava/lang/String;Z)Ljava/lang/Class; ClassLoader.java
xit 763 .....java.lang.ClassLoader.loadClass (Ljava/lang/String;)Ljava/lang/Class; ClassLoader.java
可以看到文件格式比较统一,其实正则很好去解析,但这里为了方便说明,用 pyparsing 来实现一个解析器。
将文件进行分解,假设需要解析的是 Trace (threadID action usecs class.method signature) 之后的数据
semiFlag = Literal(";")
dotFlag = Suppress(Literal("."))
threadID =Word(nums, max=5)
actionField = Word(alphas)
usecsField = Word(nums, max=8)
clsField = Word(alphas+".")
methodField = Combine("(" + ZeroOrMore(Word(alphas + ";/")) + ")" + Word(alphas + "/") + semiFlag)
可以看到跟clsField
一致
所以整合起来,脚本可以很快写出来(这里没有适配部分异常情况,直接try..except
处理简单看看效果)
import os
from pyparsing import Word, nums, Combine, alphas, Literal, ZeroOrMore, Group, \
Suppress
semiFlag = Literal(";")
dotFlag = Suppress(Literal("."))
multiDot = ZeroOrMore(dotFlag)
threadID =Word(nums, max=5)
actionField = Word(alphas)
usecsField = Word(nums, max=8)
clsField = Word(alphas+".")
methodField = Combine("(" + ZeroOrMore(Word(alphas + ";/")) + ")" + Word(alphas + "/") + semiFlag)
regex = threadID + actionField + usecsField + multiDot + Group(clsField + methodField) + clsField
with open(os.path.join(os.getcwd(), "StepBeforeFirstDraw_o.txt"), "rb") as f:
lineno = 0
flag = 0
while 1:
line = f.readline()
lineno += 1
if "threadID action usecs" in line:
flag = lineno
continue
if flag > 0:
try:
regex.parseString(line).toXML("")
except Exception as e:
pass
解析后的结果为:
/usr/bin/python2.7 /home/alex/workspace/virtual_space/project/calclex.py
['16804', 'ent', '528', ['java.lang.ClassLoader.loadClass', '(Ljava/lang/String;)Ljava/lang/Class;'], 'ClassLoader.java']
['16804', 'ent', '543', ['java.lang.ClassLoader.loadClass', '(Ljava/lang/String;Z)Ljava/lang/Class;'], 'ClassLoader.java']
['16804', 'ent', '548', ['java.lang.ClassLoader.findLoadedClass', '(Ljava/lang/String;)Ljava/lang/Class;'], 'ClassLoader.java']
['16804', 'ent', '567', ['java.lang.BootClassLoader.getInstance', '()Ljava/lang/BootClassLoader;'], 'ClassLoader.java']
['16804', 'xit', '576', ['java.lang.BootClassLoader.getInstance', '()Ljava/lang/BootClassLoader;'], 'ClassLoader.java']
['16804', 'xit', '681', ['java.lang.ClassLoader.findLoadedClass', '(Ljava/lang/String;)Ljava/lang/Class;'], 'ClassLoader.java']
['16804', 'ent', '704', ['java.lang.ClassLoader.getParent', '()Ljava/lang/ClassLoader;'], 'ClassLoader.java']
['16804', 'ent', '726', ['java.lang.BootClassLoader.loadClass', '(Ljava/lang/String;Z)Ljava/lang/Class;'], 'ClassLoader.java']
['16804', 'ent', '730', ['java.lang.ClassLoader.findLoadedClass', '(Ljava/lang/String;)Ljava/lang/Class;'], 'ClassLoader.java']
['16804', 'ent', '734', ['java.lang.BootClassLoader.getInstance', '()Ljava/lang/BootClassLoader;'], 'ClassLoader.java']
['16804', 'xit', '740', ['java.lang.BootClassLoader.getInstance', '()Ljava/lang/BootClassLoader;'], 'ClassLoader.java']
['16804', 'xit', '754', ['java.lang.ClassLoader.findLoadedClass', '(Ljava/lang/String;)Ljava/lang/Class;'], 'ClassLoader.java']
['16804', 'xit', '759', ['java.lang.BootClassLoader.loadClass', '(Ljava/lang/String;Z)Ljava/lang/Class;'], 'ClassLoader.java']
['16804', 'xit', '763', ['java.lang.ClassLoader.loadClass', '(Ljava/lang/String;)Ljava/lang/Class;'], 'ClassLoader.java']
['16804', 'xit', '771', ['java.lang.ClassLoader.loadClass', '(Ljava/lang/String;Z)Ljava/lang/Class;'], 'ClassLoader.java']
['16804', 'xit', '774', ['java.lang.ClassLoader.loadClass', '(Ljava/lang/String;)Ljava/lang/Class;'], 'ClassLoader.java']
['16804', 'ent', '809', ['java.lang.ClassLoader.loadClass', '(Ljava/lang/String;)Ljava/lang/Class;'], 'ClassLoader.java']
['16804', 'ent', '814', ['java.lang.ClassLoader.loadClass', '(Ljava/lang/String;Z)Ljava/lang/Class;'], 'ClassLoader.java']
['16804', 'ent', '818', ['java.lang.ClassLoader.findLoadedClass', '(Ljava/lang/String;)Ljava/lang/Class;'], 'ClassLoader.java']
['16804', 'ent', '822', ['java.lang.BootClassLoader.getInstance', '()Ljava/lang/BootClassLoader;'], 'ClassLoader.java']
['16804', 'xit', '827', ['java.lang.BootClassLoader.getInstance', '()Ljava/lang/BootClassLoader;'], 'ClassLoader.java']
['16804', 'xit', '842', ['java.lang.ClassLoader.findLoadedClass', '(Ljava/lang/String;)Ljava/lang/Class;'], 'ClassLoader.java']
['16804', 'ent', '853', ['java.lang.ClassLoader.getParent', '()Ljava/lang/ClassLoader;'], 'ClassLoader.java']
['16804', 'xit', '857', ['java.lang.ClassLoader.getParent', '()Ljava/lang/ClassLoader;'], 'ClassLoader.java']
['16804', 'ent', '861', ['java.lang.ClassLoader.loadClass', '(Ljava/lang/String;)Ljava/lang/Class;'], 'ClassLoader.java']
['16804', 'ent', '865', ['java.lang.BootClassLoader.loadClass', '(Ljava/lang/String;Z)Ljava/lang/Class;'], 'ClassLoader.java']
['16804', 'ent', '869', ['java.lang.ClassLoader.findLoadedClass', '(Ljava/lang/String;)Ljava/lang/Class;'], 'ClassLoader.java']