A small problem with converting some tags #23

Open
zhengxinonly opened this issue Aug 31, 2019 · 0 comments

While scraping a CSDN article, I found that the tags below were not converted correctly.
Original article: https://blog.csdn.net/weixin_38405253/article/details/100151657

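<li>
	RetentionPolicy.SOURCE: the annotation is retained only in the source file
	</li>
	<li>
	RetentionPolicy.CLASS : the annotation is retained in the class file and discarded when the class is loaded into the JVM
	</li>
	<li>
	RetentionPolicy.RUNTIME: the annotation is retained at run time, so all annotations defined on a class can be obtained through reflection.
	</li>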

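I looked through the tomd source but could not follow it well enough to fix the problem there, so I wrote my own patch instead. The code is as follows: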

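import re

# Sample input: each list item's text sits on its own indented line.
html = '''<li>
        RetentionPolicy.SOURCE: the annotation is retained only in the source file
        </li>
        <li>
        RetentionPolicy.CLASS : the annotation is retained in the class file and discarded when the class is loaded into the JVM
        </li>
        <li>
        RetentionPolicy.RUNTIME: the annotation is retained at run time, so all annotations defined on a class can be obtained through reflection.
        </li>'''

# Match each <li>...</li> together with its leading indentation (spaces or tabs).
# re.S lets "." span the newlines inside an item; \b keeps <link> from matching.
pattern = re.compile(r'[ \t]*<li\b[^>]*>(.*?)</li>', re.S)
# Rewrite every item as a "+ " Markdown bullet with surrounding whitespace stripped.
markdown = pattern.sub(lambda m: "+ " + m.group(1).strip(), html)
print(markdown)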
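
The same idea could be wrapped in a small pre-processing function, so the list items are rewritten before the HTML is handed to a converter. This is only a sketch: `li_to_bullets` is a hypothetical helper name, not part of tomd, and it assumes flat lists with no nested `<li>` elements. It also collapses line breaks inside an item so that each bullet stays on one line.

```python
import re

# Hypothetical helper, not part of tomd.
# Matches one <li>...</li> plus any leading spaces or tabs; re.S lets "."
# cross newlines, re.I accepts <LI>, and \b keeps <link> from matching.
_LI_PATTERN = re.compile(r"[ \t]*<li\b[^>]*>(.*?)</li>", re.S | re.I)


def li_to_bullets(html):
    """Replace each flat <li>...</li> with a single-line '+ ' Markdown bullet."""
    def _replace(match):
        # split() with no arguments drops leading/trailing whitespace and
        # breaks on any run of spaces, tabs or newlines inside the item.
        return "+ " + " ".join(match.group(1).split())

    return _LI_PATTERN.sub(_replace, html)


sample = (
    "<li>\n\tRetentionPolicy.SOURCE: source only\n\t</li>\n"
    "\t<li class=\"x\">\n\tRetentionPolicy.RUNTIME:\n\tkept at run time\n\t</li>"
)
print(li_to_bullets(sample))
```

Because the pattern is lazy and knows nothing about nesting, a list that contains another list inside an item would be cut at the first `</li>`; a real HTML parser would be the safer choice for that case.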