Q&A: A practice exercise! Extracting a string from an HTML page

Ikaros灬 · April 26, 2018 · Last reply by wind on May 18, 2018 · 7068 views

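This is a problem I ran into recently; feel free to treat it as a practice exercise.
Before login, our product fetches a csrf value from the server. The value is hidden in the HTML of the login page and is submitted to the server as a parameter at login.
The question is how to reliably find this value in the HTML of the whole page.
The HTML fragment is known to always have this form: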

<input type="hidden" name="_csrf" value="12345678-1234-1234-1234-123456781234" />

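The HTML before and after it is not known in advance. The value is randomly generated each time, but its length is fixed.
How do I extract the contents of value?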

Now it's time to show off your skills.

21 replies

Ikaros灬 #21 · April 26, 2018 Author

When I first posted, I pasted the HTML code directly and it really did get hidden...

BeNice #20 · April 26, 2018

document.getElementsByName("_csrf")[0].value

Ikaros灬 #19 · April 26, 2018 Author

Replying to BeNice

That's not what I meant. The HTML page I receive is already plain text: the HTML of the whole page is a single string.

tongyx #18 · April 26, 2018 · 1 like

string.split("name=\"_csrf\" value=\"")[1].split("\"")[0];
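The same split idea can be sketched in Python, assuming the attribute order and spacing are exactly as in the question. The helper name `extract_csrf_by_split` is hypothetical; `str.partition` is used so that a missing marker yields `None` rather than an index error.

```python
def extract_csrf_by_split(page):
    """Return the _csrf value from an HTML string, or None if the marker is absent."""
    marker = 'name="_csrf" value="'
    # partition splits at the first occurrence of the marker
    _, found, rest = page.partition(marker)
    if not found:
        return None
    # everything up to the next double quote is the value
    value, quote, _ = rest.partition('"')
    return value if quote else None


page = ('<html><body><input type="hidden" name="_csrf" '
        'value="12345678-1234-1234-1234-123456781234" /></body></html>')
print(extract_csrf_by_split(page))
```

This only works while the page keeps `name` immediately before `value`; any change in attribute order or spacing makes it return `None`.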

chen #17 · April 26, 2018

[0-9-]+
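On its own, `[0-9-]+` matches the first run of digits or hyphens anywhere in the page, which is not necessarily the token. Below is a minimal sketch of a tighter pattern, assuming the value always has the 8-4-4-4-12 shape shown in the question. Allowing hex letters is a guess, since the sample shows only digits. The names `TOKEN_RE` and `find_token` are made up for illustration.

```python
import re

# 8-4-4-4-12 groups; hex letters are an assumption, the sample only shows digits
TOKEN_RE = re.compile(r'[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}')


def find_token(page):
    """Return the first substring with the expected token shape, or None."""
    match = TOKEN_RE.search(page)
    return match.group(0) if match else None


page = ('<div id="item-42">2018-04-26</div>'
        '<input type="hidden" name="_csrf" '
        'value="12345678-1234-1234-1234-123456781234" />')
# The loose pattern stops at the first digit/hyphen run it sees
print(re.search(r'[0-9-]+', page).group(0))
# The shape-based pattern skips the earlier runs
print(find_token(page))
```

A shape-only pattern can still pick up another string of the same form elsewhere on the page, so anchoring on `name="_csrf"` as other replies do is the safer choice.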

Ikaros灬 #16 · April 26, 2018 Author

Replying to tongyx

6666666666666

hellohell #15 · April 26, 2018

s='''<html>
<input type="hidden" name="_csrf" value="12345678-1234-1234-1234-123456781234" />
</html>
'''

import re
# [^"]+ stops at the closing quote, so later attributes on the same line are not swallowed
print(re.search(r'name="_csrf" value="([^"]+)"', s).group(1))

程明远 #14 · April 26, 2018

Just use a regex match.

昨天有雨 #13 · April 26, 2018

String str = "<input type=\"hidden\" name=\"_csrf\" value=\"12345678-1234-1234-1234-123456781234\" />";
String value = str.split(" ")[3].split("\"")[1];
System.out.println(value);

LOL #12 · April 26, 2018

/name="_csrf" value="([\d-]+)"/
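A pattern like this depends on `name` coming directly before `value`. The sketch below keeps the regex approach but first isolates each `<input>` tag and then looks for the two attributes inside it, so their order no longer matters. It assumes double-quoted attributes and a value made of digits and hyphens, as in the question; the names `INPUT_TAG_RE`, `NAME_RE`, `VALUE_RE` and `extract_csrf` are illustrative only.

```python
import re

# Match a whole <input ...> tag, then look for attributes inside it
INPUT_TAG_RE = re.compile(r'<input\b[^>]*>', re.IGNORECASE)
NAME_RE = re.compile(r'\bname\s*=\s*"_csrf"')
VALUE_RE = re.compile(r'\bvalue\s*=\s*"([\d-]+)"')


def extract_csrf(page):
    """Return the value of the _csrf input regardless of attribute order, or None."""
    for tag in INPUT_TAG_RE.findall(page):
        if NAME_RE.search(tag):
            match = VALUE_RE.search(tag)
            if match:
                return match.group(1)
    return None


reordered = ('<input value="12345678-1234-1234-1234-123456781234" '
             'name="_csrf" type="hidden">')
print(extract_csrf(reordered))
```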

Anonymous #11 · April 26, 2018

s='<input type="hidden" name="_csrf" value="12345678-1234-1234-1234-123456781234" />'
value=s.split()[3][7:-1]

乾行 #10 · April 26, 2018

Regular expressions solve everything.

hellohell #9 · April 26, 2018

Anyone who defines a regex with // is an expert, like floor 10.

Jaxon #8 · April 27, 2018

You're all experts.

hellohell #7 · April 27, 2018

andward_xu #6 · April 27, 2018

r^{'1234[.]*1234'$}

yangrm #5 · April 28, 2018

Wouldn't it be best to read the value with JS? Why use a regex at all? Could someone explain?

lishihai80 #4 · May 3, 2018

Data
Maintenance

How do I locate HTML like this with selenium?

chu #3 · May 4, 2018

Besides a regex, you can also parse it with lxml.

import lxml
from lxml import html


s='''<html>
 <input type="hidden" name="_csrf" value="12345678-1234-1234-1234-123456781234" />
</html>
 '''

doc = lxml.html.fromstring(s)
result = doc.xpath('//input[@name="_csrf"]')
if result:
  print(result[0].value)
else:
  print("no result")
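If installing lxml is not an option, a similar parser-based lookup can be sketched with Python's standard-library `html.parser`. This is a swapped-in alternative rather than what the reply above uses; the class name `CsrfFinder` and the helper `extract_csrf_with_parser` are hypothetical.

```python
from html.parser import HTMLParser


class CsrfFinder(HTMLParser):
    """Collects the value attribute of the first <input name="_csrf">."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; self-closing tags also end up here
        attributes = dict(attrs)
        if tag == "input" and attributes.get("name") == "_csrf" and self.token is None:
            self.token = attributes.get("value")


def extract_csrf_with_parser(page):
    """Parse the page and return the _csrf value, or None if no such input exists."""
    finder = CsrfFinder()
    finder.feed(page)
    return finder.token


page = '''<html>
 <input type="hidden" name="_csrf" value="12345678-1234-1234-1234-123456781234" />
</html>'''
print(extract_csrf_with_parser(page))
```

Because the parser works on tags and attributes rather than raw text, it does not depend on attribute order, spacing, or quoting style.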

wind #2 · May 18, 2018

I coded up an answer; it's in the post below.

wind #1 · May 18, 2018

The text is as follows:

<html>
 <input type="hidden" name="_csrf" value="12345678-1234-1234-1234-123456781234" />
</html>

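import os.path
import re

# Assume the text file is in the test_study directory and is named ceshi
text_log = os.path.dirname(os.path.abspath('.')) + '/test_study/ceshi'
# Open the file; the with block closes it when done
with open(text_log) as f:
    # Read the file line by line
    for line in f:
        # Check whether this line contains name="_csrf"
        if re.search('name="_csrf"', line):
            # If so, extract the fixed-length value: a digit followed by 35 more characters
            matches = re.findall('[0-9].{35}', line)
            # Join the list of matches into a single string
            print(''.join(matches))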
