Selenium: Fixing element references that go stale after a page refresh or back-navigation during automated crawling

Tcat · January 20, 2018 · 1600 views

Problem

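My earlier approach was to save the WebElement objects directly. I then ran into a problem: whenever the crawler navigated to another page and came back to the previous one, the references saved from that page had all become invalid by the time I iterated over them to click, and Selenium threw StaleElementReferenceException.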

Process

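  • Looked for the cause of the StaleElementReferenceException. It is raised when the page changes and the referenced element is destroyed and recreated in the DOM (for example after a navigation, a refresh, or a dynamic re-render); using the old reference again then fails.
  • Looked for a fix once it was clear why the element changes. The answer is to obtain the element again rather than reuse the old reference: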
First of all, let's be clear about what a WebElement is.

A WebElement is a reference to an element in the DOM.

A StaleElementReferenceException is thrown when the element you were interacting with is destroyed and then recreated. Most complex web pages these days will move things about on the fly as the user interacts with them, and this requires elements in the DOM to be destroyed and recreated.

When this happens the reference to the element in the DOM that you previously had becomes stale and you are no longer able to use this reference to interact with the element in the DOM. When this happens you will need to refresh your reference, or in real world terms find the element again.
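The "find the element again" advice can be sketched as a small retry helper. This is a minimal, Selenium-free illustration rather than anything from the original post: `StaleRetry` and `retryOn` are hypothetical names, and the exception type to retry on is passed in by the caller so the sketch compiles with the JDK alone.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of Selenium: re-runs an action whenever it
// fails with the given exception type, up to maxAttempts times.
final class StaleRetry {
    private StaleRetry() {}

    static <T> T retryOn(Class<? extends RuntimeException> retryable,
                         int maxAttempts,
                         Supplier<T> action) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // The action is expected to look the element up afresh each time.
                return action.get();
            } catch (RuntimeException e) {
                if (!retryable.isInstance(e)) {
                    throw e; // unrelated failure: do not retry
                }
                last = e;    // remember the failure and try again
            }
        }
        throw last;          // every attempt failed with the retryable type
    }
}
```

With Selenium on the classpath, a call might look like `StaleRetry.retryOn(StaleElementReferenceException.class, 3, () -> driver.findElement(locator).getText())`, where `driver` and `locator` (a `By`) are assumed to exist in the calling code. The important part is that the lookup happens inside the lambda, so each attempt gets a fresh reference.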
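  • The earlier approach of saving the WebElement directly no longer works, so I looked for a locator strategy that identifies an element uniquely and works generically. XPath fits well: locating by XPath is somewhat slow, but not enough to affect use, and it is fairly stable.
  • When saving elements I switched to saving their XPath instead. However, WebElement provides no method that converts it directly into an XPath, so I looked for a good workaround. In the end I found a fairly stable "extract xpath of a Webelement" approach on SeleniumHQ, which assembles the path by executing JavaScript.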
// Needs org.openqa.selenium.JavascriptExecutor and org.openqa.selenium.WebElement.
// `driver` is assumed to be a static WebDriver field of the enclosing class.
public static String getAbsoluteXPath(WebElement element)
    {
        return (String) ((JavascriptExecutor) driver).executeScript(
                "function absoluteXPath(element) {"+
                    "var comp, comps = [];"+
                    "var parent = null;"+
                    "var xpath = '';"+
                    // getPos: 1-based index among preceding siblings with the same node name
                    "var getPos = function(element) {"+
                        "var position = 1, curNode;"+
                        "if (element.nodeType == Node.ATTRIBUTE_NODE) {"+
                            "return null;"+
                        "}"+
                        "for (curNode = element.previousSibling; curNode; curNode = curNode.previousSibling) {"+
                            "if (curNode.nodeName == element.nodeName) {"+
                                "++position;"+
                            "}"+
                        "}"+
                        "return position;"+
                    "};"+

                    "if (element instanceof Document) {"+
                        "return '/';"+
                    "}"+

                    // Walk from the element up to the document, recording one step per node
                    "for (; element && !(element instanceof Document); "+
                            "element = element.nodeType == Node.ATTRIBUTE_NODE ? element.ownerElement : element.parentNode) {"+
                        "comp = comps[comps.length] = {};"+
                        "switch (element.nodeType) {"+
                            "case Node.TEXT_NODE:"+
                                "comp.name = 'text()';"+
                                "break;"+
                            "case Node.ATTRIBUTE_NODE:"+
                                "comp.name = '@' + element.nodeName;"+
                                "break;"+
                            "case Node.PROCESSING_INSTRUCTION_NODE:"+
                                "comp.name = 'processing-instruction()';"+
                                "break;"+
                            "case Node.COMMENT_NODE:"+
                                "comp.name = 'comment()';"+
                                "break;"+
                            "case Node.ELEMENT_NODE:"+
                                "comp.name = element.nodeName;"+
                                "break;"+
                        "}"+
                        "comp.position = getPos(element);"+
                    "}"+

                    // Steps were collected leaf-first, so emit them in reverse order
                    "for (var i = comps.length - 1; i >= 0; i--) {"+
                        "comp = comps[i];"+
                        "xpath += '/' + comp.name.toLowerCase();"+
                        "if (comp.position !== null) {"+
                            "xpath += '[' + comp.position + ']';"+
                        "}"+
                    "}"+

                    "return xpath;"+
                "} return absoluteXPath(arguments[0]);", element);
    }
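
The position-counting idea behind the script can also be checked outside a browser. The following is a rough sketch, not the script from the post: it uses the JDK's own `org.w3c.dom` and `javax.xml.xpath` packages, handles element nodes only, and keeps node names exactly as they appear in the parsed markup instead of lower-casing them. `XPathSketch`, `position`, `absoluteXPath`, `parse`, and `find` are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class XPathSketch {
    private XPathSketch() {}

    // 1-based index of the node among preceding sibling elements with the same name.
    static int position(Node node) {
        int position = 1;
        for (Node cur = node.getPreviousSibling(); cur != null; cur = cur.getPreviousSibling()) {
            if (cur.getNodeType() == Node.ELEMENT_NODE
                    && cur.getNodeName().equals(node.getNodeName())) {
                position++;
            }
        }
        return position;
    }

    // Builds /name[pos]/name[pos]/... by walking from the element up to the root.
    static String absoluteXPath(Node element) {
        StringBuilder xpath = new StringBuilder();
        for (Node cur = element;
             cur != null && cur.getNodeType() == Node.ELEMENT_NODE;
             cur = cur.getParentNode()) {
            xpath.insert(0, "/" + cur.getNodeName() + "[" + position(cur) + "]");
        }
        return xpath.toString();
    }

    // Parses well-formed markup into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample markup", e);
        }
    }

    // Looks a node up again from its stored XPath, as one would after a page reload.
    static Node find(Document doc, String xpath) {
        try {
            return (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODE);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("invalid XPath: " + xpath, e);
        }
    }
}
```

For the markup `<html><body><div><a>x</a></div><div><a>y</a><a>z</a></div></body></html>`, the second `<a>` inside the second `<div>` yields `/html[1]/body[1]/div[2]/a[2]`, and evaluating that path returns the same node. In the Selenium workflow the stored string plays the same role: after navigating back, each saved XPath is passed to `driver.findElement(By.xpath(...))` to obtain a fresh element.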
