A General Technique for Automated Traversal Checking of Page Link Health

扫地僧 · 2016-05-07 · last reply by 49875183 on 2016-06-27 · 5673 views

This post has been marked as a featured post!

Use case

Quickly traverse every link on a page to find dead links and other basic errors.

Implementation

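/**
     * Checks whether every link on a page is healthy. A page counts as unhealthy when its
     * source contains "HTTP Status" followed by a code in 400~469, 500~569, or 600~669.
     *
     * @param url     address of the page under test
     * @param waitFor optional fixed wait after each page load, in milliseconds
     * @author quqing
     */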
    public void testLinksHealth(String url, long... waitFor) {
        boolean isHealthPage;
        String href;
        String pageSource;
        String hasFind;
        List<String> hrefList = new ArrayList<String>();
        Map<String, String> actualResultMap = new LinkedHashMap<String, String>();
        Map<String, String> expectedResultMap = new LinkedHashMap<String, String>();

        try {
            driver.navigate().to(url);
            Thread.sleep(6000);
            List<WebElement> links = driver.findElements(By.xpath("//a"));
            Log.logInfo("links numbers -> " + links.size());

            for (WebElement link : links) {
                String text = link.getText();
                href = link.getAttribute("href");
                // Skip links without visible text or without an href (getAttribute may return null).
                if (null == text || "".equals(text) || null == href) {
                    continue;
                }
                if (href.startsWith("http:") || href.startsWith("https:") || href.startsWith("/")) {
                    hrefList.add(href + "!=!" + text);
                    Log.logInfo(href);
                }
            }

            for (String sUrl : hrefList) {
                // Each entry is "href!=!link text"; split it once instead of on every use.
                String[] parts = sUrl.split("!=!", 2);
                driver.get(parts[0]);
                if (null != waitFor && waitFor.length > 0)
                    Thread.sleep(waitFor[0]);
                pageSource = driver.getPageSource();
                hasFind = findSubString(pageSource);
                Log.logInfo(parts[0] + " -> " + parts[1]);
                Log.logInfo("Page contains exception information -> " + hasFind);
                // A healthy page yields no match, so the expected value is always null.
                expectedResultMap.put(parts[0], null);
                actualResultMap.put(parts[0], hasFind);
                isHealthPage = (null == hasFind);
                Log.logInfo("Is it a healthy page? -> " + isHealthPage);
            }

            Assert.assertEquals(actualResultMap, expectedResultMap);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

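    /**
     * Checks whether every link on a page is healthy, using a custom condition
     * described by a regular expression.
     *
     * @param url     address of the page under test
     * @param regx    regular expression (e.g. HTTP Status [456][0-6][0-9]|(?i)exception)
     * @param waitFor optional fixed wait after each page load, in milliseconds
     * @author quqing
     */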
    public void testLinksHealth(String url, String regx, long... waitFor) {
        boolean isHealthPage;
        String href;
        String pageSource;
        String hasFind;
        List<String> hrefList = new ArrayList<String>();
        Map<String, String> actualResultMap = new LinkedHashMap<String, String>();
        Map<String, String> expectedResultMap = new LinkedHashMap<String, String>();

        try {
            driver.navigate().to(url);
            Thread.sleep(6000);
            List<WebElement> links = driver.findElements(By.xpath("//a"));
            Log.logInfo("links numbers -> " + links.size());

            for (WebElement link : links) {
                String text = link.getText();
                href = link.getAttribute("href");
                // Skip links without visible text or without an href (getAttribute may return null).
                if (null == text || "".equals(text) || null == href) {
                    continue;
                }
                if (href.startsWith("http:") || href.startsWith("https:") || href.startsWith("/")) {
                    hrefList.add(href + "!=!" + text);
                    Log.logInfo(href);
                }
            }

            for (String sUrl : hrefList) {
                // Each entry is "href!=!link text"; split it once instead of on every use.
                String[] parts = sUrl.split("!=!", 2);
                driver.get(parts[0]);
                if (null != waitFor && waitFor.length > 0)
                    Thread.sleep(waitFor[0]);
                pageSource = driver.getPageSource();
                hasFind = findSubString(pageSource, regx);
                Log.logInfo(parts[0] + " -> " + parts[1]);
                Log.logInfo("Page contains exception information -> " + hasFind);
                // A healthy page yields no match, so the expected value is always null.
                expectedResultMap.put(parts[0], null);
                actualResultMap.put(parts[0], hasFind);
                isHealthPage = (null == hasFind);
                Log.logInfo("Is it a healthy page? -> " + isHealthPage);
            }

            Assert.assertEquals(actualResultMap, expectedResultMap);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

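    /**
     * Finds the first substring that matches a custom condition described by a regular expression.
     *
     * @param find the string to search in
     * @param regx regular expression (e.g. [\\s\\S]*? matches any string, including line breaks)
     * @return the first matched string, or null if there is no match
     * @author quqing
     */
    public String findSubString(String find, String regx) {
        try {
            Matcher matcher = Pattern.compile(regx).matcher(find);
            // Only the first match is needed, so a single find() is enough.
            if (matcher.find()) {
                return matcher.group();
            }
        } catch (Exception e) {
            // An invalid pattern or a null input is logged and treated as "no match".
            e.printStackTrace();
        }
        return null;
    }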

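    /**
     * Finds the first substring that matches the predefined condition:
     * "HTTP Status" followed by a code in 400~469, 500~569, or 600~669.
     *
     * @param find the string to search in
     * @return the first matched string, or null if there is no match
     * @author quqing
     */
    public String findSubString(String find) {
        // [0-6] rather than [06], so the second digit covers the whole documented range.
        return findSubString(find, "HTTP Status [456][0-6][0-9]");
    }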
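
As a quick check of what such a pattern accepts, here is a minimal, self-contained sketch that uses only `java.util.regex`. The class name `StatusPatternDemo` and the method `firstMatch` are hypothetical stand-ins for `findSubString`, and the sample page fragments are made up. It illustrates that the character class `[06]` matches only the digits 0 and 6, while `[0-6]` covers every digit from 0 through 6.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StatusPatternDemo {

    // Hypothetical stand-in for findSubString: returns the first match, or null if none.
    static String firstMatch(String text, String regex) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String narrow = "HTTP Status [456][06][0-9]";  // second digit must be 0 or 6
        String ranged = "HTTP Status [456][0-6][0-9]"; // second digit may be 0 through 6

        // Made-up error page fragments, for illustration only.
        String notFound = "<h1>HTTP Status 404 - Not Found</h1>";
        String unsupported = "<h1>HTTP Status 415 - Unsupported Media Type</h1>";

        System.out.println(firstMatch(notFound, narrow));    // HTTP Status 404
        System.out.println(firstMatch(unsupported, narrow)); // null
        System.out.println(firstMatch(unsupported, ranged)); // HTTP Status 415
    }
}
```

Either pattern only inspects text rendered in the page source, so a server that returns an error status with a custom error page would still pass this check.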

Test invocation

// Default condition
webOperate.testLinksHealth("http://172.18.16.205:8080/api_manage/");
// Default condition, with a wait time (ms) after each page load
webOperate.testLinksHealth("http://172.18.16.205:8080/api_manage/", 6000);
// Custom condition
webOperate.testLinksHealth("http://172.18.16.205:8080/api_manage/", "HTTP Status [456][0-6][0-9]|(?i)exception");
// Custom condition, with a wait time (ms) after each page load
webOperate.testLinksHealth("http://172.18.16.205:8080/api_manage/", "HTTP Status [456][0-6][0-9]", 6000);

This article is original work; please credit the source when reposting.

20 replies in total

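达峰的夏天 #1 · 2016-05-07

For synchronously rendered pages, you can simply request the link of every a tag directly.

But there are also resource tags such as img, plus requests triggered by asynchronous loading, so I'd suggest doing this with a rendering engine.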
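
One possible reading of "request every a tag directly" is sketched below, using only the JDK's `HttpURLConnection`. The class name `LinkStatusCheck`, the helper names, the 5-second timeouts, and the rule that any 4xx or 5xx status counts as broken are all assumptions made for illustration; none of them come from the original post. Unlike the page-source regex, this looks at the real HTTP status code.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class LinkStatusCheck {

    // Assumption: any 4xx or 5xx response means the link is broken.
    static boolean isErrorStatus(int status) {
        return status >= 400 && status < 600;
    }

    // Sends a GET request and returns the HTTP status code.
    static int fetchStatus(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(5000); // assumed timeout, in milliseconds
        conn.setReadTimeout(5000);    // assumed timeout, in milliseconds
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the href values collected from the page as command-line arguments.
        for (String url : args) {
            int status = fetchStatus(url);
            System.out.println(url + " -> " + status + (isErrorStatus(status) ? " (broken)" : " (ok)"));
        }
    }
}
```

This covers only plain links; as the reply notes, img resources and asynchronously loaded requests would still need a rendering engine or a separate pass.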

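扫地僧 #2 · 2016-05-07 Author

Re #1 @xdf I really hadn't considered img and ajax requests at the time; it takes an expert to spot that. Checking during rendering is a great idea. Do you have any good experience to share?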

0x7C00 #3 · 2016-05-07

Hi, I've run into a problem.

<h3 class="title" onclick="javascript:set_map_data('http://www.xxxxx.com');"></h3>

How should links like this one, which navigate through JS, be checked?
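
As a rough sketch of one way to at least collect such URLs, the snippet below pulls quoted absolute URLs out of an `onclick` attribute value with a regular expression. The class name `OnclickUrlExtractor` and the method `extractUrls` are hypothetical, and the approach assumes the URL appears as a quoted `http`/`https` string literal in the handler, as in the example above. It does not execute any JavaScript, so URLs built at runtime are out of its reach.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OnclickUrlExtractor {

    // An absolute http(s) URL wrapped in single or double quotes.
    private static final Pattern QUOTED_URL = Pattern.compile("['\"](https?://[^'\"]+)['\"]");

    // Returns every quoted absolute URL found in the onclick value, in order of appearance.
    static List<String> extractUrls(String onclick) {
        List<String> urls = new ArrayList<>();
        if (onclick == null) {
            return urls;
        }
        Matcher matcher = QUOTED_URL.matcher(onclick);
        while (matcher.find()) {
            urls.add(matcher.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        String onclick = "javascript:set_map_data('http://www.xxxxx.com');";
        System.out.println(extractUrls(onclick)); // [http://www.xxxxx.com]
    }
}
```

The extracted URLs could then be fed into the same link check as ordinary `href` values.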

达峰的夏天 #4 · 2016-05-07

Re #2 @quqing You could take a look at macaca-electron.

扫地僧 #5 · 2016-05-07 Author

Re #3 @xie_0723 Checking at render time should solve this problem; see xdf's reply for reference.

扫地僧 #6 · 2016-05-07 Author

Re #4 @xdf I'll take a look first, thanks.

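扫地僧 #7 · 2016-05-07 Author

Re #4 @xdf
Re #3 @xie_0723 I thought it over. With some enhancement, my method can also handle ajax asynchronous requests and links invoked from JS: find the node that makes the call and simulate a click on it, though that is a bit complex to implement. Another solution is to use a crawler to collect all hot resource paths and join them with the URL into complete request addresses, plus the ajax request addresses and the JS-invoked addresses, and then send the requests through an HTTP client to test them,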