Appium: Clicking by Image Recognition and Text Recognition (Part 1)

Smile · October 17, 2018 · Last reply by snowLiang on July 29, 2019 · 3474 views

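Overview:

Built on Appium, aircv, and tesseract.

Recognize Chinese text in a screenshot and click on it.

Given a target image and a source image, locate the target within the source and click on it.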

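Dependencies:

1. aircv (an open-source project from NetEase that uses image recognition to locate the image to be clicked. Find object position based on python-opencv2 for python2.7+)

2. tesseract (Tesseract Open Source OCR Engine, a text recognition engine. Worth digging into if you are interested.)

3. Appium (if you run into problems installing Appium, feel free to send me a private message.)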

Usage:

File name: Locate_Image_Click.py

Two methods:

1. click

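import aircv as ac

def click(self, imgsrc, imgobj):

    imsrc = ac.imread(imgsrc)  # source image (for example, a screenshot)
    imsch = ac.imread(imgobj)  # target image to search for
    position = ac.find_sift(imsrc, imsch)  # SIFT feature matching; None when nothing matches

    if position is None:
        print("target image not found in source image")
        return

    x, y = position['result']  # center of the matched region, in source-image pixels
    x, y = int(x), int(y)

    print("x = ", x)
    print("y = ", y)

    self.driver.swipe(x, y, x, y, 50)  # a zero-distance swipe lasting 50 ms acts as a click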

Example call:

imgsrc = 'path to imgsrc.png'
imgobj = 'path to imgobj.png'
self.Locate_Image_Click.click(imgsrc, imgobj)

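Note: only aircv's SIFT-based image search is used here. aircv also supports finding multiple identical regions with SIFT and direct template matching; interested readers can explore those on their own.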
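The click step can be made a little more defensive with a small helper. The sketch below is only an illustration: `match_to_tap_point` is a hypothetical name, and it assumes the match is either `None` or a dict whose `'result'` entry holds the (x, y) center in screenshot pixels, which is how `click` reads it. The `scale` argument is also an assumption, meant for devices where the screenshot resolution differs from the coordinate space the driver taps in.

```python
def match_to_tap_point(match, scale=1.0):
    """Turn an image-match result into integer tap coordinates.

    match: None, or a dict with a 'result' entry holding the (x, y)
           center of the matched region (assumed shape, not verified
           against every aircv version).
    scale: factor applied to both axes, e.g. window_width / screenshot_width.
    """
    if match is None:
        return None  # nothing matched, so there is nothing to tap
    x, y = match['result']
    return int(round(x * scale)), int(round(y * scale))


if __name__ == "__main__":
    print(match_to_tap_point({'result': (120.4, 300.6)}))        # (120, 301)
    print(match_to_tap_point({'result': (750.0, 1334.0)}, 0.5))  # (375, 667)
    print(match_to_tap_point(None))                              # None
```

Returning `None` instead of raising lets the caller decide whether a missing match is a test failure or just a reason to retry.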

2. click_text

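import os
from time import sleep

def click_text(self, text, imagename):

    sleep(1)
    h = self.driver.get_window_size()['height']    # screen height
    self.driver.get_screenshot_as_file(imagename)  # save a screenshot; a bare file name lands in the script's working directory

    if not text:
        print('Please provide the text to click; up to 3 characters are currently supported!')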

    elif len(text) == 1:

        if os.path.isfile(imagename):
            os.system('tesseract {} out -l chi_sim makebox'.format(imagename))
            print("Coordinate file written: out.box")
        else:
            print("{} not found".format(imagename))

        list1 = []  # box-file rows (char left bottom right top page) for the character to click

        if os.path.isfile('out.box'):
            with open('out.box') as f:
                for line in f:
                    if line.split()[0] in text:
                        list1.append(line.split())

        if not list1:
            print(text + ' was not recognized in the screenshot')
            return

        x = (int(list1[0][1]) + int(list1[0][3])) // 2
        print(text + ' X coordinate:', x)

        # box coordinates are measured from the bottom-left corner, so flip Y using the screen height
        y = ((h - int(list1[0][2])) + (h - int(list1[0][4]))) // 2
        print(text + ' Y coordinate:', y)

        self.driver.swipe(x, y, x, y, 50)

    elif len(text) == 2:
        if os.path.isfile(imagename):
            os.system('tesseract {} out -l chi_sim makebox'.format(imagename))
            print("Coordinate file written: out.box")
        else:
            print("{} not found".format(imagename))

        list2 = []

        if os.path.isfile('out.box'):
            with open('out.box') as f:
                for line in f:
                    if line.split()[0] in text:
                        list2.append(line.split())

        if len(list2) < 2:
            print(text + ' was not recognized in the screenshot')
            return

        point_mid_x1 = (int(list2[0][1]) + int(list2[0][3])) // 2  # X midpoint of the first character
        point_mid_x2 = (int(list2[1][1]) + int(list2[1][3])) // 2  # X midpoint of the second character

        x = (point_mid_x1 + point_mid_x2) // 2
        print(text + ' X coordinate:', x)

        # the two characters are assumed to share the same Y midpoint, so one character's Y is enough
        y = ((h - int(list2[0][2])) + (h - int(list2[0][4]))) // 2
        print(text + ' Y coordinate:', y)

        self.driver.swipe(x, y, x, y, 50)

    elif len(text) == 3:
        if os.path.isfile(imagename):
            os.system('tesseract {} out -l chi_sim makebox'.format(imagename))
            print("Coordinate file written: out.box")
        else:
            print("{} not found".format(imagename))

        list3 = []

        if os.path.isfile('out.box'):
            with open('out.box') as f:
                for line in f:
                    if line.split()[0] in text:
                        list3.append(line.split())

        if len(list3) < 3:
            print(text + ' was not recognized in the screenshot')
            return

        point_mid_x1 = (int(list3[0][1]) + int(list3[0][3])) // 2  # X midpoint of the 1st character
        point_mid_x2 = (int(list3[1][1]) + int(list3[1][3])) // 2  # X midpoint of the 2nd character
        point_mid_x3 = (int(list3[2][1]) + int(list3[2][3])) // 2  # X midpoint of the 3rd character

        x = (point_mid_x1 + point_mid_x2 + point_mid_x3) // 3  # average of three midpoints, so divide by 3
        print(text + ' X coordinate:', x)

        # the three characters are assumed to share the same Y midpoint, so one character's Y is enough
        y = ((h - int(list3[0][2])) + (h - int(list3[0][4]))) // 2
        print(text + ' Y coordinate:', y)

        self.driver.swipe(x, y, x, y, 50)

    else:
        print('Up to 3 characters are currently supported!')

Example call:

self.Locate_Image_Click.click_text('text to find', 'screenshot file name')

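Notes:
1. Clicking on 1 to 3 Chinese characters is currently supported; used sensibly, this saves a lot of effort.
2. Important: tesseract is invoked directly as a system command, so it must first be working on the local machine.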
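The three branches of `click_text` differ only in how many characters they average, so the coordinate step could be written once for any number of characters. The following is a minimal sketch rather than the author's implementation: `parse_box_lines` and `text_center` are hypothetical helpers, and they assume each box-file line has the form `char left bottom right top page` with the origin at the bottom-left corner of the image. Unlike the original, the sketch requires the characters to appear consecutively in the box file and clicks the center of their combined bounding box.

```python
def parse_box_lines(lines):
    """Parse box-file lines into (char, left, bottom, right, top) tuples."""
    boxes = []
    for line in lines:
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        char = parts[0]
        left, bottom, right, top = (int(p) for p in parts[1:5])
        boxes.append((char, left, bottom, right, top))
    return boxes


def text_center(text, boxes, image_height):
    """Return the (x, y) center of `text` in top-left-origin coordinates, or None.

    Looks for a run of consecutive boxes whose characters spell `text`.
    """
    n = len(text)
    if n == 0:
        return None
    for i in range(len(boxes) - n + 1):
        run = boxes[i:i + n]
        if ''.join(b[0] for b in run) != text:
            continue
        left = min(b[1] for b in run)
        bottom = min(b[2] for b in run)
        right = max(b[3] for b in run)
        top = max(b[4] for b in run)
        x = (left + right) // 2
        # Box files count Y upward from the bottom edge; flip to screen coordinates.
        y = image_height - (bottom + top) // 2
        return x, y
    return None


if __name__ == "__main__":
    sample = [
        "设 100 900 140 940 0",
        "置 150 900 190 940 0",
        "我 300 100 340 140 0",
    ]
    boxes = parse_box_lines(sample)
    print(text_center("设置", boxes, 1000))  # (145, 80)
    print(text_center("我", boxes, 1000))    # (320, 880)
    print(text_center("关于", boxes, 1000))  # None
```

Matching a consecutive run avoids picking up the same character from an unrelated part of the screen, which the per-character `in text` test cannot rule out.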

Installing tesseract locally:

1. Installation

2. Command-line usage

3. Local debugging

macOS:

brew install tesseract # install

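tesseract [path to image] outputbase -l chi_sim makebox # chi_sim is the officially trained Simplified Chinese model; the makebox argument writes the character coordinates to outputbase.box (out.box when outputbase is out)

cat outputbase.box  # print the generated file; seeing the recognized characters on the command line means tesseract runs locally.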

4. Training data

5. chi_sim download

Notes:
1. After downloading chi_sim, place it in /usr/local/Cellar/tesseract/3.05.02/share/tessdata/
2. If recognition is not accurate enough, you can train your own data.
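Because `click_text` shells out to tesseract with `os.system`, a failed run goes unnoticed until `out.box` turns out to be missing. One possible alternative, sketched here under the assumption that the `tesseract` binary is on `PATH`, uses `subprocess` so that a non-zero exit raises an error. `build_makebox_command` and `run_makebox` are hypothetical helper names; the command itself is the same one used in the article.

```python
import shutil
import subprocess


def build_makebox_command(image_path, output_base="out", lang="chi_sim"):
    """Build the argument list for: tesseract <image> <base> -l <lang> makebox"""
    return ["tesseract", image_path, output_base, "-l", lang, "makebox"]


def run_makebox(image_path, output_base="out", lang="chi_sim"):
    """Run tesseract and return the name of the box file it is expected to write."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH")
    # check=True raises CalledProcessError when tesseract exits with an error.
    subprocess.run(build_makebox_command(image_path, output_base, lang), check=True)
    return output_base + ".box"


if __name__ == "__main__":
    print(build_makebox_command("screen.png"))
```

Passing the arguments as a list also sidesteps shell quoting problems when the screenshot path contains spaces.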

8 replies

snowLiang #9 · July 29, 2019

aircv doesn't seem to match very accurately on images of different resolutions. Has the original poster looked into this?

Smile #8 · January 9, 2019 · Author

In reply to 杉菜:

Yes. Here the two images are compared directly: the target image is located within the source image.

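杉菜 #3 · January 4, 2019

I'd really like to know about imgobj = 'path to imgobj.png': for this image to be searched for, do you need to navigate to the relevant screen, save a screenshot to the computer, and only then trigger the comparison?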

Smile mentioned this post in "Using Image Recognition to Locate Non-Native Controls" · October 30, 20:08

pan #5 · October 19, 2018

In reply to Smile:

(๑‾ ꇴ ‾๑) Great, looking forward to your write-up.

Smile #4 · October 19, 2018 · Author

In reply to pan:

There is a solution. I'll sort it out when I have time over the weekend~

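pan #3 · October 18, 2018

Nice article. Because of business requirements I'm going down the same path, using Appium + OpenCV2, and so far I've only wrapped a simple click operation. A question for the original poster: if I want to use OpenCV2 to assert whether an element exists, does that require CV2's feature matching? If you're interested in discussing this, could we exchange QQ contacts?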

Smile #2 · October 17, 2018 · Author

In reply to 枫叶:

Sending a heart~

枫叶 #1 · October 17, 2018 · 1 like

Great article, bookmarked.
