Handling <select> dropdown options with selenium + python
1. Background
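When scraping web pages, we sometimes run into a dropdown like the one in the figure below, that is, a <select></select> element. An ordinary click on the element usually does not work, so selenium provides a dedicated Select class for handling this kind of dropdown.
2. Environment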
python 3.6.1
OS: win7
IDE: pycharm
Chrome browser installed
chromedriver configured
selenium 3.7.0
3. Handling the select dropdown
The page source for the dropdown in the figure above is:
<select aria-describedby="searchDropdownDescription" class="nav-search-dropdown searchSelect" data-nav-digest="RflwCB5c/WnDs5EtBNnFFArlBJk" data-nav-selected="0" id="searchDropdownBox" name="url" style="display: block; top: 0px;" tabindex="18" title="Search in">
<option selected="selected" value="search-alias=aps">All Departments</option>
<option value="search-alias=alexa-skills">Alexa Skills</option>
<option value="search-alias=amazon-devices">Amazon Devices</option>
<option value="search-alias=instant-video">Amazon Video</option>
<option value="search-alias=warehouse-deals">Amazon Warehouse Deals</option>
<option value="search-alias=appliances">Appliances</option>
<option value="search-alias=mobile-apps">Apps & Games</option>
<option value="search-alias=arts-crafts">Arts, Crafts & Sewing</option>
<option value="search-alias=automotive">Automotive Parts & Accessories</option>
<option value="search-alias=baby-products">Baby</option>
<option value="search-alias=beauty">Beauty & Personal Care</option>
<option value="search-alias=stripbooks">Books</option>
<option value="search-alias=popular">CDs & Vinyl</option>
<option value="search-alias=mobile">Cell Phones & Accessories</option>
<option value="search-alias=fashion">Clothing, Shoes & Jewelry</option>
<option value="search-alias=fashion-womens"> Women</option>
<option value="search-alias=fashion-mens"> Men</option>
<option value="search-alias=fashion-girls"> Girls</option>
<option value="search-alias=fashion-boys"> Boys</option>
<option value="search-alias=fashion-baby"> Baby</option>
<option value="search-alias=collectibles">Collectibles & Fine Art</option>
<option value="search-alias=computers">Computers</option>
<option value="search-alias=courses">Courses</option>
<option value="search-alias=financial">Credit and Payment Cards</option>
<option value="search-alias=digital-music">Digital Music</option>
<option value="search-alias=electronics">Electronics</option>
<option value="search-alias=lawngarden">Garden & Outdoor</option>
<option value="search-alias=gift-cards">Gift Cards</option>
<option value="search-alias=grocery">Grocery & Gourmet Food</option>
<option value="search-alias=handmade">Handmade</option>
<option value="search-alias=hpc">Health, Household & Baby Care</option>
<option value="search-alias=local-services">Home & Business Services</option>
<option value="search-alias=garden">Home & Kitchen</option>
<option value="search-alias=industrial">Industrial & Scientific</option>
<option value="search-alias=digital-text">Kindle Store</option>
<option value="search-alias=fashion-luggage">Luggage & Travel Gear</option>
<option value="search-alias=luxury-beauty">Luxury Beauty</option>
<option value="search-alias=magazines">Magazine Subscriptions</option>
<option value="search-alias=movies-tv">Movies & TV</option>
<option value="search-alias=mi">Musical Instruments</option>
<option value="search-alias=office-products">Office Products</option>
<option value="search-alias=pets">Pet Supplies</option>
<option value="search-alias=prime-exclusive">Prime Exclusive Savings</option>
<option value="search-alias=pantry">Prime Pantry</option>
<option value="search-alias=software">Software</option>
<option value="search-alias=sporting">Sports & Outdoors</option>
<option value="search-alias=tools">Tools & Home Improvement</option>
<option value="search-alias=toys-and-games">Toys & Games</option>
<option value="search-alias=vehicles">Vehicles</option>
<option value="search-alias=videogames">Video Games</option>
</select>
It is worth looking at how the Select class is implemented in the select.py file of the selenium source: …\selenium\webdriver\support\select.py
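To get a feel for the three lookups covered in the rest of this post without opening a browser, here is a rough offline model. It is not Selenium's code: it parses a shortened copy of the dropdown with the standard library's html.parser, and the names OptionCollector, parse_options, by_value, by_index and by_visible_text are made up for this illustration.

```python
from html.parser import HTMLParser


class OptionCollector(HTMLParser):
    """Collect the value and text of every <option> in a piece of HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.options = []      # one dict per option, in document order
        self._current = None   # the option being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            self._current = {"value": dict(attrs).get("value"), "text": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "option" and self._current is not None:
            # Trim and collapse whitespace in the option text.
            self._current["text"] = " ".join(self._current["text"].split())
            self.options.append(self._current)
            self._current = None


def parse_options(html):
    collector = OptionCollector()
    collector.feed(html)
    collector.close()
    return collector.options


def by_value(options, value):
    """Options whose value attribute equals `value`."""
    return [o for o in options if o["value"] == value]


def by_index(options, index):
    """The option at zero-based position `index`."""
    return options[index]


def by_visible_text(options, text):
    """Options whose text equals `text`."""
    return [o for o in options if o["text"] == text]


# First three options of the dropdown shown earlier in this post.
SAMPLE = (
    '<select id="searchDropdownBox">'
    '<option selected="selected" value="search-alias=aps">All Departments</option>'
    '<option value="search-alias=alexa-skills">Alexa Skills</option>'
    '<option value="search-alias=amazon-devices">Amazon Devices</option>'
    '</select>'
)

options = parse_options(SAMPLE)
print(by_value(options, "search-alias=alexa-skills"))
print(by_index(options, 2))
print(by_visible_text(options, "All Departments"))
```

The real Select class works against a live page instead of a string, so treat this only as a mental model of what "by value", "by index" and "by visible text" mean.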
3.1. Option 1: select_by_value
When to use: this selects the <option> whose value attribute equals the string you pass in, so the option must have a value attribute.

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.support.ui import Select
    import random
    import time

    browser = webdriver.Chrome()
    # The URL is cut off in the original post; use the page that contains the dropdown.
    browser.get("/")

    # Select the last option:
    # <option value="search-alias=videogames">Video Games</option>
    classSelectValue = 'search-alias=videogames'
    Select(browser.find_element_by_tag_name("select")).select_by_value(classSelectValue)

    # Locate the search box
    search_box = browser.find_element_by_xpath("//form[@name='site-search']/div[@class='nav-fill']//input[@class='nav-input']")

    # Type the search keyword after a short random pause
    time.sleep(random.randrange(1, 5, 1))
    search_box.clear()
    search_box.send_keys('mount')

    # Press Enter
    search_box.send_keys(Keys.RETURN)
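When select_by_value is given a value that is not in the dropdown, the failure can be hard to read. A minimal sketch of a safer wrapper follows; available_values and select_value_checked are hypothetical helper names, and dropdown is assumed to be a selenium.webdriver.support.ui.Select instance (only its options property, each option's get_attribute, and select_by_value are used).

```python
def available_values(dropdown):
    """Return the value attribute of every option in the dropdown."""
    return [opt.get_attribute("value") for opt in dropdown.options]


def select_value_checked(dropdown, value):
    """Select by value, but fail early with a readable message if it is absent."""
    values = available_values(dropdown)
    if value not in values:
        raise ValueError(
            "no option with value %r; available values: %s" % (value, values)
        )
    dropdown.select_by_value(value)
```

With the browser from this post, the call would look something like select_value_checked(Select(browser.find_element_by_tag_name("select")), 'search-alias=videogames').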
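3.2. Option 2: select_by_index
When to use: this selects the option whose index equals the number you pass in. Note that selenium does not simply count through a Python list; it compares against each option's index attribute (for example index="1"). As far as I can tell, the HTML does not need to spell out that attribute: the browser exposes it for every <option> as its zero-based position in the dropdown.

    # A small variation; this method is not a good fit for this page and is noted here only for reference
    Select(browser.find_element_by_tag_name("select")).select_by_index(2)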
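Since it is easy to be off by one with select_by_index, it can help to read back what was actually selected. The sketch below does that; select_index_and_report is a hypothetical helper name, and dropdown is again assumed to be a Select instance (it relies on select_by_index and the first_selected_option property).

```python
def select_index_and_report(dropdown, index):
    """Select by index, then return (value, text) of the option now selected."""
    dropdown.select_by_index(index)
    chosen = dropdown.first_selected_option
    return chosen.get_attribute("value"), chosen.text
```

On the dropdown from this post, I would expect index 2 to report the third option (Amazon Devices) if the index is indeed zero-based, but that is worth confirming against the live page.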
3.3. Option 3: select_by_visible_text
When to use: this selects the <option> by its text.

    # <option value="search-alias=toys-and-games">Toys & Games</option>
    # Take the text from the page source rather than retyping it from the rendered page,
    # so that characters such as & are reproduced exactly.
    classSelectText = 'Toys & Games'
    Select(browser.find_element_by_tag_name("select")).select_by_visible_text(classSelectText)
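One thing to watch with option text is the ampersand: raw HTML normally writes it as the entity &amp;, while the text of the element is a plain &. The sketch below converts text copied from raw markup into the plain form using the standard library. It assumes that select_by_visible_text compares against the decoded, whitespace-trimmed text, which is how I read select.py; visible_text_from_source is a made-up helper name.

```python
from html import unescape


def visible_text_from_source(raw_text):
    """Turn option text copied from raw HTML into the plain text of the option."""
    decoded = unescape(raw_text)       # "&amp;" becomes "&"
    return " ".join(decoded.split())   # trim and collapse whitespace


label = visible_text_from_source("Toys &amp; Games")
print(label)
```

The resulting string is what would then be passed to select_by_visible_text.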