Ссылки являются фундаментальными элементами веб-страниц. По сути, это гиперссылки, которые делают возможным существование всемирной интернет сети. Ниже приведен пример ссылки, а также источник HTML.
<a href=“index.html” id=“recommend_selenium_link” class=“nav” data–id=“123” style=“font-size: 14px;”> Recommend Selenium </a> |
Запуск браузера
Тестирование веб-сайтов начинается с браузера.
from selenium import webdriver driver = webdriver.Firefox() driver.get(“http://testwisely.com/demo”) |
Используйте webdriver.Chrome или webdriver.Ie для тестирования в Chrome и IE соответственно. Для новичков рекомендуется закрыть окно браузера в конце тестового примера.
Найти ссылку по тексту
Поиск по тексту — это, пожалуй, самый прямой способ найти ссылку в Selenium, поскольку это то, что мы видим на странице.
driver.find_element_by_link_text(“Recommend Selenium”).click() |
Найти ссылку по ID
driver.find_element_by_id(“recommend_selenium_link”).click() |
Кроме того, если вы тестируете веб-сайт с несколькими языками, использование идентификаторов является, пожалуй, единственно возможным вариантом. Если вы не хотите писать тестовые скрипты, как показано ниже:
if is_italian: driver.find_element_by_link_text(“Accedi”).click() if is_chinese: driver.find_element_by_link_text(“注册”).click() else: driver.find_element_by_link_text(“Sign in”).click() |
Найти ссылку по частичному тексту
HTML содержимое:
<a href=“/page”>Get partial page</a> |
Поиск ссылки:
driver.find_element_by_partial_link_text(“partial”).click() |
Найти ссылку при помощи XPath
Пример ниже — поиск ссылки с текстом «Recommend Selenium» под тегом абзаца <p>.
driver.find_element_by_xpath(“//p/a[text()=’Recommend Selenium’]”).click() |
Можно сказать, что приведенный выше пример (найти по тексту ссылки) проще и интуитивно понятнее. Но давайте рассмотрим другой пример:
<div> First div <a href=“link-url.html”>Click here</a> </div> <div> Second div <a href=“link-partial.html”>Click here</a> </div> |
На этой странице есть две ссылки «Click here». Если в тестовом случае вам нужно щелкнуть вторую ссылку «Click here», простой find_element_by_link_text (“Click here“) не будет работать (так как он нажимает первый). Вот способ выполнения XPath:
xpath = ‘//div[contains(text(), “Second”)]/a[text()=”Click here”]’ driver.find_element_by_xpath(xpath).click() |
Клик на N-ую ссылку с похожим текстом
Это не редкость, что существует более одной ссылки с точно таким же текстом. По умолчанию Selenium выберет первый. Что, если вы выберете второй или N-ый? На приведенной ниже веб-странице есть три ссылки «Show Answer».
Чтобы выбрать вторую ссылку:
driver.find_elements_by_link_text(“Show Answer”)[1].click() |
find_elements_xxx возвращает список (также называемый массивом) веб-элементов, соответствующих критериям в порядке появления. Selenium (Python) использует индексирование на основе нуля, то есть первое из них равно 0.
Клик на N-ую ссылку по CSS критериям
Вы также можете использовать CSS выражения для поиска веб-элемента.
driver.find_element_by_css_selector(“p > a:nth-child(3)”).click() |
Однако, использование стилей более склонно к изменениям.
Получение атрибутов данных ссылок
Как только веб-элемент идентифицирован, мы можем получить его атрибуты. Это обычно применимо к большинству элементов HTML.
<a href=“recomand.html” id=“recommend_selenium_link”> Recommend Selenium </a> |
href = driver.find_element_by_link_text(“Recommend Selenium”).get_attribute(“href”) print(href) # Вывод: recomand.html id = driver.find_element_by_link_text(“Recommend Selenium”).get_attribute(“id”) print(id) # Вывод: recommend_selenium_link text = driver.find_element_by_id(“recommend_selenium_link”).text print(text) # Вывод: Recommend Selenium tag_name = driver.find_element_by_id(“recommend_selenium_link”).tag_name print(tag_name) # Вывод: a |
Тестовые ссылки открывают новое окно браузера
При нажатии на ссылку ниже откроется связанный URL-адрес в новом окне или вкладке браузера.
<a href=“http://testwisely.com/demo” target=“_blank”> Open new window </a> |
Чтобы найти новое окно браузера, будет проще выполнить все тесты в одном окне браузера. Вот как это сделать:
current_url = driver.current_url new_window_url = driver.find_element_by_link_text(“Open new window”).get_attribute(“href”) driver.get(new_window_url) # … тестируем на новом сайте driver.find_element_by_name(“name”).send_keys(“sometext”) driver.get(current_url) # Назад |
В этом тестовом скрипте мы используем локальную переменную (термин программирования) ‘current_url‘ для хранения текущего URL-адреса.
I am trying to copy the href value from a website, and the html code looks like this:
<p class="sc-eYdvao kvdWiq">
<a href="https://www.iproperty.com.my/property/setia-eco-park/sale-
1653165/">Shah Alam Setia Eco Park, Setia Eco Park
</a>
</p>
I’ve tried driver.find_elements_by_css_selector(".sc-eYdvao.kvdWiq").get_attribute("href")
but it returned 'list' object has no attribute 'get_attribute'
. Using driver.find_element_by_css_selector(".sc-eYdvao.kvdWiq").get_attribute("href")
returned None
. But i cant use xpath because the website has like 20+ href which i need to copy all. Using xpath would only copy one.
If it helps, all the 20+ href are categorised under the same class which is sc-eYdvao kvdWiq
.
Ultimately i would want to copy all the 20+ href and export them out to a csv file.
Appreciate any help possible.
asked Feb 25, 2019 at 8:48
2
You want driver.find_elements if more than one element. This will return a list. For the css selector you want to ensure you are selecting for those classes that have a child href
elems = driver.find_elements_by_css_selector(".sc-eYdvao.kvdWiq [href]")
links = [elem.get_attribute('href') for elem in elems]
You might also need a wait condition for presence of all elements located by css selector.
elems = WebDriverWait(driver,10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".sc-eYdvao.kvdWiq [href]")))
answered Feb 25, 2019 at 9:20
QHarrQHarr
83.1k12 gold badges54 silver badges100 bronze badges
2
As per the given HTML:
<p class="sc-eYdvao kvdWiq">
<a href="https://www.iproperty.com.my/property/setia-eco-park/sale-1653165/">Shah Alam Setia Eco Park, Setia Eco Park</a>
</p>
As the href
attribute is within the <a>
tag ideally you need to move deeper till the <a>
node. So to extract the value of the href
attribute you can use either of the following Locator Strategies:
-
Using
css_selector
:print(driver.find_element_by_css_selector("p.sc-eYdvao.kvdWiq > a").get_attribute('href'))
-
Using
xpath
:print(driver.find_element_by_xpath("//p[@class='sc-eYdvao kvdWiq']/a").get_attribute('href'))
If you want to extract all the values of the href
attribute you need to use find_elements*
instead:
-
Using
css_selector
:print([my_elem.get_attribute("href") for my_elem in driver.find_elements_by_css_selector("p.sc-eYdvao.kvdWiq > a")])
-
Using
xpath
:print([my_elem.get_attribute("href") for my_elem in driver.find_elements_by_xpath("//p[@class='sc-eYdvao kvdWiq']/a")])
Dynamic elements
However, if you observe the values of class attributes i.e. sc-eYdvao
and kvdWiq
ideally those are dynamic values. So to extract the href
attribute you have to induce WebDriverWait for the visibility_of_element_located()
and you can use either of the following Locator Strategies:
-
Using
CSS_SELECTOR
:print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "p.sc-eYdvao.kvdWiq > a"))).get_attribute('href'))
-
Using
XPATH
:print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//p[@class='sc-eYdvao kvdWiq']/a"))).get_attribute('href'))
If you want to extract all the values of the href
attribute you can use visibility_of_all_elements_located()
instead:
-
Using
CSS_SELECTOR
:print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "p.sc-eYdvao.kvdWiq > a")))])
-
Using
XPATH
:print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//p[@class='sc-eYdvao kvdWiq']/a")))])
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
answered Jul 9, 2020 at 18:34
Get the whole element you want with driver.find_elements(By.XPATH, 'path')
.
To extract the href link use get_attribute('href')
.
Which gives,
driver.find_elements(By.XPATH, 'path').get_attribute('href')
answered Oct 1, 2022 at 14:08
The XPATH
//p[@class='sc-eYdvao kvdWiq']/a
return the elements you are looking for.
Writing the data to CSV file is not related to the scraping challenge. Just try to look at examples and you will be able to do it.
answered Feb 25, 2019 at 9:10
baldermanbalderman
22.6k7 gold badges33 silver badges49 bronze badges
To crawl any hyperlink or Href, proxycrwal API is ideal as it uses pre-built functions for fetching desired information. Just pip install the API and follow the code to get the required output. The second approach to fetch Href links using python selenium is to run the following code.
Source Code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
import time
list = ['https://www.heliosholland.com/Ampullendoos-voor-63-ampullen','https://www.heliosholland.com/lege-testdozen’]
driver = webdriver.Chrome()
wait = WebDriverWait(driver,29)
for i in list:
driver.get(i)
image = wait.until(EC.visibility_of_element_located((By.XPATH,'/html/body/div[1]/div[3]/div[2]/div/div[2]/div/div/form/div[1]/div[1]/div/div/div/div[1]/div/img'))).get_attribute('src')
print(image)
To scrape the link, use .get_attribute(‘src’).
answered Jun 3, 2021 at 19:04
BilalBilal
91 bronze badge
try something like:
elems = driver.find_elements_by_xpath("//p[contains(@class, 'sc-eYdvao') and contains(@class='kvdWiq')]/a")
for elem in elems:
print elem.get_attribute['href']
answered Feb 25, 2019 at 9:01
RobertRobert
1714 bronze badges
How to locate the href elements using selenium and python
i tried below code but its not working
driver.find_element_by_xpath("//a[@href]")
asked Apr 4, 2017 at 7:37
1
To locate href
attribute usually you can use
//a/@href
but selenium
doesn’t support this syntax as in selenium
you can locate webelements only.
You can try below:
driver.find_element_by_xpath("//a").get_attribute('href')
With this line of code you should be able to get href
attribute of the first anchor element on the page.
answered Apr 4, 2017 at 7:43
AnderssonAndersson
51.4k16 gold badges74 silver badges127 bronze badges
Try below code with replacing href attribute value with your value.
driver.find_element_by_xpath("//a[@href='href attribute value']");
answered Apr 4, 2017 at 7:56
AkarshAkarsh
9675 silver badges9 bronze badges
We can fetch href links in a page in Selenium by using the method find_elements(). All the links in the webpage are designed in a html document such that they are enclosed within the anchor tag.
To fetch all the elements having <anchor> tagname, we shall use the method find_elements_by_tag_name(). It will fetch a list of elements of anchor tag name as given in the method argument. If there is no matching tagname in the page, an empty list shall be returned.
Example
Code Implementation.
from selenium import webdriver driver = webdriver.Chrome (executable_path="C:chromedriver.exe") driver.maximize_window() driver.get("https://www.google.com/") # identify elements with tagname <a> lnks=driver.find_elements_by_tag_name("a") # traverse list for lnk in lnks: # get_attribute() to get all href print(lnk.get_attribute(href)) driver.quit()
Output
Links in HTML are nothing but hyperlinks. When we click on a link, we are navigated to another page or document.
HTML links are defined with anchor (<a>) tags.
Syntax: <a href=”url”> link text </a>
Example:
<a href="http://qatechhub.com/selenium/">Selenium Tutorial</a>
Here in above code, “href“ is an attribute which specifies the address of another HTML page and “Selenium Tutorial” is the link text associated with it. (i.e. the visible part on the page).
Identifiers in Selenium WebDriver to Interact with Links:
Selenium WebDriver provides two different identifiers to interact with links.
- linktext()
- partialLinktext()
linktext() identifier matches exact text associated with the link web element. So, to click on above-mentioned link code will be:
driver.findElement(By.linkText("Selenium Tutorial")).click();
partialLinktext() identifier matches partial text associated with the link web element. It can be considered as a contains method. So, to click on above-mentioned link code will be:
Driver.findElement(By.partialLinkText("Selenium")).click(); //or Driver.findElement(By.partialLinkText("Tutorial")).click();
Actions performed on Links:
We can perform different actions on a link like click(), getText(), moverHover().
The click() method is to click on the mentioned link, this will navigate you to another webpage in the same window or in another window.
driver.findElement(By.linkText("Selenium Tutorial")).click();
The getText() method is to fetch the text from a link. Below code will return “Selenium Tutorial”.
driver.findElement(By.partialLinkText("Tutorial")).getText();
The mouseHover() operation we will learn in detail in later classes.
Getting Attribute from an HTML Tag:
One of the important scenario while working with links is to get the URL from the link. getAttribute(“String”) method is used to get the url from the link. See the below example, for above HTML link the code will return “http://qatechub.com/selenium”.
String url = driver.findElement(By.linkText("Selenium Tutorial")).getAttribute("href");
Let us try some interview based scenarios:
Scenario 1: Counting numbers of link on a web page say Flipkart
- Open any browser say “Chrome”
- Navigate to Flipkart website. (http://www.flipkart.com)
- Write a code to get the number of links on this page.
To get the number of links on a page, we will first get all the links using a method called findElements(By.tagName(“a”)). This method will return a list of all the WebElements in a list. Call size() method to get the count of the number of WebElements i.e. number of links.
package day3; import java.util.concurrent.TimeUnit; import org.openqa.selenium.By; import org.openqa.selenium.Dimension; import org.openqa.selenium.WebDriver; import org.openqa.selenium.chrome.ChromeDriver; import org.openqa.selenium.firefox.FirefoxDriver; import org.openqa.selenium.ie.InternetExplorerDriver; public class WorkingWithLinks { WebDriver driver; String sUrl = "http://www.flipkart.com"; public void invokeBrowser(String driver){ if(driver.equalsIgnoreCase("ff")){ oBrowser = new FirefoxDriver(); } else if(driver.equalsIgnoreCase("chrome")){ System.setProperty("webdriver.chrome.driver", "C:\workspace\libs\chromedriver.exe"); oBrowser = new ChromeDriver(); } else if(driver.equalsIgnoreCase("ie")){ System.setProperty("webdriver.ie.driver", "C:\workspace\libs\IEDriverServer.exe"); oBrowser = new InternetExplorerDriver(); } else { System.err.println("Invalid Browser type.. setting default browser as Firefox Driver"); driver = new FirefoxDriver(); } // To Maximize the window driver.manage().window().maximize(); driver.manage().deleteAllCookies(); driver.manage().timeouts().pageLoadTimeout(60, TimeUnit.SECONDS); driver.manage().timeouts().implicitlyWait(60, TimeUnit.SECONDS); driver.get(sUrl); } public void getNumberOfLinks(){ int iNumberOfLinks = driver.findElements(By.tagName("a")).size(); System.out.println("Number of Links on a page : "+ iNumberOfLinks); } public void closeBrowser(){ driver.quit(); } }
To Execute the code let us create a main method:
package day3; public class DemoWorkingWithLinks { public static void main(String[] args) { WorkingWithLinks wl = new WorkingWithLinks(); wl.invokeBrowser("chrome"); wl.numberOfLinks(); } }
In above code, getNumberOfLinks() method is used to get the number of links from a page.
Scenario 2: Printing the text associated with all the links and their URL on a page say Flipkart.
- Open any browser say “Chrome.”
- Navigate to Flipkart site (http://www.flipkart.com)
- Write a code to get the number of links on this page.
- Write a code to get the text associated with each link and its URL.
This scenario is an extension of the above scenario. Here, after finding the number of links. We will iterate over each link and fetch its text using a method called getText() and fetch its URL by a method called getAttribute(“href”).
Refer below code for the same:
package day3; import java.util.List; import java.util.concurrent.TimeUnit; import org.openqa.selenium.By; import org.openqa.selenium.Dimension; import org.openqa.selenium.WebDriver; import org.openqa.selenium.WebElement; import org.openqa.selenium.chrome.ChromeDriver; import org.openqa.selenium.firefox.FirefoxDriver; import org.openqa.selenium.ie.InternetExplorerDriver; public class WorkingWithLinks { WebDriver driver; String sUrl = "http://www.flipkart.com"; public void invokeBrowser(String sBrowserType){ if(sBrowserType.equalsIgnoreCase("ff")){ driver = new FirefoxDriver(); } else if(sBrowserType.equalsIgnoreCase("chrome")){ System.setProperty("webdriver.chrome.driver", "C:\workspace\libs\chromedriver.exe"); driver = new ChromeDriver(); } else if(sBrowserType.equalsIgnoreCase("ie")){ System.setProperty("webdriver.ie.driver", "C:\workspace\libs\IEDriverServer.exe"); driver = new InternetExplorerDriver(); } else { System.err.println("Invalid Browser type.. setting default browser as Firefox Driver"); driver = new FirefoxDriver(); } // To Maximize the window driver.manage().window().maximize(); driver.manage().deleteAllCookies(); driver.manage().timeouts().pageLoadTimeout(60, TimeUnit.SECONDS); driver.manage().timeouts().implicitlyWait(60, TimeUnit.SECONDS); driver.get(sUrl); } public void numberOfLinks(){ int iNumberOfLinks = driver.findElements(By.tagName("a")).size(); System.out.println("Number of Links on a page : "+ iNumberOfLinks); } public void printAllLinks(){ List<WebElement> oList; oList = driver.findElements(By.tagName("a")); // OR // oList = oBrowser.findElements(By.xpath("//a")); // -- you can use xpath also for(WebElement oTemp : oList){ System.out.println("The Link Text of WebElement : "+ oTemp.getText() + " and its url is " + oTemp.getAttribute("href")); System.out.println("***********************************************"); } } public void closeBrowser(){ driver.close(); } }
Enhanced for loop is used to iterate over each link, fetching its text and associated url.
This is all in this tutorial, I hope you enjoyed the different scenario’s possible with links using Selenium WebDriver.
For any questions, queries or comments feel free to write us at support@qatechhub.com or saurabh@qatechub.com. Happy Learning 🙂