python - How to extract the <span> tag contents using Beautiful Soup?
I'm trying to extract the content of a span tag from the Google Translate website. The span that holds the translated result has id="result_box". When I try to print its contents, I get None.
    import requests
    from bs4 import BeautifulSoup

    # Fetch the raw HTML exactly as the server sends it (no JavaScript is run)
    r = requests.get("https://translate.google.co.in/?rlz=1c1chzl_enin729in729&um=1&ie=utf-8&hl=en&client=tw-ob#en/fr/good%20morning")
    soup = BeautifulSoup(r.content, "lxml")
    # Look up the element that should hold the translation
    spanner = soup.find(id="result_box")
    result = spanner.text
requests doesn't execute JavaScript, so the translated text never appears in the HTML it downloads. Use Selenium with PhantomJS for headless browsing, like this:
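    from bs4 import BeautifulSoup
    from selenium import webdriver

    url = "https://translate.google.co.in/?rlz=1c1chzl_enin729in729&um=1&ie=utf-8&hl=en&client=tw-ob#en/fr/good%20morning"
    # PhantomJS must be installed and on PATH; note that Selenium 4 removed
    # webdriver.PhantomJS, so this needs an older Selenium release
    browser = webdriver.PhantomJS()
    browser.get(url)
    # page_source is the HTML after the page's scripts have run
    html = browser.page_source
    browser.quit()
    soup = BeautifulSoup(html, 'lxml')
    spanner = soup.find(id="result_box")
    result = spanner.text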
This gives the expected result:
    >>> result
    'bonjour'
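To see why the original attempt came back empty, here is a minimal offline sketch. `RAW_HTML` and `RENDERED_HTML` are made-up stand-ins (not Google's real markup) for what the server sends versus what the page looks like after its scripts run, and `extract_result` is a hypothetical helper name. It uses the built-in `html.parser` so it runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins for the two versions of the page:
# the HTML as served, and the HTML after JavaScript has filled it in.
RAW_HTML = '<html><body><div id="app"></div></body></html>'
RENDERED_HTML = (
    '<html><body><div id="app">'
    '<span id="result_box">bonjour</span>'
    '</div></body></html>'
)


def extract_result(html):
    """Return the text of the element with id="result_box", or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    spanner = soup.find(id="result_box")
    # find() returns None when nothing matches, so check before using .text
    if spanner is None:
        return None
    return spanner.text


print(extract_result(RAW_HTML))       # None
print(extract_result(RENDERED_HTML))  # bonjour
```

Checking for None before touching `.text` also avoids the AttributeError you would otherwise hit when the element is missing.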