I am trying to parse job descriptions from Djinni for a personal project. I'm using Python 3.6, BeautifulSoup4, and the requests library. When I use requests.get to fetch the HTML of a job opening page, the response is missing the most critical part: the text of the description. For example, take this page’s URL - and the following code I wrote:
import requests
from bs4 import BeautifulSoup

# Both functions are methods of my scraper class (the class itself is omitted here).
def scrape_job_desc(self, url):
    job_desc_html = self._get_search_page_html(url)
    soup = BeautifulSoup(job_desc_html, features='html.parser')
    try:
        # find() returns None when nothing matches, so getText() raises AttributeError
        short_desc = soup.find('p', {'class': 'job-teaser svelte-a3rpl2'}).getText()
        full_desc = soup.find('div', {'class': 'job-description-wrapper svelte-a3rpl2'}).find('p').getText()
    except AttributeError:
        short_desc = None
        full_desc = None
    return short_desc, full_desc

def _get_search_page_html(self, url):
    # Send a browser-like User-Agent and return the response body as text
    html = requests.get(url=url, headers={'User-Agent': 'Mozilla/5.0 CK={} (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko'})
    return html.text
It returns short_desc but not full_desc. Furthermore, the text of the needed tag is not present in the fetched HTML at all. But when I download the page using my browser, it’s all there. What is causing this?
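One way to narrow this down is to check where a sentence copied from the browser actually occurs in the HTML that requests returns. The sketch below is only a diagnostic aid under stated assumptions: `locate_phrase` and `_TextSplitter` are hypothetical helper names (not part of requests or BeautifulSoup), and it uses the standard-library `html.parser` so it runs without extra dependencies.

```python
from html.parser import HTMLParser


class _TextSplitter(HTMLParser):
    """Hypothetical helper: collects text inside <script> tags separately from all other text."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.script_text = []
        self.other_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Route each text chunk to the list matching its current context
        target = self.script_text if self._in_script else self.other_text
        target.append(data)


def locate_phrase(html, phrase):
    """Return 'markup', 'script', or 'missing' depending on where phrase occurs in html."""
    parser = _TextSplitter()
    parser.feed(html)
    parser.close()
    if phrase in "".join(parser.other_text):
        return "markup"
    if phrase in "".join(parser.script_text):
        return "script"
    return "missing"
```

Calling `locate_phrase(html.text, "a sentence copied from the browser")` on the response would give a hint: "script" suggests the description is embedded as data and rendered by JavaScript in the browser, while "missing" suggests it is loaded by a separate request that should be visible in the browser's network tab. Both readings are assumptions to verify against the actual page.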