Python requests.get not returning text in one of the tags in html document

tobimarsh43 · November 10, 2020, 6:51am

I am trying to parse job descriptions from Djinni for a personal project. I'm using Python 3.6, BeautifulSoup4, and the requests library. When I use requests.get to fetch the HTML of a job opening page, it returns HTML without the most critical part - the text of the description. For example, take this page’s URL - and the following code I wrote:

import requests
from bs4 import BeautifulSoup

def scrape_job_desc(self, url):
    # Fetch the job page and parse it with Python's built-in HTML parser.
    job_desc_html = self._get_search_page_html(url)
    soup = BeautifulSoup(job_desc_html, features='html.parser')
    try:
        short_desc = soup.find('p', {'class': 'job-teaser svelte-a3rpl2'}).getText()
        full_desc = soup.find('div', {'class': 'job-description-wrapper svelte-a3rpl2'}).find('p').getText()
    except AttributeError:
        # A find() call returned None, so both values are reset to None.
        short_desc = None
        full_desc = None
    return short_desc, full_desc

def _get_search_page_html(self, url):
    # Send a browser-like User-Agent and return the response body as text.
    html = requests.get(url=url, headers={'User-Agent': 'Mozilla/5.0 CK={} (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko'})
    return html.text

It will return the short_desc but not the full_desc. Furthermore, the text of the needed <p> tag is not present in the HTML at all. But when I download the page using my browser it’s all there. What is causing this?

bikashsaud · May 15, 2022, 1:44am

@tobimarsh43 how are you reading the returned values?

If you unpack them as shown below, it works.

short_desc, full_desc = scrape_job_desc(url....)
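
A minimal sketch of that unpacking, run against a made-up HTML snippet rather than the live page. Everything here is an assumption for illustration: SAMPLE_HTML is invented to mimic a response in which the description tag is empty, ClassTextCollector and text_for_class are hypothetical helpers, and the standard library's html.parser stands in for BeautifulSoup so the example runs without extra packages.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Tags that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ClassTextCollector(HTMLParser):
    """Hypothetical helper: collect text inside the first element with a given class."""

    def __init__(self, target_class: str) -> None:
        super().__init__()
        self.target_class = target_class
        self.depth = 0        # > 0 while the parser is inside the matching element
        self.found = False    # True once the matching element has been seen
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif not self.found:
            classes = (dict(attrs).get("class") or "").split()
            if self.target_class in classes:
                self.found = True
                self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def text_for_class(html: str, target_class: str) -> Optional[str]:
    """Return the stripped text of the first matching element, or None if empty or absent."""
    collector = ClassTextCollector(target_class)
    collector.feed(html)
    collector.close()
    return "".join(collector.chunks).strip() or None


# Invented sample: the teaser has text, the description wrapper holds an empty <p>.
SAMPLE_HTML = """
<p class="job-teaser svelte-a3rpl2">Short teaser</p>
<div class="job-description-wrapper svelte-a3rpl2"><p></p></div>
"""


def scrape_job_desc(html: str) -> Tuple[Optional[str], Optional[str]]:
    """Stand-in for the original method: returns (short_desc, full_desc)."""
    return (
        text_for_class(html, "job-teaser"),
        text_for_class(html, "job-description-wrapper"),
    )


# Unpack the returned tuple into two names.
short_desc, full_desc = scrape_job_desc(SAMPLE_HTML)
print(short_desc)  # Short teaser
print(full_desc)   # None
```

Unpacking gives each value its own name, but full_desc can only hold text that is actually present in the HTML string being parsed. If the tag is empty in that string, as in this sample, it stays None.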