Pick a random movie with a Python script


Introduction :

Hello! Today we will write Python code that suggests a movie from the top movies on IMDb.

NOTE: This requires intermediate knowledge of HTML and Python.

    This operation is called web scraping: we are going to scrape (extract) data from IMDb. The data will be a list of the top films and the trailer link of each film. Let's start coding...

Set up BeautifulSoup :

What is BeautifulSoup ?

Beautiful Soup is a Python package for parsing HTML and XML documents. It creates a parse tree for parsed pages that can be used to extract data from HTML, which is useful for web scraping. (Source: Wikipedia)

Install BeautifulSoup :

pip install beautifulsoup4

or 

pip3 install bs4

Start Coding :

Preparing the HTML we want to scrape :

    First we need to get the HTML code of the page that contains the list of the top movies: 'https://www.imdb.com/chart/top'. We will use two libraries, 'requests' and 'bs4' (BeautifulSoup).

from bs4 import BeautifulSoup
import requests
headers = {
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Accept-Language': 'en-US,en;q=0.5',
'Connection': 'keep-alive',
'Upgrade-Insecure-Requests': '1',
'Cache-Control': 'max-age=0',
}
response = requests.get('https://www.imdb.com/chart/top', headers=headers)
soup=BeautifulSoup(response.text, 'html.parser')

Now the variable soup contains the HTML code of the page we want.

Start extracting data :

Next we are going to extract the title cells of all the movies and store them in a list:

tbody=soup.find_all('td',class_='titleColumn')

Then we are going to use the 'random' library to pick a random item from the list we made:

import random
randomNmber=random.randrange(len(tbody))
nameOfMovie=tbody[randomNmber].find('a').text
year=tbody[randomNmber].find('span').text
link='https://www.imdb.com'+tbody[randomNmber].find('a').get('href')

What this code does: we pick a random index from 0 to len(tbody)-1, then we use that index to pick an item from the list.
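
The index has to stay between 0 and len(tbody)-1. Keep in mind that random.randint(a, b) includes both endpoints, while random.randrange(n) stops at n-1. Here is a minimal sketch of two ways to pick safely; the list `movies` is placeholder data made up for this example, not the scraped list:

```python
import random

# Placeholder data standing in for the scraped list (an assumption for this sketch).
movies = ['The Shawshank Redemption', 'The Godfather', 'The Godfather: Part II']

# Option 1: randrange(n) returns an integer from 0 to n-1, always a valid index.
index = random.randrange(len(movies))
picked_by_index = movies[index]

# Option 2: random.choice returns a random element directly, no index needed.
picked_directly = random.choice(movies)

print(picked_by_index)
print(picked_directly)
```

Option 1 matches the index-based code in this post; option 2 is shorter when the index itself is not needed.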

This is easy to understand for people who know HTML. The HTML code of the item we picked looks like this:

<td class="titleColumn">
      1.
      <a href="/title/tt0111161/?pf_rd_m=A2FGELUUNOQJNL&amp;pf_rd_p=e31d89dd-322d-4646-8962-327b42fe94b1&amp;pf_rd_r=06YPM80E849SQRYFH0EP&amp;pf_rd_s=center-1&amp;pf_rd_t=15506&amp;pf_rd_i=top&amp;ref_=chttp_tt_1" title="Frank Darabont (dir.), Tim Robbins, Morgan Freeman">The Shawshank Redemption</a>
        <span class="secondaryInfo">(1994)</span>
    </td>

Now, to get the name of the movie and the link, we extract them from an 'a' tag like this one:

<a href="/title/tt0071562/?pf_rd_m=A2FGELUUNOQJNL&amp;pf_rd_p=e31d89dd-322d-4646-8962-327b42fe94b1&amp;pf_rd_r=06YPM80E849SQRYFH0EP&amp;pf_rd_s=center-1&amp;pf_rd_t=15506&amp;pf_rd_i=top&amp;ref_=chttp_tt_3" title="Francis Ford Coppola (dir.), Al Pacino, Robert De Niro">The Godfather: Part II</a>

The name is in the text of the 'a' tag and the link is in its href attribute,

so we are going to use:

nameOfMovie=tbody[randomNmber].find('a').text
link='https://www.imdb.com'+tbody[randomNmber].find('a').get('href')

The same goes for the year, which is in the text of the 'span' tag.
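
The href in the sample above is relative and carries tracking parameters (pf_rd_m, ref_ and so on). As an optional refinement, here is a minimal sketch of turning it into a clean absolute URL with the standard library's urllib.parse. The helper name `clean_movie_link` is hypothetical, and the href is shortened from the sample:

```python
from urllib.parse import urljoin, urlsplit

BASE_URL = 'https://www.imdb.com'

def clean_movie_link(href):
    """Return an absolute URL for href without its query string or fragment."""
    path = urlsplit(href).path      # keep only the path, e.g. '/title/tt0111161/'
    return urljoin(BASE_URL, path)  # join the path with the site root

# href shortened from the sample <a> tag above
href = '/title/tt0111161/?pf_rd_i=top&ref_=chttp_tt_1'
print(clean_movie_link(href))
```

The plain string concatenation used in this post works too; the sketch only drops the extra parameters.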

Now, to get the trailer link, we follow the same steps as before:

Get the HTML of the movie page, then get the trailer link:

reso=requests.get(link,headers=headers)
soupmv=BeautifulSoup(reso.text, 'html.parser')
trailer=soupmv.find('a',class_='slate_button prevent-ad-overlay video-modal').get('href')
trailerLINK='https://www.imdb.com'+trailer
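
If a movie page has no 'a' tag with that class, find returns None and calling .get on it raises an AttributeError. Here is a minimal sketch of a guard against that case. The helper name `find_trailer_link` is hypothetical, and it takes the already parsed page (soupmv in the code above) as its argument:

```python
def find_trailer_link(page_soup):
    """Return the absolute trailer URL, or None when no trailer link is found."""
    # Same lookup as in the post; find returns None when nothing matches.
    tag = page_soup.find('a', class_='slate_button prevent-ad-overlay video-modal')
    if tag is None:
        return None
    href = tag.get('href')
    if not href:
        return None
    return 'https://www.imdb.com' + href
```

It could be called as trailerLINK=find_trailer_link(soupmv), with the script skipping the trailer when the result is None.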

Now everything is ready: we have the name and the trailer link, so we are going to print them:

print('how about: '+nameOfMovie+'  '+year)
print('trailer link: '+trailerLINK)

We can use a loop to keep suggesting films.

Full code :

from bs4 import BeautifulSoup
import random
import requests
import webbrowser

headers = {
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Accept-Language': 'en-US,en;q=0.5',
'Connection': 'keep-alive',
'Upgrade-Insecure-Requests': '1',
'Cache-Control': 'max-age=0',
}

response = requests.get('https://www.imdb.com/chart/top', headers=headers)
soup=BeautifulSoup(response.text, 'html.parser')
tbody=soup.find_all('td',class_='titleColumn')
######
######
while True:
    # randrange keeps the index inside the list (0 to len(tbody)-1)
    randomNmber=random.randrange(len(tbody))
    nameOfMovie=tbody[randomNmber].find('a').text
    year=tbody[randomNmber].find('span').text
    link='https://www.imdb.com'+tbody[randomNmber].find('a').get('href')
    reso=requests.get(link,headers=headers)
    soupmv=BeautifulSoup(reso.text, 'html.parser')
    trailer=soupmv.find('a',class_='slate_button prevent-ad-overlay video-modal').get('href')
    trailerLINK='https://www.imdb.com'+trailer
    print('how about: '+nameOfMovie+' '+year)
    print('trailer link: '+trailerLINK)
    wht=input('go to trailer hit "go" new one hit "enter" exit(any key)')
    if wht=='go' or wht=='A':
        webbrowser.open(trailerLINK)
    if wht == '':
        continue
    else:
        break

OUTPUT :

[Image: output of the script]

Thanks for reading! If you have any problem, contact me through the contact form below or on my Instagram...

#Web scraping Python BeautifulSoup

#KeepCoding



