Pick a random movie with a Python script
Introduction :
Hello! Today we will write a Python script that suggests a movie from IMDb's list of top movies.
NOTE: This requires intermediate knowledge of HTML and Python.
This technique is called web scraping: we are going to extract data from IMDb, namely the list of the top films and the trailer link of each film. Let's start coding...
Setup BeautifulSoup :
What is BeautifulSoup ?
Beautiful Soup is a Python package for parsing HTML and XML documents. It creates a parse tree for parsed pages that can be used to extract data from HTML, which is useful for web scraping. (Source: Wikipedia)
Install BeautifulSoup :
pip install beautifulsoup4
or
pip3 install bs4
Start Coding :
Preparing the HTML we want to scrape :
First we need to get the HTML code of the page that contains the list of the top movies: 'https://www.imdb.com/chart/top'. We will use two libraries: 'requests' (install it with pip install requests if you don't have it) and 'bs4', also known as BeautifulSoup.
from bs4 import BeautifulSoup
import requests
headers = {
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Accept-Language': 'en-US,en;q=0.5',
'Connection': 'keep-alive',
'Upgrade-Insecure-Requests': '1',
'Cache-Control': 'max-age=0',
}
response = requests.get('https://www.imdb.com/chart/top', headers=headers)
soup=BeautifulSoup(response.text, 'html.parser')
Now the variable soup contains the HTML code of the page we want.
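If the request fails or gets blocked, soup will quietly contain an error page instead of the movie list. Here is a minimal sketch of a safer fetch step. The helper name fetch_soup and the 10 second timeout are our own assumptions, not part of the code above:

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url, headers=None, timeout=10):
    """Download a page and return it parsed (hypothetical helper)."""
    # timeout stops the script from hanging forever on a dead connection
    response = requests.get(url, headers=headers, timeout=timeout)
    # raise_for_status() raises requests.HTTPError on 4xx/5xx responses
    response.raise_for_status()
    return BeautifulSoup(response.text, 'html.parser')
```

With this helper, the fetch step becomes soup = fetch_soup('https://www.imdb.com/chart/top', headers=headers).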
Start extracting data :
Next we are going to extract all the table cells that hold the movie titles and store them in a list:
tbody=soup.find_all('td',class_='titleColumn')
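find_all returns an empty list when nothing matches, for example if the page layout differs from the one this tutorial assumes. A small sketch of a check for that case follows. The function name find_title_cells and the sample HTML are made up for illustration:

```python
from bs4 import BeautifulSoup

# Tiny stand-in for the real page, just for this example
SAMPLE_HTML = """
<table>
  <tr><td class="titleColumn"><a href="/title/tt0111161/">The Shawshank Redemption</a></td></tr>
  <tr><td class="titleColumn"><a href="/title/tt0071562/">The Godfather: Part II</a></td></tr>
</table>
"""


def find_title_cells(soup):
    """Return all td.titleColumn cells, or fail loudly if there are none."""
    cells = soup.find_all('td', class_='titleColumn')
    if not cells:
        # An empty list usually means the markup is not what we expected
        raise ValueError("no 'titleColumn' cells found, check the page HTML")
    return cells


cells = find_title_cells(BeautifulSoup(SAMPLE_HTML, 'html.parser'))
print(len(cells), 'cells found')
```

Failing early here gives a clearer message than an error from random or from indexing later on.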
Then we are going to use the 'random' library to pick a random item from the list we made:
import random
# randint includes both endpoints, so the upper bound is len(tbody) - 1
randomNmber=random.randint(0,len(tbody)-1)
nameOfMovie=tbody[randomNmber].find('a').text
year=tbody[randomNmber].find('span').text
link='https://www.imdb.com'+tbody[randomNmber].find('a').get('href')
What this code does: we pick a random index between 0 and len(tbody) - 1, then we use that index to pick an item from the list.
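One detail worth seeing in isolation: random.randint(a, b) can return b itself, so the upper bound has to be the last valid index. The sketch below uses a plain list of strings (a stand-in for tbody) and also shows random.choice, an alternative we are suggesting here that skips the index entirely:

```python
import random

# Stand-in list, the real script uses the scraped cells instead
movies = ['The Shawshank Redemption', 'The Godfather: Part II']

# randint includes both endpoints, so stop at len(movies) - 1
index = random.randint(0, len(movies) - 1)
picked_by_index = movies[index]

# random.choice returns a random element directly, no index needed
picked_by_choice = random.choice(movies)

print(picked_by_index)
print(picked_by_choice)
```

Both approaches pick uniformly from the list, so which one to use is mostly a matter of taste.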
This is easy to follow if you know some HTML. The HTML of an item in the list looks like this:
<td class="titleColumn">
1.
<a href="/title/tt0111161/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=e31d89dd-322d-4646-8962-327b42fe94b1&pf_rd_r=06YPM80E849SQRYFH0EP&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=top&ref_=chttp_tt_1" title="Frank Darabont (dir.), Tim Robbins, Morgan Freeman">The Shawshank Redemption</a>
<span class="secondaryInfo">(1994)</span>
</td>
Now, to get the name of the movie and its link, we extract them from the 'a' tag, which looks like this:
<a href="/title/tt0071562/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=e31d89dd-322d-4646-8962-327b42fe94b1&pf_rd_r=06YPM80E849SQRYFH0EP&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=top&ref_=chttp_tt_3" title="Francis Ford Coppola (dir.), Al Pacino, Robert De Niro">The Godfather: Part II</a>
The name is the text of the 'a' tag and the link is in its href attribute, so we are going to use:
nameOfMovie=tbody[randomNmber].find('a').text
link='https://www.imdb.com'+tbody[randomNmber].find('a').get('href')
The same goes for the year, which is the text of the 'span' tag.
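To try the extraction without touching the network, we can run it on a copy of the cell shown earlier. This is only a sketch: the href is shortened here, parse_cell is a name we made up, and stripping the tracking query string with urlsplit is an optional extra that the original code does not do:

```python
from urllib.parse import urljoin, urlsplit

from bs4 import BeautifulSoup

# Shortened copy of the sample cell (tracking parameters trimmed)
SAMPLE_CELL = """
<td class="titleColumn">
  1.
  <a href="/title/tt0111161/?ref_=chttp_tt_1" title="Frank Darabont (dir.), Tim Robbins, Morgan Freeman">The Shawshank Redemption</a>
  <span class="secondaryInfo">(1994)</span>
</td>
"""


def parse_cell(cell):
    """Return (name, year, link) for one titleColumn cell."""
    anchor = cell.find('a')
    name = anchor.text.strip()
    year = cell.find('span').text.strip()
    # Keep only the path part of the href, dropping the query string
    path = urlsplit(anchor.get('href')).path
    link = urljoin('https://www.imdb.com', path)
    return name, year, link


cell = BeautifulSoup(SAMPLE_CELL, 'html.parser').find('td', class_='titleColumn')
name, year, link = parse_cell(cell)
print(name, year, link)
```

Note that the year keeps its parentheses, exactly as it appears in the 'span' text.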
Now, to get the trailer link, we repeat the previous steps:
fetch the HTML of the movie's page, then extract the trailer link from it:
reso=requests.get(link,headers=headers)
soupmv=BeautifulSoup(reso.text, 'html.parser')
trailer=soupmv.find('a',class_='slate_button prevent-ad-overlay video-modal').get('href')
trailerLINK='https://www.imdb.com'+trailer
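If a movie page has no tag with that class, find returns None and calling .get on it raises an AttributeError. Below is a hedged sketch of a guarded version. The function name find_trailer_link and the sample HTML (including the /video/imdb/vi0000000000/ path) are invented for illustration:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

TRAILER_CLASS = 'slate_button prevent-ad-overlay video-modal'


def find_trailer_link(movie_soup):
    """Return the absolute trailer URL, or None if no trailer tag is found."""
    tag = movie_soup.find('a', class_=TRAILER_CLASS)
    if tag is None or not tag.get('href'):
        return None
    return urljoin('https://www.imdb.com', tag.get('href'))


# Made-up page fragments: one with a trailer button, one without
with_trailer = BeautifulSoup(
    '<a class="slate_button prevent-ad-overlay video-modal" '
    'href="/video/imdb/vi0000000000/">Trailer</a>',
    'html.parser',
)
without_trailer = BeautifulSoup('<p>No trailer here</p>', 'html.parser')

print(find_trailer_link(with_trailer))
print(find_trailer_link(without_trailer))
```

The caller can then skip the movie or print a message when the result is None, instead of crashing.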
Now everything is ready: we have the name and the trailer link, so we are going to print them:
print('how about: '+nameOfMovie+' '+year)
print('trailer link: '+trailerLINK)
We can use a loop to keep suggesting films.
Full code :
from bs4 import BeautifulSoup
import random
import requests
import webbrowser
headers = {
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Accept-Language': 'en-US,en;q=0.5',
'Connection': 'keep-alive',
'Upgrade-Insecure-Requests': '1',
'Cache-Control': 'max-age=0',
}
response = requests.get('https://www.imdb.com/chart/top', headers=headers)
soup=BeautifulSoup(response.text, 'html.parser')
tbody=soup.find_all('td',class_='titleColumn')
######
######
while True:
    # randint includes both endpoints, so the upper bound is len(tbody) - 1
    randomNmber=random.randint(0,len(tbody)-1)
    nameOfMovie=tbody[randomNmber].find('a').text
    year=tbody[randomNmber].find('span').text
    link='https://www.imdb.com'+tbody[randomNmber].find('a').get('href')
    # fetch the movie page and extract the trailer link
    reso=requests.get(link,headers=headers)
    soupmv=BeautifulSoup(reso.text, 'html.parser')
    trailer=soupmv.find('a',class_='slate_button prevent-ad-overlay video-modal').get('href')
    trailerLINK='https://www.imdb.com'+trailer
    print('how about: '+nameOfMovie+' '+year)
    print('trailer link: '+trailerLINK)
    wht=input('go to trailer hit "go" new one hit "enter" exit(any key)')
    if wht=='go' or wht=='A':
        webbrowser.open(trailerLINK)
    if wht == '':
        continue
    else:
        break
Thanks for reading! If you have any problems, contact me through the contact form below or on Instagram.
#Web scraping Python BeautifulSoup
#KeepCoding