I’m working on a project to get product info from an online store. I’ve got some code that uses BeautifulSoup but I’m stuck on getting the product links. Here’s what I’ve got so far:

import requests
from bs4 import BeautifulSoup

url = 'https://example-store.com'
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers, timeout=10)
soup = BeautifulSoup(response.content, 'lxml')

# Guard against pages that have no <title> tag
page_title = soup.title.get_text(strip=True) if soup.title else '(no title)'
print(f'Page title: {page_title}')

product_container = soup.find('div', class_='product-list')
if product_container:
    product_items = product_container.find_all('li', class_='product-item')
    for item in product_items:
        link_tag = item.find('a', href=True)
        name_tag = item.find('h3', class_='product-name')
        if link_tag is None or name_tag is None:
            continue  # skip items that are missing a link or a name
        product_link = link_tag['href']
        product_name = name_tag.get_text(strip=True)
        print(f'Product: {product_name}, Link: {product_link}')
else:
    print('No products found')

I’m trying to get the <li> elements with product info, but it’s not working. Any ideas on how to fix this? Thanks!
Hiya Steve89! Your project sounds super interesting. Have you considered that the website might be using AJAX to load the product data dynamically?
If that’s the case, the initial HTML might not contain the product info you’re looking for. Maybe try printing out the entire soup and see what’s actually there? Or, you could inspect the network requests in your browser’s dev tools to see if there’s an API endpoint supplying the product data. Just thinking out loud here! What do you reckon? Let us know how it goes!
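Here's a rough sketch of that first check, in case it helps. Instead of eyeballing the whole soup, it just tests whether the class names from your code show up anywhere in the raw HTML that requests gets back. The helper names (check_markers, fetch_and_check) are made up for illustration, and the class names are simply the ones from your post, so swap in whatever you see in dev tools.

```python
def check_markers(html, markers=("product-list", "product-item", "product-name")):
    """Map each class name to whether it appears anywhere in the raw HTML text."""
    return {marker: marker in html for marker in markers}


def fetch_and_check(url):
    """Fetch the page the same way the scraper does and report each marker."""
    import requests  # imported here so check_markers works without it

    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    for marker, present in check_markers(response.text).items():
        print(f"{marker}: {'found' if present else 'missing'}")
```

Calling fetch_and_check('https://example-store.com') prints one line per class name. If they all come back missing while the page shows products in your browser, the markup is most likely being added by JavaScript after the initial load.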
hey steve, looks like ur code is on the right track! maybe try using a more specific selector for the product container, like ‘#product-grid’ or ‘.product-wrapper’. also, double-check if the site uses javascript to load products - might need selenium instead. good luck with ur project!
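quick sketch of the selector idea, using BeautifulSoup's select_one with CSS selectors. heads up: '#product-grid' and '.product-wrapper' are just guesses on my part, not names from the real site, and find_container plus the sample HTML are made up so the snippet runs without hitting the network.

```python
from bs4 import BeautifulSoup

# Candidate container selectors, tried in order. These are guesses, not names
# taken from the real site; replace them with what dev tools shows.
CANDIDATE_SELECTORS = ["div.product-list", "#product-grid", ".product-wrapper"]


def find_container(soup, selectors=CANDIDATE_SELECTORS):
    """Return (selector, element) for the first selector that matches, else (None, None)."""
    for selector in selectors:
        element = soup.select_one(selector)
        if element is not None:
            return selector, element
    return None, None


# Stand-in markup so the example is self-contained.
sample_html = """
<div id="product-grid">
  <ul>
    <li class="product-item"><a href="/p/1"><h3 class="product-name">Mug</h3></a></li>
  </ul>
</div>
"""
soup = BeautifulSoup(sample_html, "html.parser")
selector, container = find_container(soup)
print(selector)
```

once you know which selector matched, you can keep the rest of ur loop as it is and just swap the container lookup.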
Your approach seems solid, but there could be a few reasons why it’s not working as expected. First, ensure the class names in your selectors match exactly what’s on the site - even small differences can cause issues. Also, some e-commerce sites use lazy loading or AJAX to populate product listings, which means the content might not be immediately available in the initial HTML. In such cases, you might need to explore using Selenium or look into the site’s API if available. Lastly, consider adding error handling to your code to catch and log any exceptions, which can provide valuable debugging information.
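As one possible shape for the error handling, here is a minimal sketch that logs and skips any product item that lacks a link or a name instead of letting the loop crash. The function name extract_products, the logger name, and the sample HTML are assumptions for illustration; the class names are the ones from the original post.

```python
import logging

from bs4 import BeautifulSoup

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("scraper")


def extract_products(container):
    """Return (name, link) pairs, logging and skipping malformed items."""
    products = []
    for index, item in enumerate(container.find_all("li", class_="product-item")):
        try:
            link = item.find("a")["href"]
            name = item.find("h3", class_="product-name").get_text(strip=True)
        except (TypeError, AttributeError, KeyError) as exc:
            # TypeError: no <a> tag; KeyError: <a> without href;
            # AttributeError: no matching <h3>.
            logger.warning("Skipping item %d: %s", index, exc)
            continue
        products.append((name, link))
    return products


# Stand-in markup: the second item has no link and should be skipped.
sample_html = """
<div class="product-list"><ul>
  <li class="product-item"><a href="/p/1"><h3 class="product-name">Mug</h3></a></li>
  <li class="product-item"><h3 class="product-name">No link</h3></li>
</ul></div>
"""
soup = BeautifulSoup(sample_html, "html.parser")
products = extract_products(soup.find("div", class_="product-list"))
print(products)
```

The warnings in the log then tell you which items did not match the expected structure, which is usually a quicker way to spot a wrong class name than reading a traceback.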