Extracting product information from an e-commerce site using BeautifulSoup

Hey everyone! I’m trying to get product info from an online store using Python and BeautifulSoup. Here’s what I’ve got so far:

import requests
from bs4 import BeautifulSoup

url = 'https://example-store.com'
headers = {'User-Agent': 'Mozilla/5.0'}

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')

product_container = soup.find('div', class_='product-list')
if product_container:
    product_items = product_container.find_all('li', class_='item')
    for item in product_items:
        product_name = item.find('h3', class_='name').text.strip()
        product_price = item.find('span', class_='price').text.strip()
        print(f'Product: {product_name}, Price: {product_price}')
else:
    print('No products found')

I’m stuck on how to get the product links. Can anyone help me figure out how to extract the href attributes from the <a> elements? Also, any tips on making this code more efficient would be awesome. Thanks!

    I’ve worked on similar projects, and here’s what I found helpful:

    To extract product links, modify your code like this:

    product_link = item.find('a')['href'] if item.find('a') else 'No link'
    print(f'Product: {product_name}, Price: {product_price}, Link: {product_link}')

    This handles cases where a link might be missing.

    For efficiency, consider using CSS selectors directly:

    product_items = soup.select('div.product-list li.item')

    This is often faster than chained find methods.
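    Here’s a self-contained sketch putting that together (the sample HTML is made up to mirror the structure in your snippet, so adjust the class names to your actual page):

```python
from bs4 import BeautifulSoup

# Made-up sample HTML mirroring the structure in the question
html = """
<div class="product-list">
  <ul>
    <li class="item"><h3 class="name">Widget</h3>
        <span class="price">$9.99</span><a href="/p/widget">view</a></li>
    <li class="item"><h3 class="name">Gadget</h3>
        <span class="price">$19.99</span></li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

products = []
# One CSS selector replaces the chained find/find_all calls
for item in soup.select('div.product-list li.item'):
    name = item.find('h3', class_='name').text.strip()
    price = item.find('span', class_='price').text.strip()
    link = item.find('a')
    href = link['href'] if link else 'No link'  # tolerate missing links
    products.append((name, price, href))
    print(f'Product: {name}, Price: {price}, Link: {href}')
```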

    Also, implement error handling:

    def fetch_page(url, headers):
        try:
            response = requests.get(url, headers=headers, timeout=10)
            response.raise_for_status()
        except requests.RequestException as e:
            print(f'Error fetching page: {e}')
            return None
        return response

    This prevents crashes from network issues.

    Hope this helps with your scraping project!

    Hey Isaac_Stargazer! Your code looks like a great start. Have you tried using the ‘href’ attribute to grab those product links? Something like this might work:

    link_tag = item.find('a', href=True)
    product_link = link_tag['href'] if link_tag else 'No link'
    print(f'Product: {product_name}, Price: {product_price}, Link: {product_link}')
    

    Just curious, what kind of products are you scraping? I’ve done similar projects and found it super interesting to see the trends in pricing and availability.

    Oh, and a quick tip - have you considered using asyncio for faster scraping? It can really speed things up if you’re dealing with a lot of products.
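    A rough sketch of that asyncio pattern, with the network call stubbed out so it runs standalone (swap fetch_stub for a real fetch, e.g. a blocking requests.get run via asyncio.to_thread):

```python
import asyncio

# fetch_stub is a placeholder for a real HTTP request, so this sketch
# is self-contained and doesn't hit the network
async def fetch_stub(url):
    await asyncio.sleep(0)  # stands in for network latency
    return f'<html>page for {url}</html>'

async def scrape_all(urls):
    # gather() runs all the fetches concurrently instead of one by one
    return await asyncio.gather(*(fetch_stub(u) for u in urls))

pages = asyncio.run(scrape_all([
    'https://example-store.com/page1',
    'https://example-store.com/page2',
]))
print(len(pages))  # 2
```

    The speedup comes from overlapping the waits on the network, which is usually where scraping spends most of its time.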

    What’s your end goal with this data? Building a price comparison tool or just exploring web scraping? Either way, it’s a fun project to work on!

    hey isaac, you might wanna try this:

    for item in product_items:
        link = item.find('a')
        if link:
            product_link = link.get('href')
            print(f'Link: {product_link}')

    this should grab the links for you. hope it helps!