Python:
from bs4 import BeautifulSoup
import requests
import logging

logging.basicConfig(format='%(asctime)s - %(message)s', level=logging.INFO)


def html_get():
    # Download the "bathroom tiles" catalog page and cache it locally
    with open("index.html", "w", encoding='utf-8') as f:
        htmll = requests.get("https://www.rusplitka.ru/catalog/plitka-dlya-vannoj/")
        f.write(htmll.text)


def name_produkt():
    logging.info('Start script')
    html_get()
    with open("index.html", 'r', encoding='utf-8') as f:
        contents = f.read()
    # Every collection on the catalog page sits in a div.description-block
    for soup in BeautifulSoup(contents, features="html.parser").find_all("div", class_="description-block"):
        for soups in soup.find_all("a", class_='title'):
            colektion = "https://www.rusplitka.ru" + soups.get('href')
            with open('colection.txt', 'a') as f:
                f.write(colektion + '\n')
            # logging.info('%s ', colektion)
            html = requests.get(colektion).text
            # Walk down to the first product card of the collection and collect its links
            for a in BeautifulSoup(html, features="html.parser").find('div', class_="plitka grid").find('div', class_='item col-lg-2 col-sm-3').find('div', class_='wrap').find('div', class_='row').find('div', class_='col-sm-6').find_all('a'):
                link = a.get('href')
                logging.info('%s ', link)
                with open("links.txt", 'a+') as f:
                    f.write("https://www.rusplitka.ru" + link + '\n')


print(name_produkt())
logging.info('stop script')
I'm writing a parser for product characteristics on rusplitka.ru; I need to scrape the characteristics of the products in every collection.
I wrote this script, but it isn't finished yet because I ran into a problem.
First I need to scrape the links to the products: for each collection I open its detail page, walk down the divs, and then pull the href out of the a tags.
The script prints 4 links first and then fails with an error:
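Each `.find()` in the long chain returns `None` when nothing matches, so the chain works only as long as every collection page has exactly this layout, and it breaks on the first page where one of the divs is missing. A minimal sketch, assuming you want to see *which* div is absent: the hypothetical helper `find_chain` (not part of bs4) performs the same lookups one at a time and logs the step that missed instead of raising.

```python
import logging

# The lookups from the chained .find() call in the script, one per step.
PRODUCT_CARD_STEPS = [
    ("div", "plitka grid"),
    ("div", "item col-lg-2 col-sm-3"),
    ("div", "wrap"),
    ("div", "row"),
    ("div", "col-sm-6"),
]


def find_chain(node, steps, url=""):
    """Apply node.find(name, class_=cls) for each (name, cls) step.

    Hypothetical helper: returns the last element found, or None after
    logging which step failed, instead of raising AttributeError.
    """
    for depth, (name, cls) in enumerate(steps):
        found = node.find(name, class_=cls)
        if found is None:
            logging.warning("step %d (%s, class=%r) not found on %s",
                            depth, name, cls, url)
            return None
        node = found
    return node
```

In the loop it would be called as `card = find_chain(BeautifulSoup(html, "html.parser"), PRODUCT_CARD_STEPS, colektion)`, followed by `if card is None: continue` before `card.find_all('a')`.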
2020-08-01 10:57:51,822 - Start script
2020-08-01 10:57:53,795 - /products/lb-ceramics/ornella-8109/plitka-nastennaya-ornella-ornella-bezh-5032-0203-24524/
2020-08-01 10:57:54,482 - /products/41zero42/biscuit-16529/plitka-nastennaya-biscuit-plain-bianco-111641/
2020-08-01 10:57:55,111 - /products/ape-ceramicas/arts-1-16261/keramogranit-arts-trendy-mix-20-107717/
2020-08-01 10:57:55,914 - /products/arcana/stracciatella-15141/keramogranit-stracciatella-guba-20x20-8025-97792/
Traceback (most recent call last):
  File "d:/PARS-site/pars.py", line 36, in <module>
    print(name_produkt())
  File "d:/PARS-site/pars.py", line 28, in name_produkt
    for a in BeautifulSoup(html, features="html.parser").find('div', class_="plitka grid").find('div', class_='item col-lg-2 col-sm-3').find('div', class_='wrap').find('div', class_='row').find('div', class_='col-sm-6').find_all('a'):
AttributeError: 'NoneType' object has no attribute 'find'
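One plausible cause, not confirmed by the log: in Beautiful Soup, `class_='item col-lg-2 col-sm-3'` (a string with spaces) is compared against the exact `class` attribute string, so a card with an extra class or a different class order is not found. A CSS selector matches each class independently, and `select_one()` returns `None` rather than raising. The sketch below assumes the same div path as the script; `product_links` and `CARD_SELECTOR` are illustrative names, and `urljoin` replaces the string concatenation when building absolute URLs.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.rusplitka.ru/"

# Roughly the same path as the chained .find() calls, as one CSS selector.
CARD_SELECTOR = "div.plitka.grid div.item div.wrap div.row div.col-sm-6"


def product_links(collection_html, base=BASE_URL):
    """Return absolute product URLs from the first card, or [] if it is missing."""
    soup = BeautifulSoup(collection_html, "html.parser")
    # select_one() gives None instead of raising; "div.item" matches the
    # class regardless of the order or number of the other classes.
    card = soup.select_one(CARD_SELECTOR)
    if card is None:
        return []
    # href=True skips <a> tags that have no href attribute.
    return [urljoin(base, a["href"]) for a in card.find_all("a", href=True)]
```

In the loop, `links = product_links(html)` followed by `if not links: logging.warning('no product card on %s', colektion)` keeps the script running past a collection whose page is laid out differently or has no products; opening that page in the browser's inspector would show which of the two it is.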
Thanks for the help.
#bs4 #requests #html.parser #logging #Python3