This project provides a set of tools for web scraping and data extraction. It includes features for crawling websites, extracting data from HTML content, and storing the extracted data in a structured format.
- URL crawling
- Data extraction from HTML
- File-based storage of extracted data
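The HTML-extraction step listed above can be sketched with nothing but the Python standard library. This is only an illustration of the idea, not the project's actual extraction code — the class name and the link-only extraction are assumptions for the example:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags, as a crawler's extraction step would."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

html = '<html><body><a href="/page1">One</a> <a href="/page2">Two</a></body></html>'
extractor = LinkExtractor()
extractor.feed(html)
print(extractor.links)  # ['/page1', '/page2']
```

A real crawler would then fetch each discovered URL and repeat, typically with a visited-set to avoid loops.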
- Clone the repository:
  `git clone [repository URL]`
- Install the dependencies:
  `pip install -r backend/requirements.txt` (backend) and `cd frontend && npm install` (frontend)
- Run the application:
  `python backend/main.py` (backend) and `cd frontend && npm run dev` (frontend)
- Enter the URL to crawl in the form.
- Click the "Crawl" button.
- View the extracted data in the results display.
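The "structured format" the results are stored in could be, for example, a JSON file. The snippet below is a hedged sketch of that storage step — the file name `results.json` and the record shape are assumptions for illustration, not the project's actual schema:

```python
import json
from pathlib import Path

def save_results(records, path="results.json"):
    """Append extracted records to a JSON file (hypothetical schema)."""
    out = Path(path)
    # Load any previously stored records so repeated crawls accumulate
    existing = json.loads(out.read_text()) if out.exists() else []
    existing.extend(records)
    out.write_text(json.dumps(existing, indent=2))

save_results([{"url": "/page1", "title": "One"}])
```

Keeping results in a flat JSON file keeps the storage dependency-free; a database would be the natural next step if the crawl volume grows.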
Please read the CONTRIBUTING.md file for information on how to contribute to this project.
This project is licensed under the MIT License - see the LICENSE file for details.