
Extract data from XML or HTML documents using XPath.


Piculet

Piculet is a module for extracting data from XML or HTML documents using XPath queries. It consists of a single source file and depends only on the standard library. If the lxml package is installed, Piculet uses it for better performance and fuller XPath support.

Piculet is used by the parsers of the Cinemagoer project.
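The optional use of lxml described above follows a common Python pattern: prefer lxml when it is importable and fall back to the standard library otherwise. The sketch below is not Piculet's source code. It is a minimal illustration under that assumption, and the helper name `find_texts` is hypothetical. Because `xml.etree.ElementTree` supports only a limited subset of XPath, the example query stays within that subset so it behaves the same with either backend.

```python
# A minimal sketch, NOT Piculet's own code: prefer lxml if installed,
# otherwise fall back to the standard library's ElementTree.
try:
    from lxml import etree as ET  # full XPath 1.0 via .xpath()
    HAVE_LXML = True
except ImportError:
    import xml.etree.ElementTree as ET  # limited XPath subset only
    HAVE_LXML = False


def find_texts(root, path):
    """Return the text of every element matching `path` (hypothetical helper)."""
    if HAVE_LXML:
        return [el.text for el in root.xpath(path)]
    # ElementTree's findall() understands only a subset of XPath,
    # so `path` must stay within it (e.g. ".//title").
    return [el.text for el in root.findall(path)]


# Illustrative document; parsed with whichever backend was imported.
doc = "<movie><title>The Shining</title><year>1980</year></movie>"
root = ET.fromstring(doc)
print(find_texts(root, ".//title"))  # ['The Shining']
```

Keeping the backend choice in one place means the rest of the code only has to respect the smaller ElementTree subset when lxml is missing.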

Getting started

Piculet works with Python 3.8 and later versions. You can install it using pip:

pip install piculet

Installing Piculet creates a script named piculet, which invokes the command-line interface:

$ piculet -h
usage: piculet [-h] [--version] [--html] -s SPEC [document]

For example, suppose you want to extract data from the file shining.html, using the example specification given in movie.json. Download both files and run:

$ piculet -s movie.json shining.html
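To give a feel for what a specification does, here is a minimal sketch that maps output field names to path queries and applies them to a parsed document. The spec layout (the `path` and `multiple` keys) and the `extract` function are hypothetical illustrations, not the format of movie.json and not Piculet's API; consult the documentation for the real specification format. It uses only `xml.etree.ElementTree`, so the queries are limited to ElementTree's XPath subset.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical specification format, NOT the schema used by movie.json:
# each key names an output field, "path" is an ElementTree path query,
# and "multiple" asks for every match instead of only the first one.
SPEC = json.loads("""
{
    "title": {"path": "./title"},
    "year": {"path": "./year"},
    "genres": {"path": "./genres/genre", "multiple": true}
}
""")

# Illustrative input document (not the contents of shining.html).
DOCUMENT = """
<movie>
  <title>The Shining</title>
  <year>1980</year>
  <genres><genre>Drama</genre><genre>Horror</genre></genres>
</movie>
"""


def extract(root, spec):
    """Apply each rule in `spec` to `root` and collect the text values."""
    data = {}
    for key, rule in spec.items():
        texts = [el.text for el in root.findall(rule["path"])]
        if rule.get("multiple", False):
            data[key] = texts  # keep all matches, possibly an empty list
        elif texts:
            data[key] = texts[0]  # first match only; field skipped if none
    return data


root = ET.fromstring(DOCUMENT.strip())
print(json.dumps(extract(root, SPEC)))
# {"title": "The Shining", "year": "1980", "genres": ["Drama", "Horror"]}
```

Marking multi-valued fields explicitly in the spec, rather than guessing from the number of matches, means a movie with a single genre still yields a list.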

Getting help

The documentation is available at https://piculet.readthedocs.io/

The source code can be obtained from https://github.com/uyar/piculet

License

Copyright (C) 2014-2023 H. Turgut Uyar

Piculet is released under the GNU Lesser General Public License (LGPL), version 3 or later. See the included LICENSE.txt file for details.
