Parsel
Parsel is a BSD-licensed Python library to extract data from HTML, JSON, and XML documents.
It supports:
- CSS and XPath expressions for HTML and XML documents
- JMESPath expressions for JSON documents
- Regular expressions
Find the Parsel online documentation at https://parsel.readthedocs.org.
Example:
>>> from parsel import Selector
>>> text = """
... <html>
...     <body>
...         <h1>Hello, Parsel!</h1>
...         <ul>
...             <li><a href="http://example.com">Link 1</a></li>
...             <li><a href="http://scrapy.org">Link 2</a></li>
...         </ul>
...         <script type="application/json">{"a": ["b", "c"]}</script>
...     </body>
... </html>"""
>>> selector = Selector(text=text)
>>> selector.css('h1::text').get()
'Hello, Parsel!'
>>> selector.xpath('//h1/text()').re(r'\w+')
['Hello', 'Parsel']
>>> for li in selector.css('ul > li'):
...     print(li.xpath('.//@href').get())
http://example.com
http://scrapy.org
>>> selector.css('script::text').jmespath("a").get()
'b'
>>> selector.css('script::text').jmespath("a").getall()
['b', 'c']
Parsel Documentation Contents
- Installation
- Usage
- API reference
- History
  - 1.9.1 (2024-04-08)
  - 1.9.0 (2024-03-14)
  - 1.8.1 (2023-04-18)
  - 1.8.0 (2023-04-18)
  - 1.7.0 (2022-11-01)
  - 1.6.0 (2020-05-07)
  - 1.5.2 (2019-08-09)
  - 1.5.1 (2018-10-25)
  - 1.5.0 (2018-07-04)
  - 1.4.0 (2018-02-08)
  - 1.3.1 (2017-12-28)
  - 1.3.0 (2017-12-28)
  - 1.2.0 (2017-05-17)
  - 1.1.0 (2016-11-22)
  - 1.0.3 (2016-07-29)
  - 1.0.2 (2016-04-26)
  - 1.0.1 (2015-08-24)
  - 1.0.0 (2015-08-22)
  - 0.9.6 (2015-08-14)
  - 0.9.5 (2015-08-11)
  - 0.9.4 (2015-08-10)
  - 0.9.3 (2015-08-07)
  - 0.9.2 (2015-08-07)
  - 0.9.1 (2015-08-04)
  - 0.9.0 (2015-07-30)