History
1.8.0 (2023-04-18)
- Add support for JMESPath: you can now create a selector for a JSON document and call Selector.jmespath(). See the documentation for more information and examples.
- Selectors can now be constructed from bytes (using the body and encoding arguments) instead of str (using the text argument), so that there is no internal conversion from str to bytes and the memory usage is lower.
- Typing improvements
- The pkg_resources module (which was absent from the requirements) is no longer used
- Documentation build fixes
- New requirements:
  - jmespath
  - typing_extensions (on Python 3.7)
1.7.0 (2022-11-01)
- Add PEP 561-style type information
- Support for Python 2.7, 3.5 and 3.6 is removed
- Support for Python 3.9-3.11 is added
- Very large documents (with deep nesting or long tag content) can now be parsed, and Selector now takes a new argument huge_tree to disable this
- Support for new features of cssselect 1.2.0 is added
- The Selector.remove() and SelectorList.remove() methods are deprecated and replaced with the new Selector.drop() and SelectorList.drop() methods which don’t delete text after the dropped elements when used in the HTML mode.
1.6.0 (2020-05-07)
- Python 3.4 is no longer supported
- New Selector.remove() and SelectorList.remove() methods to remove selected elements from the parsed document tree
- Improvements to error reporting, test coverage and documentation, and code cleanup
1.5.2 (2019-08-09)
- Selector.remove_namespaces received a significant performance improvement
- The value of data within the printable representation of a selector (repr(selector)) now ends in ... when truncated, to make the truncation obvious.
- Minor documentation improvements.
1.5.1 (2018-10-25)
- has-class XPath function handles newlines and other separators in class names properly;
- fixed parsing of HTML documents with null bytes;
- documentation improvements;
- Python 3.7 tests are run on CI; other test improvements.
1.5.0 (2018-07-04)
- New Selector.attrib and SelectorList.attrib properties which make it easier to get attributes of HTML elements.
- CSS selectors became faster: compilation results are cached (LRU cache is used for css2xpath), so there is less overhead when the same CSS expression is used several times.
- .get() and .getall() selector methods are documented and recommended over .extract_first() and .extract().
- Various documentation tweaks and improvements.

One more change is that the .extract() and .extract_first() methods are now implemented using .get() and .getall(), not the other way around, and instead of calling Selector.extract all other methods now call Selector.get internally. This can be backwards incompatible for custom Selector subclasses which override Selector.extract without doing the same for Selector.get. If you have such a Selector subclass, make sure the get method is also overridden. For example, this:

    import parsel

    class MySelector(parsel.Selector):
        def extract(self):
            return super().extract() + " foo"

should be changed to this:

    import parsel

    class MySelector(parsel.Selector):
        def get(self):
            return super().get() + " foo"

        extract = get

1.4.0 (2018-02-08)
- Selector and SelectorList can’t be pickled because pickling/unpickling doesn’t work for lxml.html.HtmlElement; parsel now raises TypeError explicitly instead of allowing pickle to silently produce wrong output. This is technically backwards-incompatible if you’re using Python < 3.6.
1.3.1 (2017-12-28)
- Fix artifact uploads to pypi.
1.3.0 (2017-12-28)
- has-class XPath extension function;
- parsel.xpathfuncs.set_xpathfunc is a simplified way to register XPath extensions;
- Selector.remove_namespaces now removes namespace declarations;
- Python 3.3 support is dropped;
- make htmlview command for easier Parsel docs development.
- CI: PyPy installation is fixed; parsel now runs tests for PyPy3 as well.
1.2.0 (2017-05-17)
- Add SelectorList.get and SelectorList.getall methods as aliases for SelectorList.extract_first and SelectorList.extract respectively
- Add default value parameter to SelectorList.re_first method
- Add Selector.re_first method
- Add replace_entities argument on .re() and .re_first() to turn off replacing of character entity references
- Bug fix: detect None result from lxml parsing and fall back to an empty document
- Rearrange XML/HTML examples in the selectors usage docs
- Travis CI:
  - Test against Python 3.6
  - Test against PyPy using “Portable PyPy for Linux” distribution
1.1.0 (2016-11-22)
- Change default HTML parser to lxml.html.HTMLParser, which makes it easier to use some HTML-specific features
- Add css2xpath function to translate CSS to XPath
- Add support for ad-hoc namespaces declarations
- Add support for XPath variables
- Documentation improvements and updates
1.0.3 (2016-07-29)
- Add BSD-3-Clause license file
- Re-enable PyPy tests
- Integrate py.test runs with setuptools (needed for Debian packaging)
- Changelog is now called NEWS
1.0.2 (2016-04-26)
- Fix bug in exception handling causing original traceback to be lost
- Added docstrings and other doc fixes
1.0.1 (2015-08-24)
- Updated PyPI classifiers
- Added docstrings for csstranslator module and other doc fixes
1.0.0 (2015-08-22)
- Documentation fixes
0.9.6 (2015-08-14)
- Updated documentation
- Extended test coverage
0.9.5 (2015-08-11)
- Support for extending SelectorList
0.9.4 (2015-08-10)
- Try workaround for travis-ci/dpl#253
0.9.3 (2015-08-07)
- Add base_url argument
0.9.2 (2015-08-07)
- Rename module unified -> selector and promote root attribute
- Add create_root_node function
0.9.1 (2015-08-04)
- Setup Sphinx build and docs structure
- Build universal wheels
- Rename some leftovers from package extraction
0.9.0 (2015-07-30)
- First release on PyPI.