parsel package
Submodules
parsel.csstranslator module
class parsel.csstranslator.GenericTranslator
    Bases: parsel.csstranslator.TranslatorMixin, cssselect.xpath.GenericTranslator
class parsel.csstranslator.HTMLTranslator(xhtml=False)
    Bases: parsel.csstranslator.TranslatorMixin, cssselect.xpath.HTMLTranslator
parsel.selector module
XPath selectors based on lxml.
class parsel.selector.Selector(text=None, type=None, namespaces=None, root=None, base_url=None, _expr=None)
    Bases: object
    namespaces
    root
    selectorlist_cls
        alias of SelectorList
    text
    type
parsel.utils module
parsel.utils.extract_regex(regex, text)
    Extract a list of unicode strings from the given text/encoding using the following policies:
    * if the regex contains a named group called "extract", the value of that group is returned
    * if the regex contains multiple numbered groups, all of them are returned (flattened)
    * if the regex doesn't contain any group, the entire regex match is returned
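The three policies above can be illustrated with a small standalone sketch built on the standard re module. This is not parsel's implementation: the function name extract_regex_sketch is hypothetical, and the choice to return only the first "extract" match is an assumption made for the illustration.

```python
import re


def extract_regex_sketch(regex, text):
    """Hypothetical re-implementation of the three documented policies."""
    # Accept either a pattern string or an already compiled pattern.
    pattern = re.compile(regex) if isinstance(regex, str) else regex

    # Policy 1: a named group called "extract" takes priority.
    # Assumption: only the first match in the text is considered.
    if "extract" in pattern.groupindex:
        match = pattern.search(text)
        if match is None or match.group("extract") is None:
            return []
        return [match.group("extract")]

    # Policies 2 and 3: findall() yields tuples when the pattern has
    # several groups, and plain strings otherwise (one group or none).
    results = []
    for found in pattern.findall(text):
        if isinstance(found, tuple):
            results.extend(found)  # flatten the numbered groups
        else:
            results.append(found)
    return results


named = extract_regex_sketch(r"id=(?P<extract>\d+)", "id=42 id=7")
numbered = extract_regex_sketch(r"(\w+)=(\d+)", "a=1 b=2")
whole = extract_regex_sketch(r"\d+", "a=1 b=22")
```

Here named holds the single value captured by the "extract" group, numbered holds every captured group in order of appearance, and whole holds each full match.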
parsel.utils.flatten(sequence) → list
    Returns a single, flat list which contains all elements retrieved from the sequence and all recursively contained sub-sequences (iterables).
    Examples:
    >>> flatten([1, 2, [3,4], (5,6)])
    [1, 2, 3, 4, 5, 6]
    >>> flatten([[[1,2,3], (42,None)], [4,5], [6], 7, (8,9,10)])
    [1, 2, 3, 42, None, 4, 5, 6, 7, 8, 9, 10]
    >>> flatten(["foo", "bar"])
    ['foo', 'bar']
    >>> flatten(["foo", ["baz", 42], "bar"])
    ['foo', 'baz', 42, 'bar']
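The recursive behaviour described above can be sketched in a few lines. The name flatten_sketch is hypothetical and the code is not taken from parsel. It assumes that only lists and tuples count as sub-sequences, so strings stay intact, as in the documented examples.

```python
def flatten_sketch(sequence):
    """Hypothetical recursive flattening of nested lists and tuples."""
    result = []
    for item in sequence:
        # Assumption: only list and tuple are treated as sub-sequences,
        # so strings such as "foo" are kept whole rather than split.
        if isinstance(item, (list, tuple)):
            result.extend(flatten_sketch(item))
        else:
            result.append(item)
    return result


nested = flatten_sketch([[[1, 2, 3], (42, None)], [4, 5], [6], 7, (8, 9, 10)])
mixed = flatten_sketch(["foo", ["baz", 42], "bar"])
```

A recursive helper keeps the sketch short. For very deeply nested input, an explicit stack would avoid hitting Python's recursion limit.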
Module contents
Parsel lets you extract text from XML/HTML documents using XPath or CSS selectors.
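As a rough illustration of both selector styles, the sketch below builds a Selector from an HTML string and queries it once with CSS and once with XPath. The HTML sample and the helper name extract_title_and_links are made up for this example. The import is guarded because parsel is a third-party package that may not be installed, in which case the helper returns None.

```python
try:
    from parsel import Selector  # third-party package, assumed to be installed
except ImportError:  # fall back gracefully so the sketch still runs
    Selector = None

# Made-up sample document for the illustration.
HTML = "<html><body><h1>Hello</h1><a href='/a'>A</a><a href='/b'>B</a></body></html>"


def extract_title_and_links(html):
    """Hypothetical helper: return (title, links), or None without parsel."""
    if Selector is None:
        return None
    sel = Selector(text=html)
    # CSS query: text content of the first <h1> element.
    title = sel.css("h1::text").extract_first()
    # XPath query: href attribute of every <a> element.
    links = sel.xpath("//a/@href").extract()
    return title, links


result = extract_title_and_links(HTML)
```

Both css() and xpath() return a SelectorList, so the two query styles can be mixed freely on the same document.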