parsel package¶
Submodules¶
parsel.csstranslator module¶
-
class
parsel.csstranslator.
GenericTranslator
[source]¶ Bases:
parsel.csstranslator.TranslatorMixin, cssselect.xpath.GenericTranslator
-
class
parsel.csstranslator.
HTMLTranslator
(xhtml=False)[source]¶ Bases:
parsel.csstranslator.TranslatorMixin, cssselect.xpath.HTMLTranslator
parsel.selector module¶
XPath selectors based on lxml
-
class
parsel.selector.
Selector
(text=None, type=None, namespaces=None, root=None, base_url=None, _expr=None)[source]¶ Bases:
object
Selector allows you to select parts of an XML or HTML text using CSS or XPath expressions and extract data from it.
text is a unicode object in Python 2 or a str object in Python 3.
type defines the selector type. It can be "html", "xml" or None (default). If type is None, the selector defaults to "html".
-
css
(query)[source]¶ Apply the given CSS selector and return a SelectorList instance.
query is a string containing the CSS selector to apply.
In the background, CSS queries are translated into XPath queries using the cssselect library and run through the .xpath() method.
-
extract
()[source]¶ Serialize and return the matched nodes in a single unicode string. Percent encoded content is unquoted.
-
namespaces
¶
-
re
(regex)[source]¶ Apply the given regex and return a list of unicode strings with the matches.
regex can be either a compiled regular expression or a string which will be compiled to a regular expression using re.compile(regex).
-
register_namespace
(prefix, uri)[source]¶ Register the given namespace to be used in this Selector. Without registering namespaces you can’t select or extract data from non-standard namespaces. See Selector examples on XML text.
-
remove_namespaces
()[source]¶ Remove all namespaces, allowing the document to be traversed using namespace-less XPath expressions. See Removing namespaces.
-
root
¶
-
selectorlist_cls
¶ alias of
SelectorList
-
text
¶
-
type
¶
-
xpath
(query)[source]¶ Find nodes matching the XPath query and return the result as a SelectorList instance with all elements flattened. List elements implement the Selector interface too.
query is a string containing the XPath query to apply.
-
-
class
parsel.selector.
SelectorList
[source]¶ Bases:
list
The SelectorList class is a subclass of the builtin list class, which provides a few additional methods.
-
css
(xpath)[source]¶ Call the .css() method for each element in this list and return their results flattened as another SelectorList.
xpath is the same argument as query in Selector.css().
-
extract
()[source]¶ Call the .extract() method for each element in this list and return their results flattened, as a list of unicode strings.
-
re
(regex)[source]¶ Call the .re() method for each element in this list and return their results flattened, as a list of unicode strings.
-
xpath
(xpath)[source]¶ Call the .xpath() method for each element in this list and return their results flattened as another SelectorList.
xpath is the same argument as query in Selector.xpath().
-
parsel.utils module¶
-
parsel.utils.
extract_regex
(regex, text)[source]¶ Extract a list of unicode strings from the given text/encoding using the following policies:
* if the regex contains a named group called "extract", that group will be returned
* if the regex contains multiple numbered groups, all of those will be returned (flattened)
* if the regex doesn’t contain any group, the entire regex match is returned
-
parsel.utils.
flatten
(sequence) → list[source]¶ Returns a single, flat list which contains all elements retrieved from the sequence and all recursively contained sub-sequences (iterables).
Examples:
>>> [1, 2, [3,4], (5,6)]
[1, 2, [3, 4], (5, 6)]
>>> flatten([[[1,2,3], (42,None)], [4,5], [6], 7, (8,9,10)])
[1, 2, 3, 42, None, 4, 5, 6, 7, 8, 9, 10]
>>> flatten(["foo", "bar"])
['foo', 'bar']
>>> flatten(["foo", ["baz", 42], "bar"])
['foo', 'baz', 42, 'bar']
Module contents¶
Parsel lets you extract text from XML/HTML documents using XPath or CSS selectors.