parsel package

Submodules

class parsel.csstranslator.GenericTranslator: Bases: parsel.csstranslator.TranslatorMixin, cssselect.xpath.GenericTranslator

class parsel.csstranslator.HTMLTranslator(xhtml=False): Bases: parsel.csstranslator.TranslatorMixin, cssselect.xpath.HTMLTranslator

class parsel.csstranslator.TranslatorMixin

Bases: object

xpath_text_simple_pseudo_element(xpath): Support selecting text nodes using the ::text pseudo-element.

class parsel.csstranslator.XPathExpr(path='', element='*', condition='', star_prefix=False)

Bases: cssselect.xpath.XPathExpr

classmethod from_xpath(xpath, textnode=False, attribute=None)

XPath selectors based on lxml

class parsel.selector.SafeXMLParser(*args, **kwargs): Bases: lxml.etree.XMLParser

class parsel.selector.Selector(text=None, type=None, namespaces=None, root=None, base_url=None, _expr=None)

Bases: object

class parsel.selector.SelectorList

Bases: list

parsel.selector.create_root_node(text, parser_cls, base_url=None): Create the root node for text using the given parser class.

parsel.utils.extract_regex(regex, text): Extract a list of unicode strings from the given text/encoding using the following policies:

* if the regex contains a named group called “extract”, the value of that group is returned
* if the regex contains multiple numbered groups, all of them are returned (flattened)
* if the regex doesn’t contain any group, the entire regex match is returned
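The three policies above can be illustrated with a minimal standard-library sketch. This is not parsel's implementation: the name `extract_regex_sketch` is hypothetical, and the handling of a non-matching named group (returning an empty list) is an assumption made for the example.

```python
import re


def extract_regex_sketch(regex, text):
    """Hypothetical stand-in that mirrors the three documented policies."""
    # Accept either a pattern string or an already compiled pattern.
    pattern = re.compile(regex) if isinstance(regex, str) else regex

    # Policy 1: a named group called "extract" wins; return its value.
    if "extract" in pattern.groupindex:
        match = pattern.search(text)
        # Assumption: no match yields an empty list.
        return [match.group("extract")] if match else []

    # Policies 2 and 3: findall() returns tuples when the pattern has
    # several groups and plain strings otherwise; flatten the tuples.
    results = []
    for found in pattern.findall(text):
        if isinstance(found, tuple):
            results.extend(found)
        else:
            results.append(found)
    return results


named = extract_regex_sketch(r"(?P<extract>\d+) items", "3 items, 5 items")
numbered = extract_regex_sketch(r"(\w+)=(\d+)", "a=1 b=2")
ungrouped = extract_regex_sketch(r"\d+", "a1 b22")
```

Here `named` holds only the value of the "extract" group from the first match, `numbered` holds every group from every match in one flat list, and `ungrouped` holds the full text of each match.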

parsel.utils.flatten(sequence) → list: Returns a single, flat list which contains all elements retrieved from the sequence and all recursively contained sub-sequences (iterables).

Examples:

>>> [1, 2, [3,4], (5,6)]
[1, 2, [3, 4], (5, 6)]
>>> flatten([[[1,2,3], (42,None)], [4,5], [6], 7, (8,9,10)])
[1, 2, 3, 42, None, 4, 5, 6, 7, 8, 9, 10]
>>> flatten(["foo", "bar"])
['foo', 'bar']
>>> flatten(["foo", ["baz", 42], "bar"])
['foo', 'baz', 42, 'bar']

parsel.utils.iflatten(sequence) → iterator: Similar to flatten(), but returns an iterator instead of a list.
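The relationship between flatten() and iflatten() can be sketched with a recursive generator and a thin list wrapper. This is an illustration using only the standard library, not parsel's source: the names `iflatten_sketch` and `flatten_sketch` are hypothetical, and treating strings and bytes as single elements is an assumption based on the flatten() examples, where "foo" is not split into characters.

```python
from collections.abc import Iterable


def iflatten_sketch(sequence):
    """Lazily yield the elements of nested iterables, depth first."""
    for item in sequence:
        # Assumption: strings and bytes count as atoms, not sub-sequences.
        if isinstance(item, Iterable) and not isinstance(item, (str, bytes)):
            yield from iflatten_sketch(item)
        else:
            yield item


def flatten_sketch(sequence):
    """Eager variant: collect the lazy iterator into a list."""
    return list(iflatten_sketch(sequence))


nested = [[[1, 2, 3], (42, None)], [4, 5], [6], 7, (8, 9, 10)]
flat = flatten_sketch(nested)
lazy = iflatten_sketch(["foo", ["baz", 42], "bar"])
first = next(lazy)       # consumes only the first element
rest = list(lazy)        # consumes whatever is left
```

The iterator form is useful when only the first few elements are needed, since nothing past the requested element is visited.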

Parsel lets you extract text from XML/HTML documents using XPath or CSS selectors.