Module pypln.workers.extractor

Functions

parse_html(html, remove_tags=None, remove_inside=None, replace_with=' ')
get_pdf_metadata(data)
extract_pdf(data)
main(file_data)
Variables
  __meta__ = {'from': 'gridfs-file', 'requires': ['contents'], '...
  regexp_tags = re_compile(r'(<[ \t]*(/?[a-zA-Z]*)[^>]*>)')
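
To make the regexp_tags pattern easier to read, here is a minimal sketch of what it matches. It assumes that re_compile is an alias for Python's re.compile (the listing does not show the import), and the sample string is made up for illustration. The outer group captures the whole tag; the inner group captures the tag name, including a leading "/" for closing tags.

```python
import re

# Same pattern as regexp_tags in the listing. Assumption: re_compile in the
# module is simply re.compile under another name.
regexp_tags = re.compile(r'(<[ \t]*(/?[a-zA-Z]*)[^>]*>)')

# Hypothetical input, not taken from the module.
sample = '<p class="x">Hello < br/>world</p>'

# findall returns one (full_tag, tag_name) tuple per match.
matches = regexp_tags.findall(sample)
for full_tag, tag_name in matches:
    print(repr(full_tag), repr(tag_name))

# Replacing every tag with a single space, which is the default value of
# parse_html's replace_with parameter.
stripped = regexp_tags.sub(' ', sample)
print(repr(stripped))
```

Note that the pattern tolerates spaces or tabs between "<" and the tag name, so "< br/>" is matched with the name "br". How parse_html itself uses the pattern is not shown in this listing.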
Variables Details

__meta__

Value:
{'from': 'gridfs-file', 'requires': ['contents'], 'to': 'document', 'provides': ['text', 'metadata'],}