pymedextcore package¶
Submodules¶
pymedextcore.annotators module¶
- class pymedextcore.annotators.Annotation(type: str, value: str, source: str, source_ID: str, span: Optional[Tuple[int, int]] = None, attributes: Optional[Dict] = None, isEntity: bool = False, ID: Optional[str] = None, ngram: Optional[str] = None)[source]¶
Bases:
object
Base object that represents an annotation. Each Annotator must return a list of Annotations.
- add_child(child)[source]¶
Add a child to current Annotation
- Parameters
child – An annotation to set as child of current node
- Returns
None
- Return type
None
- add_property(neighbor)[source]¶
Add the properties of a neighbor to the current annotation if both have the same span.
- Parameters
neighbor – the neighboring Annotation whose properties are added
- Returns
None
- Return type
None
- get_attributes()[source]¶
Get the attributes of the current node and of its parent nodes.
- Returns
attributes
- Return type
a dict
- get_children_span()[source]¶
Return the spans of all children of the current node.
- Returns
spans of the children
- Return type
list of tuple
- get_entities_children()[source]¶
From the current node, return all children that are Annotations with isEntity = True.
- Returns
children list
- Return type
list
- get_parent(from_type)[source]¶
Return the closest parent of the current Annotation that has a specific type.
- Parameters
from_type – the type to look for
- Returns
Annotation of a specific type
- Return type
- get_parents_properties()[source]¶
Return the properties of the parents of the current annotation if they belong to a specific type.
- Parameters
filter_type – list of Annotations types
- Returns
list of current and parents Annotation properties
- Return type
list of dict
- get_properties()[source]¶
Return the properties of the current node if the Annotation is of a specific type.
- Parameters
filter_type – list of Annotations type
- Returns
properties
- Return type
list of dict
- set_parent(parent)[source]¶
Set the parent of the current Annotation.
- Parameters
parent – Annotation
- Returns
1
- Return type
int
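The parent/child links and the attribute lookup through parent nodes described for Annotation can be illustrated with a small self-contained sketch. `Node` is a hypothetical stand-in written for this example, not the library's `Annotation` class, and the rule that a node's own attributes override those of its parents is an assumption.

```python
from typing import Dict, List, Optional, Tuple


class Node:
    """Hypothetical stand-in for an annotation with parent/child links."""

    def __init__(self, type: str, value: str,
                 span: Optional[Tuple[int, int]] = None,
                 attributes: Optional[Dict] = None) -> None:
        self.type = type
        self.value = value
        self.span = span
        self.attributes = attributes or {}
        self.parent: Optional["Node"] = None
        self.children: List["Node"] = []

    def add_child(self, child: "Node") -> None:
        # Link both directions so the tree can be walked either way.
        self.children.append(child)
        child.parent = self

    def get_parent(self, from_type: str) -> Optional["Node"]:
        # Walk upwards until an ancestor of the requested type is found.
        node = self.parent
        while node is not None and node.type != from_type:
            node = node.parent
        return node

    def get_attributes(self) -> Dict:
        # Assumption: attributes closer to the current node win on key clashes.
        merged = self.parent.get_attributes() if self.parent else {}
        merged.update(self.attributes)
        return merged


sentence = Node("sentence", "Pas de fièvre.", span=(0, 14),
                attributes={"negation": "neg"})
entity = Node("regex", "fièvre", span=(7, 13), attributes={"label": "symptom"})
sentence.add_child(entity)
```

In this sketch, `entity.get_attributes()` combines the entity's own `label` with the `negation` attribute carried by the enclosing sentence.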
- class pymedextcore.annotators.Annotator[source]¶
Bases:
object
Abstract base class for every Annotator. An Annotator must implement annotate_function(), which returns a list of Annotation objects.
- annotate_function(_input)[source]¶
Main annotation function. Each Annotator must implement it.
- Parameters
_input – a list of Annotations
- Returns
a list of annotations, which are added to Document.annotations
- Return type
List[Annotation]
- get_all_key_input(_input)[source]¶
Return all key inputs for the Annotator.
- Parameters
_input – all annotations of the specified types from the Document
- Returns
a list of annotations
- Return type
list of Annotation
Deprecated since version 0.3: This function will be removed soon; use select_all_inputs instead.
- get_first_key_input(_input)[source]¶
Return the annotations of the first key input type (index 0).
- Parameters
_input – list of input annotations for the Annotator
- Returns
a list of annotations
- Return type
list of Annotation
Deprecated since version 0.3: This function will be removed soon; use select_first_input instead.
- get_key_input(_input, i)[source]¶
Return the annotations of a specific type from key_input.
- Parameters
_input – key_input list
i – the index of the list to select
- Returns
a list of annotations
- Return type
list of Annotation
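A minimal sketch of the contract described above: a subclass implements annotate_function() and returns a list of annotations. The `Annotation` and `Annotator` classes below are simplified stand-ins so the example runs on its own, and `KeywordAnnotator` is hypothetical. Setting `source_ID` to the ID of the annotation the result was derived from follows the pattern visible in the document extracts elsewhere in this reference.

```python
import re
import uuid
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Annotation:
    """Simplified stand-in for pymedextcore.annotators.Annotation."""
    type: str
    value: str
    source: str
    source_ID: str
    span: Optional[Tuple[int, int]] = None
    ID: str = field(default_factory=lambda: str(uuid.uuid1()))


class Annotator:
    """Simplified stand-in for the abstract base class."""

    def annotate_function(self, _input: List[Annotation]) -> List[Annotation]:
        raise NotImplementedError


class KeywordAnnotator(Annotator):
    """Hypothetical annotator that tags every occurrence of a keyword."""

    def __init__(self, keyword: str) -> None:
        self.pattern = re.compile(re.escape(keyword), re.IGNORECASE)

    def annotate_function(self, _input: List[Annotation]) -> List[Annotation]:
        found = []
        for parent in _input:
            for match in self.pattern.finditer(parent.value):
                found.append(Annotation(
                    type="keyword",
                    value=match.group(0),
                    source="KeywordAnnotator:v1",
                    source_ID=parent.ID,  # link back to the input annotation
                    span=match.span(),
                ))
        return found


raw = Annotation("raw_text", "Oesophagite peptique de grade II", "loader", "doc-1")
result = KeywordAnnotator("grade").annotate_function([raw])
```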
pymedextcore.bioctransform module¶
- class pymedextcore.bioctransform.BioC[source]¶
Bases:
pymedextcore.datatransform.DataTransform
- static load_collection(bioc_input: str, format: int = 0, is_file: bool = True)[source]¶
Load a BioC collection from XML or JSON and return a list of Document objects.
- Parameters
bioc_input – a str path to a BioC file or a BioC input string
format – format of the BioC file (XML or JSON)
is_file – if True, bioc_input is a path; otherwise it is a BioC string
- Returns
list of Document
- static save_as_collection(list_of_pymedext_documents: List[pymedextcore.document.Document])[source]¶
Save a list of pymedext Documents as a BioC collection and return the BioC collection object.
- Parameters
list_of_pymedext_documents – a list of Document
- Returns
a bioc collection object
pymedextcore.brat_parser module¶
- class pymedextcore.brat_parser.Attribute(id: str, type: str, target: str, values: Tuple[str, ...] = ())[source]¶
Bases:
object
A simple attribute data structure.
- id: str¶
- target: str¶
- type: str¶
- values: Tuple[str, ...] = ()¶
- class pymedextcore.brat_parser.AugmentedEntity(id: str, type: str, span: Tuple[Tuple[int, int], ...], text: str, relations_from_me: Tuple[pymedextcore.brat_parser.Relation, ...], relations_to_me: Tuple[pymedextcore.brat_parser.Relation, ...], attributes: Tuple[pymedextcore.brat_parser.Attribute, ...])[source]¶
Bases:
object
An augmented entity data structure with its relations and attributes.
- attributes: Tuple[pymedextcore.brat_parser.Attribute, ...]¶
- property end: int¶
- id: str¶
- relations_from_me: Tuple[pymedextcore.brat_parser.Relation, ...]¶
- relations_to_me: Tuple[pymedextcore.brat_parser.Relation, ...]¶
- span: Tuple[Tuple[int, int], ...]¶
- property start: int¶
- text: str¶
- type: str¶
- class pymedextcore.brat_parser.Document(entities: List[pymedextcore.brat_parser.Entity], relations: List[pymedextcore.brat_parser.Relation], attributes: List[pymedextcore.brat_parser.Attribute])[source]¶
Bases:
object
- attributes: List[pymedextcore.brat_parser.Attribute]¶
- entities: List[pymedextcore.brat_parser.Entity]¶
- relations: List[pymedextcore.brat_parser.Relation]¶
- class pymedextcore.brat_parser.Entity(id: str, type: str, span: Tuple[Tuple[int, int], ...], text: str)[source]¶
Bases:
object
A simple annotation data structure.
- property end: int¶
- id: str¶
- span: Tuple[Tuple[int, int], ...]¶
- property start: int¶
- text: str¶
- type: str¶
- class pymedextcore.brat_parser.Grouping(id: str, type: str, items: List[pymedextcore.brat_parser.Entity])[source]¶
Bases:
object
- id: str¶
- items: List[pymedextcore.brat_parser.Entity]¶
- property text¶
- type: str¶
- class pymedextcore.brat_parser.Relation(id: str, type: str, subj: str, obj: str)[source]¶
Bases:
object
A simple relation data structure.
- id: str¶
- obj: str¶
- subj: str¶
- type: str¶
- pymedextcore.brat_parser.get_augmented_entities(ann_path: str) → Dict[str, pymedextcore.brat_parser.AugmentedEntity][source]¶
- pymedextcore.brat_parser.get_entities_relations_attributes_groups(ann_path: str) → Tuple[Dict[str, pymedextcore.brat_parser.Entity], Dict[str, pymedextcore.brat_parser.Relation], Dict[str, pymedextcore.brat_parser.Attribute], Dict[str, pymedextcore.brat_parser.Grouping]][source]¶
- pymedextcore.brat_parser.parse(ann_path: str) → pymedextcore.brat_parser.Document[source]¶
- pymedextcore.brat_parser.parse_attribute(attribute_id: str, attribute_content: str) → pymedextcore.brat_parser.Attribute[source]¶
Parse the annotation string into an Attribute structure.
- Parameters
attribute_id – The attribute ID in the annotation (`A1` for example).
attribute_content – The attribute text content (`Tense T19 Past-Ended` for example).
- Returns
An Attribute object
- Return type
Attribute
- pymedextcore.brat_parser.parse_entity(tag_id: str, tag_content: str) → pymedextcore.brat_parser.Entity[source]¶
Parse the entity string into an Entity structure.
- Parameters
tag_id – The tag ID in the annotation (`T12` for example).
tag_content – The tag text content (`Temporal-Modifier 116 126 history of` for example).
- Returns
An Entity object
- Return type
Entity
- pymedextcore.brat_parser.parse_relation(relation_id: str, relation_content: str) → pymedextcore.brat_parser.Relation[source]¶
Parse the annotation string into a Relation structure.
- Parameters
relation_id – The relation ID in the annotation (`R12` for example).
relation_content – The relation text content (`Modified-By Arg1:T8 Arg2:T6` for example).
- Returns
A Relation object
- Return type
Relation
- pymedextcore.brat_parser.parse_string(annotation_string: str) → pymedextcore.brat_parser.Document[source]¶
- pymedextcore.brat_parser.parse_string_to_augmented_entities(annotation_string: str) → Dict[str, pymedextcore.brat_parser.AugmentedEntity][source]¶
- pymedextcore.brat_parser.read_file_annotations(ann: str) → Tuple[List[pymedextcore.brat_parser.Entity], List[pymedextcore.brat_parser.Relation], List[pymedextcore.brat_parser.Attribute]][source]¶
Read an annotation file and get the Entities, Relations, and Attributes in it.
- Parameters
ann – The path to the annotation file to be processed.
- Returns
A tuple of lists of Entities, Relations, and Attributes.
- Return type
Tuple[List[Entity], List[Relation], List[Attribute]]
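To make the entity layout concrete, here is a minimal sketch that splits one text-bound line of a brat standoff (.ann) file into the fields of an entity. `ParsedEntity` and `parse_entity_line` are hypothetical names used only here. Unlike parse_entity(), which takes the ID and content separately, this helper takes the whole tab-separated line.

```python
from typing import NamedTuple, Tuple


class ParsedEntity(NamedTuple):
    id: str
    type: str
    span: Tuple[Tuple[int, int], ...]
    text: str


def parse_entity_line(line: str) -> ParsedEntity:
    """Split one 'T' line of a .ann file into its components."""
    # Layout: ID <TAB> TYPE START END[;START END...] <TAB> TEXT
    tag_id, tag_content, text = line.rstrip("\n").split("\t", 2)
    entity_type, offsets = tag_content.split(" ", 1)
    # Discontinuous entities list several 'start end' pairs separated by ';'.
    span = tuple(
        (int(start), int(end))
        for start, end in (pair.split() for pair in offsets.split(";"))
    )
    return ParsedEntity(tag_id, entity_type, span, text)


entity = parse_entity_line("T12\tTemporal-Modifier 116 126\thistory of")
split_entity = parse_entity_line("T1\tLocation 0 5;16 23\tNorth America")
```

The span is kept as a tuple of (start, end) pairs, matching the `Tuple[Tuple[int, int], ...]` type of `Entity.span`, so a continuous entity simply has one pair.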
pymedextcore.brattransform module¶
Created 2020/04/14
@author: David BAUDOIN
Purpose: create or update a BRAT file from a pymedext dict
- class pymedextcore.brattransform.brat[source]¶
Bases:
pymedextcore.datatransform.DataTransform
- static load_from_brat(ann_file: str, txt_file: Optional[str] = None) → pymedextcore.document.Document[source]¶
Load annotations from a .ann file in the Brat format
- Parameters
ann_file – path to the .ann file
txt_file – path to the corresponding .txt file, if None: defaults to replacing .ann by .txt
- Returns
Document
- Return type
- save_to_brat(folder_path: Optional[str] = None, pym_ann_types: Optional[List[str]] = None, brat_entities_in_pym_types: Optional[List[str]] = None, brat_entities_in_pym_types_value: Optional[List[str]] = None, brat_entities_in_pym_att_values: Optional[dict] = None, brat_entities_in_pym_att_keys: Optional[dict] = None, brat_attributes: Optional[dict] = None, pym_rel_types: Optional[List[str]] = None, brat_ents_of_rel_in_pym_rel_type: Optional[List[str]] = None, brat_ents_of_rel_in_pym_ent_value: Optional[List[str]] = None, brat_ents_of_rel_in_pym_att_values: Optional[dict] = None, brat_type_of_rel_in_pym_rel_types: Optional[List[str]] = None, brat_type_of_rel_in_pym_rel_att_values: Optional[dict] = None, level_annot: Optional[dict] = None)[source]¶
Write all Annotations to Brat files in folder_path. Two files are created (or overwritten) for each pymedext Document in the input list of documents:
ID.ann: Brat annotation file (with ID = dic_pymedext.id)
ID.txt: Raw text of the document (with ID = dic_pymedext.id)
It will create (or overwrite) an annotation.conf file.
- Parameters
list_of_documents – List of input Documents. Documents should contain the same types of annotations.
folder_path – path as a string. Files are stored at this location. The folder must already exist.
For the other parameters, the following extract of a pymedext document is used in the examples.
'''
{'type': 'QuickUMLS',
 'value': 'oesophagite',
 'ngram': None,
 'span': (188, 199),
 'source': 'QuickUMLS:v1',
 'source_ID': '6814e9fa-96f7-11eb-a8c8-0242ac110002',
 'isEntity': False,
 'attributes': {'hypothesis': 'certain',
                'context': 'patient',
                'negation': 'aff',
                'cui': 'C0014868',
                'label': 'oesophagite',
                'semtypes': ['T047'],
                'score': 1.0,
                'snippet': " La fibroscopie oeso-gastro-duodénale avait révélé une oesophagite peptique de grade II et a permis l'exérèse d'un petit papillome du tiers supérieur de l'œsophage",
                'snippet_span': (132, 296)},
 'ID': '681c2d82-96f7-11eb-a8c8-0242ac110002'},
{'type': 'regex',
 'value': 'grade II',
 'ngram': None,
 'span': (212, 220),
 'source': 'RegexMatcher:v1',
 'source_ID': '68155570-96f7-11eb-a8c8-0242ac110002',
 'isEntity': True,
 'attributes': {'version': 'v1',
                'label': 'Grade',
                'id_regexp': 'id_grade',
                'snippet': "-gastro-duodénale avait révélé une oesophagite peptique de grade II et a permis l'exérèse d'un petit papillome du tiers supérie",
                'hypothesis': 'certain',
                'context': 'patient',
                'negation': 'aff'},
 'ID': '682ca3ec-96f7-11eb-a8c8-0242ac110002'},
'''
Annotation parameters:
pym_ann_types – Pymedext annotation types selected. Example: ['QuickUMLS', 'regex'] -> annotations in Brat will cover these two types of annotations. Depending on the options filled in (explained below), different labels are displayed in Brat.
brat_entities_in_pym_types – (optional) fill this list if Brat entities correspond to annotation types in pymedext. Example: ['regex'] -> in Brat, 'regex' is displayed for each regex found. With the extract given, 'grade II' is highlighted in the text with the label 'regex'.
brat_entities_in_pym_types_value – fill this list if Brat entities correspond to the value of annotation types in pymedext. Example: ['QuickUMLS'] -> in Brat, the QuickUMLS annotation value is displayed for each QuickUMLS found. With the extract given, 'oesophagite' is highlighted in the text with the label 'QuickUMLS'.
brat_entities_in_pym_att_values – (optional) fill this dict if Brat entities correspond to annotation attribute values in pymedext. Keys correspond to pymedext annotation types, values correspond to pymedext attribute keys. Example: {'regex': 'label'} -> in Brat, the regex label stored in the attributes is displayed for each regex found. With the extract given, 'grade II' is highlighted in the text with the label 'Grade'.
brat_entities_in_pym_att_keys – (optional) fill this dict if Brat entities correspond to annotation attribute keys in pymedext. Keys correspond to pymedext annotation types, values correspond to pymedext attribute keys. Example: {'regex': 'label'} -> in Brat, the string 'label' is displayed for each regex found. With the extract given, 'grade II' is highlighted in the text with the label 'label'.
brat_attributes – (optional) Dict with pymedext annotation types as keys and, as values, the list of corresponding attributes to export as Brat attributes. Example: {'QuickUMLS': ['hypothesis', 'negation', 'context']} -> for each QuickUMLS found, the hypothesis, negation and context attribute values are displayed. Use 'all' as the value to export all attributes of this annotation type. Example: {'QuickUMLS': 'all'} -> for each QuickUMLS found, all attributes (semType, CUI code, hypothesis, …) are displayed.
Relation parameters:
pym_rel_types – Pymedext relation types selected. Example: ['Stanza'] -> relations in Brat will cover this type of relation. Depending on the options filled in (explained below), different labels are displayed in Brat.
brat_ents_of_rel_in_pym_rel_type – (optional) fill this list if Brat entities of relations correspond to relation types in pymedext.
brat_ents_of_rel_in_pym_ent_value – (optional) fill this list if Brat entities of relations correspond to relation types in pymedext.
- Returns
1
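The way the entity options choose a Brat label can be sketched as a small lookup. This is an illustration of the examples given for brat_entities_in_pym_types, brat_entities_in_pym_att_values and brat_entities_in_pym_att_keys, not the code used by save_to_brat. The function name `brat_label`, its shortened parameter names and the precedence between options are assumptions.

```python
from typing import Dict, List, Optional


def brat_label(annotation: Dict,
               types: Optional[List[str]] = None,
               att_values: Optional[Dict[str, str]] = None,
               att_keys: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Return the label an annotation would carry in Brat, or None if unselected."""
    ann_type = annotation["type"]
    if att_values and ann_type in att_values:
        # Label is the value stored under the configured attribute key.
        return annotation["attributes"][att_values[ann_type]]
    if att_keys and ann_type in att_keys:
        # Label is the attribute key itself.
        return att_keys[ann_type]
    if types and ann_type in types:
        # Label is the pymedext annotation type.
        return ann_type
    return None


# Reduced version of the 'regex' annotation from the extract above.
regex_annotation = {
    "type": "regex",
    "value": "grade II",
    "span": (212, 220),
    "attributes": {"label": "Grade", "id_regexp": "id_grade"},
}

label_from_type = brat_label(regex_annotation, types=["regex"])
label_from_value = brat_label(regex_annotation, att_values={"regex": "label"})
label_from_key = brat_label(regex_annotation, att_keys={"regex": "label"})
```

The three calls reproduce the labels 'regex', 'Grade' and 'label' given in the parameter examples for the span 'grade II'.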
pymedextcore.connector module¶
- class pymedextcore.connector.APIConnector(baseurl: str, username: str, password: str)[source]¶
Bases:
pymedextcore.connector._Router
Largely inspired by the work at https://github.com/doccano/doccano-client.git
For now, a copy of the DoccanoClient class in doccano_api_client.py.
TODO: investigate alternatives to plaintext login
- Args:
baseurl (str): The baseurl of a Doccano instance.
username (str): The Doccano username to use for the client session.
password (str): The respective username’s password.
- Returns:
An authorized client instance.
- class pymedextcore.connector.Connector[source]¶
Bases:
object
TODO : make this an abstract class for other connector
- class pymedextcore.connector.DatabaseConnector(DB_host, DB_name, DB_port, DB_user, DB_password)[source]¶
Bases:
object
Abstract class specialized in database connections
- class pymedextcore.connector.PostGresConnector(DB_host, DB_name, DB_port, DB_user, DB_password)[source]¶
Bases:
pymedextcore.connector.DatabaseConnector
Abstract Connector to a Postgres Database
- class pymedextcore.connector.SSHConnector(scp_host, scp_user, scp_password)[source]¶
Bases:
object
TODO: implement a connection to a server with paramiko, should also extend Connector
- class pymedextcore.connector.SimpleAPIConnector(host)[source]¶
Bases:
object
TODO: implement a connection to a server with paramiko, should also extend Connector @David?
- class pymedextcore.connector.cxORacleConnector(DB_host, DB_name, DB_port, DB_user, DB_password)[source]¶
Bases:
pymedextcore.connector.DatabaseConnector
Abstract connector to an Oracle database using cx_Oracle
pymedextcore.datatransform module¶
Each class that transforms a pymedext Document into another format must inherit from DataTransform.
TODO: make functions such as save and load mandatory to ease the use of DataTransform objects
pymedextcore.doccanoannotator module¶
pymedextcore.doccanodocument module¶
- class pymedextcore.doccanodocument.DoccanoDocument[source]¶
Bases:
object
DoccanoDocument is used to build an evaluation document that will be sent to the Doccano interface. A DoccanoDocument contains a set of specific DoccanoAnnotation objects that a user wants to evaluate.
pymedextcore.doccanosource module¶
- class pymedextcore.doccanosource.DoccanoSource(baseurl, username, password)[source]¶
Bases:
pymedextcore.source.Source, pymedextcore.connector.APIConnector
Connection to DoccanoClient
This code is largely inspired by the work at https://github.com/doccano/doccano-client.git
- create_label(project_id: str, label_name: str, color: str, prefix: str, suffix: str) → requests.models.Response[source]¶
Adds a label to an existing project.
- Parameters
project_id – the project id
label_name – the text of the label
- Returns
requests.models.Response: The request response.
- create_project(name: str, description: str, project_type: str, guidelines: str) → requests.models.Response[source]¶
Creates a project.
- Parameters
name – name of the project
description – description of the project
project_type – type of project (“SequenceLabeling”, “DocumentClassification” or “Seq2seq”)
- Returns
requests.models.Response: The request response.
- find_project_id(regex: str, date: str, time: str)[source]¶
Finds a project id from an item (specific to the scanner-covid project). If the item is not enough to identify the project, date and time can also be used.
- Parameters
regex – item of interest
date – date of the project
time – time of the project
- Returns
a project id
- get_annotation_detail(project_id: int, doc_id: int, annotation_id: int) → requests.models.Response[source]¶
- get_annotation_list(project_id: int, doc_id: int) → requests.models.Response[source]¶
Gets a list of annotations in a given project and document.
- Args:
project_id (int): A project ID to query.
doc_id (int): A document ID to query.
- Returns
requests.models.Response: The request response.
- get_document_detail(project_id: int, doc_id: int) → requests.models.Response[source]¶
Gets details of a given document.
- Args:
project_id (int): A project ID to query.
doc_id (int): A document ID to query.
- Returns
requests.models.Response: The request response.
- get_document_list(project_id: int, url_parameters: dict = {}) → requests.models.Response[source]¶
Gets a list of documents in a project.
- Args:
project_id (int): A project ID to query.
url_parameters (dict): limit and offset
- Returns
requests.models.Response: The request response.
- get_features() → requests.models.Response[source]¶
Gets features.
- Returns
requests.models.Response: The request response.
- get_label_detail(project_id: int, label_id: int) → requests.models.Response[source]¶
Gets details of a specific label.
- Args:
project_id (int): A project ID to query.
label_id (int): A label ID to query.
- Returns
requests.models.Response: The request response.
- get_label_id(project_id: int, label_name: str)[source]¶
Get the label id from the label name.
- Parameters
project_id – id of the project
label_name – text of the label
- Returns
id of the label
- get_label_list(project_id: int) → requests.models.Response[source]¶
Gets a list of labels in a given project.
- Args:
project_id (int): A project ID to query.
- Returns
requests.models.Response: The request response.
- get_me() → requests.models.Response[source]¶
Gets this account information.
- Returns
requests.models.Response: The request response.
- get_project_detail(project_id: int) → requests.models.Response[source]¶
Gets details of a specific project.
- Args:
project_id (int): A project ID to query.
- Returns
requests.models.Response: The request response.
- get_project_id(project_name: str)[source]¶
Get the project id from the project name.
- Parameters
project_name – name of the project
- Returns
the project id
- get_project_list() → requests.models.Response[source]¶
Gets projects list.
- Returns
requests.models.Response: The request response.
- get_project_statistics(project_id: int) → requests.models.Response[source]¶
Gets project statistics.
- Args:
project_id (int): A project ID to query.
- Returns
requests.models.Response: The request response.
- get_rolemapping_detail(project_id: int, rolemapping_id: int) → requests.models.Response[source]¶
Currently broken!
- get_roles() → requests.models.Response[source]¶
Gets available Doccano user roles.
- Returns
requests.models.Response: The request response.
- get_user_id(username: str)[source]¶
Get the user id from the username.
- Parameters
username – name of the user
- Returns
the user id
- get_user_list() → requests.models.Response[source]¶
Gets user list.
- Returns
requests.models.Response: The request response.
- post_doc_upload(project_id: int, file_format: str, file_name: str, file_path: str = './') → requests.models.Response[source]¶
Uploads a file to a Doccano project.
- Args:
project_id (int): The project id number.
file_format (str): The file format, ex: plain, json, or conll.
file_name (str): The name of the file.
file_path (str): The parent path of the file. Defaults to ./.
- Returns
requests.models.Response: The request response.
- set_rolemapping_list(project_id: str, user_id: str, role_id: str, username: str, rolename: str) → requests.models.Response[source]¶
Set user roles.
- Parameters
project_id –
user_id –
role_id –
username –
rolename –
- Returns
requests.models.Response: The request response.
pymedextcore.doccanotransform module¶
- class pymedextcore.doccanotransform.Doccano[source]¶
Bases:
pymedextcore.datatransform.DataTransform
This class defines a set of transformation methods to build a DoccanoDocument from several pymedext Document objects. A DoccanoDocument contains N DoccanoAnnotations that the user wants to evaluate in the Doccano interface.
Here the transformation methods are specific to scanner report extractions and to DrWH negation, hypothesis and family context detection. Other transformation methods could be defined according to what the user wants to evaluate.
- DoccanoEvalClass(dict_doccano, dictClasses, number_eval, path_to_doc)[source]¶
Adds doccano annotations to a DoccanoDocument object until a specified number of evaluations is reached for both classes.
- Parameters
dict_doccano – A doccano dict that will be filled until the two classes reach the desired number of annotations
dictClasses – A dict of doccano classes (ex: negatif vs non negatif) with their current occurrences.
number_eval – the number of evaluations desired
- Returns
A list with the modified Doccano Object, and a dict of annotations classes, with their number
- Return type
list
- DoccanoEvalN(dict_doccano, number_annoted, number_eval, path_to_doc)[source]¶
Adds DoccanoAnnotation objects to a DoccanoDocument until a specified number of evaluations is reached.
- Parameters
dict_doccano – A doccano dict that will be filled until it reaches the desired number of annotations.
number_annoted – the number of evaluations desired.
number_eval – the current number of annotations.
- Returns
A list with the modified DoccanoDocument object, and the number of annotations
- Return type
list
- DoccanoEvalRappel(dict_doccano, path_to_doc)[source]¶
Adds DoccanoAnnotation objects to a DoccanoDocument object, with a dict created with toDoccanoImaRappel
- Parameters
dict_doccano – a dict created with toDoccanoImaRappel method
path_to_doc – the path of the pymedext doc that was used to create dict_doccano
- Returns
DoccanoDocument object
- Return type
- docForDoccano()[source]¶
Creates a DoccanoDocument object from the input dict.
- Returns
DoccanoDocument object
- Return type
DoccanoDocument
- toDoccanoDrWH(type, segment)[source]¶
Specific method for DrWH evaluation. Selects the DrWH annotations and their values and creates a dict with the syntagm/sentence as key and the class value of the syntagm/sentence as value.
Ex: extract of a pymedext Document:
{
  “type”: “drwh_syntagms”,
  “value”: ” Le patient présente un diabète de type II”,
  “span”: [47, 91],
  “source”: “DRWH_syntagms.v1”,
  “source_ID”: “74633e84-80a3-11ea-a7f6-180f76073bf2”,
  “isEntity”: false,
  “attributes”: null,
  “id”: “74633e89-80a3-11ea-a7f6-180f76073bf2”
},
{
  “type”: “drwh_negation”,
  “value”: “non negatif”,
  “span”: [47, 91],
  “source”: “DRWH_negation.v1”,
  “source_ID”: “74633e89-80a3-11ea-a7f6-180f76073bf2”,
  “isEntity”: false,
  “attributes”: null,
  “id”: “74633e95-80a3-11ea-a7f6-180f76073bf2”
}
An extract of what pymedextDocument.toDoccanoDrWH(type=dwh_negation, segment=dwh_syntagm) returns:
{…, “Le patient présente un diabète de type II”: “non negatif”, …}
- Parameters
type – dwh type of class (“dwh_negation” or “dwh_hypothesis” or “dwh_family”)
segment – syntagm or sentence (“dwh_sentence” or “dwh_syntagm”)
- Returns
a dict with syntagm or sentence as key and class as value
- Return type
dict
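The pairing of segments with their class can be sketched on the extract above. This is a minimal illustration, not the implementation of toDoccanoDrWH: the helper name `segments_to_classes` is hypothetical, and linking a class annotation to its segment through `source_ID` (rather than through the shared span) is an assumption suggested by the IDs in the extract.

```python
from typing import Dict, List


def segments_to_classes(annotations: List[Dict], type: str, segment: str) -> Dict[str, str]:
    """Map each segment text to the class value of the annotation derived from it."""
    # Index segment texts by annotation id, dropping surrounding whitespace.
    segment_text = {
        a["id"]: a["value"].strip() for a in annotations if a["type"] == segment
    }
    return {
        segment_text[a["source_ID"]]: a["value"]
        for a in annotations
        if a["type"] == type and a["source_ID"] in segment_text
    }


annotations = [
    {"type": "drwh_syntagms",
     "value": " Le patient présente un diabète de type II",
     "span": [47, 91],
     "source_ID": "74633e84-80a3-11ea-a7f6-180f76073bf2",
     "id": "74633e89-80a3-11ea-a7f6-180f76073bf2"},
    {"type": "drwh_negation",
     "value": "non negatif",
     "span": [47, 91],
     "source_ID": "74633e89-80a3-11ea-a7f6-180f76073bf2",
     "id": "74633e95-80a3-11ea-a7f6-180f76073bf2"},
]
result = segments_to_classes(annotations, type="drwh_negation", segment="drwh_syntagms")
```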
- toDoccanoImaPrecision(type, attribute=None)[source]¶
Specific method for scanner extractor evaluation. Selects the extracted value of the desired item in the pymedext Document and returns a dict with the context as key and the extracted value as value.
Ex: extract of a pymedext Document:
{
  “type”: “motif”,
  “value”: “non évocateur”,
  “span”: [2026, 2039],
  “source”: “annotator_section_img”,
  “source_ID”: “eae2fd1e-8096-11ea-9260-e470b8d2ff7c”,
  “isEntity”: false,
  “attributes”: “ISCOVID”,
  “id”: “eae2fd1c-8096-11ea-b180-e470b8d2ff7c”
}
An extract of what pymedextDocument.toDoccanoImaPrecision(type=”motif”, attribute=”ISCOVID”) returns:
{…, <raw_text[2026,2039]>: “non évocateur”, …}
where raw_text[2026,2039] is the context of the extraction, a short extract of the report text around the extraction.
- Parameters
type – the type of the item (“rubrique” or “motif”)
attribute – item of interest (ex : “ISCOVID”)
- Returns
a dict with the literal context as key and the extracted value as value
- Return type
dict
- toDoccanoImaRappel(dict_regexp_type, value, span)[source]¶
Specific method for scanner extractor evaluation. Finds the absent items in a pymedext Document.
- Parameters
dict_regexp_type – a dict with the items as keys (ex: “ISCOVID”) and their type as values (“motif” or “rubrique”)
value – “Null”
- Returns
A dict with the report text as key and a list of absent item as value
- Return type
dict
- toDoccanoPb(path_to_doc, regexp)[source]¶
Specific method for scanner extractor evaluation. A specific format to display documents that were annotated with the label “problem” in Doccano.
- Parameters
path_to_doc – the path to the doc that was annotated with the problem label in Doccano
regexp – the item of interest
- Returns
a dict with the text report as key and the regex as value
pymedextcore.document module¶
- class pymedextcore.document.Document(raw_text, ID=None, attributes=None, source=None, pathToconfig=None, documentDate=None)[source]¶
Bases:
object
Document is the main class of pymedext. It is used to load files and annotate them with annotators.
- annotate(annotator)[source]¶
Main function to annotate Document
- Parameters
annotator – annotators list
- Returns
runs _annotate, which adds annotations to the Document
- Return type
- static from_dict(d)[source]¶
Create a Document from a dict of a document (as created using to_dict).
- Parameters
d – Dict
- Returns
Document
- Return type
Document
- get_annotations(_type, source_id=None, target_id=None, attributes=None, value=None, span=None)[source]¶
Return the annotations of a specific type from the source. Results can be filtered by type, source_id, target_id, span, attributes and value.
- Parameters
_type – annotation type
source_id – annotation source id
target_id – annotation target id
attributes –
value –
span –
- get_relations(_type=None, head_id=None, target_id=None)[source]¶
Return the relations of a specific type from the source. Results can be filtered by type, head_id or target_id.
- Parameters
_type – annotation type
head_id – annotation source id
target_id – annotation target id
- load_annotations_files(pathToconfig)[source]¶
Load PyMedExt JSON files into the Document.
- Parameters
pathToconfig – list of paths to JSON files
- Returns
adds annotations to the Document
- Return type
- to_dict()[source]¶
Transform the Document into a PyMedExt dict. TODO: add the Document date if available, the processing date, and the annotators used.
- Returns
json PyMedExt
- Return type
dict
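The filtering described for get_annotations can be illustrated on plain dicts shaped like the annotation extracts in this reference. This is a sketch, not the method's implementation: `filter_annotations` is a hypothetical helper, it covers only some of the filters, and combining the filters with a logical AND is an assumption.

```python
from typing import Dict, List, Optional, Tuple


def filter_annotations(annotations: List[Dict], _type: str,
                       source_id: Optional[str] = None,
                       value: Optional[str] = None,
                       span: Optional[Tuple[int, int]] = None) -> List[Dict]:
    """Keep annotations of `_type` that also match every filter that is not None."""
    selected = []
    for ann in annotations:
        if ann["type"] != _type:
            continue
        if source_id is not None and ann["source_ID"] != source_id:
            continue
        if value is not None and ann["value"] != value:
            continue
        if span is not None and tuple(ann["span"]) != tuple(span):
            continue
        selected.append(ann)
    return selected


annotations = [
    {"type": "QuickUMLS", "value": "oesophagite", "span": (188, 199),
     "source_ID": "6814e9fa-96f7-11eb-a8c8-0242ac110002"},
    {"type": "regex", "value": "grade II", "span": (212, 220),
     "source_ID": "68155570-96f7-11eb-a8c8-0242ac110002"},
]
by_type = filter_annotations(annotations, "regex")
by_span = filter_annotations(annotations, "QuickUMLS", span=(188, 199))
no_match = filter_annotations(annotations, "regex", value="grade III")
```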
pymedextcore.ncbisource module¶
- class pymedextcore.ncbisource.PubTatorSource(host='https://www.ncbi.nlm.nih.gov/research/pubtator-api/publications/export/')[source]¶
Bases:
pymedextcore.source.Source, pymedextcore.connector.SimpleAPIConnector
Connection to the PubTator API, currently https://www.ncbi.nlm.nih.gov/research/pubtator-api/publications/export/
- get_pmids_annotations(pmid_list, Bioconcept='', returnFormat=0)[source]¶
Return a set of articles from PubTator for the given PMIDs.
- Parameters
pmid_list – a list of article PMIDs
Bioconcept – Default (leave it blank) includes all bioconcepts. Otherwise, the user can choose gene, disease, chemical, species, proteinmutation, dnamutation, snp, and cellline.
returnFormat – 0 returns a PyMedExt Document, 1 returns a BioC Document
- Returns
PyMedExt Document or BioC Document
pymedextcore.omopsource module¶
- class pymedextcore.omopsource.OmopSource(DB_host, DB_name, DB_port, DB_user, DB_password)[source]¶
Bases:
pymedextcore.source.Source, pymedextcore.connector.PostGresConnector
Connection to a Postgres OMOP source
pymedextcore.omoptransform module¶
- class pymedextcore.omoptransform.omop[source]¶
Bases:
pymedextcore.datatransform.DataTransform
- buildNoteNlP(dict_note, note_id, note_nlp_id, nlp_workflow, thisTime, filterType, dataframe=False)[source]¶
- generateNote(to_omop_note, to_date, note_event_id, note_event_field_concept_id, note_type_concept_id, note_class_concept_id, note_title, encoding_concept_id, language_concept_id, provider_id, visit_detail_id, note_source_value)[source]¶
- generateNoteNLP(to_omop_nlp, to_date, note_nlp_id, note_id, section_concept_id, note_nlp_concept_id, note_nlp_source_concept_id, nlp_workflow, term_exist, entity)[source]¶
- generatePerson(to_omop_person, gender_concept_id, year_of_birth, month_of_birth, day_of_birth, birth_datetime, death_datetime, race_concept_id, ethnicity_concept_id, location_id, provider_id, care_site_id, person_source_value, gender_source_value, gender_source_concept_id, race_source_value, race_source_concept_id, ethnicity_source_value, ethnicity_source_concept_id)[source]¶