pymedextcore package¶

Submodules¶

pymedextcore.annotators module¶

class pymedextcore.annotators.Annotation(type: str, value: str, source: str, source_ID: str, span: Optional[Tuple[int, int]] = None, attributes: Optional[Dict] = None, isEntity: bool = False, ID: Optional[str] = None, ngram: Optional[str] = None)[source]¶

Bases: object

Base object that holds an annotation. Each Annotator must return a list of Annotation objects.

add_child(child)[source]¶

Add a child to current Annotation

Parameters: child – An annotation to set as child of current node
Returns: None
Return type: None

add_property(neighbor)[source]¶

Add the properties of a neighbor to the current annotation, if both have the same span

Parameters: neighbor – the neighboring Annotation whose properties are added
Returns: None
Return type: None

get_attributes()[source]¶

Get the attributes of the current node and of its parents

Returns: attributes
Return type: a dict

get_children_span()[source]¶

Return the spans of all children of the current node

Returns: the children's spans
Return type: list of tuple

get_entities_children()[source]¶

From the current node, return all children that are Annotations with isEntity = True

Returns: children list
Return type: list

get_ngram()[source]¶

get nGram of the current Annotation

Returns: raw ngram
Return type: string

get_parent(from_type)[source]¶

Return the closest parent of the current Annotation that has a specific type

Parameters: from_type – the type to find
Returns: Annotation of a specific type
Return type: Annotation

get_parents_properties()[source]¶

Return the properties of the current annotation and of its parents, if they belong to a specific type

Parameters: filter_type – list of Annotation types
Returns: list of current and parent Annotation properties
Return type: list of dict

get_properties()[source]¶

Return the current node's properties if the Annotation is of a specific type

Parameters: filter_type – list of Annotation types
Returns: properties
Return type: list of dict

get_span()[source]¶

return current Annotation span

Returns: span(start,end)
Return type: tuple

set_ngram()[source]¶

set nGram of the current Annotation

Returns: 1
Return type: int

set_parent(parent)[source]¶

set Parent to current Annotation

Parameters: parent – Annotation
Returns: 1
Return type: int

set_root(root)[source]¶

set Root to current Annotation

Parameters: root – Annotation
Returns: 1
Return type: int

to_dict()[source]¶

Transform Annotation to a dict object

Returns: dict
Return type: dict

to_json()[source]¶

Transform Annotation to json

Returns: json
Return type: json
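
The parent/child methods above describe a tree of annotations. The following is a minimal sketch of that idea, not the pymedextcore implementation: the `Node` class and its fields are hypothetical stand-ins, and the rule that the nearest node wins when attributes are merged is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """Stand-in for an annotation node; NOT the pymedextcore Annotation class."""
    type: str
    value: str
    span: Optional[Tuple[int, int]] = None
    attributes: Dict = field(default_factory=dict)
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List["Node"] = field(default_factory=list, repr=False, compare=False)

    def add_child(self, child: "Node") -> None:
        # Link both directions so lookups can walk up or down the tree.
        child.parent = self
        self.children.append(child)

    def get_parent(self, from_type: str) -> Optional["Node"]:
        # Walk upwards and return the closest ancestor of the given type.
        node = self.parent
        while node is not None:
            if node.type == from_type:
                return node
            node = node.parent
        return None

    def get_attributes(self) -> Dict:
        # Assumption: ancestors are merged first, so the nearest node wins.
        merged = self.parent.get_attributes() if self.parent else {}
        merged.update(self.attributes)
        return merged


text = Node("raw_text", "Pas de fièvre.", span=(0, 14))
sentence = Node("sentence", "Pas de fièvre.", span=(0, 14), attributes={"negation": "neg"})
entity = Node("regex", "fièvre", span=(7, 13), attributes={"label": "Symptom"})
text.add_child(sentence)
sentence.add_child(entity)
```

Here `entity.get_parent("raw_text")` walks past the sentence node to reach the root, and `entity.get_attributes()` combines the sentence-level negation with the entity's own label.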

class pymedextcore.annotators.Annotator[source]¶

Bases: object

Abstract base class of every Annotator. An Annotator must implement annotate_function(), which returns a list of Annotation objects.

annotate_function(_input)[source]¶

Main annotation function. Each Annotator must implement this function.

Parameters: _input – a list of Annotation types
Returns: a list of annotations; they will be added to Document.annotations
Return type: List[Annotation]
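
The contract described above (subclass the abstract class, implement annotate_function, return a list of annotations) can be sketched as follows. This is an illustration only: `BaseAnnotator` and `YearAnnotator` are hypothetical stand-ins, and plain dicts replace Annotation objects so that the example runs without pymedextcore.

```python
import re
from abc import ABC, abstractmethod
from typing import Dict, List


class BaseAnnotator(ABC):
    """Stand-in for the abstract Annotator; NOT the pymedextcore class."""

    @abstractmethod
    def annotate_function(self, _input: List[Dict]) -> List[Dict]:
        ...


class YearAnnotator(BaseAnnotator):
    """Hypothetical annotator that tags four-digit years."""

    def annotate_function(self, _input: List[Dict]) -> List[Dict]:
        found = []
        for ann in _input:
            # One new annotation per year found in the input annotation's text.
            for match in re.finditer(r"\b(?:19|20)\d{2}\b", ann["value"]):
                found.append({
                    "type": "year",
                    "value": match.group(),
                    "span": match.span(),
                    "source": "YearAnnotator:v1",
                })
        return found


annotations = YearAnnotator().annotate_function(
    [{"type": "raw_text", "value": "Scanner du 14 avril 2020."}]
)
```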

get_all_key_input(_input)[source]¶

Return all key inputs for the Annotator

Parameters: _input – the annotations of specific types from the Document
Returns: a list of annotations
Return type: list of annotations

Deprecated since version 0.3: This function will be removed soon; use select_all_inputs instead.

get_first_key_input(_input)[source]¶

Return the annotations of the first type (index 0) in the key input

Parameters: _input – list of annotations input for the Annotator
Returns: a list of annotations
Return type: list of annotations

Deprecated since version 0.3: This function will be removed soon; use select_first_input instead.

get_key_input(_input, i)[source]¶

Return the annotations of one specific type from key_input

Parameters

_input – key_input list
i – the index of the list entry to select

Returns

a list of annotations

Return type

list of annotations

select_all_inputs(_input)[source]¶

Return all key inputs for the Annotator

Parameters: _input – the annotations of specific types from the Document
Returns: a list of annotations
Return type: list of annotations

New in version 0.3: This function replaces get_all_key_input

select_first_input(_input)[source]¶

Return the first annotation from the _input key list

Parameters: _input – list of annotations input for the Annotator
Returns: a list of annotations
Return type: list of annotations

New in version 0.3: This function replaces get_first_key_input

class pymedextcore.annotators.Relation(type: str, head: str, target: str, source: str, source_ID: str, attributes: Optional[List] = None, ID: Optional[str] = None)[source]¶

Bases: object

Base object that holds a relation

to_dict()[source]¶

Transform Relation to a dict object

Returns: dict
Return type: dict

to_json()[source]¶

Transform Relation to json

Returns: json
Return type: json

pymedextcore.bioctransform module¶

class pymedextcore.bioctransform.BioC[source]¶

Bases: pymedextcore.datatransform.DataTransform

static load_collection(bioc_input: str, format: int = 0, is_file: bool = True)[source]¶

Load a BioC collection from XML or JSON. Returns a list of Document objects.

Parameters

bioc_input – path to a BioC file, or a BioC input string
format – format of the BioC file (XML or JSON)
is_file – if True, bioc_input is a path; otherwise it is a BioC string

Returns

list of Document

static save_as_collection(list_of_pymedext_documents: List[pymedextcore.document.Document])[source]¶

Save a list of pymedext Documents as a BioC collection. Returns a BioC collection object.

Parameters: list_of_pymedext_documents – a list of Document
Returns: a bioc collection object

static write_bioc_collection(filename: str, collection: bioc.bioc.BioCCollection)[source]¶

Write a BioCCollection as an XML document. Returns 1.

Parameters

filename – a str filename of the collection
collection – a bioc collection

Returns

1

pymedextcore.brat_parser module¶

class pymedextcore.brat_parser.Attribute(id: str, type: str, target: str, values: Tuple[str, ...] = ())[source]¶

Bases: object

A simple attribute data structure.

id: str¶

target: str¶

type: str¶

values: Tuple[str, ...] = ()¶

class pymedextcore.brat_parser.AugmentedEntity(id: str, type: str, span: Tuple[Tuple[int, int], ...], text: str, relations_from_me: Tuple[pymedextcore.brat_parser.Relation, ...], relations_to_me: Tuple[pymedextcore.brat_parser.Relation, ...], attributes: Tuple[pymedextcore.brat_parser.Attribute, ...])[source]¶

Bases: object

An augmented entity data structure with its relations and attributes.

attributes: Tuple[pymedextcore.brat_parser.Attribute, ...]¶

property end: int¶

id: str¶

relations_from_me: Tuple[pymedextcore.brat_parser.Relation, ...]¶

relations_to_me: Tuple[pymedextcore.brat_parser.Relation, ...]¶

span: Tuple[Tuple[int, int], ...]¶

property start: int¶

text: str¶

type: str¶

class pymedextcore.brat_parser.Document(entities: List[pymedextcore.brat_parser.Entity], relations: List[pymedextcore.brat_parser.Relation], attributes: List[pymedextcore.brat_parser.Attribute])[source]¶

Bases: object

attributes: List[pymedextcore.brat_parser.Attribute]¶

entities: List[pymedextcore.brat_parser.Entity]¶

relations: List[pymedextcore.brat_parser.Relation]¶

class pymedextcore.brat_parser.Entity(id: str, type: str, span: Tuple[Tuple[int, int], ...], text: str)[source]¶

Bases: object

A simple annotation data structure.

property end: int¶

id: str¶

span: Tuple[Tuple[int, int], ...]¶

property start: int¶

text: str¶

type: str¶

class pymedextcore.brat_parser.Grouping(id: str, type: str, items: List[pymedextcore.brat_parser.Entity])[source]¶

Bases: object

id: str¶

items: List[pymedextcore.brat_parser.Entity]¶

property text¶

type: str¶

class pymedextcore.brat_parser.Relation(id: str, type: str, subj: str, obj: str)[source]¶

Bases: object

A simple relation data structure.

id: str¶

obj: str¶

subj: str¶

type: str¶

pymedextcore.brat_parser.get_augmented_entities(ann_path: str) → Dict[str, pymedextcore.brat_parser.AugmentedEntity][source]¶

pymedextcore.brat_parser.get_entities_relations_attributes_groups(ann_path: str) → Tuple[Dict[str, pymedextcore.brat_parser.Entity], Dict[str, pymedextcore.brat_parser.Relation], Dict[str, pymedextcore.brat_parser.Attribute], Dict[str, pymedextcore.brat_parser.Grouping]][source]¶

pymedextcore.brat_parser.list_to_dict(s: List) → Dict[source]¶

pymedextcore.brat_parser.parse(ann_path: str) → pymedextcore.brat_parser.Document [source]¶

pymedextcore.brat_parser.parse_attribute(attribute_id: str, attribute_content: str) → pymedextcore.brat_parser.Attribute [source]¶

Parse the annotation string into an Attribute structure.

Parameters

attribute_id (str) – The attribute ID in the annotation (A1 for example).
attribute_content (str) – The attribute text content (Tense T19 Past-Ended for example).

Returns

An Attribute object

Return type

Attribute
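
As an illustration of the attribute content shown above, the sketch below splits it into type, target, and values. It is a hypothetical helper, not the library's parse_attribute, and it assumes whitespace-separated fields.

```python
from typing import Tuple


def split_attribute_content(content: str) -> Tuple[str, str, Tuple[str, ...]]:
    """Split 'TYPE TARGET [VALUE ...]' into (type, target, values)."""
    attr_type, target, *values = content.split()
    # Binary attributes carry no value, which yields an empty tuple.
    return attr_type, target, tuple(values)


tense = split_attribute_content("Tense T19 Past-Ended")
```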

pymedextcore.brat_parser.parse_entity(tag_id: str, tag_content: str) → pymedextcore.brat_parser.Entity [source]¶

Parse the entity string into an Entity structure.

Parameters

tag_id (str) – The tag ID in the annotation (T12 for example).
tag_content (str) – The tag text content (Temporal-Modifier 116 126 history of for example).

Returns

An Entity object

Return type

Entity
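
For illustration, a text-bound entity in the Brat standoff format has the content `TYPE START END<TAB>TEXT`, and discontinuous entities list several `START END` pairs separated by `;`. The sketch below parses that content; it is a hypothetical helper, not the library's parse_entity, and it assumes the tab separator is still present in the content string.

```python
from typing import Tuple


def split_entity_content(content: str) -> Tuple[str, Tuple[Tuple[int, int], ...], str]:
    """Split 'TYPE START END[;START END...]<TAB>TEXT' into its parts."""
    header, text = content.split("\t", 1)
    entity_type, offsets = header.split(" ", 1)
    # Discontinuous entities list several 'start end' pairs separated by ';'.
    span = tuple(
        (int(start), int(end))
        for start, end in (pair.split() for pair in offsets.split(";"))
    )
    return entity_type, span, text


entity_type, span, text = split_entity_content("Temporal-Modifier 116 126\thistory of")
```

The span is a tuple of pairs, which matches the `Tuple[Tuple[int, int], ...]` type of Entity.span documented above.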

pymedextcore.brat_parser.parse_relation(relation_id: str, relation_content: str) → pymedextcore.brat_parser.Relation [source]¶

Parse the annotation string into a Relation structure.

Parameters

relation_id (str) – The relation ID in the annotation (R12 for example).
relation_content (str) – The relation text content (Modified-By Arg1:T8 Arg2:T6 for example).

Returns

A Relation object

Return type

Relation
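
The relation content shown above can be split in the same spirit. This is a sketch with a hypothetical helper, not the library's parse_relation; mapping Arg1 to the subject and Arg2 to the object is an assumption.

```python
from typing import Tuple


def split_relation_content(content: str) -> Tuple[str, str, str]:
    """Split 'TYPE Arg1:ID Arg2:ID' into (type, subject id, object id)."""
    relation_type, arg1, arg2 = content.split()
    # Each argument looks like 'Arg1:T8'; keep only the entity ID.
    subj = arg1.split(":", 1)[1]
    obj = arg2.split(":", 1)[1]
    return relation_type, subj, obj


relation = split_relation_content("Modified-By Arg1:T8 Arg2:T6")
```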

pymedextcore.brat_parser.parse_string(annotation_string: str) → pymedextcore.brat_parser.Document [source]¶

pymedextcore.brat_parser.parse_string_to_augmented_entities(annotation_string: str) → Dict[str, pymedextcore.brat_parser.AugmentedEntity][source]¶

pymedextcore.brat_parser.read_file_annotations(ann: str) → Tuple[List[pymedextcore.brat_parser.Entity], List[pymedextcore.brat_parser.Relation], List[pymedextcore.brat_parser.Attribute]][source]¶

Read an annotation file and get the Entities, Relations, and Attributes in it.

Parameters: ann (str) – The path to the annotation file to be processed.
Returns: A tuple of lists of Entities, Relations, and Attributes.
Return type: Tuple[List[Entity], List[Relation], List[Attribute]]

pymedextcore.brat_parser.remove_empty(iterable: Sequence[str]) → Sequence[str][source]¶

Returns only non-empty strings from an iterable.

Parameters: iterable (Iterable) – An iterable of strings that possibly contains empty strings.
Returns: The same iterable with the empty strings removed.

pymedextcore.brat_parser.sanitize_tabs(line: str, max_tabs: int = 2) → str[source]¶

pymedextcore.brattransform module¶

Created 2020/04/14

@author: David BAUDOIN

Purpose: create or update a BRAT file from a pymedext dict

class pymedextcore.brattransform.brat[source]¶

Bases: pymedextcore.datatransform.DataTransform

static load_from_brat(ann_file: str, txt_file: Optional[str] = None) → pymedextcore.document.Document [source]¶

Load annotations from a .ann file in the Brat format

Parameters

ann_file – path to the .ann file
txt_file – path to the corresponding .txt file, if None: defaults to replacing .ann by .txt

Returns

Document

Return type

Document

save_to_brat(folder_path: Optional[str] = None, pym_ann_types: Optional[List[str]] = None, brat_entities_in_pym_types: Optional[List[str]] = None, brat_entities_in_pym_types_value: Optional[List[str]] = None, brat_entities_in_pym_att_values: Optional[dict] = None, brat_entities_in_pym_att_keys: Optional[dict] = None, brat_attributes: Optional[dict] = None, pym_rel_types: Optional[List[str]] = None, brat_ents_of_rel_in_pym_rel_type: Optional[List[str]] = None, brat_ents_of_rel_in_pym_ent_value: Optional[List[str]] = None, brat_ents_of_rel_in_pym_att_values: Optional[dict] = None, brat_type_of_rel_in_pym_rel_types: Optional[List[str]] = None, brat_type_of_rel_in_pym_rel_att_values: Optional[dict] = None, level_annot: Optional[dict] = None)[source]¶

This function writes all Annotations to Brat files at folder_path. It creates (or overwrites) two files for each pymedext Document in the input list of documents:

ID.ann: Brat annotation file (with ID = dic_pymedext.id)

ID.txt: Raw text of the document (with ID = dic_pymedext.id)

It also creates (or overwrites) an annotation.conf file.

Parameters

list_of_documents – list of input Documents. The Documents should contain the same types of annotations.
folder_path – path as a string. Files are stored at this location. The folder must already exist.

The remaining parameters are illustrated with the following extract of a pymedext document.

'''

    {'type': 'QuickUMLS',
     'value': 'oesophagite',
     'ngram': None,
     'span': (188, 199),
     'source': 'QuickUMLS:v1',
     'source_ID': '6814e9fa-96f7-11eb-a8c8-0242ac110002',
     'isEntity': False,
     'attributes': {'hypothesis': 'certain',
                    'context': 'patient',
                    'negation': 'aff',
                    'cui': 'C0014868',
                    'label': 'oesophagite',
                    'semtypes': ['T047'],
                    'score': 1.0,
                    'snippet': ' La fibroscopie oeso-gastro-duodénale avait révélé une oesophagite peptique de grade II et a permis l’exérèse d’un petit papillome du tiers supérieur de l’œsophage',
                    'snippet_span': (132, 296)},
     'ID': '681c2d82-96f7-11eb-a8c8-0242ac110002'},

    {'type': 'regex',
     'value': 'grade II',
     'ngram': None,
     'span': (212, 220),
     'source': 'RegexMatcher:v1',
     'source_ID': '68155570-96f7-11eb-a8c8-0242ac110002',
     'isEntity': True,
     'attributes': {'version': 'v1',
                    'label': 'Grade',
                    'id_regexp': 'id_grade',
                    'snippet': '-gastro-duodénale avait révélé une oesophagite peptique de grade II et a permis l’exérèse d’un petit papillome du tiers supérie',
                    'hypothesis': 'certain',
                    'context': 'patient',
                    'negation': 'aff'},
     'ID': '682ca3ec-96f7-11eb-a8c8-0242ac110002'},

'''

Annotations:

pym_ann_types – pymedext annotation types to export. Example: ['QuickUMLS', 'regex'] -> the annotations in Brat are restricted to these two annotation types. The labels displayed in Brat depend on the options filled in below.

brat_entities_in_pym_types – (optional) fill this list if Brat entities correspond to annotation types in pymedext. Example: ['regex'] -> in Brat, 'regex' is displayed for each regex found. With the extract given, 'grade II' is highlighted in the text with the label 'regex'.

brat_entities_in_pym_types_value – fill this list if Brat entities correspond to the value of annotation types in pymedext. Example: ['QuickUMLS'] -> in Brat, the QuickUMLS annotation value is displayed for each QuickUMLS found. With the extract given, 'oesophagite' is highlighted in the text with the label 'QuickUMLS'.

brat_entities_in_pym_att_values – (optional) fill this dict if Brat entities correspond to annotation attribute values in pymedext. Keys correspond to pymedext annotation types, values correspond to pymedext attribute keys. Example: {'regex': 'label'} -> in Brat, the regex label found in the attributes is displayed for each regex found. With the extract given, 'grade II' is highlighted in the text with the label 'Grade'.

brat_entities_in_pym_att_keys – (optional) fill this dict if Brat entities correspond to annotation attribute keys in pymedext. Keys correspond to pymedext annotation types, values correspond to pymedext attribute keys. Example: {'regex': 'label'} -> in Brat, the string "label" is displayed for each regex found. With the extract given, 'grade II' is highlighted in the text with the label 'label'.

brat_attributes – (optional) dict with pymedext annotation types as keys and, as values, the list of corresponding attributes that should be exported as Brat attributes. Example: {"QuickUMLS": ['hypothesis', 'negation', 'context']} -> for each QuickUMLS found, the hypothesis, negation, and context attribute values are displayed. Use "all" as the value to export all attributes of an annotation type. Example: {"QuickUMLS": "all"} -> for each QuickUMLS found, all attributes (semType, CUI code, hypothesis, …) are displayed.

Relations:

pym_rel_types – pymedext relation types to export. Example: ['Stanza'] -> the relations in Brat are restricted to these relation types. The labels displayed in Brat depend on the options filled in below.

brat_ents_of_rel_in_pym_rel_type – (optional) fill this list if the Brat entities of relations correspond to relation types in pymedext.

brat_ents_of_rel_in_pym_ent_value – (optional) fill this list if the Brat entities of relations correspond to relation types in pymedext.

Returns: 1
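
To make the brat_entities_in_pym_att_values option more concrete, here is a minimal sketch of how an annotation dict could be rendered as a Brat text-bound line (`T<n><TAB>LABEL START END<TAB>TEXT`). It is not the save_to_brat implementation: the `to_brat_lines` helper and its `label_from_attribute` argument are hypothetical, and the extract is trimmed to the keys the sketch needs.

```python
from typing import Dict, List


def to_brat_lines(annotations: List[Dict], label_from_attribute: Dict[str, str]) -> List[str]:
    """Render selected annotations as Brat text-bound annotation lines."""
    lines = []
    for ann in annotations:
        attr_key = label_from_attribute.get(ann["type"])
        if attr_key is None:
            continue  # this annotation type was not selected for export
        # The Brat label is the value stored under the chosen attribute key.
        label = ann["attributes"][attr_key]
        start, end = ann["span"]
        lines.append(f"T{len(lines) + 1}\t{label} {start} {end}\t{ann['value']}")
    return lines


extract = [
    {"type": "QuickUMLS", "value": "oesophagite", "span": (188, 199),
     "attributes": {"label": "oesophagite", "cui": "C0014868"}},
    {"type": "regex", "value": "grade II", "span": (212, 220),
     "attributes": {"label": "Grade", "id_regexp": "id_grade"}},
]
ann_lines = to_brat_lines(extract, {"regex": "label"})
```

With `{"regex": "label"}`, only the regex annotation is exported, and its Brat label is 'Grade', as in the description of brat_entities_in_pym_att_values.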

pymedextcore.connector module¶

class pymedextcore.connector.APIConnector(baseurl: str, username: str, password: str)[source]¶

Bases: pymedextcore.connector._Router

Largely inspired by the work in https://github.com/doccano/doccano-client.git

For now, a copy of the DoccanoClient class from doccano_api_client.py.

TODO: investigate alternatives to plaintext login

Parameters

baseurl (str) – The base URL of a Doccano instance.
username (str) – The Doccano username to use for the client session.
password (str) – The respective username's password.

Returns

An authorized client instance.

class pymedextcore.connector.Connector[source]¶

Bases: object

TODO: make this an abstract class for other connectors

class pymedextcore.connector.DatabaseConnector(DB_host, DB_name, DB_port, DB_user, DB_password)[source]¶

Bases: object

Abstract class specialized in database connections

start_connection()[source]¶

Abstract function in which each DatabaseConnector should implement the database connection

class pymedextcore.connector.PostGresConnector(DB_host, DB_name, DB_port, DB_user, DB_password)[source]¶

Bases: pymedextcore.connector.DatabaseConnector

Abstract connector to a Postgres database

start_connection()[source]¶

Initialize the connection to the Postgres database

Returns: 0
Return type: int

class pymedextcore.connector.SSHConnector(scp_host, scp_user, scp_password)[source]¶

Bases: object

TODO: implement a connection to a server with paramiko, should also extend Connector

class pymedextcore.connector.SimpleAPIConnector(host)[source]¶

Bases: object

TODO: implement a connection to a server with paramiko, should also extend Connector @David?

start_connection()[source]¶

Initialize a requests object

Returns: 0
Return type: int

class pymedextcore.connector.cxORacleConnector(DB_host, DB_name, DB_port, DB_user, DB_password)[source]¶

Bases: pymedextcore.connector.DatabaseConnector

Abstract connector to an Oracle database using cxOracle

pymedextcore.datatransform module¶

Each class that transforms a pymedext Document to another format must inherit from DataTransform

TODO: make some functions such as save and load mandatory to ease the use of DataTransform objects

class pymedextcore.datatransform.DataTransform[source]¶

Bases: object

static load()[source]¶

Generic method to load another format into a PyMedExt Document

Returns: PyMedExt Document

static save()[source]¶

Generic method to transform a PyMedExt Document and save it as another format

Returns: an object in the target format (not a PyMedExt Document)
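
The load/save pairing can be sketched as below. This is an illustration under stated assumptions, not pymedextcore code: `BaseTransform` stands in for DataTransform, `JsonLines` is a hypothetical subclass, and a plain dict stands in for a PyMedExt Document.

```python
import json
from typing import Dict


class BaseTransform:
    """Stand-in for DataTransform; NOT the pymedextcore class."""

    @staticmethod
    def load(*args, **kwargs):
        raise NotImplementedError

    @staticmethod
    def save(*args, **kwargs):
        raise NotImplementedError


class JsonLines(BaseTransform):
    """Hypothetical transform between a document dict and one JSON line."""

    @staticmethod
    def load(line: str) -> Dict:
        # Other format -> document representation.
        return json.loads(line)

    @staticmethod
    def save(document: Dict) -> str:
        # Document representation -> other format.
        return json.dumps(document, ensure_ascii=False, sort_keys=True)


doc = {"ID": "doc-1", "raw_text": "Pas de fièvre.", "annotations": []}
round_trip = JsonLines.load(JsonLines.save(doc))
```

Keeping both directions as static methods means a transform can be used without instantiating it, which matches the static load() and save() signatures documented above.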

pymedextcore.doccanoannotator module¶

class pymedextcore.doccanoannotator.DoccanoAnnotation(text, labels, meta)[source]¶

Bases: object

Annotation object specific to Doccano

to_dict()[source]¶

Transform DoccanoAnnotation to dict

Returns: a dict
Return type: dict

to_json()[source]¶

Transform DoccanoAnnotation to json

Returns: a json
Return type: json

pymedextcore.doccanodocument module¶

class pymedextcore.doccanodocument.DoccanoDocument[source]¶

Bases: object

DoccanoDocument is used to build an evaluation document that will be sent to the Doccano interface. A DoccanoDocument contains a set of specific DoccanoAnnotation objects that a user wants to evaluate.

toDictDoccano()[source]¶

Transform a DoccanoDocument object to a list of DoccanoAnnotation dicts

Returns: a list of DoccanoAnnotation dicts
Return type: dict

toJsonDoccano()[source]¶

Transform a DoccanoDocument object to JSON

Returns: a json
Return type: json

writeJsonDoccano(pathToOutput)[source]¶

Write a DoccanoDocument object to a JSON file at pathToOutput

Parameters: pathToOutput – output path of the file
Returns: a doccano file

pymedextcore.doccanosource module¶

class pymedextcore.doccanosource.DoccanoSource(baseurl, username, password)[source]¶

Bases: pymedextcore.source.Source, pymedextcore.connector.APIConnector

Connection to DoccanoClient

This code is largely inspired by the work in https://github.com/doccano/doccano-client.git

create_label(project_id: str, label_name: str, color: str, prefix: str, suffix: str) → requests.models.Response[source]¶

Adds a label to an existing project

Parameters

self – DoccanoClient
project_id – the project id
label_name – the text of the label

create_project(name: str, description: str, project_type: str, guidelines: str) → requests.models.Response[source]¶

Creates a project

Parameters

name – name of the project
description – description of the project
project_type – type of project ("SequenceLabeling", "DocumentClassification" or "Seq2seq")

exp_get_doc_list(project_id: int, limit: int, offset: int) → requests.models.Response[source]¶

find_project_id(regex: str, date: str, time: str)[source]¶

Finds a project id from an item (specific to the scanner-covid project). If the item is not enough to find the project id, date and time can be used.

Parameters

regex – item of interest
date – date of the project
time – time of the project

Returns

a project id

get_annotation_detail(project_id: int, doc_id: int, annotation_id: int) → requests.models.Response[source]¶

get_annotation_list(project_id: int, doc_id: int) → requests.models.Response[source]¶

Gets a list of annotations in a given project and document.

Parameters

project_id (int) – A project ID to query.
doc_id (int) – A document ID to query.

Returns: requests.models.Response: The request response.

get_doc_download(project_id: int, file_format: str = 'json') → requests.models.Response[source]¶

get_document_detail(project_id: int, doc_id: int) → requests.models.Response[source]¶

Gets details of a given document.

Parameters

project_id (int) – A project ID to query.
doc_id (int) – A document ID to query.

Returns: requests.models.Response: The request response.

get_document_list(project_id: int, url_parameters: dict = {}) → requests.models.Response[source]¶

Gets a list of documents in a project.

Parameters

project_id (int) – A project ID to query.
url_parameters (dict) – limit and offset

Returns: requests.models.Response: The request response.
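
As an illustration of how limit and offset can travel as URL parameters, the sketch below builds a paginated listing URL without sending a request. The `document_list_url` helper, the `v1/projects/<id>/docs` path, and the localhost base URL are assumptions made for the example, not details confirmed by this module.

```python
from typing import Optional
from urllib.parse import urlencode


def document_list_url(baseurl: str, project_id: int,
                      url_parameters: Optional[dict] = None) -> str:
    """Build a paginated listing URL (the path layout is an assumption)."""
    # A None default avoids sharing one mutable dict between calls.
    params = dict(url_parameters or {})
    url = f"{baseurl.rstrip('/')}/v1/projects/{project_id}/docs"
    return f"{url}?{urlencode(params)}" if params else url


page_2 = document_list_url("http://localhost:8000", 3, {"limit": 10, "offset": 10})
```

The sketch uses `None` rather than `{}` as the default value, a common Python precaution because a mutable default is created once and shared across calls.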

get_features() → requests.models.Response[source]¶

Gets features.

Returns: requests.models.Response: The request response.

get_label_detail(project_id: int, label_id: int) → requests.models.Response[source]¶

Gets details of a specific label.

Parameters

project_id (int) – A project ID to query.
label_id (int) – A label ID to query.

Returns: requests.models.Response: The request response.

get_label_id(project_id: int, label_name: str)[source]¶

Get the label id from the label name

Parameters

project_id – id of the project
label_name – text of the label

Returns

id of the label

get_label_list(project_id: int) → requests.models.Response[source]¶

Gets a list of labels in a given project.

Parameters: project_id (int) – A project ID to query.

Returns: requests.models.Response: The request response.

get_me() → requests.models.Response[source]¶

Gets this account information.

Returns: requests.models.Response: The request response.

get_project_detail(project_id: int) → requests.models.Response[source]¶

Gets details of a specific project.

Parameters: project_id (int) – A project ID to query.

Returns: requests.models.Response: The request response.

get_project_id(project_name: str)[source]¶

Get the project id from the project name

Parameters: project_name – the project name
Returns: the project id

get_project_list() → requests.models.Response[source]¶

Gets projects list.

Returns: requests.models.Response: The request response.

get_project_statistics(project_id: int) → requests.models.Response[source]¶

Gets project statistics.

Parameters: project_id (int) – A project ID to query.

Returns: requests.models.Response: The request response.

get_rolemapping_detail(project_id: int, rolemapping_id: int) → requests.models.Response[source]¶

Currently broken!

get_rolemapping_list(project_id: int) → requests.models.Response[source]¶

get_roles() → requests.models.Response[source]¶

Gets available Doccano user roles.

Returns: requests.models.Response: The request response.

get_user_id(username: str)[source]¶

Get the user id from the username

Parameters: username – the username
Returns: the user id

get_user_list() → requests.models.Response[source]¶

Gets user list.

Returns: requests.models.Response: The request response.

post_approve_labels(project_id: int, doc_id: int) → requests.models.Response[source]¶

post_doc_upload(project_id: int, file_format: str, file_name: str, file_path: str = './') → requests.models.Response[source]¶

Uploads a file to a Doccano project.

Parameters

project_id (int) – The project id number.
file_format (str) – The file format, ex: plain, json, or conll.
file_name (str) – The name of the file.
file_path (str) – The parent path of the file. Defaults to ./.

Returns: requests.models.Response: The request response.

set_rolemapping_list(project_id: str, user_id: str, role_id: str, username: str, rolename: str) → requests.models.Response[source]¶

Set user roles

Parameters: self (DoccanoClient), project_id, user_id, role_id, username, rolename
Returns: requests.models.Response: The request response.

pymedextcore.doccanotransform module¶

class pymedextcore.doccanotransform.Doccano[source]¶

Bases: pymedextcore.datatransform.DataTransform

This class defines a set of transformation methods to build a DoccanoDocument from several pymedext Document objects. A DoccanoDocument contains N DoccanoAnnotations that the user wants to evaluate in the Doccano interface.

Here the transformation methods are specific to scanner report extractions and to DrWH negation, hypothesis, and family context detection. Other transformation methods could be defined according to what the user wants to evaluate.

DoccanoEvalClass(dict_doccano, dictClasses, number_eval, path_to_doc)[source]¶

Adds doccano annotations to DoccanoDocument object until a specified number of evaluations for both classes.

Parameters

dict_doccano – A doccano dict that will be filled until the two classes reach the desired number of annotations
dictClasses – A dict of doccano classes (ex: negatif vs non negatif) with their current occurrences.
number_eval – the number of evaluations desired

Returns

A list with the modified Doccano Object, and a dict of annotations classes, with their number

Return type

list

DoccanoEvalN(dict_doccano, number_annoted, number_eval, path_to_doc)[source]¶

Adds DoccanoAnnotation objects to a DoccanoDocument until a specified number of evaluations is reached.

Parameters

dict_doccano – A doccano dict that will be filled until it reaches the desired number of annotations.
number_annoted – the number of evaluations desired.
number_eval – the current number of annotations.

Returns

A list with the modified DoccanoDocument object, and the number of annotations

Return type

list

DoccanoEvalRappel(dict_doccano, path_to_doc)[source]¶

Adds DoccanoAnnotation objects to a DoccanoDocument object, with a dict created with toDoccanoImaRappel

Parameters

dict_doccano – a dict created with toDoccanoImaRappel method
path_to_doc – the path of the pymedext doc that was used to create dict_doccano

Returns

DoccanoDocument object

Return type

DoccanoDocument

docForDoccano()[source]¶

Creates a DoccanoDocument object from the input dict

Returns: DoccanoDocument object
Return type: DoccanoDocument

toDoccanoDrWH(type, segment)[source]¶

Specific method for DrWH evaluation. Selects the DrWH annotations and their values, and creates a dict with the syntagm or sentence as key and its class as value.

Example: extract of a pymedext Document

    {
      "type": "drwh_syntagms",
      "value": " Le patient présente un diabète de type II",
      "span": [47, 91],
      "source": "DRWH_syntagms.v1",
      "source_ID": "74633e84-80a3-11ea-a7f6-180f76073bf2",
      "isEntity": false,
      "attributes": null,
      "id": "74633e89-80a3-11ea-a7f6-180f76073bf2"
    }

    {
      "type": "drwh_negation",
      "value": "non negatif",
      "span": [47, 91],
      "source": "DRWH_negation.v1",
      "source_ID": "74633e89-80a3-11ea-a7f6-180f76073bf2",
      "isEntity": false,
      "attributes": null,
      "id": "74633e95-80a3-11ea-a7f6-180f76073bf2"
    }

The corresponding part of the value returned by pymedextDocument.toDoccanoDrWH(type=dwh_negation, segment=dwh_syntagm) is:

    {…, "Le patient présente un diabète de type II": "non negatif", …}

Parameters

type – dwh type of class (“dwh_negation” or “dwh_hypothesis” or “dwh_family”)
segment – syntagm or sentence (“dwh_sentence” or “dwh_syntagm”)

Returns

a dict with syntagm or sentence as key and class as value

Return type

dict
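
A minimal sketch of the pairing described for toDoccanoDrWH, run on the extract above. It is not the library's implementation: the `segments_to_classes` helper is hypothetical, and linking a class annotation to its segment through source_ID is an assumption suggested by the example, where the negation's source_ID equals the syntagm's id.

```python
from typing import Dict, List


def segments_to_classes(annotations: List[Dict], class_type: str,
                        segment_type: str) -> Dict[str, str]:
    """Map each segment's text to the class annotation derived from it."""
    segments = {
        ann["id"]: ann["value"].strip()
        for ann in annotations if ann["type"] == segment_type
    }
    result = {}
    for ann in annotations:
        # Assumption: a class annotation points at its segment via source_ID.
        if ann["type"] == class_type and ann["source_ID"] in segments:
            result[segments[ann["source_ID"]]] = ann["value"]
    return result


annotations = [
    {"type": "drwh_syntagms", "value": " Le patient présente un diabète de type II",
     "span": [47, 91], "source_ID": "74633e84-80a3-11ea-a7f6-180f76073bf2",
     "id": "74633e89-80a3-11ea-a7f6-180f76073bf2"},
    {"type": "drwh_negation", "value": "non negatif",
     "span": [47, 91], "source_ID": "74633e89-80a3-11ea-a7f6-180f76073bf2",
     "id": "74633e95-80a3-11ea-a7f6-180f76073bf2"},
]
negation_by_syntagm = segments_to_classes(annotations, "drwh_negation", "drwh_syntagms")
```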

toDoccanoImaPrecision(type, attribute=None)[source]¶

Specific method for scanner extractor evaluation. Selects the extracted value of the desired item in the pymedext Document. Returns a dict with the context as key and the extracted value as value.

Example: extract of a pymedext Document

    {
      "type": "motif",
      "value": "non évocateur",
      "span": [2026, 2039],
      "source": "annotator_section_img",
      "source_ID": "eae2fd1e-8096-11ea-9260-e470b8d2ff7c",
      "isEntity": false,
      "attributes": "ISCOVID",
      "id": "eae2fd1c-8096-11ea-b180-e470b8d2ff7c"
    }

The corresponding part of the value returned by pymedextDocument.toDoccanoImaPrecision(type="motif", attribute="ISCOVID") is:

    {…, <raw_text[2026,2039]>: "non évocateur", …}

where raw_text[2026,2039] is the context of the extraction, a short extract of the report text around the extraction.

Parameters

type – the type of the item (“rubrique” or “motif”)
attribute – item of interest (ex : “ISCOVID”)

Returns

a dict with the literal context as key and the extracted value as value

Return type

dict
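
The "context" key can be pictured as a window of report text around the extracted span. The sketch below is an illustration only: the `context_around` helper, the sample sentence, and the margin of 10 characters are assumptions, since the window size used by toDoccanoImaPrecision is not stated here.

```python
from typing import Tuple


def context_around(raw_text: str, span: Tuple[int, int], margin: int = 30) -> str:
    """Return the text of the span plus up to `margin` characters on each side."""
    start, end = span
    return raw_text[max(0, start - margin):min(len(raw_text), end + margin)]


raw_text = "Conclusion : aspect non évocateur de pneumopathie virale."
value = "non évocateur"
start = raw_text.index(value)
span = (start, start + len(value))

# Context as key, extracted value as value, as in the documented return dict.
precision_dict = {context_around(raw_text, span, margin=10): value}
```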

toDoccanoImaRappel(dict_regexp_type, value, span)[source]¶

Specific method for scanner extractor evaluation. Finds the items that are absent from a pymedext Document.

Parameters

dict_regexp_type – a dict with the items as keys (ex: "ISCOVID") and their types as values ("motif" or "rubrique")
value – "Null"

Returns

A dict with the report text as key and a list of absent items as value

Return type

dict

toDoccanoPb(path_to_doc, regexp)[source]¶

Specific method for scanner extractor evaluation. A specific format to display documents that were annotated with the label "problem" in Doccano.

Parameters

path_to_doc – the path to the doc that was annotated with the problem label in Doccano
regexp – the item of interest

Returns

a dict with the report text as key and the regex as value

pymedextcore.document module¶

class pymedextcore.document.Document(raw_text, ID=None, attributes=None, source=None, pathToconfig=None, documentDate=None)[source]¶

Bases: object

Document is the main class of pymedext. It is used to load files and annotate them with annotators.

annotate(annotator)[source]¶

Main function to annotate a Document

Parameters: annotator – list of annotators
Returns: the Document, after running _annotate, which adds annotations to it
Return type: Document

static from_dict(d)[source]¶

Create a Document from a dict of a document (as created using to_dict)

Parameters: d – Dict
Returns: Document
Return type: Document

get_annotation_by_id(_id)[source]¶

get_annotations(_type, source_id=None, target_id=None, attributes=None, value=None, span=None)[source]¶

Return the annotations of a specific type. Results can be further filtered by source_id, target_id, span, attributes, and value.

Parameters

_type – annotation type
source_id – annotation source id
target_id – annotation target id
attributes, value, span – optional filters
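
The filtering behaviour described for get_annotations can be sketched on plain dicts. This is an assumption-laden illustration, not the Document method: the `filter_annotations` helper is hypothetical, and it treats every filter that is not None as an exact-match condition.

```python
from typing import Dict, List, Optional, Tuple


def filter_annotations(
    annotations: List[Dict],
    _type: str,
    source_id: Optional[str] = None,
    value: Optional[str] = None,
    span: Optional[Tuple[int, int]] = None,
) -> List[Dict]:
    """Keep annotations of one type that also match every filter given."""
    wanted = {"source_ID": source_id, "value": value, "span": span}
    return [
        ann for ann in annotations
        if ann["type"] == _type
        and all(ann.get(key) == expected
                for key, expected in wanted.items() if expected is not None)
    ]


annotations = [
    {"type": "regex", "value": "grade II", "span": (212, 220), "source_ID": "s1"},
    {"type": "regex", "value": "grade III", "span": (300, 309), "source_ID": "s1"},
    {"type": "QuickUMLS", "value": "oesophagite", "span": (188, 199), "source_ID": "s2"},
]
all_regex = filter_annotations(annotations, "regex")
grade_two = filter_annotations(annotations, "regex", value="grade II")
```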

get_graph()[source]¶

Return the graph associated with the raw_text

get_relation_by_id(_id)[source]¶

get_relations(_type=None, head_id=None, target_id=None)[source]¶

Return relations of a specific type from source. Results can be filtered by type, head_id or target_id.

Parameters

_type – annotation type
head_id – annotation source id
target_id – annotation target id

load_annotations_files(pathToconfig)[source]¶

Transform PyMedExt JSON files into a Document

Parameters: pathToconfig – list of paths to JSON files
Returns: adds annotations to the Document
Return type: Document

raw_text()[source]¶

return the Document raw_text

Returns: raw_text
Return type: string

to_dict()[source]¶

Transform the Document into a PyMedExt dict. TODO: add the Document date if available, the processing date, and the annotators used.

Returns: json PyMedExt
Return type: dict

to_json()[source]¶

transform annotations to a json

Returns: the annotations as JSON
Return type: json

write_json(pathToOutput)[source]¶

Transform Document to json PyMedExt

Parameters: pathToOutput – path to result file
Returns: None
Return type: None

pymedextcore.fhirtransform module¶

class pymedextcore.fhirtransform.FHIR[source]¶

Bases: pymedextcore.datatransform.DataTransform

load_xml()[source]¶

Parameters: fhir_input – file name of a FHIR file
Returns: Document
Return type: PyMedExt Document

pymedextcore.ncbisource module¶

class pymedextcore.ncbisource.PubTatorSource(host='https://www.ncbi.nlm.nih.gov/research/pubtator-api/publications/export/')[source]¶

Bases: pymedextcore.source.Source, pymedextcore.connector.SimpleAPIConnector

Connection to the PubTator API, currently https://www.ncbi.nlm.nih.gov/research/pubtator-api/publications/export/

get_pmids_annotations(pmid_list, Bioconcept='', returnFormat=0)[source]¶

Return a set of articles from PubTator, identified by PMID

Parameters

pmid_list – a list of article PMIDs
Bioconcept – default (leave it blank) includes all bioconcepts. Otherwise, the user can choose gene, disease, chemical, species, proteinmutation, dnamutation, snp, and cellline.
returnFormat – 0 returns a PyMedExt Document, 1 returns a BioC Document

Returns

PyMedExt Document or BioC Document
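
To make the role of pmid_list and Bioconcept concrete, here is a minimal sketch of how a request URL against the export host could be assembled. It performs no network call and does not use pymedextcore. The build_export_url helper is hypothetical, and the format path segment (biocjson) and the pmids and concepts query parameter names are assumptions about the PubTator export endpoint, not something this page documents.

```python
from typing import Iterable
from urllib.parse import urlencode

# Default host copied from the PubTatorSource signature above.
HOST = "https://www.ncbi.nlm.nih.gov/research/pubtator-api/publications/export/"


def build_export_url(pmid_list: Iterable[str], bioconcept: str = "",
                     fmt: str = "biocjson", host: str = HOST) -> str:
    """Hypothetical helper: build an export URL for a list of PMIDs.

    Assumed query parameters: `pmids` (comma-separated) and `concepts`.
    """
    params = {"pmids": ",".join(str(p) for p in pmid_list)}
    if bioconcept:
        # An empty bioconcept means "all bioconcepts", so the parameter is omitted.
        params["concepts"] = bioconcept
    # safe="," keeps the comma-separated PMID list readable in the URL.
    return host.rstrip("/") + "/" + fmt + "?" + urlencode(params, safe=",")


url_all = build_export_url(["28483577", "28483578"])
url_gene = build_export_url(["28483577", "28483578"], bioconcept="gene")
```

Omitting the concepts parameter when Bioconcept is blank follows the description above, where a blank value includes all bioconcepts.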

pymedextcore.normalize module¶

class pymedextcore.normalize.normalize[source]¶

Bases: object

static uri(Document, otherSegments=['drwh_family', 'hypothesis'], rootNode='drwh_sentences', filterEntities=['drugs_fast', 'cui'])[source]¶

URI normalization

Parameters

Document –
otherSegments –
rootNode –
filterEntities –

pymedextcore.omopsource module¶

class pymedextcore.omopsource.OmopSource(DB_host, DB_name, DB_port, DB_user, DB_password)[source]¶

Bases: pymedextcore.source.Source, pymedextcore.connector.PostGresConnector

Connection to a PostgreSQL OMOP source

getLastNotenlpid()[source]¶

saveToSource(table_person, table_note, table_note_nlp)[source]¶

Generic method to save data to a specific source

class pymedextcore.omopsource.StringIteratorIO(iter: Iterator[str])[source]¶

Bases: io.TextIOBase

read(n: Optional[int] = None) → str[source]¶

Read at most n characters from stream.

Read from underlying buffer until we have n characters or we hit EOF. If n is negative or omitted, read until EOF.

readable() → bool[source]¶

Return whether object was opened for reading.

If False, read() will raise OSError.
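
The read contract described above (return at most n characters, or everything up to EOF when n is negative or omitted) can be sketched with a small io.TextIOBase subclass. This is an illustrative stand-in under a hypothetical name, IteratorTextStream, and not the source of StringIteratorIO.

```python
import io
from typing import Iterator, Optional


class IteratorTextStream(io.TextIOBase):
    """Hypothetical stand-in: a read-only text stream backed by an iterator of strings."""

    def __init__(self, chunks: Iterator[str]):
        self._chunks = chunks
        self._buffer = ""

    def readable(self) -> bool:
        return True

    def _read_piece(self, n: Optional[int] = None) -> str:
        # Refill the buffer from the iterator until it holds data or the iterator ends.
        while not self._buffer:
            try:
                self._buffer = next(self._chunks)
            except StopIteration:
                break
        piece = self._buffer[:n]  # n=None takes the whole buffer
        self._buffer = self._buffer[len(piece):]
        return piece

    def read(self, n: Optional[int] = None) -> str:
        parts = []
        if n is None or n < 0:
            # Read until the underlying iterator is exhausted (EOF).
            while True:
                piece = self._read_piece()
                if not piece:
                    break
                parts.append(piece)
        else:
            # Read at most n characters, possibly spanning several chunks.
            while n > 0:
                piece = self._read_piece(n)
                if not piece:
                    break
                n -= len(piece)
                parts.append(piece)
        return "".join(parts)


stream = IteratorTextStream(iter(["ab", "cde", "f"]))
first = stream.read(3)
second = stream.read(2)
rest = stream.read()
```

A stream of this kind lets lazily generated rows be handed to any consumer that expects a file-like object, without first joining them into one large string.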

pymedextcore.omopsource.clean_csv_value(value: Optional[Any]) → str[source]¶

pymedextcore.omoptransform module¶

class pymedextcore.omoptransform.omop[source]¶

Bases: pymedextcore.datatransform.DataTransform

buildNoteNlP(dict_note, note_id, note_nlp_id, nlp_workflow, thisTime, filterType, dataframe=False)[source]¶

generateNote(to_omop_note, to_date, note_event_id, note_event_field_concept_id, note_type_concept_id, note_class_concept_id, note_title, encoding_concept_id, language_concept_id, provider_id, visit_detail_id, note_source_value)[source]¶

generateNoteNLP(to_omop_nlp, to_date, note_nlp_id, note_id, section_concept_id, note_nlp_concept_id, note_nlp_source_concept_id, nlp_workflow, term_exist, entity)[source]¶

generatePerson(to_omop_person, gender_concept_id, year_of_birth, month_of_birth, day_of_birth, birth_datetime, death_datetime, race_concept_id, ethnicity_concept_id, location_id, provider_id, care_site_id, person_source_value, gender_source_value, gender_source_concept_id, race_source_value, race_source_concept_id, ethnicity_source_value, ethnicity_source_concept_id)[source]¶

load(server)[source]¶

Generic method to save another format into a PyMedExt Document

Returns: PyMedext Document

pymedextcore.pymedext module¶

pymedextcore.pymedext_cmdline module¶

pymedextcore.pymedext_cmdline.export(thisDoc, otype, rawFileName, bexclude)[source]¶

pymedextcore.pymedext_cmdline.loadFile(inputfile, folder, rawFileName, itype)[source]¶

pymedextcore.pymedext_cmdline.main()[source]¶

Simple program that greets NAME for a total of COUNT times.

pymedextcore.source module¶

class pymedextcore.source.Source[source]¶

Bases: object

Abstract class to extend in order to implement a specific source connector. See the OmopSource example.

static loadFromSource()[source]¶

Generic method to download data from a source

static saveToSource()[source]¶

Generic method to save data to a specific source
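
As an illustration of extending such an abstract class, the sketch below defines a local stand-in for Source (it is not imported from pymedextcore) and a hypothetical InMemorySource connector. The method arguments are assumptions chosen for the example; the documented base methods take none.

```python
from typing import Dict, List


class Source:
    """Stand-in for the abstract base documented above (not the pymedextcore class)."""

    @staticmethod
    def loadFromSource():
        raise NotImplementedError

    @staticmethod
    def saveToSource():
        raise NotImplementedError


class InMemorySource(Source):
    """Hypothetical connector that keeps notes in a dict instead of a remote system."""

    def __init__(self) -> None:
        self._notes: Dict[str, str] = {}

    def saveToSource(self, note_id: str, text: str) -> None:
        # Store (or overwrite) the note under its identifier.
        self._notes[note_id] = text

    def loadFromSource(self, note_ids: List[str]) -> List[str]:
        # Return the stored notes in the requested order, skipping unknown ids.
        return [self._notes[i] for i in note_ids if i in self._notes]


src = InMemorySource()
src.saveToSource("n1", "fever for three days")
loaded = src.loadFromSource(["n1", "n2"])
```

Overriding both methods in the subclass keeps the calling code independent of where the data actually lives, which is the purpose of a generic load and save pair.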

Module contents¶