pymedextcore package

Submodules

pymedextcore.annotators module

class pymedextcore.annotators.Annotation(type: str, value: str, source: str, source_ID: str, span: Optional[Tuple[int, int]] = None, attributes: Optional[Dict] = None, isEntity: bool = False, ID: Optional[str] = None, ngram: Optional[str] = None)[source]

Bases: object

Base object that represents a single annotation. Each Annotator must return a list of Annotation objects.

add_child(child)[source]

Add a child to current Annotation

Parameters

child – An annotation to set as child of current node

Returns

None

Return type

None

add_property(neighbor)[source]

Add the properties of a neighbor to the current Annotation if both have the same span.

Parameters

neighbor – the neighboring Annotation whose properties are added

Returns

None

Return type

None

get_attributes()[source]

Get the attributes of the current node and of its parents.

Returns

attributes

Return type

a dict

get_children_span()[source]

Return the spans of all children of the current node.

Returns

list of spans

Return type

list of tuple

get_entities_children()[source]

From the current node, return all children that are Annotations with isEntity = True.

Returns

children list

Return type

list

get_ngram()[source]

get nGram of the current Annotation

Returns

raw ngram

Return type

string

get_parent(from_type)[source]

Return the closest parent of the current Annotation that has a specific type.

Parameters

from_type – the type to look for

Returns

Annotation of a specific type

Return type

Annotation

get_parents_properties()[source]

Return the properties of the current Annotation and of its parents if they belong to a specific type.

Parameters

filter_type – list of Annotations types

Returns

list of current and parents Annotation properties

Return type

list of dict

get_properties()[source]

Return the properties of the current node if the Annotation is of a specific type.

Parameters

filter_type – list of Annotations type

Returns

properties

Return type

list of dict

get_span()[source]

return current Annotation span

Returns

span(start,end)

Return type

tuple

set_ngram()[source]

set nGram of the current Annotation

Returns

1

Return type

int

set_parent(parent)[source]

set Parent to current Annotation

Parameters

parent – Annotation

Returns

1

Return type

int

set_root(root)[source]

set Root to current Annotation

Parameters

root – Annotation

Returns

1

Return type

int

to_dict()[source]

Transform the Annotation to a dict object.

Returns

dict

Return type

dict

to_json()[source]

Transform the Annotation to JSON.

Returns

json

Return type

json
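The parent and child methods above suggest that annotations form a tree. The sketch below illustrates that idea with a stand-in class: `Node` is hypothetical and is not the pymedextcore implementation, and the sample text and spans are made up for illustration. It only shows one plausible reading of "closest parent of a specific type".

```python
from typing import List, Optional, Tuple


class Node:
    """Hypothetical stand-in for an annotation; not the pymedextcore class."""

    def __init__(self, type: str, value: str, span: Tuple[int, int]):
        self.type = type
        self.value = value
        self.span = span
        self.parent: Optional["Node"] = None
        self.children: List["Node"] = []

    def add_child(self, child: "Node") -> None:
        # Link both directions so the tree can be walked up and down.
        child.parent = self
        self.children.append(child)

    def get_parent(self, from_type: str) -> Optional["Node"]:
        # Walk up until an ancestor of the requested type is found.
        node = self.parent
        while node is not None:
            if node.type == from_type:
                return node
            node = node.parent
        return None


raw = Node("raw_text", "Le patient tousse. Pas de fièvre.", (0, 33))
sentence = Node("sentence", "Pas de fièvre.", (19, 33))
entity = Node("regex", "fièvre", (26, 32))
raw.add_child(sentence)
sentence.add_child(entity)

closest_sentence = entity.get_parent("sentence")
```

Walking up from the entity stops at the first ancestor whose type matches, so asking for `"sentence"` yields the sentence node and asking for `"raw_text"` yields the root.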

class pymedextcore.annotators.Annotator[source]

Bases: object

Abstract base class for every Annotator. An Annotator must implement annotate_function(), which returns a list of Annotation objects.

annotate_function(_input)[source]

Main annotation function. Each Annotator must implement it.

Parameters

_input – a list of Annotation types

Returns

a list of annotations, which will be added to Document.annotations

Return type

List[Annotation]
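The contract described above (take input annotations, return a list of new annotations) can be sketched without the library. Everything in this block is an assumption made for illustration: `Ann`, `BaseAnnotator` and `KeywordAnnotator` are hypothetical stand-ins, not pymedextcore classes, and the real Annotator may receive its input in a different shape.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Ann:
    """Hypothetical stand-in for an annotation record."""
    type: str
    value: str
    span: Tuple[int, int]
    source: str


class BaseAnnotator(ABC):
    """Hypothetical stand-in for the abstract Annotator contract."""

    @abstractmethod
    def annotate_function(self, _input: List[Ann]) -> List[Ann]:
        ...


class KeywordAnnotator(BaseAnnotator):
    def __init__(self, keyword: str):
        self.keyword = keyword

    def annotate_function(self, _input: List[Ann]) -> List[Ann]:
        found = []
        for ann in _input:
            start = ann.value.find(self.keyword)
            while start != -1:
                # Offsets are shifted by the span start of the input annotation.
                begin = ann.span[0] + start
                found.append(Ann("keyword", self.keyword,
                                 (begin, begin + len(self.keyword)),
                                 "KeywordAnnotator:v1"))
                start = ann.value.find(self.keyword, start + 1)
        return found


text = Ann("raw_text", "toux et fièvre, fièvre persistante", (0, 34), "rawtext")
results = KeywordAnnotator("fièvre").annotate_function([text])
```

Each returned record carries its own type, span and source, which is the information a document would need to store the new annotations alongside the existing ones.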

get_all_key_input(_input)[source]

Return all key inputs for the Annotator.

Parameters

_input – all annotations of specific types from the Document

Returns

a list of annotations

Return type

list of Annotation

Deprecated since version 0.3: This function will be removed soon; use select_all_inputs instead.

get_first_key_input(_input)[source]

Return the annotations of the first type ([0]) in the key input list.

Parameters

_input – list of annotations input for the Annotator

Returns

a list of annotations

Return type

list of Annotation

Deprecated since version 0.3: This function will be removed soon; use select_first_input instead.

get_key_input(_input, i)[source]

Return the annotations of a specific type from key_input.

Parameters
  • _input – key_input list

  • i – the index of the list to select

Returns

a list of annotations

Return type

list of Annotation

select_all_inputs(_input)[source]

Return all key inputs for the Annotator.

Parameters

_input – all annotations of specific types from the Document

Returns

a list of annotations

Return type

list of Annotation

New in version 0.3: This function replaces get_all_key_input.

select_first_input(_input)[source]

Return the first annotation from the _input key list.

Parameters

_input – list of annotations input for the Annotator

Returns

a list of annotations

Return type

list of Annotation

New in version 0.3: This function replaces get_first_key_input.

class pymedextcore.annotators.Relation(type: str, head: str, target: str, source: str, source_ID: str, attributes: Optional[List] = None, ID: Optional[str] = None)[source]

Bases: object

Base object that represents a Relation.

to_dict()[source]

Transform Relation to a dict object

Returns

dict

Return type

dict

to_json()[source]

Transform the Relation to JSON.

Returns

json

Return type

json

pymedextcore.bioctransform module

class pymedextcore.bioctransform.BioC[source]

Bases: pymedextcore.datatransform.DataTransform

static load_collection(bioc_input: str, format: int = 0, is_file: bool = True)[source]

Load a BioC collection from XML or JSON and return a list of Document objects.

Parameters
  • bioc_input – a str path to a BioC file, or a BioC input string

  • format – format of the BioC file (XML or JSON)

  • is_file – if True, bioc_input is a path; otherwise it is a string

Returns

list of Document

static save_as_collection(list_of_pymedext_documents: List[pymedextcore.document.Document])[source]

Save a list of pymedext Documents as a BioC collection and return the BioC collection object.

Parameters

list_of_pymedext_documents – a list of Document

Returns

a bioc collection object

static write_bioc_collection(filename: str, collection: bioc.bioc.BioCCollection)[source]

Write a BioCCollection as an XML document and return 1.

Parameters
  • filename – a str filename of the collection

  • collection – a bioc collection

Returns

1

pymedextcore.brat_parser module

class pymedextcore.brat_parser.Attribute(id: str, type: str, target: str, values: Tuple[str, ...] = ())[source]

Bases: object

A simple attribute data structure.

id: str
target: str
type: str
values: Tuple[str, ...] = ()
class pymedextcore.brat_parser.AugmentedEntity(id: str, type: str, span: Tuple[Tuple[int, int], ...], text: str, relations_from_me: Tuple[pymedextcore.brat_parser.Relation, ...], relations_to_me: Tuple[pymedextcore.brat_parser.Relation, ...], attributes: Tuple[pymedextcore.brat_parser.Attribute, ...])[source]

Bases: object

An augmented entity data structure with its relations and attributes.

attributes: Tuple[pymedextcore.brat_parser.Attribute, ...]
property end: int
id: str
relations_from_me: Tuple[pymedextcore.brat_parser.Relation, ...]
relations_to_me: Tuple[pymedextcore.brat_parser.Relation, ...]
span: Tuple[Tuple[int, int], ...]
property start: int
text: str
type: str
class pymedextcore.brat_parser.Document(entities: List[pymedextcore.brat_parser.Entity], relations: List[pymedextcore.brat_parser.Relation], attributes: List[pymedextcore.brat_parser.Attribute])[source]

Bases: object

attributes: List[pymedextcore.brat_parser.Attribute]
entities: List[pymedextcore.brat_parser.Entity]
relations: List[pymedextcore.brat_parser.Relation]
class pymedextcore.brat_parser.Entity(id: str, type: str, span: Tuple[Tuple[int, int], ...], text: str)[source]

Bases: object

A simple annotation data structure.

property end: int
id: str
span: Tuple[Tuple[int, int], ...]
property start: int
text: str
type: str
class pymedextcore.brat_parser.Grouping(id: str, type: str, items: List[pymedextcore.brat_parser.Entity])[source]

Bases: object

id: str
items: List[pymedextcore.brat_parser.Entity]
property text
type: str
class pymedextcore.brat_parser.Relation(id: str, type: str, subj: str, obj: str)[source]

Bases: object

A simple relation data structure.

id: str
obj: str
subj: str
type: str
pymedextcore.brat_parser.get_augmented_entities(ann_path: str)Dict[str, pymedextcore.brat_parser.AugmentedEntity][source]
pymedextcore.brat_parser.get_entities_relations_attributes_groups(ann_path: str)Tuple[Dict[str, pymedextcore.brat_parser.Entity], Dict[str, pymedextcore.brat_parser.Relation], Dict[str, pymedextcore.brat_parser.Attribute], Dict[str, pymedextcore.brat_parser.Grouping]][source]
pymedextcore.brat_parser.list_to_dict(s: List)Dict[source]
pymedextcore.brat_parser.parse(ann_path: str)pymedextcore.brat_parser.Document[source]
pymedextcore.brat_parser.parse_attribute(attribute_id: str, attribute_content: str)pymedextcore.brat_parser.Attribute[source]

Parse the annotation string into an Attribute structure.

Parameters
  • attribute_id (str) – The attribute ID in the annotation (A1 for example).

  • attribute_content (str) – The attribute text content (Tense T19 Past-Ended for example).

Returns

An Attribute object

Return type

Attribute

pymedextcore.brat_parser.parse_entity(tag_id: str, tag_content: str)pymedextcore.brat_parser.Entity[source]

Parse the entity string into an Entity structure.

Parameters
  • tag_id (str) – The tag ID in the annotation (T12 for example).

  • tag_content (str) – The tag text content (Temporal-Modifier 116 126 history of for example).

Returns

An Entity object

Return type

Entity

pymedextcore.brat_parser.parse_relation(relation_id: str, relation_content: str)pymedextcore.brat_parser.Relation[source]

Parse the annotation string into a Relation structure.

Parameters
  • relation_id (str) – The relation ID in the annotation (R12 for example).

  • relation_content (str) – The relation text content (Modified-By Arg1:T8 Arg2:T6 for example).

Returns

A Relation object

Return type

Relation
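The three parsers above take the content part of a brat standoff line. As a rough illustration of what such parsing involves, here is a self-contained sketch. The `split_*` helpers are hypothetical and are not the pymedextcore functions; the sketch also assumes that, as in the brat standoff format, a tab separates the entity header from the entity text (the rendered examples above show spaces where the tabs were).

```python
from typing import Tuple


def split_entity(tag_content: str) -> Tuple[str, Tuple[Tuple[int, int], ...], str]:
    """Split 'Type start end<TAB>text' into (type, spans, text)."""
    header, text = tag_content.split("\t", 1)
    entity_type, offsets = header.split(" ", 1)
    # Discontinuous spans are written as 'start end;start end'.
    spans = tuple(
        (int(start), int(end))
        for start, end in (chunk.split() for chunk in offsets.split(";"))
    )
    return entity_type, spans, text


def split_relation(relation_content: str) -> Tuple[str, str, str]:
    """Split 'Type Arg1:Tx Arg2:Ty' into (type, subject id, object id)."""
    relation_type, arg1, arg2 = relation_content.split()
    return relation_type, arg1.split(":", 1)[1], arg2.split(":", 1)[1]


def split_attribute(attribute_content: str) -> Tuple[str, str, Tuple[str, ...]]:
    """Split 'Type target [value ...]' into (type, target id, values)."""
    attribute_type, target, *values = attribute_content.split()
    return attribute_type, target, tuple(values)


entity = split_entity("Temporal-Modifier 116 126\thistory of")
relation = split_relation("Modified-By Arg1:T8 Arg2:T6")
attribute = split_attribute("Tense T19 Past-Ended")
```

The tuple shapes mirror the fields listed for Entity (type, span, text), Relation (type, subj, obj) and Attribute (type, target, values); the IDs are passed separately in the documented signatures and are left out here.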

pymedextcore.brat_parser.parse_string(annotation_string: str)pymedextcore.brat_parser.Document[source]
pymedextcore.brat_parser.parse_string_to_augmented_entities(annotation_string: str)Dict[str, pymedextcore.brat_parser.AugmentedEntity][source]
pymedextcore.brat_parser.read_file_annotations(ann: str)Tuple[List[pymedextcore.brat_parser.Entity], List[pymedextcore.brat_parser.Relation], List[pymedextcore.brat_parser.Attribute]][source]

Read an annotation file and get the Entities and Relations in it.

Parameters

ann (str) – The path to the annotation file to be processed.

Returns

Tuple[Set[Entity], Set[Relation], Set[Attribute]]: A tuple of sets of Entities, Relations, and Attributes.

pymedextcore.brat_parser.remove_empty(iterable: Sequence[str])Sequence[str][source]

Returns only non-empty strings from an iterable.

Parameters

iterable (Iterable) – An iterable of strings that possibly contains empty strings.

Returns

The same iterable with the empty strings removed.

pymedextcore.brat_parser.sanitize_tabs(line: str, max_tabs: int = 2)str[source]

pymedextcore.brattransform module

Created 2020/04/14

@author: David BAUDOIN

Purpose: create or update a BRAT file from a pymedext dict

class pymedextcore.brattransform.brat[source]

Bases: pymedextcore.datatransform.DataTransform

static load_from_brat(ann_file: str, txt_file: Optional[str] = None)pymedextcore.document.Document[source]

Load annotations from a .ann file in the Brat format

Parameters
  • ann_file – path to the .ann file

  • txt_file – path to the corresponding .txt file, if None: defaults to replacing .ann by .txt

Returns

Document

Return type

Document

save_to_brat(folder_path: Optional[str] = None, pym_ann_types: Optional[List[str]] = None, brat_entities_in_pym_types: Optional[List[str]] = None, brat_entities_in_pym_types_value: Optional[List[str]] = None, brat_entities_in_pym_att_values: Optional[dict] = None, brat_entities_in_pym_att_keys: Optional[dict] = None, brat_attributes: Optional[dict] = None, pym_rel_types: Optional[List[str]] = None, brat_ents_of_rel_in_pym_rel_type: Optional[List[str]] = None, brat_ents_of_rel_in_pym_ent_value: Optional[List[str]] = None, brat_ents_of_rel_in_pym_att_values: Optional[dict] = None, brat_type_of_rel_in_pym_rel_types: Optional[List[str]] = None, brat_type_of_rel_in_pym_rel_att_values: Optional[dict] = None, level_annot: Optional[dict] = None)[source]

Write all Annotations to Brat files in folder_path. For each pymedext Document in the input list of documents, two files are created (or overwritten):

  • ID.ann: Brat annotation file (with ID = dic_pymedext.id)

  • ID.txt: Raw text of the document (with ID = dic_pymedext.id)

It will create (or overwrite) an annotation.conf file.

Parameters
  • list_of_documents – list of input Documents. The Documents should contain the same types of annotations.

  • folder_path – path as a string. The files are stored at this location; the folder must already exist.

The examples for the remaining parameters refer to the following extract of a pymedext document.

    {'type': 'QuickUMLS',
     'value': 'oesophagite',
     'ngram': None,
     'span': (188, 199),
     'source': 'QuickUMLS:v1',
     'source_ID': '6814e9fa-96f7-11eb-a8c8-0242ac110002',
     'isEntity': False,
     'attributes': {'hypothesis': 'certain',
                    'context': 'patient',
                    'negation': 'aff',
                    'cui': 'C0014868',
                    'label': 'oesophagite',
                    'semtypes': ['T047'],
                    'score': 1.0,
                    'snippet': ' La fibroscopie oeso-gastro-duodénale avait révélé une oesophagite peptique de grade II et a permis l’exérèse d’un petit papillome du tiers supérieur de l’œsophage',
                    'snippet_span': (132, 296)},
     'ID': '681c2d82-96f7-11eb-a8c8-0242ac110002'},
    {'type': 'regex',
     'value': 'grade II',
     'ngram': None,
     'span': (212, 220),
     'source': 'RegexMatcher:v1',
     'source_ID': '68155570-96f7-11eb-a8c8-0242ac110002',
     'isEntity': True,
     'attributes': {'version': 'v1',
                    'label': 'Grade',
                    'id_regexp': 'id_grade',
                    'snippet': '-gastro-duodénale avait révélé une oesophagite peptique de grade II et a permis l’exérèse d’un petit papillome du tiers supérie',
                    'hypothesis': 'certain',
                    'context': 'patient',
                    'negation': 'aff'},
     'ID': '682ca3ec-96f7-11eb-a8c8-0242ac110002'},

Annotations:

  • pym_ann_types – pymedext annotation types to select. Example: ['QuickUMLS', 'regex'] -> the annotations in Brat will cover these two annotation types. Depending on the options filled in (explained below), different labels will be displayed in Brat.

  • brat_entities_in_pym_types – (optional) fill this list if the Brat entities correspond to annotation types in pymedext. Example: ['regex'] -> in Brat, 'regex' will be displayed for each regex found. With the extract given, 'grade II' will be highlighted in the text with the label 'regex'.

  • brat_entities_in_pym_types_value – fill this list if the Brat entities correspond to the value of annotation types in pymedext. Example: ['QuickUMLS'] -> in Brat, the QuickUMLS annotation value will be displayed for each QuickUMLS found. With the extract given, 'oesophagite' will be highlighted in the text with the label 'QuickUMLS'.

  • brat_entities_in_pym_att_values – (optional) fill this dict if the Brat entities correspond to annotation attribute values in pymedext. Keys correspond to pymedext annotation types, values correspond to pymedext attribute keys. Example: {'regex': 'label'} -> in Brat, the label stored in the attributes will be displayed for each regex found. With the extract given, 'grade II' will be highlighted in the text with the label 'Grade'.

  • brat_entities_in_pym_att_keys – (optional) fill this dict if the Brat entities correspond to annotation attribute keys in pymedext. Keys correspond to pymedext annotation types, values correspond to pymedext attribute keys. Example: {'regex': 'label'} -> in Brat, the string "label" will be displayed for each regex found. With the extract given, 'grade II' will be highlighted in the text with the label 'label'.

  • brat_attributes – (optional) dict with pymedext annotation types as keys and, as values, the lists of attributes that should be exported as Brat attributes. Example: {"QuickUMLS": ['hypothesis', 'negation', 'context']} -> for each QuickUMLS found, the hypothesis, negation and context attribute values will be displayed. Use "all" as the value to export all the attributes of this annotation type. Example: {"QuickUMLS": "all"} -> for each QuickUMLS found, all attributes (semType, CUI code, hypothesis, …) will be displayed.

Relations:

  • pym_rel_types – pymedext relation types to select. Example: ['Stanza'] -> the relations in Brat will cover these relation types. Depending on the options filled in (explained below), different labels will be displayed in Brat.

  • brat_ents_of_rel_in_pym_rel_type – (optional) fill this list if the Brat entities of relations correspond to relation types in pymedext.

  • brat_ents_of_rel_in_pym_ent_value – (optional) fill this list if the Brat entities of relations correspond to relation types in pymedext.

Returns

1
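The examples given for the entity options can be condensed into a small sketch of how a Brat label might be chosen for one annotation. This is not the logic of save_to_brat: `pick_label` is a hypothetical helper, the order in which the options are checked is an assumption, and only the three options whose examples are unambiguous are covered.

```python
from typing import Dict, List, Optional

# Reduced copy of the 'regex' annotation from the extract above.
annotation = {
    "type": "regex",
    "value": "grade II",
    "span": (212, 220),
    "attributes": {"label": "Grade", "id_regexp": "id_grade"},
}


def pick_label(ann: dict,
               in_types: Optional[List[str]] = None,
               in_att_values: Optional[Dict[str, str]] = None,
               in_att_keys: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Hypothetical helper: choose the Brat label for one annotation."""
    ann_type = ann["type"]
    if in_att_values and ann_type in in_att_values:
        # Label is the value stored under the configured attribute key.
        return ann["attributes"][in_att_values[ann_type]]
    if in_att_keys and ann_type in in_att_keys:
        # Label is the attribute key itself.
        return in_att_keys[ann_type]
    if in_types and ann_type in in_types:
        # Label is the pymedext annotation type.
        return ann_type
    return None


by_type = pick_label(annotation, in_types=["regex"])
by_att_value = pick_label(annotation, in_att_values={"regex": "label"})
by_att_key = pick_label(annotation, in_att_keys={"regex": "label"})
```

With the same annotation, the three calls give the labels 'regex', 'Grade' and 'label', matching the three documented examples.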

pymedextcore.connector module

class pymedextcore.connector.APIConnector(baseurl: str, username: str, password: str)[source]

Bases: pymedextcore.connector._Router

Largely inspired by the work at https://github.com/doccano/doccano-client.git

For now, this is a copy of the DoccanoClient class from doccano_api_client.py.

TODO: investigate alternatives to plaintext login

Args:

baseurl (str): The base URL of a Doccano instance.

username (str): The Doccano username to use for the client session.

password (str): The respective username’s password.

Returns:

An authorized client instance.

class pymedextcore.connector.Connector[source]

Bases: object

TODO : make this an abstract class for other connector

class pymedextcore.connector.DatabaseConnector(DB_host, DB_name, DB_port, DB_user, DB_password)[source]

Bases: object

Abstract class specialized in database connections

start_connection()[source]

Abstract function in which each DatabaseConnector should implement the database connection

class pymedextcore.connector.PostGresConnector(DB_host, DB_name, DB_port, DB_user, DB_password)[source]

Bases: pymedextcore.connector.DatabaseConnector

Abstract Connector to a Postgres Database

start_connection()[source]

Initialize the connection for the PostGresConnector.

Returns

0

Return type

int

class pymedextcore.connector.SSHConnector(scp_host, scp_user, scp_password)[source]

Bases: object

TODO: implement a connection to a server with paramiko, should also extend Connector

class pymedextcore.connector.SimpleAPIConnector(host)[source]

Bases: object

TODO: implement a connection to a server with paramiko, should also extend Connector @David?

start_connection()[source]

Initialize a requests object.

Returns

0

Return type

int

class pymedextcore.connector.cxORacleConnector(DB_host, DB_name, DB_port, DB_user, DB_password)[source]

Bases: pymedextcore.connector.DatabaseConnector

Abstract connector to an Oracle database using cxOracle

pymedextcore.datatransform module

Each class that transforms a pymedext Document to another format must inherit from DataTransform.

TODO: make functions such as save and load mandatory to ease the use of DataTransform objects

class pymedextcore.datatransform.DataTransform[source]

Bases: object

static load()[source]

Generic method to load another format into a PyMedExt Document

Returns

PyMedext Document

static save()[source]

Generic method to save a PyMedExt Document as another format

Returns

not a PyMedExt doc

pymedextcore.doccanoannotator module

class pymedextcore.doccanoannotator.DoccanoAnnotation(text, labels, meta)[source]

Bases: object

Annotation object specific to Doccano

to_dict()[source]

Transform DoccanoAnnotation to dict

Returns

a dict

Return type

dict

to_json()[source]

Transform DoccanoAnnotation to json

Returns

a json

Return type

json
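As a rough illustration of the to_dict() / to_json() pair, the sketch below uses a stand-in class with the same three constructor fields (text, labels, meta). `EvalItem` is hypothetical, and the key names and layout of the serialized output are assumptions; the actual DoccanoAnnotation output may differ.

```python
import json
from typing import Dict, List


class EvalItem:
    """Hypothetical stand-in with the same three fields as DoccanoAnnotation."""

    def __init__(self, text: str, labels: List[str], meta: Dict):
        self.text = text
        self.labels = labels
        self.meta = meta

    def to_dict(self) -> Dict:
        return {"text": self.text, "labels": self.labels, "meta": self.meta}

    def to_json(self) -> str:
        # ensure_ascii=False keeps accented characters readable in the output.
        return json.dumps(self.to_dict(), ensure_ascii=False)


item = EvalItem("Le patient présente un diabète de type II",
                ["non negatif"],
                {"source": "DRWH_negation.v1"})
line = item.to_json()
```

Serializing through the dict form keeps the two methods consistent: parsing the JSON string back gives the same dict that to_dict() returns.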

pymedextcore.doccanodocument module

class pymedextcore.doccanodocument.DoccanoDocument[source]

Bases: object

DoccanoDocument is used to build an evaluation document that will be sent to the Doccano interface. It contains a set of DoccanoAnnotation objects that a user wants to evaluate.

toDictDoccano()[source]

Transform a DoccanoDocument object to a list of doccanoAnnotation dict

Returns

a list of doccanoAnnotation dict

Return type

dict

toJsonDoccano()[source]

Transform a DoccanoDocument object to JSON.

Returns

a json

Return type

json

writeJsonDoccano(pathToOutput)[source]

Write a DoccanoDocument object as a JSON file at the pathToOutput path

Parameters

pathToOutput – output path of the file

Returns

a doccano file

pymedextcore.doccanosource module

class pymedextcore.doccanosource.DoccanoSource(baseurl, username, password)[source]

Bases: pymedextcore.source.Source, pymedextcore.connector.APIConnector

Connection to DoccanoClient

This code is largely inspired by the work at https://github.com/doccano/doccano-client.git

create_label(project_id: str, label_name: str, color: str, prefix: str, suffix: str)requests.models.Response[source]

Adds a label to an existing project.

Parameters
  • project_id – the project id

  • label_name – the text of the label

create_project(name: str, description: str, project_type: str, guidelines: str)requests.models.Response[source]

Creates a project.

Parameters
  • name – name of the project

  • description – description of the project

  • project_type – type of project (“SequenceLabeling”, “DocumentClassification” or “Seq2seq”)

exp_get_doc_list(project_id: int, limit: int, offset: int)requests.models.Response[source]
find_project_id(regex: str, date: str, time: str)[source]

Finds a project id from an item (specific to the scanner-covid project). If the item is not enough to find the project id, date and time can be used.

Parameters
  • regex – item of interest

  • date – date of the project

  • time – time of the project

Returns

a project id

get_annotation_detail(project_id: int, doc_id: int, annotation_id: int)requests.models.Response[source]
get_annotation_list(project_id: int, doc_id: int)requests.models.Response[source]

Gets a list of annotations in a given project and document.

Args:

project_id (int): A project ID to query.

doc_id (int): A document ID to query.

Returns

requests.models.Response: The request response.

get_doc_download(project_id: int, file_format: str = 'json')requests.models.Response[source]
get_document_detail(project_id: int, doc_id: int)requests.models.Response[source]

Gets details of a given document.

Args:

project_id (int): A project ID to query.

doc_id (int): A document ID to query.

Returns

requests.models.Response: The request response.

get_document_list(project_id: int, url_parameters: dict = {})requests.models.Response[source]

Gets a list of documents in a project.

Args:

project_id (int):

url_parameters (dict): limit and offset

Returns

requests.models.Response: The request response.

get_features()requests.models.Response[source]

Gets features.

Returns

requests.models.Response: The request response.

get_label_detail(project_id: int, label_id: int)requests.models.Response[source]

Gets details of a specific label.

Args:

project_id (int): A project ID to query.

label_id (int): A label ID to query.

Returns

requests.models.Response: The request response.

get_label_id(project_id: int, label_name: str)[source]

Get the label id from the label name.

Parameters
  • project_id – id of the project

  • label_name – text of the label

Returns

id of the label

get_label_list(project_id: int)requests.models.Response[source]

Gets a list of labels in a given project.

Args:

project_id (int): A project ID to query.

Returns

requests.models.Response: The request response.

get_me()requests.models.Response[source]

Gets this account information.

Returns

requests.models.Response: The request response.

get_project_detail(project_id: int)requests.models.Response[source]

Gets details of a specific project.

Args:

project_id (int): A project ID to query.

Returns

requests.models.Response: The request response.

get_project_id(project_name: str)[source]

Get the project id from the project name.

Parameters

project_name – name of the project

Returns

the project id

get_project_list()requests.models.Response[source]

Gets projects list.

Returns

requests.models.Response: The request response.

get_project_statistics(project_id: int)requests.models.Response[source]

Gets project statistics.

Args:

project_id (int): A project ID to query.

Returns

requests.models.Response: The request response.

get_rolemapping_detail(project_id: int, rolemapping_id: int)requests.models.Response[source]

Currently broken!

get_rolemapping_list(project_id: int)requests.models.Response[source]
get_roles()requests.models.Response[source]

Gets available Doccano user roles.

Returns

requests.models.Response: The request response.

get_user_id(username: str)[source]

Get the user id from the username.

Parameters

username – name of the user

Returns

the user id

get_user_list()requests.models.Response[source]

Gets user list.

Returns

requests.models.Response: The request response.

post_approve_labels(project_id: int, doc_id: int)requests.models.Response[source]
post_doc_upload(project_id: int, file_format: str, file_name: str, file_path: str = './')requests.models.Response[source]

Uploads a file to a Doccano project.

Args:

project_id (int): The project id number.

file_format (str): The file format, ex: plain, json, or conll.

file_name (str): The name of the file.

file_path (str): The parent path of the file. Defaults to ./.

Returns

requests.models.Response: The request response.

set_rolemapping_list(project_id: str, user_id: str, role_id: str, username: str, rolename: str)requests.models.Response[source]

Set user roles.

Parameters
  • project_id –

  • user_id –

  • role_id –

  • username –

  • rolename –

Returns

requests.models.Response: The request response.

pymedextcore.doccanotransform module

class pymedextcore.doccanotransform.Doccano[source]

Bases: pymedextcore.datatransform.DataTransform

This class defines a set of transformation methods to build a DoccanoDocument from several pymedext Document objects. A DoccanoDocument contains N DoccanoAnnotations that the user wants to evaluate in the Doccano interface.

Here the transformation methods are specific to scanner report extractions and to DrWH negation, hypothesis and family context detection. Other transformation methods could be defined according to what the user wants to evaluate.

DoccanoEvalClass(dict_doccano, dictClasses, number_eval, path_to_doc)[source]

Adds doccano annotations to DoccanoDocument object until a specified number of evaluations for both classes.

Parameters
  • dict_doccano – A doccano dict that will be filled until the two classes reach the desired number of annotations

  • dictClasses – A dict of doccano classes (ex: negatif vs non negatif) with their current occurrences.

  • number_eval – the number of evaluations desired

Returns

A list with the modified Doccano Object, and a dict of annotations classes, with their number

Return type

list

DoccanoEvalN(dict_doccano, number_annoted, number_eval, path_to_doc)[source]

Adds DoccanoAnnotation objects to a DoccanoDocument until a specified number of evaluations is reached.

Parameters
  • dict_doccano – A doccano dict that will be filled until it reaches the desired number of annotations.

  • number_annoted – the number of evaluations desired.

  • number_eval – the current number of annotations.

Returns

A list with the modified DoccanoDocument object, and the number of annotations

Return type

list

DoccanoEvalRappel(dict_doccano, path_to_doc)[source]

Adds DoccanoAnnotation objects to a DoccanoDocument object, with a dict created with toDoccanoImaRappel

Parameters
  • dict_doccano – a dict created with toDoccanoImaRappel method

  • path_to_doc – the path of the pymedext doc that was used to create dict_doccano

Returns

DoccanoDocument object

Return type

DoccanoDocument

docForDoccano()[source]

Creates a DoccanoDocument object from the input dict.

Returns

DoccanoDocument object

Return type

DoccanoDocument

toDoccanoDrWH(type, segment)[source]

Specific method for DrWH evaluation. Selects the drwh annotations and their values, and creates a dict with the syntagm/sentence as key and the class value of the syntagm/sentence as value.

Example: extract of a pymedext Document

    {
        "type": "drwh_syntagms",
        "value": " Le patient présente un diabète de type II",
        "span": [47, 91],
        "source": "DRWH_syntagms.v1",
        "source_ID": "74633e84-80a3-11ea-a7f6-180f76073bf2",
        "isEntity": false,
        "attributes": null,
        "id": "74633e89-80a3-11ea-a7f6-180f76073bf2"
    },
    {
        "type": "drwh_negation",
        "value": "non negatif",
        "span": [47, 91],
        "source": "DRWH_negation.v1",
        "source_ID": "74633e89-80a3-11ea-a7f6-180f76073bf2",
        "isEntity": false,
        "attributes": null,
        "id": "74633e95-80a3-11ea-a7f6-180f76073bf2"
    }

For this extract, pymedextDocument.toDoccanoDrWH(type=dwh_negation, segment=dwh_syntagm) returns a dict that contains:

    {…, "Le patient présente un diabète de type II": "non negatif", …}

Parameters
  • type – dwh type of class (“dwh_negation” or “dwh_hypothesis” or “dwh_family”)

  • segment – syntagm or sentence (“dwh_sentence” or “dwh_syntagm”)

Returns

a dict with syntagm or sentence as key and class as value

Return type

dict
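The mapping described for toDoccanoDrWH (segment text as key, class value as value) can be sketched on the two annotations from the extract. The helper below is hypothetical and is not the library's code; in particular, pairing a segment with its class by identical span is an assumption made for illustration, and the real method may link them differently (for example through source_ID).

```python
from typing import Dict, List

annotations = [
    {"type": "drwh_syntagms",
     "value": " Le patient présente un diabète de type II",
     "span": [47, 91]},
    {"type": "drwh_negation", "value": "non negatif", "span": [47, 91]},
]


def pair_segments_with_classes(anns: List[dict], class_type: str,
                               segment_type: str) -> Dict[str, str]:
    """Hypothetical helper: map each segment text to its class value."""
    # Index the class values by span so segments can look them up.
    classes = {tuple(a["span"]): a["value"]
               for a in anns if a["type"] == class_type}
    return {
        a["value"].strip(): classes[tuple(a["span"])]
        for a in anns
        if a["type"] == segment_type and tuple(a["span"]) in classes
    }


pairs = pair_segments_with_classes(annotations, "drwh_negation", "drwh_syntagms")
```

Segments without a class annotation on the same span are simply left out of the result in this sketch.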

toDoccanoImaPrecision(type, attribute=None)[source]

Specific method for scanner extractor evaluation. Selects the extracted value of the desired item in the pymedext Document. Returns a dict with the context as key and the extracted value as value.

Example: extract of a pymedext Document

    {
        "type": "motif",
        "value": "non évocateur",
        "span": [2026, 2039],
        "source": "annotator_section_img",
        "source_ID": "eae2fd1e-8096-11ea-9260-e470b8d2ff7c",
        "isEntity": false,
        "attributes": "ISCOVID",
        "id": "eae2fd1c-8096-11ea-b180-e470b8d2ff7c"
    }

For this extract, pymedextDocument.toDoccanoImaPrecision(type="motif", attribute="ISCOVID") returns a dict that contains:

    {…, <raw_text[2026,2039]>: "non évocateur", …}

where raw_text[2026,2039] is the context of the extraction, a short extract of the report text around the extraction.

Parameters
  • type – the type of the item (“rubrique” or “motif”)

  • attribute – item of interest (ex : “ISCOVID”)

Returns

a dict with litteral context as key and value extracted as value

Return type

dict

toDoccanoImaRappel(dict_regexp_type, value, span)[source]

Specific method for scanner extractor evaluation. Finds the absent items in a pymedext Document.

Parameters
  • dict_regexp_type – a dict with the item as key (ex: “ISCOVID”) and its type as value (“motif” or “rubrique”)

  • value – “Null”

Returns

A dict with the report text as key and a list of absent item as value

Return type

dict

toDoccanoPb(path_to_doc, regexp)[source]

Specific method for scanner extractor evaluation. A specific format to display documents that were annotated with the label “problem” in Doccano.

Parameters
  • path_to_doc – the path to the doc that was annotated with the problem label in Doccano

  • regexp – the item of interest

Returns

a dict with the text report as key and the regex as value

pymedextcore.document module

class pymedextcore.document.Document(raw_text, ID=None, attributes=None, source=None, pathToconfig=None, documentDate=None)[source]

Bases: object

Document is the main class of pymedext. It is used to load files and annotate them with annotators.

annotate(annotator)[source]

Main function to annotate Document

Parameters

annotator – list of annotators

Returns

the Document, with the annotations added by _annotate

Return type

Document

static from_dict(d)[source]

Create a Document from a dict of a document (as created using to_dict).

Parameters

d – Dict

Returns

Document

Return type

Document

get_annotation_by_id(_id)[source]
get_annotations(_type, source_id=None, target_id=None, attributes=None, value=None, span=None)[source]

Return the annotations of a specific type. Results can be filtered by type, source_id, target_id, span, attributes and value.

Parameters
  • _type – annotation type

  • source_id – annotation source id

  • target_id – annotation target id

  • attributes –

  • value –

  • span –
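The filtering described for get_annotations can be pictured as successive optional tests applied to a list of annotation records. The sketch below works on plain dicts and is not the Document implementation: `filter_annotations` is a hypothetical helper, the third record and the shortened source_ID values are made up for illustration, and exact-match comparison is an assumption.

```python
from typing import List, Optional, Sequence

annotations = [
    {"type": "QuickUMLS", "value": "oesophagite", "span": (188, 199), "source_ID": "a"},
    {"type": "regex", "value": "grade II", "span": (212, 220), "source_ID": "b"},
    # Made-up record, added only so that the filters have something to exclude.
    {"type": "regex", "value": "grade III", "span": (300, 309), "source_ID": "b"},
]


def filter_annotations(anns: List[dict], _type: str,
                       source_id: Optional[str] = None,
                       value: Optional[str] = None,
                       span: Optional[Sequence[int]] = None) -> List[dict]:
    """Hypothetical helper: keep annotations matching every given filter."""
    selected = []
    for ann in anns:
        if ann["type"] != _type:
            continue
        if source_id is not None and ann["source_ID"] != source_id:
            continue
        if value is not None and ann["value"] != value:
            continue
        if span is not None and tuple(ann["span"]) != tuple(span):
            continue
        selected.append(ann)
    return selected


all_regex = filter_annotations(annotations, "regex")
grade_two = filter_annotations(annotations, "regex", value="grade II")
```

A filter left at None is ignored, so calling the helper with only a type returns every annotation of that type.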

get_graph()[source]

Return the graph associated with the raw_text.

get_relation_by_id(_id)[source]
get_relations(_type=None, head_id=None, target_id=None)[source]

Returns relations of a specific type from source. Can filter by type, head_id or target_id.

Parameters
  • _type – annotation type

  • head_id – annotation source id

  • target_id – annotation target id

load_annotations_files(pathToconfig)[source]

Transform json Pymedext to Document

Parameters

pathToconfig – list of paths to json files

Returns

add annotations to Document

Return type

Document

raw_text()[source]

return the Document raw_text

Returns

raw_text

Return type

string

to_dict()[source]

Transform Document to a PyMedExt dict.

TODO: Need to add the Document date if available, the processing date, and the annotators used.

Returns

json PyMedExt

Return type

dict

to_json()[source]

transform annotations to a json

Returns

transform annotation to json

Return type

json

write_json(pathToOutput)[source]

Transform Document to json PyMedExt

Parameters

pathToOutput – path to result file

Returns

None

Return type

None

pymedextcore.fhirtransform module

class pymedextcore.fhirtransform.FHIR[source]

Bases: pymedextcore.datatransform.DataTransform

load_xml()[source]
Parameters

fhir_input – file name of a fhir file

Returns

Document

Return type

PyMedExt Document

pymedextcore.ncbisource module

class pymedextcore.ncbisource.PubTatorSource(host='https://www.ncbi.nlm.nih.gov/research/pubtator-api/publications/export/')[source]

Bases: pymedextcore.source.Source, pymedextcore.connector.SimpleAPIConnector

Connection to the PubTator API, currently https://www.ncbi.nlm.nih.gov/research/pubtator-api/publications/export/

get_pmids_annotations(pmid_list, Bioconcept='', returnFormat=0)[source]

Return a set of pmid articles from PubTator

Parameters
  • pmid_list – a list of article PMIDs

  • Bioconcept – Default (leave it blank) includes all bioconcepts. Otherwise, the user can choose gene, disease, chemical, species, proteinmutation, dnamutation, snp, and cellline.

  • returnFormat – 0 returns a PyMedExt Document, 1 returns a Bioc Document

Returns

PyMedExt Document or Bioc Document
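As an illustration of how a PMID list and an optional bioconcept could map onto a request against the export endpoint above, here is a minimal sketch. build_export_url is a hypothetical helper, not part of pymedextcore. The format segment (biocjson) and the query parameter names (pmids, concepts) follow the PubTator export API as publicly described, and should be treated as assumptions. How PubTatorSource builds its request internally is not stated here. The sketch only builds the URL and performs no network call.

```python
from urllib.parse import urlencode

PUBTATOR_EXPORT = (
    "https://www.ncbi.nlm.nih.gov/research/pubtator-api/publications/export/"
)


def build_export_url(pmid_list, bioconcept="", fmt="biocjson",
                     host=PUBTATOR_EXPORT):
    """Hypothetical helper: build an export URL for a list of PMIDs."""
    # PMIDs are sent as one comma-separated value.
    params = {"pmids": ",".join(str(pmid) for pmid in pmid_list)}
    # An empty bioconcept means "all bioconcepts", so the parameter is omitted.
    if bioconcept:
        params["concepts"] = bioconcept
    # safe="," keeps the comma separators readable instead of percent-encoding them.
    return f"{host}{fmt}?{urlencode(params, safe=',')}"


url = build_export_url([28483577, 28483578], bioconcept="gene")
```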

pymedextcore.normalize module

class pymedextcore.normalize.normalize[source]

Bases: object

static uri(Document, otherSegments=['drwh_family', 'hypothesis'], rootNode='drwh_sentences', filterEntities=['drugs_fast', 'cui'])[source]

uri Normalization

Parameters
  • Document

  • otherSegments

  • rootNode

  • filterEntities

Returns

Return type

pymedextcore.omopsource module

class pymedextcore.omopsource.OmopSource(DB_host, DB_name, DB_port, DB_user, DB_password)[source]

Bases: pymedextcore.source.Source, pymedextcore.connector.PostGresConnector

Connection to a PostgreSQL OMOP source

getLastNotenlpid()[source]
saveToSource(table_person, table_note, table_note_nlp)[source]

Generic method to save data to a specific source

class pymedextcore.omopsource.StringIteratorIO(iter: Iterator[str])[source]

Bases: io.TextIOBase

read(n: Optional[int] = None) → str[source]

Read at most n characters from stream.

Read from underlying buffer until we have n characters or we hit EOF. If n is negative or omitted, read until EOF.

readable() → bool[source]

Return whether object was opened for reading.

If False, read() will raise OSError.
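The read() and readable() behaviour described above can be sketched with a self-contained stand-in. IteratorTextStream is a hypothetical re-implementation written for illustration, not the library's StringIteratorIO code. It pulls chunks from an iterator of strings on demand, so the whole text never has to be held in memory at once.

```python
import io
from typing import Iterable, Optional


class IteratorTextStream(io.TextIOBase):
    """Hypothetical stand-in: a read-only text stream backed by an iterator of strings."""

    def __init__(self, chunks: Iterable[str]):
        self._chunks = iter(chunks)
        self._buffer = ""

    def readable(self) -> bool:
        return True

    def _take(self, n: Optional[int] = None) -> str:
        # Refill the buffer until it is non-empty or the iterator is exhausted.
        while not self._buffer:
            try:
                self._buffer = next(self._chunks)
            except StopIteration:
                break
        # Slicing with None returns the whole buffer.
        piece = self._buffer[:n]
        self._buffer = self._buffer[len(piece):]
        return piece

    def read(self, n: Optional[int] = None) -> str:
        parts = []
        if n is None or n < 0:
            # Negative or omitted n: drain the iterator until EOF.
            while True:
                piece = self._take()
                if not piece:
                    break
                parts.append(piece)
        else:
            # Collect pieces until n characters are gathered or EOF is hit.
            while n > 0:
                piece = self._take(n)
                if not piece:
                    break
                n -= len(piece)
                parts.append(piece)
        return "".join(parts)


stream = IteratorTextStream(f"row {i}\n" for i in range(3))
first_row = stream.read(6)   # "row 0\n"
rest = stream.read()         # remaining rows up to EOF
```

A stream of this kind can be handed to any consumer that expects a file-like object with read(), which is the usual reason for wrapping an iterator this way.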

pymedextcore.omopsource.clean_csv_value(value: Optional[Any]) → str[source]

pymedextcore.omoptransform module

class pymedextcore.omoptransform.omop[source]

Bases: pymedextcore.datatransform.DataTransform

buildNoteNlP(dict_note, note_id, note_nlp_id, nlp_workflow, thisTime, filterType, dataframe=False)[source]
generateNote(to_omop_note, to_date, note_event_id, note_event_field_concept_id, note_type_concept_id, note_class_concept_id, note_title, encoding_concept_id, language_concept_id, provider_id, visit_detail_id, note_source_value)[source]
generateNoteNLP(to_omop_nlp, to_date, note_nlp_id, note_id, section_concept_id, note_nlp_concept_id, note_nlp_source_concept_id, nlp_workflow, term_exist, entity)[source]
generatePerson(to_omop_person, gender_concept_id, year_of_birth, month_of_birth, day_of_birth, birth_datetime, death_datetime, race_concept_id, ethnicity_concept_id, location_id, provider_id, care_site_id, person_source_value, gender_source_value, gender_source_concept_id, race_source_value, race_source_concept_id, ethnicity_source_value, ethnicity_source_concept_id)[source]
load(server)[source]

Generic method to save another format into a PyMedExt Document

Returns

PyMedext Document

pymedextcore.pymedext module

pymedextcore.pymedext_cmdline module

pymedextcore.pymedext_cmdline.export(thisDoc, otype, rawFileName, bexclude)[source]
pymedextcore.pymedext_cmdline.loadFile(inputfile, folder, rawFileName, itype)[source]
pymedextcore.pymedext_cmdline.main()[source]

Simple program that greets NAME for a total of COUNT times.

pymedextcore.source module

class pymedextcore.source.Source[source]

Bases: object

Abstract class to extend in order to implement a specific source connector. See OmopSource for an example.

static loadFromSource()[source]

Generic method to download data from a source

static saveToSource()[source]

Generic method to save data to a specific source
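The extension pattern described for Source can be sketched as follows. The Source class below is a local stand-in that only mirrors the two static method names listed above. InMemorySource and its records argument are hypothetical and exist purely for illustration. A real connector would talk to a database or an API instead of a list.

```python
class Source:
    """Stand-in mirroring the two static methods listed above; not the real base class."""

    @staticmethod
    def loadFromSource():
        raise NotImplementedError

    @staticmethod
    def saveToSource():
        raise NotImplementedError


class InMemorySource(Source):
    """Hypothetical connector that keeps records in a class-level list."""

    _records = []

    @staticmethod
    def loadFromSource():
        # Return a copy so callers cannot mutate the stored records.
        return list(InMemorySource._records)

    @staticmethod
    def saveToSource(records=()):
        # Subclasses may add their own parameters, as OmopSource.saveToSource does.
        InMemorySource._records.extend(records)


InMemorySource.saveToSource(["note 1", "note 2"])
loaded = InMemorySource.loadFromSource()
```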

Module contents