This guideline provides a standard method for annotating medication information in French electronic health records. It is strongly inspired by the Preliminary Annotation Guidelines of the 2009 i2b2 Medication Extraction Challenge. For each patient report provided, the goal is to extract information about all of the medications that are known to be taken by the patient or that are related to the patient. Some of the medications are provided in semi-structured (list) form, e.g., in sections labeled “medications on admission” or “medications at discharge”. The final list of the patient's medications should include these; however, the real interest is in extracting medications that are mentioned in the narrative parts of the records.
The input for medication annotation is a set of discharge summaries from electronic health records, pre-annotated by a rule-based system. These discharge summaries are free text. Annotation is done with “brat” [1].
The output created from these annotations is a list of medications and their associated information. For each listed medication, the following information must be annotated if it is missing from the pre-annotation, or corrected if the pre-annotation is wrong:
Each entity can be annotated with attribute markers:
Annotate even when there are spelling mistakes, unless those mistakes could cause confusion.
In this document, accents and some punctuation marks have been removed from the French example sentences, for consistency with our first study on the extraction of medication information.
All medications listed in the discharge summary that are given (present, past or future) to an experiencer or contraindicated for one.
Drug names, generics, classes of medication or substances.
Medications include:
Medications exclude:
Classes include:
Classes exclude:
“traitement” without precision
action followed by “par”
class included in medical device
To annotate a medication, the text has to include an explicit statement indicating that the patient took the medication, is taking it, is prescribed it, is advised to take it, had a side effect from it, or cannot take it because of a contraindication.
For a medication that is suggested or uncertain, a certainty attribute “suggested”/“uncertain” must be added. For a medication that is not taken or not given, a certainty attribute “negated” must be added (relations on a negated drug can still be annotated, e.g., the relation between avk and the duration must be annotated for “pas d’avk pendant 2 jours”). For a medication mentioned as a contraindication, a certainty attribute “contraindicated” must be added. If a medication concerns another person, it must be annotated with an experiencer attribute (“family”/“other”).
Annotate the complete noun phrase that corresponds to the name of the medication, e.g., amoxicilline acide clavulanique. Annotate even if there are spelling mistakes. Do not include words such as “injectable”, “creme”, “nebuliseur” or “solution” as part of the medication name, even when they appear immediately after it, e.g., selenium injectable, xylocaïne nebuliseur. Do not include numeric information as part of the medication name, e.g., renutril 500, unless it identifies a kind of the substance, e.g., iodure 131.
Pronouns that refer to a drug must not be annotated; their attributes are related to the element they refer to.
Each coreference to a medication (class or drug name) or to its generics in the same sentence must be annotated, even if it contains spelling mistakes.
If a drug is mentioned both as a medication and as a class in the same sentence, annotate both (one as a class, one as a drug). When medications are associated with a class in the same sentence, create one “drug” annotation per medication:
Enumerated medications that share a word must be annotated together:
But :
Annotate drug names even if their attributes are negated.
The amount of a single medication used in each administration, e.g., un comprimé, une dose, 30 mg.
The numeric and/or textual information that marks the amount and the unit of a medication used in a single administration. Annotate the relation with the drug concerned as a link from the dosage to the drug name.
Includes (not exhaustive):
Exclude:
A dose that is negated while the drug itself is given: do not annotate the dose.
Cumulative dosages (their meaning is too variable):
Annotate every dosage mentioned for every medication in the discharge summary, together with its relation to that medication, even if the dosage is part of the medication name.
Annotate every partial dosage as “Dosage”.
Annotate different ways of referring to the same dosage as separate entries:
Annotate immediately adjacent parts of a dosage as separate entries:
Annotate a dosage range as one entry. In this example, there are multiple dosages for the same drug, but in different sentences:
When a dosage applies to several drugs, annotate only one pattern (ordonnance prescription) covering all of them:
Terms, phrases, or abbreviations that describe how often each dose of the medication should be taken.
Any expression that indicates the frequency of administering a single dose of a medication should be annotated.
Includes :
Frequency :
Temporal phrases which specify when a medication should be taken (These tend to be prepositional phrases. Preposition should be included in the extracted information):
Apply the same basic principles that you use for tagging dose. Annotate each frequency, even if it is repeated in the same sentence.
Annotate immediately adjacent parts of a frequency as one entry:
If a frequency is split into segments that concern the same entity, annotate the most informative segment:
An elapsed-time expression that indicates how long the medication is to be administered. Such expressions are often noun phrases, prepositional phrases, or clauses.
Expressions that describe the total time period for which the medication should be taken at a given dose. In case of medications that are stopped, the duration indicates for how long the medication has been stopped.
Includes:
Time expressions:
Excludes:
Time expressions that indicate when each dose should be taken. Include these under frequency.
Time expressions for starting or stopping a medication:
Cumulative dosages (their meaning is too variable):
Follow the same basic principles as for annotating frequency. Do not include the complete prepositional phrase. A duration must be annotated with a temporal attribute; if it is missing, “present” is taken as the default temporal attribute.
Describes the method for administering the medication.
Text that expresses mode/route of administration, even when it is expressed as part of the medication name or the dosage.
Includes:
per os
Follow the same basic principles as for annotating duration. If a route applies to multiple medications, add a relation for each. Multiple routes can be related to one drug name.
Changes in mode of administration of a drug should be included as separate entries.
Different ways of referring to the same mode of administration should be included in separate entries.
Cases where one mode applies to multiple medications need to be handled properly.
Expressions that indicate the condition under which the medication is to be given. Such expressions are often conditional clauses and start with a conditional expression such as “si”, “en cas de”, “en fonction de”…
Condition for which the medication is to be given.
Includes:
Always annotate the most informative base adjective phrase or the longest base noun phrase as the condition for the medication. The longest base noun phrase has the form (det* adj* N+ adj*). The longest adjective phrase often occurs as (adj+). Do not include complex phrases or coordinated phrases. Instead, extract the base phrase from them, even when this means you end up with multiple conditions. A condition can be related to a drug name or to an event.
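The noun phrase shape (det* adj* N+ adj*) can be read as a pattern over part-of-speech tags. The following is a minimal sketch, assuming the sentence has already been tagged by some external tagger; the tag names (`DET`, `ADJ`, `NOUN`), the function name and the example sentence are hypothetical and are not part of the guideline. Matching is leftmost and greedy, which is only an approximation of "longest".

```python
import re

# Hypothetical one-letter codes for the POS tags of an external tagger.
TAG_CODES = {"DET": "D", "ADJ": "A", "NOUN": "N"}

# det* adj* N+ adj*, written over the one-letter codes.
BASE_NP = re.compile(r"D*A*N+A*")


def base_noun_phrases(tagged_tokens):
    """Return the word lists whose tags match det* adj* N+ adj*."""
    # Any tag outside the pattern becomes "x", so that it interrupts a match.
    codes = "".join(TAG_CODES.get(tag, "x") for _, tag in tagged_tokens)
    phrases = []
    for match in BASE_NP.finditer(codes):
        words = [word for word, _ in tagged_tokens[match.start():match.end()]]
        phrases.append(words)
    return phrases


# Illustrative sentence (accents removed, as elsewhere in this document).
sentence = [
    ("si", "SCONJ"), ("la", "DET"), ("douleur", "NOUN"),
    ("thoracique", "ADJ"), ("persiste", "VERB"),
]
print(base_noun_phrases(sentence))
```

A human annotator still decides which candidate is the most informative; the sketch only proposes spans with the right shape.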
A certainty attribute “conditional” must not be added to the entity concerned by the condition.
If different conditions are mentioned for the same medication, include one entry per condition and add a relation with the entity for each. When multiple medications are given under the same condition, list the condition with all of the medications and add a relation for each.
If a condition is composed of multiple sub-conditions (separated by “et”), annotate them together as one entry.
Different ways of referring to the same condition for a medication should be treated as separate conditions. Add an “is_equiv” relation between them, and a relation from the closest one to the related entity.
Information on whether the medication is started, stopped, continued, increased or decreased at a defined time. This information is usually expressed in the main verb of the sentence or by a date. Annotate the event indicated by the most precise date or, if there is none, by the main word related to the medication.
A date of starting, stopping, continuing, increasing or decreasing a medication. If it is missing, annotate the main word (verb, noun…) that signals the event, such as “mise en route”, “début”, “poursuite”, “relais”…
“Arg” of a medication: links the event to the entity (the medication) affected by the event. A medication can have multiple related events.
Choose from possible values: start, stop, continue, start-stop, increase or decrease.
By default, “present” is the temporal attribute of events. It can be “past”, “present” or “future”, depending on whether the event occurs before, during or after the current hospitalization.
If there are two events on the same expression, annotate the expression twice, once per event. This applies even if both events are of the same type: for two start events, annotate the expression twice with a “start” event.
Switching one medication for another includes two events on the same expression. One medication is stopped and another one is started.
If there are two events for an entity, include two separate entries. Make each entry as specific and complete as possible. If there are multiple medications for one event, include separate entries for each. Add a relation for each.
Information that indicates when events are to take place, and whether they are factual, suggested, conditional or uncertain.
Information about whether the medication was administered in the past, is being administered currently, or will be administered in the future, to the extent that this information is expressed in the tense of the verbs and auxiliary verbs used to express events. Use one temporal attribute for each event.
Events and durations can be marked.
Choose from possible values for each event: past, present, future. The default temporal attribute is “present”.
See Duration and Events for examples
Information on whether the event occurs. Certainty can be expressed by uncertainty words, e.g., “suggested”, or via modals, e.g., “should” indicates suggestion.
Choose from possible values: conditional, suggestion, factual, uncertain or negated. The default certainty attribute is “factual”.
See previous chapter for examples
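The two defaults (“present” for the temporal attribute, “factual” for certainty) can be applied mechanically when reading annotations back. The sketch below assumes attribute lines in the form described in the output format part of this guideline (ID, TAB, attribute class, target ID, value). The attribute class name `temporal` and the function name are assumptions; only `certainty` appears in the guideline's own example.

```python
# Default values stated in the guideline.
DEFAULTS = {"temporal": "present", "certainty": "factual"}


def effective_attributes(annotation_id, standoff_lines):
    """Return the temporal and certainty values of one annotation, with defaults filled in."""
    values = dict(DEFAULTS)
    for line in standoff_lines:
        if not line.startswith("A"):
            continue  # only attribute lines are relevant here
        # Attribute line layout: ID <TAB> class target value
        _, body = line.rstrip("\n").split("\t", 1)
        attr_class, target, value = body.split()
        if target == annotation_id and attr_class in values:
            values[attr_class] = value
    return values


lines = ["A1\tcertainty T109 uncertain"]
print(effective_attributes("T109", lines))  # certainty set explicitly, temporal defaulted
print(effective_attributes("T99", lines))   # no attribute line: both defaults apply
```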
All medications and their attributes listed in the discharge summary that are given (present, past or future) to an experiencer or contraindicated for one.
There are two kinds of pattern: medication prescription (medication blob) and ordonnance prescription (ordonnance blob). A drug (and its equivalents) together with all of its attributes must be annotated as a medication prescription. A sentence with multiple drugs and their attributes must be annotated as an ordonnance prescription. If an attribute is related to multiple drugs, it must not be placed in a medication prescription but only in the ordonnance prescription.
The pattern runs from the first word (medication, attribute or event) to the last. The event type must be applied to the related prescription pattern. If a drug has no attributes, do not create a prescription pattern.
Examples :
If an attribute or event is related to multiple drugs, include it only in the “ordonnance_blob” pattern that contains those drugs. Here “toujours” is related to the ordonnance.
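The "first word to last word" rule for a prescription pattern amounts to taking the smallest start offset and the largest stop offset among the drug and its related elements. A minimal sketch, in which the function name and the (start, stop) tuple representation are assumptions made for illustration:

```python
def blob_span(drug_span, related_spans):
    """Span of a prescription pattern, from its first element to its last.

    Returns None when the drug has no attribute or event, since no
    prescription pattern is created in that case.
    """
    if not related_spans:
        return None
    spans = [drug_span, *related_spans]
    start = min(s for s, _ in spans)
    stop = max(e for _, e in spans)
    return start, stop


# Offsets of "solumedrol" and "180 mg" from the entity examples in this guideline.
print(blob_span((1760, 1770), [(1771, 1777)]))
print(blob_span((1760, 1770), []))
```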
The offsets of each piece of information are generated automatically by “brat”. Offsets start at 0 for the first character of the discharge summary. Each character, including a space, counts as one character. A line break counts as 2 characters.
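The counting rule can be checked by hand with a few lines of code. This is a sketch only: it assumes the text has been loaded with line breaks normalised to `\n`, and that the stop offset is exclusive, as the entity examples in this guideline suggest (a 10-character name spans 1760 to 1770). The function name and the sample text are hypothetical.

```python
def brat_style_offsets(text, mention, newline_width=2):
    """Return (start, stop) for the first occurrence of `mention`, or None.

    Offsets are 0-based, stop is exclusive, and every line break counts
    as `newline_width` characters.
    """
    index = text.find(mention)
    if index < 0:
        return None
    # Each "\n" before the mention adds (newline_width - 1) extra characters.
    start = index + text.count("\n", 0, index) * (newline_width - 1)
    # Line breaks inside the mention itself are widened in the same way.
    length = len(mention) + mention.count("\n") * (newline_width - 1)
    return start, start + length


text = "Traitement :\nsolumedrol 180 mg"
print(brat_style_offsets(text, "solumedrol"))
print(brat_style_offsets(text, "180 mg"))
```

Setting `newline_width=1` gives plain character offsets, which is useful for comparing against a file that uses single-character line breaks.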
Each entry consists of one annotation (entity or relation) and is printed on its own line with its offsets. All annotations follow the same basic structure: each annotation is given an ID that appears first on the line, separated from the rest of the annotation by a single TAB character. The rest of the structure varies by annotation type.
T# “type of entity” start stop “Entity”
T1 drugs 1760 1770 solumedrol
T2 dose 1771 1777 180 mg
T# “type of entity” start1 stop1;start2 stop2 “Entity”
R# “type of relation” Arg1:fromID Arg2:toID
R1 is_dosage Arg1:T2 Arg2:T1
E# “Event type”:“Event ID” Arg:toID
T100 start 3915 3927 introduction
E1 start:T100 Arg:T99
A# “Attribute class” “ID of Entity marked” “Attribute name”
A1 certainty T109 uncertain
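The four line shapes above can be read back with a small parser. This is a minimal sketch, not the tool's own reader: it assumes the usual brat standoff layout, in which a TAB follows the ID and, for entities, a second TAB separates the offsets from the covered text. The `Entity` class, the dictionary keys and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    id: str
    type: str
    spans: list  # [(start, stop), ...]; several pairs for a discontinuous entity
    text: str


def parse_line(line):
    """Parse one standoff line into an Entity or a plain dict, based on the ID prefix."""
    ann_id, rest = line.rstrip("\n").split("\t", 1)
    kind = ann_id[0]
    if kind == "T":
        header, text = rest.split("\t", 1)
        ann_type, span_part = header.split(" ", 1)
        # "start1 stop1;start2 stop2" becomes [(start1, stop1), (start2, stop2)]
        spans = [tuple(int(n) for n in pair.split()) for pair in span_part.split(";")]
        return Entity(ann_id, ann_type, spans, text)
    if kind == "R":
        rel_type, arg1, arg2 = rest.split()
        return {"id": ann_id, "type": rel_type,
                "from": arg1.split(":", 1)[1], "to": arg2.split(":", 1)[1]}
    if kind == "E":
        trigger, *args = rest.split()
        event_type, trigger_id = trigger.split(":", 1)
        return {"id": ann_id, "type": event_type, "trigger": trigger_id,
                "args": [a.split(":", 1)[1] for a in args]}
    if kind == "A":
        attr_class, target, value = rest.split()
        return {"id": ann_id, "class": attr_class, "target": target, "value": value}
    raise ValueError(f"unknown annotation ID: {ann_id}")


print(parse_line("T1\tdrugs 1760 1770\tsolumedrol"))
print(parse_line("R1\tis_dosage Arg1:T2 Arg2:T1"))
print(parse_line("E1\tstart:T100 Arg:T99"))
print(parse_line("A1\tcertainty T109 uncertain"))
```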
[1] Pontus Stenetorp, Sampo Pyysalo, Goran Topić, Tomoko Ohta, Sophia Ananiadou and Jun’ichi Tsujii (2012). brat: a Web-based Tool for NLP-Assisted Text Annotation. In Proceedings of the Demonstrations Session at EACL 2012.