FASDIM
Presentation
FASDIM stands for Fast And Simple De-Identification Method. It is a method designed for the automated removal of PHI
(Protected Health Information). Although it is based on pattern matching, the originality of the method is that it does
not require a pre-existing list of authorized words: such a list can be built in the course of the method. However, if you
intend to de-identify letters written in French, you will save some time, as a list of words is already provided with the source code.
This software is open source and can be downloaded from this page.
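To make the idea of an authorized-word list concrete, here is a minimal sketch in Python. It is not the FASDIM source code (which is written in PHP) and does not claim to reproduce its rules: it only assumes that words absent from the authorized list are replaced by a placeholder, and that frequent unknown words can be collected for manual review so that the list grows over time. The names `AUTHORIZED`, `PLACEHOLDER`, `scrub` and `candidate_words`, as well as the sample letters, are hypothetical.

```python
import re
from collections import Counter

# Hypothetical starter list of authorized (non-identifying) words.
AUTHORIZED = {"the", "patient", "was", "admitted", "for", "pneumonia", "by", "dr", "in"}

WORD = re.compile(r"[^\W\d_]+")  # runs of letters, accented letters included
PLACEHOLDER = "XXX"              # assumed replacement marker


def scrub(text, authorized):
    """Replace every word that is not in the authorized list."""
    def replace(match):
        word = match.group(0)
        return word if word.lower() in authorized else PLACEHOLDER
    return WORD.sub(replace, text)


def candidate_words(texts, authorized, min_count=2):
    """List unknown words seen at least min_count times, for manual review."""
    counts = Counter(
        w.lower()
        for t in texts
        for w in WORD.findall(t)
        if w.lower() not in authorized
    )
    return [w for w, n in counts.most_common() if n >= min_count]


# Invented sample letters, for illustration only.
letters = [
    "The patient Dupont was admitted for pneumonia.",
    "The patient Martin was admitted for fever by Dr Durand.",
    "Fever in the patient Dupont.",
]

print(scrub(letters[0], AUTHORIZED))
print(candidate_words(letters, AUTHORIZED))
```

In this sketch, a reviewer would add "fever" to the authorized list and reject "dupont". Defaulting to deletion for unknown words favours recall over precision, which is consistent with the over-scrubbing behaviour the page reports.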
Is FASDIM the method I need?
FASDIM is probably the method you need if you are in the following situation:
- you don't have any de-identification software, and no existing method supports your language
- you have 40 hours (all included) to anonymize 100,000 free-text discharge letters (in DOC or TXT format)
- or you have a smaller collection of documents
FASDIM is probably not the best option if:
- other methods are widely available for your language (e.g. for English)
- you already have a learning set of manually de-identified and annotated letters (in that case, you should prefer machine learning approaches)
- you want to automatically annotate the documents (i.e. to tag precisely the first name, the last name, the date, etc.)
- you have more than 5,000,000 letters to de-identify, no time to spend on the task, and a requirement for perfect accuracy
Does FASDIM obtain good results?
The FASDIM method has been published in the International Journal of Medical Informatics (IJMI, IF=2.061).
The detailed results are available in the scientific paper (free full-text).
The main results are:
- Accuracy:
  - Recall: 98.1% of the PHI terms are deleted (63.7% of the remaining terms are places, 23% are healthcare professionals, and 0% are patient names)
  - Precision: 89.2% of the deleted terms are PHI
  - Harmonic mean of recall and precision: 93.4%
- Safe over-scrubbing: although some words are erroneously deleted, this does not alter the medical meaning of the reports:
  - 99.02% of the medical terms are appropriately preserved, and more specifically:
    - 99.49% of the diagnoses
    - 99.66% of the medical procedures
- Fast and simple implementation:
  - the implementation from scratch (without preexisting material) required 40 hours (including software development) to de-identify 27,000 letters with the best accuracy
  - however, if you have to de-identify French letters, the source code and a list of authorized words are already available!
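The harmonic mean reported above can be checked from the recall and precision figures. The short Python check below assumes the usual harmonic mean of two values, 2PR / (P + R); the function and variable names are illustrative only.

```python
def harmonic_mean(precision, recall):
    """Harmonic mean of precision and recall (often called the F-measure)."""
    return 2 * precision * recall / (precision + recall)


recall = 0.981     # share of PHI terms that were deleted
precision = 0.892  # share of deleted terms that were PHI

f_measure = harmonic_mean(precision, recall)
print(f"{f_measure:.1%}")  # prints 93.4%
```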
How can I get FASDIM?
Two steps:
- read and accept the GNU General Public License v3.0
- download the source code, which is ready to run on French letters (it is packaged with a set of example letters
and a list of words; the code itself is written in English)
This distribution works in the following environment:
- MS Windows
- with a MySQL database
- with PHP installed (a web server is not necessary, only the CLI mode is used)
- input: Word *.doc documents or *.txt plain-text files
However, if you are familiar with PHP, it should be easy for you to adapt the code to other environments.
How can I learn more?
To learn more about FASDIM, you can: