FASDIM stands for Fast And Simple De-Identification Method. It is a method designed for the automated removal of PHI
(Protected Health Information). Although it is based on pattern matching, the originality of the method is that no list
of authorized words is needed beforehand: such a list can be built in the course of the method. However, if you intend
to de-identify French letters, you will save some time, as a list of authorized words is already distributed with the source code.
This piece of software is open source and can be downloaded on this page.
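To make the "authorized words" idea concrete, here is a minimal Python sketch of whitelist-based scrubbing. It is an illustration under stated assumptions, not the FASDIM source code (which is written in PHP): the placeholder `XXX`, the tokenization rule and the function names `scrub` and `unknown_words` are all hypothetical. Every token absent from the authorized list is masked, and the most frequent unknown words are returned so that a human reviewer can grow the list step by step.

```python
import re
from collections import Counter

# Hypothetical placeholder written in place of every non-authorized token.
PLACEHOLDER = "XXX"

# Tokens are runs of Unicode word characters: words (accented letters included)
# and digit runs such as the parts of a date or a phone number.
TOKEN_RE = re.compile(r"\w+")


def scrub(text, authorized):
    """Mask every token that is not in the (lower-case) authorized list."""
    def keep_or_mask(match):
        token = match.group(0)
        return token if token.lower() in authorized else PLACEHOLDER
    return TOKEN_RE.sub(keep_or_mask, text)


def unknown_words(texts, authorized):
    """Count alphabetic tokens missing from the authorized list, most frequent
    first, so a reviewer can decide which ones to add (digit runs stay masked)."""
    counts = Counter()
    for text in texts:
        for token in TOKEN_RE.findall(text):
            token = token.lower()
            if token.isalpha() and token not in authorized:
                counts[token] += 1
    return counts.most_common()


letters = ["Patient Dupont seen for pneumonia.", "Pneumonia treated, patient discharged."]
authorized = {"patient", "seen", "for", "treated", "discharged"}
print(unknown_words(letters, authorized))  # [('pneumonia', 2), ('dupont', 1)]
authorized.add("pneumonia")                # the reviewer accepts a frequent medical word
print(scrub(letters[0], authorized))       # Patient XXX seen for pneumonia.
```

Because anything not explicitly authorized is masked, this kind of design errs toward over-scrubbing rather than leaking identifiers.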
FASDIM is probably the method you need if you are in one of these situations:
- you have no de-identification software, and no existing method handles your language
- you have 40 hours (all included) to anonymize 100,000 free-text discharge letters (in DOC or TXT format)
- or you have a smaller collection of documents
FASDIM is probably not the best option if:
- other methods are widely available (e.g. for English)
- you already have a training set of manually de-identified and annotated letters (then you should prefer machine learning approaches)
- you want to automatically annotate the documents (e.g. to precisely tag the first name, the last name, the date, etc.)
- you need to de-identify more than 5,000,000 letters with almost no manual effort and with perfect accuracy
The FASDIM method has been published in the International Journal of Medical Informatics (IJMI, IF=2.061).
The detailed results are available in the scientific paper (free full-text).
The main results are:
- Accuracy:
- Recall: 98.1% of the PHI items are deleted (of the remaining, missed items, 63.7% are places, 23% refer to healthcare professionals, and none are patient names)
- Precision: 89.2% of the deleted terms are PHI
- Harmonic mean (F-measure): 93.4%
- Safe over-scrubbing: although some words are erroneously deleted, this does not alter the medical meaning of the reports:
- 99.02% of the medical terms are preserved (not erroneously deleted), and more specifically
- 99.49% of the diagnoses
- 99.66% of the medical procedures
- Fast and simple implementation:
- implementing the method from scratch (without preexisting material) and de-identifying 27,000 letters with the best accuracy
took 40 hours in total, software development included
- however, if you have to de-identify French letters, the source code and a list of authorized words are already available!
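The harmonic mean reported above is the usual F-measure, F = 2PR / (P + R), where P is precision and R is recall. A short Python check using only the precision and recall figures listed above reproduces the 93.4% value:

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1 score)."""
    return 2 * precision * recall / (precision + recall)


recall = 0.981     # share of PHI items that were deleted
precision = 0.892  # share of deleted terms that were actually PHI
print(f"{f_measure(precision, recall):.1%}")  # 93.4%
```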
Two steps:
This distribution works in the following environment:
- MS Windows
- with a MySQL database
- with PHP installed (a web server is not necessary, only the CLI mode is used)
- input: Word *.doc documents or *.txt plain-text files
However, if you are familiar with PHP, adapting the code to other environments should be easy.
To learn more about FASDIM, you can: