FASDIM stands for Fast And Simple De-Identification Method. It is a method designed for the automated removal of PHI (Protected Health Information) from free-text documents. Although it is based on pattern matching, its originality is that no list of authorized words is needed beforehand: such a list can be built in the course of the method. If you intend to de-identify French letters, you will save time, as a list of authorized words ships with the source code. This software is open source and can be downloaded from this page.
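FASDIM itself is written in PHP and its exact rules are described in the paper. The following is only a minimal Python sketch of the general idea described above, under stated assumptions: every word that is not in the authorized list is replaced, a simple date pattern is scrubbed as well, and the list is grown by manually reviewing the most frequent words that are not yet authorized. The placeholder `XXX`, the date pattern and the function names `scrub` and `review_candidates` are hypothetical, not part of the FASDIM distribution.

```python
import re
from collections import Counter

PLACEHOLDER = "XXX"  # hypothetical replacement marker

# A "word" is a run of letters (accented letters included, digits excluded).
WORD_RE = re.compile(r"[^\W\d_]+")
# Hypothetical extra pattern: numeric dates such as 12/03/2015 or 12-03-15.
DATE_RE = re.compile(r"\b\d{1,2}[/.-]\d{1,2}[/.-]\d{2,4}\b")


def scrub(text: str, authorized: set) -> str:
    """Keep authorized words; replace every date and every other word."""
    text = DATE_RE.sub(PLACEHOLDER, text)

    def keep_or_drop(match: re.Match) -> str:
        word = match.group(0)
        return word if word.lower() in authorized else PLACEHOLDER

    return WORD_RE.sub(keep_or_drop, text)


def review_candidates(texts, authorized: set, top_n: int = 20):
    """Most frequent words not yet authorized, to be reviewed by a human.

    Words judged harmless are added to `authorized` and the loop is
    repeated, so the list is built in the course of the process.
    """
    counts = Counter(
        w.lower()
        for t in texts
        for w in WORD_RE.findall(t)
        if w.lower() not in authorized
    )
    return counts.most_common(top_n)
```

For instance, with `authorized = {"a", "été", "hospitalisé", "le", "pour", "une", "pneumopathie"}`, the letter `"Monsieur Dupont a été hospitalisé le 12/03/2015 pour une pneumopathie."` becomes `"XXX XXX a été hospitalisé le XXX pour une pneumopathie."`. Because unknown words are removed by default, a forgotten name is deleted rather than leaked; the price is over-scrubbing of legitimate words that are not yet in the list.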

Is FASDIM the method I need?

FASDIM is probably the method you need if you are in one of the following situations:

  • you don't have any de-identification software, and no existing method handles your language
  • you have 40 hours (everything included) to anonymize 100,000 free-text discharge letters (in DOC or TXT format)
  • or you have a smaller collection of documents 

FASDIM is probably not the best option if:

  • other methods are widely available (e.g. for English)
  • you already have a training set of manually de-identified and annotated letters (in that case, machine-learning approaches are preferable)
  • you want to automatically annotate the documents (i.e. to precisely tag first names, last names, dates, etc.)
  • you have more than 5,000,000 letters to de-identify without spending any manual time and with perfect accuracy

Does FASDIM obtain good results?

The FASDIM method has been published in the International Journal of Medical Informatics (IJMI, IF=2.061). The detailed results are available in the scientific paper (free full text). The main results are:

  • Accuracy: 
    • Recall: 98.1% of the PHIs (Protected Health Information) are deleted; among the remaining PHIs, 63.7% are places, 23% are healthcare professionals' names, and none are patient names
    • Precision: 89.2% of the deleted terms are PHIs
    • Harmonic mean (F-measure): 93.4%
  • Safe over-scrubbing: although some words are erroneously deleted, the medical meaning of the reports is preserved:
    • 99.02% of the medical terms are kept (not erroneously deleted), and more specifically
    • 99.49% of the diagnoses 
    • 99.66% of the medical procedures
  • Fast and simple implementation: 
    • implementing the method from scratch (without any preexisting material) took 40 hours, software development included, to de-identify 27,000 letters with the best accuracy
    • however, if you have to de-identify French letters, the source code and a list of authorized words are already available!
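The harmonic mean reported above combines precision P and recall R as F = 2PR / (P + R). As a quick check, the short sketch below recomputes it from the reported rates. The helper names are illustrative; `tp`, `fp` and `fn` stand for true positives (PHI terms deleted), false positives (non-PHI terms deleted) and false negatives (PHI terms left in place), counted against a manually annotated reference.

```python
def precision(tp: int, fp: int) -> float:
    """Share of deleted terms that really are PHI."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Share of PHI terms that were deleted."""
    return tp / (tp + fn)


def f_measure(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    return 2 * p * r / (p + r)


# Reported rates: precision 89.2%, recall 98.1%.
print(round(100 * f_measure(0.892, 0.981), 1))  # 93.4
```

The harmonic mean is pulled towards the lower of the two rates, which is why the overall score sits closer to the precision than to the recall.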

How can I get FASDIM?

Two steps:

This distribution works in the following environment:

  • MS Windows
  • with a MySQL database
  • with PHP installed (a web server is not necessary, only the CLI mode is used)
  • input: Word *.doc documents or *.txt plain-text files

If you are familiar with PHP, adapting the code to other environments should be easy.
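When porting the method, the outer batch loop is independent of the scrubbing rules. The Python sketch below is a hypothetical example of that loop, not part of the PHP distribution: it reads every *.txt letter in a folder, applies any function that takes the text of one letter and returns its de-identified version, and writes the result to an output folder. Word *.doc files would first need converting to text, which is not shown; the folder layout, the function name `deidentify_folder` and the UTF-8 default are assumptions (letters exported on Windows may instead be in cp1252).

```python
from pathlib import Path
from typing import Callable


def deidentify_folder(src: Path, dst: Path, scrub: Callable[[str], str],
                      encoding: str = "utf-8") -> int:
    """Write a scrubbed copy of every *.txt file in `src` to `dst`.

    Returns the number of letters processed. Other file types are ignored.
    """
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(src.glob("*.txt")):
        text = path.read_text(encoding=encoding)
        # Same file name in the output folder, so letters can be matched up.
        (dst / path.name).write_text(scrub(text), encoding=encoding)
        count += 1
    return count
```

Keeping the scrubbing rule as a parameter means the same loop can be reused while the list of authorized words evolves between runs.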

How can I learn more?

To learn more about FASDIM, you can:

Copyright 2001-2016
Page generated on 10/10/2016 at 21:27:37