Duplicate patient checker

cfapress wrote on Wednesday, June 24, 2009:

I’ve been working on a tool to locate duplicate patients and allow you to merge them together. Earlier today I committed the code to CVS for anyone to check out. It’s not finished yet, and for now it will not commit any changes to the database.

You can find it here:
<oemr>/contrib/util/dupecheck

I expect to be finished with it soon. If you have any suggestions for improvements, let me know and I’ll see about including them.

Basically, here is how it works:
- Choose the fields in which to search for duplicates. It defaults to name and date of birth.
- Choose how you’d like the search results to be sorted.
- Limit your search to the first XXX records of the patient_data table. I included this because our database includes over 19,000 patients.
- Click on Go and the list of duplicates will appear in the box below.
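
The search steps above can be sketched roughly as follows. This is only an illustration in Python with an in-memory SQLite database, not the tool's actual PHP code. The column names fname, lname and DOB, the whitelist, and the function name find_duplicates are assumptions made for the example, and "first XXX records" is assumed to mean the lowest pids.

```python
import sqlite3
from collections import defaultdict

# Assumed column names; only whitelisted names are ever placed in the SQL text.
ALLOWED_FIELDS = {"fname", "lname", "DOB"}

def find_duplicates(conn, fields=("fname", "lname", "DOB"), limit=None):
    """Group pids that share identical values in the chosen fields."""
    for field in fields:
        if field not in ALLOWED_FIELDS:
            raise ValueError(f"unsupported field: {field}")
    sql = f"SELECT pid, {', '.join(fields)} FROM patient_data ORDER BY pid"
    params = ()
    if limit is not None:
        # Only look at the first `limit` records (assumed: ordered by pid).
        sql += " LIMIT ?"
        params = (limit,)
    groups = defaultdict(list)
    for pid, *key in conn.execute(sql, params):
        groups[tuple(key)].append(pid)
    # Keep only the groups that contain more than one patient.
    return {key: pids for key, pids in groups.items() if len(pids) > 1}

# Small made-up data set for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient_data "
    "(pid INTEGER PRIMARY KEY, fname TEXT, lname TEXT, DOB TEXT)"
)
conn.executemany(
    "INSERT INTO patient_data VALUES (?, ?, ?, ?)",
    [
        (1, "Ann", "Lee", "1970-01-01"),
        (2, "Bob", "Ray", "1980-05-05"),
        (3, "Ann", "Lee", "1970-01-01"),
    ],
)
dupes = find_duplicates(conn)
print(dupes)
```

Grouping in Python keeps the sketch short. On a large table a GROUP BY with HAVING COUNT(*) > 1 in the database would likely be the better choice.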

Once you see the duplicates you can click on the ‘?’ button to drill into a patient’s details, or click on a patient name to make that patient the ‘master’ record. The non-master records will be merged into the master: every table that references the PID of a non-master record will be changed to point to the master record instead.

I’m not describing it very well but I hope you get the idea. Basically you choose the patient record to keep and the other duplicates are merged into it.
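
To make the merge idea a bit more concrete, here is a minimal sketch, again in Python with SQLite rather than the tool's real PHP code. The tables forms and billing, the PID_COLUMNS map and the function merge_patients are hypothetical. Deleting the duplicate's patient_data row at the end is also an assumption, since the description above does not say what happens to that row.

```python
import sqlite3

# Hypothetical map of table name -> column holding the patient id.
PID_COLUMNS = {"forms": "pid", "billing": "pid"}

def merge_patients(conn, master_pid, duplicate_pids, pid_columns=PID_COLUMNS):
    """Re-point every row owned by a duplicate patient at the master record.

    Returns the number of rows that were re-pointed.
    """
    moved = 0
    with conn:  # commit on success, roll back if anything raises
        for dup in duplicate_pids:
            if dup == master_pid:
                continue
            for table, column in pid_columns.items():
                cur = conn.execute(
                    f"UPDATE {table} SET {column} = ? WHERE {column} = ?",
                    (master_pid, dup),
                )
                moved += cur.rowcount
            # Assumption: the duplicate demographics row is dropped afterwards.
            conn.execute("DELETE FROM patient_data WHERE pid = ?", (dup,))
    return moved

# Small made-up data set for demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient_data (pid INTEGER PRIMARY KEY, fname TEXT);
    CREATE TABLE forms (id INTEGER PRIMARY KEY, pid INTEGER);
    CREATE TABLE billing (id INTEGER PRIMARY KEY, pid INTEGER);
    INSERT INTO patient_data VALUES (1, 'Ann'), (3, 'Ann');
    INSERT INTO forms VALUES (10, 1), (11, 3);
    INSERT INTO billing VALUES (20, 3);
""")
moved = merge_patients(conn, master_pid=1, duplicate_pids=[3])
print(moved)
```

Running everything inside one transaction means a failure halfway through leaves no patient half-merged, which matters for a tool that rewrites this many tables.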

Jason

cfapress wrote on Wednesday, June 24, 2009:

The dupe-checker is now fully functional. It should be used with extreme care.

Jason

cfapress wrote on Wednesday, June 24, 2009:

Check-that.

I just screwed myself by removing a necessary file. Shit. I’m going to see what can be done to retrieve anything from CVS. The file I meant to delete was dupecheck.php but instead I removed mergerecords.php.

BAH!

Jason

cfapress wrote on Wednesday, June 24, 2009:

OK, the panic is over thanks to CVS on SourceForge.

I was able to locate the ‘dead’ file, retrieve it, commit it back to CVS, and remove the file I originally intended.

Whew –

Jason

bgregg wrote on Thursday, October 18, 2012:

This is a great tool! I’ve used this quite a few times in the older versions and gave it a test in 4.1.1 to see what would happen, and it’s pretty outdated with all of the new tables. Just wanted to give a quick heads up to the community in case anyone wants to give this a try.

kevmccor wrote on Thursday, October 18, 2012:

I don’t know enough about the database to do this, but maybe someone could write an array that maps every table to the name of its patient id field. It would have to be updated whenever a new table is added. Duplicate checks can’t be the only thing this would help.

tmccormi wrote on Thursday, October 18, 2012:

show columns from patient_data like '%pid%';

+--------+--------------+------+-----+---------+-------+
| Field  | Type         | Null | Key | Default | Extra |
+--------+--------------+------+-----+---------+-------+
| pubpid | varchar(255) | NO   |     |         |       |
| pid    | bigint(20)   | NO   | PRI | 0       |       |
+--------+--------------+------+-----+---------+-------+
2 rows in set (0.00 sec)

That works, so a loop over all of the tables could build the table list dynamically. There are some tables that have ‘patient_id’ instead of pid, but they should be fixed anyway.
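
A loop like that might look roughly like the sketch below. It uses Python with SQLite so it can run on its own. On MySQL the same idea would go through SHOW TABLES and SHOW COLUMNS, or a query against information_schema.columns. The function find_pid_columns and the demo tables are hypothetical. It matches exact column names rather than '%pid%', since the pattern also catches pubpid as the output above shows.

```python
import sqlite3

# Column names treated as "the patient id"; patient_id covers the odd tables.
PID_NAMES = ("pid", "patient_id")

def find_pid_columns(conn, names=PID_NAMES, exclude=("patient_data",)):
    """Build a {table: patient id column} map by inspecting the schema."""
    result = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        if table in exclude:
            continue
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        for column in columns:
            if column in names:
                result[table] = column
                break
    return result

# Made-up schema for demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient_data (pid INTEGER PRIMARY KEY, pubpid TEXT);
    CREATE TABLE forms (id INTEGER PRIMARY KEY, pid INTEGER);
    CREATE TABLE notes (id INTEGER PRIMARY KEY, patient_id INTEGER);
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
""")
pid_map = find_pid_columns(conn)
print(pid_map)
```

A map built this way never goes stale when a table is added, but it still depends on every table using one of the known column names.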

-Tony