ICD, rxNorm and Snomed data loads

bradymiller wrote on Saturday, May 05, 2012:

Hi Mac,
Sounds odd. Hard to say what’s going on without seeing the code. Could always throw in a bunch of error_log() debug statements.
-brady
OpenEMR

yehster wrote on Saturday, May 05, 2012:

The best way to identify a performance problem is to profile code.  Use xdebug and wincachegrind or kcachegrind depending on your OS of choice.  Google for details.
It’s more involved initially, but ultimately a better approach because everything else is pretty much just guess work.

mcaloon wrote on Friday, May 11, 2012:

Hello there,
     I got to the bottom of that performance problem. The Kcachegrind/xdebug tool combo did the trick. The culprit was the ICD9 load was using a non-indexed column. That load, after the index was created, runs in a few seconds.

    I am about to commit the ICD code to the hub but wanted to post a review of the approach taken. The general semantics of the data load user interface remain unchanged. So the user still places the zip file into the contrib/<target unload> directory and clicks the Install button. The ICD messages are a little different as some of the processing is handled differently - read on. The implementation is based on the best of the rxNorm load and the Snomed load functions. I took the LOAD INFILE text file load technique from Snomed and took the table driven technique from the rxNorm as the basis for the ICD load implementation. It uses a metadata table that contains the substitution values that are used in the LOAD INFILE command for each of the zip files being loaded. Remember the ICD 10 implementation contains not only the actual code tables but the mapping tables to aid in the migration to ICD10:

ICD 10 File Types

General Equivalence Mappings

Reimbursement Mappings

Diagnostic Codes

Procedure Codes

The load produces the following tables:

Table NameDescription

icd10_pcs_order_codeICD10 Procedure Codes

icd10_dx_order_codeICD10 Diagnostic Codes

icd10_gem_dx_10_9General Equivalence Mappings For Diagnostics from ICD10 to ICD9

icd10_gem_dx_9_10General Equivalence Mappings For Diagnostics from ICD9 to ICD10

icd10_gem_pcs_10_9General Equivalence Mappings For Procedures from ICD10 to ICD9

icd10_gem_pcs_9_10General Equivalence Mappings For Procedures from ICD9 to ICD10

icd10_reimbr_dx_9_10Reimbursement Mappings For Diagnostics from ICD9 to ICD10

icd10_reimbr_pcs_9_10Reimbursement Mappings For Procedures from ICD9 to ICD10

icd9_dx_codeICD9 Diagnostic Codes

icd9_sg_codeICD9 Procedure Codes

    Each of these tables is setup with a surrogate key (mySql SERIAL attribute aka auto-increment) using alter statements once the text file has been loaded. This approach is a best practice and will be helpful in any future foreign key mappings that we might want to implement reporting and analytics based on the ICD10 code base.

    The framework of how rxNorm and Snomed has not been changed so much but I did make some minor changes about how the version and revision dates are handled in the standard_tables_track table. The existing rxNorm and Snomed load files have some details regarding the release date embedded in the filename, whereas the ICD 9 and 10 files don’t really follow the same approach. So when managing the ICD revision dates the code actually unzips the incoming release and uses the same approach as the existing code of storing the revision dates in an array in reverse order. The only difference is that the ICD dates are the actual last update dates on the content files within the zip files versus the date derived from the file name as is used in the rxNorm and Snomed loads.

    I am wrapping up the testing and will be posting the load code soon (by weekend’s end) that takes the raw data and loads it to the staging tables listed above. I haven’t integrated the staging tables into the code base as I wanted to sync up my code to the tree that has had some changes since I took my branch. This will most likely be a tweak to this code or perhaps somewhere else in the code tree that I have yet to discover. Look forward to the code review after the commit. :slight_smile:

Mac

bradymiller wrote on Friday, May 11, 2012:

Hi Mac,
Looking forward to seeing the code :slight_smile:
-brady

mcaloon wrote on Friday, May 11, 2012:

Hello,
    Here is where I pushed the ICD 9 / 10 code

https://github.com/mcaloon/openemr/commit/d247563ec144fe4e5e181126e9dbb572cec9bfb2

Not sure if this is the way to send it up…

I have not retested the upgrade processing for RxNorm or Snomed, Here are the test cases that I did test:

1.) Install staging tables for ICD 9 with no ICD staging tables pre-existing

2.) Install staging tables for ICD 10 with no ICD staging tables pre-existing

3.) Install staging tables for ICD 10 with ICD 9 staging tables installed

4.) Install staging tables for RxNorm with no RxNorm staging tables installed

5.) Install staging tables for Snomed with no Snomed staging tables pre-existing

Mac

bradymiller wrote on Saturday, May 12, 2012:

Hi Mac,

It turns out you have 3 commits on your github branch that are pertinent (rather than just the one linked to above). To get the review expedited, I have combined (via rebase) all three of your commits into one commit and also have placed it into an updated OpenEMR codebase (via rebase). It can be found here:
http://github.com/bradymiller/openemr/commits/mcaloon-ICD10-databaseLoadCode_1
(in the first commit)

I’m currently reviewing/testing it; I’ll let you know when the review is done.

thanks,
-brady

bradymiller wrote on Saturday, May 12, 2012:

Hi Mac,

I just finished my review/testing on github. Overall, this is really nice code. The best way to ensure you see all my comments on github is to search for bradymiller on the commit.

This should be enough to start integrating. I’m happy to start doing this while you keep working on the loading. I should be able to quickly integrate it into the main parts via some mods in the custom/code_types.inc.php .

thanks,
-brady
OpenEMR

bradymiller wrote on Saturday, May 12, 2012:

Hi,

Here’s code that begins to integrate both the external ICD9 and ICD10 tables:
http://github.com/bradymiller/openemr/commit/8d6623079b2985f07649ce40a79c3c6f28eeb4f0

Try it out:
To make the ICD9 work, set external to 4 for ICD9 item in Administration->Lists->Code Types
To make the ICD10 work, simply click active toggle for the ICD10 item in Administration->Lists->Code Types.

Issues to deal with in integration:
1. Stuff mentioned on the code review in github
2. Decimals in the diagnosis codes
3. How to deal with the procedure codes
4. How to allow customizing (adding fees etc.); my thoughts here are to actually store the customize data in the codes table.
5. Get full integration (above commit does quite a bit, but there are still places where ICD9 and searching the codes table are hard-coded in.

-brady
OpenEMR

mcaloon wrote on Saturday, May 12, 2012:

Brady,
    I notice your next is to sdd the diagnostic codnes to the the codes table. Does openemr use the procedure codes anywhere?

Mac

bradymiller wrote on Saturday, May 12, 2012:

Hi Mac,

Currently openemr uses the CPT4 or HCPCS as the procedure codes (which is specific to US system; also from a quick google search it appears the US will keep using these procedure codes even after switch to ICD10 dx codes).

At this point the custom/code_types.inc.php code only collects/searches for diagnosis codes from the external (ie. not in codes table) SNOMED/ICD9/ICD10 tables. There are toggles in code_types table (see Administration->Lists->‘Code Types’) that classifies a code set as being diagnostic and/or billing and it appears the codebase mostly supports a code set being one or the other. But once the code has been fully modularized to support the diagnosis codes for SNOMED/ICD9/ICD10, it should not then be too tough to also get the procedure codes from these elements also supported.

Also, don’t plan to add codes to the codes table. The plan is to use the codes table (if using external code sets) as a holding area for codes that have had a modifier/category/reportable/fees associated with them in the Administration->Fees gui.

-brady
OpenEMR

bradymiller wrote on Saturday, May 12, 2012:

Hi Mac,

Also, in order to allow customizing of these external codes using the codes table and to support users that have already brought ICD9 into their codes table, it is probably best that the code column in the codes table, which is a varchar and currently holds icd codes with decimals, is compatible with the dx_code (currently holds codes without decimals) in the icd9_dx_code tables to allow joining of these tables when searching for customizations. And for ICD10, should consider including the decimals if that is the standard way to view these things (not sure, though, what the standard is). Another possibility here is to include multiple columns in the tables (such as one without decimals and one with, which would provide the most flexibility when making sql join queries).

-brady
OpenEMR

mcaloon wrote on Saturday, May 12, 2012:

Brady,
     I havent looked at it yet, but another approach could be to treat the embedded decimal point as a rendering excercise. We would need to know whether we would be joining on that column or not. We can always use the surrogate keys for connecting tables to each other which is best practice. This approach requires appropriate lookup during insert/update operations.
Mac

mcaloon wrote on Sunday, May 13, 2012:

found this ….

You might find that the ICD-9 codes follow the following format
    All codes are 6 characters long
    The decimal point comes between the 3rd and 4th characters
    If the code starts with a alpha character the decimal point comes between the 2nd and 3rd characters

background wikipedia … http://en.wikipedia.org/wiki/List_of_ICD-9_codes

mcaloon wrote on Sunday, May 13, 2012:

Hello,
    After a little more research I think this previous post might be a little off… I have seen that most of the iCD9 codes are 4 characters and that the decimal point is between the 3rd and 4th characters when code starts with a number or between the 4th and 5th characters when the codes starts with an alpha character. With this in mind we could easily, during the load of the code tables insert another column that has been properly transformed using this set of formatting rules. I definitely want someone else to review and concur that this is indeed the correct interpretation.

Mac

mcaloon wrote on Sunday, May 13, 2012:

please excuse the multiple posts on this formatting stuff but here are some more details on how the codes differ and how they are formatted….
**
What Does an ICD-9 Code Look Like?**

ICD-9 codes and ICD-10 codes actually look quite different from each other. ICD-9 codes, which you’ll find on current paperwork, are being phased out through by 2013. (That is the current date - the original date was in 2011. It could be delayed again.) However, you’ll still ICD-9 codes on paperwork until then, and you’ll see them recorded for deaths.

Most ICD-9 codes are comprised of three characters to the left of a decimal point, and one or two digits to the right of the decimal point.

**
What Does an ICD-10 Code Look Like?**

Over the next few years, the old ICD-9 codes will be replaced by ICD-10 codes. There are a number of changes to the system, and that includes the codes themselves.

ICD-10 codes are approached differently and are quite different from their ICD-9 counterparts. These codes are broken down into chapters and subchapters. They are comprised of a letter plus two digits to the left of the decimal point, then one digit to the right. The letters group diseases. All codes preceded by a C indicate a malignancy (cancer), codes preceded by a K indicate gastrointestinal problems, and so forth.

bradymiller wrote on Monday, May 14, 2012:

Hi,

I think you are right on the ICD9 code formatting; a sanity check for this is to compare your formatted column with the current ICD9 data included in sql/icd9-codes-insert.sql .

Here’s a nice comparison between ICD9/ICD10:
https://docs.google.com/viewer?a=v&q=cache:ce-CvhPzKI0J:www.ama-assn.org/ama1/pub/upload/mm/399/icd10-icd9-differences-fact-sheet.pdf+&hl=en&gl=us&pid=bl&srcid=ADGEESgwgZBCZhqprrqamqjPT4O-LT9P2L6lNSi4QF7H5yzWWpdZtMqu_-8nCoDAhsdQ_V0iviDlyNrs3AlDF7gJEKXZGFZeqF7Z56tIRRY7U2BN56IKir-ZFLc_d7GTlFyHuiRVkqKO&sig=AHIEtbQMSLcPtqtaYlDHwM4OL-4jrZn7Fg&pli=1

My take is that ICD10 dx codes always seems to place the decimal between the third and fourth value.

Since you’re also uploading the procedure codes, we should we also have a column for these in standard formatting (ie. with decimal, if applicable, in the correct place).

-brady
OpenEMR

mcaloon wrote on Monday, May 14, 2012:

Hello Brady,
    I am not quite sure I did the right thing on the hub but I have the integration points implemented with this commit.

https://github.com/mcaloon/openemr/tree/ICD10-integration

Pretty sure I took your branch from the other day as input and added my changes. I still don’t have a real good testing database setup so I didn’t really test this code. Sorry about that, but I figured you might be able to take a quick look and plug it into your testing environment (from the sounds of your previous posts, is well established). I noticed an old post about a sample database… anything laying around a.k.a. dump file???

Mac

mcaloon wrote on Tuesday, May 15, 2012:

Brady,
    I see what you meant about the SG codes. They don’t seem to be the procedure codes. Not sure what to make of them. I have the DX codes formatted column ready to commit soon.

Mac

bradymiller wrote on Tuesday, May 15, 2012:

Hi,

Here’s the next revision on the integration code:
http://github.com/bradymiller/openemr/commit/8ed9c5658783f5583429f50ad0389764cd88d51e
(testing well, so plan to commit soon; please feel free to review it)

Mac, above commit does not include your importing stuff since still waiting on the formatting stuff. For formatting of the ICD9/10 procedure code, check out the google docs link I posted above. Also, haven’t had a chance to test your commit with some of the intergration code yet.

Regarding sample database, I generally just make specific one from scratch within OpenEMR for testing. Not very good with billing, so testing may be a bit slow going.

-brady
OpenEMR

bradymiller wrote on Tuesday, May 15, 2012:

To clarify above, I think the sg is in fact procedure codes.