ICD, rxNorm and Snomed data loads

kevmccor wrote on Friday, July 13, 2012:

Hello,
I have done a small bit of looking at this project.  It is interesting and I hope it gets running reliably.  I was wondering if the directory path issue could be helped witht he php DIRECTORY_SEPARATOR constant?
The error at standard_tables_capture.inc:251 may be the line $file = $dir . ‘/’ . $file; which could become:

$file = $dir.DIRECTORY_SEPARATOR.$file;

The DIRECTORY_SEPARATOR is big and ugly, but I am wondering how important it is for cross platform compatibility.
Regards

bradymiller wrote on Friday, July 13, 2012:

Error in the first try. it should be:

$incoming['SHORT_DX'] = array('#TABLENAME#' => 'icd9_dx_code',
                '#FLD1#' => 'dx_code', '#POS1#' => 1, '#LEN1#' => 5,
                '#FLD2#' => 'short_desc', '#POS2#' => 7, '#LEN2#' => 60);

bradymiller wrote on Friday, July 13, 2012:

Testing on linux (actually a ubuntu appliance running on Windows) with this branch: http://github.com/bradymiller/openemr/commits/external-database-import_1
(same branch as above)

ICD9 import : 5 minutes
ICD10 import : 15 seconds
SNOMED import : 3 minutes
RXNORM import : 5 minutes

What is very odd to me is that the ICD9 import takes 5 minutes (and burns up the cpu the entire time), while the much larger ICD10 tables take only 15 seconds. Any reason to explain this?

-brady
OpenEMR

bradymiller wrote on Saturday, July 14, 2012:

Hi,

Mystery solved on why the ICD9 import was taking it so long. I just added another index to each of the tables and now the import takes about 3-5 seconds. Placed the commit on the following branch: http://github.com/bradymiller/openemr/commits/external-database-import_1

-brady
OpenEMR

mcaloon wrote on Saturday, July 14, 2012:

Brady,
    I got to see why last night my new “tip of the tree” ICD9 was hanging…. nice catch. I think my old test branch must have had the index created as my tests never had that issue. :slight_smile: I just verified ICD10 takes about 33 seconds to load on my Mac OS X MacBook Pro. I don’t have a Windows platform to test with, so waiting on Arnab (or anyone else) to check out your ideas on that PHP table issue.

How do you get the RxNorm and Snomed to run so fast. My old branch took FORever!!! I haven’t looked over those tables in your latest branch. Did you change anything for those loads?

Mac

arnabnaha wrote on Saturday, July 14, 2012:

Hi Brady….
Tried the method stated in post no. 22 but still getting the same error:

ERROR: query failed: LOAD DATA LOCAL INFILE ‘C:\Windows\TEMP/ICD10/2012_I10gem.txt’ INTO TABLE icd10_gem_dx_10_9 FIELDS TERMINATED BY ‘’ (@var) SET revision = 0, dx_icd10_source = trim(Substring(@var, 1, 7)),dx_icd9_target = trim(Substring(@var, 9, 5)),flags = trim(Substring(@var, 15, 5))

Error: File ‘C:WindowsTEMP\ICD10\2012_I10gem.txt’ not found (Errcode: 2)

Please help!!

mcaloon wrote on Sunday, July 15, 2012:

Arnab,
    Please test this commit https://github.com/mcaloon/openemr/commit/ff6de47d37101b0f424735e77042ec667aae6d0b that applies the logic from post #21 from kevmccor. Thanks Kev.

Mac

mcaloon wrote on Sunday, July 15, 2012:

I tested that commit on Mac OS X

Mac

arnabnaha wrote on Sunday, July 15, 2012:

HI Mac…
It doesnt fix the problem…still showing the same error…

kevmccor wrote on Monday, July 16, 2012:

Error: File ‘C:WindowsTEMP\ICD10\2012_I10gem.txt’ not found (Errcode: 2

)

Where does the “WindowsTEMP” part of the path come from?

In the /interface/globals.php the temporary files directory is:
  $GLOBALS = rtrim(sys_get_temp_dir(),’/’);

The temporary directory in Windows is usually “C:Windows\Temp”, so I would expect the temporary directory to be “C:\Windows\Temp” in the $GLOBALS.  My experience with Windows and Xampp is quite limited, so I don’t know about different configurations.  

In the /interface/globals.php the temporary files directory is:
  $GLOBALS = rtrim(sys_get_temp_dir(),’/’);

See http://php.net/manual/en/function.sys-get-temp-dir.php

line 241: $dir_icd = $GLOBALS['temporary_files_dir'].DIRECTORY_SEPARATOR.$type.DIRECTORY_SEPARATOR;
line 314: $db_load = "LOAD DATA LOCAL INFILE '#INFILE#' INTO TABLE #TABLENAME# FIELDS TERMINATED BY '\0' (@var) SET revision = 0, ";
line 334: $run_sql = $db_load;
line 335: $run_sql = str_replace("#INFILE#", $dir_icd . $filename, $run_sql);

It looks to me like ‘C:WindowsTEMP\ICD10’ part of the path is from the $dir_icd variable.  What does the php_info() function give for the temporary directory?

bradymiller wrote on Tuesday, July 17, 2012:

Hi,

Here’s the next revision of this code (it is very close to being committed to sourceforge):
http://github.com/bradymiller/openemr/commit/ecd04620cf621892517f27ca99cf4dee2b99c9ce

I made some rather extensive changes to the previous code. If you are curious, here are the two commits shoping the modifications I made to the code (note these are just to see the code changes I made; if you are testing this out, then use the commit above):
http://github.com/bradymiller/openemr/commit/b727c737954a3d6af7750c89b13b93f7ac03cdb1
(most of work was to make the staging algorithm more strict, so as to not allow  importing an old revision and requiring entire file set when importing from multiple files - see commit comments for more info)
http://github.com/bradymiller/openemr/commit/9caad4e357d167dd0461386e3469ec3b60ed5e6e
(fix so the ICD9/ICD10 works on windows)

Here is the current working branch (ie. work from this branch if make any mods to this feature):
http://github.com/bradymiller/openemr/commits/external-database-import_3

Mac,
I have a specific question that I’d like to clarify/answer before this code gets committed to sourceforge:
Why are the icd10_dx_order_code and icd10_pcs_order_code not using SERIAL in the index column like all the other tables?

And for everyone else, please feel free to review/test this code. This feature is going to add 12 new mysql tables, so suggest getting your input in now before it gets committed to sourceforge if have issues with this (or agree with this).

-brady
OpenEMR

mcaloon wrote on Tuesday, July 17, 2012:

All,
    Has anyone ever tested the previous implementation of the Snomed/RxNorm on Windows. I based all the new ICD stuff on the old code and only tested Mac OS X and Ubuntu implementations. I think the problem Arnab is having is a path construction issue. If you see his path name in the error message there are 2 issues:

C:\Windows\TEMP\/ICD10/2012_I10gem.txt

this one has mixed slashes, both Windows and unix types, which obviously is a bug. The second path name issue I see in the log is

C:WindowsTEMP\ICD10\2012_I10gem.txt'

where the leading backslash and the backslash between the first two tuples is also missing. Another defect.

My last commit was to just change the paths to the temp files where the incoming zips get unzipped and processed. Obvious issues remain. I think *all* the the path names in the code need to modified to use the DIRECTORY_SEPARATOR PHP constant as kevmccor had recommended in an earlier post.

Arnab - could you post the following items?

1. most recent error message to see if it is exactly the same
2. GLOBAL setting for the temp directory

I will try to look at the code soon.

Mac

mcaloon wrote on Tuesday, July 17, 2012:

Brady,
    Those 2 ICD10 tables don’t use SERIAL keys because the raw data feed already contains a serialized sequence number in column one. I just imported those columns “as is” as dx_id and pcs_id accordingly.

Mac

mcaloon wrote on Tuesday, July 17, 2012:

Brady,
    Nice job with the stricter revision approach. Thanks.

Mac

bradymiller wrote on Tuesday, July 17, 2012:

Hi Mac,

The most recent code works in windows. Here’s a commit that shows the windows bug fix for ICD9/10 code:
http://github.com/bradymiller/openemr/commit/9caad4e357d167dd0461386e3469ec3b60ed5e6e
(the fix is the second change there)

Note this fix is included in the most recent commit for this code (same as link several posts above):
http://github.com/bradymiller/openemr/commit/ecd04620cf621892517f27ca99cf4dee2b99c9ce

-brady
OpenEMR

bradymiller wrote on Tuesday, July 17, 2012:

Hi Mac,

I tested the upgrade for the ICD10 (easy to mimick by simply increasing the load_release_date of all the ICD10 entries) and the icd10_dx_order_code and icd10_pcs_order_code tables do not get upgraded (all the other ICD10 tables do). I suspect this is because are importing the dx_id and pcs_id columns as is rather than creating them as SERIAL elements. Note I am pretty certain I caused this by placing primary keys on dx_id and pcs_id. So, the question is what will be better:
1. Remove the primary keys from dx_id and pcs_id columns?
2. Or to make them SERIAL columns and don’t manually add them during the import (I like this option, since it is then consistent with all the other table loads, and if ever load data from other sources into these tables, then may run into problems if no manual index exists as it does in the CMS stuff)

Thoughts? If it makes sense to not import these ids and instead use SERIAL, then was hoping you could show the best way to modify the icd_import() function to do this.

thanks,
-Brady
OpenEMR

arnabnaha wrote on Tuesday, July 17, 2012:

HI…
The ICD 9 and 10 installation works flawlessly…Thanks to Brady and Mac for this awesome addition to openemr.

mcaloon wrote on Tuesday, July 17, 2012:

Arnab, Whee!

Brady, agreed using SERIAL is the way to go and is the reason why the upgrade doesn’t work as all the ids already exist in the table from the prior load.

Mac

bradymiller wrote on Tuesday, July 17, 2012:

Hi,

Here’s the new commit after migrating to SERIAL for icd10_dx_order_code and icd10_pcs_order_code: http://github.com/bradymiller/openemr/commit/82354e7eb85d9938adb32b517297d4875a077f53
(it is testing well)

Mac, here’s the commit showing what I did to change to SERIAL. Let me know if any issues in how I did this:
http://github.com/bradymiller/openemr/commit/433b2c9366eac7d54459cabc74187e46b7e11178

Plan to commit this to souceforge over the next day or so.

-brady
OpenEMR

mcaloon wrote on Tuesday, July 17, 2012:

Brady, SERIAL changes look good

Mac