Use pear's MDB2 to avoid SQL injections?

acmoore wrote on Wednesday, June 02, 2010:

I posted an example patch to show how we might use one of the several database libraries to help avoid SQL injection attacks.

The example vulnerability I addressed is described at:
https://sourceforge.net/tracker/?func=detail&atid=493001&aid=2987502&group_id=60081

I fixed it by using pear’s MDB2 library, which provides the ability to bind variables to SQL placeholders. I submitted an example patch for review at:
https://sourceforge.net/tracker/?func=detail&aid=3010583&group_id=60081&atid=1245239

Brady suggested I start a forum topic on this instead of burying my stuff in the tracker.

Some things I’m interested in hearing from other developers are your thoughts on:

* is it OK to add a dependency like this to pear’s libraries? They’re very commonly used, but we’d have to ensure the dependency is met on any machines we install on.
* is MDB2 the best library to use? I know there are others, but haven’t used them.

If this seems like a reasonable way to address this systemic problem, I would add some more functions to make our use of MDB2 easier, add some more documentation to the wiki on how to best use it, and start refactoring some of the code to make use of the new tools.

I’m eager to hear all of your thoughts.

-Andy

stephen-smith wrote on Wednesday, June 02, 2010:

PEAR::MDB2 is probably the best choice for a DB abstraction layer.  PEAR::DB and PEAR::MDB are deprecated in favor of it.  PEAR::DBA is for BDB-style databases, which is not what we want.  ADOdb is another contender, though the comparative review I read indicated that MDB2 was easier to use and performed just as well.

In fact, we are ALREADY USING an old version of ADOdb (yes, it has known, minor security issues) that was checked in to library/adodb.  We don’t use it well - we create the connection with ADOdb, then ask it for the internal db-specific handle, then make mysql_ calls against that handle.

ADOdb also supports using placeholders, emulating it with per-type auto-escaping if the underlying driver doesn’t support it.

My impression is that whatever is used will have to “just work” on MS Windows by exploding the tarball somewhere in the web root and running the OpenEMR setup script.  I don’t know if PEAR packages can be used like that - my intuition is that they can, but it is better to have PEAR running so you can be aware of available updates (possibly security-related).

bradymiller wrote on Wednesday, June 02, 2010:

hey,
It’s great that others are starting to fix this stuff. There’s been some upstream hackery to ADODB for the logging feature, so would be easier to use ADODB(if it can do this).

Visolve, was there any documentation placed on the wiki regarding the logging stuff.

-brady

acmoore wrote on Thursday, June 03, 2010:

Brady -

OK, I’ll look into adodb and see if I can figure it out. I’ve never used it, so don’t expect the moon! I’ll see what I can whip up.

Thanks!
-Andy

visolveemr wrote on Thursday, June 03, 2010:

Hi Brady,

Yes… Information about logging  is present in http://openmedsoftware.org/mw/images/d/d2/LoggingInOpenEMR.pdf

Do share your views here. You can place the same at the appropriate location.

Thanks
ViCarePlus Team

stephen-smith wrote on Thursday, June 03, 2010:

https://sourceforge.net/tracker/?func=detail&aid=3011127&group_id=60081&atid=493001

Our existing ADOdb needs an upgrade, and the old version in combination with a new PHP5 version from unstable blew up one of my testing environments today.

sunsetsystems wrote on Thursday, June 03, 2010:

I don’t see why we need to distribute adodb with OpenEMR.  Would you like to look into the possibility of taking it out, and using whatever is appropriate as a dependency?

Rod
www.sunsetsystems.com

acmoore wrote on Thursday, June 03, 2010:

I’ve submitted an example patch using the ADODB functions in a more appropriate manner:
https://sourceforge.net/tracker/?func=detail&aid=3011164&group_id=60081&atid=1245239

After doing this, I think I like this approach better than forcing MDB2 on the project, even though it seems more popular.

Before this is accepted, we would have to:
* upgrade ADODB, I think
* fix a bunch of SQL calls that were broken by my change, which I can do.
* gradually change old queries to use placeholders instead of directly using variables in SQL
* encourage people to start using SQL placeholders moving forward

Does anyone object to this direction, or see any unintended, unwanted consequences?

Thanks,
-Andy

stephen-smith wrote on Thursday, June 03, 2010:

I don’t see why we need to distribute adodb with OpenEMR.

Well, right now our contains a few custom changes.  The only one I can think of is the changes to the Execute member function, that causes it to use our logging function.  Using the version downloaded from the ADOdb project or provided by the system (libphp-adodb package on Debian) would lose those changes.

I’m fine with that.  I think the logging / auditing is probably better done at a level higher than ADOdb.  The problem is, I’m not sure where that fits in best.  Our stuff in library/sql.inc does it, but that doesn’t have any support for bind variables, which is really what kicked off this thread.

Would you like to look into the possibility of taking it out…

My main concern with dropping it entirely is breaking the CVS Demo and Developer Appliance.  I suppose we could have the install/setup script download it, if needed, at install time.  There’s also the conflicting goals of getting updates for security issues vs. installing a version of ADOdb we never tested a release against.  Finally, some support for a system-provided ADOdb probably means fixing up our includes a bit.

Brady, are the script(s) that maintain the CVS demo and the developer’s appliance in the tree?  I’d like to look at them and see if I can’t update them, both for Git and this possible change.

sunsetsystems wrote on Thursday, June 03, 2010:

Well we don’t want to break logging.  Certainly some sort of wrapper for adodb calls would be in order, for that and as insurance against other possible issues later.  Don’t bite off too much at once, maybe do the wrapper first and then decide if you want to replace adodb, or take it out.

Agreed that a higher level of logging would be better, but let’s make that a different problem.

Rod
www.sunsetsystems.com

stephen-smith wrote on Thursday, June 03, 2010:

I’d been speaking (er, writing?) in the abstract.  Andy’s patch still goes through the stuff in library/sql.inc so we still get logging, and we also get bind variables.  Mostly, I think it is a good patch, and I already threw up a comment on the patch tracker about the biggest problem I see with it.  It doesn’t support prepared statements, which I normally associate with bind variables, but I can’t think of any time they would be a performance gain in the codebase.

Dropping ADOdb from the CVS repository is something I’d like to do, but it is a medium-sized project and can be held off for later.

acmoore wrote on Thursday, June 03, 2010:

Thanks for looking it over. I’ll work on one that doesn’t use exceptions so we can continue to support PHP4. (yuk!)

I’d love to support prepared statements in some way eventually. I’m hoping to make small steps here for now.

I’m glad we seem to be on the same page. I’m still waiting for someone to point out something that I’m missing that makes
this impossible. If that doesn’t happen soon, I’ll put together a more comprehensive patch that we can actually apply.

Thanks!
-Andy

bradymiller wrote on Thursday, June 03, 2010:

acmoore,
Haven’t been able to test it, but like the concept of this. What happens if a variable has already been escaped, and then placed
into a binded sql statement? I’m guessing wouldn’t work, so some variable processing may require refactoring. So, then th flow
of a variable is to 1) remove escapes (if magic quotes is set) and then 2) would put this right into the sql command with bind. 
Out of curiosity, is there a way to globally deal with magic quotes of all variables in the POST, PRE, etc, for each script (ie. would need to deal with recursive array issues). That would be nice and clean: Then a variable could go directly into the sql bind statement, thus avoiding the need for the current rather confusing functions in formdata.inc.php .

-brady

bradymiller wrote on Thursday, June 03, 2010:

hey,
disregard the PRE thing, meant to say GET

stephen-smith wrote on Thursday, June 03, 2010:

Brady: (untested, but should do it. )
$gpc_superglobals = array($_GET, $_POST, $_COOKIE);
function demagic(&$v, $k) { $v = strip_escape_custom($v); }
array_walk_recursive($gpc_superglobals, demagic);

stephen-smith wrote on Thursday, June 03, 2010:

function demagic(&$v, $k) { $v = strip_escape_custom($v); }
array_walk_recursive($gpc_superglobals, demagic);

Actually, that’s really inefficient 'cause it make lots of calls to get_magic_quotes_gpc(). (Also it uses a bareword -> BAD!).  Something like this is probably better:
if (get_magic_quotes_gpc()) {
$gpc_superglobals = array($_GET, $_POST, $_COOKIE);
function ssip(&$v, $k) { $v = stripslashes($v); }
array_walk_recursive($gpc_superglobals, ‘ssip’);
}

bradymiller wrote on Thursday, June 03, 2010:

hey,

So if we do a complete code walk-through to institute acmoore’s stuff, then perhaps we should also incorporate stephen-smith’s above superfunction (assuming it tests well). That would be very cool.

-brady

bradymiller wrote on Thursday, June 03, 2010:

Only thing to add to above function is to abstract the call to get_magic_quotes_gpc() to a central fucntion, since can’t use it in php6.

acmoore wrote on Friday, June 04, 2010:

I didn’t consider magic_quotes. This is the first time I’ve ever heard of them. What an odd feature!

Perhaps the thing to do is to stripslashes() on all elements of the $bind arrays inside the sql.inc functions if magic_quotes is on. I’m not sure I totally get it, though.

-Andy

bradymiller wrote on Friday, June 04, 2010:

Stephen,

Regarding your above point, this won’t have any effect on the demos or appliances. The demos and appliances follow this flow:
1) Turn on
2) Download code from SF cvs (via a mandriva startup script)
3) Copy the downloaded code to web directory
4) Run following script: openemr/contrib/util/installScripts/cvsDemoInstall

When I have time, I’ll post the mandriva startup and shutdown script to cvs
(good idea to have them as transparent as possible).

Regarding git, one big issue I’ve encountered so far is that it takes way to long to download the repository (downloading the code over cvs from SF is actually very fast)

-brady