He had the following comments and questions for xss prevention in the tracker:
This patch escapes the facilities names to prevent cross-site
scripting attacks on various pages throughout the site.
There is still one place where the facility name is not escaped on the
calendar page, but I haven’t figured that out yet. There are also tons
of other database fields that come from user input that are not being
escaped.
I have a couple of questions about this type of work:
1) should I be using htmlspecialchars() like I am, or htmlentities()?
2) The common code that generates the facility drop-down should probably be extracted to a common location. I didn’t see a logical place for it, though. Is it desired that this code be duplicated as it is, or would it make sense to make a set of shared pieces of code for this kind of thing. Does that go in the “include” directory?
We should make a standard, which can then put on the wiki. Stephen, I note your using the htmlspecialchars with the ENT_NOQUOTES flag. Should we use that as our standard?
Here’s the type of change I was thinking about for #2. If this looks like a reasonable change, then I can actually submit it as a patch for code review. Since I’m relatively new to this project and somewhat new to PHP, I’m not sure what types of standards make sense for factoring out common code like this. http://gist.github.com/421036
The only difference is for characters that are outside of ASCII. htmlentities() encodes some punctuation, mathematical symbols and Greek letters, and the “extended ASCII” character. htmlspecialchars() just does the minimum required to ensure the string is treated as text and not as markup.
htmlspecialchars() makes the resulting HTML easier to read (if you are already using a Unicode viewer), performs fewer changes to the source string, and returns shorter strings.
htmlentities() is shorter to type, and makes the HTML easier to read (if you are using a ASCII-only viewer).
For “escaping” a string into HTML, htmlspecialchars() is more appropriate.
I use ENT_NOQUOTES for stuff that is not an attribute value, but it’s optional - if you don’t specify it, your double-quotes become “"”. I specify it to make the resulting source easier on my eyes and fewer bytes long.
I use ENT_QUOTES for stuff that IS an attribute value, and it’s not entirely optional - if you don’t specify it, htmlspecialchars() won’t do anything to single-quote characters, which means you should use double-quote characters to surround your attribute values.
I don’t think we need to standardize on the second argument. If we have too, ENT_QUOTES is certainly desirable sometimes, and it is suitable anytime; it just “escapes” more than required.
acmoore,
Noted you used htmlentities in one of your patches. I thought we were gonna stick with htmlspecialchars(). This seems more
appropriate since in the php documentation it appears htmlentities may do unwanted things to some of the special utf8 character
sets (ie. greek).
-brady
Regarding 2 above, I’d put the main function in library/options.inc.php at the top, and place a comment saying plan to convert it to a data type. This would be a good learning exercies of how the layout stuff works. Would have a new datatype called Facility which would be number 29 (see line 34 of openemr/interface/super/edit_layout.php for list of datatypes). Then can put the guts of your function into these layotu functions in library/options.inc.php and presto, you’ve created another widget that can be used in form and layout builder. Even if you don’t do this, still a good idea to put your function in this file as a reminder for somebody to do it.
-brady
REFORMATTED PREVIOUS MESSAGE WITH LINE BREAKS
hey,
Regarding 2 above, I’d put the main function in library/options.inc.php at the top, and place a comment saying plan to convert it
to a data type. This would be a good learning exercies of how the layout stuff works. Would have a new datatype called Facility
which would be number 29 (see line 34 of openemr/interface/super/edit_layout.php for list of datatypes). Then can put the guts
of your function into these layotu functions in library/options.inc.php and presto, you’ve created another widget that can be used
in form and layout builder. Even if you don’t do this, still a good idea to put your function in this file as a reminder for somebody
to do it.
-brady
Will do. That sounds reasonable. Thanks for helping me fit in to the standards and practices already in place. I hate feeling
like I’m barging into someone’s living room and re-arranging the furniture when all I want to do is help a little bit.
Andy,
Regarding 2 above, after you migrate the function to options.inc.php, if it tests fine for you, then feel free to commit this to the cvs on SF (we commit to the development tip).
thanks,
-brady
I had extracted that code into library/patient.inc. Perhaps it does make more sense to have it in options.inc.php, though I don’t really see how to make use of any of the functions already there. I’ll simply move the function over.
hey,
Just put in the comment of the function in options.inc.php that plan to make it a data type. Once somebody makes it a data type, then can be called via your function directly and also called by the layout engine, which is a powerful openemr customization tool.
-brady
I split that patch into two pieces: one with the code and one with the tests. I’ll reserve the one with the tests until the test suite is mature enough to handle it gracefully. I’ll apply the one with the actual code soon unless there are objections.
Thanks,
-Andy
as we’ve been discussing code review procedures, i popped this thread open, to review the code, and can find no tracker link pointing to the patch in question.
andy, i would expect a “i’m committing this if no-one screams” to have a pointer to exactly what this is. could you point me to the tracker item so i can review it?
(i’ll start digging through the tracker momentarily. just wanted to scream early “wait! wait! i’ll review it!”)
Finally got some time to look over the patch in some detail so here’s a quick review. I didn’t post it to the tracker item, because I don’t think BBCode is allowed for the tracker items.
In dropdown_facility():
* You aren’t using the $name parameter. I believe it is supposed to be used as the value of the name attribute on the select element. You’ll need to use htmlspecialchars on it, probably with the ENT_QUOTES option.
* The HTML 4 Specification recommends that we always have at least one value pre-selected. Your code does not always generate a selected attribute, and it should.
* When you generate the selected attribute, you are using attribute minimization, which isn’t supported in XML, making you document valid HTML 4 but invalid XHTML 1.
* You output $facid without using htmlspecialchars(). That’s safe with the current database schema, but using htmlspecialchars() is “always” safe.
* You aren’t outputting end tags for the option elements, and they aren’t EMPTY in the DTD, so you must use end tags.
* Stylisticly, I think using “True” looks a bit odd. I prefer it in a single case (i.e. “TRUE” or “true”), but PHP doesn’t care and we don’t have a policy on it, so handle it as you will.
Thanks for your help here, Stephen. I think I agree with you on all of these points, and can address them. Many of them seem to be carry-overs from the code I was extracting, and I’m glad to see that it’s OK to “improve” code at the same time as I’m moving it around.
You will probably catch me making a zillion HTML errors, so please don’t let up on those.
The “True” thing is probably from writing python. I’ll work harder to speak in my PHP accent.
Code improvements are always welcome as long as they are tested and compatible with existing code. When refactoring code, it is a great time to improve it, since it “combines” two times patches that are being maintained outside our control might need to be updated manually. While we should generally focus on user-visible issues, that shouldn’t stop us from trying to produce the best code we can.
Stylistic improvements (e.g. reflowing the code without changing it, or altering the name of a variable without how it is used, etc.) should be limited to lines that you are already touching with a patch. They can make patches outside our control break for no good reason.
The two point above focus on patches outside our control, but they also become internally important if we start leveraging branching in the code base. Those two guidelines make it easier to merge branches in the same way that it makes it easier for someone maintaining external patches merge in our CVS HEAD.
Andy, I’ve appreciative of you work on OpenEMR so far and sorry that I haven’t been able to work on any patches, but $DAYJOB has me working on something a bit tangential to OpenEMR right now. (Migrating data from other formats to OpenEMR.)