Near As I Can Figure

Wednesday, August 4, 2010

Sanitizing PHP Input for MySQL

There are several ways to sanitize form input for database entry. PHP has its magic_quotes_gpc option that, when on, effectively performs addslashes() on GET, POST, and COOKIE data. That is, single and double quotes, backslashes, and NULLs are automatically escaped with backslashes. magic_quotes_gpc is on by default, so be careful not to addslashes() or you'll get extra backslashes on everything. A good way of checking for whether or not magic_quotes_gpc is on is by using get_magic_quotes_gpc().

magic_quotes_runtime is another variable that, when on, ensures that most functions that return data from external sources, including databases and text files, will have quotes escaped with a backslash. These functions include MSSQL and MySQLi functions, but not MySQL.

The third magic_quotes variable is magic_quotes_sybase. This works with magic_quotes_gpc and magic_quotes_runtime. If on a single-quote is escaped with a single-quote instead of a backslash, but only if magic_quotes_gpc or magic_quotes_runtime are enabled.

Now, after telling you all this, I'll say very clearly: do not use magic_quotes. The methodology behind using them is broken and PHP has reflected this in deprecating them for PHP 6. There is no point in escaping input, especially by default, when the data target isn't known.

The better way to handle input is by using the correct function for wherever you want to use the data. So use mysql_real_escape_string() for MySQL input, for example, urlencode() if you mean to use the data as a URL, htmlentities() if it'll be output on an HTML page, and so on. And remember to check if magic_quotes_gpc was on and to stripslashes() if it was. This will return the data to how the user input it, so it'll be potentially dangerous until handled another way, a more appropriate way.

Saturday, June 26, 2010

Handling User Time Zones

I've been working on a project where I need to record the dates and times when users perform certain actions. This wouldn't be so difficult if it weren't for the fact that the users themselves need to be able to see these times, and ideally, relative to their local time zones. For my purposes, the server time is adequate for keeping track, but it is both inconvenient to users and unfair to expect them to calculate their local time based on the server's. I'm sure most people on the Internet eventually become used to the fact that lots of sites and servers are set to not only US time, but EST/EDT in particular, but I'd like to offer the option to choose their own time zones.

When I first realized I wanted to implement this, I didn't think it entirely through, and assumed that it would only involve

adding a drop-down box to the sign-up/settings form,
some kind of easy identifier, e.g. an integer indicating the difference from GMT, as a field to the user information table, and
some extra code to look up time zones when displaying information.

Manageable, right?

Before I even make the drop-down box, I have to decide which time zones I'll work with. First, since I'm using PHP/MySQL, let's have a look at PHP's supported time zones. Far too many to throw at the user and the names aren't very intuitive, but they do have the benefit of PHP's inbuilt DateTime functions. The list Windows uses in its date and time settings is much shorter, and its more descriptive names can be mapped one to one with PHP's. So I can use the DateTime functions after all, though now I have to store the mapping somewhere.

Storing the mappings in an array in PHP would mean I lose the power and point of using databases but if I store it in the database, that means I'll have to query it just to display a sign-up form. Which I happen to have on the default index page, when users aren't logged in, so it will be shown every time somebody visits the homepage, more than the usual sign-up page. And because I want to be a little fancy, I intend on displaying the current local time along with each time zone name in the drop-down box. This takes away the option of caching the index page, and will force, at the least, the PHP drop-down box creation each time.

If I do decide on storing it in MySQL, it would probably make the most sense to have a lookup table with the PHP format, the Windows name that the users will see, and a unique ID I can just slot into the user information table.

But before I can confirm that I want to go this route, I have to decide if I really need to use PHP's DateTime functions. If I use just a simple number to show the difference from GMT along with the Windows name and store that in the user information table, I can avoid so much of the hassle, EXCEPT that I'll have to deal with DST myself, which is a hassle all on its own. Every time there's a change in the timezones, I'll need to update the calculating functions. PHP's DateTime will let me avoid much of that, though I will still need to sort the drop-down box display, of course, with varying DSTs moving some places in front of others depending on the season.

It's a tossup whether it's better to use a PHP array to store the time zones or MySQL, for they both have their benefits and disadvantages. With PHP, there is a loss in consistency with some fields still being stored in MySQL, increasing the possibility of changes to either causing problems later. It also weakens the power available from a full database for any manipulations, which, unlikely as seems now, could be an issue later.

When I store the actual date and time of a transaction, I will use the server time. The options when displaying to the user are either to convert from server time to user local time for each record or to store both server AND local times. The tradeoff is between speed and time, and I think in this case, I will sacrifice the space.

As I've describe each issue in this article, I've making one option more appealing, not because it is the best choice in every circumstance but because I think it suits the situation in my case and is what I would do.

So, I will store the 75 Windows time zones and their PHP mappings in my database and query each time for the sign-up form. Hopefully, the writing of this has clarified my plan in implementing this and I will be able to proceed with actually getting it to work.

Edited June 2, 2010:
Implementation for above.

timezones.txt is a mapping of the 75 "common" Windows time zones to their PHP DateTime objects. Note that each pair is separated by a pipe ("|"), "&amp" is used instead of "&" and that there is a blank line at the end. The second is a matter of choice, but the first and last are needed for the script to work.

To create the table:
mysql_query("CREATE TABLE IF NOT EXISTS `timezones` ( `tzid` INT NOT NULL AUTO_INCREMENT , `phptz` VARCHAR(30) NOT NULL , `timezone` VARCHAR(60) NOT NULL , PRIMARY KEY (`tzid`))");
phptz is the PHP name, timezone is the Windows name, and tzid is simply the ID for each mapping. tzid is what you should use for related tables and as the shared attribute for joins for looking up time zones.

PHP code to populate the above table is given below. Note that the timezones.txt file needs to be in the same directory as this code, though of course that can be altered.
$timezonefile = file_get_contents("timezones.txt"); preg_match_all("/(.*?)\|(.*?)\s\\n/", $timezonefile, $timezones); $count = count($timezones[0]); for ($i = 0; $i < $count; $i++) { mysql_query( "INSERT INTO timezones (phptz, timezone) VALUES (\"{$timezones[1][$i]}\", \"{$timezones[2][$i]}\")"); }

The function to generate the <select> code is as follows. Its usage is fairly straightforward, with a typical call being displaytimezones("Select your local time zone", "timezone"). There is also a parameter to choose the selected time zone which, if not given, is assumed to be America's Eastern time. The option values correspond to the tzid fields from the timezones database table.
function displaytimezones($label, $name, $selected = "Eastern Time $US & Canada$") { global $database; $timezones = $database->query("SELECT * FROM timezones"); while ($tzrow = mysql_fetch_array($timezones)) { $ndtz = new DateTimeZone($tzrow['phptz']); $currentDate = new DateTime("now", $ndtz); $displaytzid[] = $tzrow['tzid']; $displaytznow[] = $currentDate->format("j M Y (g:ia)"); $displaytzname[] = $tzrow['timezone']; $displaytzoffset[] = -1*($ndtz->getOffset($currentDate)); } array_multisort($displaytzoffset, $displaytzname, $displaytzid, $displaytznow); $count = count($displaytzoffset); $display = " $label\n"; $display = $display." <select name="\" id="\">\n"; for ($i = 0; $i < $count; $i++) { $display = $display." \n"; } $display = $display." </select>\n "; $display = preg_replace("/

A sample of the drop-down box can be seen here.

Wednesday, June 23, 2010

MVC and PAC

Most software applications can be theoretically divided into three sections. The front-end, which is what the user sees and interacts with; the back-end, the server and possible database and all the dirty details; and then the parts that connect the two. There are different ways in how designers approach these, sometimes separating how each are handled, or even defined, and sometimes combining parts.

One well-known architectural pattern is Model-View-Controller. As the name suggests, the three parts are the model, the view, and the controller. Very simply put, the model controls the data, the view what the user sees, and the controller controls the complex interaction between the two. I put in the word "complex" to delineate between interactions that, while relatively simple to us, involve changing data in the model. This is because when the view simply wants to extract data for eventually presenting to the user, it can access the model directly. Thus, there is some overlap between what we might consider the front- and middle-ends.

Another popular pattern is Presentation-Abstraction-Control, used in, among others, Drupal. In PAC, the three parts can be considered analogous in name to the three in MVC, with presentation => view, control => controller, and model => abstraction, but the way in which they behave is different. The presentation is strictly for viewing, and any actual input-handling or the extraction of data for output is handled by control. The three sections are much more isolated and independent in PAC than in MVC. Another difference between the two is that there can be separate layers of PAC within an application. For example, in a complex system that does more than the simple get data->display data->get input->store input, there will be at least several different subsystems involved. Each gets its own isolated PAC components, which are in turn isolated from each other. The control for each PAC layer is what accesses and interacts with the other PAC layers.

It should be noted that often, in the case of these two patterns, a hybrid may be used. Often, an entirely different architecture is utilized.

Further reading:
MVC vs PAC - Garfield Tech
Garfixia Software Architectures

Saturday, June 5, 2010

Link Shell Extension on Vista

I like to organize my files in as intuitive and thorough a way as possible. This occasionally requires choosing between two different folders to place them in. For example, if I sort my ebooks by author, and I come across one on which multiple authors have worked, my options with Windows Vista, as far as I knew, were to
a) Make duplicate copies and put one in each author's folder
b) Put the actual ebook in one author's folder and then make a shortcut to the file and put that in the other authors' folders.

The first option seems terribly messy and feels like only a temporary solution. Depending on the filesizes, and the number of duplicates, this could quickly grow unwieldy. The second option is marginally better but it presumes the files will never be moved or copied elsewhere, so again, temporary.

The ideal solution would be able to link to the same file from all the folders. And as I mentioned before, I thought this was impossible in Windows, though observed in other operating systems, like *Nix. However, this is where symlinks come in.

Whereas you were forced to use third-party software to create links in Windows XP, Vista has added command line support but it's just awkward enough to make those external programs still useful. Of the ones around, the most useful seems to be the Link Shell Extension. Note that while LSE works on most Windows flavors, it is made for NTFS drives only.

Linking Files
When linking only a file, you have the option of creating a hardlink or a symlink. Hardlinks allow the file system to treat the created hardlink as the actual file, as if there were two filenames pointing to the same file object. If you copy a hardlink and paste elsewhere, you'll be pasting the actual file. Cutting and pasting will move just the hardlink, and deleting just deletes the hardlink. An important feature to note about hardlinks is that because they're multiple links to the same file on a drive, they are limited to just that drive. Thus, they can't be moved to another drive.

Symlinks will show up on the filesystem as links to the file, but not as the actual file itself. However, as was the case with hardlinks, copying and pasting will create another copy of the actual file and cut-paste will move the symlink itself. Deleting the symlink only has an effect on that symlink.

Linking to folders instead of files requires the use of junctions. Like symlinks, junctions simply link to the same folder. Anything you do, including the deletion of files within the junction, will result in the same action being performed on the files in the source folder.

Hardlink clones create folders with hardlinks pointing to the files in the original folder with the same single volume limitations of hardlinks. Deletion of hardlinks will not affect the original files. Similarly symlink clones create folders with symlinks pointing to the files in the original folder.

Smart copy of folders allows the copying and retaining of hardlinks.

I will play around with these further to see what's the best for sorting.