Web Programming Step by Step, 2nd Edition
Chapter 15: Web Security
Except where otherwise noted, the contents of this document are
Copyright 2012 Marty Stepp, Jessica Miller, and Victoria Kirst.
All rights reserved.
Any redistribution, reproduction, transmission, or storage of part
or all of the contents in any form is prohibited without the author's
expressed written permission.
15.1: Security Principles
- 15.1: Security Principles
- 15.2: Cross-Site Scripting (XSS)
- 15.3: Validating Input Data
- 15.4: SQL Injection
- 15.5: Session-Based Attacks
Our current view of security
- until now, we have assumed:
- valid user input
- non-malicious users
- nothing will ever go wrong
- this is unrealistic!
The real world
- in order to write secure code, we must assume:
- invalid input
- evil users
- incompetent users
- everything that can go wrong, will go wrong
- everybody is out to get you
- botnets, hackers, script kiddies, KGB, etc. are out there
- assume nothing; trust no one
Attackers' goals
Why would an attacker target my site?
- Read private data (user names, passwords, credit card numbers, grades, prices)
- Change data (change a student's grades, prices of products, passwords)
- Spoofing (pretending to be someone they are not)
- Damage or shut down the site, so that it cannot be successfully used by others
- Harm the reputation or credibility of the organization running the site
- Spread viruses and other malware
Tools that attackers use
Assume that the attacker knows about web dev and has the same tools you have:
- extensions
- network sniffers
Some kinds of attacks
- Denial of Service (DoS): Making a server unavailable by bombarding it with requests.
- Social Engineering: Tricking a user into willingly compromising the security of a site (e.g. phishing).
- Privilege Escalation: Causing code to run in a "privileged" context (e.g. as "root").
- Information Leakage: Allowing an attacker to look at data, files, etc. that they should not be allowed to see.
- Man-in-the-Middle: Placing a malicious machine in the network and using it to intercept traffic.
- Session Hijacking: Stealing another user's session cookie to masquerade as that user.
- Cross-Site Scripting (XSS) or HTML Injection: Inserting malicious HTML or JavaScript content into a web page.
- SQL Injection: Inserting malicious SQL query code to reveal or modify sensitive data.
Information leakage
when the attacker can look at data, files, etc. that they should not be allowed to see
- files on web server that should not be there
- or have overly generous permissions (read/write to all)
- directories that list their contents (indexing)
- can be disabled on web server
- guess the names of files, directories, resources
- see loginfail.php, try loginsuccess.php
- see user.php?id=123, try user.php?id=456
- see /data/public, try /data/private
15.2: Cross-Site Scripting (XSS)
HTML injection
a flaw where a user is able to inject arbitrary HTML content into your page
- This flaw often exists when a page accepts user input and inserts it bare into the page.
- example: magic 8-ball
- What kinds of silly or malicious content can we inject into the page? Why is this bad?
Injecting HTML content
8ball.php?question=<em>lololol</em>
- injected content can lead to:
- annoyance / confusion
- damage to data on the server
- exposure of private data on the server
- financial gain/loss
- end of the human race as we know it
- why is HTML injection bad? It allows others to:
- disrupt the flow/layout of your site
- put words into your mouth
- possibly run malicious code on your users' computers
Cross-site scripting (XSS)
a flaw where a user is able to inject and execute arbitrary JavaScript code in your page
8ball.php?question=<script type='text/javascript'>alert('pwned');</script>
- JavaScript is often able to be injected because of a previous HTML injection
- Try submitting this as the 8-ball's question in Firefox:
<script type="text/javascript" src="http://panzi.github.com/Browser-Ponies/basecfg.js" id="browser-ponies-config"></script><script type="text/javascript" src="http://panzi.github.com/Browser-Ponies/browserponies.js" id="browser-ponies-script"></script><script type="text/javascript">/* <![CDATA[ */ (function (cfg) {BrowserPonies.setBaseUrl(cfg.baseurl);BrowserPonies.loadConfig(BrowserPoniesBaseConfig);BrowserPonies.loadConfig(cfg);})({"baseurl":"http://panzi.github.com/Browser-Ponies/","fadeDuration":500,"volume":1,"fps":25,"speed":3,"audioEnabled":false,"showFps":false,"showLoadProgress":true,"speakProbability":0.1,"spawn":{"applejack":1,"fluttershy":1,"pinkie pie":1,"rainbow dash":1,"rarity":1,"twilight sparkle":1},"autostart":true}); /* ]]> */</script>
- injected script code can:
- masquerade as the original page and trick the user into entering sensitive data
- steal the user's cookies
- masquerade as the user and submit data on their behalf (submit forms, click buttons, etc.)
- ...
Another XSS example
- example: Buy-a-Grade Form
- Recall that the user submits their name, section, and credit card number to the server, which are then displayed on the page.
- How can we inject HTML/JavaScript into the page? Why is this bad?
- What could we do to steal the user's sensitive information?
Securing against HTML injection / XSS
- one idea: disallow harmful characters
- HTML injection is impossible without < >
- can strip those characters from input, or reject the entire request if they are present
- another idea: allow them, but escape them
$text = "<p>hi 2 u & me</p>";
$text = htmlspecialchars($text);   # "&lt;p&gt;hi 2 u &amp; me&lt;/p&gt;"
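To make the effect of escaping concrete, here is a minimal sketch in JavaScript of what an escaping function in the spirit of htmlspecialchars does. The names escapeHtml and renderQuestion are hypothetical, and the set of five characters escaped here is an assumption chosen for illustration, not a description of any library's exact behavior.

```javascript
// Hypothetical helper: replace characters that are special in HTML
// with entities, so the browser displays them as text instead of markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")    // must run first, or the entities below get re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#039;");
}

// Hypothetical page fragment that echoes the user's question back.
function renderQuestion(question) {
  return "<p>You asked: " + escapeHtml(question) + "</p>";
}

// The injected script tag arrives as harmless text.
console.log(renderQuestion("<script>alert('pwned');</script>"));
```

Because the escaping happens at the point where the value is written into the page, the injected tag can no longer open a new element.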
15.3: Validating Input Data
What is form validation?
- validation: ensuring that a form's values are correct
- some types of validation:
- preventing blank values (email address)
- ensuring the type of values
- integer, real number, currency, phone number, Social Security number, postal address, email address, date, credit card number, ...
- ensuring the format and range of values (ZIP code must be a 5-digit integer)
- ensuring that values fit together (user types email twice, and the two must match)
A real form that uses validation
Client vs. server-side validation
Validation can be performed:
- client-side (before the form is submitted)
- can lead to a better user experience, but not secure (why not?)
- server-side (in PHP code, after the form is submitted)
- needed for truly secure validation, but slower
- both
- best mix of convenience and security, but requires most effort to program
An example form to be validated
<form action="http://foo.com/foo.php" method="get">
<div>
City: <input name="city" /> <br />
State: <input name="state" size="2" maxlength="2" /> <br />
ZIP: <input name="zip" size="5" maxlength="5" /> <br />
<input type="submit" />
</div>
</form>
- Let's validate this form's data on the server...
Basic server-side validation code
$city = $_REQUEST["city"];
$state = $_REQUEST["state"];
$zip = $_REQUEST["zip"];
if (!$city || strlen($state) != 2 || strlen($zip) != 5) {
  print "Error, invalid city/state/zip submitted.";
}
- basic idea: examine parameter values, and if they are bad, show an error message and abort. But:
- How do you test for integers vs. real numbers vs. strings?
- How do you test for a valid credit card number?
- How do you test that a person's name has a middle initial?
- (How do you test whether a given string matches a particular complex format?)
Regular expressions
/^[a-zA-Z_\-]+@(([a-zA-Z_\-])+\.)+[a-zA-Z]{2,4}$/
- regular expression ("regex"): a description of a pattern of text
- can test whether a string matches the expression's pattern
- can use a regex to search/replace characters in a string
- regular expressions are extremely powerful but tough to read
(the above regular expression matches many, but not all, valid email addresses; for example, it rejects addresses that contain digits)
- regular expressions occur in many places:
- Java: Scanner, String's split method (CSE 143 sentence generator)
- supported by PHP, JavaScript, and other languages
- many text editors (TextPad) allow regexes in search/replace
- regex syntax: strings that begin and end with /, such as "/[AEIOU]+/"
function | description
preg_match(regex, string) | returns 1 (a truthy value) if string matches regex, or 0 if it does not
preg_replace(regex, replacement, string) | returns a new string with all substrings that match regex replaced by replacement
preg_split(regex, string) | returns an array of strings from given string broken apart using given regex as delimiter (like explode but more powerful)
PHP form validation w/ regexes
$state = $_REQUEST["state"];
if (!preg_match("/^[A-Z]{2}$/", $state)) {
  print "Error, invalid state submitted.";
}
- preg_match and regexes help you to validate parameters
- sites often don't want to give a descriptive error message here (why?)
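The same idea extends to every field of the example form. The following is a minimal sketch in JavaScript, whose regex literals use the same pattern syntax as the PHP strings above. The function names and the rule for what counts as a valid city are assumptions made for illustration.

```javascript
// Hypothetical helper: true only for strings that match the regex.
// The typeof check rejects missing parameters (undefined/null).
function fieldMatches(regex, value) {
  return typeof value === "string" && regex.test(value);
}

// Hypothetical validator for the city/state/ZIP form.
function isValidAddress(city, state, zip) {
  var cityOk = fieldMatches(/^[A-Za-z .'\-]+$/, city);  // assumed rule: letters, spaces, . ' -
  var stateOk = fieldMatches(/^[A-Z]{2}$/, state);      // same pattern as the state check above
  var zipOk = fieldMatches(/^[0-9]{5}$/, zip);          // exactly five digits
  return cityOk && stateOk && zipOk;
}

console.log(isValidAddress("Seattle", "WA", "98195"));  // true
console.log(isValidAddress("Seattle", "wa", "98195"));  // false: state is lowercase
console.log(isValidAddress("Seattle", "WA", "9819"));   // false: ZIP too short
```

Anchoring each pattern with ^ and $ matters here: without them, a ZIP such as "98195abc" would pass because it merely contains five digits.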
Basic regular expressions
/abc/
- in PHP, regexes are strings that begin and end with /
- the simplest regexes simply match a particular substring
- the above regular expression matches any string containing "abc":
- YES: "abc", "abcdef", "defabc", ".=.abc.=.", ...
- NO: "fedcba", "ab c", "PHP", ...
Wildcards: .
- A dot . matches any character except a \n line break
- /.oo.y/ matches "Doocy", "goofy", "LooNy", ...
- A trailing i at the end of a regex (after the closing /) signifies a case-insensitive match
- /mart/i matches "Marty Stepp", "smart fellow", "WALMART", ...
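The wildcard and the case-insensitive flag can be tried out directly. This is a small sketch in JavaScript, which uses the same pattern syntax; the variable names are chosen only for illustration.

```javascript
// A dot stands for exactly one character (any character except a line break).
var wildcard = /.oo.y/;
console.log(wildcard.test("goofy"));   // true
console.log(wildcard.test("ooky"));    // false: no character before the "oo"

// The trailing i makes the match case-insensitive.
var insensitive = /mart/i;
var sensitive = /mart/;
console.log(insensitive.test("WALMART"));  // true
console.log(sensitive.test("WALMART"));    // false: the case differs
```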
Special characters: |, (), \
- | means OR
- /abc|def|g/ matches "abc", "def", or "g"
- There's no AND symbol. Why not?
- () are for grouping
- /(Homer|Marge) Simpson/ matches "Homer Simpson" or "Marge Simpson"
- \ starts an escape sequence
- many characters must be escaped to match them literally: / \ $ . [ ] ( ) ^ * + ?
- /<br \/>/ matches lines containing <br /> tags
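A short sketch in JavaScript (same pattern syntax) shows grouping and escaping at work. The price pattern is an extra illustration that is not from the slides.

```javascript
// Grouping limits the OR to the first name only.
var simpson = /(Homer|Marge) Simpson/;
console.log(simpson.test("Marge Simpson"));  // true
console.log(simpson.test("Bart Simpson"));   // false

// The / inside the pattern is escaped so it does not end the regex.
var brTag = /<br \/>/;
console.log(brTag.test("one<br />two"));  // true
console.log(brTag.test("one<br>two"));    // false

// Escaped $ and . match those characters literally.
var price = /\$5\.00/;
console.log(price.test("only $5.00"));  // true
console.log(price.test("only $5x00"));  // false: the escaped dot is not a wildcard
```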
Quantifiers: *, +, ?
- * means 0 or more occurrences
- /abc*/ matches "ab", "abc", "abcc", "abccc", ...
- /a(bc)*/ matches "a", "abc", "abcbc", "abcbcbc", ...
- /a.*a/ matches "aa", "aba", "a8qa", "a!?xyz__9a", ...
- + means 1 or more occurrences
- /a(bc)+/ matches "abc", "abcbc", "abcbcbc", ...
- /Goo+gle/ matches "Google", "Gooogle", "Goooogle", ...
- ? means 0 or 1 occurrences
- /a(bc)?/ matches "a" or "abc"
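To see exactly which part of a string a quantified pattern consumes, it helps to print the matched text. The helper name matchedText below is hypothetical; the sketch is in JavaScript, which shares this quantifier syntax.

```javascript
// Hypothetical helper: return the text the regex matched, or null if none.
function matchedText(regex, text) {
  var result = text.match(regex);
  return result === null ? null : result[0];
}

console.log(matchedText(/abc*/, "abccc!"));     // "abccc": * applies to the c only
console.log(matchedText(/a(bc)*/, "abcbcx"));   // "abcbc": * applies to the whole group
console.log(matchedText(/a(bc)+/, "ab"));       // null: + needs at least one "bc"
console.log(matchedText(/Goo+gle/, "Gogle"));   // null: needs at least two o's in total
console.log(matchedText(/a(bc)?/, "abcbc"));    // "abc": ? allows at most one "bc"
```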
More quantifiers: {min,max}
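- {min,max} means between min and max occurrences (inclusive)
- /a(bc){2,4}/ matches "abcbc", "abcbcbc", or "abcbcbcbc"
- max may be omitted to specify no upper limit
- {2,} means 2 or more
- {0,6} means up to 6 (the shorthand {,6} is treated as literal text by JavaScript and by older PCRE versions, so write the 0 explicitly)
- {3} means exactly 3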
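Counted quantifiers can be checked the same way, by printing the text that was matched. This is a sketch in JavaScript with a hypothetical helper named matchedText.

```javascript
// Hypothetical helper: return the text the regex matched, or null if none.
function matchedText(regex, text) {
  var result = text.match(regex);
  return result === null ? null : result[0];
}

console.log(matchedText(/a(bc){2,4}/, "abcbcbcbcbcbc"));  // "abcbcbcbc": stops after 4 repeats
console.log(matchedText(/a(bc){2,4}/, "abc"));            // null: fewer than 2 repeats
console.log(matchedText(/x{2,}/, "axxxxxb"));             // "xxxxx": no upper limit
console.log(matchedText(/x{0,6}/, "xxxxxxxx"));           // "xxxxxx": at most 6
console.log(matchedText(/x{3}/, "xx"));                   // null: exactly 3 required
```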
Anchors: ^ and $
- ^ represents the beginning of the string or line; $ represents the end
- /Jess/ matches all strings that contain Jess; /^Jess/ matches all strings that start with Jess; /Jess$/ matches all strings that end with Jess; /^Jess$/ matches the exact string "Jess" only
- /^Mart.*Stepp$/ matches "MartStepp", "Marty Stepp", "Martin D Stepp", ... but NOT "Marty Stepp stinks" or "I H8 Martin Stepp"
- (on the other slides, when we say, /PATTERN/ matches "text", we really mean that it matches any string that contains that text)
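The difference between "contains" and "is exactly" is easy to verify. A brief sketch in JavaScript, using the patterns from this slide:

```javascript
console.log(/Jess/.test("I know Jessica"));   // true: contains Jess
console.log(/^Jess/.test("I know Jessica"));  // false: does not start with Jess
console.log(/^Jess/.test("Jessica"));         // true
console.log(/Jess$/.test("Jessica"));         // false: does not end with Jess
console.log(/^Jess$/.test("Jess"));           // true: the exact string

var full = /^Mart.*Stepp$/;
console.log(full.test("Martin D Stepp"));      // true
console.log(full.test("Marty Stepp stinks"));  // false: text after Stepp
console.log(full.test("I H8 Martin Stepp"));   // false: text before Mart
```

For validation this is usually the form you want: a pattern anchored at both ends accepts a value only if the whole value fits.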
Character sets: []
- [] group characters into a character set; will match any single character from the set
- /[bcd]art/ matches strings containing "bart", "cart", and "dart"
- equivalent to /(b|c|d)art/ but shorter
- inside [], many of the special characters act as normal characters
- /what[!*?]*/ matches "what", "what!", "what?**!", "what??!", ...
- What regular expression matches DNA (strings of A, C, G, or T)?
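One possible answer to the DNA question, sketched in JavaScript. The function name isDna is hypothetical, and the sketch assumes that only uppercase letters count and that the empty string is not a valid sequence.

```javascript
// Hypothetical checker: the whole string must consist of A, C, G, or T.
function isDna(text) {
  return /^[ACGT]+$/.test(text);
}

console.log(isDna("GATTACA"));  // true
console.log(isDna("GATTXCA"));  // false: X is not in the set
console.log(isDna(""));         // false: + requires at least one character
```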
Character ranges: [start-end]
- inside a character set, specify a range of characters with -
- /[a-z]/ matches any lowercase letter
- /[a-zA-Z0-9]/ matches any lower- or uppercase letter or digit
- an initial ^ inside a character set negates it
- /[^abcd]/ matches any character other than a, b, c, or d
- inside a character set, - must be escaped to be matched
- /[+\-]?[0-9]+/ matches an optional + or -, followed by at least one digit
- What regular expression matches letter grades such as A, B+, or D- ?
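One possible answer to the letter-grade question, sketched in JavaScript. The grading scheme is an assumption: A through D may carry an optional + or -, and F stands alone. The function name isLetterGrade is hypothetical.

```javascript
// Hypothetical checker for grades such as A, B+, D-, or F.
function isLetterGrade(text) {
  // Either A-D with an optional + or -, or a bare F (assumed scheme).
  return /^([A-D][+\-]?|F)$/.test(text);
}

console.log(isLetterGrade("A"));    // true
console.log(isLetterGrade("B+"));   // true
console.log(isLetterGrade("D-"));   // true
console.log(isLetterGrade("F+"));   // false under the assumed scheme
console.log(isLetterGrade("A++"));  // false: only one modifier allowed
```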
Escape sequences
- special escape sequence character sets:
- \d matches any digit (same as [0-9]); \D any non-digit ([^0-9])
- \w matches any word character (same as [a-zA-Z_0-9]); \W any non-word char
- \s matches any whitespace character (space, \t, \n, etc.); \S any non-whitespace
- What regular expression matches dollar amounts of at least $100.00 ?
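One possible answer to the dollar-amount question, sketched in JavaScript. It assumes amounts are written without commas or leading zeros and always with exactly two decimal places; the function name is hypothetical.

```javascript
// Hypothetical checker: "$" followed by three or more digits (no leading
// zero), a literal dot, and exactly two more digits.
function isAtLeast100Dollars(text) {
  return /^\$[1-9]\d{2,}\.\d{2}$/.test(text);
}

console.log(isAtLeast100Dollars("$100.00"));   // true
console.log(isAtLeast100Dollars("$1234.56"));  // true
console.log(isAtLeast100Dollars("$99.99"));    // false: only two digits before the dot
console.log(isAtLeast100Dollars("$100"));      // false: cents are required here
```

The trick is that "at least 100" becomes a statement about length: any number with three or more digits and no leading zero is 100 or greater.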
Regular expression PHP example
$str = "the quick brown fox";
$str = preg_replace("/[aeiou]/", "*", $str);   # "th* q**ck br*wn f*x"
$words = preg_split("/[ ]+/", $str);           # split on runs of spaces
for ($i = 0; $i < count($words); $i++) {
  if (preg_match("/\\*{2,}/", $words[$i])) {   # word contains 2 or more stars in a row
    $words[$i] = strtoupper($words[$i]);
  }
}
- notice how \ must be escaped to \\
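For comparison, the same steps can be traced in JavaScript. This is a sketch, not a line-for-line port; the function name shoutStarredWords is hypothetical. In a JavaScript regex literal the backslash does not need to be doubled, because the pattern is not written inside a quoted string.

```javascript
// Hypothetical function mirroring the PHP example above.
function shoutStarredWords(str) {
  str = str.replace(/[aeiou]/g, "*");   // the g flag replaces every vowel
  var words = str.split(/[ ]+/);        // split on runs of spaces
  for (var i = 0; i < words.length; i++) {
    if (/\*{2,}/.test(words[i])) {      // 2 or more stars in a row
      words[i] = words[i].toUpperCase();
    }
  }
  return words;
}

console.log(shoutStarredWords("the quick brown fox"));
// [ 'th*', 'Q**CK', 'br*wn', 'f*x' ]
```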
Regular expressions in JavaScript
string.match(regex)
- if string fits the pattern, returns an array containing the matching text; else returns null
- can be used as a Boolean truthy/falsey test:
var name = $("name").value;
if (name.match(/[a-z]+/)) { ... }
- an i can be placed after the regex for a case-insensitive match
- name.match(/Marty/i) will match "marty", "MaRtY", ...
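A small sketch of how the return value of match can be used beyond a truthy test. The function name firstWord is hypothetical.

```javascript
// Hypothetical helper: return the first run of letters in the text,
// or null when the text contains no letters at all.
function firstWord(text) {
  var result = text.match(/[a-z]+/i);   // an array on success, null on failure
  return result === null ? null : result[0];
}

console.log(firstWord("  Marty Stepp"));  // "Marty"
console.log(firstWord("12345"));          // null
```

Checking for null before indexing matters: reading result[0] on a failed match would throw an error.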
Replacing text with regular expressions
string.replace(regex, "text")
- replaces the first occurrence of given pattern with the given text
var str = "Marty Stepp";
- str.replace(/[a-z]/, "x") returns "Mxrty Stepp"
- returns the modified string as its result; must be stored
str = str.replace(/[a-z]/, "x")
- a g can be placed after the regex for a global match (replace all occurrences)
- str.replace(/[a-z]/g, "x") returns "Mxxxx Sxxxx"
- replace with empty string to use a regex as a filter
- str = str.replace(/[^A-Z]+/g, "") turns str into "MS"
15.4: SQL Injection
SQL injection
a flaw where the user is able to inject arbitrary SQL into your query
- This flaw often exists when a page accepts user input and inserts it bare into the query.
- example: simpsons grade lookup
- What kinds of SQL can we inject into the query? Why is this bad?
Too true...
- injected SQL can:
- change the query to output others' data (revealing private information)
- insert a query to modify existing data (increase bank account balance)
- delete existing data (; DROP TABLE students; --)
- bloat the query to slow down the server (JOIN a JOIN b JOIN c ...)
- ...
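The effects listed above all start the same way: the user's text becomes part of the query. The following sketch in JavaScript only builds and prints query strings; it does not talk to a database. The function name and the table and column names are hypothetical.

```javascript
// Hypothetical and deliberately UNSAFE: pastes the input bare into the query.
function buildGradeQuery(studentName) {
  return "SELECT * FROM grades WHERE name = '" + studentName + "'";
}

// Intended use: the input stays inside the quotes.
console.log(buildGradeQuery("bart"));
// SELECT * FROM grades WHERE name = 'bart'

// Malicious input: the attacker's quote closes the string early, and the
// rest of the input is read as SQL. The condition is now always true.
console.log(buildGradeQuery("' OR '1'='1"));
// SELECT * FROM grades WHERE name = '' OR '1'='1'
```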
Securing against SQL injection
- similar to securing against HTML injection, escape the string before you include it in your query
quote | returns a SQL-escaped version of a string
$username = $db->quote($_POST["username"]);
$password = $db->quote($_POST["password"]);
$query = "SELECT name, ssn, dob FROM users
          WHERE username = $username AND password = $password";
- replaces ' with \', etc., and surrounds with quotes
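To show what this kind of escaping does to an injection attempt, here is an illustrative sketch in JavaScript. The function sqlQuote is hypothetical and handles only backslashes and single quotes; what a real quote function escapes depends on the database, so this is a picture of the idea, not a replacement for the database library's own function.

```javascript
// Hypothetical illustration of quoting: escape backslashes and single
// quotes, then wrap the result in single quotes.
function sqlQuote(value) {
  var escaped = String(value)
    .replace(/\\/g, "\\\\")   // \ becomes \\ (must run before the quote step)
    .replace(/'/g, "\\'");    // ' becomes \'
  return "'" + escaped + "'";
}

console.log("WHERE username = " + sqlQuote("O'Brien"));
// WHERE username = 'O\'Brien'

console.log("WHERE username = " + sqlQuote("' OR '1'='1"));
// WHERE username = '\' OR \'1\'=\'1'
```

After quoting, the attacker's quote marks are part of the string value, so the whole input is compared against the username column instead of being read as SQL.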
15.5: Session-Based Attacks
Man-in-the-middle attack
when the attacker listens on your network and reads and/or modifies your data
- works if attacker can access and compromise any server/router between you and your server
- also works if you are on the same local area network as the attacker
- often, the attacker still sends your info back and forth to/from the real server, but silently logs or modifies some of it along the way for their own benefit
- e.g. listens for you to send your user name / password / credit card number / ...
Secure HTTP (HTTPS)
- HTTPS: encrypted version of the HTTP protocol
- all messages between client and server are encrypted so men in the middle cannot easily read them
- servers can have certificates that verify their identity
Session hijacking
when the attacker gets a hold of your session ID and masquerades as you
- exploit sites that use HTTPS for only the initial login:
- HTTPS: browser → server (POST login.php)
- HTTPS: browser ← server (login.php + PHPSESSID cookie)
- HTTP: browser → server (GET whatever.php + PHPSESSID cookie)
- HTTP: browser ← server (whatever.php + PHPSESSID cookie)
- attacker can listen to the network, get your session ID cookie, and make requests to the same server with that same session ID cookie to masquerade as you!
- example: