Web Programming Step by Step, 2nd Edition

Chapter 15: Web Security

Except where otherwise noted, the contents of this document are Copyright 2012 Marty Stepp, Jessica Miller, and Victoria Kirst. All rights reserved. Any redistribution, reproduction, transmission, or storage of part or all of the contents in any form is prohibited without the authors' express written permission.

15.1: Security Principles

15.1: Security Principles
15.2: Cross-Site Scripting (XSS)
15.3: Validating Input Data
15.4: SQL Injection
15.5: Session-Based Attacks

Our current view of security

until now, we have assumed:
- valid user input
- non-malicious users
- nothing will ever go wrong
this is unrealistic!

The real world

in order to write secure code, we must assume:
- invalid input
- evil users
- incompetent users
- everything that can go wrong, will go wrong
- everybody is out to get you
- botnets, hackers, script kiddies, KGB, etc. are out there
assume nothing; trust no one

Attackers' goals

Why would an attacker target my site?

Read private data (user names, passwords, credit card numbers, grades, prices)
Change data (change a student's grades, prices of products, passwords)
Spoofing (pretending to be someone they are not)
Damage or shut down the site, so that it cannot be successfully used by others
Harm the reputation or credibility of the organization running the site
Spread viruses and other malware

Tools that attackers use

Assume that the attacker knows about web dev and has the same tools you have:

Firebug (browser developer tools)
browser extensions, e.g. Web Developer Toolbar
port scanners, e.g. nmap
network sniffers, e.g. Wireshark, EtherDetect, Firesheep

Some kinds of attacks

Denial of Service (DoS): Making a server unavailable by bombarding it with requests.
Social Engineering: Tricking a user into willingly compromising the security of a site (e.g. phishing).
Privilege Escalation: Causing code to run in a "privileged" context (e.g. as "root").
Information Leakage: Allowing an attacker to look at data, files, etc. that he/she should not be allowed to see.
Man-in-the-Middle: Placing a malicious machine in the network and using it to intercept traffic.
Session Hijacking: Stealing another user's session cookie to masquerade as that user.
Cross-Site Scripting (XSS) or HTML Injection: Inserting malicious HTML or JavaScript content into a web page.
SQL Injection: Inserting malicious SQL query code to reveal or modify sensitive data.

Information leakage

when the attacker can look at data, files, etc. that he/she should not be allowed to see

files on web server that should not be there
- or have overly generous permissions (e.g. readable/writable by everyone)
directories that list their contents (indexing)
- can be disabled on web server
guess the names of files, directories, resources
- see loginfail.php, try loginsuccess.php
- see user.php?id=123, try user.php?id=456
- see /data/public, try /data/private

15.2: Cross-Site Scripting (XSS)


HTML injection

a flaw where a user is able to inject arbitrary HTML content into your page

This flaw often exists when a page accepts user input and inserts it bare into the page.
example: magic 8-ball
What kinds of silly or malicious content can we inject into the page? Why is this bad?

Injecting HTML content

8ball.php?question=<em>lololol</em>

injected content can lead to:
- annoyance / confusion
- damage to data on the server
- exposure of private data on the server
- financial gain/loss
- end of the human race as we know it
why is HTML injection bad? It allows others to:
- disrupt the flow/layout of your site
- put words into your mouth
- possibly run malicious code on your users' computers

Cross-site scripting (XSS)

a flaw where a user is able to inject and execute arbitrary JavaScript code in your page

8ball.php?question=<script type='text/javascript'>alert('pwned');</script>

JavaScript can often be injected through an existing HTML injection flaw
Try submitting this as the 8-ball's question in Firefox:
- <script type="text/javascript" src="http://panzi.github.com/Browser-Ponies/basecfg.js" id="browser-ponies-config"></script><script type="text/javascript" src="http://panzi.github.com/Browser-Ponies/browserponies.js" id="browser-ponies-script"></script><script type="text/javascript">/* <![CDATA[ */ (function (cfg) {BrowserPonies.setBaseUrl(cfg.baseurl);BrowserPonies.loadConfig(BrowserPoniesBaseConfig);BrowserPonies.loadConfig(cfg);})({"baseurl":"http://panzi.github.com/Browser-Ponies/","fadeDuration":500,"volume":1,"fps":25,"speed":3,"audioEnabled":false,"showFps":false,"showLoadProgress":true,"speakProbability":0.1,"spawn":{"applejack":1,"fluttershy":1,"pinkie pie":1,"rainbow dash":1,"rarity":1,"twilight sparkle":1},"autostart":true}); /* ]]> */</script>
injected script code can:
- masquerade as the original page and trick the user into entering sensitive data
- steal the user's cookies
- masquerade as the user and submit data on their behalf (submit forms, click buttons, etc.)
- ...

Another XSS example

example: Buy-a-Grade Form
Recall that the user submits his name, section, and credit card number to the server, which are then displayed on the page.
How can we inject HTML/JavaScript into the page? Why is this bad?
What could we do to steal the user's sensitive information?

Securing against HTML injection / XSS

one idea: disallow harmful characters
- HTML injection is impossible without < >
- can strip those characters from input, or reject the entire request if they are present
another idea: allow them, but escape them

htmlspecialchars returns an HTML-escaped version of a string

$text = "<p>hi 2 u & me</p>";
$text = htmlspecialchars($text);   # "&lt;p&gt;hi 2 u &amp; me&lt;/p&gt;"
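Both ideas can be sketched in JavaScript. The three helper names below are hypothetical, not part of PHP or any library; `escapeHtml` is meant to approximate `htmlspecialchars` called with the `ENT_QUOTES` flag, which converts `&`, `<`, `>`, `"` and `'` into entities.

```javascript
// Idea 1a (hypothetical helper): detect < or > so the whole request can be rejected.
function containsAngleBrackets(text) {
  return /[<>]/.test(text);
}

// Idea 1b (hypothetical helper): silently strip < and > from the input.
function stripAngleBrackets(text) {
  return text.replace(/[<>]/g, "");
}

// Idea 2 (hypothetical helper): keep the characters but escape them, so the
// browser displays them as text instead of parsing them as markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // must come first, or we would re-escape our own entities
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#039;");
}

const question = "<script>alert('pwned');</script>";
console.log(containsAngleBrackets(question)); // true
console.log(stripAngleBrackets(question));    // scriptalert('pwned');/script
console.log(escapeHtml(question));
// &lt;script&gt;alert(&#039;pwned&#039;);&lt;/script&gt;
```

Stripping alters what the user typed (the script text survives as garbled but harmless text), while escaping shows it exactly as typed; that is one reason escaping is often preferred when displaying user input.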

15.3: Validating Input Data


What is form validation?

validation: ensuring that a form's values are correct
some types of validation:
- preventing blank values (email address)
- ensuring the type of values
  - integer, real number, currency, phone number, Social Security number, postal address, email address, date, credit card number, ...
- ensuring the format and range of values (ZIP code must be a 5-digit integer)
- ensuring that values fit together (user types email twice, and the two must match)

A real form that uses validation

Client vs. server-side validation

Validation can be performed:

client-side (before the form is submitted)
- can lead to a better user experience, but not secure (why not?)
server-side (in PHP code, after the form is submitted)
- needed for truly secure validation, but slower
both
- best mix of convenience and security, but requires most effort to program

An example form to be validated

<form action="http://foo.com/foo.php" method="get">
	<div>
		City:  <input name="city" /> <br />
		State: <input name="state" size="2" maxlength="2" /> <br />
		ZIP:   <input name="zip" size="5" maxlength="5" /> <br />
		<input type="submit" />
	</div>
</form>

Let's validate this form's data on the server...

Basic server-side validation code

$city  = $_REQUEST["city"];
$state = $_REQUEST["state"];
$zip   = $_REQUEST["zip"];
if (!$city || strlen($state) != 2 || strlen($zip) != 5) {
	print "Error, invalid city/state/zip submitted.";
	exit;   # abort: don't go on to process the bad data
}

basic idea: examine parameter values, and if they are bad, show an error message and abort. But:
- How do you test for integers vs. real numbers vs. strings?
- How do you test for a valid credit card number?
- How do you test that a person's name has a middle initial?
- (How do you test whether a given string matches a particular complex format?)

Regular expressions

/^[a-zA-Z_\-]+@(([a-zA-Z_\-])+\.)+[a-zA-Z]{2,4}$/

regular expression ("regex"): a description of a pattern of text
- can test whether a string matches the expression's pattern
- can use a regex to search/replace characters in a string
regular expressions are extremely powerful but tough to read
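(the above regular expression matches many simple email addresses; it is not a complete check, e.g. it rejects addresses containing digits or a dot before the @)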
regular expressions occur in many places:
- Java: Scanner, String's split method (CSE 143 sentence generator)
- supported by PHP, JavaScript, and other languages
- many text editors (e.g. TextPad) allow regexes in search/replace
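A quick way to see what the email pattern above does and does not accept is to try it in JavaScript, whose /.../ literal syntax is the same. The sample addresses are made up for illustration.

```javascript
// The email pattern from above, as a JavaScript regex literal.
const emailRegex = /^[a-zA-Z_\-]+@(([a-zA-Z_\-])+\.)+[a-zA-Z]{2,4}$/;

const samples = [
  "jess_m@cs.uw.edu",   // accepted
  "stepp@uw",           // rejected: no "." before the final part
  "marty.stepp@uw.edu", // rejected: "." is not allowed before the @
  "kirst2@uw.edu",      // rejected: digits are not allowed anywhere
];

for (const s of samples) {
  console.log(s, emailRegex.test(s));
}
```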

Regular expressions in PHP

regex syntax: strings that begin and end with /, such as "/[AEIOU]+/"

function	description
`preg_match(regex, string)`	returns `1` if `string` matches `regex`, `0` if not (`FALSE` on error), so it can be used as a Boolean test
`preg_replace(regex, replacement, string)`	returns a new string with all substrings that match `regex` replaced by `replacement`
`preg_split(regex, string)`	returns an array of strings from given `string` broken apart using given `regex` as delimiter (like `explode` but more powerful)

PHP form validation w/ regexes

$state = $_REQUEST["state"];
if (!preg_match("/^[A-Z]{2}$/", $state)) {
	print "Error, invalid state submitted.";
	exit;   # abort: don't go on to process the bad data
}

preg_match and regexes help you to validate parameters
sites often don't want to give a descriptive error message here (why?)
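The same idea extends to every field of the city/state/ZIP form. The sketch below is in JavaScript rather than PHP; `validateAddress` is a hypothetical helper that takes the submitted parameters as a plain object and returns the names of the fields that failed.

```javascript
// Hypothetical validator for the city/state/ZIP form shown earlier.
// params is a plain object such as { city: "Seattle", state: "WA", zip: "98195" }.
function validateAddress(params) {
  const city = params.city || "";
  const state = params.state || "";
  const zip = params.zip || "";
  const bad = [];
  if (!/\S/.test(city)) bad.push("city");           // at least one non-space character
  if (!/^[A-Z]{2}$/.test(state)) bad.push("state"); // exactly two capital letters
  if (!/^\d{5}$/.test(zip)) bad.push("zip");        // exactly five digits
  return bad;
}

console.log(validateAddress({ city: "Seattle", state: "WA", zip: "98195" })); // []
console.log(validateAddress({ city: " ", state: "wa", zip: "9819x" }));
// [ 'city', 'state', 'zip' ]
```

Returning field names rather than printing a message lets the caller decide how much detail to show the user, which connects to the question about descriptive error messages above.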

Basic regular expressions

/abc/

in PHP, regexes are strings that begin and end with /
the simplest regexes simply match a particular substring
the above regular expression matches any string containing "abc":
- YES: "abc", "abcdef", "defabc", ".=.abc.=.", ...
- NO: "fedcba", "ab c", "PHP", ...
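A minimal check of the YES/NO lists above, in JavaScript, where regex.test(string) plays the role of a Boolean preg_match:

```javascript
const re = /abc/;
const yes = ["abc", "abcdef", "defabc", ".=.abc.=."];
const no = ["fedcba", "ab c", "PHP"];

console.log(yes.every(s => re.test(s))); // true: each string contains "abc"
console.log(no.some(s => re.test(s)));   // false: none of them does
```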

Wildcards: `.`

A dot . matches any character except a \n line break
- /.oo.y/ matches "Doocy", "goofy", "LooNy", ...
A trailing i at the end of a regex (after the closing /) signifies a case-insensitive match
- /mart/i matches "Marty Stepp", "smart fellow", "WALMART", ...
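The same wildcard and case-insensitivity examples, as a small JavaScript sketch (JavaScript regex literals use the same trailing i flag):

```javascript
// . matches any single character except a line break
const wild = /.oo.y/;
console.log(["Doocy", "goofy", "LooNy"].every(s => wild.test(s))); // true
console.log(wild.test("Doo\ny")); // false: the . before y cannot match "\n"

// a trailing i makes the match case-insensitive
const mart = /mart/i;
console.log(["Marty Stepp", "smart fellow", "WALMART"].every(s => mart.test(s))); // true
console.log(/mart/.test("WALMART")); // false: without i, case must match
```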

Special characters: `|`, `()`, `\`

| means OR
- /abc|def|g/ matches "abc", "def", or "g"
- There's no AND symbol. Why not?
() are for grouping
- /(Homer|Marge) Simpson/ matches "Homer Simpson" or "Marge Simpson"
\ starts an escape sequence
- many characters must be escaped to match them literally: / \ $ . [ ] ( ) ^ * + ? { } |
- /<br \/>/ matches lines containing <br /> tags
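A short JavaScript sketch of |, () and \ (the test strings such as "xdefx" are made-up examples):

```javascript
// | means OR
const alt = /abc|def|g/;
console.log(alt.test("xdefx"), alt.test("ab")); // true false

// () groups the alternatives so " Simpson" applies to both
const simpson = /(Homer|Marge) Simpson/;
console.log(simpson.test("Marge Simpson"), simpson.test("Bart Simpson")); // true false

// \ escapes a special character; \/ matches a literal slash
const br = /<br \/>/;
console.log(br.test("line one<br />line two")); // true

// an unescaped . matches anything, so write \. for a literal dot
console.log(/a\.b/.test("a.b"), /a\.b/.test("axb")); // true false
```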

Quantifiers: `*`, `+`, `?`

* means 0 or more occurrences
- /abc*/ matches "ab", "abc", "abcc", "abccc", ...
- /a(bc)*/ matches "a", "abc", "abcbc", "abcbcbc", ...
- /a.*a/ matches "aa", "aba", "a8qa", "a!?xyz__9a", ...
+ means 1 or more occurrences
- /a(bc)+/ matches "abc", "abcbc", "abcbcbc", ...
- /Goo+gle/ matches "Google", "Gooogle", "Goooogle", ...
? means 0 or 1 occurrences
- /a(bc)?/ matches "a" or "abc"
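One way to see exactly what a quantifier consumes is to look at the matched text itself. In this JavaScript sketch, string.match(regex)[0] is the portion of the string that the pattern matched:

```javascript
console.log("abccc".match(/abc*/)[0]);      // abccc   (* : zero or more c's)
console.log("ab".match(/abc*/)[0]);         // ab      (zero c's is allowed)
console.log("abcbcx".match(/a(bc)*/)[0]);   // abcbc
console.log(/a(bc)+/.test("a"));            // false   (+ : needs at least one bc)
console.log("Gooogle".match(/Goo+gle/)[0]); // Gooogle
console.log("abcbc".match(/a(bc)?/)[0]);    // abc     (? : at most one bc)
```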

More quantifiers: `{min,max}`

{min,max} means between min and max occurrences (inclusive)
- /a(bc){2,4}/ matches "abcbc", "abcbcbc", or "abcbcbcbc"
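max may be omitted to mean "no upper limit"; for an upper limit only, use a min of 0
- {2,} means 2 or more
- {0,6} means up to 6 (avoid writing {,6}: JavaScript and older versions of PHP treat it as literal text rather than a quantifier)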
- {3} means exactly 3
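In this JavaScript sketch each pattern is anchored with ^ and $ (anchors are described next) so that the whole string must fit, which makes the min and max limits visible:

```javascript
const twoToFour = /^a(bc){2,4}$/;
console.log(["abc", "abcbc", "abcbcbcbc", "abcbcbcbcbc"].map(s => twoToFour.test(s)));
// [ false, true, true, false ]

console.log(/^\d{2,}$/.test("7"), /^\d{2,}$/.test("2012"));         // false true
console.log(/^x{0,6}$/.test("xxxxxx"), /^x{0,6}$/.test("xxxxxxx")); // true false
console.log(/^\d{3}$/.test("206"), /^\d{3}$/.test("2060"));         // true false
```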

Anchors: `^` and `$`

^ represents the beginning of the string or line;
$ represents the end
- /Jess/ matches all strings that contain Jess;
  /^Jess/ matches all strings that start with Jess;
  /Jess$/ matches all strings that end with Jess;
  /^Jess$/ matches the exact string "Jess" only
- /^Mart.*Stepp$/ matches "MartStepp", "Marty Stepp", "Martin D Stepp", ...
  but NOT "Marty Stepp stinks" or "I H8 Martin Stepp"
(on the other slides, when we say, /PATTERN/ matches "text", we really mean that it matches any string that contains that text)
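A JavaScript sketch that runs each of the four Jess patterns over the same sample strings (the sample strings are made up), then checks the Mart...Stepp examples:

```javascript
const names = ["Jess", "Jessica", "I am Jess"];
for (const re of [/Jess/, /^Jess/, /Jess$/, /^Jess$/]) {
  console.log(String(re), names.map(s => re.test(s)));
}
// /Jess/ [ true, true, true ]
// /^Jess/ [ true, true, false ]
// /Jess$/ [ true, false, true ]
// /^Jess$/ [ true, false, false ]

const marty = /^Mart.*Stepp$/;
console.log(["MartStepp", "Marty Stepp", "Martin D Stepp"].every(s => marty.test(s))); // true
console.log(["Marty Stepp stinks", "I H8 Martin Stepp"].some(s => marty.test(s)));     // false
```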

Character sets: `[]`

[] group characters into a character set; will match any single character from the set
- /[bcd]art/ matches strings containing "bart", "cart", or "dart"
- equivalent to /(b|c|d)art/ but shorter
inside [], most special characters (such as * ? + . |) lose their special meaning and match themselves
- /what[!*?]*/ matches "what", "what!", "what?**!", "what??!", ...
What regular expression matches DNA (strings of A, C, G, or T)?
- /[ACGT]+/
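A JavaScript sketch of these character sets. Note that the unanchored /[ACGT]+/ is satisfied by any string containing at least one A, C, G, or T; anchoring it as /^[ACGT]+$/, as below, requires every character to be one of the four:

```javascript
const art = /[bcd]art/;
console.log(["bart", "cart", "dart", "mart"].map(s => art.test(s)));
// [ true, true, true, false ]

// inside [], * and ? are ordinary characters
const what = /^what[!*?]*$/;
console.log(["what", "what!", "what?**!", "what??!"].every(s => what.test(s))); // true

// anchored DNA pattern: every character must be A, C, G, or T
const dna = /^[ACGT]+$/;
console.log(dna.test("GATTACA"), dna.test("GATTXCA")); // true false
```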

Character ranges: `[start-end]`

inside a character set, specify a range of characters with -
- /[a-z]/ matches any lowercase letter
- /[a-zA-Z0-9]/ matches any lower- or uppercase letter or digit
an initial ^ inside a character set negates it
- /[^abcd]/ matches any character other than a, b, c, or d
inside a character set, - must be escaped to be matched (unless it is the first or last character in the set)
- /[+\-]?[0-9]+/ matches an optional + or -, followed by at least one digit
What regular expression matches letter grades such as A, B+, or D- ?
- /[ABCDF][+\-]?/
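A JavaScript sketch of ranges and negation; the signed-integer and letter-grade patterns are anchored here so that extra characters cause a failure:

```javascript
console.log(/[a-z]/.test("ABC"), /[a-zA-Z0-9]/.test("ABC")); // false true
console.log(/^[^abcd]$/.test("e"), /^[^abcd]$/.test("b"));   // true false

const signedInt = /^[+\-]?[0-9]+$/;
console.log(["42", "-7", "+0", "--7", "4.2"].map(s => signedInt.test(s)));
// [ true, true, true, false, false ]

const grade = /^[ABCDF][+\-]?$/;
console.log(["A", "B+", "D-", "E", "A++"].map(s => grade.test(s)));
// [ true, true, true, false, false ]
```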

Escape sequences

special escape sequence character sets:
- \d matches any digit (same as [0-9]); \D any non-digit ([^0-9])
- \w matches any word character (same as [a-zA-Z_0-9]); \W any non-word char
- \s matches any whitespace character (space, \t, \n, etc.); \S any non-whitespace
What regular expression matches dollar amounts of at least $100.00?
- /\$[1-9]\d{2,}\.\d{2}/ (the [1-9] rules out leading zeros, so "$012.50" does not count)
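A JavaScript sketch of the escape sequences. For the dollar amounts it uses /\$[1-9]\d{2,}\.\d{2}/, which adds [1-9] so that amounts with leading zeros such as $012.50 are not counted; the sample amounts are made up:

```javascript
console.log(/^\d+$/.test("2012"), /\D/.test("2012"));          // true false
console.log(/^\w+$/.test("user_42"), /^\w+$/.test("user-42")); // true false
console.log("a \tb\nc".split(/\s+/));                          // [ 'a', 'b', 'c' ]

const atLeast100 = /\$[1-9]\d{2,}\.\d{2}/;
console.log(["$100.00", "$2499.99", "$99.99", "$012.50"].map(s => atLeast100.test(s)));
// [ true, true, false, false ]
```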

Regular expression PHP example

# replace vowels with stars
$str = "the quick    brown        fox";

$str = preg_replace("/[aeiou]/", "*", $str);
                         # "th* q**ck    br*wn        f*x"

# break apart into words
$words = preg_split("/[ ]+/", $str);
                         # ("th*", "q**ck", "br*wn", "f*x")

# capitalize words that had 2+ consecutive vowels
for ($i = 0; $i < count($words); $i++) {
	if (preg_match("/\\*{2,}/", $words[$i])) {
		$words[$i] = strtoupper($words[$i]);
	}
}                        # ("th*", "Q**CK", "br*wn", "f*x")

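notice how the regex's \ is written as \\ inside the PHP string, so that the regex receives a single \ (a safe habit, since sequences such as \n and \$ have their own meanings in double-quoted strings)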
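For comparison, a JavaScript sketch of the same three steps. With a regex literal, the pattern is not inside a string, so \* needs only one backslash; the g flag is needed so that replace changes every vowel, not only the first:

```javascript
let str = "the quick    brown        fox";

// replace vowels with stars (g = every match, like preg_replace)
str = str.replace(/[aeiou]/g, "*"); // "th* q**ck    br*wn        f*x"

// break apart into words, like preg_split
let words = str.split(/[ ]+/);      // [ 'th*', 'q**ck', 'br*wn', 'f*x' ]

// capitalize words that had 2+ consecutive vowels
words = words.map(w => (/\*{2,}/.test(w) ? w.toUpperCase() : w));
console.log(words);                 // [ 'th*', 'Q**CK', 'br*wn', 'f*x' ]
```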

Regular expressions in JavaScript

string.match(regex)
- if string fits the pattern, returns an array describing the match (the matching text is element [0]); else returns null
- can be used as a Boolean truthy/falsey test:
  var name = $("name").value; if (name.match(/[a-z]+/)) { ... }
an i can be placed after the regex for a case-insensitive match
- name.match(/Marty/i) will match "marty", "MaRtY", ...
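A small JavaScript sketch of what match actually returns. It runs outside the browser, so a plain string stands in for $("name").value:

```javascript
const m = "Marty Stepp".match(/[a-z]+/);
console.log(m[0], m.index);          // arty 1
console.log("1234".match(/[a-z]+/)); // null

const name = "MaRtY";                // stands in for $("name").value
if (name.match(/marty/i)) {
  console.log("matched, ignoring case");
}
```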

Replacing text with regular expressions

string.replace(regex, "text")
- replaces the first occurrence of given pattern with the given text
- var str = "Marty Stepp";
  str.replace(/[a-z]/, "x") returns "Mxrty Stepp"
- returns the modified string as its result; must be stored
  str = str.replace(/[a-z]/, "x")
a g can be placed after the regex for a global match (replace all occurrences)
- str.replace(/[a-z]/g, "x") returns "Mxxxx Sxxxx"
replace with empty string to use a regex as a filter
- str = str.replace(/[^A-Z]+/g, "") turns str into "MS"

15.4: SQL Injection


SQL injection

a flaw where the user is able to inject arbitrary SQL into your query

This flaw often exists when a page accepts user input and inserts it bare into the query.
example: Simpsons grade lookup
What kinds of SQL can we inject into the query? Why is this bad?

A SQL injection attack

The query in the Simpsons PHP code is:

$query = "SELECT * FROM students
WHERE username = '$username' AND password = '$password'";

Are there malicious values for the user name and password that we could enter?
Password: ' OR '1'='1

This causes the query to be executed as:

$query = "SELECT * FROM students
WHERE username = '$username' AND password = '' OR '1'='1'";

What will the above query return? Why is this bad?
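To see why the injected password changes the meaning of the query, here is a JavaScript sketch that builds the query string by plain concatenation, as the PHP code does. `buildLoginQuery` is a hypothetical name, and "bart" and "eatmyshorts" are made-up sample values:

```javascript
// UNSAFE: pastes user input directly between quotes, like the PHP code above
function buildLoginQuery(username, password) {
  return "SELECT * FROM students WHERE username = '" + username +
         "' AND password = '" + password + "'";
}

console.log(buildLoginQuery("bart", "eatmyshorts"));
// SELECT * FROM students WHERE username = 'bart' AND password = 'eatmyshorts'
console.log(buildLoginQuery("bart", "' OR '1'='1"));
// SELECT * FROM students WHERE username = 'bart' AND password = '' OR '1'='1'
```

The quote characters in the password close the string literal early, so OR '1'='1' becomes part of the SQL itself rather than part of the data.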

Too true...

injected SQL can:
- change the query to output others' data (revealing private information)
- insert a query to modify existing data (increase bank account balance)
- delete existing data (; DROP TABLE students; -- )
- bloat the query to slow down the server (JOIN a JOIN b JOIN c ...)
- ...

Securing against SQL injection

similar to securing against HTML injection, escape the string before you include it in your query

the PDO object's quote method returns a SQL-escaped version of a string

$username = $db->quote($_POST["username"]);
$password = $db->quote($_POST["password"]);
$query = "SELECT name, ssn, dob FROM users
WHERE username = $username AND password = $password";

quote replaces ' with \' (the exact escaping depends on the database driver), etc., and surrounds the result with quotes
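The escaping idea can be sketched in JavaScript as well. `quoteSql` and `buildSafeLoginQuery` are hypothetical helpers; `quoteSql` follows the standard SQL convention of doubling each single quote. This simple version does not handle databases such as MySQL that also treat backslash as an escape character, so in real code prefer the database driver's own quoting or parameterized queries over a hand-written escaper.

```javascript
// Hypothetical helper: wrap a value in single quotes, doubling any single
// quotes inside it (standard SQL string-literal escaping).
function quoteSql(value) {
  return "'" + String(value).replace(/'/g, "''") + "'";
}

function buildSafeLoginQuery(username, password) {
  return "SELECT name, ssn, dob FROM users WHERE username = " +
         quoteSql(username) + " AND password = " + quoteSql(password);
}

console.log(buildSafeLoginQuery("bart", "' OR '1'='1"));
// SELECT name, ssn, dob FROM users WHERE username = 'bart' AND password = ''' OR ''1''=''1'
```

Now the whole input is one string literal, so the database compares the password against the text ' OR '1'='1 instead of running it as SQL.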

15.5: Session-Based Attacks


Man-in-the-middle attack

when the attacker listens on your network and reads and/or modifies your data

works if the attacker can access and compromise any server or router between you and the web server
also works if you are on the same local area network as the attacker
often, the attacker still sends your info back and forth to/from the real server, but he silently logs or modifies some of it along the way to his own benefit
e.g. listens for you to send your user name / password / credit card number / ...

Secure HTTP (HTTPS)

HTTPS: encrypted version of HTTP protocol
all messages between client and server are encrypted so men in the middle cannot easily read them
servers can have certificates that verify their identity

Session hijacking

when the attacker gets hold of your session ID and masquerades as you

exploit sites that use HTTPS for only the initial login:
- HTTPS: browser → server (POST login.php)
- HTTPS: browser ← server (login.php + PHPSESSID cookie)
- HTTP: browser → server (GET whatever.php + PHPSESSID cookie)
- HTTP: browser ← server (whatever.php + PHPSESSID cookie)
attacker can listen to the network, get your session ID cookie, and make requests to the same server with that same session ID cookie to masquerade as you!
example: Firesheep
