Clear answers to technical questions from Little Fire Digital

Sanitising Text: Web Development Fundamentals

Sanitising text is one of the most fundamental practices of web development. From maintaining data integrity to cybersecurity, you cannot overstate its importance.

Sanitising text in the context of data security is a crucial defensive programming practice aimed at preventing malicious attacks, such as SQL injection, which can compromise the integrity and security of a database. SQL injection is a type of security vulnerability that allows an attacker to interfere with the queries that an application makes to its database. It often involves inserting or “injecting” malicious SQL code into an input field in an attempt to run unauthorised commands that access, modify, or delete data.

The Importance of Sanitising Text

Sanitising involves scrutinising and cleaning input data to ensure that it is safe for processing by an application or database. By sanitising all inputs, developers can significantly mitigate the risk of SQL injection and other injection-related security threats.

Some sanitising is easy: is an integer an integer, is an email address an email address, and so on. But because most code is written as text, text can be mistaken for code.

Why is that a problem?

Well, without giving too many hints to the black hats out there, we’ll try to explain why unsanitised input can be such a big issue.

Example 1: SQL Injection Vulnerability in PHP

Vulnerable Code:

// This example shows a potential SQL injection vulnerability.
// The code directly includes user input in the SQL query without sanitization or validation.

$userInput = $_GET['user_id']; // Assume this comes from a URL parameter
$query = "SELECT * FROM users WHERE id = $userInput"; // Vulnerable to SQL injection
$result = mysqli_query($connection, $query);

In this example, the application constructs a SQL query directly from user input without any form of sanitisation.

Consider what happens if a malicious user enters this:

22; DELETE FROM users;

The combined $query will look like this:

$query = "SELECT * FROM users WHERE id = 22; DELETE FROM users;"; // !!

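The attacker has manipulated the user_id URL parameter to inject a second SQL command. Note that mysqli_query() sends only a single statement, so this stacked DELETE would only run through an API that allows multiple statements, such as mysqli_multi_query(). A single-statement payload such as 22 OR 1=1 needs no such help and would return every row in the table.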
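Because user_id is supposed to be a number, one way to close this particular hole is to insist that it really is one before it goes anywhere near a query. The following is a minimal sketch, not a complete defence: parseUserId() is a hypothetical helper name, and the rule that IDs start at 1 is an assumption made for illustration.

```php
<?php
// Hypothetical helper: returns the ID as an int, or null when the input
// is not a plain integer of 1 or more (the lower bound is an assumption).
function parseUserId(string $raw): ?int
{
    $id = filter_var($raw, FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]);

    // filter_var() returns false when validation fails.
    return $id === false ? null : $id;
}

var_dump(parseUserId('22'));                     // int(22)
var_dump(parseUserId('22; DELETE FROM users;')); // NULL
```

A null result can then be rejected with an error response instead of being passed on to the database.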

Example 2: Cross-Site Scripting (XSS) Vulnerability in HTML/PHP

Vulnerable Code:

// This example demonstrates a potential XSS vulnerability.
// The application echoes back user input directly into the HTML without sanitization.

$userInput = $_GET['comment']; // Assume this comes from a form input
echo "User comment: $userInput"; // Vulnerable to XSS if the input includes JavaScript

This code snippet directly incorporates user input into the HTML output. An attacker could enter a malicious script as part of the comment, which would then be executed in the browser of anyone viewing that comment.

Consider this:

  • BOLD TEXT

Could be created using the following code:

echo "<strong>BOLD TEXT<script> // Do something malicious</script></strong>";

If published to a webpage, that script would run in the browser of every user who views it and, in these days of powerful, modern JavaScript, it could do real damage, such as stealing session cookies or redirecting visitors to another site.

By using the htmlspecialchars function, special characters in the user input are converted to HTML entities. For example, do this:

echo htmlspecialchars("<strong>BOLD TEXT<script> // Do something malicious</script></strong>");

And the browser will display the markup as harmless literal text, like this:

  • <strong>BOLD TEXT<script> // Do something malicious</script></strong>

It’s not pretty, but it is safe.
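In practice it can help to wrap htmlspecialchars in a small helper so that the flags and character set are always stated explicitly. The sketch below assumes a hypothetical helper name, escapeHtml(), and assumes the page is served as UTF-8; ENT_QUOTES also converts single quotes, which matters when the value ends up inside an HTML attribute.

```php
<?php
// Hypothetical helper: escapes text for output into HTML.
// Assumes the page encoding is UTF-8.
function escapeHtml(string $text): string
{
    // ENT_QUOTES converts both double and single quotes;
    // ENT_SUBSTITUTE replaces invalid byte sequences instead of returning ''.
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$comment = '<script>alert("hi")</script>';
echo escapeHtml($comment), PHP_EOL;
// Prints: &lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;
```

The escaping belongs at the point of output, so the stored comment stays exactly as the user typed it.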

Strategies for Sanitising Text in PHP

PHP offers several functions and techniques to sanitise user inputs and protect against SQL injection and other forms of attacks:

Using Prepared Statements

Prepared statements ensure that an application treats input data as data, not as part of the SQL command, thereby eliminating the risk of SQL injection. The PHP Data Objects (PDO) extension and MySQLi provide support for prepared statements.

PDO Example:

$pdo = new PDO('mysql:host=example.com;dbname=database', 'username', 'password');
$stmt = $pdo->prepare('SELECT * FROM users WHERE email = :email');
$stmt->execute(['email' => $userInput]);
$rows = $stmt->fetchAll();

MySQLi Example:

$mysqli = new mysqli('example.com', 'username', 'password', 'database');
$stmt = $mysqli->prepare('SELECT * FROM users WHERE email = ?');
$stmt->bind_param('s', $userInput); // 's' specifies the variable type as string
$stmt->execute();
$result = $stmt->get_result();
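To see a prepared statement treat a hostile value as plain data, the sketch below runs one against a throwaway database. It assumes the pdo_sqlite extension is available and uses an in-memory SQLite database in place of the MySQL server from the examples above; the table contents and the helper name findUserByEmail() are invented for illustration.

```php
<?php
// Assumption: pdo_sqlite is installed. An in-memory SQLite database
// stands in for the MySQL server used in the examples above.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)');
$pdo->exec("INSERT INTO users (email) VALUES ('a@example.com'), ('b@example.com')");

// Hypothetical helper: looks up users by email with a bound parameter.
function findUserByEmail(PDO $pdo, string $email): array
{
    $stmt = $pdo->prepare('SELECT id, email FROM users WHERE email = :email');
    $stmt->execute(['email' => $email]);

    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

$legit  = findUserByEmail($pdo, 'a@example.com'); // one matching row
$attack = findUserByEmail($pdo, "' OR '1'='1");   // no rows: compared as a literal string

echo count($legit), ' row(s) for the real address, ',
     count($attack), ' row(s) for the injection attempt', PHP_EOL;
```

The injection attempt matches nothing because the whole payload is compared against the email column as one string; it never becomes part of the SQL text.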

Escaping Input Data

While not as effective as prepared statements, escaping input data is another method to sanitise text. It involves adding escape characters before potentially dangerous characters in a string. However, this approach is generally less preferred due to its reliance on manual implementation, which is more error-prone.

PHP with MySQLi Example:

$userInput = mysqli_real_escape_string($mysqli, $userInput);
$query = "SELECT * FROM users WHERE email = '$userInput'";
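One reason escaping is error-prone is that it only helps when the escaped value sits inside quotes in the query. The sketch below illustrates this under stated assumptions: the sqlite3 extension is available, and SQLite3::escapeString() is swapped in for mysqli_real_escape_string() because the latter needs a live MySQL connection. The table and payload are invented for illustration.

```php
<?php
// Assumption: the sqlite3 extension is installed. SQLite3::escapeString()
// stands in for mysqli_real_escape_string() so no MySQL server is needed.
$db = new SQLite3(':memory:');
$db->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)');
$db->exec("INSERT INTO users (email) VALUES ('a@example.com'), ('b@example.com')");

$payload = '0 OR 1=1';                      // contains no quote characters
$escaped = SQLite3::escapeString($payload); // so escaping leaves it unchanged

// Unquoted numeric context: the payload still rewrites the WHERE clause
// and the query matches every row.
$unquoted = $db->querySingle("SELECT COUNT(*) FROM users WHERE id = $escaped");

// Quoted context: the payload is compared as one string and matches nothing.
$quoted = $db->querySingle("SELECT COUNT(*) FROM users WHERE id = '$escaped'");

echo "unquoted: $unquoted, quoted: $quoted", PHP_EOL;
```

Forgetting the quotes in a single query is enough to reopen the hole, which is why bound parameters are the safer default.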

Validating and Sanitising with Filters

PHP’s filter extension provides a range of functions for validating and sanitising data. Developers should use these functions to ensure that the input conforms to expected formats. Filtering for email addresses, URLs, or integers reduces the likelihood of malicious data being processed.

Sanitising an Email Example:

$sanitisedEmail = filter_var($userInput, FILTER_SANITIZE_EMAIL);
if (filter_var($sanitisedEmail, FILTER_VALIDATE_EMAIL)) {
    // Proceed with the sanitised and validated email
}
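One thing to keep in mind with the sanitise-then-validate pattern is that sanitising can silently turn an invalid address into a different, valid one. The sketch below shows that effect alongside a validate-only alternative; validEmailOrNull() is a hypothetical helper name, and whether to reject or repair bad input is a design choice for each application.

```php
<?php
// Hypothetical helper: validates without sanitising first, so the caller
// never receives an address the user did not actually type.
function validEmailOrNull(string $raw): ?string
{
    // Only surrounding whitespace is trimmed; nothing else is altered.
    $email = filter_var(trim($raw), FILTER_VALIDATE_EMAIL);

    return $email === false ? null : $email;
}

$typed = 'jane doe@example.com'; // invalid: contains a space

// FILTER_SANITIZE_EMAIL strips the space, producing a different address.
$sanitised = filter_var($typed, FILTER_SANITIZE_EMAIL);

var_dump($sanitised);                // string(19) "janedoe@example.com"
var_dump(validEmailOrNull($typed));  // NULL
```

Rejecting the input and asking the user to correct it avoids storing an address that belongs to someone else.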

Conclusion

Sanitising text is a fundamental aspect of strengthening applications against unsafe data input. SQL injection and other data security threats are malicious exploitations of more general system weaknesses. By employing techniques such as using prepared statements, escaping input data and utilising PHP’s filter functions, developers can significantly reduce the risk of malicious attacks.

It is essential for developers, especially those working in environments with stringent data security requirements like the UK, to adhere to best practices in text sanitisation to protect the integrity and security of their applications and databases.