The Definitive Guide to XSS

A guide to Cross-Site Scripting attacks. How do they work? How can you prevent them?

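What is XSS, also known as Cross-Site Scripting?

XSS is the term we use for a specific type of attack in which a website (your website, if you are not careful) is used as a vector to attack its own users, because user input is handled insecurely.

Basically, a bad actor (the attacker) exploits a hole we left in our code to inject JavaScript into our site.

Using this vulnerability, they can steal users' information.

Depending on how the XSS vulnerability is exploited, there are 3 main kinds of XSS attack:

  • Persistent XSS
  • Reflected XSS
  • DOM-based XSS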

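Why is XSS dangerous?

Imagine you have a website. An attacker finds a way to inject JavaScript code that is then served by your site, and that code is executed in your users' browsers without your consent and without your knowledge.

This is very dangerous.

Because you neglected to fix an XSS vulnerability, your site can be used as an attack vector, and your users' information is at risk.

In particular, once anyone can inject JavaScript into a page, they can access the user cookies associated with the site (unless the cookies are marked HttpOnly), read any information stored in them, and send it to their own server. They can listen for keyboard events and capture anything the user types into the page, for example usernames and passwords, then send it to the attacker's server using fetch or XHR. They can also manipulate the DOM, which opens the door to many other harmful actions.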

Is XSS a frontend or a backend problem?

Both. It is a site architecture problem that involves both the frontend and the backend.

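An example of an XSS attack

XSS is basically enabled whenever you let users enter information that you store (on the backend) and then display back.

Say you have a blog and you allow users to comment on it. If you blindly accept whatever a user types, a malicious user can try to include a JavaScript snippet, in its most basic form wrapped in <script></script>. For example <script>alert('test')</script>

You store that comment in your database, and when the page is loaded again, if you have not taken any precautions, every user who loads the page will run that JavaScript snippet.

I used a simple alert() call as an example but, as listed above, the attacker can enter any kind of script. At that point, the site is compromised.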
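
To make the flow concrete, here is a minimal sketch of the vulnerable pattern. The names (comments, saveComment, renderCommentsUnsafe) and the in-memory array standing in for a database are assumptions made for illustration, not code from a real blog engine.

```javascript
// Hypothetical in-memory stand-in for the comments table of a blog.
const comments = [];

// Stores the comment exactly as submitted, with no checks (the mistake).
function saveComment(author, body) {
  comments.push({ author, body });
}

// Builds the comments markup by pasting the stored text straight into HTML.
function renderCommentsUnsafe() {
  return comments
    .map((c) => `<li><strong>${c.author}</strong>: ${c.body}</li>`)
    .join("\n");
}

saveComment("alice", "Nice post!");
saveComment("mallory", "<script>alert('test')</script>");

// The script tag comes back out untouched, so every visitor's browser
// would execute it when this markup is served.
const html = renderCommentsUnsafe();
console.log(html);
```

Nothing in this sketch looks obviously wrong, which is the point: the bug is the missing encoding step between the stored value and the page.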

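What is persistent XSS?

Persistent XSS is one of the three kinds of XSS we find in the wild. It is what I described above in the blog comments example.

In this case the exploit code is stored in your database, or in some other resource that you host yourself.

Once someone manages to enter a JavaScript snippet, your site serves it automatically to visitors, with no further action required from the attacker.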

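What is reflected XSS?

Reflected XSS is a way to exploit a vulnerability in a site dynamically, by giving the end user a link that contains the script inside it.

The attacker provides a link similar to this: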

yoursite.com/?example=<script>alert('test')</script>

If your site reads the example GET variable and displays it on the page without checking and sanitizing its value, that script will be executed by the user's browser.

A typical example is a search form. It might live in the /search URL and you might accept the search term using the GET term variable.

You might display the string "You searched for <term>" when someone runs a search. If you didn't sanitize the value, you have a problem.
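
The search scenario can be sketched as follows. This is an illustration under assumptions: the function names are made up, and the page is reduced to a single string so the reflection is easy to see.

```javascript
// Reads the `term` GET variable from the request URL (empty string if absent).
function getSearchTerm(requestUrl) {
  return new URL(requestUrl).searchParams.get("term") ?? "";
}

// Vulnerable: the decoded term is pasted into the markup unchanged.
function renderResultsUnsafe(requestUrl) {
  return `<p>You searched for ${getSearchTerm(requestUrl)}</p>`;
}

// The kind of link an attacker might send in a phishing email.
// The payload is percent-encoded so it survives inside the URL.
const maliciousLink =
  "https://yoursite.com/search?term=" +
  encodeURIComponent("<script>alert('test')</script>");

// The server decodes the query string, so the script tag reappears
// verbatim in the response that is sent back to the victim.
const page = renderResultsUnsafe(maliciousLink);
console.log(maliciousLink);
console.log(page);
```

The payload is never stored anywhere. It travels inside the link and is reflected back by the server in the same request.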

Spam/phishing emails are a common medium for this XSS attack. Of course, the bigger and more important the site, the more frequently hackers will try to hack it.

What is DOM-based XSS?

With persistent XSS, the attacking code must be sent to the server, where it can be (and hopefully it is) sanitized. With reflected XSS, the same is true.

DOM-based XSS is a kind of XSS where the malicious code is never sent to the server. It commonly happens through the fragment part of a URL, or through code that reads document.URL or document.location.href. Some examples you find online no longer work, because modern browsers automatically percent-encode characters such as < and > in the URL. They only work if you unescape the value yourself, which is kind of scary (don't do it!).

Here's a simple working example. Say you have a page served at http://127.0.0.1:8081/testxss.html. Your client-side JavaScript looks at the test variable passed in the fragment part of the URL:

http://127.0.0.1:8081/testxss.html#test=something

The #test=something value is never sent to the server. It's only local, so persistent or reflected XSS would not work here. But say your script accesses that value using:

// position of the first character after "test="
const pos = document.URL.indexOf("test=") + 5
// everything from that position to the end of the URL
const value = document.URL.substring(pos)

and you write it directly into the DOM:

document.write(value)

All is fine, until someone calls the URL like this:

http://127.0.0.1:8081/testxss.html#test=<script>alert('x')</script>

Now, thanks to the automatic escaping that happens when you read document.URL, nothing should happen in this specific case.

You’d get

%3Cscript%3Ealert('x')%3C/script%3E

printed to the page. The value is escaped, so it’s not interpreted as HTML.

But if for some reason you unescape the document.URL value, you now have a problem, because the JavaScript is run. Any JS can be run in your users' browsers.

On older browsers this was a much bigger problem, since they didn't auto-escape JS put into the address bar.
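
The difference between the escaped and the unescaped value can be reproduced outside the browser. The sketch below assumes Node's URL class as a stand-in for what the page reads from document.URL. readTestValue is a hypothetical helper that mirrors the extraction logic above.

```javascript
// Same extraction logic as the page script, applied to any URL string.
function readTestValue(urlString) {
  const marker = "test=";
  const pos = urlString.indexOf(marker);
  return pos === -1 ? "" : urlString.substring(pos + marker.length);
}

// Parsing the URL percent-encodes < and > in the fragment, in line with
// the escaped output shown above.
const href = new URL(
  "http://127.0.0.1:8081/testxss.html#test=<script>alert('x')</script>"
).href;

// What the page would write when it leaves the value alone: inert text.
const escaped = readTestValue(href);

// What it would write after unescaping: a live script tag again.
const unescaped = decodeURIComponent(escaped);

console.log(escaped);
console.log(unescaped);
```

Passing the second value to document.write() is what turns a harmless string back into executable markup.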

Are static sites vulnerable to XSS?

Yes! Any kind of site, actually, because being static does not mean that no information is loaded from other sources. For example, you might roll your own form or comments, even without a database.

Or, we might have a search functionality that accepts input from an HTTP GET or POST variable. You are not immune just by not having a database.

How can we prevent XSS?

There are 3 techniques we can use:

  • encoding
  • validation
  • CSP

Encoding is done to escape the data, so that the browser treats it as text and does not interpret it as JavaScript. The right encoding depends on where the data ends up: in HTML output, <script> is encoded as &lt;script&gt;, while in a URL it is encoded as %3Cscript%3E.

Encoding, as a general rule, should always be done.

Server-side frameworks commonly provide helper functions that do this for you.
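
If your framework does not ship such a helper, HTML encoding can be sketched in a few lines. escapeHtml is a hypothetical name, and this version only covers text placed between tags or inside quoted attribute values. It is applied here to the "You searched for" string from the reflected XSS example.

```javascript
// Replaces the five characters that have special meaning in HTML.
// The ampersand must be handled first so the entities produced by the
// later replacements are not encoded a second time.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const term = "<script>alert('test')</script>";

// The browser now displays the term as text instead of running it.
const safeMarkup = `<p>You searched for ${escapeHtml(term)}</p>`;
console.log(safeMarkup);
```

This helper is not enough for values placed inside a script block, a style block or an unquoted attribute, which need their own encoding rules.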

In client-side JavaScript we use a different encoding mechanism depending on the use case.

If you need to add content to an HTML element, the best way is to assign the user-generated input to that element using the textContent property. The browser will do all the escaping for you:

document.querySelector('#myElement').textContent = theUserGeneratedInput

If you need to create a new node from user input, use document.createTextNode():

const el = document.createTextNode(theUserGeneratedInput)

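If you need to add content to an HTML attribute, use the setAttribute() method of the element. This is safe for plain data attributes. Attributes that can run code, such as event handlers (onclick) or URL attributes like href, still need their value validated: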

document.querySelector('#myElement').setAttribute('attributeName', theUserGeneratedInput)
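
One case where setAttribute() alone does not help is a URL attribute such as href, because a javascript: URL runs code when the link is clicked. A minimal sketch of a scheme check follows. The isSafeUrl name, the list of allowed schemes and the yoursite.com base are assumptions to adapt to your own site.

```javascript
// Schemes we are willing to put into href/src attributes (an assumption).
const ALLOWED_PROTOCOLS = new Set(["http:", "https:", "mailto:"]);

// Returns true when `value` resolves to a URL with an allowed scheme.
// `base` is a placeholder origin used to resolve relative links.
function isSafeUrl(value, base = "https://yoursite.com/") {
  try {
    return ALLOWED_PROTOCOLS.has(new URL(value, base).protocol);
  } catch {
    return false; // not parseable as a URL at all
  }
}

console.log(isSafeUrl("https://example.com/page")); // allowed scheme
console.log(isSafeUrl("/about"));                   // relative, resolved against base
console.log(isSafeUrl("javascript:alert('x')"));    // rejected
```

In the browser you would call setAttribute('href', theUserGeneratedInput) only when isSafeUrl(theUserGeneratedInput) returns true.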

If you need to add content to the URL, use the window.encodeURIComponent() function:

window.location.href = window.location.href + '?test=' + window.encodeURIComponent(theUserGeneratedInput)
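
When the page URL may already contain a query string, appending '?test=' by hand produces a malformed URL. A sketch of an alternative that uses the standard URL and URLSearchParams APIs is shown below. withTestParam is a hypothetical helper name.

```javascript
// Returns `pageUrl` with the `test` query parameter set to `userInput`.
// Existing parameters are kept, and the value is percent-encoded for us.
function withTestParam(pageUrl, userInput) {
  const url = new URL(pageUrl);
  url.searchParams.set("test", userInput);
  return url.href;
}

const result = withTestParam(
  "http://127.0.0.1:8081/testxss.html?page=2",
  "a&b=<c>"
);
console.log(result);

// Reading the parameter back gives the original input, unchanged.
console.log(new URL(result).searchParams.get("test"));
```

In the browser you could pass window.location.href as the first argument and assign the result back to window.location.href.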

Validation is usually done when you cannot use escaping to filter the input. A common example is a CMS that lets the user define the content of the page in HTML. You can’t escape that.

You use either a blacklisting or a whitelisting strategy for validation. With blacklisting you decide which tags you want to disallow. With whitelisting you decide which tags you want to allow. Whitelisting is safer, because a blacklist is complex, error-prone and not future-proof: anything you forgot to list gets through.
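
A whitelist can be sketched as "escape everything, then restore only the tags you trust". The example below is a simplified illustration and not a full HTML sanitizer. The tag list and function names are assumptions, and a production CMS would normally rely on a vetted sanitizing library instead.

```javascript
// Tags the CMS author is allowed to use (an assumption for this sketch).
const ALLOWED_TAGS = ["b", "i", "em", "strong", "p"];

function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Escapes the whole input, then turns back only bare allowed tags such as
// <b> or </b>. Tags that carry attributes never match the pattern, so
// something like <p onclick="..."> stays escaped.
function sanitizeWithAllowlist(html) {
  const pattern = new RegExp(`&lt;(/?)(${ALLOWED_TAGS.join("|")})&gt;`, "gi");
  return escapeHtml(html).replace(pattern, "<$1$2>");
}

const input = '<b>hi</b><script>alert(1)</script><p onclick="x()">';
console.log(sanitizeWithAllowlist(input));
```

Anything not on the list is displayed as text, so a tag invented after the list was written is rejected by default.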

CSP means Content Security Policy. It's a standard implemented by browsers that lets you tell the browser to execute only JavaScript coming from sources you trust. You can also disallow inline JavaScript in your pages, which is the kind of JavaScript that made the XSS exploits above possible.

CSP is enabled by the web server, which adds the Content-Security-Policy HTTP header when serving the page.
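
As an illustration, here is a sketch of a Node request handler that sets the header. The policy itself, the buildCsp helper and the cdn.example.com host are assumptions. The directive names (default-src, script-src) and the 'self' keyword are part of the CSP standard.

```javascript
// Turns { directive: [sources] } into a Content-Security-Policy value.
function buildCsp(directives) {
  return Object.entries(directives)
    .map(([name, sources]) => `${name} ${sources.join(" ")}`)
    .join("; ");
}

// Example policy: load everything from our own origin, and allow scripts
// from one extra (placeholder) CDN. Because 'unsafe-inline' is not listed,
// inline <script> blocks are refused by the browser.
const policy = buildCsp({
  "default-src": ["'self'"],
  "script-src": ["'self'", "https://cdn.example.com"],
});

// Request handler in the shape expected by Node's http.createServer().
function handler(req, res) {
  res.setHeader("Content-Security-Policy", policy);
  res.setHeader("Content-Type", "text/html; charset=utf-8");
  res.end("<h1>Hello</h1>");
}

console.log(policy);
```

To try it, pass handler to http.createServer() and call listen() on the result. Most web servers and frameworks offer their own way to add response headers.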

