When processing XML documents, you must account for the possibility of an XML eXternal Entity (XXE) injection attack.
The vulnerability arises when an XML parser processes untrusted data containing a reference to an external entity.
In this publication, I will review how secure the popular XML parsers for Erlang and Elixir are.
I list the parsers in descending order of popularity and importance.
xmerl
This parser is included in the Erlang Open Telecom Platform, and some XML libraries use it.
The parser provides the xmerl_scan:file and xmerl_scan:string functions.
The following code outputs the contents of the file /etc/passwd using an injected external entity:
#!/usr/bin/env escript
%% -*- erlang -*-
main(_) ->
    try
        Xml = xmerl_scan:string(
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" ++
            "<!DOCTYPE foo [ <!ELEMENT foo ANY >" ++
            "<!ENTITY xxe SYSTEM \"file:///etc/passwd\" >]>" ++
            "<response><result>&xxe;</result></response>"),
        io:format("~p", [Xml])
    catch
        Excep:Error:St ->
            io:format("~p: ~n ~p~n~p", [Excep, Error, St]),
            halt(1)
    end.
Safe xmerl
To use the xmerl parser securely, you need to pass a fetch_fun callback, which xmerl calls whenever it has to fetch an external resource. The callback below aborts parsing instead of fetching anything:
#!/usr/bin/env escript
%% -*- erlang -*-
main(_) ->
    %% Abort parsing as soon as xmerl asks to fetch any external resource.
    Trap = fun(_Uri, _GlobalState) ->
        throw(xxe_attack)
    end,
    try
        Xml = xmerl_scan:string(
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" ++
            "<!DOCTYPE foo [ <!ELEMENT foo ANY >" ++
            "<!ENTITY xxe SYSTEM \"file:///etc/passwd\" >]>" ++
            "<response><result>&xxe;</result></response>",
            [
                {fetch_fun, Trap}
            ]),
        io:format("~p", [Xml])
    catch
        %% The throw from Trap lands here as throw:xxe_attack.
        Excep:Error:St ->
            io:format("~p: ~n ~p~n~p", [Excep, Error, St]),
            halt(1)
    end.
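The trap callback can be wrapped in a small helper so that callers never reach xmerl_scan:string without it. The following is a minimal sketch, not part of xmerl: the module name safe_xml and the function parse/1 are hypothetical, and it assumes that a throw from fetch_fun propagates out of xmerl_scan:string, which is what the escript above relies on.

```erlang
-module(safe_xml).
-export([parse/1]).

%% Hypothetical helper: parse an XML string with xmerl while refusing
%% to fetch any external resource (DTD or external entity).
parse(Xml) when is_list(Xml) ->
    %% fetch_fun is called by xmerl for every external resource;
    %% throwing here aborts the parse before anything is read.
    Trap = fun(_Uri, _GlobalState) -> throw(xxe_attack) end,
    try xmerl_scan:string(Xml, [{fetch_fun, Trap}]) of
        {Doc, _Rest} -> {ok, Doc}
    catch
        %% Raised by Trap above.
        throw:xxe_attack -> {error, xxe_attack};
        %% xmerl signals malformed input by exiting.
        exit:Reason -> {error, Reason}
    end.
```

With this sketch, a document that declares an external entity should come back as an error tuple rather than as a parsed tree containing file contents, while an ordinary document is returned as {ok, Doc}.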
You can also pass an acc_fun callback to guard against XML-based DoS attacks.
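One way to read this advice is to have acc_fun count the nodes it accumulates and abort once a limit is exceeded. The sketch below assumes that interpretation. The module name bounded_xml, the function parse/2, and the too_many_nodes atom are hypothetical; the counter is kept in xmerl's user state through the user_state option. It bounds only the number of accumulated content nodes and is not claimed to cover every XML-based DoS technique.

```erlang
-module(bounded_xml).
-export([parse/2]).

%% Hypothetical helper: parse Xml with xmerl and abort once more than
%% MaxNodes content nodes (elements, text, ...) have been accumulated.
parse(Xml, MaxNodes) when is_list(Xml), is_integer(MaxNodes) ->
    CountingAcc =
        fun(Node, Acc, GlobalState) ->
            %% The running count is stored in xmerl's user state.
            Count = xmerl_scan:user_state(GlobalState) + 1,
            case Count > MaxNodes of
                true ->
                    throw(too_many_nodes);
                false ->
                    {[Node | Acc],
                     xmerl_scan:user_state(Count, GlobalState)}
            end
        end,
    Options = [{acc_fun, CountingAcc}, {user_state, 0}],
    try xmerl_scan:string(Xml, Options) of
        {Doc, _Rest} -> {ok, Doc}
    catch
        throw:too_many_nodes -> {error, too_many_nodes};
        %% xmerl signals malformed input by exiting.
        exit:Reason -> {error, Reason}
    end.
```

When parsing untrusted input, this limit would typically be combined with the fetch_fun trap shown earlier, since the two callbacks address different problems.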
Erlsom
Erlsom is an Erlang XML parsing library. This XML parser has different modes of use.
SAX mode
-module(erlsom_sax_test).
-export([main/0]).
main() ->
    Fun = fun(Event, Acc) -> io:format("~p~n", [Event]), Acc end,
    Doc0 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" ++
           "<response><result></result></response>",
    Doc1 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" ++
           "<!DOCTYPE foo [ <!ELEMENT foo ANY >" ++
           "<!ENTITY xxe SYSTEM \"file:///etc/passwd\" >]>" ++
           "<response><result>&xxe;</result></response>",
    Xml0 = erlsom:parse_sax(Doc0, [], Fun),
    io:format("~p", [Xml0]),
    Xml1 = erlsom:parse_sax(Doc1, [], Fun),
    io:format("~p", [Xml1]).
When running the example above, we get the exception {error, "Malformed: Illegal character in literal value"} for Doc1. In this mode, the parser does not process external entities.
Simple DOM mode
-module(erlsom_simple_test).
-export([main/0]).
main() ->
    Doc0 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" ++
           "<response><result></result></response>",
    Doc1 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" ++
           "<!DOCTYPE foo [ <!ELEMENT foo ANY >" ++
           "<!ENTITY xxe SYSTEM \"file:///etc/passwd\" >]>" ++
           "<response><result>&xxe;</result></response>",
    Xml0 = erlsom:simple_form(Doc0),
    io:format("~p\n", [Xml0]),
    Xml1 = erlsom:simple_form(Doc1),
    io:format("~p\n", [Xml1]).
In this mode, the injected external entity also triggers the exception {error, "Malformed: Illegal character in literal value"} for Doc1.
Data binder mode
In this mode, Erlsom parses an XML document and checks whether it conforms to a given XML Schema (XSD).
-module(erlsom_binder_test).
-export([main/0]).
main() ->
    %% Note the trailing space after type=...: without it the two
    %% attributes run together and the schema itself is malformed.
    Schema = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" ++
             "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">" ++
             "<xsd:element name=\"response\" type=\"response_type\"/>" ++
             "<xsd:complexType name=\"response_type\">" ++
             "<xsd:sequence>" ++
             "<xsd:element name=\"result\" type=\"xsd:string\" " ++
             "maxOccurs=\"unbounded\"/>" ++
             "</xsd:sequence>" ++
             "</xsd:complexType>" ++
             "</xsd:schema>",
    {ok, Model} = erlsom:compile_xsd(Schema),
    Doc0 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" ++
           "<response><result>abc</result></response>",
    Xml0 = erlsom:scan(Doc0, Model),
    io:format("~p\n", [Xml0]),
    Doc1 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" ++
           "<!DOCTYPE foo [ <!ELEMENT foo ANY >" ++
           "<!ENTITY xxe SYSTEM \"file:///etc/passwd\" >]>" ++
           "<response><result>&xxe;</result></response>",
    Xml1 = erlsom:scan(Doc1, Model),
    io:format("~p\n", [Xml1]).
Here we also get the exception {error, "Malformed: Illegal character in literal value"} for the second document.
XmlToMap
This library creates an Elixir map from an XML string using erlsom.simple_form. Because Erlsom does not process external entities, its consumers are protected from XXE.
SweetXml
SweetXml is a wrapper around xmerl written in Elixir. It allows XXE injection like xmerl does:
defmodule XXE do
  import SweetXml

  def test do
    payload = """
    <?xml version="1.0" ?>
    <!DOCTYPE r [
    <!ELEMENT r ANY >
    <!ENTITY sp SYSTEM "file:///etc/passwd">
    ]>
    <r>&sp;</r>
    """

    payload |> xpath(~x"//r/text()"e)
  end
end
The library includes the parse and stream functions, and it is possible to pass xmerl options to them. But there are also the popular xpath and xmap functions, which cannot be used securely on a raw document because they parse it without any options: parent |> parse |> xpath(spec).
Saxy
Saxy does not support DTDs or XSD schemas. When it encounters a DTD, the parser skips it. The parser also does not expand external entity references, though it provides an option to specify how to handle them.
import Saxy.XML

doc1 = """
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [ <!ELEMENT foo ANY ><!ENTITY xxe SYSTEM "file:///etc/passwd" >]>
<response><result>&xxe;</result></response>
"""

Saxy.SimpleForm.parse_string(doc1)
The code above parses the document into the following structure, leaving the entity reference unexpanded:
{:ok, {"response", [], [{"result", [], ["&xxe;"]}]}}
Quinn
A simple XML parser in Elixir that uses xmerl_scan.string to parse the XML.
import Elixir.Quinn

doc1 = """
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [ <!ELEMENT foo ANY ><!ENTITY xxe SYSTEM "file:///etc/passwd" >]>
<response><result>&xxe;</result></response>
"""

Quinn.parse(doc1)
As expected, we get the contents of /etc/passwd. There are no functions in this library that protect against XXE.
fast_xml
fast_xml is an Expat-based Erlang XML parsing and manipulation library, with a focus on XML stream parsing.
Let’s build the library, launch the interactive shell with erl -pa ebin -pa deps/*/ebin, and try to parse a complete document:
Doc = <<"<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
"<!DOCTYPE foo [ <!ELEMENT foo ANY ><!ENTITY xxe SYSTEM \"file:///etc/passwd\" >]>",
"<response><result>&xxe;</result></response>">>.
fxml:load_nif().
fxml_stream:parse_element(Doc).
We get a parsing aborted error, the same as with the streaming parser.
XmlRpc
The library uses erlsom.scan under the hood, so external entities are not resolved.
Meeseeks
Meeseeks is an Elixir library for parsing and extracting data from HTML and XML with CSS or XPath selectors, built on top of the html5ever Rust library.
The library is a simple parser. Neither external entities nor references to them are processed.
You can parse the document using Meeseeks.parse(text, :xml) or Meeseeks.parse(text). They behave differently, but both are safe (thanks to [@mischov](https://twitter.com/mischov) for comments).
Querying the parsed document with Meeseeks.one(parsed, xpath("//*/*")) will produce:
#Meeseeks.Result<{ <response><result>&xxe;</result></response> }>
Exml
The library is based on xmerl_scan.string and is also not secure by default.
Exoml
This library converts XML documents into a tree structure. External entities are not allowed.
Exoml.decode(payload) returns:
{:root, [],
 [
   {:prolog, [{"version", "1.0"}, {"encoding", "UTF-8"}], nil},
   {:doctype, [" foo [ <!ELEMENT foo ANY "], nil},
   "<!ENTITY xxe SYSTEM \"file:///etc/passwd\" >]>\n",
   {"response", [], [{"result", [], ["xxe"]}]},
   "\n"
 ]}
Summary
Secure parsers
- Erlsom
- XmlToMap
- Saxy
- fast_xml
- XmlRpc
- Meeseeks
- Exoml
Parsers insecure by default
- xmerl
- SweetXml
- Quinn
- Exml