Any code that processes XML documents must account for the possibility of an XML External Entity (XXE) injection attack.

The vulnerability arises when an XML parser processes untrusted input that contains a reference to an external entity.

In this post, I review how secure the popular XML parsers for Erlang and Elixir are.

The parsers are listed in descending order of popularity and importance.

xmerl

This parser ships with Erlang/OTP (Open Telecom Platform), and several XML libraries build on it.

The parser provides xmerl_scan:file and xmerl_scan:string functions.
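
Before looking at the attack, it may help to see what a benign call returns. The following is a minimal sketch, not part of the original post: the module name xmerl_basic and the function result_text/1 are made up for illustration. It relies only on xmerl_scan:string/1 returning a {Document, Rest} tuple and on xmerl_xpath:string/2 from the same OTP application.

```erlang
-module(xmerl_basic).
%% Record definitions such as #xmlText{} live in this header.
-include_lib("xmerl/include/xmerl.hrl").
-export([result_text/1]).

%% Hypothetical helper: parse a trusted document and return the text
%% content of /response/result as a string.
result_text(Xml) ->
    %% xmerl_scan:string/1 returns the parsed document and any unparsed tail.
    {Doc, _Rest} = xmerl_scan:string(Xml),
    %% Select the text node below /response/result.
    [#xmlText{value = Value}] =
        xmerl_xpath:string("/response/result/text()", Doc),
    Value.
```

For example, xmerl_basic:result_text("<response><result>ok</result></response>") returns "ok". Nothing in this call restricts entity handling, which is what the next listing exploits.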

The following code outputs the contents of the file /etc/passwd using an injected external entity:

#!/usr/bin/env escript
%% -*- erlang -*-
main(_) ->
  try
    Xml = xmerl_scan:string(
      "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" ++
      "<!DOCTYPE foo [ <!ELEMENT foo ANY >" ++
      "<!ENTITY xxe SYSTEM \"file:///etc/passwd\" >]>" ++
      "<response><result>&xxe;</result></response>"),
    io:format("~p", [Xml])
  catch
    Excep:Error:St ->
      io:format("~p: ~n ~p~n~p", [Excep, Error, St]),
      halt(1)
  end.

Safe xmerl

To use the xmerl parser securely, pass a fetch_fun callback. xmerl calls it whenever it needs to fetch an external resource, so a callback that throws instead of fetching blocks the attack:

#!/usr/bin/env escript
%% -*- erlang -*-
main(_) ->
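  %% Called when xmerl tries to fetch an external resource. Throwing aborts
  %% the parse; a permissive callback would instead return
  %% {ok, {string, Data}, GlobalState}.
  Trap = fun(_Uri, _GlobalState) ->
    throw(xxe_attack)
  end,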
  try
    Xml = xmerl_scan:string(
      "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" ++
      "<!DOCTYPE foo [ <!ELEMENT foo ANY >" ++
      "<!ENTITY xxe SYSTEM \"file:///etc/passwd\" >]>" ++
      "<response><result>&xxe;</result></response>",
      [
        {fetch_fun, Trap}
      ]),
    io:format("~p", [Xml])
  catch
    Excep:Error:St ->
      io:format("~p: ~n ~p~n~p", [Excep, Error, St]),
      halt(1)
  end.

You can also pass an acc_fun callback, which controls what the parser accumulates, to help prevent XML-based DoS attacks.
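
The two callbacks can be combined in one reusable function. The sketch below is an illustration under stated assumptions rather than a vetted defense: the module name safe_xmerl, the parse/2 function, and the idea of capping the number of accumulated nodes are all introduced here and do not come from xmerl. It uses the documented fetch_fun, acc_fun and user_state options together with xmerl_scan:user_state/1,2, and assumes that a throw from a callback propagates to the caller, as it does in the escript above.

```erlang
-module(safe_xmerl).
-export([parse/2]).

%% Hypothetical wrapper: parse Xml (a string) while refusing external
%% resources and allowing at most MaxNodes accumulated nodes.
parse(Xml, MaxNodes) ->
    %% Refuse every attempt to fetch an external resource.
    Fetch = fun(_Uri, _State) -> throw(external_entity) end,
    %% Count each accumulated node; the counter lives in the user state.
    Acc = fun(Node, Acc0, State) ->
        Count = xmerl_scan:user_state(State) + 1,
        if
            Count > MaxNodes -> throw(too_many_nodes);
            true -> ok
        end,
        {[Node | Acc0], xmerl_scan:user_state(Count, State)}
    end,
    Options = [{fetch_fun, Fetch}, {acc_fun, Acc}, {user_state, 0}],
    try xmerl_scan:string(Xml, Options) of
        {Doc, _Rest} -> {ok, Doc}
    catch
        %% Raised by the callbacks above.
        throw:Reason -> {error, Reason};
        %% xmerl reports malformed input by exiting.
        exit:Reason -> {error, Reason}
    end.
```

A caller would use something like safe_xmerl:parse(Doc, 10000) and treat any {error, Reason} result as a rejected document. The node limit is an arbitrary value for the example; a cap on nodes alone may not cover every DoS vector, so the limit and the strategy should be tuned to the documents you expect.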

Erlsom

Erlsom is an Erlang XML parsing library. This XML parser has different modes of use.

SAX mode

-module(erlsom_sax_test).
-export([main/0]).

main() ->
  Fun = fun(Event, Acc) -> io:format("~p~n", [Event]), Acc end,
  Doc0 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" ++
         "<response><result></result></response>",
  Doc1 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" ++
         "<!DOCTYPE foo [ <!ELEMENT foo ANY >" ++
         "<!ENTITY xxe SYSTEM \"file:///etc/passwd\" >]>" ++
         "<response><result>&xxe;</result></response>",
  Xml0 = erlsom:parse_sax(Doc0, [], Fun),
  io:format("~p", [Xml0]),
  Xml1 = erlsom:parse_sax(Doc1, [], Fun),
  io:format("~p", [Xml1]).

Running the example above raises the exception {error, "Malformed: Illegal character in literal value"} for Doc1. In this mode, the parser does not process external entities.

Simple DOM mode

-module(erlsom_simple_test).
-export([main/0]).

main() ->
  Doc0 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" ++
         "<response><result></result></response>",
  Doc1 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" ++
         "<!DOCTYPE foo [ <!ELEMENT foo ANY >" ++
         "<!ENTITY xxe SYSTEM \"file:///etc/passwd\" >]>" ++
         "<response><result>&xxe;</result></response>",
  Xml0 = erlsom:simple_form(Doc0),
  io:format("~p\n", [Xml0]),
  Xml1 = erlsom:simple_form(Doc1),
  io:format("~p\n", [Xml1]).

In this mode, the injected external entity also triggers an exception {error, "Malformed: Illegal character in literal value"} for Doc1.
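
Because Erlsom signals the rejected document with an exception, calling code may prefer a tagged tuple. The following is a small sketch under that assumption: the module name safe_erlsom is hypothetical, and the wrapper deliberately handles both a thrown {error, Reason} and a returned one, since the exact error path can differ between Erlsom versions.

```erlang
-module(safe_erlsom).
-export([simple_form/1]).

%% Hypothetical wrapper: always return {ok, Tree} or {error, Reason}
%% instead of letting the parse exception escape.
simple_form(Xml) ->
    try erlsom:simple_form(Xml) of
        {ok, Tree, _Rest} -> {ok, Tree};
        {error, Reason} -> {error, Reason}
    catch
        %% The form reported above, e.g. {error, "Malformed: ..."}.
        throw:{error, Reason} -> {error, Reason};
        %% Anything else is passed through as the error reason.
        _Class:Reason -> {error, Reason}
    end.
```

With this wrapper, a document carrying the injected entity is expected to yield an {error, Reason} tuple that the caller can log and reject, while a well-formed document yields {ok, Tree}.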

Data binder mode

In this mode, Erlsom parses an XML document and checks whether it conforms to a given XML Schema (XSD).

-module(erlsom_binder_test).
-export([main/0]).

main() ->
  Schema = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" ++
           "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">" ++
           "<xsd:element name=\"response\" type=\"response_type\"/>" ++
           "<xsd:complexType name=\"response_type\">" ++
              "<xsd:sequence>" ++
                "<xsd:element name=\"result\" type=\"xsd:string\"" ++
                %% The leading space keeps the two attributes separated.
                " maxOccurs=\"unbounded\"/>" ++
              "</xsd:sequence>" ++
              "</xsd:complexType>" ++
          "</xsd:schema>",
  {ok, Model} = erlsom:compile_xsd(Schema),
  Doc0 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" ++
         "<response><result>abc</result></response>",
  Xml0 = erlsom:scan(Doc0, Model),
  io:format("~p\n", [Xml0]),
  Doc1 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" ++
         "<!DOCTYPE foo [ <!ELEMENT foo ANY >" ++
         "<!ENTITY xxe SYSTEM \"file:///etc/passwd\" >]>" ++
         "<response><result>&xxe;</result></response>",
  Xml1 = erlsom:scan(Doc1, Model),
  io:format("~p\n", [Xml1]).

Here too, the second document raises the exception {error, "Malformed: Illegal character in literal value"}.

XmlToMap

This library builds an Elixir map from an XML string using erlsom:simple_form. It therefore inherits Erlsom's behavior, and its users are protected from XXE.

SweetXml

SweetXml is a wrapper around xmerl written in Elixir. It allows XXE injection like xmerl does:

defmodule XXE do
  import SweetXml
  def test do
    payload = """
<?xml version="1.0" ?>
<!DOCTYPE r [
<!ELEMENT r ANY >
<!ENTITY sp SYSTEM "file:///etc/passwd">
]>
<r>&sp;</r>
"""
    payload |> xpath(~x"//r/text()"e)
  end
end

The library includes the parse and stream functions, and both accept xmerl options, so they can be used securely. The popular xpath and xmap functions, however, cannot be used securely on raw input: when given an unparsed document they parse it without any options, as in parent |> parse |> xpath(spec).

Saxy

Saxy does not support DTDs or XSD schemas. When it encounters a DTD, the parser skips it. The parser also does not expand external entity references, but it provides an option that specifies how to handle them.

import Saxy.XML
doc1 = """
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [ <!ELEMENT foo ANY ><!ENTITY xxe SYSTEM "file:///etc/passwd" >]>
<response><result>&xxe;</result></response>
"""
Saxy.SimpleForm.parse_string(doc1)

The code above parses the document into the following structure:

{:ok, {"response", [], [{"result", [], ["&xxe;"]}]}}

Quinn

Quinn is a simple XML parser in Elixir that uses xmerl_scan:string to parse the XML.

import Elixir.Quinn
doc1 = """
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [ <!ELEMENT foo ANY ><!ENTITY xxe SYSTEM "file:///etc/passwd" >]>
<response><result>&xxe;</result></response>
"""
Quinn.parse(doc1)

As expected, we get the contents of /etc/passwd. No function in this library protects against XXE.

fast_xml

fast_xml is an Expat-based Erlang library for XML parsing and manipulation, with a focus on XML stream parsing.

Let's build the library, launch the interactive shell with erl -pa ebin -pa deps/*/ebin, and try to parse a complete document:

Doc = <<"<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
        "<!DOCTYPE foo [ <!ELEMENT foo ANY ><!ENTITY xxe SYSTEM \"file:///etc/passwd\" >]>",
        "<response><result>&xxe;</result></response>">>.
fxml:load_nif().
fxml_stream:parse_element(Doc).

We get a "parsing aborted" error, the same as with the streaming parser.

XmlRpc

The library uses erlsom:scan under the hood, so external entities are not resolved.

Meeseeks

Meeseeks is an Elixir library for parsing and extracting data from HTML and XML with CSS or XPath selectors built on top of the html5ever Rust library.

The library is a simple parser. It supports neither external entities nor entity references.

You can parse the document using Meeseeks.parse(text, :xml) or Meeseeks.parse(text). They have different behaviors, but both are safe (thanks to [@mischov](https://twitter.com/mischov) for comments).

Querying the parsed document with Meeseeks.one(parsed, xpath("//*/*")) produces:

#Meeseeks.Result<{ <response><result>&amp;xxe;</result></response> }>

Exml

The library is based on xmerl_scan:string and, like xmerl, is not secure by default.

Exoml

This library converts XML documents into a tree structure. External entities are not allowed.

Exoml.decode(payload) returns:

{:root, [],
 [
   {:prolog, [{"version", "1.0"}, {"encoding", "UTF-8"}], nil},
   {:doctype, [" foo [ <!ELEMENT foo ANY "], nil},
   "<!ENTITY xxe SYSTEM \"file:///etc/passwd\" >]>\n",
   {"response", [], [{"result", [], ["xxe"]}]},
   "\n"
 ]}

Summary

Secure parsers

  • Erlsom
  • XmlToMap
  • Saxy
  • fast_xml
  • XmlRpc
  • Meeseeks
  • Exoml

Parsers insecure by default

  • xmerl
  • SweetXml
  • Quinn
  • Exml

References