Serde Deserializer module.
Due to the complexity of the XML standard and the fact that Serde was developed with JSON in mind, not all Serde concepts apply smoothly to XML. As a result, some XML concepts are inexpressible in terms of Serde derives and may require manual deserialization.
The most notable restriction concerns distinguishing between elements and attributes, as no other format used by serde has such a concept.
Because of that, the mapping is performed on a best-effort basis.
§Table of Contents
- Mapping XML to Rust types
- Composition Rules
- Enum Representations
- Difference between $text and $value special names
- Frequently Used Patterns
§Mapping XML to Rust types
Type names are never considered when deserializing, so you can name your types as you wish. Other general rules:
- a struct field name can be represented in XML only as an attribute name or an element name;
- an enum variant name can be represented in XML only as an attribute name or an element name;
- the unit struct, unit type () and unit enum variant can be deserialized from any valid XML content:
  - attribute and element names;
  - attribute and element values;
  - text or CDATA content (including mixed text and CDATA content).
NOTE: All tests are marked with an ignore option, even though they do compile. This is because rustdoc marks such blocks with an information icon, unlike no_run blocks.
§Basics | |
---|---|
To parse all these XML's... | ...use these Rust type(s) |
Content of attributes and text / CDATA content of elements (including mixed
text and CDATA content):
Mixed text / CDATA content represents one logical string, |
You can use any type that can be deserialized from an
NOTE: deserialization to non-owned types (i.e. borrow from the input),
such as |
Content of attributes and text / CDATA content of elements (including mixed
text and CDATA content), which represent a space-delimited list, as
specified in the XML Schema specification for
|
Use any type that is deserialized using
See the next row to learn where in your struct definition you should use that type. According to the XML Schema specification, the delimiter between list elements is one
or more space ( NOTE: according to the XML Schema restrictions, you cannot escape those
white-space characters, so list elements will never contain them.
In practice you will usually use NOTE: according to the XML Schema specification, list elements can be delimited only by spaces. Other delimiters (for example, commas) are not allowed. |
A typical XML with attributes. The root tag name does not matter:
|
A structure where each XML attribute is mapped to a field with a name
starting with
All these structs can be used to deserialize from the XML on the left side, depending on the amount of information that you want to get. Of course, you can combine them with element extractor structs (see below). NOTE: XML allows you to have an attribute and an element with the same name
inside one element. quick-xml deals with that by prepending a |
A typical XML with child elements. The root tag name does not matter:
|
A structure where each XML child element is mapped to a field.
Each element name becomes the name of a field. The name of the struct itself
does not matter:
All these structs can be used to deserialize from the XML on the left side, depending on the amount of information that you want to get. Of course, you can combine them with attribute extractor structs (see above). NOTE: XML allows you to have an attribute and an element with the same name
inside one element. quick-xml deals with that by prepending a |
An XML with an attribute and a child element that have the same name:
|
You MUST specify
|
§Optional attributes and elements | |
To parse all these XML's... | ...use these Rust type(s) |
An optional XML attribute that you want to capture.
The root tag name does not matter:
|
A structure with an optional field, renamed according to the requirements for attributes:
When the XML attribute is present, type
|
An optional XML element that you want to capture.
The root tag name does not matter:
|
A structure with an optional field:
When the XML element is present, type Currently there are some edge cases, described in issue #497. |
§Choices ( | |
To parse all these XML's... | ...use these Rust type(s) |
An XML with different root tag names, as well as text / CDATA content:
|
An enum where each variant has the name of a possible root tag. The name of the enum itself does not matter. If you need to get the textual content, mark a variant with All these structs can be used to deserialize from any XML on the left side, depending on the amount of information that you want to get:
NOTE: You should have variants for all possible tag names in your enum
or have an |
|
A structure with a field whose type is an If you need to get the textual content, mark a variant with Names of the enum, struct, and struct field with
|
|
A structure with a field whose type is an Names of the enum, struct, and struct field with
NOTE: if your |
|
A structure with a field of an intermediate type with one field of Names of the enum and struct do not matter:
|
|
A structure with a field of an intermediate type with one field of Names of the enum and struct do not matter:
|
§Sequences ( | |
To parse all these XML's... | ...use these Rust type(s) |
A sequence inside of a tag without a dedicated name:
|
A structure with a field which is a sequence type, for example, Use the
Use the
See also Frequently Used Patterns. |
A sequence with a strict order, probably with mixed content
(text / CDATA and tags):
NOTE: this is just an example to show the mapping. XML does not allow multiple root tags, so you should wrap the sequence into a tag. |
All elements are mapped to a heterogeneous sequential type: tuple or named tuple.
Each element of the tuple should be able to be deserialized from the nested
element content (
NOTE: consecutive text and CDATA nodes are merged into one text node, so you cannot have two adjacent string types in your sequence. NOTE: In the case that the list might contain tags that are overlapped with
tags that do not correspond to the list you should add the feature |
A sequence with a non-strict order, probably with a mixed content
(text / CDATA and tags).
NOTE: this is just an example to show the mapping. XML does not allow multiple root tags, so you should wrap the sequence into a tag. |
A homogeneous sequence of elements with a fixed or dynamic size:
NOTE: consecutive text and CDATA nodes are merged into one text node, so you cannot have two adjacent string types in your sequence. |
A sequence with a strict order, probably with mixed content
(text and tags), inside another element:
|
A structure where all child elements are mapped to one field which has
a heterogeneous sequential type: tuple or named tuple. Each element of the
tuple should be able to be deserialized from the full element ( You MUST specify
NOTE: consecutive text and CDATA nodes are merged into one text node, so you cannot have two adjacent string types in your sequence. |
A sequence with a non-strict order, probably with mixed content
(text / CDATA and tags), inside another element:
|
A structure where all child elements are mapped to one field which has
a homogeneous sequential type: array-like container. A container type You MUST specify
NOTE: consecutive text and CDATA nodes are merged into one text node, so you cannot have two adjacent string types in your sequence. |
§Composition Rules
The XML format is very different from other formats supported by serde. One such difference is how data in the serialized form is related to the Rust type. Usually each byte in the data can be associated with only one field in the data structure. However, XML is an exception.
For example, take this XML:
<any>
  <key attr="value"/>
</any>
and try to deserialize it to the struct AnyName:
#[derive(Deserialize)]
struct AnyName {  // AnyName calls `deserialize_struct` on `<any><key attr="value"/></any>`
                  // Used data:                                  ^^^^^^^^^^^^^^^^^^^
    key: Inner,   // Inner calls `deserialize_struct` on `<key attr="value"/>`
                  // Used data:                                ^^^^^^^^^^^^
}
#[derive(Deserialize)]
struct Inner {
    #[serde(rename = "@attr")]
    attr: String, // String calls `deserialize_string` on `value`
                  // Used data:                            ^^^^^
}
The comments show which methods of a Deserializer are called by each struct's deserialize method and which input they see. Used data shows which content is actually used for deserializing. As you can see, the name of the inner <key> tag is used both as a map key / outer struct field name and as part of the inner struct (although the value of the tag, i.e. key, is not used by it).
§Enum Representations
quick-xml represents enums differently in normal fields, $text fields and $value fields. The normal representation is compatible with serde’s adjacent and internal tags feature: tags for adjacently and internally tagged enums are serialized using Serializer::serialize_unit_variant and deserialized using Deserializer::deserialize_enum.
Use these simple rules to remember how an enum is represented in XML:
- In a $value field the representation is always the same as the top-level representation;
- In a $text field the representation is always the same as in a normal field, but the surrounding tags with the field name are removed;
- In a normal field the representation always contains a tag with the field name.
§Normal enum variant
To model an xs:choice XML construct use a $value field.
To model a top-level xs:choice just use the enum type.
Kind | Top-level and in $value field | In normal field | In $text field |
---|---|---|---|
Unit | <Unit/> | <field>Unit</field> | Unit |
Newtype | <Newtype>42</Newtype> | Err(Unsupported) | Err(Unsupported) |
Tuple | <Tuple>42</Tuple><Tuple>answer</Tuple> | Err(Unsupported) | Err(Unsupported) |
Struct | <Struct><q>42</q><a>answer</a></Struct> | Err(Unsupported) | Err(Unsupported) |
§$text enum variant
Kind | Top-level and in $value field | In normal field | In $text field |
---|---|---|---|
Unit | (empty) | <field/> | (empty) |
Newtype | 42 | Err(Unsupported) 1 | Err(Unsupported) 2 |
Tuple | 42 answer | Err(Unsupported) 3 | Err(Unsupported) 4 |
Struct | Err(Unsupported) | Err(Unsupported) | Err(Unsupported) |
§Difference between $text and $value special names
quick-xml supports two special names for fields: $text and $value.
Although they may seem the same, there is a distinction. Two different names are required mostly for serialization, because quick-xml should know how you want to serialize certain constructs, which could be represented through XML in multiple different ways.
The only difference is in how complex types and sequences are serialized. If you are unsure which one to select, begin with $value.
§$text
$text is used when you want to write your XML as text or CDATA content. More formally, a field with that name represents a simple type definition with {variety} = atomic or {variety} = union whose basic members are all atomic, as described in the specification.
As a result, not all types of such fields can be serialized. Only serialization of the following types is supported:
- all primitive types (strings, numbers, booleans)
- unit variants of enumerations (serializes to a name of a variant)
- newtypes (delegates serialization to inner type)
- Option of above (None serializes to nothing)
- sequences (including tuples and tuple variants of enumerations) of above, excluding None and empty string elements (because it will not be possible to deserialize them back). The elements are separated by space(s)
- unit type () and unit structs (serializes to nothing)
Complex types, such as structs and maps, are not supported in this field. If you want them, you should use $value.
Sequences are serialized to a space-delimited string, which is why only certain types are allowed in this mode:
use quick_xml::de::from_str;
use quick_xml::se::to_string;
use serde::{Deserialize, Serialize};
#[derive(Deserialize, Serialize, PartialEq, Debug)]
struct AnyName {
    #[serde(rename = "$text")]
    field: Vec<usize>,
}
let obj = AnyName { field: vec![1, 2, 3] };
let xml = to_string(&obj).unwrap();
assert_eq!(xml, "<AnyName>1 2 3</AnyName>");
let object: AnyName = from_str(&xml).unwrap();
assert_eq!(object, obj);
§$value
NOTE: a name #content would better explain the purpose of that field, but $value is used for compatibility with other XML serde crates, which use that name. This will allow you to switch XML crates more smoothly if required.
Representation of primitive types in $value does not differ from their representation in a $text field. The difference is how sequences are serialized. $value serializes each sequence item as a separate XML element. The name of that element is taken from the serialized type, and because only enums provide such a name (their variant name), only they should be used for such fields.
$value fields do not support struct types with fields; the serialization of such types ends with an Err(Unsupported). Unit structs and the unit type () serialize to nothing and can be deserialized from any content.
Serialization and deserialization of a $value field is performed as usual, except that the name of the XML element is given by the serialized type instead of the field. This makes it possible to serialize enumerated types, where the variant is encoded as a tag name, and so to represent an XSD xs:choice schema by a Rust enum.
In the example below, field will be serialized as <field>A</field>, <field>B</field> or <field>C</field>, because elements get their names from the field name and the variant name is written as text content. Such a field cannot be deserialized from the elements <A/>, <B/> or <C/>, because AnyName looks only for <field>:
#[derive(Deserialize, Serialize)]
enum Enum { A, B, C }
#[derive(Deserialize, Serialize)]
struct AnyName {
    // <field>A</field>, <field>B</field>, or <field>C</field>
    field: Enum,
}
If you rename field to $value, then field will be serialized as <A/>, <B/> or <C/>, depending on its content. It is also possible to deserialize it from the same elements:
#[derive(Deserialize, Serialize)]
struct AnyName {
    // <A/>, <B/> or <C/>
    #[serde(rename = "$value")]
    field: Enum,
}
§Primitives and sequences of primitives
Sequences are serialized to a list of elements. Note that types that do not produce their own tag (i.e. primitives) are written as is, without delimiters:
use quick_xml::de::from_str;
use quick_xml::se::to_string;
use serde::{Deserialize, Serialize};
#[derive(Deserialize, Serialize, PartialEq, Debug)]
struct AnyName {
    #[serde(rename = "$value")]
    field: Vec<usize>,
}
let obj = AnyName { field: vec![1, 2, 3] };
let xml = to_string(&obj).unwrap();
// Note that types that do not produce their own tag are written as is!
assert_eq!(xml, "<AnyName>123</AnyName>");
let object: AnyName = from_str("<AnyName>123</AnyName>").unwrap();
assert_eq!(object, AnyName { field: vec![123] });
// `1 2 3` is mapped to a single `usize` element
// It is impossible to deserialize a list of primitives to such a field
from_str::<AnyName>("<AnyName>1 2 3</AnyName>").unwrap_err();
A particular case of that example is a string $value field, which is probably the most common use of that special name:
use quick_xml::se::to_string;
use serde::{Deserialize, Serialize};
#[derive(Deserialize, Serialize, PartialEq, Debug)]
struct AnyName {
    #[serde(rename = "$value")]
    field: String,
}
let obj = AnyName { field: "content".to_string() };
let xml = to_string(&obj).unwrap();
assert_eq!(xml, "<AnyName>content</AnyName>");
§Structs and sequences of structs
Note that structures do not have a serializable name either (the name of the type is never used), so it is impossible to serialize a non-unit struct or a sequence of non-unit structs in a $value field. Unit structs, and sequences of them, are serialized as an empty string, because a unit itself serializes to nothing:
use quick_xml::de::from_str;
use quick_xml::se::to_string;
use serde::{Deserialize, Serialize};
#[derive(Deserialize, Serialize, PartialEq, Debug)]
struct Unit;
#[derive(Deserialize, Serialize, PartialEq, Debug)]
struct AnyName {
    // #[serde(default)] is required for deserialization of empty lists
    // This is a general note, not related to $value
    #[serde(rename = "$value", default)]
    field: Vec<Unit>,
}
let obj = AnyName { field: vec![Unit, Unit, Unit] };
let xml = to_string(&obj).unwrap();
assert_eq!(xml, "<AnyName/>");
let object: AnyName = from_str("<AnyName/>").unwrap();
assert_eq!(object, AnyName { field: vec![] });
let object: AnyName = from_str("<AnyName></AnyName>").unwrap();
assert_eq!(object, AnyName { field: vec![] });
let object: AnyName = from_str("<AnyName><A/><B/><C/></AnyName>").unwrap();
assert_eq!(object, AnyName { field: vec![Unit, Unit, Unit] });
§Enums and sequences of enums
Enumerations use the variant name as an element name:
use quick_xml::de::from_str;
use quick_xml::se::to_string;
use serde::{Deserialize, Serialize};
#[derive(Deserialize, Serialize, PartialEq, Debug)]
struct AnyName {
    #[serde(rename = "$value")]
    field: Vec<Enum>,
}
#[derive(Deserialize, Serialize, PartialEq, Debug)]
enum Enum { A, B, C }
let obj = AnyName { field: vec![Enum::A, Enum::B, Enum::C] };
let xml = to_string(&obj).unwrap();
assert_eq!(
    xml,
    "<AnyName>\
        <A/>\
        <B/>\
        <C/>\
     </AnyName>"
);
let object: AnyName = from_str(&xml).unwrap();
assert_eq!(object, obj);
You can have either a $text or a $value field in your structs. Unfortunately, that is not enforced, so you can theoretically have both, but you should avoid that.
§Frequently Used Patterns
Some XML constructs are used so frequently that it is worth documenting the recommended way to represent them in Rust. The sections below describe them.
§<element> lists
Many XML formats wrap lists of elements in an additional container, although this is not required by the XML rules:
<root>
  <field1/>
  <field2/>
  <list><!-- Container -->
    <element/>
    <element/>
    <element/>
  </list>
  <field3/>
</root>
In this case, it is tempting to describe this XML in this way:
/// Represents <element/>
type Element = ();
/// Represents <root>...</root>
struct AnyName {
    // Incorrect
    list: Vec<Element>,
}
This will not work, because the <list> element can potentially have attributes and other elements inside. You should define the struct for the <list> explicitly, as you would do in the XSD for that XML:
/// Represents <element/>
type Element = ();
/// Represents <root>...</root>
struct AnyName {
    // Correct
    list: List,
}
/// Represents <list>...</list>
struct List {
    element: Vec<Element>,
}
If you want to simplify your API, you could write a simple function for unwrapping the inner list and apply it via deserialize_with:
use quick_xml::de::from_str;
use serde::{Deserialize, Deserializer};
/// Represents <element/>
type Element = ();
/// Represents <root>...</root>
#[derive(Deserialize, Debug, PartialEq)]
struct AnyName {
    #[serde(deserialize_with = "unwrap_list")]
    list: Vec<Element>,
}
fn unwrap_list<'de, D>(deserializer: D) -> Result<Vec<Element>, D::Error>
where
    D: Deserializer<'de>,
{
    /// Represents <list>...</list>
    #[derive(Deserialize)]
    struct List {
        // default allows empty list
        #[serde(default)]
        element: Vec<Element>,
    }
    Ok(List::deserialize(deserializer)?.element)
}
assert_eq!(
    AnyName { list: vec![(), (), ()] },
    from_str("
        <root>
          <list>
            <element/>
            <element/>
            <element/>
          </list>
        </root>
    ").unwrap(),
);
Instead of writing such functions manually, you could also try https://lib.rs/crates/serde-query.
§Overlapped (Out-of-Order) Elements
If the list might contain tags that are interleaved with tags that do not belong to the list (this is a usual case in XML documents), like this:
<any-name>
  <item/>
  <another-item/>
  <item/>
  <item/>
</any-name>
you should enable the overlapped-lists feature to make it possible to deserialize this to:
#[derive(Deserialize)]
#[serde(rename_all = "kebab-case")]
struct AnyName {
    item: Vec<()>,
    another_item: (),
}
§Internally Tagged Enums
Tagged enums are currently not supported because of an issue in the Serde design (see serde#1183 and quick-xml#586) and missing optimizations in Serde which could be useful for XML parsing (serde#1495). This can be worked around by manually implementing deserialize with #[serde(deserialize_with = "func")] or implementing Deserialize, but this can get very tedious very fast for files with large amounts of tagged enums. To help with this issue quick-xml provides a macro impl_deserialize_for_internally_tagged_enum!. See the macro documentation for details.
1. If this were serialized as <field>42</field>, deserialization would be ambiguous, because it clashes with the Unit representation in a normal field.
2. If this were serialized as 42, deserialization would be ambiguous, because it clashes with the Unit representation in a $text field.
3. If this were serialized as <field>42 answer</field>, deserialization would be ambiguous, because it clashes with the Unit representation in a normal field.
4. If this were serialized as 42 answer, deserialization would be ambiguous, because it clashes with the Unit representation in a $text field.
Structs§
- A structure that deserializes XML into Rust values.
- XML input source that reads from a std::io input stream.
- An EntityResolver that does nothing and always returns None.
- XML input source that reads from a slice of bytes and can borrow from it.
Enums§
- (De)serialization error
- Simplified event which contains only those variants that are used by the deserializer
- Simplified event which contains only those variants that are used by the deserializer, but Text events are not yet fully processed.
Traits§
- Used to resolve unknown entities while parsing
- Trait used by the deserializer for iterating over input. This is manually “specialized” for iterating over &[u8].
Functions§
- Deserialize from a reader. This method will make internal copies of the data read from reader. If you have a &str input and want to borrow as much as possible, use from_str.
- Deserialize an instance of type T from a string of XML text.