Factur-X / ZUGFeRD e-invoices with Python

This article shows how to create Factur-X / ZUGFeRD invoices in Python with the open source libraries WeasyPrint and lxml.

Data standards for electronic invoices

Invoices are usually provided as PDFs, which makes them easy to print, but not very usable in digital processes like bookkeeping. This is because PDFs themselves do not contain any structured data that could be processed afterwards.

To change this, there are different e-invoice formats that provide invoices as structured data, sometimes as an attachment to a PDF, sometimes as just the structured data (like XML or JSON). In Europe, ZUGFeRD and Factur-X emerged starting in 2014 and merged a few years later (2020). With ZUGFeRD 2.1 and Factur-X 1.0, both standards are just different names for the same solution.

ZUGFeRD / Factur-X have different profiles to match different requirements. Not all profiles meet the requirements of the European Union: the minimum profile that does is EN 16931, which is also the name of the corresponding EU norm. There’s also a special profile for German authorities, X-Rechnung, because … well. 🙃 We will focus on EN 16931, as this is the EU norm and the requirement for German B2B invoices starting on January 1, 2025.

While ZUGFeRD 1.0 was a closed standard, Factur-X and ZUGFeRD 2.0+ are open, which was an enabler for an open source ecosystem.

What is Factur-X / ZUGFeRD, technically?

The core of Factur-X / ZUGFeRD is a schema definition for an XML file. The XML can be used by itself; then the recipient needs an invoicing system which can parse and visualize the XML invoice. The XML can also be used as a PDF attachment, which provides much better usability: humans can read the designed PDF, machines can read the XML attachment.

The PDF has to meet the PDF/A-3 standard. The focus of PDF/A is archiving documents, and as invoices should be immutable and part of the accounting archive, this is a fine match. PDF/A comes with additional rules: all fonts have to be embedded (and not loaded externally), JavaScript is not allowed, etc. PDF/A has properly defined RDF metadata, which can be (and has to be) extended for Factur-X / ZUGFeRD invoices.

PDF/A-3 extended PDF/A with the ability to attach arbitrary files to the PDF document, which was the prerequisite for proper e-invoices: the XML attachment became possible.

Required Libraries

In order to get proper Factur-X / ZUGFeRD invoices, we have two major tasks:

  • Generate valid XML for the XML attachment and the RDF metadata
  • Generate the PDF and attach the XML

Building complex XMLs with lxml

For the first task, we use lxml and its ElementMaker. The ElementMaker is a simple way to generate complex XML with namespaces while still keeping an overview. Namespaces are heavily used in both XMLs, so this helps a lot. You can also define namespace prefixes for just one XML document, while the ElementTree module in Python’s standard library always registers namespace prefixes globally.

Let’s make a simple example with two different namespaces (which is a reduced RDF XML):

from lxml import builder, etree

nsmap: dict[str, str] = {
    'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
    'pdf': 'http://ns.adobe.com/pdf/1.3/',
}

# The ElementMaker for the root element carries the full prefix mapping
em_rdf = builder.ElementMaker(namespace=nsmap['rdf'], nsmap=nsmap)
em_pdf = builder.ElementMaker(namespace=nsmap['pdf'])

root = em_rdf.RDF(
    em_rdf.Description(
        em_pdf.Producer('WeasyPrint'),
        # Attributes are passed as a dict; QName adds the namespace
        {etree.QName(nsmap['rdf'], 'about'): ''},
    )
)

# pretty_print=True produces the indented output shown below
print(etree.tostring(root, pretty_print=True).decode())

The output is:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
  <rdf:Description rdf:about="">
    <pdf:Producer>WeasyPrint</pdf:Producer>
  </rdf:Description>
</rdf:RDF>

In this example, you see all relevant mechanisms:

  • All namespaces are declared on the root element RDF, because the namespace map is part of the ElementMaker used for the root element
  • The root element has the namespace rdf, because we use the matching ElementMaker there
  • The attribute about also has a namespace prefix, because we use etree.QName there (which is required in RDF; you won’t need it in the XML attachment)
  • Producer has the namespace pdf, because we use the matching ElementMaker

With these mechanisms, you can build both XMLs without too much hassle.
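
To double-check these mechanisms, the tree can be inspected after building it. The following is a minimal sketch that rebuilds the example from above and reads the namespace information back; the variable names description and producer are only illustrative.

```python
from lxml import builder, etree

nsmap = {
    'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
    'pdf': 'http://ns.adobe.com/pdf/1.3/',
}

em_rdf = builder.ElementMaker(namespace=nsmap['rdf'], nsmap=nsmap)
em_pdf = builder.ElementMaker(namespace=nsmap['pdf'])

root = em_rdf.RDF(
    em_rdf.Description(
        em_pdf.Producer('WeasyPrint'),
        {etree.QName(nsmap['rdf'], 'about'): ''},
    )
)

description = root[0]
producer = description[0]

# The prefix-to-URI mapping declared on the root element
print(root.nsmap)

# QName splits a tag into namespace URI and local name
print(etree.QName(root).namespace, etree.QName(root).localname)
print(etree.QName(producer).namespace, etree.QName(producer).localname)

# Namespaced attributes are read with the same QName mechanism
print(repr(description.get(etree.QName(nsmap['rdf'], 'about'))))
```

Checks like these are handy in unit tests, because they compare namespaces by URI and do not depend on the prefixes in the serialized output.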

Building PDFs with weasyprint

WeasyPrint is a Python library for generating PDFs based on HTML and CSS input. It also supports generating PDF/A-3, if you choose the right parameters. We also tried wkhtmltopdf as an alternative, but it’s not well maintained, it requires additional software on the system, and we never got valid PDF/A out of it, so … we recommend WeasyPrint.

Basic parameters can be explained with a few lines:

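from weasyprint import CSS, HTML
from weasyprint.text.fonts import FontConfiguration

# Inputs: HTML and CSS as strings
pdf_html: str
pdf_css: str

font_config = FontConfiguration()
# Pass the CSS text via string=, otherwise it is treated as a filename or URL
stylesheets = [CSS(string=pdf_css, font_config=font_config)]
prepared_pdf_html = HTML(string=pdf_html)

pdf_document = prepared_pdf_html.render(
    font_config=font_config,
    stylesheets=stylesheets,
    optimize_size=('images', 'fonts'),
)
pdf_file = pdf_document.write_pdf()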

The idea is pretty simple here: you provide HTML and CSS as strings (usually rendered by Jinja2 or some other rendering engine you already have), and you get a PDF in the end. WeasyPrint supports many print-focused CSS features like headers, footers, page layouts etc.; just have a look at the documentation.
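
As a rough illustration of what the two input strings might look like, here is a small sketch that uses only the standard library instead of Jinja2. The template, the function render_invoice_html and the sample values are hypothetical; the CSS uses an @page rule with a margin box to put a page counter into the footer.

```python
from html import escape
from string import Template

# Hypothetical minimal invoice template; a real one would be much larger
INVOICE_TEMPLATE = Template(
    '<html><body>'
    '<h1>Invoice $number</h1>'
    '<p>Recipient: $recipient</p>'
    '</body></html>'
)

# Paged-media CSS: page size, margins and a page counter in the footer
pdf_css = '''
@page {
    size: A4;
    margin: 2cm;
    @bottom-right {
        content: "Page " counter(page) " of " counter(pages);
    }
}
body { font-family: sans-serif; }
'''


def render_invoice_html(number: str, recipient: str) -> str:
    # Escape user-provided values so they cannot break the markup
    return INVOICE_TEMPLATE.substitute(
        number=escape(number),
        recipient=escape(recipient),
    )


pdf_html = render_invoice_html('2025-0001', 'Müller & Söhne GmbH')
print(pdf_html)
```

A template engine like Jinja2 does the escaping for you when autoescaping is enabled, which is one reason to prefer it for real invoice templates.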

Steps to Factur-X / ZUGFeRD invoices

In order to get valid Factur-X / ZUGFeRD invoices, we need PDF/A with RDF metadata XML and Factur-X / ZUGFeRD XML.

Ingredient 1: Factur-X / ZUGFeRD XML

The Factur-X / ZUGFeRD XML is a pretty complex XML file which has to be generated based on your invoice data. If you have a very simple system, you might lack some of the required fields. The validation is very detailed, so if you run into issues, looking at the input data, including rounding issues, is a good place to start.
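
Rounding issues are easier to avoid if all amounts are handled as Decimal instead of float. The following sketch rounds every line total first and derives the invoice totals from the rounded values, so that the numbers written to the XML add up. The sample lines, the 19 % tax rate and the rounding mode are assumptions for illustration, not requirements taken from the specification.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal('0.01')


def round_money(value: Decimal) -> Decimal:
    # Commercial rounding to two decimal places
    return value.quantize(CENT, rounding=ROUND_HALF_UP)


# Hypothetical invoice lines: (quantity, net unit price)
lines = [
    (Decimal('3'), Decimal('19.99')),
    (Decimal('1.5'), Decimal('0.335')),
]

# Round each line total first, then sum the rounded values
line_totals = [round_money(quantity * price) for quantity, price in lines]
net_total = sum(line_totals, Decimal('0'))
tax_total = round_money(net_total * Decimal('0.19'))
gross_total = net_total + tax_total

print(line_totals, net_total, tax_total, gross_total)
```

The important part is consistency: the totals in the invoice header should be computed from the same rounded values that appear in the invoice lines.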

You can find the specification at the FeRD website (in German, the standard is downloadable in English) or at the Factur-X website (in French). The spec download includes some XML examples. You can find additional examples at ZUGFeRD’s GitHub project page.

It would be way too much example code to present a whole XML here. Just a few hints for generating the XML:

  • Use ElementMaker at all times, and prepare one ElementMaker per namespace beforehand
  • Use dataclasses or even validataclass for data input, to make sure all data is fine
  • Keep a close eye on namespaces, especially on which namespace is used where
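
To make these hints a bit more concrete, here is a small sketch with one ElementMaker per namespace and a dataclass as input. It only builds a fragment (the ExchangedDocument part), not a valid invoice. The class InvoiceHeader and the function build_exchanged_document are hypothetical names; please check the element structure against the specification before relying on it.

```python
from dataclasses import dataclass
from datetime import date

from lxml import builder, etree

NSMAP = {
    'rsm': 'urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100',
    'ram': 'urn:un:unece:uncefact:data:standard:ReusableAggregateBusinessInformationEntity:100',
    'udt': 'urn:un:unece:uncefact:data:standard:UnqualifiedDataType:100',
}

# One ElementMaker per namespace, prepared once
rsm = builder.ElementMaker(namespace=NSMAP['rsm'], nsmap=NSMAP)
ram = builder.ElementMaker(namespace=NSMAP['ram'])
udt = builder.ElementMaker(namespace=NSMAP['udt'])


@dataclass(frozen=True)
class InvoiceHeader:
    # Hypothetical input model; a real invoice needs many more fields
    number: str
    issue_date: date

    def __post_init__(self) -> None:
        if not self.number.strip():
            raise ValueError('invoice number must not be empty')


def build_exchanged_document(header: InvoiceHeader) -> etree._Element:
    return rsm.ExchangedDocument(
        ram.ID(header.number),
        ram.TypeCode('380'),  # 380 = commercial invoice
        ram.IssueDateTime(
            # format 102 = date as YYYYMMDD
            udt.DateTimeString(header.issue_date.strftime('%Y%m%d'), format='102'),
        ),
    )


root = rsm.CrossIndustryInvoice(
    build_exchanged_document(InvoiceHeader('2025-0001', date(2025, 1, 5))),
)
print(etree.tostring(root, pretty_print=True).decode())
```

Splitting the document into one builder function per block keeps the code readable, and each function can be tested on its own.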

The XML has to be serialized and added to the PDF as an attachment:

from lxml import etree
from weasyprint import Attachment

factur_x_xml: etree._Element

# etree.tostring() returns bytes here; decode() turns them into a str
xml_data: str = etree.tostring(factur_x_xml, xml_declaration=True, encoding='utf-8', pretty_print=True).decode()

pdf_document.metadata.attachments = [
    Attachment(
        string=xml_data,
        base_url='factur-x.xml',
        description='Factur-x invoice',
    ),
]

Small remark: the parameter name base_url is quite confusing; it’s actually the attachment filename.

Ingredient 2: RDF metadata XML

RDF metadata is a generic framework to describe anything. It’s different from the PDF info dictionary: the info dictionary is directly part of the PDF, while the RDF metadata XML is saved as a stream and referenced as Metadata in the PDF catalog. We just need to modify the XML, but we have to keep the default attributes like Producer or Author in sync between both storage places.

Besides the Producer and the PDF/A ConformanceLevel definition in our RDF, we need the specific Factur-X definition:

<rdf:Description rdf:about="" xmlns:fx="urn:factur-x:pdfa:CrossIndustryDocument:invoice:1p0#">
  <fx:DocumentType>INVOICE</fx:DocumentType>
  <fx:DocumentFileName>factur-x.xml</fx:DocumentFileName>
  <fx:Version>1.0</fx:Version>
  <fx:ConformanceLevel>EN 16931</fx:ConformanceLevel>
</rdf:Description>

Additionally, we need the schema definition for these fields (the PDF/A extension schema) within the RDF. It would be a bit too much for this article to show the full schema, but it’s pretty simple to find.

We highly recommend generating the RDF using lxml, too, as there are dynamic values like Producer in the PDF, and generating it by string concatenation always carries a risk of invalid XML.
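
As a sketch of how this could look, the Factur-X description block from above can be built with two ElementMaker instances. The function name build_factur_x_description and its parameters are our own choice for illustration; the extension schema and the other Description blocks are left out.

```python
from lxml import builder, etree

RDF_NS = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
FX_NS = 'urn:factur-x:pdfa:CrossIndustryDocument:invoice:1p0#'

# The Description element declares both prefixes it needs
em_rdf = builder.ElementMaker(namespace=RDF_NS, nsmap={'rdf': RDF_NS, 'fx': FX_NS})
em_fx = builder.ElementMaker(namespace=FX_NS)


def build_factur_x_description(file_name: str, conformance_level: str) -> etree._Element:
    # Values are passed as text nodes, so lxml takes care of escaping
    return em_rdf.Description(
        {etree.QName(RDF_NS, 'about'): ''},
        em_fx.DocumentType('INVOICE'),
        em_fx.DocumentFileName(file_name),
        em_fx.Version('1.0'),
        em_fx.ConformanceLevel(conformance_level),
    )


description = build_factur_x_description('factur-x.xml', 'EN 16931')
print(etree.tostring(description, pretty_print=True).decode())
```

The returned element can then be appended to the rdf:RDF root element together with the other Description blocks.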

If you generate the RDF XML yourself, you have to use the new callable WeasyPrint 64.0 will provide. With this MR, you can set a generator function for your RDF XML like this:

def rdf_metadata_generator() -> bytes:
    return etree.tostring(rdf_xml_root)

pdf_document.metadata.rdf_metadata_generator = rdf_metadata_generator

Use both XMLs to generate a PDF/A with attachment

With the two XMLs from above, we can generate the final PDF/A:

from lxml import etree
from weasyprint import CSS, HTML, Attachment
from weasyprint.text.fonts import FontConfiguration

# Inputs, prepared as described in the sections above
pdf_html: str
pdf_css: str
pdf_identifier: bytes
factur_x_xml: etree._Element
rdf_xml_root: etree._Element

def rdf_metadata_generator() -> bytes:
    return etree.tostring(rdf_xml_root)

font_config = FontConfiguration()
stylesheets = [CSS(string=pdf_css, font_config=font_config)]
prepared_pdf_html = HTML(string=pdf_html)
# Serialize the invoice XML exactly once
xml_data: str = etree.tostring(factur_x_xml, xml_declaration=True, encoding='utf-8', pretty_print=True).decode()

pdf_document = prepared_pdf_html.render(
    font_config=font_config,
    stylesheets=stylesheets,
    optimize_size=('images', 'fonts'),
)

pdf_document.metadata.attachments = [
    Attachment(
        string=xml_data,
        base_url='factur-x.xml',
        description='Factur-x invoice',
    ),
]
pdf_document.metadata.rdf_metadata_generator = rdf_metadata_generator

pdf_file: bytes = pdf_document.write_pdf(
    pdf_variant='pdf/a-3b',
    pdf_identifier=pdf_identifier,
)

A few remarks:

  • Please keep in mind that you need the (not yet released) WeasyPrint 64.0 for using the RDF metadata generator.
  • The pdf_identifier should be an MD5 hash based on various pieces of information which make the identifier unique within the system.
  • pdf_file is the PDF as bytes. You can also write it directly to the file system; please have a look at WeasyPrint’s documentation for more details.
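
A possible way to build such an identifier is sketched below. Which values go into the hash is up to your system; the tenant, invoice number and date used here are hypothetical, as is the function name build_pdf_identifier.

```python
import hashlib


def build_pdf_identifier(*parts: str) -> bytes:
    # Join the parts with a separator that is unlikely to appear in the
    # values themselves, then hash the result
    joined = '\x1f'.join(parts).encode('utf-8')
    # MD5 is only used to derive an identifier here, not for security
    digest = hashlib.md5(joined, usedforsecurity=False)
    return digest.hexdigest().encode('ascii')


# Hypothetical inputs: tenant, invoice number and issue date
pdf_identifier = build_pdf_identifier('tenant-42', '2025-0001', '2025-01-05')
print(pdf_identifier)
```

Because the identifier only depends on its inputs, rendering the same invoice twice yields the same identifier, which also helps to keep the output reproducible in tests.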

Validating your PDFs

In order to provide valid Factur-X / ZUGFeRD invoices, it’s highly recommended to validate your PDFs. There are several online validators, but some of them are very broken, and PDFs might contain sensitive data, especially if you trace issues with production PDFs. Validating locally is therefore a good idea.

XSD Schema Validation

You can validate your XML file with the XSD schema, which is part of the ZUGFeRD download package. We use lxml for running the validation:

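import os
from pathlib import Path

from lxml import etree

_FACTUR_X_SCHEMA_DIR = Path(Path(__file__).parent.resolve(), 'factur_x_schema')
_FACTUR_X_SCHEMA_FILE = 'Factur-X_1.07.2_EN16931.xsd'

# Load the schema from within its directory, then switch back in any case
run_dir = os.getcwd()
os.chdir(_FACTUR_X_SCHEMA_DIR)
try:
    factur_x_schema = etree.XMLSchema(file=_FACTUR_X_SCHEMA_FILE)
finally:
    os.chdir(run_dir)

# Raises etree.DocumentInvalid if invoice_xml does not match the schema
factur_x_schema.assertValid(invoice_xml)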

As the XSD schema has several relative file references, we need to change the working directory to the schema directory temporarily. Everything else is pretty straightforward. The code works nicely in integration tests.
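
If the schema is loaded in more than one place, the temporary directory change can be wrapped in a small context manager. This is just a sketch with a hypothetical helper named working_directory; from Python 3.11 on, contextlib.chdir from the standard library does the same job.

```python
import os
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def working_directory(path: Path) -> Iterator[None]:
    previous = Path.cwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Always switch back, even if the code inside the block fails
        os.chdir(previous)
```

The schema would then be loaded inside a with working_directory(_FACTUR_X_SCHEMA_DIR): block. Keep in mind that the working directory is a process-wide setting, so this approach is not safe if several threads rely on relative paths at the same time.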

XSLT Schematron Validation

The ZUGFeRD download package provides an XSLT 2.0 definition. Sadly, lxml does not support XSLT 2.0, just 1.0. There is a chance that saxonche can test the Factur-X / ZUGFeRD XML as it supports XSLT 2.0+, but we did not check that, because Mustang does this and a lot more.

Mustang-based validation

Mustang is a Java-based open source project providing a validation system for Factur-X / ZUGFeRD invoices. It checks the PDF using veraPDF as well as the XML attachment. Therefore, it’s a good system to make final tests of your code or to find issues in production invoices.

In order to use Mustang, we need to run the JAR file. You can either compile it yourself, or you can download it from their releases page. And, of course, you need a Java runtime environment. The headless variant is fine, as Mustang is a CLI tool. To validate a PDF, you need the following command:

java -Xmx1G -Dfile.encoding=UTF-8 -jar ./Mustang-CLI-2.15.2.jar --action validate --source ./data/your.pdf

It will output an XML like this:

<?xml version="1.0" encoding="UTF-8"?>

<validation filename="1-de.pdf" datetime="2025-01-05 12:12:41">
  <pdf>ValidationResult [flavour=3b, totalAssertions=78873, assertions=[], isCompliant=true]
    <info>
      <signature>unknown</signature>
      <duration unit="ms">1605</duration>
    </info>
    <summary status="valid"/>
  </pdf>  
  <xml>
    <info>
      <version>2</version>
      <profile>urn:cen.eu:en16931:2017#compliant#urn:factur-x.eu:1p0:basic</profile>
      <validator version="2.15.1-SNAPSHOT"/>
      <rules>
        <fired>5157</fired>
        <failed>0</failed>
      </rules>
      <duration unit="ms">3319</duration>
    </info>
    <summary status="valid"/>
  </xml>
  <summary status="valid"/>
</validation>

As you can see, this PDF generated by WeasyPrint and lxml is perfectly valid: both the PDF and the attached XML.
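
For automated tests, the report can be reduced to a few values. The following sketch parses a report with the structure shown above using the standard library. It assumes that the XML report has already been captured as bytes; the names ValidationSummary and summarize_report are hypothetical, and the sample report is a shortened copy of the output above.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationSummary:
    overall_valid: bool
    pdf_valid: bool
    xml_valid: bool
    failed_rules: int


def _is_valid(parent: ET.Element, path: str) -> bool:
    # A missing summary element is treated as "not valid"
    summary = parent.find(path)
    return summary is not None and summary.get('status') == 'valid'


def summarize_report(report: bytes) -> ValidationSummary:
    root = ET.fromstring(report)
    failed = root.findtext('xml/info/rules/failed', default='0')
    return ValidationSummary(
        overall_valid=_is_valid(root, 'summary'),
        pdf_valid=_is_valid(root, 'pdf/summary'),
        xml_valid=_is_valid(root, 'xml/summary'),
        failed_rules=int(failed),
    )


sample_report = b'''<?xml version="1.0" encoding="UTF-8"?>
<validation filename="1-de.pdf" datetime="2025-01-05 12:12:41">
  <pdf>ValidationResult [flavour=3b, isCompliant=true]
    <summary status="valid"/>
  </pdf>
  <xml>
    <info>
      <rules>
        <fired>5157</fired>
        <failed>0</failed>
      </rules>
    </info>
    <summary status="valid"/>
  </xml>
  <summary status="valid"/>
</validation>
'''
print(summarize_report(sample_report))
```

In a test, it is usually enough to assert that overall_valid is true and failed_rules is zero, and to print the full report only if the assertion fails.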

Conclusion

We hope that the article helped you a bit to generate your own, valid Factur-X / ZUGFeRD invoices with Python-based Open Source tools. If you have questions or suggestions to improve the article, feel free to use the comment section or to ask us via e-mail!
