TL;DR: The GitHub repository is here: github.com/KelSolaar/unreal-engine-docset
Introduction
Unreal Engine source is an intimidating beast with over 11.5 million lines of code.
The C++ API Reference and Blueprint API Reference, being “early work in progress”, are terse and hard to navigate. The Python API Reference, generated with Sphinx, has much better usability, especially with the integrated search sidebar.
Admittedly, an IDE or text editor with an efficient fuzzy search, e.g., Rider or 10x, almost systematically replaces the C++ API Reference, especially because it is always up to date and also shows private objects.

Unreal Engine’s Blueprint Editor also has a very good search; thus, looking at the documentation is rarely required in practice.
Dash and Zeal are two offline documentation browsers. The former, i.e. Dash, has an excellent user interface and, thanks to its integration with PyCharm, Rider and Visual Studio Code, has become part of my development stack on macOS.
As I wanted to get Unreal Engine’s Python API Reference into Dash and Zeal, it became tempting to generate the two other API docsets.
Generating the Docsets
Prior Art
Before reinventing the wheel, I looked for existing generators and found two of interest:
lolleko/unreal-docset: Written in Go, it is built for Unreal Engine 4 documentation and scrapes the online documentation, which is a very slow process.
DrummerB/UnrealEngineDocset: Written in Python, this one works against the Compiled HTML files (.chm) that Epic Games used in the past. Those were discontinued because Unreal Engine’s documentation grew too large for the CHM-generation software.
I decided to write my own docset generator using Python.
Sourcing the Documentation
The first challenge is to source the documentation files to generate the docset. To understand the scope, it is important to acknowledge the sheer size of the documentation:
Eris:Documents kelsolaar$ tree . | tail -1
501685 directories, 502142 files
I tested different approaches:
Scraping docs.unrealengine.com/API: Extremely slow, as hundreds of thousands of requests are required. HTTrack ran for way too long.
Cloning github.com/EpicGames/UnrealEngine: The documentation .tgz files, located in the EpicGames/UnrealEngine/Engine/Documentation/Builds directory, are not hosted on GitHub but must be pulled down by using the Setup.sh script. This ends up consuming ~106 GB of disk space. The .git folder itself is a whopping ~47 GB.
Sparse cloning github.com/EpicGames/UnrealEngine: Unfortunately, this still ends up consuming ~85 GB of disk space. The commands I used to perform the sparse clone are as follows:
git clone --filter=blob:none --sparse https://github.com/EpicGames/UnrealEngine.git
git sparse-checkout add Engine/Binaries/DotNET/GitDependencies
git sparse-checkout add Engine/Build/BatchFiles/Mac
./Setup.sh
Downloading Unreal Engine with the Epic Games Launcher (EGL): This is the best approach as it only consumes ~25 GB of disk space and, ultimately, less bandwidth.
C++ and Blueprint API
I started with the C++ and Blueprint API docsets: The Python one is built with Sphinx and there is already an excellent generator for it, i.e. doc2dash.
Upon extraction of the .tgz files, the first noticeable oddity is that none of the links work because they point to directories and not index.html files.
All the relevant href links must then be rewritten to point to their corresponding child index.html files. I initially used BeautifulSoup4 for parsing, but when dealing with a dataset of over 0.5 million files, patience is quickly tested. I pivoted toward lxml as it is built on the libxml2 and libxslt C libraries and also supports XPath (XML Path Language). Coupled with multiprocessing, it now takes less than 5 minutes to process the entire C++ API Reference docset on my MacBook Pro M1.
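The rewriting rule itself can be sketched independently of the parser. The following is a minimal illustration, not the generator's actual code: the function name rewrite_href is hypothetical, and it assumes that a last path segment without a file extension denotes a directory, which would misclassify a directory whose name contains a dot.

```python
import posixpath
from urllib.parse import urlsplit


def rewrite_href(href: str) -> str:
    """Point a directory-style relative link at its child index.html."""
    parts = urlsplit(href)
    # Leave external links, in-page anchors and empty links untouched.
    if parts.scheme or parts.netloc or not parts.path:
        return href
    # Assumption: a last segment with an extension (e.g. "style.css") is a file.
    if posixpath.splitext(posixpath.basename(parts.path))[1]:
        return href
    path = posixpath.join(parts.path, "index.html")
    # Re-attach any query string or fragment that followed the path.
    suffix = ""
    if parts.query:
        suffix += "?" + parts.query
    if parts.fragment:
        suffix += "#" + parts.fragment
    return path + suffix
```

Because the function is pure and works on one string at a time, it can be applied to every href of a parsed file, and whole files can be handed to separate worker processes without shared state.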
Dash populates its sidebar ToC using dedicated anchors that must be placed in the HTML file; note that Zeal’s lack of support for this feature is highly problematic.
<td class="name-cell">
<a class="dashAnchor" name="//apple_ref/cpp/Enum/EBPConditionType"></a>
<a href="EBPConditionType/index.html">
<p>EBPConditionType</p>
</a>
</td>
.//div[@id="constructor"]//td[@class="name-cell"][1]/a[not(@class="dashAnchor")]
Example of an XPath expression that selects all the constructors within an Unreal Engine C++ API HTML file; the not(@class="dashAnchor") predicate is used to make the generator idempotent.
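The idempotency guard can be illustrated with a small sketch. It uses the standard library's xml.etree.ElementTree as a stand-in for lxml, so it only works on well-formed fragments, and because ElementTree's XPath subset has no not() function, the check is done in Python instead. The function name add_dash_anchors is hypothetical, and percent-escaping the entry name reflects my understanding of Dash's docset guide.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote


def add_dash_anchors(root: ET.Element, entry_type: str) -> int:
    """Insert a dashAnchor before each name-cell link; return how many were added."""
    added = 0
    # findall returns a list, so the tree can be modified safely while looping.
    for cell in root.findall(".//td[@class='name-cell']"):
        # Idempotency guard: skip cells that already carry an anchor.
        if cell.find("a[@class='dashAnchor']") is not None:
            continue
        link = cell.find("a")
        if link is None:
            continue
        name = "".join(link.itertext()).strip()
        anchor = ET.Element(
            "a",
            {
                "class": "dashAnchor",
                "name": f"//apple_ref/cpp/{entry_type}/{quote(name, safe='')}",
            },
        )
        # Place the anchor immediately before the link it describes.
        cell.insert(list(cell).index(link), anchor)
        added += 1
    return added
```

Running the function twice on the same tree adds anchors only on the first pass, which is the property that lets the generator be re-run over already processed files.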
The generation of the C++ and Blueprint API docsets is similar and shares most of the code. I was tempted to create a dependency graph with NetworkX to model the relationships between objects; that would have made the logic cleaner but would have been total overkill.
I wish that the C++ and the Blueprint API References had an inventory file like the Python API Reference objects.inv one. This would significantly reduce the parsing requirements.
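To show why an inventory helps, here is a minimal sketch of reading a version 2 Sphinx objects.inv file with the standard library only: four plain-text header lines followed by a zlib-compressed list with one entry per line. The function name parse_inventory is hypothetical and the sketch ignores the priority and display-name fields.

```python
import re
import zlib

# One entry per line: name, domain:role, priority, uri, display name.
_LINE = re.compile(r"(.+?)\s+(\S+)\s+(-?\d+)\s+?(\S*)\s+(.*)")


def parse_inventory(data: bytes) -> list[tuple[str, str, str]]:
    """Return (name, role, uri) triples from a version 2 Sphinx inventory."""
    # The first four lines are a plain-text header; the rest is zlib-compressed.
    *header, body = data.split(b"\n", 4)
    if not header or header[0].strip() != b"# Sphinx inventory version 2":
        raise ValueError("not a version 2 Sphinx inventory")
    entries = []
    for line in zlib.decompress(body).decode("utf-8").splitlines():
        match = _LINE.match(line.rstrip())
        if match is None:
            continue
        name, role, _priority, uri, _display = match.groups()
        # A trailing "$" in the URI is shorthand for the entry name.
        if uri.endswith("$"):
            uri = uri[:-1] + name
        entries.append((name, role, uri))
    return entries
```

Each triple already contains what a docset index needs (name, kind and target path), so no HTML has to be parsed to discover the entries.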
Dash indexes the documentation entries with a simple SQLite3 database. A PLIST XML file is also required to describe the docset.
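As a rough sketch of those two files, the following writes an Info.plist and a docSet.dsidx index with the standard library. The searchIndex table layout and the plist keys follow Dash's docset guide as I recall it; the function name write_docset_metadata and the example values are hypothetical, and this is not the generator's actual code.

```python
import plistlib
import sqlite3
from pathlib import Path


def write_docset_metadata(docset: Path, name: str, identifier: str, entries) -> None:
    """Write Info.plist and docSet.dsidx for (name, type, path) entries."""
    resources = docset / "Contents" / "Resources"
    (resources / "Documents").mkdir(parents=True, exist_ok=True)

    info = {
        "CFBundleIdentifier": identifier,
        "CFBundleName": name,
        "DocSetPlatformFamily": identifier,
        "isDashDocset": True,
        "dashIndexFilePath": "index.html",
    }
    with open(docset / "Contents" / "Info.plist", "wb") as handle:
        plistlib.dump(info, handle)

    connection = sqlite3.connect(resources / "docSet.dsidx")
    with connection:  # commits on success, rolls back on error
        connection.execute(
            "CREATE TABLE IF NOT EXISTS searchIndex("
            "id INTEGER PRIMARY KEY, name TEXT, type TEXT, path TEXT)"
        )
        connection.execute(
            "CREATE UNIQUE INDEX IF NOT EXISTS anchor "
            "ON searchIndex (name, type, path)"
        )
        # The unique index plus OR IGNORE keeps repeated runs from duplicating rows.
        connection.executemany(
            "INSERT OR IGNORE INTO searchIndex(name, type, path) VALUES (?, ?, ?)",
            entries,
        )
    connection.close()
```

An entry such as ("EBPConditionType", "Enum", "EBPConditionType/index.html") then maps a searchable name and type to a path relative to the Documents directory.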
The final C++ API Reference docset looks as follows in Dash:
Python API
As mentioned earlier, the Python docset is generated with doc2dash. Its PLIST file is replaced for consistency with the other docsets.
It is worth noting that whilst there are only ~11500 files in the Python API Reference, because BeautifulSoup4 is used for parsing, it takes almost the same time to process as the C++ API Reference and its ~0.5 million files!
The PythonAPI-HTML.tgz file is currently only available from Epic Games’ Perforce.
With Dash, it is possible to hide the Sphinx sidebar by appending the following CSS code to the UnrealEnginePythonAPI.docset/Contents/Resources/Documents/_static/ue_api.css file:
div.sphinxsidebar {
    display: none !important;
}
div.bodywrapper {
    margin-left: 0 !important;
}
div.body {
    max-width: 1280px !important;
}
The final Python API Reference docset looks as follows in Dash:
Automating the Generation on GitHub Actions
I started to be concerned about processing millions of files on my MacBook Pro SSD, so I decided to move the generation process to GitHub Actions.
How to run the EGL on the runners in order to download Unreal Engine? It is obviously not possible without a UI, but a cursory search pointed to Legendary, a really cool CLI alternative to the EGL.
AFAIK, it still requires a UI for the initial authentication and token generation:
legendary auth
Once this is performed on my laptop, the Base64-encoded user.json file can be pushed to the GitHub Actions runner via a secret:
cat ~/.config/legendary/user.json | base64
and restored trivially:
echo "${{ secrets.LEGENDARY_USER_JSON }}" | base64 --decode > ~/.config/legendary/user.json
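For reference, the same round trip can be sketched in Python. This is only an illustration of the encode and restore steps, with hypothetical function names; restricting the restored file to owner-only permissions is my own addition, not something the shell commands above do.

```python
import base64
from pathlib import Path


def encode_credentials(path: Path) -> str:
    """Return the file content as a single-line Base64 string for a CI secret."""
    return base64.b64encode(path.read_bytes()).decode("ascii")


def restore_credentials(secret: str, path: Path) -> None:
    """Decode the secret and write it back with owner-only permissions."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # b64decode ignores line breaks by default, so wrapped input also works.
    path.write_bytes(base64.b64decode(secret))
    path.chmod(0o600)
```

Encoding the file avoids quoting and newline problems when the JSON content is stored in a secret and echoed back in a workflow step.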
The full process to download Unreal Engine on a GitHub Actions runner can be summarised as follows:
pip install legendary-gl
mkdir -p ~/.config/legendary
echo "${{ secrets.LEGENDARY_USER_JSON }}" | base64 --decode > ~/.config/legendary/user.json
legendary auth
legendary list --include-ue
legendary download UE_$CI_UNREAL_ENGINE_VERSION --include-ue --platform Windows --install-tag "" --yes
Conclusion
Generating those docsets was an interesting detour from my typical colour science BDFL duties. I currently use them sporadically at work because Zeal could use some love and Rider covers most of my needs. The Python one is probably the most useful.
To finish and because a picture is worth a thousand words, here are some numbers about the C++ API Reference: