OpenMicae Logo

Discord-OpenMicae

A dataset of 362 thousand anonymized Discord conversations from late spring to late summer 2025 for training and evaluating conversational AI models in a ChatML-friendly format.



Nomic Atlas Map Preview
View on Nomic Atlas

Features

Use

Dataset

High-level totals

Length Distribution (tokens)

31–38
39–46
47–54
55–62
63–70
71–78
79–86
87–94
95–102
103–110

License

Apache License 2.0

Related

All data collected following Discord's Terms of Service.