Geo-Llama: Large Language Model for Geographic Data

SECTOR: Geospatial
TECHNOLOGY USED: HPC, AI, LLM, GenAI
COUNTRY: Germany

Organisations involved

Main Participant: DataMonkey, a brand under Urban Monkeys GmbH, is a German tech company dedicated to making AI-powered data more accessible for business experts, data analysts, and IT teams.

 

The challenge

Across industries, organisations increasingly rely on geospatial data to support decisions in areas such as mobility planning, climate-risk assessment and real-estate management. Yet accessing, preparing and analysing this data remains highly complex. Although up to 80% of business information has a spatial component, many SMEs and public institutions lack the GIS expertise, tools and infrastructure required to work with it effectively. Open data sources such as OpenStreetMap (OSM) offer enormous value, but using them requires coding in specialised query languages and navigating inconsistent tagging systems with more than 100,000 variables.

As a result, organisations spend considerable time on data preparation rather than deriving insight, limiting the use of open and geospatial data in key European sectors. DataMonkey identified the need for a domain-specific AI model capable of understanding geospatial language and context. However, fine-tuning such a Large Language Model required computational power far beyond standard cloud environments. Through FFplus, the team secured access to EuroHPC resources, including 50,000 node hours on the Leonardo supercomputer, enabling high-precision training and experimentation. This access was essential in making Geo-Llama a scalable, EU-based geospatial AI solution.

 

The solution

DataMonkey developed Geo-Llama, a geospatially aware LLM that lets users query and combine open geographic data using natural language. The model combines generative AI, Retrieval-Augmented Generation (RAG) pipelines and HPC-accelerated fine-tuning to deliver accurate spatial reasoning. EuroHPC resources enabled efficient training of multi-billion-parameter models, ensuring high-precision performance and scalability. The outcome is an intuitive, language-based analytics system that makes advanced geospatial intelligence accessible to non-experts across multiple sectors.
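
To make the idea of querying open geographic data in natural language more concrete, the sketch below pairs a plain-English question with the kind of hand-written query it would replace. It assumes Overpass QL, a query language commonly used with OpenStreetMap, as the target; the project description does not state which query language or output format Geo-Llama produces. The names `QueryPair` and `build_overpass_query` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryPair:
    """One natural-language question and a matching query (hypothetical structure)."""
    question: str
    overpass_ql: str


def build_overpass_query(key: str, value: str, area_name: str, timeout_s: int = 25) -> str:
    """Build an Overpass QL query for all nodes, ways and relations
    tagged key=value inside the area with the given name.

    Assumption: Overpass QL is the target language; this is an
    illustration, not Geo-Llama's actual output format.
    """
    return (
        f'[out:json][timeout:{timeout_s}];\n'
        f'area["name"="{area_name}"]->.searchArea;\n'
        f'nwr["{key}"="{value}"](area.searchArea);\n'
        f'out center;'
    )


# Example: the question a non-expert might ask, and the query an expert would write.
pair = QueryPair(
    question="Where can I park a bicycle in Berlin?",
    overpass_ql=build_overpass_query("amenity", "bicycle_parking", "Berlin"),
)

print(pair.question)
print(pair.overpass_ql)
```

Even this small case requires knowing that bicycle parking is tagged `amenity=bicycle_parking` in OSM, which is the kind of tagging knowledge a non-expert typically lacks.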

 

Impact 

Geo-Llama transforms how organisations access and use geospatial data by removing the need for specialised GIS expertise. Through natural-language interaction, companies can run precise analyses and cut the time and cost of data preparation by up to 95%. This accelerates data-driven decision-making in sectors such as mobility, utilities, real estate and sustainability.

Access to large-scale HPC resources was essential to train and optimise multi-billion-parameter models, ensuring the accuracy, scalability and performance required for business-grade applications. As a result, DataMonkey has accelerated its technological roadmap and strengthened its competitive position in the European AI landscape. Geo-Llama provides an EU-based, compliant and efficient solution for geospatial intelligence, supporting digital sovereignty and sustainable business growth across Europe.

 

Benefits

  • 35% higher query accuracy than baseline models.
  • A training dataset of more than 135,000 high-quality input-output pairs, covering 91% of OSM key usage.
  • Reduced data-preparation time by up to 95%, accelerating open-data integration.
  • Enabled non-technical users to run complex geographic analyses via natural language, expanding the addressable market.
  • Strengthened DataMonkey’s position as a leading EU-based geospatial AI provider, supporting new pilots and partnerships across industry.
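
The figure for coverage of OSM key usage in the list above can be read in more than one way. The sketch below shows one plausible reading, assumed here rather than taken from the project: coverage weighted by how often each key is used, so that a few very common keys account for most of the total. The function name `usage_weighted_coverage` and all counts are made up for illustration and are not project data.

```python
def usage_weighted_coverage(key_usage: dict[str, int], covered_keys: set[str]) -> float:
    """Return the share of all key uses that belong to covered keys.

    Assumption: "coverage of key usage" means usage-weighted coverage,
    i.e. uses of covered keys divided by uses of all keys.
    """
    total = sum(key_usage.values())
    if total == 0:
        raise ValueError("key_usage must contain at least one use")
    covered = sum(count for key, count in key_usage.items() if key in covered_keys)
    return covered / total


# Made-up usage counts, for illustration only.
key_usage = {
    "building": 600,
    "highway": 250,
    "name": 100,
    "amenity": 40,
    "seamark:type": 10,
}
covered_keys = {"building", "highway", "name", "amenity"}

coverage = usage_weighted_coverage(key_usage, covered_keys)
print(f"{coverage:.0%}")  # prints "99%"
```

Under this reading, a dataset can reach high coverage while touching only a fraction of the distinct keys, because key usage in OSM is heavily skewed towards a small number of common keys.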