Exposing enterprise data as Schema.org JSON-LD with Java and Spring, so that AI platforms and libraries such as NLWeb can index it, poses two challenges: first, generating structured data that public AI systems can index, and second, using AI to index data internally within an enterprise. This guide explores the use of Schema.org classes in Java. We discuss how to use a Java Schema.org JSON-LD datatypes library to map DTOs to Schema.org entities and then expose them as JSON-LD. Using a Spring Boot application with an embedded OrientDB instance and the Apache TinkerPop Gremlin driver, we demonstrate mapping, querying, and serializing JSON-LD data, drawing on the library’s examples. Shared under the Fair Code license OCTL, iunera’s Schema.org Java library offers a sustainable licensing model that is well suited to enterprise data solutions.
Introduction
Motivation for Schema.org Datatypes in Java
Exposing enterprise data for AI indexing is a critical need in today’s digital landscape, where structured, machine-readable data drives semantic web applications. Schema.org, backed by Google, Microsoft, Yahoo, and Yandex, provides a standardized vocabulary for annotating data and, per Google’s Structured Data guidelines, can enhance search visibility by up to 30% through rich snippets. It is likely to become even more important for AI indexing. JSON-LD is a lightweight linked data format that makes the Schema.org vocabulary accessible in JSON. However, integrating Schema.org into Java applications can be complex. The jsonld-schemaorg-javatypes library, hosted on GitHub, addresses this with Java classes covering the complete Schema.org vocabulary and a FieldMapper utility. Key motivations for generating JSON-LD for and from Java classes include:
- Semantic Search Optimization: Boosts AI SEO for enterprise data, according to the A-U-S-S-I rules. Exposing enterprise data as JSON-LD through ordinary Java services can be a quick win for being recognized in the natural-language AI web.
- AI Training and Indexing: Powers AI systems such as NLWeb’s chatbots and search; machine learning models can also be trained better with semantically annotated data.
- Data Interoperability: Enables seamless data exchange and linking between applications, critical for big data analytics and Data Lakes.
- Knowledge Graphs: Builds graphs for enterprise insights that can be extracted or exposed from graph databases.
Shared under the Fair Code Open Compensation Token License (OCTL), the Java JSON-LD Schema.org library supports sustainable development through a license-token approach that balances open access with contributor support, which makes it well suited to enterprises.
Scope of the Java Schema.org Datatype library
The jsonld-schemaorg-javatypes repository, available via Maven Central, is a toolkit for exposing enterprise data with Java and Spring Boot. It also integrates with enterprise graph databases such as OrientDB, as shown in the GitHub examples.
Key Components
- Schema.org Java Classes: Classes like Person and Product, annotated with @Vertex, model Schema.org properties.
- FieldMapper Utility: Maps DTOs to entities, per MappingAPerson.java.
- JSON-LD Serialization: Uses SimpleSerializer.toJson for W3C JSON-LD compliance.
- Custom Type Generator: JavaPoet-based generator for custom types.
Integration
Integrates with Spring Boot and OrientDB via the Gremlin driver.
Use Cases
In our opinion, the key use cases for JSON-LD Schema.org data types in enterprise Java are the following:
- Enriched Natural Language AI Training: Enhances AI text training with structured data, supporting NLWeb’s profiling with JSON-LD Schema.org types.
- Semantically Enriched Vector Database Search: Using semantic information in vector database indexing can significantly improve search results, especially in a RAG scenario with generative AI.
- Enterprise Integration: Easy mapping to uniform data types enables cross-analysis with Apache Spark, Apache Flink, and similar big data processing technologies.
- Knowledge Graph: Allows persisting knowledge graphs in graph databases like OrientDB. Querying such knowledge graphs can then play a crucial role in enriching the context given to generative AI.
- Traditional Search SEO: Publishes JSON-LD for SEO easily, per Google Structured Data.
Focus on NLWeb
NLWeb is an AI-powered platform for conversational websites, using NLP for chatbots and semantic search. The goal of the library is to provide structured data for NLWeb’s AI by mapping DTOs and serializing JSON-LD.
Modelling of Schema.org in Java
Schema.org Hierarchies
Schema.org hierarchies (e.g., Person extends Thing) are expressed as natural Java inheritance hierarchies.
Multi-Inheritance with Annotations
Java does not support multiple inheritance (for good reasons). The mapping therefore expresses Schema.org multi-inheritance through aggregation. The aggregation is also kept in the JSON-LD serialization to avoid ambiguous overrides of properties that several parent types define. We recommend explicitly extending our serialization if you intend to merge such ambiguous properties.
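The aggregation idea can be sketched with plain Java classes. The classes below are hypothetical stand-ins written for this illustration, not the library's generated types, and the field layout is an assumption. Schema.org declares LocalBusiness as a subtype of both Organization and Place, so the sketch extends one parent and holds the other as an aggregated object:

```java
// Hypothetical stand-ins for illustration; not the library's generated classes.
class Thing {
    String name;
}

class Organization extends Thing {
    String telephone;
}

class Place extends Thing {
    String telephone;
}

// Schema.org lists LocalBusiness under both Organization and Place.
// Java allows one superclass, so the second parent is held as an aggregate.
class LocalBusiness extends Organization {
    final Place place = new Place();
}

class AggregationSketch {
    static LocalBusiness sample() {
        LocalBusiness shop = new LocalBusiness();
        shop.name = "Corner Bakery";
        shop.telephone = "+1-555-0100";       // inherited from Organization
        shop.place.telephone = "+1-555-0199"; // kept separately on the aggregated Place
        return shop;
    }

    public static void main(String[] args) {
        LocalBusiness shop = sample();
        System.out.println(shop.telephone + " / " + shop.place.telephone);
    }
}
```

Because both parents define telephone, keeping the Place values on their own object means neither value silently overrides the other.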
Datatype Mapping
Maps Schema.org datatypes to their Java counterparts (e.g., Text → String) and to other datatypes that are semantically equivalent.
Utilities
- FieldMapper: Custom mappings that map the property names of an ordinary enterprise Java entity to those of the JSON-LD Java object and vice versa.
- JSON-LD Serialization: SimpleSerializer.toJson serializes annotated Java types to valid Schema.org structured data. It also works for further types annotated in the same manner, which keeps the whole concept extensible.
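To make the idea of annotation-driven serialization concrete, here is a minimal sketch that uses only the Java standard library. It is not the library's SimpleSerializer: the SchemaType annotation, the PersonSketch class, and the toJsonLd method are hypothetical names introduced for this illustration, and the sketch handles only flat String fields.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Comparator;
import java.util.StringJoiner;

// Hypothetical marker annotation; the library's own annotations may differ.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface SchemaType {
    String value();
}

@SchemaType("Person")
class PersonSketch {
    String givenName;
    String familyName;
}

class JsonLdSketch {
    // Serializes the non-null String fields of an annotated object to JSON-LD.
    static String toJsonLd(Object entity) {
        SchemaType type = entity.getClass().getAnnotation(SchemaType.class);
        if (type == null) {
            throw new IllegalArgumentException("Missing @SchemaType on " + entity.getClass());
        }
        StringJoiner json = new StringJoiner(",", "{", "}");
        json.add("\"@context\":\"https://schema.org\"");
        json.add("\"@type\":\"" + escape(type.value()) + "\"");

        Field[] fields = entity.getClass().getDeclaredFields();
        // Sort by name so the output does not depend on reflection order.
        Arrays.sort(fields, Comparator.comparing(Field::getName));
        for (Field field : fields) {
            boolean skip = field.isSynthetic()
                    || Modifier.isStatic(field.getModifiers())
                    || field.getType() != String.class;
            if (skip) {
                continue;
            }
            try {
                field.setAccessible(true);
                Object value = field.get(entity);
                if (value != null) {
                    json.add("\"" + field.getName() + "\":\"" + escape((String) value) + "\"");
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
        }
        return json.toString();
    }

    // Minimal escaping for backslashes and double quotes only.
    private static String escape(String raw) {
        return raw.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        PersonSketch person = new PersonSketch();
        person.givenName = "Jane";
        person.familyName = "Doe";
        // prints {"@context":"https://schema.org","@type":"Person","familyName":"Doe","givenName":"Jane"}
        System.out.println(toJsonLd(person));
    }
}
```

Any further class carrying the same annotation is picked up without changes to the serializer, which is the extensibility property described above. Nested objects, collections, and full JSON escaping are deliberately left out of the sketch.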
Custom Enterprise Json-LD Vocabulary Generation
Usage Examples
In the following, we show three examples of how Schema.org types can be populated and serialized into valid Schema.org JSON-LD.
CreativeWork
CreativeWork article = new CreativeWork();
article.setName("AI Tech");
String jsonLd = SimpleSerializer.toJson(article);
Person
// the Schema.org JSON-LD type as a plain old Java object
Person person = new Person();
person.setGivenName("Jane Doe");
PostalAddress address = new PostalAddress();
address.setStreetAddress("123 Main St");
person.setAddress(address);
// outputs valid Schema.org JSON-LD
String jsonLd = SimpleSerializer.toJson(person);
SoftwareApplication
SoftwareApplication nlweb = new SoftwareApplication();
nlweb.setName("NLweb");
// outputs valid Schema.org JSON-LD
String jsonLd = SimpleSerializer.toJson(nlweb);
Mapping DTOs to JSON-LD
Besides setting properties directly, one can also leverage the mapping capabilities of the library as follows:
// an ordinary POJO
PersonDTO dto = new PersonDTO();
dto.firstName = "John Doe";
dto.birthDate = "1990-01-01";
dto.street = "123 Main St";
dto.city = "Springfield";
dto.zipCode = "12345";
// define the mappings between the POJO fields and the Schema.org properties
Map<String, String> personFieldMappings = Map.of("firstName", "givenName", "birthDate", "birthDate");
Map<String, String> addressFieldMappings = Map.of("street", "streetAddress", "city", "addressLocality", "zipCode", "postalCode");
FieldMapper personMapper = new FieldMapper(personFieldMappings, Set.of());
FieldMapper addressMapper = new FieldMapper(addressFieldMappings, Set.of());
// create the receiving JSON-LD types
Person person = new Person();
PostalAddress address = new PostalAddress();
person.setAddress(address);
// map the ordinary Java fields to the Schema.org JSON-LD vocabulary
personMapper.copyFieldsWithMapping(person, dto);
addressMapper.copyFieldsWithMapping(address, dto);
// output valid Schema.org JSON-LD
String jsonLd = SimpleSerializer.toJson(person);
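For readers curious what a name-based field mapping involves, the following is a minimal reflection sketch using only the standard library. It is not the library's FieldMapper implementation: CustomerDto, PersonTarget, and copyFields are hypothetical names chosen for this illustration, and we assume a simple source-field to target-field map like the ones above.

```java
import java.lang.reflect.Field;
import java.util.Map;

// Hypothetical DTO and target type, for illustration only.
class CustomerDto {
    String firstName = "John";
    String street = "123 Main St";
}

class PersonTarget {
    String givenName;
    String streetAddress;
}

class FieldCopySketch {
    // Copies each mapped source field onto the target field with the mapped name.
    static void copyFields(Object target, Object source, Map<String, String> mapping) {
        for (Map.Entry<String, String> entry : mapping.entrySet()) {
            try {
                Field from = source.getClass().getDeclaredField(entry.getKey());
                Field to = target.getClass().getDeclaredField(entry.getValue());
                from.setAccessible(true);
                to.setAccessible(true);
                to.set(target, from.get(source));
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException(
                        "Cannot map " + entry.getKey() + " to " + entry.getValue(), e);
            }
        }
    }

    public static void main(String[] args) {
        PersonTarget person = new PersonTarget();
        copyFields(person, new CustomerDto(),
                Map.of("firstName", "givenName", "street", "streetAddress"));
        // prints John, 123 Main St
        System.out.println(person.givenName + ", " + person.streetAddress);
    }
}
```

The sketch looks up fields with getDeclaredField, so it only sees fields declared directly on the two classes; inherited fields and type conversion would need extra handling.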
Storing and retrieving Schema.org Objects in a graph Database, easily
Use Case
Storing Schema.org objects in OrientDB and retrieving them later to enrich the context of AI queries.
Implementation
/**
 * Creates or updates a Product vertex from a ProductDTO using the jsonld-schemaorg-javatypes FieldMapper.
 * Demonstrates how a DTO can be used for mapping.
 * Note: The same approach can also be used to map a DTO from a database to a @Vertex object.
 *
 * @param productDTO The ProductDTO to map and save.
 * @throws RuntimeException If mapping or saving fails.
 */
@PostMapping(value = "/products", consumes = MediaType.APPLICATION_JSON_VALUE)
public void saveProduct(@RequestBody ProductDTO productDTO) {
    try {
        // Define field mappings for Product
        Map<String, String> productFieldMappings = Map.of(
            "dtoName", "name",
            "dtoDescription", "description"
        );
        // Define field mappings for Offer
        Map<String, String> offerFieldMappings = Map.of(
            "dtoPrice", "price",
            "dtoPriceCurrency", "priceCurrency"
        );
        // Create target Product and Offer
        Product product = new Product();
        Offer offer = new Offer();
        product.setOffer(offer);
        // Map fields using FieldMapper
        FieldMapper productMapper = new FieldMapper(productFieldMappings, Set.of());
        FieldMapper offerMapper = new FieldMapper(offerFieldMappings, Set.of());
        productMapper.copyFieldsWithMapping(product, productDTO);
        offerMapper.copyFieldsWithMapping(offer, productDTO.getOffer());
        // Set ID if present
        product.setId(productDTO.getId());
        // Save or update the Product vertex
        vertexMapper.saveVertexRecursive(product);
    } catch (Exception e) {
        throw new RuntimeException("Failed to map or save Product: " + e.getMessage(), e);
    }
}

/**
 * Retrieves all Product vertices. Shows how to retrieve Schema.org objects.
 *
 * @param mediaType The response media type (JSON or JSON-LD).
 * @return The Product vertices serialized as a JSON-LD string.
 */
@GetMapping(value = "/products", produces = {MediaType.APPLICATION_JSON_VALUE, "application/ld+json"})
public String getProducts(@RequestParam(value = "mediaType", defaultValue = "application/json") String mediaType) {
    return SimpleSerializer.toJsonLd(vertexMapper.findAllVertices(Product.class));
}
Usage
- Save a product:
POST http://localhost:8080/products
Content-Type: application/json

{
  "dtoPrice": "10",
  "dtoPriceCurrency": "EUR",
  "dtoName": "youai",
  "dtoDescription": "iunera's awesome product to turn your social media presence into an ai with your personality"
}
- Query the stored products as Schema.org-compatible JSON-LD:
GET http://localhost:8080/products
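The two requests above can also be issued from Java. The following sketch uses the JDK's java.net.http client (Java 11 or later) and assumes the example application is running on localhost:8080 as in the requests above. The class and method names are hypothetical, and the Accept header on the GET request is our assumption, based on the media types the endpoint declares.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class ProductRequestsSketch {
    // Endpoint taken from the usage example; adjust host and port as needed.
    static final URI PRODUCTS = URI.create("http://localhost:8080/products");

    // Builds the POST request that stores a product DTO given as a JSON string.
    static HttpRequest saveProduct(String jsonBody) {
        return HttpRequest.newBuilder(PRODUCTS)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    // Builds the GET request that asks for the stored products as JSON-LD.
    static HttpRequest listProducts() {
        return HttpRequest.newBuilder(PRODUCTS)
                .header("Accept", "application/ld+json")
                .GET()
                .build();
    }

    // Sends a request and returns the response body; needs the application to be running.
    static String send(HttpRequest request) throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        // Only builds and prints the requests, so it runs without a server.
        HttpRequest post = saveProduct("{\"dtoName\":\"youai\",\"dtoPrice\":\"10\"}");
        HttpRequest get = listProducts();
        System.out.println(post.method() + " " + post.uri());
        System.out.println(get.method() + " " + get.uri());
    }
}
```

With the application started, calling send(saveProduct(...)) followed by send(listProducts()) performs the same round trip as the raw HTTP requests.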
Conclusion
The jsonld-schemaorg-javatypes library simplifies exposing enterprise data for AI indexing with Java and Spring and supports NLWeb’s AI applications, which was our main intention in sharing this library.
We showed how one can leverage the library’s Schema.org Java classes, FieldMapper utility, and JSON-LD serialization to map enterprise DTOs to Schema.org entities, serialize them into valid JSON-LD, and store or retrieve them using a graph database like OrientDB. This enables seamless integration with AI-driven platforms like NLWeb, enhancing semantic search, knowledge graph creation, and data interoperability for enterprise use cases. By providing practical examples, such as mapping DTOs to Schema.org types and querying graph databases, we demonstrated how enterprises can efficiently expose structured data for AI indexing and traditional SEO, boosting visibility and usability.
Explore jsonld-schemaorg-javatypes on GitHub to build AI-ready solutions. The Fair Code License’s license-token approach to open collaboration ensures sustainable open development, which in our opinion makes it a smart choice for enterprises enhancing NLWeb.