This is an example application for searching a large text collection using semantic search combined with structured field search. The text of each source article is processed by an LLM, and the structured results are saved to a database. Embedding vectors are then calculated from this derived data using an embedding model. This is the preparatory stage for further processing of your information.
The collection then becomes available for local search, both by semantic proximity based on vectors and by keywords and structured fields.
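To make "semantic proximity based on vectors" concrete, here is a minimal sketch of cosine-similarity ranking in plain TypeScript. The `EmbeddedRow` shape and the `topK` helper are hypothetical names used only for illustration. They do not describe how the application itself stores or ranks vectors.

```typescript
// Hypothetical shape for a stored embedding row; the field names are assumptions.
interface EmbeddedRow {
  id: number;
  vector: number[];
}

// Cosine similarity of two equal-length vectors; returns 0 if either has zero norm.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every row against the query vector and keep the k best matches.
function topK(
  query: number[],
  rows: EmbeddedRow[],
  k: number
): { id: number; score: number }[] {
  return rows
    .map((row) => ({ id: row.id, score: cosineSimilarity(query, row.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

In practice the same ranking can be pushed into SQL, so the vectors never need to leave the database.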
A semantic + structured text search tool powered by EmbeddingGemma 300M via wllama and DuckDB-WASM. Runs entirely in your browser — no server required.
- Embedding vectors (the text_embeddings view).
- Structured data in the indexed_text view loaded from a single parquet file, including nested STRUCT, MAP, and ARRAY types.
- Similarity ranking with list_cosine_similarity.
- Results joined with indexed_text to display original text and metadata.

Each record can produce embeddings for multiple sub-fields:
| Name | Source |
|---|---|
| summary | s.summary |
| meaning | s.meaning |
| nsfwReason | s.nsfw_reasons[] |
| insight | s.key_insights[] |
| contradictory | s.contradictory_statements[] |
| theme | s.themes[] |
| topic | s.topics[]['topic_description'] |
| audience | s.target_audience['audience_description'] |
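The mapping in the table above can be sketched as a function that expands one record into the list of texts to embed, one entry per sub-field value. The `SourceRecord` and `EmbeddingInput` types and the `expandRecord` name are hypothetical. Skipping null or empty values is also an assumption, not something the source states.

```typescript
// Subset of the record needed for embedding; nullability is an assumption.
interface SourceRecord {
  task_id: number;
  summary: string | null;
  meaning: string | null;
  nsfw_reasons: string[] | null;
  key_insights: string[] | null;
  contradictory_statements: string[] | null;
  themes: string[] | null;
  topics: { name: string; topic_description: string }[] | null;
  target_audience: { level: string; audience_description: string } | null;
}

// One text to embed, tagged with the sub-field name from the table.
interface EmbeddingInput {
  task_id: number;
  name: string;
  text: string;
}

function expandRecord(s: SourceRecord): EmbeddingInput[] {
  const out: EmbeddingInput[] = [];
  // Skip missing or blank values so no empty string is ever embedded.
  const push = (name: string, text: string | null | undefined): void => {
    if (text && text.trim() !== "") out.push({ task_id: s.task_id, name, text });
  };
  push("summary", s.summary);
  push("meaning", s.meaning);
  for (const t of s.nsfw_reasons ?? []) push("nsfwReason", t);
  for (const t of s.key_insights ?? []) push("insight", t);
  for (const t of s.contradictory_statements ?? []) push("contradictory", t);
  for (const t of s.themes ?? []) push("theme", t);
  for (const t of s.topics ?? []) push("topic", t.topic_description);
  push("audience", s.target_audience?.audience_description);
  return out;
}
```

A record with two key insights and three themes would therefore yield at least five embedding inputs, which is why a single article can match a query through several different sub-fields.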
Filters apply a DuckDB WHERE clause on the indexed_text view.
Multiple filters are combined with AND in the generated SQL.
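As a rough illustration of how several filters might be combined, the sketch below joins predicates with AND and collects their parameters in order. The `Filter` shape, the `?` placeholder convention, and the `buildWhere` name are assumptions made for this example. The application may assemble its SQL differently.

```typescript
type SqlParam = string | number | boolean;

// Hypothetical filter: a SQL predicate with "?" placeholders plus its values.
interface Filter {
  predicate: string;
  params: SqlParam[];
}

// Join all predicates with AND; an empty filter list yields no WHERE clause.
function buildWhere(filters: Filter[]): { sql: string; params: SqlParam[] } {
  if (filters.length === 0) {
    return { sql: "", params: [] };
  }
  // Parenthesize each predicate so an inner OR cannot change the AND grouping.
  const sql =
    "WHERE " + filters.map((f) => "(" + f.predicate + ")").join(" AND ");
  const params: SqlParam[] = [];
  for (const f of filters) {
    for (const p of f.params) {
      params.push(p);
    }
  }
  return { sql, params };
}

const where = buildWhere([
  { predicate: "user_rating >= ?", params: [4] },
  { predicate: "is_not_safe_for_work = ?", params: [false] },
]);
```

Keeping values in a separate parameter list, rather than concatenating them into the SQL string, avoids quoting mistakes when a filter value contains quotes.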
The indexed_text view exposes columns over a parquet file:
task_id BIGINT
url VARCHAR
title VARCHAR
summary VARCHAR
meaning VARCHAR
target_audience STRUCT("level" VARCHAR, audience_description VARCHAR)
genre STRUCT(primary_genre VARCHAR, secondary_genres VARCHAR[])
topics STRUCT("name" VARCHAR, topic_description VARCHAR)[]
themes VARCHAR[]
key_insights VARCHAR[]
is_not_safe_for_work BOOLEAN
keywords VARCHAR[]
keyword_taxonomy VARCHAR[]
nsfw_reasons VARCHAR[]
metadata MAP(VARCHAR, VARCHAR)
user_rating INTEGER
metadata_create_time TIMESTAMP
sentiment STRUCT(polarity VARCHAR, confidence DOUBLE, tone VARCHAR, explanation VARCHAR)
completeness STRUCT(score DOUBLE, "level" VARCHAR, missing_elements VARCHAR[])
contradictory_statements VARCHAR[]
demagoguery_analysis STRUCT(detected_techniques_used_in_this_text VARCHAR[], severity VARCHAR, explanation VARCHAR)
presence_of_advertising BOOLEAN
advertising_details STRUCT(advertising_items VARCHAR[], confidence DOUBLE, explanation VARCHAR)
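Because several columns are nested, filter predicates need DuckDB's struct and list syntax. The helpers below are a sketch of how such predicates could be generated. `quoteIdent`, `structFieldEquals`, and `listContains` are hypothetical names, while `list_contains` and dot access on STRUCT fields are standard DuckDB features.

```typescript
// Quote an identifier for DuckDB, doubling any embedded double quotes.
// Quoting also covers field names such as "level" that the schema quotes.
function quoteIdent(name: string): string {
  return '"' + name.replace(/"/g, '""') + '"';
}

// Predicate on a STRUCT field, for example "sentiment"."polarity" = ?
function structFieldEquals(column: string, field: string): string {
  return quoteIdent(column) + "." + quoteIdent(field) + " = ?";
}

// Predicate on a VARCHAR[] column, for example list_contains("keywords", ?)
function listContains(column: string): string {
  return "list_contains(" + quoteIdent(column) + ", ?)";
}

// Example predicates over columns from the schema listed above.
const predicates: string[] = [
  structFieldEquals("sentiment", "polarity"),
  structFieldEquals("target_audience", "level"),
  listContains("keywords"),
];
```

Each generated string still contains a `?` placeholder, so the actual value is supplied separately when the query is executed.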