Dylan Freedman @dylan@journa.host

@dylfreed

Apr 25, 2023
I'm excited to announce Semantra: an open-source multi-tool for semantic search 🎉 github.com
- Launch a local search engine over text and PDF files
- Search by concepts/meaning
- Refine results via tagging and adding/subtracting queries
Try it out now 🚀📚🔍
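Searching by meaning, and adding or subtracting queries, can be sketched with vector arithmetic. This is a minimal illustration, not Semantra's code. It assumes cosine similarity as the ranking score. The 3-dimensional vectors, example sentences, and helpers (`cosine`, `combine`, `search`) are hypothetical stand-ins for what a real embedding model would produce.

```python
import math

# Hypothetical toy "embeddings" chosen by hand for illustration.
# A real embedding model produces vectors with hundreds of dimensions.
CHUNKS = {
    "The economy grew as factories expanded.": [0.9, 0.1, 0.0],
    "Our soldiers fought bravely in the war.": [0.1, 0.9, 0.1],
    "War spending strained the economy.": [0.6, 0.7, 0.0],
}

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def combine(plus, minus=()):
    """Sum the 'plus' query vectors and subtract the 'minus' ones."""
    result = [0.0] * len(plus[0])
    for vec in plus:
        result = [r + v for r, v in zip(result, vec)]
    for vec in minus:
        result = [r - v for r, v in zip(result, vec)]
    return result

def search(query_vec, chunks, top_k=3):
    """Rank chunks by cosine similarity to the query vector, best first."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks.items()]
    scored.sort(reverse=True)
    return scored[:top_k]

# Hypothetical query embeddings for "economy" and "war".
economy = [1.0, 0.0, 0.0]
war = [0.0, 1.0, 0.0]

# Search for "economy" alone, then for "economy" minus "war".
for score, text in search(combine([economy]), CHUNKS):
    print(f"{score:.3f}  {text}")
print("---")
for score, text in search(combine([economy], [war]), CHUNKS):
    print(f"{score:.3f}  {text}")
```

In this toy setup, subtracting the "war" vector drops the war-spending chunk's score from about 0.65 to about -0.08. The purely economic chunk stays on top.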
Semantra is built for those seeking needles in haystacks: journalists, researchers, students, and more.
I've found it useful personally across a wide range of content, including books, reports, speeches, and government documents.
Tutorial: github.com
Semantra runs locally, so your documents stay on your machine. Alternatively, it can use OpenAI's paid embedding models to offload computation.
Install with Python/pipx:
```
python3 -m pip install --user pipx
python3 -m pipx ensurepath
```
In a new terminal, run:
```
pipx install semantra
```
To run Semantra over a collection of documents (text or PDF):
```
semantra <filenames>
```
It will download embedding models as needed, analyze the documents in chunks, and launch a local web app for interactive analysis ✨
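Analyzing documents "in chunks" usually means splitting the text into windows and embedding each window separately, so a match can point to a specific passage. Below is a minimal sketch of one common approach, overlapping sliding windows. It assumes whitespace-split words in place of a real tokenizer. The `sliding_windows` helper and its `size` and `overlap` parameters are hypothetical and don't reflect Semantra's actual windowing options or defaults.

```python
def sliding_windows(tokens, size, overlap):
    """Split a token list into windows of `size` tokens, where consecutive
    windows share `overlap` tokens. The final window may be shorter."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + size])
        # Stop once a window reaches the end of the token list.
        if start + size >= len(tokens):
            break
    return windows

text = "we the people of the united states in order to form a more perfect union"
tokens = text.split()  # whitespace split stands in for a real tokenizer
for window in sliding_windows(tokens, size=6, overlap=2):
    print(" ".join(window))
```

With `size=6` and `overlap=2`, each window starts 4 words after the previous one. Consecutive windows share 2 words, which lowers the chance that a relevant passage is cut in half.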
Here's an example using Semantra on a collection of US inaugural speeches. You can explore this document collection yourself in the tutorial: github.com
After downloading the documents, analyze them all at once with:
```
semantra us_inaugural_speeches/*.txt
```
Semantra is full of flexible options: you can run any Hugging Face transformers model, change the window sizes for the embeddings, switch up the results algorithm, and more.
Processed documents are cached by content so Semantra only ever does the initial processing work once.
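Caching by content means the cache key is derived from a file's bytes rather than from its name or path. A renamed or copied file with identical content can therefore reuse earlier results. Below is a minimal sketch that assumes a SHA-256 digest as the key and an in-memory dict as the cache. `content_key`, `get_embeddings`, and `fake_embed` are hypothetical names, and Semantra's real cache may work differently.

```python
import hashlib
import os
import tempfile

def content_key(path, chunk_size=1 << 16):
    """Return a SHA-256 hex digest of the file's bytes. Identical content
    yields an identical key, regardless of file name or location."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()

def get_embeddings(path, cache, embed):
    """Look up results by content key and call `embed` only on a cache miss."""
    key = content_key(path)
    if key not in cache:
        with open(path, encoding="utf-8") as f:
            cache[key] = embed(f.read())
    return cache[key]

calls = []

def fake_embed(text):
    # Hypothetical stand-in for an expensive embedding model.
    calls.append(text)
    return [len(text)]

with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "speech.txt")
    b = os.path.join(tmp, "copy_of_speech.txt")
    for p in (a, b):
        with open(p, "w", encoding="utf-8") as f:
            f.write("Fellow citizens of the Senate and of the House")
    cache = {}
    get_embeddings(a, cache, fake_embed)
    get_embeddings(b, cache, fake_embed)  # same content, served from cache
    print(len(calls))
```

Both files hold identical text, so `fake_embed` runs only once and the script prints `1`.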
