Document ingestion (RAG)

Upload existing documents — design docs, internal specs, runbooks, customer-facing docs — to ground AI generation in your real product context. Uploaded documents are processed, chunked, embedded, and stored in a vector index that the AI retrieves from during PRD authoring, TRD authoring, and test generation.

Note

Ingested documents are read-only grounding material. For living pages that humans and agents read and write day-to-day (runbooks, environment notes, agent lessons), use Notebooks instead.

Supported formats

PDF
Word documents (.docx)
Markdown (.md)
Plain text (.txt)

Max file size: 50 MB. Larger files should be split before upload.
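If you script uploads, a local pre-flight check can catch unsupported files before they reach the ingestion workflow. The sketch below is illustrative only: `check_uploadable` is a hypothetical helper, and treating "50 MB" as 50 × 1024 × 1024 bytes is an assumption about how the limit is measured.

```python
from pathlib import Path

# Extensions taken from the supported-formats list above.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".md", ".txt"}
# Assumption: "50 MB" is interpreted as 50 * 1024 * 1024 bytes.
MAX_BYTES = 50 * 1024 * 1024


def check_uploadable(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append("file exceeds 50 MB; split it before upload")
    return problems
```

For example, `check_uploadable("spec.PDF", 1024)` returns an empty list, because the extension comparison is case-insensitive.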

Uploading

Navigate to your project.
Open the Documents tab → Upload.
Drop the file. Ingestion runs as a background workflow.

Info

Processing takes 10–60 seconds, depending on file size. The status moves from processing to either ingested or failed. You can navigate away while ingestion runs; the status is preserved server-side.

You can also upload via the MCP server (mcp__dezycro__upload_document / upload_document_content) if you're scripting bulk ingestion.
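Because ingestion runs in the background, a bulk script usually needs to wait until each document leaves the processing state. The following is a minimal sketch of such a wait loop. It does not call any real Dezycro API: `get_status` is a hypothetical callable you would supply yourself, and the timeout and poll interval are arbitrary illustrative values.

```python
import time
from typing import Callable

# Terminal states from the status description above.
TERMINAL_STATUSES = {"ingested", "failed"}


def wait_for_ingestion(
    get_status: Callable[[], str],
    timeout_s: float = 120.0,
    poll_interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the document leaves `processing` or the timeout elapses.

    `get_status` is a hypothetical callable that returns the current status
    string for one document.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"still '{status}' after {timeout_s:.0f}s")
        sleep(poll_interval_s)
        waited += poll_interval_s
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.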

What happens at ingestion time

Parse — extract text from the source format
Chunk — split into semantically meaningful pieces (~512 tokens each, with overlap)
Embed — run each chunk through an embedding model
Index — write to your project's vector index
Done — document available for retrieval
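To make the chunking step concrete, here is a simplified sketch of fixed-size windows with overlap. It is not the production chunker, which splits on semantic boundaries: the function name, the list-of-tokens input, and the overlap of 64 tokens are all assumptions chosen for illustration.

```python
def chunk_tokens(tokens: list[str], chunk_size: int = 512, overlap: int = 64) -> list[list[str]]:
    """Split tokens into windows of up to `chunk_size`, each sharing `overlap` tokens with the previous one."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of indexing some tokens twice.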

Embedding calls run on shared infrastructure and do not count against your token allowance — see What does NOT count.

How retrieved chunks are used

When the AI generates a PRD, TRD, or test case, it issues a hybrid retrieval query (dense embeddings plus sparse keyword matching) against your project's vector index, retrieves the top-K most relevant chunks, and includes them in the LLM prompt as grounding context.
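As a rough illustration of hybrid retrieval, the sketch below blends a dense score (cosine similarity between vectors) with a sparse score (fraction of query terms found in the chunk) and keeps the top K. The blending weight `alpha`, the term-overlap scorer, and all function names are assumptions; the actual scoring used by the product is not documented here.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of distinct query terms that appear in the chunk text."""
    terms = set(query.lower().split())
    words = set(text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0


def hybrid_top_k(query, query_vec, chunks, k=3, alpha=0.5):
    """chunks: list of (text, vector). Returns the k best texts by blended score."""
    scored = [
        (alpha * cosine(query_vec, vec) + (1 - alpha) * keyword_score(query, text), text)
        for text, vec in chunks
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]
```

The dense part catches paraphrases that share no words with the query, while the sparse part rewards exact product terms, which is why the two are combined.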

This means generated content:

Cites your actual product terminology, not generic patterns
Reuses your existing user-flow patterns where they apply
Identifies features, user journeys, and acceptance criteria already in your docs
Provides accurate per-feature context without you having to paste anything

Managing ingested documents

The Documents tab shows:

Field	Description
Name	Document filename
Status	`processing` / `ingested` / `failed`
Chunk count	Number of indexed chunks
Source type	How the doc was uploaded (UI / MCP / API)
Ingested at	Timestamp

Importing as a PRD

Click Import as PRD on any ingested doc to load it into the PRD editor as an editable starting point. Useful when an existing spec is close to what you want.

Re-indexing

To update an ingested doc, re-upload it under the same filename. Dezycro deletes the old chunks and re-indexes the new content, so the vector index stays consistent.
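The replace-on-re-upload behavior can be pictured as a delete followed by an insert, keyed by filename. The class below is a toy in-memory stand-in, not the real index: `ProjectIndex` and its methods are hypothetical names used only to show why no stale chunks survive a re-upload.

```python
class ProjectIndex:
    """Toy in-memory stand-in for a per-project vector index (hypothetical)."""

    def __init__(self) -> None:
        self._chunks: dict[str, list[str]] = {}

    def upsert_document(self, filename: str, chunks: list[str]) -> int:
        """Replace all chunks stored under `filename`; return how many old chunks were dropped."""
        removed = len(self._chunks.pop(filename, []))  # delete the old chunks first
        self._chunks[filename] = list(chunks)          # then index the new content
        return removed

    def chunk_count(self, filename: str) -> int:
        """Number of chunks currently indexed for `filename`."""
        return len(self._chunks.get(filename, []))
```

Dropping the old chunks before writing the new ones is what keeps retrieval from returning text that no longer exists in the document.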

Deleting

Delete a document from the Documents tab. This removes the document, its chunks, and their embeddings. Generated PRDs and tests that referenced it remain: retrieval happened at generation time, so the output is independent of the source after generation.

Document-test traceability

Every test case generated from a PRD captures the source chunk references that grounded it. From the test case detail panel, expand Sources to see which document and which paragraph drove the test.

This is how you spot coverage gaps: if a critical paragraph in your spec has no test referencing it, that's a missing journey.
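The gap check described above amounts to a set difference between the paragraphs in your spec and the paragraphs that tests cite as sources. Here is a minimal sketch, assuming you have already collected paragraph identifiers and per-test source references yourself; the function name and data shapes are hypothetical and do not reflect an export format.

```python
def uncovered_paragraphs(spec_paragraphs: list[str], test_sources: dict[str, set[str]]) -> list[str]:
    """Return spec paragraphs that no test lists among its sources, in spec order."""
    referenced = set().union(*test_sources.values())
    return [p for p in spec_paragraphs if p not in referenced]
```

For instance, with paragraphs `["login", "checkout", "refund"]` and tests that only cite `login` and `checkout`, the function returns `["refund"]`, which is the journey still missing a test.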

How retrieval works

Documents are chunked, embedded, and stored in a per-project vector index. At generation time, retrieval combines semantic and keyword matching, then re-ranks the top candidates to select the most relevant chunks as grounding context.
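The re-ranking stage can be thought of as a second, more careful scoring pass over a short candidate list. The sketch below shows only that shape. The scorer is a deliberately simple stand-in (counting query word pairs that appear verbatim), and both function names are assumptions; the re-ranker used in production is not described in this page.

```python
from typing import Callable


def rerank(query: str, candidates: list[str], score: Callable[[str, str], float], keep: int = 3) -> list[str]:
    """Re-order first-stage candidates by a second scorer and keep the best `keep`."""
    ranked = sorted(candidates, key=lambda text: score(query, text), reverse=True)
    return ranked[:keep]


def phrase_overlap(query: str, text: str) -> float:
    """Toy scorer: count adjacent query word pairs that appear verbatim in the text."""
    words = query.lower().split()
    bigrams = [" ".join(words[i:i + 2]) for i in range(len(words) - 1)]
    return float(sum(bigram in text.lower() for bigram in bigrams))
```

Running the expensive scorer only on the top candidates keeps generation fast while still letting a finer signal decide the final order.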

Embedding calls always run on Dezycro's shared infrastructure — they're free of charge and unaffected by BYOK.