How mintbot handles files

When you send a photo, document, voice note, spreadsheet, screenshot or PDF to your mintbot agent — through Telegram, the web panel, or the API — the file does not pass through mintbot's central infrastructure on its way to the language model. It lands directly on your agent's own VPS, stays there for as long as you want, and the LLM gets a transformed copy that's optimized for it.

This is a quiet design decision with loud consequences, and it's worth spelling out: it's one of the biggest ways mintbot diverges from consumer LLM chat.

The flow, end to end

Upload arrives at the agent VPS. That might be a photo from Telegram, a PDF dragged into the web panel, a voice memo, or a screenshot pasted into chat. The agent's local API accepts the bytes, sniffs the file's magic bytes to figure out what kind of file it actually is (phones and browsers mislabel uploads surprisingly often), hashes it with SHA-256, and writes it to /var/lib/mintbot-agent/uploads/<shard>/<sha256>.<ext> on your agent's own VPS. A row goes into a local catalog recording the source (telegram / panel / api), uploader ID, MIME type, and original filename.
The original is sacred. From this point on, nothing inside mintbot ever mutates the stored file. Adapters that prepare it for the LLM only emit working copies — resized JPEGs, transcoded text, extracted thumbnails. The byte-for-byte original stays on disk until you delete it through the agent's file manager. There is no central bucket, no retention timer, no cross-agent leakage: each agent VPS only knows about its own owner's uploads.
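The two steps above (sniff, hash, store, catalog, then never touch the original again) can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not mintbot's code: the signature table, the choice of the first two hex characters of the digest as <shard>, the SQLite schema, and the names `sniff`, `store_original`, `init_catalog`, and `ingest` are all hypothetical. Write-once behavior comes from `os.link`, which refuses to replace an existing file, plus read-only permissions on the stored blob.

```python
from __future__ import annotations

import hashlib
import os
import sqlite3
import stat
import tempfile
from pathlib import Path, PurePath

# Hypothetical magic-byte table: (offset, signature, MIME type, extension).
# A real sniffer (libmagic, for instance) recognizes far more formats.
SIGNATURES = [
    (0, b"\xff\xd8\xff", "image/jpeg", "jpg"),
    (0, b"\x89PNG\r\n\x1a\n", "image/png", "png"),
    (0, b"GIF8", "image/gif", "gif"),
    (0, b"%PDF-", "application/pdf", "pdf"),
    (0, b"OggS", "audio/ogg", "ogg"),
    (4, b"ftypheic", "image/heic", "heic"),
]


def sniff(data: bytes, claimed_mime: str) -> tuple[str, str | None]:
    """Trust the file's first bytes over the MIME type the client claimed."""
    for offset, magic, mime, ext in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return mime, ext
    return claimed_mime or "application/octet-stream", None


def store_original(path: Path, data: bytes) -> None:
    """Write the blob once; an existing original is never overwritten."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".partial")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    try:
        os.link(tmp, path)  # raises FileExistsError instead of replacing
        os.chmod(path, stat.S_IRUSR | stat.S_IRGRP)  # read-only from here on
    except FileExistsError:
        pass  # same SHA-256 means same bytes: keep the copy already on disk
    finally:
        os.unlink(tmp)  # drop the temporary name; the stored link survives


def init_catalog(db: sqlite3.Connection) -> None:
    db.execute(
        """CREATE TABLE IF NOT EXISTS uploads (
               id INTEGER PRIMARY KEY, sha256 TEXT, path TEXT, source TEXT,
               uploader_id TEXT, mime TEXT, filename TEXT,
               created_at TEXT DEFAULT CURRENT_TIMESTAMP)"""
    )


def ingest(root: Path, db: sqlite3.Connection, data: bytes, source: str,
           uploader_id: str, claimed_mime: str, filename: str) -> int:
    mime, ext = sniff(data, claimed_mime)
    ext = ext or PurePath(filename).suffix.lstrip(".").lower() or "bin"
    digest = hashlib.sha256(data).hexdigest()
    path = root / digest[:2] / f"{digest}.{ext}"  # hypothetical <shard> scheme
    store_original(path, data)
    with db:  # commit the catalog row
        cur = db.execute(
            "INSERT INTO uploads (sha256, path, source, uploader_id, mime, filename)"
            " VALUES (?, ?, ?, ?, ?, ?)",
            (digest, str(path), source, uploader_id, mime, filename),
        )
    return cur.lastrowid
```

Because the filename is the content hash, a second upload of identical bytes reuses the stored blob and only adds a catalog row.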

The model gets an LLM-optimized version. When the agent decides to show the file to the LLM, a small dispatcher picks the right adapter by MIME type and extension, and the adapter emits content blocks the model can read:

Adapter	Handles	Output
Image	JPG, PNG, WebP, GIF, HEIC (iPhone), AVIF, and anything else Pillow can open	Resized to a 1568 px long edge, re-encoded as JPEG q85, base64-inlined in the model context
PDF	`.pdf` ≤ 32 MB	Base64-inlined as native PDF (Anthropic models read it directly)
Text	`.md`, `.csv`, `.json`, `.yaml`, source code (`.py`, `.js`, `.ts`, `.go`, `.rs`, …), logs, diffs	UTF-8 decoded (latin-1 fallback), inlined as text up to a size cap
Audio	`.mp3`, `.ogg`, `.opus`, `.m4a`, `.wav`, `.flac`	Telegram voice notes are already transcribed inline by the bot; direct uploads currently get a placeholder, with Whisper STT in a follow-up wave
Video	`.mp4`, `.mov`, `.webm`, `.mkv`	Placeholder for now; ffmpeg keyframe + audio transcript extraction lands in a follow-up wave
Office docs	`.docx`, `.xlsx`, `.pptx`, `.odt`, `.ods`, `.odp`	Placeholder for now; native text extraction (python-docx / openpyxl / python-pptx) lands in a follow-up wave
Unknown	Anything else	Text placeholder: "the user attached a `<mime>` file, it's preserved on disk at upload ID `<id>`" — so the model can at least reason about what was sent
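A dispatcher like the one described can be as small as a lookup table. The sketch below is a hypothetical illustration: the adapter names, the extension-first-then-MIME routing order, and the `TEXT_CAP` value are assumptions, not mintbot's actual rules. Checking the extension first avoids a classic trap where `.ts` TypeScript files arrive labeled `video/mp2t`. It also implements the Text adapter's UTF-8-with-latin-1-fallback decoding and the Unknown placeholder from the table.

```python
from __future__ import annotations

from pathlib import PurePath

# Extension sets mirror the adapter table; the grouping is illustrative.
EXTENSIONS = {
    "image": {".jpg", ".jpeg", ".png", ".webp", ".gif", ".heic", ".avif"},
    "pdf": {".pdf"},
    "text": {".md", ".csv", ".json", ".yaml", ".yml", ".py", ".js", ".ts",
             ".go", ".rs", ".log", ".diff"},
    "audio": {".mp3", ".ogg", ".opus", ".m4a", ".wav", ".flac"},
    "video": {".mp4", ".mov", ".webm", ".mkv"},
    "office": {".docx", ".xlsx", ".pptx", ".odt", ".ods", ".odp"},
}
MIME_PREFIXES = [("image/", "image"), ("audio/", "audio"),
                 ("video/", "video"), ("text/", "text")]
TEXT_CAP = 200_000  # hypothetical cap, in characters


def pick_adapter(mime: str, filename: str) -> str:
    """Route by known extension first, then by MIME type, else 'unknown'."""
    ext = PurePath(filename).suffix.lower()
    for name, exts in EXTENSIONS.items():
        if ext in exts:
            return name
    if mime == "application/pdf":
        return "pdf"
    for prefix, name in MIME_PREFIXES:
        if mime.startswith(prefix):
            return name
    return "unknown"


def text_adapter(data: bytes) -> list[dict]:
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        text = data.decode("latin-1")  # maps every byte, so it cannot fail
    return [{"type": "text", "text": text[:TEXT_CAP]}]


def placeholder(mime: str, upload_id: int) -> list[dict]:
    # Lets the model reason about an attachment it cannot read directly.
    return [{"type": "text",
             "text": f"The user attached a {mime} file; it's preserved on disk "
                     f"at upload ID {upload_id}."}]
```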

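Every transformation is cached next to the original at <sha256>.cache/v<N>.json, so the second time the model needs that file it loads straight from the cache instead of being converted again. Bumping the adapter version N invalidates the cache automatically, because the new version looks for a v<N> file that doesn't exist yet.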
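Version-keyed caching can be sketched with nothing but the standard library. This is a minimal illustration, assuming the <sha256>.cache/v<N>.json layout from the paragraph above; `ADAPTER_VERSION`, `cache_path`, and `load_or_convert` are hypothetical names. Because the version is part of the filename, stale entries are never read after a bump; they simply stop being looked up.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Callable

ADAPTER_VERSION = 3  # hypothetical; bump whenever an adapter's output changes


def cache_path(original: Path, version: int) -> Path:
    """<sha256>.<ext>  ->  <sha256>.cache/v<N>.json, next to the original."""
    digest = original.name.split(".", 1)[0]
    return original.with_name(f"{digest}.cache") / f"v{version}.json"


def load_or_convert(original: Path, convert: Callable[[bytes], list],
                    version: int = ADAPTER_VERSION) -> list:
    path = cache_path(original, version)
    if path.exists():
        return json.loads(path.read_text())  # cache hit: no re-conversion
    blocks = convert(original.read_bytes())  # the original is only ever read
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(blocks))
    tmp.replace(path)  # atomic rename: readers never see a half-written file
    return blocks
```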

No expiring URLs in model context. When an image or PDF goes to the LLM, it's base64-inlined in the same turn — no URL that could 404 later, no signed link with a timer. For larger files where the model only needs a pointer, the URL is an internal https://agent<id>.<domain>/<panel_token>/api/local/uploads/<upload_id>/raw — gated by your agent's own panel token, valid for as long as the file lives on disk.
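The inline-versus-pointer decision might look like the sketch below. The content-block shapes follow the Anthropic Messages API's base64 `image` and `document` sources; the URL template is the one quoted above. The function names, the `INLINE_TYPES` map, and applying the 32 MB PDF ceiling to images as well are assumptions made for illustration.

```python
import base64

# 32 MB matches the PDF row of the adapter table; using it for images too
# is an assumption of this sketch.
INLINE_LIMIT = 32 * 1024 * 1024
INLINE_TYPES = {"image/jpeg": "image", "application/pdf": "document"}


def raw_url(agent_id: str, domain: str, panel_token: str, upload_id: int) -> str:
    # Token-gated pointer that stays valid as long as the file is on disk.
    return f"https://agent{agent_id}.{domain}/{panel_token}/api/local/uploads/{upload_id}/raw"


def to_content_block(data: bytes, mime: str, upload_id: int, agent_id: str,
                     domain: str, panel_token: str) -> dict:
    if mime in INLINE_TYPES and len(data) <= INLINE_LIMIT:
        # Inline the bytes in this turn: nothing here can 404 later.
        return {
            "type": INLINE_TYPES[mime],
            "source": {
                "type": "base64",
                "media_type": mime,
                "data": base64.b64encode(data).decode("ascii"),
            },
        }
    url = raw_url(agent_id, domain, panel_token, upload_id)
    return {"type": "text", "text": f"Upload {upload_id} ({mime}) is stored on the agent at {url}"}
```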

Why this beats the consumer LLM chat experience

When you upload a photo to ChatGPT or a PDF to Claude.ai, the file goes into the provider's storage, attached to that conversation, and the provider's retention policy decides when it disappears. Past a certain age the file is gone, even if you can still see the conversation it lived in. Switching from one provider to another means starting over.

A common Telegram-bot gotcha makes the contrast concrete. Telegram keeps a persistent file_id for every photo, but the download URL a bot gets for that file_id is temporary and stops working after a while (the Bot API only guarantees it for at least an hour). Bots that store the URL instead of the bytes end up serving a 404 for yesterday's photo. Mintbot sidesteps this: the first time it sees a Telegram file, it resolves the long-lived file_id to a fresh download link, fetches the bytes, and copies them into your agent's archive. From that moment on, the photo is yours.
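The resolve-then-copy step uses two documented Telegram Bot API endpoints: `getFile`, which turns a file_id into a short-lived `file_path`, and the `https://api.telegram.org/file/bot<token>/<file_path>` download link. The sketch below is a hypothetical illustration using only the standard library; the function names and timeout are assumptions, and the fetched bytes would then go through the normal upload path described earlier. Note that the standard Bot API only lets bots download files up to 20 MB this way.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.telegram.org"


def file_download_url(token: str, file_path: str) -> str:
    # Short-lived: re-resolve via getFile rather than storing this URL.
    return f"{API_BASE}/file/bot{token}/{file_path}"


def fetch_telegram_file(token: str, file_id: str, timeout: float = 30.0) -> bytes:
    """Resolve a persistent file_id to a fresh link and download the bytes once."""
    query = urllib.parse.urlencode({"file_id": file_id})
    url = f"{API_BASE}/bot{token}/getFile?{query}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    if not payload.get("ok"):
        raise RuntimeError(payload.get("description", "getFile failed"))
    file_path = payload["result"]["file_path"]
    with urllib.request.urlopen(file_download_url(token, file_path), timeout=timeout) as resp:
        return resp.read()
```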

Three things follow from this design:

Files belong to you, not to the LLM provider. Switch from Claude to GPT-5 next month and your file history goes with you, untouched, because it sits on your VPS — not in a vendor's bucket.
You can re-ask later. "Three months ago you analyzed a contract for me — can you compare it to this new draft?" works, because the original is still on disk. With consumer chat, the older file is usually gone.
The model always gets the version it can use best. Vision models get the resized JPEG, text-readers get UTF-8, PDF-readers get native PDF. Phones can upload HEIC and it just works — Pillow's HEIF plugin loads at startup, and the magic-byte sniffer catches phones that mislabel the upload as application/octet-stream.
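The image adapter's resize-and-re-encode step is straightforward with Pillow. In this sketch, the HEIF plugin is assumed to be the `pillow-heif` package, registered at import time to mirror the "loads at startup" behavior; the function names are hypothetical, and the choice never to upscale images already under 1568 px is an assumption rather than documented mintbot behavior. Pillow is imported lazily, so the sizing arithmetic works on its own.

```python
from __future__ import annotations

import io

try:
    # Assumed plugin: pillow-heif. Registering at import time lets
    # Image.open read HEIC files from iPhones.
    from pillow_heif import register_heif_opener
    register_heif_opener()
except ImportError:
    pass  # HEIC support unavailable; other formats still work

MAX_EDGE = 1568
JPEG_QUALITY = 85


def fit_long_edge(width: int, height: int, max_edge: int = MAX_EDGE) -> tuple[int, int]:
    """Scale so the longer side is at most max_edge, keeping the aspect ratio."""
    longest = max(width, height)
    if longest <= max_edge:
        return width, height  # assumption: never upscale small images
    scale = max_edge / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def image_to_llm_jpeg(data: bytes) -> bytes:
    from PIL import Image  # lazy import: fit_long_edge works without Pillow

    with Image.open(io.BytesIO(data)) as im:
        rgb = im.convert("RGB")  # JPEG has no alpha; GIFs keep their first frame
    rgb = rgb.resize(fit_long_edge(*rgb.size), Image.Resampling.LANCZOS)
    out = io.BytesIO()
    rgb.save(out, format="JPEG", quality=JPEG_QUALITY)
    return out.getvalue()
```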

Where to manage your files

The agent web panel ships a file manager in the topbar. It browses the full agent VPS, and the upload archive at /var/lib/mintbot-agent/uploads/ is the part your conversations populate. From there you can:

Rename, delete, or move uploaded files
Browse them by date, source, or filename
Drag-drop new uploads (chunked, so very large files work)
Edit small text files inline

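Because uploads are chunked and land straight on your own VPS, the only ceiling on file size is your free disk space; mintbot imposes no cap of its own. Multi-gigabyte videos, datasets, or disk images upload fine as long as the drive has room, and if you run low you can resize the VPS or clear out old files from this same panel. That applies to storage: what the model sees in a given turn is still bounded by each adapter's limits, such as the 32 MB ceiling for inlined PDFs.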
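A chunked upload receiver can enforce "only the disk is the limit" with a single free-space check before the first byte. This is a minimal sketch, not mintbot's protocol: the offset-based resume scheme, the `.partial` file, and the `append_chunk` name are assumptions. Each chunk must start exactly where the previous one ended, so a retried or reordered chunk is rejected instead of corrupting the file.

```python
import errno
import shutil
from pathlib import Path


def append_chunk(partial: Path, offset: int, chunk: bytes, total_size: int) -> int:
    """Append one chunk to `partial` and return the number of bytes received so far."""
    received = partial.stat().st_size if partial.exists() else 0
    if offset != received:
        # The client must resume exactly where the server left off.
        raise ValueError(f"expected offset {received}, got {offset}")
    if offset == 0:
        # Checked once, up front: free disk space is the only size limit.
        free = shutil.disk_usage(partial.parent).free
        if total_size > free:
            raise OSError(errno.ENOSPC, "not enough free disk space for this upload")
    with open(partial, "ab") as f:
        f.write(chunk)
    return received + len(chunk)
```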

Deleting a file from the panel removes both the blob and the catalog row. The agent will no longer be able to surface it to the LLM. That is what makes the original "yours": you're the only party with delete authority.
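Deleting both the catalog row and the blob can be done atomically enough with a SQLite transaction. In this hypothetical sketch (`delete_upload` and the schema are assumptions), the row is deleted first and the file is unlinked inside the same transaction, so a failed unlink rolls the row back. Since blobs are named by SHA-256, two uploads of identical bytes could share one file; the sketch therefore removes the blob only when no other row still points at it. It leaves the <sha256>.cache directory alone.

```python
import sqlite3
from pathlib import Path


def delete_upload(db: sqlite3.Connection, upload_id: int) -> bool:
    """Remove the catalog row and, if nothing else references it, the blob."""
    row = db.execute("SELECT path FROM uploads WHERE id = ?", (upload_id,)).fetchone()
    if row is None:
        return False
    path = row[0]
    with db:  # commit on success, roll back if unlinking raises
        db.execute("DELETE FROM uploads WHERE id = ?", (upload_id,))
        still_used = db.execute(
            "SELECT COUNT(*) FROM uploads WHERE path = ?", (path,)
        ).fetchone()[0]
        if still_used == 0:
            Path(path).unlink(missing_ok=True)
    return True
```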

Bottom line

Most LLM chat products treat your uploads as ephemeral conversation context. Mintbot treats them as your data — stored on your VPS, owned by you, format-shifted on demand into whatever shape the model needs that turn. Most of mintbot's more interesting capabilities sit on top of this foundation.