qol

Voice-to-text dictation overlay. Hold a hotkey, talk, release — your cleaned-up words appear in whatever app has focus. Powered by Aavaaz for streaming transcription and an LLM polish pass for filler removal, punctuation, and per-app tone.

qol (қол) means "voice" in Kazakh — a sibling name to Aavaaz ("voice" in Hindi).

Architecture

┌──────────────────────────────────────────────────┐
│  qol (Tauri desktop app)                         │
│                                                  │
│  global-shortcut ──► push-to-talk                │
│  cpal             ──► 16 kHz mono PCM            │
│  tokio-tungstenite──► ws://localhost:9090        │  ── Aavaaz/WhisperLive ──► transcript
│  active-win       ──► focused app context        │
│  LLM API          ──► polish (tone, punctuation) │
│  enigo            ──► inject into focused app    │
│                                                  │
│  webview          ──► settings UI (Vite + TS)    │
└──────────────────────────────────────────────────┘

Layout

| Path | Purpose |
| --- | --- |
| `src-tauri/Cargo.toml` | Rust deps (tauri, cpal, tokio-tungstenite, enigo, reqwest) |
| `src-tauri/src/main.rs` | App entry, hotkey wiring, Tauri commands |
| `src-tauri/src/audio.rs` | Mic capture → 16 kHz mono f32 frames |
| `src-tauri/src/transport.rs` | WebSocket session to Aavaaz/WhisperLive |
| `src-tauri/src/session.rs` | Lifecycle: audio → transport → polish → inject |
| `src-tauri/src/inject.rs` | Keystroke injection (enigo) + active-window probe |
| `src-tauri/src/polish.rs` | OpenAI-compatible LLM call for transcript cleanup |
| `src-tauri/src/config.rs` | JSON config in `~/.config/qol/config.json` |
| `index.html` + `src/` | Vite settings UI |
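
The `audio.rs` row folds two steps into one line: downmixing to mono and resampling to 16 kHz. Here is a minimal sketch of that conversion, assuming interleaved f32 input and plain linear interpolation. The function names are hypothetical, and the real module may well use a proper filtered resampler instead.

```rust
/// Average interleaved multi-channel frames down to a single mono channel.
/// Assumption: samples are interleaved (L, R, L, R, ...).
fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be non-zero");
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}

/// Resample mono audio with linear interpolation.
/// No anti-alias filter is applied, so this is only a sketch.
fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    if input.is_empty() || from_hz == 0 || to_hz == 0 {
        return Vec::new();
    }
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    let step = from_hz as f64 / to_hz as f64;
    (0..out_len)
        .map(|i| {
            // Position of output sample i, measured in input samples.
            let pos = i as f64 * step;
            let idx = pos as usize;
            let frac = (pos - idx as f64) as f32;
            let a = input[idx];
            let b = input[(idx + 1).min(input.len() - 1)];
            a + (b - a) * frac
        })
        .collect()
}
```

A 48 kHz stereo buffer would go through `downmix_to_mono(&buf, 2)` and then `resample_linear(&mono, 48_000, 16_000)`, yielding one third as many samples.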

Prerequisites

- Rust 1.77+ (rustup toolchain install stable)
- Node 20+ (pnpm or npm)
- A running Aavaaz instance at ws://localhost:9090
- Optional LLM polish: any OpenAI-compatible endpoint:
  - OPENAI_API_KEY env var for OpenAI (default)
  - Or point base_url at Groq, OpenRouter, Together, Cerebras, Mistral, ...
  - Or fully local: Ollama (http://localhost:11434/v1, model qwen2.5:7b-instruct) or llama.cpp's llama-server (http://localhost:8080/v1); leave the API key env var empty
  - Or skip entirely: with polish disabled, raw transcripts inject fine
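
All of the polish options above share one request shape: a POST to the chat-completions path under base_url, with a bearer token only when a key is set. A minimal sketch of that plumbing follows. The helper names are hypothetical, and whether `polish.rs` is organized this way is an assumption.

```rust
/// Join a base URL such as "https://api.openai.com/v1" with the
/// chat-completions path, tolerating a trailing slash.
fn chat_completions_url(base_url: &str) -> String {
    format!("{}/chat/completions", base_url.trim_end_matches('/'))
}

/// Turn the value of the key env var into an optional Authorization header.
/// An unset or blank value yields None, which suits local servers such as
/// Ollama that need no key.
fn auth_header(key: Option<&str>) -> Option<String> {
    match key.map(str::trim) {
        Some(k) if !k.is_empty() => Some(format!("Bearer {k}")),
        _ => None,
    }
}
```

In practice the key would come from something like `auth_header(std::env::var("OPENAI_API_KEY").ok().as_deref())`, so an empty variable and a missing one behave the same.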

Fedora-specific system deps

sudo dnf install -y \
  webkit2gtk4.1-devel \
  openssl-devel \
  curl wget file \
  libappindicator-gtk3-devel \
  librsvg2-devel \
  gtk3-devel \
  alsa-lib-devel \
  libxdo-devel

libxdo-devel is needed by enigo on X11.

Wayland (GNOME, KDE Plasma 6 in Wayland mode)

GNOME Wayland blocks synthetic X11 input, so we automatically detect Wayland and route injection through ydotool.

sudo dnf install ydotool   # or: apt install ydotool

How you set this up depends on whether ydotoold runs as root (a system service — Fedora/Debian/Ubuntu) or as your user (an Arch user unit).

System service running as root (Fedora, Debian/Ubuntu)

This is the common case. ydotoold runs as root, so it already has /dev/uinput access — you do not need a udev rule or the input group. Those only matter when the daemon runs as your user (next section).

The real problem is the socket. Run as root, ydotoold creates /tmp/.ydotool_socket owned root:root 0600, which (a) your unprivileged clients can't open, and (b) isn't where the client looks by default ($XDG_RUNTIME_DIR/.ydotool_socket, i.e. /run/user/<uid>/.ydotool_socket). Two mismatches, both silent.

Fix both with a drop-in that pins a known path, hands ownership to your user, and makes the socket readable and writable by owner and group (0660). Replace 1000:1000 with your own IDs, i.e. the output of id -u and id -g:

sudo systemctl enable ydotool          # Fedora unit name; Debian: ydotoold
sudo mkdir -p /etc/systemd/system/ydotool.service.d
sudo tee /etc/systemd/system/ydotool.service.d/socket.conf >/dev/null <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/bin/ydotoold --socket-path=/tmp/.ydotool_socket --socket-perm=0660 --socket-own=1000:1000
EOF
sudo systemctl daemon-reload
sudo systemctl restart ydotool

Then tell clients where the socket is. qol already does this internally (inject.rs sets YDOTOOL_SOCKET=/tmp/.ydotool_socket unless you override it), so qol works with no further config. For your own shell, make it permanent:

echo 'YDOTOOL_SOCKET=/tmp/.ydotool_socket' | sudo tee -a /etc/environment

Because the socket is owned by your user (not root:input), this works without the input group and without a logout. That matters because group membership added by usermod -aG doesn't reach an already-running GNOME session, so a group-based fix can look broken until you fully log out and back in.

Daemon running as your user (Arch AUR user unit)

Here ydotoold runs as you, so it needs /dev/uinput access via a udev rule and the input group, and the socket lands in $XDG_RUNTIME_DIR where the client already looks — no YDOTOOL_SOCKET needed:

echo 'KERNEL=="uinput", MODE="0660", GROUP="input"' | \
  sudo tee /etc/udev/rules.d/80-uinput.rules
sudo udevadm control --reload && sudo udevadm trigger
sudo usermod -aG input "$USER"          # then fully log out + back in (or reboot)
systemctl --user enable --now ydotool

Verify

systemctl status ydotool --no-pager                  # active (running)
ls -l "${YDOTOOL_SOCKET:-/tmp/.ydotool_socket}"      # socket exists, owned by you
YDOTOOL_SOCKET=/tmp/.ydotool_socket ydotool type "hello"   # types into focused window

failed to connect socket … No such file or directory → daemon isn't running, or the client is looking at the wrong path (set YDOTOOL_SOCKET).
failed to connect socket … Permission denied → the socket is owned root:input and your session isn't in input; use the --socket-own drop-in above instead of relying on the group.
failed to open uinput device → only with a user-run daemon: the udev rule or input group hasn't taken effect (reboot to be sure).
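
The socket-path default and the first two failure modes above can be expressed as a small lookup. This is a sketch under assumptions: the function names are hypothetical, and the mapping from OS error kinds to hints is an illustration rather than qol's actual diagnostics.

```rust
use std::io::ErrorKind;

/// Default path used in the root-service setup described above.
const DEFAULT_SOCKET: &str = "/tmp/.ydotool_socket";

/// Pick the socket path: a non-empty YDOTOOL_SOCKET value wins,
/// otherwise fall back to the default.
fn ydotool_socket_path(env_override: Option<&str>) -> String {
    match env_override {
        Some(p) if !p.is_empty() => p.to_string(),
        _ => DEFAULT_SOCKET.to_string(),
    }
}

/// Translate the error kind from a failed socket connect into a hint.
fn explain_socket_error(kind: ErrorKind) -> &'static str {
    match kind {
        // ENOENT: "No such file or directory"
        ErrorKind::NotFound => {
            "daemon not running, or wrong path (set YDOTOOL_SOCKET)"
        }
        // EACCES: "Permission denied"
        ErrorKind::PermissionDenied => {
            "socket not accessible to your user (use the --socket-own drop-in)"
        }
        // ECONNREFUSED: a stale socket file with no listener behind it
        ErrorKind::ConnectionRefused => {
            "socket file exists but nothing is listening (restart the daemon)"
        }
        _ => "unexpected error; check the daemon logs",
    }
}
```

A caller would pass `std::env::var("YDOTOOL_SOCKET").ok().as_deref()` as the override, then feed any connect error's `kind()` to `explain_socket_error`.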

qol picks the backend automatically at startup. Look for selected injection backend backend=Ydotool in the logs to confirm.
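
One plausible way to make that startup choice is to look at the session environment and whether ydotool is available. The sketch below assumes detection via the standard XDG_SESSION_TYPE and WAYLAND_DISPLAY variables. The `select_backend` function and its parameters are hypothetical and not taken from `inject.rs`.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Backend {
    Enigo,
    Ydotool,
}

/// Choose an injection backend from the session environment.
/// Inputs are passed in rather than read from the environment so the
/// decision is easy to test.
fn select_backend(
    session_type: Option<&str>,    // value of XDG_SESSION_TYPE
    wayland_display: Option<&str>, // value of WAYLAND_DISPLAY
    ydotool_installed: bool,
) -> Backend {
    let on_wayland = session_type
        .map_or(false, |s| s.eq_ignore_ascii_case("wayland"))
        || wayland_display.map_or(false, |d| !d.is_empty());
    if on_wayland && ydotool_installed {
        Backend::Ydotool
    } else {
        // X11, macOS, Windows, or Wayland without ydotool.
        Backend::Enigo
    }
}
```

Falling back to Enigo when ydotool is missing keeps the app usable under XWayland clients, at the cost of silent failures in native Wayland windows.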

Wayland hotkey — use `qol-trigger` + GNOME Custom Shortcut

tauri-plugin-global-shortcut can't grab keys under GNOME Wayland (Mutter refuses the X11-style key grab), and the modern xdg-desktop-portal GlobalShortcuts interface rejects non-sandboxed apps because the portal sends an empty app_id and gnome-control-center discards the request:

gnome-control-center-global-shortcuts-provider:
  Discarded shortcut bind request from application with an invalid app_id ><.

Workaround: qol always opens a Unix socket at $XDG_RUNTIME_DIR/qol.sock, and ships a tiny qol-trigger CLI that pokes it. Bind a GNOME Custom Shortcut to qol-trigger toggle and you get a working hotkey on Wayland:

1. Install the CLI on your PATH:

   sudo install -m 755 src-tauri/target/debug/qol-trigger /usr/local/bin/qol-trigger

2. Open Settings → Keyboard → View and Customize Shortcuts → Custom Shortcuts → + and fill in:
   - Name: qol toggle dictation
   - Command: /usr/local/bin/qol-trigger toggle
   - Shortcut: pick your combo (e.g. Ctrl+Alt+Space; make sure nothing else has it. Super+Space is grabbed by GNOME's input-source switcher and won't reach the command)
3. Start qol once so the socket exists, then press your combo. First press starts dictation, second press stops it.

Since GNOME custom keybindings only fire on press (no release event), the hotkey is toggle, not push-to-talk. Aavaaz's VAD finalizes segments naturally during dictation; toggling again ends the session.

Other commands the CLI supports:

qol-trigger status     # prints "idle" or "recording"
qol-trigger start      # idempotent
qol-trigger stop       # idempotent
qol-trigger toggle     # default

The trigger socket is enabled on every OS, not just Linux, so the same CLI works from scripts on macOS and X11 too. On X11/macOS/Windows you also still have real push-to-talk through the in-process global-shortcut plugin — pick whichever feels better.
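
The command semantics listed above (idempotent start and stop, a flipping toggle, a read-only status) amount to a two-state machine. Here is a minimal sketch of the server-side handler, assuming one text command per request and a one-word reply. The actual wire format of qol.sock is not documented here, so treat `handle_command` and its replies as hypothetical.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum State {
    Idle,
    Recording,
}

/// Apply one trigger command to the session state and return the reply.
fn handle_command(state: &mut State, cmd: &str) -> String {
    match cmd.trim() {
        // Idempotent: starting while recording changes nothing.
        "start" => *state = State::Recording,
        // Idempotent: stopping while idle changes nothing.
        "stop" => *state = State::Idle,
        "toggle" => {
            *state = match *state {
                State::Idle => State::Recording,
                State::Recording => State::Idle,
            }
        }
        // Read-only query.
        "status" => {}
        other => return format!("error: unknown command {other:?}"),
    }
    match *state {
        State::Idle => "idle".to_string(),
        State::Recording => "recording".to_string(),
    }
}
```

Replying with the resulting state after every command lets a script confirm the outcome without a second status call.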

Run

# in one terminal — start Aavaaz with a model that fits your GPU
cd ../Aavaaz/aavaaz
source .venv/bin/activate
aavaaz serve --model distil-large-v3

# in another — build and run qol
cd ../../qol
npm install
npm run tauri dev

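Then press your hotkey (default Super+Space), speak, and release. On GNOME Wayland, use the qol-trigger custom shortcut described above instead; it toggles dictation rather than acting as push-to-talk.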

First run — testing against a local Aavaaz

This walks through the end-to-end smoke test from a cold start. Aimed at a single workstation: Aavaaz running on localhost, qol injecting into the focused text field.

1. Build qol once

cd ~/src/qol
npm install
( cd src-tauri && cargo build )

The first build pulls a lot of dependencies (~5 min). Subsequent builds take seconds.

2. Start Aavaaz

Pick a model that fits your GPU's VRAM. For a 6 GB card (e.g. RTX 2060 or a laptop RTX 3060):

cd ~/src/Aavaaz/aavaaz
source .venv/bin/activate
aavaaz serve --model distil-large-v3

You should see something like:

INFO  whisper_live - WebSocket server listening on 0.0.0.0:9090
INFO  whisper_live - Loaded distil-large-v3 on cuda:0

Sanity-check from another terminal:

ss -tln | grep 9090            # port is listening
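
The same reachability check can be done from Rust, for example as a preflight before opening the WebSocket. This is a sketch using only the standard library. The `port_is_open` helper is hypothetical and only proves that something accepts TCP connections, not that it speaks the Aavaaz protocol.

```rust
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Return true if a TCP connection to `addr` (e.g. "localhost:9090")
/// succeeds within `timeout` on any resolved address.
fn port_is_open(addr: &str, timeout: Duration) -> bool {
    match addr.to_socket_addrs() {
        Ok(mut addrs) => {
            addrs.any(|a| TcpStream::connect_timeout(&a, timeout).is_ok())
        }
        // Unparseable or unresolvable address.
        Err(_) => false,
    }
}
```

Calling `port_is_open("localhost:9090", Duration::from_millis(500))` before starting a session would let the app report a missing server up front instead of surfacing a ConnectionRefused later.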

3. Disable polish for the first test

We want to isolate STT before mixing in an LLM. Either toggle it off in the settings window after first launch, or pre-seed the config:

mkdir -p ~/.config/qol
cat > ~/.config/qol/config.json <<'EOF'
{
  "aavaaz_url": "ws://localhost:9090",
  "model": "distil-large-v3",
  "language": "en",
  "hotkey": "Super+Space",
  "polish": {
    "enabled": false,
    "base_url": "https://api.openai.com/v1",
    "model": "gpt-4o-mini",
    "api_key_env": "OPENAI_API_KEY",
    "per_app_tone": true
  },
  "hotwords": [],
  "inject_method": "type"
}
EOF

4. Run qol with logs visible

RUST_LOG=qol=debug,warn ~/src/qol/src-tauri/target/debug/qol

You should see roughly:

INFO qol::inject: selected injection backend backend=Enigo

(backend=Ydotool if you're on Wayland with ydotool installed.)

The window stays hidden. Look for the tray icon — see Troubleshooting if it's missing on GNOME.

5. Dictate

Focus any text field (a terminal, gedit, your browser address bar).
Hold Super+Space, say a sentence, release.

Watch the qol logs — you should see:

INFO qol: session started
DEBUG qol::session: session started app=Some("...")
INFO qol: session stopped

The transcript should land in the focused field within ~1 second of release.

6. Enable polish (optional)

Open the settings window from the tray, check Clean up transcripts, and configure:

OpenAI: https://api.openai.com/v1, model gpt-4o-mini, env var OPENAI_API_KEY
Groq (very fast): https://api.groq.com/openai/v1, model llama-3.1-8b-instant, env var GROQ_API_KEY
Local Ollama: http://localhost:11434/v1, model qwen2.5:7b-instruct, env var blank

Export the key in the same shell you launch qol from, then restart qol.

Troubleshooting

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Aavaaz errors `libcudnn_ops_infer.so: cannot open` | cuDNN not on path | `pip install nvidia-cudnn-cu12` inside the Aavaaz venv |
| Aavaaz `CUDA out of memory` | Model too big for your VRAM | Use `distil-large-v3` or `medium` instead of `large-v3` |
| qol logs "no default input device" | Mic not picked by PulseAudio/PipeWire | `pactl list sources short`; set a default with `pactl set-default-source <name>` |
| No tray icon on GNOME | GNOME hides AppIndicators by default | `sudo dnf install gnome-shell-extension-appindicator`, then enable "AppIndicator and KStatusNotifierItem Support" |
| Hotkey does nothing | Already grabbed by another app | Pick a different combo in settings (e.g. `Ctrl+Alt+Space`) |
| Text doesn't appear in focused app (GNOME Wayland) | Wayland blocks synthetic input | Install `ydotool` + `ydotoold` (see Wayland section above); restart qol; verify `backend=Ydotool` in logs |
| Polish silently produces no text | API key env var unset or wrong name | `echo $OPENAI_API_KEY`; restart qol after `export`-ing |
| `connect failed: ConnectionRefused` | Aavaaz not running | Start it on `:9090` first |

What "good" looks like

End-to-end, on a 6 GB GPU with polish disabled, expect roughly:

Hotkey press → first PCM frame to Aavaaz: <50 ms
End of speech → first completed segment from Aavaaz: 300–800 ms (depends on VAD pause threshold)
First completed segment → text in focused app: <50 ms
With polish enabled (OpenAI gpt-4o-mini): add ~300–600 ms per segment

If you're seeing multi-second lag, that's almost always Aavaaz model load or CPU fallback. Check nvidia-smi while dictating: Aavaaz should briefly drive the GPU to ~30% utilization.
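
To turn the figures above into a single end-of-speech-to-text budget, add the stages that run after you stop talking. The sketch below treats the stages as strictly sequential and reads "<50 ms" as a 0 to 50 ms range; both are simplifying assumptions. Capture startup is left out because it happens at key press, not at release.

```rust
/// Inclusive latency range in milliseconds.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
struct RangeMs {
    min: u32,
    max: u32,
}

/// Sum sequential stages: minimums add up, and so do maximums.
fn total(stages: &[RangeMs]) -> RangeMs {
    stages.iter().fold(RangeMs { min: 0, max: 0 }, |acc, s| RangeMs {
        min: acc.min + s.min,
        max: acc.max + s.max,
    })
}

// Figures taken from the list above.
const STT: RangeMs = RangeMs { min: 300, max: 800 };
const INJECT: RangeMs = RangeMs { min: 0, max: 50 };
const POLISH: RangeMs = RangeMs { min: 300, max: 600 };
```

Under these assumptions `total(&[STT, INJECT])` comes to 300 to 850 ms with polish disabled, and `total(&[STT, POLISH, INJECT])` comes to 600 to 1450 ms with it enabled.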

Settings

Edit via the settings window (open from system tray), or directly at ~/.config/qol/config.json:

{
  "aavaaz_url": "ws://localhost:9090",
  "model": "distil-large-v3",
  "language": "en",
  "hotkey": "Super+Space",
  "polish": {
    "enabled": true,
    "base_url": "https://api.openai.com/v1",
    "model": "gpt-4o-mini",
    "api_key_env": "OPENAI_API_KEY",
    "per_app_tone": true
  },
  "hotwords": ["Aavaaz", "qol", "WhisperLive"],
  "inject_method": "type"
}
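
The hotkey field is a plain string such as "Super+Space", so it has to be split into modifiers and a key before it can be registered. Here is a minimal sketch of such a parser. The `Hotkey` struct, the accepted modifier aliases, and the error messages are all hypothetical and may differ from the parser the unit tests cover.

```rust
#[derive(Debug, PartialEq, Eq, Default)]
struct Hotkey {
    ctrl: bool,
    alt: bool,
    shift: bool,
    super_key: bool,
    key: String,
}

/// Parse a "+"-separated combo such as "Ctrl+Alt+Space".
/// Modifiers are case-insensitive; exactly one non-modifier key is required.
fn parse_hotkey(spec: &str) -> Result<Hotkey, String> {
    let mut hk = Hotkey::default();
    for part in spec.split('+').map(str::trim) {
        match part.to_ascii_lowercase().as_str() {
            "" => return Err(format!("empty segment in hotkey {spec:?}")),
            "ctrl" | "control" => hk.ctrl = true,
            "alt" => hk.alt = true,
            "shift" => hk.shift = true,
            "super" | "meta" | "cmd" => hk.super_key = true,
            // A second non-modifier key is ambiguous, so reject it.
            _ if !hk.key.is_empty() => {
                return Err(format!("more than one key in hotkey {spec:?}"))
            }
            _ => hk.key = part.to_string(),
        }
    }
    if hk.key.is_empty() {
        return Err(format!("no key in hotkey {spec:?}"));
    }
    Ok(hk)
}
```

Rejecting malformed combos at load time means a typo in config.json shows up as a clear startup error rather than a hotkey that silently never fires.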

Tests & CI

cd src-tauri
cargo test          # unit tests (config round-trip, hotkey parser)
cargo clippy --all-targets -- -D warnings
cargo fmt -- --check

CI runs the above on Ubuntu, macOS, and Windows for every push and PR — see .github/workflows/ci.yml.

Hardware-bound paths (audio capture, keystroke injection) and network-bound paths (WebSocket session, LLM polish) aren't unit-tested yet; integration tests with a fake Aavaaz endpoint and a virtual audio device are a TODO.

Status

This is a scaffold. Working / stubbed:

License

MPL-2.0
