feat: two-way audio (listen + talk), all platforms#26

Merged
keyldev merged 12 commits into main from feat/phase-17-audio-listen on Jun 21, 2026

keyldev commented Jun 21, 2026

Collaborator

Summary

Listen to the camera's microphone and push-to-talk into its speaker. FFmpeg is used only for audio decode/encode; playback, capture, and the backchannel are implemented natively on each platform.

Related

Type

  • Bug fix
  • Feature
  • Refactor / cleanup
  • Docs / CI
  • Other:

Checklist

  • Builds with 0 warnings (TreatWarningsAsErrors=true).
  • Tests pass (dotnet test); new Core logic has unit tests.
  • No layering violation — App references Core only (Infrastructure / Video / Devices wired via DI in a head).
  • Scope stays within one phase (didn't pull work from a later phase's "Не входит" ("Not included") list).
  • README / docs updated if public commands, options, or setup changed.

Platforms tested

  • Windows
  • Linux
  • macOS
  • Android
  • iOS
  • CI build only

Screenshots / notes

keyldev added 12 commits June 21, 2026 00:39
feat(audio): phase 17 listen slice — camera audio + volume/mute

- Core: AudioFrame, IVideoSession.AudioFrames, EnableAudio flag, IAudioOutput/NullAudioOutput, AudioMonitor (one-source policy + software gain)
- Video: FfmpegVideoSession decodes audio via swresample → S16/48k/stereo; auto-reconnect forwards it
- Desktop: native WASAPI shared-mode renderer (AUTOCONVERTPCM, ring buffer)
- App: speaker toggle + volume slider on single-camera page, default muted, persisted
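
The "software gain" mentioned for AudioMonitor is not shown in this description. A minimal sketch of what gain on S16 samples could look like follows, assuming interleaved signed 16-bit PCM as produced by the swresample step; the `SoftwareGain` class and its `Apply` method are hypothetical names, not the PR's API.

```csharp
using System;

// Hypothetical helper, not the PR's AudioMonitor implementation.
public static class SoftwareGain
{
    // Scales signed 16-bit PCM samples in place.
    // gain: 0.0 silences, 1.0 leaves samples unchanged, values above 1.0 amplify.
    public static void Apply(Span<short> samples, float gain)
    {
        for (int i = 0; i < samples.Length; i++)
        {
            int scaled = (int)MathF.Round(samples[i] * gain);
            // Saturate instead of letting the cast to short wrap around.
            samples[i] = (short)Math.Clamp(scaled, (int)short.MinValue, (int)short.MaxValue);
        }
    }
}
```

Saturating rather than wrapping matters here: a wrapped sample flips sign and is heard as a loud click, while a clamped one only distorts.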
feat(audio): gate speaker controls on real audio (phase 17.4)

- runtime capability detect: show mute/volume only once the camera decodes an audio frame
- hide on cameras with no mic instead of teasing silent playback
- reset detection on stream reload / camera switch
feat(audio): grid tile listen via dynamic SetAudioEnabled (phase 17)

- IVideoSession.SetAudioEnabled toggles audio decode on a live session, no video blip
- FfmpegVideoSession lazily sets up/tears down audio decoder in the loop; sticky across reconnects
- per-tile speaker button; one-source policy displaces the previously-listening tile
feat(audio): Majestic "enable audio" hint (phase 17.4)

- parse/write audio.enabled in Majestic config (MajesticConfig/Patch + HttpClient)
- single-camera page: when camera has audio off, show hint + one-tap enable
- enabling POSTs audio.enabled=true and reloads so the track appears
feat(audio): talk foundation — G.711, RTP, mic capture (phase 17.6)

- G711 µ-law/A-law encode/decode (pure C#) + tests
- RtpPacketizer for the backchannel uplink + tests
- IAudioInput/NullAudioInput contract; native WASAPI mic capture on Windows (8k mono via AUTOCONVERTPCM)
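
For readers unfamiliar with G.711, here is a minimal sketch of µ-law companding in pure C#. It follows the widely used reference algorithm (bias 0x84, clip 32635) and is not taken from the PR's `G711` class; the `MuLawSketch` name and method names are hypothetical. A-law is omitted for brevity.

```csharp
using System;

// Hypothetical illustration of G.711 µ-law, not the PR's G711 class.
public static class MuLawSketch
{
    private const int Bias = 0x84;   // 132, added before the segment search
    private const int Clip = 32635;  // largest magnitude that still fits after biasing

    // Encodes one signed 16-bit PCM sample to one µ-law byte.
    public static byte Encode(short sample)
    {
        int sign = (sample >> 8) & 0x80;
        int magnitude = sign != 0 ? -sample : sample; // int math, so -32768 is safe
        if (magnitude > Clip) magnitude = Clip;
        magnitude += Bias;

        // Find the segment: exponent e means the top set bit is bit e + 7.
        int exponent = 7;
        for (int mask = 0x4000; (magnitude & mask) == 0 && exponent > 0; mask >>= 1)
            exponent--;

        int mantissa = (magnitude >> (exponent + 3)) & 0x0F;
        // µ-law bytes are transmitted bit-inverted.
        return (byte)(~(sign | (exponent << 4) | mantissa) & 0xFF);
    }

    // Decodes one µ-law byte back to a signed 16-bit PCM sample.
    public static short Decode(byte encoded)
    {
        int u = ~encoded & 0xFF;
        int sign = u & 0x80;
        int exponent = (u >> 4) & 0x07;
        int mantissa = u & 0x0F;
        int magnitude = (((mantissa << 3) + Bias) << exponent) - Bias;
        return (short)(sign != 0 ? -magnitude : magnitude);
    }
}
```

Each 16-bit sample becomes one byte, so 20 ms of 8 kHz mono audio (160 samples) becomes a 160-byte payload.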
feat(audio): two-way talk via ONVIF backchannel (phase 17.5/17.6)

- RtspBackchannelClient: minimal RTSP (DESCRIBE+Require/SETUP TCP-interleaved/PLAY), Basic/Digest auth, SDP sendonly track, interleaved RTP send + keepalive
- IAudioBackchannelClient/Session contracts; PushToTalkController wires capture->G711->RTP->send (tested with fakes)
- single-camera mic button (tap to talk), gated on mic availability; backchannel failures surface as TalkError
- NOTE: needs a Profile-T camera with a speaker to validate end-to-end
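
To illustrate the "interleaved RTP send" step, here is a minimal sketch of wrapping a G.711 payload in a 12-byte RTP header and the RTSP TCP-interleaved frame (`$`, channel, 16-bit length). The `InterleavedRtpWriter` class is hypothetical and is not the PR's `RtpPacketizer` or `RtspBackchannelClient`; the channel number, payload type, and SSRC are assumed to come from the SETUP/SDP negotiation.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical sketch; the PR's RtpPacketizer may be structured differently.
public sealed class InterleavedRtpWriter
{
    private readonly byte _channel;      // interleaved channel agreed in SETUP
    private readonly byte _payloadType;  // 0 = PCMU, 8 = PCMA
    private readonly uint _ssrc;
    private ushort _sequence;
    private uint _timestamp;

    public InterleavedRtpWriter(byte channel, byte payloadType, uint ssrc)
    {
        _channel = channel;
        _payloadType = payloadType;
        _ssrc = ssrc;
    }

    // Returns '$' + channel + length + RTP header + payload, ready to write to the RTSP socket.
    public byte[] Frame(ReadOnlySpan<byte> payload)
    {
        const int RtpHeaderLength = 12;
        int rtpLength = RtpHeaderLength + payload.Length;
        var frame = new byte[4 + rtpLength];

        frame[0] = 0x24; // '$' marks an interleaved binary frame
        frame[1] = _channel;
        BinaryPrimitives.WriteUInt16BigEndian(frame.AsSpan(2), (ushort)rtpLength);

        frame[4] = 0x80; // version 2, no padding, no extension, no CSRC
        frame[5] = (byte)(_payloadType & 0x7F); // marker bit clear
        BinaryPrimitives.WriteUInt16BigEndian(frame.AsSpan(6), _sequence);
        BinaryPrimitives.WriteUInt32BigEndian(frame.AsSpan(8), _timestamp);
        BinaryPrimitives.WriteUInt32BigEndian(frame.AsSpan(12), _ssrc);
        payload.CopyTo(frame.AsSpan(16));

        unchecked
        {
            _sequence++;                         // wraps after 65535
            _timestamp += (uint)payload.Length;  // G.711: one byte per 8 kHz sample
        }
        return frame;
    }
}
```

With 20 ms packets, each call takes 160 encoded bytes and returns a 176-byte frame. A real sender would normally start the sequence number and timestamp at random values.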
feat(audio): mic capture on all platforms (phase 17.6)

- Linux AlsaAudioInput (libasound), macOS CoreAudioInput (AudioQueue)
- Android AudioRecord (+RECORD_AUDIO manifest), iOS AVAudioEngine (+NSMicrophoneUsageDescription)
- registered per head; 8k mono S16 to match the G.711 backchannel
- verified compile: Win/Linux/macOS/Android; iOS builds on Mac only
feat(audio): ONVIF audio capability detection (phase 17.4)

- MediaProfile/OnvifProbeResult carry HasAudioIn (mic) + HasAudioOut (speaker/backchannel)
- detect from AudioEncoderConfiguration + Extension.AudioOutputConfiguration; OR across profiles
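
"OR across profiles" means a camera counts as having a mic or speaker if any of its media profiles reports one. A minimal sketch, assuming a simplified per-profile record; `ProfileAudio` and `AudioCaps` are hypothetical stand-ins for the PR's `MediaProfile` and `OnvifProbeResult`:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical simplified shape of one probed media profile.
public sealed record ProfileAudio(string Token, bool HasAudioIn, bool HasAudioOut);

public static class AudioCaps
{
    // A capability is present on the camera if at least one profile reports it.
    public static (bool HasAudioIn, bool HasAudioOut) Combine(IEnumerable<ProfileAudio> profiles)
    {
        var list = profiles.ToList();
        return (list.Any(p => p.HasAudioIn), list.Any(p => p.HasAudioOut));
    }
}
```

An empty profile list yields `(false, false)`.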
feat(audio): persist ONVIF audio caps on Camera, gate talk button (phase 17.4)

- Camera.HasAudioIn/HasAudioOut + migration 011 + repo mapping
- SaveOnvifMetadataAsync writes them from the probe during onboarding
- talk button hides only when a probed ONVIF camera reports no speaker
fix(audio): no spurious sink Stop on first attach; green tests

- AudioMonitor.DetachLocked skips Stop when nothing was attached
- fix AudioMonitor one-source test to attach by session id
- fix stale FakeUserSettings (missing AiAcceleration) blocking Infrastructure tests
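
The one-source policy and the "no spurious Stop" fix can be sketched together. This is a simplified illustration under assumed names: `OneSourceMonitor` and its delegate-based constructor are hypothetical, and the real `AudioMonitor` works against `IAudioOutput` and also applies gain.

```csharp
using System;

// Hypothetical sketch of a one-source policy keyed by session id.
public sealed class OneSourceMonitor
{
    private readonly object _gate = new object();
    private readonly Action _startSink;
    private readonly Action _stopSink;
    private Guid? _current;

    public OneSourceMonitor(Action startSink, Action stopSink)
    {
        _startSink = startSink;
        _stopSink = stopSink;
    }

    public Guid? Current { get { lock (_gate) return _current; } }

    // Attaching a new session displaces whichever session was listening before.
    public void Attach(Guid sessionId)
    {
        lock (_gate)
        {
            if (_current == sessionId) return;
            DetachLocked();
            _current = sessionId;
            _startSink();
        }
    }

    // Only the session that currently owns the sink can detach it.
    public void Detach(Guid sessionId)
    {
        lock (_gate)
        {
            if (_current == sessionId) DetachLocked();
        }
    }

    // Caller must hold _gate. Skips Stop when nothing was attached,
    // so the very first Attach does not stop a sink that never started.
    private void DetachLocked()
    {
        if (_current is null) return;
        _current = null;
        _stopSink();
    }
}
```

Keying by session id means a displaced tile that detaches late cannot silence the tile that replaced it.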
feat(audio): native playback on all platforms (phase 17.2)

- Linux AlsaAudioOutput, macOS CoreAudioOutput (shared PcmRing), Android AudioTrack, iOS AVAudioPlayerNode
- registered per head -> listen now works everywhere, not just Windows
- verified compile: Win/Linux/macOS/Android; iOS builds on Mac only
feat(audio): hold-to-talk, tile speaker gating, Android mic prompt (phase 17.6)

- push-to-talk is now press-and-hold (BeginTalk/EndTalk via pointer events), guards release-during-connect
- grid tile speaker hidden only when a probed ONVIF camera reports no mic
- request RECORD_AUDIO at Android startup

keyldev merged commit b06934f into main on Jun 21, 2026