透過azure openai / openai api with vision功能,讓人工智慧看懂圖片

做了個小玩具  網址: https://shoushan.happyweb.com.tw

上傳圖片後,可以做出

一、依照圖片內容生成商品文案給社群小編行銷

二、人工智慧ai 描述辨識圖片

三、上傳餐廳、飲料店等菜單,透過ai辨識回傳json

四、行為偵測,例如上傳照片 讓ai看看有沒有犯法或違法

五、上傳一張網頁的圖片/手繪的prototyp,然後生成規格書與欄位內容,最後搞出一個前端的prototype

c# 即時錄音送至openai whisper 翻譯/逐字稿

最近試著做即時翻譯這件事,透過安裝naudio。把麥克風聲音錄下後,每10秒轉成一個檔案上傳至open AI whisper做即時翻譯或逐字稿:


private const string API_KEY = "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
private const string API_URL = "https://api.openai.com/v1/audio/transcriptions";
var waveIn = new WaveInEvent();
waveIn.WaveFormat = new WaveFormat(16000, 1);

var buffer = new MemoryStream();
var writer = new WaveFileWriter(buffer, waveIn.WaveFormat);

//var writer = new WaveFileWriter(new DisposeStream(buffer), waveIn.WaveFormat);

waveIn.DataAvailable += async (sender, e) =>
{
writer.Write(e.Buffer, 0, e.BytesRecorded);
if (buffer.Length > 16000 * 2 * 10) // 每10秒
{
var audioData = buffer.ToArray();
buffer.SetLength(0);
buffer.Position = 0;
await SaveAudioChunkAsMp3(audioData, waveIn.WaveFormat);
await SendAudioChunk(audioData);
}
};

Console.WriteLine(“開始錄音。按任意鍵停止…”);
waveIn.StartRecording();
Console.ReadKey();

waveIn.StopRecording();
writer.Dispose();
buffer.Dispose();

static async Task SendAudioChunk(byte[] audioData)
{
using var httpClient = new HttpClient();
httpClient.DefaultRequestHeaders.Add(“Authorization”, $”Bearer {API_KEY}”);

using var content = new MultipartFormDataContent();
content.Add(new ByteArrayContent(audioData), “file”, “audio.wav”);
content.Add(new StringContent(“whisper-1”), “model”);
content.Add(new StringContent(“language”), “zh-hant”);

var response = await httpClient.PostAsync(API_URL, content);
var result = await response.Content.ReadAsStringAsync();
Console.WriteLine($”Transcription: {result}”);
}