Learn how to build modern local AI applications using Ollama with a .NET backend and React frontend. Stream real-time model responses, maintain data privacy, and eliminate cloud dependencies.
Seamlessly connect Ollama’s local inference with .NET APIs and React UIs.
Keep all sensitive data on local infrastructure without cloud exposure.
Deliver live token-by-token responses with low-latency interaction.

Modern users expect apps to do more than just display data — they expect them to think a little. Summarize notes, answer questions, adapt as you type.
A few months ago, we started exploring how to make that possible without relying on cloud APIs. Turns out, you can — with Ollama, .NET, and React working together.
This guide walks through how to build a local AI app step-by-step. We'll use Ollama to host models, .NET for the backend API, and React for a live, streaming frontend. You'll see how to get it running locally — no OpenAI key, no data leaving your laptop. (In our internal tests, a mid-range GPU handled this setup surprisingly well once caching warmed up.)
What Is an AI App, Really?

At its core, an AI app is just a regular application that can think a little. Instead of following only hard-coded rules, it uses predictive or generative models to make smart decisions or create new content on the fly.
You've already seen them in action — chat assistants that summarize meetings, tools that recommend what to watch next, systems that flag fraud, or apps that write short summaries for you. What makes them “AI-powered” is that they learn from data and use patterns to infer answers instead of relying entirely on fixed logic.
Most AI apps, no matter how advanced, follow a pretty similar pattern: they take some input, run it through a trained or generative model, and return a prediction or newly generated output.
In simple terms: an AI app listens, thinks, and responds — just like a human assistant, but built from data and code.
We'll build a minimal full-stack AI app:
- Ollama hosting a local model (llama3.1:8b-instruct) for inference.
- A .NET backend exposing a /api/chat endpoint with streaming enabled.
- A React frontend that renders the model's answer live, token by token.
Treat Ollama as a replaceable inference provider behind a clean API boundary. You can swap models or hosting without touching the frontend or business logic.
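To make that boundary concrete, here is a minimal sketch of the only contract the UI will ever depend on: a hypothetical chatClient.js helper (file name and port are ours, matching the setup later in this guide). Nothing in it mentions Ollama, so the backend can swap models or inference hosts without any frontend changes.

// chatClient.js: illustrative sketch of the frontend's view of the API boundary
export function sendChat(messages, model) {
  // The UI only knows about POST /api/chat; which model or engine answers is a backend detail.
  return fetch("http://localhost:5189/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, messages })
  });
}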
Let's build something practical to see everything in action — a local meeting assistant. You'll paste your meeting transcript, and the app will let you ask questions like "Summarize today's sync in bullet points" or "What are my action items?"
This simple use case shows how streaming, prompt control, and local inference come together. Everything happens locally — no external API calls, no data leaving your computer.
Before we code anything, we need Ollama running — this is the local LLM engine that hosts models like Llama 3.
brew install ollama
ollama serve
ollama pull llama3.1:8b-instruct
The first command installs Ollama. The second starts the Ollama service (by default on port 11434). The last one downloads the llama3.1:8b-instruct model, which we'll use in this example.
Tip: The first time you run ollama pull, it may take a few minutes depending on your internet speed and hardware.
If you prefer to run Ollama in a container, use:
docker run -d -p 11434:11434 -v ollama:/root/.ollama \
--name ollama ollama/ollama:latest
This runs Ollama in the background (-d), maps the correct port, and saves model data in a volume so you don't have to re-download it.
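If you go the Docker route, the model still needs to be pulled inside that containerized instance. Assuming the container name ollama from the command above, one way to do it is:

docker exec -it ollama ollama pull llama3.1:8b-instruct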
You can confirm Ollama is running by checking:
curl http://localhost:11434/api/tags
If you see a JSON response, you're good to go.
Now we'll create a small .NET 8 API that talks to Ollama. This API will accept a message from the frontend, send it to Ollama, and then stream the model's response back to the browser in real time.
dotnet new web -n OllamaApi
cd OllamaApi
Replace your Program.cs with the following code:
using System.Net.Http.Json;
using System.Text.Json;
using System.Text.Json.Nodes;

var builder = WebApplication.CreateBuilder(args);

// Allow the React dev server to call this API.
builder.Services.AddCors(o => o.AddDefaultPolicy(p =>
    p.WithOrigins("http://localhost:5173")
     .AllowAnyHeader()
     .AllowAnyMethod()
     .AllowCredentials()
));

// Named HttpClient pointing at the local Ollama service.
builder.Services.AddHttpClient("ollama", c =>
{
    c.BaseAddress = new Uri("http://localhost:11434");
    c.Timeout = Timeout.InfiniteTimeSpan; // streaming responses can stay open for a while
});

var app = builder.Build();
app.UseCors();

app.MapPost("/api/chat", async (HttpContext ctx, ChatRequest req, IHttpClientFactory httpFactory) =>
{
    var client = httpFactory.CreateClient("ollama");

    var body = new
    {
        model = string.IsNullOrWhiteSpace(req.Model) ? "llama3.1:8b-instruct" : req.Model,
        messages = req.Messages,
        stream = true,
        keep_alive = "5m", // keep the model loaded between requests (top-level Ollama parameter)
        options = new { temperature = 0.2, num_ctx = 4096 }
    };

    var httpReq = new HttpRequestMessage(HttpMethod.Post, "/api/chat")
    {
        Content = JsonContent.Create(body)
    };

    // ResponseHeadersRead lets us start reading before Ollama finishes generating.
    var httpRes = await client.SendAsync(httpReq, HttpCompletionOption.ResponseHeadersRead, ctx.RequestAborted);
    httpRes.EnsureSuccessStatusCode();

    ctx.Response.Headers.ContentType = "text/event-stream";

    await using var stream = await httpRes.Content.ReadAsStreamAsync(ctx.RequestAborted);
    using var reader = new StreamReader(stream);

    // Ollama streams one JSON object per line; forward each token as an SSE event.
    while (await reader.ReadLineAsync(ctx.RequestAborted) is { } line)
    {
        if (string.IsNullOrWhiteSpace(line)) continue;

        var node = JsonNode.Parse(line);
        var token = node?["message"]?["content"]?.GetValue<string>();
        if (!string.IsNullOrEmpty(token))
        {
            await ctx.Response.WriteAsync($"data: {token}\n\n", ctx.RequestAborted);
            await ctx.Response.Body.FlushAsync(ctx.RequestAborted);
        }
        if (node?["done"]?.GetValue<bool>() ?? false) break;
    }
});

app.Run();

record ChatRequest(string? Model, List<ChatMessage> Messages);
record ChatMessage(string Role, string Content);
Here's what's happening: the API registers a named HttpClient that points to the local Ollama service (http://localhost:11434). When a POST request comes to /api/chat, the backend:
- Sends the user's message to Ollama.
- Reads each token Ollama generates.
- Streams those tokens to the browser using Server-Sent Events (SSE).
The response type text/event-stream allows the frontend to display text in real time as it's being generated — just like ChatGPT typing. Options like temperature and num_ctx can be tuned to adjust creativity or context size. For summarization tasks, keeping temperature low (0.1–0.3) gives more consistent results.
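For reference, a raw line from Ollama's streaming /api/chat response looks roughly like this (abbreviated, values illustrative):

{"model":"llama3.1:8b-instruct","message":{"role":"assistant","content":"Sure"},"done":false}

and the backend forwards just the text content to the browser as an SSE event:

data: Sure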
Finally, let's build a React interface that connects to our API and shows live model responses.
npm create vite@latest react-ollama -- --template react
cd react-ollama
npm i
This sets up a modern, fast React environment with ES modules and instant reload.
Create a Chat component (for example, src/Chat.jsx) and render it from App.jsx. This component keeps the conversation in state, posts your message to the .NET API, and displays the streamed answer as it arrives:
import { useState, useRef } from "react";

export default function Chat() {
  const [messages, setMessages] = useState([]);
  const [input, setInput] = useState("");
  const [streamingAnswer, setStreamingAnswer] = useState("");
  const answerRef = useRef("");

  const send = async () => {
    // System prompt first, then the prior history, then the new question.
    const payload = {
      model: "llama3.1:8b-instruct",
      messages: [
        { role: "system", content: "You are a helpful meeting assistant." },
        ...messages,
        { role: "user", content: input }
      ]
    };

    setStreamingAnswer("");
    answerRef.current = "";

    const res = await fetch("http://localhost:5189/api/chat", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload)
    });

    // Read the SSE stream from the backend and append tokens as they arrive.
    const reader = res.body.getReader();
    const decoder = new TextDecoder();

    while (true) {
      const { value, done } = await reader.read();
      if (done) break;
      const chunk = decoder.decode(value, { stream: true });
      chunk.split("\n\n").forEach(line => {
        if (line.startsWith("data: ")) {
          const token = line.slice(6);
          answerRef.current += token;
          setStreamingAnswer(answerRef.current);
        }
      });
    }

    // Commit the finished exchange to the history and reset the live view.
    setMessages(prev => [
      ...prev,
      { role: "user", content: input },
      { role: "assistant", content: answerRef.current }
    ]);
    setStreamingAnswer("");
    setInput("");
  };

  return (
    <div style={{ maxWidth: 720, margin: "2rem auto", fontFamily: "sans-serif" }}>
      <h2>Local Meeting Assistant</h2>
      <div style={{ border: "1px solid #ddd", padding: 16, borderRadius: 8, minHeight: 240 }}>
        {messages.map((m, i) => (
          <div key={i} style={{ marginBottom: 8 }}>
            <strong>{m.role}:</strong> {m.content}
          </div>
        ))}
        {streamingAnswer && (
          <div style={{ marginTop: 8, opacity: 0.9 }}>
            <strong>assistant:</strong> {streamingAnswer}
          </div>
        )}
      </div>
      <textarea
        rows={4}
        style={{ width: "100%", marginTop: 12 }}
        placeholder="Paste your notes or ask a question..."
        value={input}
        onChange={(e) => setInput(e.target.value)}
      />
      <button onClick={send} style={{ marginTop: 12 }}>Ask</button>
    </div>
  );
}
The send() function posts the user's input to your .NET API. The backend streams the AI's answer token-by-token. React updates the display immediately with each token, so you see the model “thinking” in real time.
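Two refinements are worth knowing about before you build on this. First, a network chunk can end in the middle of an SSE event, so production code should buffer partial events instead of splitting each chunk on its own. Second, users often want to stop a long answer mid-stream, which the Fetch API supports via an AbortController. Here's a hedged sketch of a reusable reader (a hypothetical readSse helper, not part of the component above) that handles both:

// readSse.js: illustrative sketch that buffers partial SSE events and supports cancellation.
export async function readSse(url, payload, onToken, signal) {
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
    signal // pass an AbortController's signal so the caller can stop the stream
  });

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // Process only complete events (terminated by a blank line);
    // keep any trailing partial event in the buffer for the next chunk.
    const events = buffer.split("\n\n");
    buffer = events.pop();
    for (const event of events) {
      if (event.startsWith("data: ")) onToken(event.slice(6));
    }
  }
}

In the component, you would create a controller with new AbortController() before sending, pass controller.signal to readSse, and call controller.abort() from a Stop button; the aborted fetch rejects with an AbortError, which you can catch and ignore.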
Before running the app:
- Make sure both servers are running: the React dev server (5173) and the .NET API (5189).
- Open http://localhost:5173 in your browser.
Example query: Paste your meeting transcript and ask: "Summarize today's engineering sync in bullet points." Within a second or two, you'll start seeing the AI's summary appear line by line.
Before connecting everything, let's make sure both Ollama and your .NET API are working as expected. These quick checks help confirm your setup and rule out any networking or configuration issues.
Run the following commands in your terminal:
# List available models
curl http://localhost:11434/api/tags
This should return a JSON list of models currently available in your local Ollama instance. If you see something like:
{"models":[{"name":"llama3.1:8b-instruct"}]}
— perfect, Ollama is up and running.
Now, let's test a simple prompt directly through the Ollama API:
curl -X POST http://localhost:11434/api/chat \
-H "Content-Type: application/json" \
-d '{"model":"llama3.1:8b-instruct","messages":[{"role":"user","content":"Hello"}],"stream":false}'
If everything is set up correctly, Ollama will respond with JSON similar to this:
{
  "model": "llama3.1:8b-instruct",
  "message": {
    "role": "assistant",
    "content": "Hello! How can I help you today?"
  },
  "done": true
}
That confirms Ollama can receive prompts and generate text locally.
Next, let's verify that your .NET backend is talking to Ollama properly. In a new terminal window, run:
curl -N -X POST http://localhost:5189/api/chat \
-H "Content-Type: application/json" \
-d '{"messages":[{"role":"user","content":"Give me three meeting action items based on: finalize budget, email vendor, schedule training."}]}'
Here's what's happening:
- The -N flag keeps the connection open so you can see streamed output as it arrives.
- The request omits the model field, so the backend falls back to its default (llama3.1:8b-instruct).
If it's working, you'll see a stream of responses like:
data: Finalize project budget approval with finance.
data: Send updated proposal email to the vendor.
data: Schedule internal training session next week.
Each data: line represents a new chunk of text sent by the model in real time.
If you see tokens streaming like above — success! If it hangs with no output, check:
- that both services are running (ollama serve and dotnet run),
- that the ports match (11434 for Ollama, 5189 for .NET).
Once both commands work, your stack is ready — you can open the React app and start chatting with your local AI assistant.
A few production notes before you take this further:
- Let users cancel an in-flight response from the frontend with an AbortController.
- Remember that Ollama's /api/chat returns line-delimited JSON when stream=true, so parse it line by line.
- For summarization, keep temperature between 0.1 and 0.3.
- Long transcripts can exceed num_ctx. Summarize or chunk inputs first, then ask targeted questions.
- Keep the model warm with keep_alive while capping concurrent sessions.
Connecting Ollama with .NET and React provides a pragmatic blueprint for building full-stack local AI apps that are fast, private, and cost-efficient. You've seen how to:
- run an open model locally with Ollama,
- stream its responses through a .NET API, and
- render them live in a React UI.
Start with a meeting assistant example, then extend it with retrieval, authentication, and production-grade observability.
If you'd like support designing or shipping your next AI-powered UI, our team can help with architecture, implementation, and model strategy. Explore our services at /services/ai-strategy and /services/full-stack-development and let's build something intelligent, together.
Ready to optimize your AI strategy? Partner with Moltech Solution for hybrid AI deployments, private LLM benchmarking, and expert guidance to get the best of local and cloud AI — secure, fast, and cost-efficient.