Ivan Pidov shares practical lessons from real io.Intelligence client projects and engagements.
1. Use front-end AI when visibility matters
One of the main design choices was to let the AI work through the front end of the application instead of connecting only to backend APIs. If the assistant navigates to a report, applies a filter, or extracts data from a table, the user can verify the result in the application they already trust.
This also helps with access control. Since the assistant operates in the context of the current user, it inherits the user’s existing permissions. The AI can only access what the user can access, instead of requiring a separate authorization model for backend API access.
2. Don’t send all your data to the model
Tools should shape the data before it reaches the model. That can mean filtering, paging, aggregation, summarization, or returning only the fields that are relevant to the user’s task.
LLMs can analyze large datasets, but that does not mean they should. It is usually slower, more expensive, and harder to control.
3. Keep the system prompt small
Another instinct is to put all business rules, workflows, and domain knowledge into the system prompt. That sounds useful, but it creates problems.
The system prompt is sent with every request. If it contains too much information, the model has to process irrelevant instructions over and over again. It may also apply the wrong rule, confuse workflows, or get distracted by context that does not apply to the current task. A better pattern is to move domain knowledge into a separate layer, such as a RAG/vector database or workflow lookup service. Then the assistant can retrieve only the specific workflow or business logic needed for the current question.
4. Put cost controls at the user and application level
Cost controls should exist per application and per user. LLM gateways can help because they give better visibility into which users, applications, and workflows are consuming tokens. They also make it easier to apply usage caps and enforce policies centrally.
5. Start with the strongest model, then optimize down
For early prototypes, it can be tempting to start with a cheaper model. In practice, this can make debugging harder.
If the assistant gives poor results, it becomes unclear whether the issue is the application logic, the tools, the prompts, or simply the model’s capability.
The better approach is to start with a strong model and use it to establish a working baseline. Once the workflow behaves correctly, test cheaper or smaller models and see how far you can reduce cost while keeping acceptable quality.
For simple tasks like summarization, smaller models may be enough. For complex reasoning and tool use, starting with the best model can save time.
6. Use separate API keys for users and applications
Separate keys for users and applications make debugging, reporting, and cost allocation much easier. They also support better governance once the project moves beyond a small prototype.
7. Run a hackathon-style implementation
We found that a short, focused hackathon worked better than a usual long asynchronous implementation process. With technical people, subject matter experts, and stakeholders in the same room, decisions were made faster. The team could define the use case, build quickly, test immediately, and iterate based on feedback.The lesson is that AI prototypes often fail not because the code is difficult, but because teams move slowly on decisions. A focused build session can reduce that friction.