Transcription plays an important role in everyday life, and for the German public sector it is one of the key challenges of digitalization. With our AI-powered, microservices-based solution, we at Appsfactory have taken the transcription process to a new level while ensuring that the data provided remains secure.
Authors: Rolf Kluge, Aidin Azimi, Alina Drahozhylova, Frank Wockenfuß
Name? ASEL. Mission? Workload reduction.
In 2021, we started a project to develop a speech-to-text solution that would make the work of public authorities much more efficient and reduce their workload, while complying with data protection regulations. The solution we came up with is called ASEL (short for Automatische Spracherkennungs Lösung, "automatic speech recognition solution") and is now used by the Polizeiverwaltungsamt Sachsen (police administration of Saxony) via a web portal, with the dictation features also available in the mobile app. It will probably become the standard for all 16 federal states as well as the Federal Police, the Federal Criminal Police Office, the Customs Criminal Police Office, and the Police at the German Bundestag. ASEL transcribes and translates audio, video and text files using a microservices approach that includes a wide range of AI services. As a result, users no longer need to transcribe or translate their files manually. This gives them time to focus on what matters most: the people behind the cases.
Distributed services offer high agility: individual services can be updated quickly without disrupting the entire system. Such a "plug-and-play" solution also provides the flexibility to use the ideal combination of services for each specific use case.
By combining up to four different AI engines, we cover a wide range of processes. Because each engine runs in its own service, the models behind them can be easily replaced to keep delivering the best results. This versatility lets users apply the existing AI engines to their specific requirements and benefit from the latest improvements in the fast-paced field of AI. Different AI models play a critical role in the success of our solution. Features such as keyword extraction and speaker recognition help to enrich content and associate text with specific speakers. Additionally, the Vocabulary Adaptation feature ensures contextually relevant results by optimizing the text output to include industry-specific language nuances. To help drive the project forward, user feedback is continuously evaluated and integrated into the system. This feedback, gathered by our UX Research department, has allowed us to adapt the solution closely to the needs of the users who work with it every day.
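The idea of one replaceable engine per service can be illustrated with a minimal sketch. It is not ASEL's actual code: the `Pipeline` class, the stub engines, and the sample vocabulary are hypothetical, and plain in-process functions stand in for what would be separate network services.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical engine contract: each engine takes text (or a file
# reference) and returns text. In a real deployment every engine
# would be its own service behind a network API.
Engine = Callable[[str], str]


@dataclass
class Pipeline:
    """Registry of named engines that can be combined per use case."""
    engines: Dict[str, Engine] = field(default_factory=dict)

    def register(self, name: str, engine: Engine) -> None:
        # Registering under an existing name swaps the engine out
        # without touching any other part of the pipeline.
        self.engines[name] = engine

    def run(self, steps: List[str], payload: str) -> str:
        # Apply the selected engines in order; each step's output
        # becomes the next step's input.
        for step in steps:
            payload = self.engines[step](payload)
        return payload


def stub_transcribe(audio_ref: str) -> str:
    # Placeholder for a speech-to-text engine (assumption: fixed output).
    return "the suspect drove a b m w"


def make_vocabulary_adapter(vocabulary: Dict[str, str]) -> Engine:
    # Simplified stand-in for vocabulary adaptation: replace spoken
    # forms with the domain-specific written forms.
    def adapt(text: str) -> str:
        for spoken, written in vocabulary.items():
            text = text.replace(spoken, written)
        return text
    return adapt


pipeline = Pipeline()
pipeline.register("transcribe", stub_transcribe)
pipeline.register("vocabulary", make_vocabulary_adapter({"b m w": "BMW"}))
transcript = pipeline.run(["transcribe", "vocabulary"], "interview.wav")
```

Because callers only refer to engines by name, replacing the model behind `"transcribe"` is a single `register` call, and each use case can choose its own list of steps.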
Because ASEL is specifically designed for sensitive settings such as dictation during initial police action at the scene and video interrogations, data security is a top priority. Advanced data security measures are built into the solution to ensure the confidentiality and integrity of sensitive information throughout the transcription and translation process. The solution can be provided as SaaS in the cloud (in the secure police cloud) or completely on-premises, meaning that no third parties have access to sensitive information and that it is fully compliant with data protection requirements.
UI as Flexible as Your Work
We have already noted that ASEL is designed to adapt to ever-changing situations, and the user interface is no exception. Our solution combines an editor and a player for advanced text processing and is packed with features such as formatting, commenting and export. The text is also enriched with valuable information from the transcription services to help with analyzing and proofreading the output. From seamless editing tools to robust transcription and translation capabilities, the UI offers a user-friendly experience that lets people work with text effortlessly and efficiently.
How We Made It Happen
As mentioned above, this project is still ongoing and set up to be a long-term success story. The range of technologies involved already shows versatility and strength, but it gets even more impressive when you look at the details. For example, one of our complex solutions comprises 145 projects, compared with an average of about 50. We used a mix of technologies, including distributed services, actor systems, live streaming, transcription and asynchronous communication using a queueing system. Additionally, we have implemented comprehensive monitoring using ELK (Elasticsearch, Logstash, Kibana) and Matomo to provide real-time dashboards with insights into system performance and user behavior. With this diverse toolkit, we aim to exceed our customer's expectations. All of this is made possible by a dedicated team of approximately 20 #Appsfactorians spanning development, design, project management and quality assurance.
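To illustrate what asynchronous communication over a queue looks like in principle, here is a minimal sketch. It makes several assumptions: Python's in-process `asyncio.Queue` stands in for whatever queueing system is actually used, and the worker, the job names and the placeholder transcript text are hypothetical.

```python
import asyncio
from typing import Dict, List, Optional


async def transcription_worker(
    jobs: "asyncio.Queue[Optional[str]]", results: Dict[str, str]
) -> None:
    # Pull jobs until a None sentinel signals that no more work is coming.
    while True:
        job_id = await jobs.get()
        if job_id is None:
            jobs.task_done()
            break
        # Placeholder for the call to a real transcription service.
        await asyncio.sleep(0)
        results[job_id] = f"transcript for {job_id}"
        jobs.task_done()


async def process(job_ids: List[str], workers: int = 2) -> Dict[str, str]:
    jobs: "asyncio.Queue[Optional[str]]" = asyncio.Queue()
    results: Dict[str, str] = {}
    tasks = [
        asyncio.create_task(transcription_worker(jobs, results))
        for _ in range(workers)
    ]
    # The producer only enqueues work; it never waits on a specific worker.
    for job_id in job_ids:
        jobs.put_nowait(job_id)
    # One sentinel per worker so that every worker shuts down cleanly.
    for _ in range(workers):
        jobs.put_nowait(None)
    await asyncio.gather(*tasks)
    return results


results = asyncio.run(process(["a.wav", "b.wav", "c.wav"]))
```

The point of the pattern is decoupling: the component that submits files does not need to know how many workers exist or how long each transcription takes, which is what allows services to be scaled or updated independently.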