Responsible AI Data Scientist - Language Models (m/f/d)

Resaro AI

Germany

EUR 80,000 - 100,000

Job Description

Resaro was founded on the belief that AI will change the world in ways we cannot even imagine, but every new technology needs safeguards.

As AI adoption increases, the challenge of the next decade is to harness AI safely, with the appropriate levels of governance and assurance to build trust in these advanced algorithmic systems. Most enterprises do not have the capability to do this. We are a new AI assurance venture that provides services to validate AI systems for accuracy, robustness, explainability, fairness, privacy and security.

We are looking for a data scientist, based in Munich, with experience in deep-learning-based language models to join a category-defining AI assurance venture that helps companies test and audit their AI systems. You will help evaluate and stress-test AI models to make sure they are fit for purpose and safe to deploy. We value strong technical ability and real-world experience, and there will be room to solve challenging problems and bring cutting-edge technology into business applications.

YOU WILL

Perform technical AI evaluations, benchmarking and "red-team" tests on large language models, assessing them for robustness of performance, embedded biases, and vulnerability to jailbreaks and prompt injection attacks.
Lead the design, implementation, and execution of robust evaluation frameworks for Large Language Models (LLMs), including but not limited to GPT-based models, LSTM, BERT, T5, and other state-of-the-art architectures.
Evaluate the performance of generative AI systems, including text and multi-modal models. This covers foundation models, fine-tuned models and end-to-end systems.
Establish and refine metrics and benchmarks for model quality (output fidelity, diversity, creativity, and bias) and for model performance (perplexity, BLEU, ROUGE, accuracy, coherence, and factual consistency).
Lead efforts in curating and managing large, high-quality datasets for evaluating LLMs, ensuring data is representative, unbiased, and ethically sourced.
Work with clients and more junior team members to design custom evaluation approaches, grounded in the latest scientific research, that address each client's needs.
Work with the product management team to develop a suite of technical and analytical AI evaluation frameworks and tools that are backed by scientific research and methods. These should assess the robustness, explainability, fairness, privacy, safety and security of AI and machine learning systems, with a strong focus on large language models.
Mentor junior data scientists, guiding them in best practices for LLM evaluation and the latest advancements in NLP.
Stay up to date with the latest advancements in Natural Language Processing (NLP) and LLM evaluation, applying cutting-edge methods and tools to improve model performance.

YOU ARE ABLE TO

Think from first principles and tackle the most challenging technical problems with a multi-disciplinary approach, e.g., design, engineering and social science.
Lead by example regardless of whether you are a manager or individual contributor. You want to work with passionate and talented individuals and people want to work with you.
Communicate in an open, frank and respectful manner.
Thrive in a fast-paced environment.
Navigate uncertainty by being willing to explore, while remaining laser-focused on the mission at hand.

YOU HAVE

Experience as a data scientist training or deploying deep-learning-based natural language models or large language models in real-world contexts: about 5-8 years of working experience, or a relevant postgraduate degree with 2+ years of working experience building and deploying LLMs. (We are hiring candidates across different levels.)
Strong experience in evaluating LLMs using metrics such as perplexity, BLEU, ROUGE, and human-centered evaluation techniques.
Proven track record of managing and analyzing large, complex language datasets, including text preprocessing and tokenization.
Understanding of methods to make LLM decisions interpretable and understandable.
Interest in machine learning for anomaly detection and applying natural language processing in the fields of cybersecurity and fraud detection.
Excellent written and verbal communication skills, with the ability to clearly explain complex technical concepts to diverse audiences, including non-technical stakeholders.
Solid programming skills in Python and experience building automated pipelines for continuous model evaluation.
Passion for and interest in applied research on the safe and responsible use of AI, particularly large language models.

NICE TO HAVE

Published research in the field of generative AI or model evaluation.
Hands-on experience with model explainability tools and methods.
Familiarity with cloud-based platforms (e.g., AWS, GCP) for scalable model evaluation and deployment.

ABOUT US

Resaro is a leading AI Assurance company with offices in Singapore and Munich, pioneering the field of artificial intelligence safety and reliability. We are a team of AI experts, engineers, and data scientists. Our mission is to ensure that AI systems are developed and deployed in a way that is safe, secure, and aligned with human values. The future of society, business and the world depends on responsible AI adoption.

Resaro is an Equal Opportunity Employer. We respect each individual and support the diverse cultures, perspectives, skills and experiences within our teams.
