Hardship Level (not applicable for home-based): H (no hardship)
Family Type (not applicable for home-based): Family
Staff Member / Affiliate Type: UNOPS IICA1
Target Start Date: 2024-04-02
Job Posting End Date: February 29, 2024
Terms of Reference
Title: Assistant Data Scientist
Location: Copenhagen, Denmark
Duration: 02/04/2024 to 01/10/2024 (with possibility of extension)
Contract Type: UNOPS IICA-1/LICA-8 (P-1/NO-A equivalent)
Background and Organizational Context
UNHCR, the UN Refugee Agency, is a global organization dedicated to saving lives, protecting rights, and building a better future for refugees, forcibly displaced communities, and stateless people. Every year, millions of men, women, and children are forced to flee their homes to escape conflict and persecution. UNHCR teams are in the field in over 130 countries, using their expertise to protect and care for over 100 million people.
The Statistics and Demographics Section, which is located in UNHCR’s Global Data Service (GDS) in Copenhagen, develops statistical definitions and standards, oversees the production of global statistics on forcibly displaced and stateless persons, and applies innovative methods to answer questions related to forced displacement. Access to reliable, timely, and accurate data is essential to achieving the aim of the Data Transformation Strategy 2020-2025, which is to position UNHCR as a trusted leader on data and information related to forcibly displaced and stateless persons. The Section develops external products such as the annual Global Trends report, which analyzes the changes in forced displacement and statelessness and deepens public understanding of ongoing crises.
The data science team within the section uses innovative methods and big data sources to help fill data gaps in forced displacement statistics that cannot be estimated through “traditional” data sources (household surveys, administrative records, or population censuses). This work includes improving the timeliness and granularity of the estimates; generating early warning of emerging issues and crises; producing reliable forced displacement forecasts at the country level for programming and planning; improving understanding of both local impacts and larger geographic patterns; visualizing data for more nuanced and accessible insights; and improving analysis to help corroborate or refute hypotheses related to specific phenomena. Increasingly, data science methodologies are being applied to operational challenges, including integrity and fraud in large transactional databases and in critical systems similar to national CRVS setups.
Purpose and Scope of Assignment
UNHCR’s registration system (proGres) exists to establish the identity of refugees and other forcibly displaced and stateless persons upon first encounter. This is essential to deliver protection and assistance subsequently. Registration data covers biographical data, demographic characteristics, socio-economic variables, and specific needs. Registration data underpins critical delivery of services from cash delivery to travel to third countries for resettlement. In a number of countries, UNHCR registration data serves as KYC (know-your-customer) data. UNHCR’s registration system (proGres) includes 17 million registered individuals and is managed by over 15,000 operators worldwide, covering over 130 countries.
To ensure the quality and integrity of the data contained in this system, data science techniques are increasingly sought after, for instance to identify outliers, including in the context of fraud detection. To this end, data science methods need to be developed and implemented to strengthen data integrity, including faster detection of potential duplicates or of unusual use of the system itself.
In addition, proGres is used by UNHCR teams around the world to support the identification of refugees for the purpose of finding solutions, such as resettlement to third countries. This is presently done using qualitative data and interview processes. For the purpose of this assignment, data science models will be scoped to predict the probability that an individual will be successful in a resettlement process, thereby allowing for a segmentation approach and shortening the overall identification period for the majority of the individuals under consideration.
Duties and responsibilities
In order to carry out this function, the Assistant Data Scientist will have the following duties:
Deduplication tool
• Collaborate with the team on the development of the data science methodology to support deduplication within UNHCR registration data, with responsibilities including:
- Data familiarization: Gain a deep understanding of the UNHCR registration data structure and data entities as well as deduplication policy and processes as applied on a daily basis in UNHCR operations.
- Database understanding: Analyze data entities, relationships, and data quality issues within registration data.
- Identifying and developing suitable models: Utilize machine learning algorithms to implement deduplication, drawing on the registration data.
- Collaboration and communication: Continuously collaborate with relevant teams within the organization on registration data to gather requirements and ensure alignment.
• Identify and pre-process data: Utilize data cleansing and transformation techniques to prepare the registration data.
• Model development and optimization: Train and test machine learning models for the detection of duplicate case entries within the registration data, incorporating data from proGres.
• Documentation: Ensure thorough documentation of all processes, including model selection, data preprocessing steps, and model performance, as well as the integration of registration data.
Support for Fraud Detection Tools
• Collaborate with the registration team to support the development of ML/AI-enhanced fraud detection tools, focusing on operator-driven manipulations and fraud.
• Conceptualize a scalable set of analysis products, which can be deployed to all 130+ country operations using the system.
Resettlement model
• Understand the complexities of the resettlement process, selection criteria and the nuances of UNHCR's proGres data that inform the process.
• Draw on good practices from operations that have developed a data-driven resettlement selection process.
• Identify relevant data entities within proGres and recognize challenges and caveats in resettlement data.
• Support the development, testing, and evaluation of predictive models capable of assessing the probability of individual resettlement based on historical proGres data and selected features.
• Documentation: Ensure comprehensive documentation of the entire process within Microsoft Power Platform, including key steps and limitations of AI models and the use of proGres data.
Monitoring and Progress Controls
Main deliverables:
• Support the development and deployment of a deduplication tool for identifying duplicates in UNHCR’s registration database.
• Support the development of a predictive model based on registration data to assess statistical probabilities for the purpose of expediting resettlement identification.
• Support the development of fraud detection algorithms based on registration data, specifically log data.
• Communicate with other internal teams to develop collaborative frameworks and to harmonize ways of working together.
Qualifications and Experience
Education:
University Degree in statistics, data science, mathematics, economics, or other quantitative social sciences.
Required Work Experience:
- 1 year of relevant experience with Undergraduate degree; or no experience with Graduate degree; or no experience with Doctorate degree
- Demonstrated experience in data collection, management, cleaning, processing, and applied data analysis.
- Demonstrated experience using the statistical programming languages R and/or Python.
- Demonstrated experience using SQL.
- Demonstrated experience working with alternative data sources and/or statistical learning methods.
- Demonstrated experience in ensuring the operational relevance of analytical and/or research work.
- Demonstrated experience writing technical reports.
- Demonstrated experience presenting work to both technical and non-technical audiences.
Desirable Work Experience:
- Familiarity with Azure.
Key Competencies
• Excellent data management skills, including the ability to process, clean, and transform data.
• Excellent knowledge of the theory and application of statistical learning methods is required.
• Proficiency in R, Julia, and/or Python is a requirement.
• Good data visualization skills to represent findings and estimations in a graphic format are required.
• Excellent writing skills and the ability to draft reports and develop materials for documentation purposes are required.
• Familiarity with GitHub, SQL, and Azure is a plus.
• Willingness to experiment in data innovation and alternative data sources and push the boundaries in applying technical skills for development and humanitarian action.
• Excellent organizational skills and the ability to work flexibly, creatively, and within established deadlines are required.
• Excellent communication skills in explaining findings and techniques to technical and non-technical audiences.
• Excellent knowledge of English is required; working knowledge of another UN language is considered an asset.
• Knowledge of forced displacement and of the principles and concepts of international protection is a plus.
Location and Conditions
The successful candidate will be based in Copenhagen, Denmark.
Shortlisted candidates might be required to sit for a written test. Only shortlisted candidates will be notified. No late applications will be accepted.
This vacancy is open to applicants residing in Denmark as well as those resident in other countries. The remuneration level and the applicable entitlements and benefits may differ based on the residence of the most suitable selected candidate.
Please note that UNHCR does not charge a fee at any stage of its recruitment process (application, interview, meeting, travelling, processing, training or any other fees).
All UNHCR workforce members must, individually and collectively, contribute towards a working environment where each person feels safe and empowered to perform their duties. This includes demonstrating no tolerance for sexual exploitation and abuse, harassment including sexual harassment, sexism, gender inequality, discrimination, and abuse of power.
As individuals and as managers, all must be proactive in preventing and responding to inappropriate conduct, supporting ongoing dialogue on these matters, and speaking up and seeking guidance and support from relevant UNHCR resources when these issues arise.
Skills
Education: Bachelor of Arts (BA): Social Science (Required), Bachelor of Science (BS): Data Science (Required), Bachelor of Science (BS): Economics (Required), Bachelor of Science (BS): Mathematics (Required), Bachelor of Science (BS): Statistics (Required)
Certifications
Work Experience
Other information: This position doesn't require a functional clearance
Home-Based: No