Master in data science and Master in data science [PDF]

MASTER IN DATA SCIENCE AND ENGINEERING
MASTER IN DATA SCIENCE
Learning Outcomes and Organization
Academic year 2017-2018

1. Introduction

The present document describes the learning outcomes and the course organization of the new Master programmes in Data Science and in Data Science and Engineering proposed by the School of Engineering and the Montefiore Institute of the University of Liège. A number of newly created courses appear in the programme. They are not yet fully documented in the ‘ULg progcours’ database, and their outline is therefore included at the end of this document. For further information, please contact Prof. Louis WEHENKEL ([email protected]) or Prof. Guy LEDUC ([email protected]).

2. Learning outcomes

The programme is “full English”, i.e. English is the only language used and required in this programme. It aims at developing the following learning outcomes.

Outcome 1: Mastering the scientific fundamentals of Data Science

Data Science relies essentially on applied mathematics (probability, statistics, optimisation), on computer science (algorithms, data structures, automata, computational complexity), and on artificial intelligence (machine learning, knowledge representation, automatic reasoning). In order to acquire lasting expertise and be able to exploit future techniques, it is paramount to master these scientific fundamentals.

Outcome 2: Being able to use computational tools

The purpose of Data Science is to extract synthetic and usable knowledge by exploiting real-world data. These data are often of heterogeneous quality, come in high volumes, and typically take very diverse forms (text, numbers, images, time-series). The nature of the knowledge to be extracted from the data may also take various forms (predictive models of behaviour, clusters of homogeneous behaviours, subsets of relevant variables). The available tools that may be used to extract knowledge from data include machine learning and optimization toolboxes, programming languages and paradigms, and massive data storage and processing systems. The practice of Data Science requires an excellent knowledge of the strengths and weaknesses of these tools and experience in deploying them to develop practical solutions.

Outcome 3: Being able to develop an effective Big Data solution in a real environment

A Big Data solution is developed in several stages, including the definition of the target knowledge to be extracted, the choice of the particular data streams to exploit, prototyping of the data processing pipeline, the data collection per se, testing and optimizing the pipeline, presenting the results in a suitable form, and the design of a life-cycle management approach to ensure the sustainability of the proposed solution. In order to make sure that the solution fits the end-user needs, and can be deployed in the target operational environment (research lab, industry, administration, etc.), it is essential to involve the end-users and the management team of the client both at the design and implementation stages, in order to fully understand the nature of the data and of the field constraints. It is therefore necessary to master the principles of Big Data project management, and be able to establish a dialog with the field experts and the IT department of the client, in order to make the right technical choices during the project development.

Outcome 4: Being able to carry out a cost-benefit analysis

In order to help companies make the right choices in terms of leveraging data science, it is necessary to be able to carry out a cost-benefit analysis of a big data project, covering both the initial stages of the project and the longer-term strategy. The Data Scientist therefore has to be equipped with a methodology enabling him to carry out such cost-benefit analyses based on the information provided by a company and in close collaboration with the strategic management of the company.

Outcome 5: Understanding the legal and societal implications

The use of Data Science solutions may lead to important changes in terms of workload in the companies and/or to exploiting information about people and their actions (workers, clients, general public). In order to be acceptable, these solutions must be in line with the legal and ethical rules of society and of the companies. The Data Scientist must be aware and respectful of the legal and societal implications of the projects he engages in.

3. Programme curriculum

The programme is organized in two blocs of 60 ECTS credits, each corresponding to a year of study. The first bloc aims at mastering the fundamentals (scientific and technological) of Data Science, its problem-solving methods and the enabling technologies, and their application in the context of a « Big Data Project ». The second bloc includes a Master thesis, and various courses broadening the student’s outlook and/or allowing her to specialize.

NB: Q1 means that the course is organized in the first term (September - December 2017); Q2 means that it is organized in the second term (February - May 2018).

BLOC 1

Computer Science, Applied Mathematics and Data Science fundamentals (20 credits):

| Course ID | Topic | Credits | Term |
|---|---|---|---|
| INFO0016 | Introduction to the theory of computation | 5 | Q1 |
| MATH0461 | Introduction to numerical optimization | 5 | Q1 |
| INFO8006 | Introduction to artificial intelligence | 5 | Q1 |
| ELEN0062 | Introduction to machine learning | 5 | Q1 |

Professional focus in data science (30 credits):

| Course ID | Topic | Credits | Term |
|---|---|---|---|
| MATH2021 | High-dimensional data analysis | 3 | Q1 |
| INFO8002 | Large-scale database systems | 5 | Q1 |
| PROJ0016 | Big data project | 7 | Q1&Q2 |
| ELEN0060 | Information and coding theory | 5 | Q2 |
| INFO8004 | Advanced machine learning | 5 | Q2 |
| INFO8005 | Semantic Data | 5 | Q2 |

Elective courses (choose 10 credits in the following list):

| Course ID | Topic | Credits | Term |
|---|---|---|---|
| INFO8003 | Optimal decision making for complex problems | 5 | Q2 |
| INFO0027 | Programming techniques | 5 | Q2 |
| INFO0049 | Knowledge representation | 5 | Q2 |
| INFO0010 | Introduction to computer networks | 5 | Q2 |
| INFO0045 | Introduction to computer security | 5 | Q2 |

BLOC 2

Master thesis and internship (30 credits)

Management and legal issues (10 credits):

| Course ID | Topic | Credits | Term |
|---|---|---|---|
| GEST3162 | Principles of management | 5 | Q1 |
| DROI8031-n | Law of Artificial Intelligence, Robots and Data-Driven Algorithmic Applications | 5 | Q1 |

Note: Students who have already acquired the skills and knowledge of GEST3162 (or equivalent) will replace it by GEST3032 (see electives below).

Elective courses (choose 20 credits in the following topics):

Elective courses in computer science

| Course ID | Topic | Credits | Term |
|---|---|---|---|
| INFO0027 | Programming techniques | 5 | Q2 |
| INFO0049 | Knowledge representation | 5 | Q2 |
| INFO2049 | Web and text analytics | 5 | Q1 |
| ELEN0016 | Computer vision | 5 | Q1 |
| INFO0939 | High performance scientific computing | 5 | Q1 |
| INFO0948 | Intelligent robotics | 5 | Q2 |
| INFO0010 | Introduction to computer networks | 5 | Q2 |
| INFO0045 | Introduction to computer security | 5 | Q2 |

Elective courses in applied mathematics

| Course ID | Topic | Credits | Term |
|---|---|---|---|
| MATH2022 | Large sample analysis: theory and practice | 5 | Q1 |
| MATH0462 | Discrete optimization | 5 | Q1 |
| MQGE0002 | Computational optimization | 5 | Q2 |

Elective courses in bioinformatics

| Course ID | Topic | Credits | Term |
|---|---|---|---|
| GBIO0002 | Genetics and bioinformatics | 5 | Q1 |
| GBIO0009 | Topics in bioinformatics | 5 | Q1 |
| GBIO0030 | Computational approaches to statistical genetics | 5 | Q2 |

Elective course in management

| Course ID | Topic | Credits | Term |
|---|---|---|---|
| GEST3032 | eBusiness and eCommerce | 5 | Q1 |

Miscellaneous

| Course ID | Topic | Credits | Term |
|---|---|---|---|
| STAT0079 | SAS Certification | 5 | Q2 |
| INGE0012 | Scientific research in engineering and its impact on innovation | 5 | Q2 |

With the agreement of the President of the Jury, students may also choose:
- Up to 15 credits in the application area of their Master thesis in other programmes of the university,
- 5 credits in any other programme of the university.

4. Description of new courses included in the programme

In this section, we provide some information about the new courses that appear in the programme but are not yet fully documented on the “ULg progcours” website.

Introduction to artificial intelligence (5 ECTS, Th 25h, Pr 10h, Proj 45h)

The course aims at giving a perspective both on the AI research goals and on the techniques developed over the years in order to build intelligent agents. It will be based on several chapters of the textbook “AI: a modern approach” (by S. Russell and P. Norvig), used worldwide since 1995 to teach the essentials of AI. Many of the specialized parts (e.g. first-order logic, machine learning, optimization and control, games, computer vision, robotics) treated in the reference textbook are already covered at a deeper level in companion courses offered in our programs. Therefore, the present course will not address these topics in detail. Rather, they will be discussed by providing links with the other courses of the curriculum that cover them in more detail.

Topics to be covered:
a. The overall goal of AI
- AI challenge: the foundations, history and state-of-the-art
- Intelligent agents: modelling ‘rational behaviour’ in a ‘complex environment’
b. Problem solving
- Basic search methods for single agent problem-solving over a known environment
- Discussion on the need for handling complex and partially unknown environments and adverse agents
c. Reasoning and planning
- Agent reasoning based on propositional theorem proving
- Discussion on the need for using higher-order logics
- Classical planning: state-space search and planning graphs
- Discussion on multi-agent problem solving and knowledge representation
d. Managing uncertainties and learning
- Discussion on inference and decision making under uncertainties
- Discussion on the need for learning and overview of various learning paradigms
e. Communicating, perceiving and acting
- Natural language processing
- Discussion on perception and robotics
f. Philosophical foundations and future of AI
g. Possible practical projects:
- Implementing A* for a problem of interest
- Implementing a propositional logic theorem prover

High-dimensional data analysis (3 ECTS, Th 15h, Labo 10h, Proj 15h)

In this course, the focus is on exploratory techniques for high-dimensional data. First, two dimension reduction techniques based on projections will be considered:
- Principal components analysis, which constructs an optimal subspace using the correlation structure in the data, and
- Factorial discriminant analysis, which searches for subspaces in which different sub-groups of data are most discriminated.
Then, automatic classification algorithms will be developed. These rely on the definition of distances (or dissimilarities) and follow aggregation methods based on different criteria (closest or farthest neighbours, within-group and between-group inertia, ...). Kernel smoothing procedures will also be introduced in this course, both in the density estimation context and in a regression framework. Finally, penalized techniques allowing the handling of flat data (data with more dimensions than subjects) will be discussed, both in the multivariate estimation setting and in the regression setting (lasso and ridge regression).

Large-scale database systems (5 ECTS, Th 25h, Pr 10h, Proj 45h)

This course studies the architecture, design, and implementation of large-scale database systems. It addresses fundamental concepts of distributed database theory, including design and architecture, security, integrity, query processing and optimization, transaction management, concurrency control, and fault tolerance. It then applies these concepts to large-scale blockchain, data warehouse and cloud computing systems. Cloud computing topics include MapReduce and large-scale cloud databases.
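
As an illustration of the MapReduce model mentioned above, the following is a minimal single-machine sketch in Python. It is not course material: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, and a real framework would distribute the map and reduce steps over many machines rather than run them in one process.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word of one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run the three phases in sequence (hypothetical helper, one process)."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    docs = ["big data big systems", "data systems at scale"]
    print(word_count(docs))
```

Because each call to `map_phase` depends on a single document and each reduction depends on a single key, both steps could in principle run in parallel on separate workers, which is the property that makes the model attractive for large-scale data.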

Big data project (7 ECTS, Th 10h, Pr 0, Proj 180h)

The purpose of this course/project is for the students to apply knowledge acquired in the Data Science and Engineering program to a project involving actual data in a realistic setting. During the project, the students will engage in the entire process of carrying out a real-world data science project: formalizing the problem, collecting and processing data, applying appropriate analytical methods and algorithms, and deploying a solution. The course will offer a number of seminars given by industry experts and covering specific topics relevant for big data solutions: large-scale data storage systems, distributed computing frameworks, data science software libraries, and specialized machine learning and statistics topics. The students will work in groups to carry out a practical project over a big dataset, aiming at using the available software and hardware systems for retrieving a specific kind of information from the dataset. The project will be carried out within modern distributed computing and storage environments.

Advanced machine learning (5 ECTS, Th 30h, Pr 5h, Proj 45h)

This course is complementary to ELEN0062 and can be followed independently of the latter. With respect to ELEN0062, the aim of this course is to provide a deeper and more theoretical coverage of supervised learning techniques. The course will formalize the problem of statistical learning and present the main theoretical tools in the domain (bias-variance trade-off, empirical risk minimization, Bayesian approaches). The main families of supervised learning algorithms will be covered, with an emphasis on modern techniques (kernel methods, ensemble methods, deep learning, Gaussian processes, sparse linear models). Implementation issues and scalability of the algorithms will be discussed. The last part of the course will be devoted to a selection of more advanced techniques to deal with structured input and output spaces (rankings, texts, images, graphs, ...) and non-standard learning protocols (semi-supervised learning, transfer learning, ...).

At the end of the class, the students will be able to understand the state of the art in the domain. They will be able to implement, combine, or extend existing algorithms to address complex supervised learning tasks. The course will also aim at providing the necessary background to carry out research in the domain. Depending on the topic, ex-cathedra lectures will be supplemented or replaced by discussions with the students around key papers in the field or by research seminars given by external speakers. Personal student projects will consist either in the implementation and evaluation of advanced algorithms or in critical reading of scientific papers on specific subtopics, depending on the interest and background of the student.
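
For readers unfamiliar with the bias-variance trade-off listed among the theoretical tools, its standard statement for squared loss can be written as follows. The notation is an assumption made for this illustration and does not come from the course material: the output is $y = f(x) + \varepsilon$ with $E[\varepsilon] = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$, and $\hat{f}(x; D)$ is the model learned from a random training sample $D$ drawn independently of $\varepsilon$.

```latex
E_{D,\varepsilon}\!\left[ \big( y - \hat{f}(x; D) \big)^2 \right]
  = \underbrace{\big( f(x) - E_D[\hat{f}(x; D)] \big)^2}_{\text{squared bias}}
  + \underbrace{E_D\!\left[ \big( \hat{f}(x; D) - E_D[\hat{f}(x; D)] \big)^2 \right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

More flexible model families typically lower the bias term while raising the variance term, which is why the two have to be traded off against each other.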

Optimal decision making for complex problems (5 ECTS, Th 25h, Pr 10h, Proj 45h)

There are numerous decision-making problems that can be formalised as problems for which one needs to maximize a numerical reward (or equivalently minimize a cost) when interacting with an environment which is stochastic or (partially) unknown, exhibits little structure (e.g., it is not linear/convex), has a sequential nature (e.g., a sequence of decisions needs to be taken to reach an objective) and/or is adversarial (e.g., an opponent takes its decisions so as to minimize your payoff). Typical examples of such problems are:
- The design of artificial intelligences able to learn to play computer games,
- The placement of advertisements on webpages to maximize the number of clicks,
- Controlling a rocket so as to safely reach a target with minimum fuel costs,
- The synthesis of winning strategies for playing with the stock market,
- The design of artificial intelligences for autonomous robots,
- The design of clinical experiments.

The goal of this class is to teach the techniques for taking optimal decisions for such complex problems. These techniques will borrow from results from system theory, probability theory, information theory, supervised learning as well as linear and convex optimisation.

Material covered:
a. Optimal control theory for interacting with linear systems whose dynamics are fully known. Extension of the results to robust control.
b. Multistage stochastic programming for interacting with systems whose dynamics are linear, fully known but stochastic.
c. Computation of optimal strategies in discrete and stochastic environments that are perfectly known. Review of dynamic programming and direct policy search techniques.
d. Learning to interact with discrete and stochastic environments that are unknown at the beginning of the interaction. Review of model-learning and model-free reinforcement learning techniques. Review of techniques for solving the exploration/exploitation trade-off.
e. Extension of the techniques seen in (c) and (d) to environments having very large and/or continuous action spaces.
f. Learning in environments that are partially observable.
g. Tree search techniques in single-player environments.
h. Tree search techniques in multi-player environments.

Semantic data (5 ECTS, Th 25h, Pr 10h, Proj 45h)

The course will first cover the conceptual foundations of the representation of semantic knowledge and its use in inference, in order to provide a strong theoretical basis for the remaining content. Semantic networks and ontologies will be presented, and the historical difficulties of reasoning with semantic networks explained. Description logics will be introduced as a theoretical basis for ontology-based reasoning, with appropriate formal semantics and inference algorithms, and their relationship with first-order logic explained. The course will then show how these concepts are reused by the semantic web initiative, and present the semantic web standards (description framework, ontologies, query language, rule language). The link between description logics and the Web Ontology Language (OWL) will be further developed. Finally, the course will illustrate how semantic data are used in modern industrial areas, such as big data, software engineering, and specific business domains (biomedicine, web design and search, document publishing, ...).
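
To give a flavour of what inference over semantic data means, here is a toy Python sketch that stores facts as (subject, predicate, object) triples and derives new ones with two simple rules. Everything in it is an assumption made for illustration: the predicate names `type` and `subClassOf` only loosely mirror semantic web vocabulary, the functions `infer` and `query` are hypothetical, and the code is far from a description-logic or OWL reasoner.

```python
def infer(triples):
    """Apply two subclass rules repeatedly until no new triple appears."""
    facts = set(triples)
    while True:
        derived = set()
        for (s, p, o) in facts:
            for (s2, p2, o2) in facts:
                if p2 != "subClassOf" or o != s2:
                    continue
                # Rule 1: subClassOf is transitive.
                if p == "subClassOf":
                    derived.add((s, "subClassOf", o2))
                # Rule 2: an instance of a class is an instance of its superclasses.
                if p == "type":
                    derived.add((s, "type", o2))
        if derived <= facts:
            return facts
        facts |= derived


def query(facts, s=None, p=None, o=None):
    """Return the sorted triples matching a pattern; None is a wildcard."""
    return sorted(
        t for t in facts
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


if __name__ == "__main__":
    kb = {
        ("Student", "subClassOf", "Person"),
        ("Person", "subClassOf", "Agent"),
        ("alice", "type", "Student"),
    }
    for triple in query(infer(kb), s="alice", p="type"):
        print(triple)
```

Only the fact that `alice` is a `Student` is stated explicitly; her membership in `Person` and `Agent` is derived from the class hierarchy, which is the kind of conclusion an ontology-based reasoner is expected to draw.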

Law of artificial intelligence, robots and data-driven algorithmic applications (Th 20h)

This course is an extended version of DROI8031 « Law of robots and AI ». DROI8031 discusses the legal questions related to the regulation of artificial intelligence (AI), a matter made pressing by technological development and by the prospect of AI-based services reaching the market in the medium term. Amongst the numerous examples that may illustrate this trend, the most emblematic is probably the driverless autonomous car developed by Google. The development of AI raises profound theoretical questions: the advisability of regulation in a context of technological innovation, the level of regulation (international / local), the type of control (self-regulation / binding regulation), etc. It also raises practical ones: rights of AI, AI liability, intellectual property of AI, AI uses for non-commercial purposes, etc. This brand-new course provides an overview of the legal issues raised by the emergence of AI and robots. It will also address the legal aspects of « Big data ». The ex-cathedra lectures will be complemented by lectures to be prepared by students.

Large sample analysis: theory and practice (Th 24h, Pr 12h, Proj 40h)

We first provide an overview of the basic material concerning large sample analysis (including a refresher on convergence modes and all the main approximation theorems). We then devote a chapter to the theory of fixed-n asymptotics (including Berry-Esseen theorems and Stein’s method) and their applications in sample analysis and goodness-of-fit testing. Particular focus will be devoted to inference and model evaluation of highly complex/intractable probabilistic models such as those arising from Markov chain Monte Carlo methods, probabilistic graphical models and deep learning models.
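
As a reminder of the kind of result covered, the classical Berry-Esseen theorem bounds the error of the normal approximation at a fixed sample size $n$. The notation below is chosen for this illustration and is not taken from the course material: $X_1, \dots, X_n$ are independent and identically distributed with $E[X_i] = 0$, $\mathrm{Var}(X_i) = \sigma^2 > 0$ and $\rho = E|X_i|^3 < \infty$, and $\Phi$ is the standard normal distribution function.

```latex
\sup_{x \in \mathbb{R}}
  \left| P\!\left( \frac{X_1 + \dots + X_n}{\sigma \sqrt{n}} \le x \right) - \Phi(x) \right|
  \le \frac{C \, \rho}{\sigma^{3} \sqrt{n}}
```

Here $C$ is an absolute constant, known to be smaller than 0.5. Unlike the central limit theorem, which only describes the limit as $n$ grows, the bound holds for every $n$, which is why it is useful for finite samples.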
