Please Note: Our conference venue has now reached its full capacity but you can still register to attend remotely.
We are excited to invite you to LING4S: The first workshop on Linguistics and Graphs for Space! This event is devoted to promoting a dialogue and connecting professionals working with Natural Language Processing (NLP) and Knowledge Graphs (KG) within the space sector. This workshop will be held on the 13th of March at ESTEC, with hybrid options available.
This workshop is an opportunity for multiple organizations, national agencies, academia and industry representatives to come together to discuss how recent developments in computational linguistics and KGs can be used to tackle challenges in the sector, and to explore areas of research and potential applications.
Participants will have the opportunity to exchange ideas, advance knowledge and discuss topics such as, but not limited to:
The event will feature keynotes from experts in the field, short presentations on recent and ongoing research, and brainstorming sessions. Registration for LING4S is free of charge but is required for participation.
We encourage you to bring your ideas and questions to the workshop and participate in interactive discussions with your peers. This is a great opportunity to network with others who are interested in natural language processing, knowledge graphs and the space sector.
We hope to see you there!
The European Space Agency is well known as a powerful force for scientific discovery in numerous areas related to Space. The amount and depth of the knowledge produced throughout the different missions carried out by ESA, and their contribution to scientific progress, is enormous, involving large collections of documents such as scientific publications, feasibility studies, technical reports, and quality management procedures, among many others. Through initiatives like the Open Space Innovation Platform, ESA also acts as a hub for new ideas coming from the wider community across different challenges, contributing to a virtuous circle of scientific discovery and innovation. Handling such a wealth of information, a large part of which is unstructured text, is a colossal task that goes beyond human capabilities and hence requires automation. We present a methodological framework based on artificial intelligence and natural language processing and understanding to automatically extract information from Space documents and generate value from it, and we illustrate this framework through several case studies implemented across different functional areas of ESA, including Mission Design, Quality Assurance, Long-Term Data Preservation, and the Open Space Innovation Platform. In doing so, we demonstrate the value of these technologies in several tasks ranging from effortlessly searching and recommending Space information to automatically determining how innovative an idea can be, answering questions about Space, and generating quizzes regarding quality procedures. Each of these accomplishments represents a step forward in the application of increasingly intelligent AI systems in Space, from structuring and facilitating information access to intelligent systems capable of understanding and reasoning with such information.
In this presentation, we provide an overview of several activities led at the European Space Agency/ESTEC involving Natural Language Processing (NLP) and Knowledge Graph (KG) methods. The applications presented cover various fields: Knowledge Management (KM) of people's skills and of mission descriptions, trend identification from conference proceedings, the development of ontologies, and explainability in spacecraft operations. Although not exhaustive, this presentation aims to give a foretaste of the potential of applied NLP to support space projects.
Scientific publications in space science contain valuable and extensive information regarding the links and relationships between the data interpreted by the authors and the associated observational elements (e.g., instrument or experiment names, observing times, etc.). In this reality of scientific information overload, researchers are often overwhelmed by an enormous and continuously growing number of articles to access in their daily activities. The exploration of recent advances concerning specific topics, methods and techniques, the review and evaluation of research proposals, and in general any action that requires a cautious and comprehensive assessment of scientific literature have turned into extremely complex and time-consuming tasks.
The availability of Natural Language Processing (NLP) tools able to extract information from unstructured scientific text and turn it into highly organized and interconnected knowledge is fundamental to making use of scientific information. Exploiting the knowledge that exists in scientific publications requires state-of-the-art NLP. The semantic interpretation of scientific texts can support the development of a varied set of applications such as information retrieval from the texts, linking to existing knowledge repositories, topic classification, semi-automatic assessment of publications and research proposals, tracking of scientific and technological advances, scientific intelligence-assisted reporting, review writing, and question answering.
The main objectives of TACTICIAN are to introduce Artificial Intelligence (AI) techniques to the textual analysis of the publications of all ESA Space Science missions, to monitor and evaluate the scientific productivity of the science missions, and to integrate the scientific publications’ metadata into the ESA Space Science Archive. Through TACTICIAN, we extract lexical, syntactic, and semantic information from the scientific publications by applying NLP and Machine Learning (ML) algorithms and techniques. Utilizing the wealth of publications, we have created valuable scientific language resources, such as labeled datasets and word embeddings, which were used to train Deep Learning models that assist us in most of the language understanding tasks. In the context of TACTICIAN, we have devised methodologies and developed algorithms that can assign scientific publications to the Mars Express, Herschel, and Cluster ESA science missions and identify selected named entities and observations in these scientific publications. We also introduced a new unsupervised ML technique, based on Nonnegative Matrix Factorization (NMF), for classifying the Planck mission scientific publications into categories according to their use of the Planck data products.
These methodologies can be applied to any other mission. The combination of NLP and ML constitutes a general basis that has proved able to assist, with high accuracy, in establishing links between the missions’ observations and the scientific publications and in classifying the publications into categories.
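As a rough illustration of how NMF can group publications without labels, the sketch below factorizes a tiny publication-by-term count matrix with the standard multiplicative update rules and assigns each publication to its strongest component. It is not the TACTICIAN technique itself: the matrix, the term columns, the number of components and all function names are assumptions made for this example.

```python
import random


def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Factorize V (n x m) into nonnegative W (n x k) and H (k x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    initial_error = frobenius_error(v, w, h)
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return w, h, initial_error


# Hypothetical counts: rows are publications, columns are four made-up terms.
V = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 3, 2],
    [0, 0, 2, 3],
]

W, H, initial_error = nmf(V, k=2)
final_error = frobenius_error(V, W, H)
# Each publication is assigned to the component with the largest weight in W.
labels = [max(range(len(row)), key=lambda a: row[a]) for row in W]
```

The rows of `H` can be read as term profiles of each category, which is what makes NMF attractive when the categories need to be interpretable.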
This work has received funding from the European Space Agency under the "ArTificiAl intelligenCe To lInk publiCations wIth observAtioNs (TACTICIAN)" activity under ESA Contract No 4000128429/19/ES/JD.
Novel applications of chatbots that help users explore their knowledge capture needs, identify the scope of what they know, and then capture this knowledge do not yet exist. Similarly, on the side of learning from others, chatbot technology has the potential to accelerate learning by creating direct access to the knowledge sought by a user. To achieve this, the applications developed should make use of domain-specific terminology (as already exists in the space domain), as well as some prior input from space experts. This novel capability is particularly relevant to the space sector, a highly specialised niche arena where experts are in very high and constant demand (which in turn limits access to their knowledge). This knowledge scarcity is driving the need for a more rapid knowledge capture and transfer capability, which is highly sought after by space agencies and industry alike. To this aim, research and development activities are being undertaken at the European Space Agency in the area of Lessons Learned, and this work on developing an AI agent linked to knowledge graph and data mining capabilities has already provided some critical insights into the development of new capabilities created specifically to meet users’ needs in the space domain.
Knowledge Graphs are ontologically structured, scalable databases for information storage. They have been widely used in practical applications in various domains, with some successful applications in the space domain for requirements engineering management and space system design. Recent research effort in the Computational Linguistics community has been dedicated to developing Question/Answering models that do not work as a black box but are able to learn the meaning of a question from the ontology defined in the knowledge graph and, through a chain of programs learnt by the model, traverse the graph and provide an informed answer. These models have been demonstrated on a Q&A system for the space safety domain. The natural language query interface can provide answers from the knowledge stored in DISCOS (Database and Information System Characterising Objects in Space). DISCOS is a SQL database created and continuously maintained by the Space Debris Office at ESA. It contains information regarding launches, re-entry events and object registration details for more than 40,000 trackable objects. This project is part of an ongoing wider ESA ITT activity, led by VisionSpace and supported by the University of Strathclyde and the University of Edinburgh, titled "Intelligent Operations and preventative maintenance Assistant (IOA)", which aims at developing a semantic search engine for mission operation and space safety engineers.
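To make the idea of answering a question through a chain of programs more concrete, here is a minimal sketch in which a question is represented as a short sequence of graph operations executed one after another. In the work described above the chain is learnt by the model; here it is written by hand. The triples, the predicate names and the operation set are all hypothetical and do not reflect the real DISCOS schema or data.

```python
# Hypothetical triples (subject, predicate, object); not real DISCOS content.
TRIPLES = [
    ("sat_a", "type", "Payload"),
    ("sat_a", "launched_by", "launch_1"),
    ("sat_b", "type", "Payload"),
    ("sat_b", "launched_by", "launch_1"),
    ("stage_c", "type", "RocketBody"),
    ("stage_c", "launched_by", "launch_1"),
    ("sat_d", "type", "Payload"),
    ("sat_d", "launched_by", "launch_2"),
]


def find(_, name):
    """Start the chain from a single named entity."""
    return {name}


def relate_inverse(entities, predicate):
    """Follow a predicate backwards: subjects whose object is in `entities`."""
    return {s for s, p, o in TRIPLES if p == predicate and o in entities}


def filter_type(entities, type_name):
    """Keep only entities of the given type."""
    return {e for e in entities if (e, "type", type_name) in TRIPLES}


def count(entities, _=None):
    """Terminal operation: number of entities left."""
    return len(entities)


OPERATIONS = {
    "find": find,
    "relate_inverse": relate_inverse,
    "filter_type": filter_type,
    "count": count,
}


def run(program):
    """Execute a chain of (operation, argument) steps over the graph."""
    state = None
    for name, argument in program:
        state = OPERATIONS[name](state, argument)
    return state


# Hand-written chain for "How many payloads were launched by launch_1?"
program = [
    ("find", "launch_1"),
    ("relate_inverse", "launched_by"),
    ("filter_type", "Payload"),
    ("count", None),
]
answer = run(program)
```

Because every intermediate state is an explicit set of entities, each step of the answer can be inspected, which is the property that distinguishes this style of model from a black box.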
Copernicus data are growing tremendously in volume and quality. Copernicus services focus on professional users; one example is the Emergency Management Service, whose aim is to provide information on major events to rescue and security forces. In order to raise awareness of the Copernicus services among EU citizens, there is a need to provide added-value data to the media industry, making it a major demonstrator of the success of Copernicus to citizens.
In the field of journalism, the collection and processing of information from heterogeneous sources are difficult and time-consuming processes. In the context of the theory of journalism 3.0, where multimedia data can be extracted from different sources on the web, we explore the possibility of creating a tool that helps journalism professionals better exploit Earth Observation (EO) data, and especially images. With the production of massive volumes of EO image data, the problem arises of how media professionals can exploit these data and disseminate them to the public. In particular, exploiting satellite image data with existing tools is difficult for professionals who are not familiar with image processing. In this scope, an innovative platform called EarthPress, which automates some journalistic practices, will be presented. This platform includes several Natural Language Processing (NLP) mechanisms that allow users to detect early and receive information about breaking news occurring worldwide in real time, retrieve EO images upon request for a certain event, analyse the EO images using image processing methods in order to extract valuable data, and automatically generate a personalized article according to the writing style of the author. Through this platform, journalists or editors can also modify the generated article before publishing.
The EarthPress platform is directly related to the workshop, as both concern the utilization and accessibility of EO information, the dissemination of EO data, and the implementation of NLP tools for analysing these data.
Despite its potential, Earth observation (EO)-based information still remains difficult to access, mostly because of the technical requirements needed to convert the raw image data into actionable information (including the limited availability of vast labeled sets and the need for advanced machine learning skills). New ways to extract relevant information from images while bypassing those requirements are needed to unleash the full potential of EO for the benefit of various application fields, such as environmental monitoring, agriculture, urban planning, tourism, etc. Remote sensing visual question answering (RSVQA) was proposed with the aim of interfacing natural language and vision to ease access to the information contained in Earth Observation data for a wide audience, who can obtain it by asking simple questions in natural language.
Traditionally, the vision/language interface is an embedding obtained by fusing features from two deep models, one processing the image and another the question. Despite the success of early VQA models, it remains difficult to control the adequacy of the visual information extracted by the visual deep model, which should act as a context regularising the work of the language model. We propose to extract this context information with a visual model, convert it to text and inject it, i.e. prompt it, into a language model. The language model is therefore responsible for processing the question together with the visual context and for extracting features that are useful to find the answer. We study the effect of prompting with respect to a black-box visual extractor and discuss the importance of training a visual model that produces accurate context, i.e. an accurate image description.
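The step of converting visual context to text and prompting it into a language model can be sketched as follows. This is only an assumption about what such a pipeline might look like: the detection dictionary stands in for the output of a visual model, and the prompt template and function names are invented for the example, not taken from the work described.

```python
def describe_scene(detections):
    """Turn (hypothetical) visual-model outputs into a one-sentence description."""
    parts = [f"{count} {label}" for label, count in sorted(detections.items()) if count > 0]
    if not parts:
        return "The image contains no detected objects."
    return "The image contains " + ", ".join(parts) + "."


def build_prompt(context, question):
    """Inject the textual visual context ahead of the question."""
    return f"Context: {context}\nQuestion: {question}\nAnswer:"


# Made-up output of a visual model for one image tile.
detections = {"buildings": 12, "roads": 3, "water areas": 0}

context = describe_scene(detections)
prompt = build_prompt(context, "Are there more buildings than roads?")
```

The resulting string would then be passed to a language model. Since the model sees only this text, any error in the description propagates directly to the answer, which is why the accuracy of the visual context matters.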
Recent advances in satellite technology have led to regular, frequent, and high-resolution monitoring of Earth at global scale, providing an unprecedented amount of Earth observation (EO) data. The growing operational capability of global Earth monitoring from space provides a wealth of information on the state of our planet that waits to be mined for several different EO applications, e.g., climate change analysis, urban area studies, forestry applications, risk and damage assessment, water quality assessment, crop monitoring, etc. A growing number of cloud-based EO data access and processing resources (such as the five Copernicus Data and Information Access Services (DIAS) platforms) have become available. These platforms allow users (e.g., EO application and service developers, space agencies, space industry, the science community and the general public) to search for the satellite images required for the EO applications of interest by using keywords/tags describing the sensor type, geographical location and data acquisition time of the satellite images stored in the archives. Thus, a growing need has appeared for accurate and scalable techniques for satellite EO image understanding, search and retrieval from massive archives (e.g., Copernicus archives). However, in the era of big data, the semantic content of the satellite data is much more relevant than the keywords/tags. To keep up with the growing need for automation, image search engines that extract and exploit the content of the satellite images are necessary.
In other words, there is an emerging need to go beyond the traditional query of EO data catalogues based on technical image metadata (location, time of acquisition, technical parameters), and to enrich the semantic content of image catalogues, enabling a brand new class of query possibilities powered by the combination of Natural Language Processing (NLP) to understand the query and to describe the content of the data, and Computer Vision (CV) to massively annotate data and implement multi-modal text-to-image and image-to-image searches.
In this context, the ESA-funded Digital Assistant for Digital Twin Earth project aims at developing a precursor of a Digital Assistant able to address those needs through cutting-edge NLP and CV technologies applied to EO. Through the Digital Assistant, the user will be able to search satellite images using natural language (either describing semantic features of the image or giving information about location/time) or search for semantically similar images. Furthermore, using NLP, the user will also be able to ask direct questions about images or ask for feature extraction, in addition to tasking a future acquisition. In this way, the Digital Assistant will simplify satellite data search and retrieval, and open the use of satellite data to non-expert users who could benefit from it. It will also enable every type of user to extract additional value from the satellite data through AI models that extract features of interest in response to a simple natural language query. Finally, the Digital Assistant will have a conversational AI able to sustain a conversation with the user in order to simplify its use even further.
This article presents a solution for space industry engineers: an AI-powered virtual assistant. With Natural Language Processing (NLP) and knowledge graphs, the virtual assistant can provide answers with meaningful insights to both technical and non-technical questions by searching through handbooks and standards, increasing efficiency and improving knowledge transfer accuracy. The model is trained using sample data sets from the European Space Agency (ESA) through the development of data scraping scripts. Once its quality has been evaluated and approved by engineers, the model can be further trained with non-public data for local use on Ethernet servers. The article details the applications of NLP and knowledge graphs in the space industry, including knowledge management, system design support, requirement specification, and troubleshooting. It also discusses the challenges and opportunities of using NLP and knowledge graphs in the space industry, such as handling multi-lingual and technical language, integrating disparate data sources, and managing uncertainties and errors. Finally, the article concludes with information on the proposed system architecture and real-life use cases from a demo version of the model, trained on ESA sample documents with real-life problems.
Artificial Intelligence for Software Engineering (AI4SE) examines how AI may improve the software-intensive systems engineering lifecycle, which encompasses several artifacts spanning all phases of continuous integration and continuous development. As these phases involve heterogeneous data sources and added levels of complexity according to the use case, traceability plays a vital role in ensuring the completeness of the software development process. Over time, these linked artifacts are stored in system modellers and relational databases, which ultimately increases the end-user effort in terms of heterogeneity, querying, quality assurance, time and storage complexity.
However, recent advances in graph-based technologies, Natural Language Processing (NLP) and Machine Learning (ML) help to overcome these limitations. Traceability graphs are heterogeneous dynamic graphs stored in graph databases as they evolve throughout the project lifecycle. ML and NLP technologies foster research in building and analysing these connected traceability graphs. With comprehensive research on domain ontologies and system development activities in AI, algorithms can learn from the structure of large-scale graphs and take advantage of the “relationship-first” nature of graph schemas. This makes it possible to vectorize nodes, edges and even whole graphs in a low-dimensional space for feature learning; the resulting vectors are called “embeddings”. This automated way of learning intricate features from the topology of graphs aids tasks such as node/edge reconstruction, pattern discovery and node/edge classification. Further, storing them as embeddings reduces space and time complexity for stakeholders.
An MBSE-based use case on the graph-based benchmark traceability dataset mAquaLush is explored. A framework is implemented for building and analysing an ontology-driven knowledge graph as part of a Master’s thesis titled “Conceptualization of a framework for AI-based software graph completion from heterogeneous data sources”. In experiments on the graph completion problem, i.e. proposing a missing link between two artifacts or an anticipated link, feature learning-based walk methods (DeepWalk, RandomWalk and Meta2Vec) collectively proved to perform efficiently.
Keywords: Traceability graphs, Ontologies, Feature engineering, Feature learning, Graph completion, Knowledge graphs
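Walk-based feature learning methods such as DeepWalk start by sampling random walks over the graph and then treat each walk like a sentence for an embedding model. The sketch below shows only that first step on a toy traceability graph. The artifact identifiers and edges are hypothetical and are not taken from mAquaLush, and the embedding training itself is omitted.

```python
import random

# Hypothetical traceability links between requirements, design, code and tests.
EDGES = [
    ("REQ-1", "DESIGN-1"),
    ("REQ-2", "DESIGN-1"),
    ("DESIGN-1", "CODE-1"),
    ("CODE-1", "TEST-1"),
    ("REQ-2", "TEST-2"),
]


def build_adjacency(edges):
    """Build an undirected adjacency list from a list of edges."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    return adjacency


def random_walks(adjacency, walk_length=5, walks_per_node=2, seed=0):
    """Sample uniform random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in sorted(adjacency):
            walk = [start]
            while len(walk) < walk_length:
                # Move to a uniformly chosen neighbour of the current node.
                walk.append(rng.choice(adjacency[walk[-1]]))
            walks.append(walk)
    return walks


adjacency = build_adjacency(EDGES)
walks = random_walks(adjacency)
```

In a DeepWalk-style pipeline these walks would be fed to a skip-gram model so that artifacts appearing in similar walk contexts receive similar vectors; candidate links can then be ranked by the similarity of their endpoint embeddings.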
In the past two years generative AI models have seen an explosion in popularity with many models being published [1]. Text-to-Text models such as ChatGPT3 [2], LaMDA [3], and PEER [4] as well as Text-to-Code models such as Codex [5] and AlphaCode [6] have demonstrated abilities to propose high quality solutions to technical tasks. Applications of these technologies to support space systems engineering have been clearly identified [7] and domain-specific language models have been published [8].
This presentation seeks to build on this work by proposing practical applications of Large Language Models (LLMs) to assist MBSE activities. Building and maintaining consistent and complete models for complex space systems is a time-consuming task for engineers, especially where system complexity is high and requirements number in the hundreds to thousands. LLMs offer a potential avenue to reduce both the manual work involved in modelling complex systems and the risk of modelling error. Such applications include, but are not limited to, generating initial model diagrams (e.g., operational capabilities diagrams) from natural language descriptions, verifying system model adherence to requirements and design best practice, and ensuring model completeness by identifying any missing information (e.g., data links between sub-systems).
References
[1] R. Gozalo-Brizuela and E. C. Garrido-Merchan, “ChatGPT is not all you need. A State of the Art Review of large Generative AI models,” Jan. 2023, doi: 10.48550/arxiv.2301.04655.
[2] OpenAI, “ChatGPT.” https://chat.openai.com/ (accessed Jan. 19, 2023).
[3] R. Thoppilan et al., “LaMDA: Language Models for Dialog Applications,” Jan. 2022, doi: 10.48550/arxiv.2201.08239.
[4] T. Schick et al., “PEER: A Collaborative Language Model,” Aug. 2022, doi: 10.48550/arxiv.2208.11663.
[5] OpenAI, “OpenAI Codex,” Aug. 10, 2021. https://openai.com/blog/openai-codex/ (accessed Jan. 19, 2023).
[6] Y. Li et al., “Competition-Level Code Generation with AlphaCode,” Feb. 2022, doi: 10.1126/science.abq1158.
[7] G. Garcia, G. Pruvost, S. Valera, L. Mansilla, and A. Berquand, “Artificial Intelligence and Natural Language Processing to Support Space Systems Engineering,” in Model Based Space Systems and Software Engineering, 2022.
[8] A. Berquand, P. Darm, and A. Riccardi, “Spacetransformers: Language modeling for space systems,” IEEE Access, vol. 9, pp. 133111–133122, 2021, doi: 10.1109/ACCESS.2021.3115659.
Space systems often have commonalities, and they might share comparable assumptions and requirements across projects. The assessment of new missions can be supported by model-based approaches, making use of new methods and tools so that consistency across models and at different phases is maintained. In this context, a Digital Assistant accessible to engineers can be used to ensure quality, accuracy, completeness, and correctness. The AI-Powered Digital Assistant for Space System Engineering TDE activity aims to develop a solution to identify common concepts from different models and propose suggestions for new designs, speeding up the mission and spacecraft definition phases and avoiding the repetition of previous mistakes.
Three different use cases have been selected for this activity. They are mainly focused on learning from a previous set of models and requirement specifications to identify potential links between requirements and from requirements to model artefacts, as well as to detect incompleteness or inconsistencies in the models. The application of the latest Natural Language Processing (NLP) techniques, fine-tuned for the space domain, is being investigated to track, analyse and determine the relationships among requirements and other Model Based System Engineering (MBSE) artefacts. After the development of the Digital Assistant, the solution will be integrated with the MBSE Hub, and conceptual interoperability with the Space System Ontology (SSO) will also be ensured.
Risk is inherent in human spaceflight. The Human System Risks are a special category of risks that the National Aeronautics and Space Administration (NASA), as an Agency, has to contend with when engaging with the challenges of human spaceflight. While programmatic and institutional safety risks are often tied to a specific program or activity, Human System Risks are designed to inform NASA Technical Standards and to protect human crews independent of any specific spaceflight program. Risk management in the context of human system risks can be viewed as a trade-based system where the relevant evidence in life sciences, medicine, and engineering is tracked and evaluated to identify ways to minimize overall risk to the astronauts and to ensure mission success. NASA’s Human System Risk Board (HSRB) manages the process by which scientific evidence is utilized to establish and reassess the postures of the various risks to the human system during all existing or anticipated crewed missions. The HSRB uses Directed Acyclic Graphs (DAGs), a type of causal diagramming, as visual tools to create a shared understanding of the risks, improve communication among stakeholders, and enable the creation of a composite risk network that is vetted by members of the NASA community. The knowledge captured is the Human Health and Performance community’s knowledge about the causal flow of a human system risk and the relationships that exist between the contributing factors to that risk.
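A causal diagram of this kind can be represented as a mapping from each node to its direct causes, which makes it straightforward to check that the graph is acyclic and to list nodes in causal order. The following is a minimal sketch using Python's standard `graphlib` module (Python 3.9+). The node names are generic placeholders invented for the example and are not taken from any HSRB risk.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical causal diagram: each node maps to the list of its direct causes.
CAUSES = {
    "hazard": [],
    "contributing factor": ["hazard"],
    "health outcome": ["contributing factor"],
    "mission impact": ["health outcome"],
}


def causal_order(causes):
    """Return the nodes ordered so that every cause precedes its effects."""
    return list(TopologicalSorter(causes).static_order())


def is_acyclic(causes):
    """Check that the diagram is a valid DAG (contains no causal loops)."""
    try:
        causal_order(causes)
        return True
    except CycleError:
        return False


order = causal_order(CAUSES)
```

Merging several such mappings into one dictionary and re-running `is_acyclic` is one simple way to check that a composite network built from individual risk diagrams remains a DAG.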