Citizen Innovators: The Catalysts Design Arena


Would you like to go on a chemistry treasure hunt using informatics technology?

We are holding a competition to find a catalyst with good performance. The award will go to participants who use informatics methods and approaches to discover a new high-performance catalyst for a reaction in which existing chemical approaches have not produced a breakthrough for a long time. Those with excellent results will be invited to a presentation event to receive an award, and their travel expenses will be supported.

Background

Ethylene is one of the most important compounds in industry. Via intermediates such as vinyl chloride, acrylonitrile, and vinyl acetate, it is used in a wide range of products, including synthetic fibers, vinyl, insulation materials, agricultural materials, building materials, water pipes, and packaging materials. Oxidative coupling of methane (OCM) is a known chemical reaction that produces ethylene from natural gas, which consists mainly of methane. However, in OCM, much of the methane is burned during the reaction, yielding carbon monoxide and carbon dioxide rather than the desired product. Therefore, in order to produce more of the desired compound, catalysts that activate the reaction have been studied for many years. However, many aspects of the reaction mechanism of these catalysts have still not been elucidated, and although catalyst candidates are derived from the insight and experience of expert chemists, the search is essentially manual trial and error.

On the other hand, the recent development of AI technology has accelerated the search for highly functional new substances and material candidates using information technology. Steering committee member Takahashi (Professor, Hokkaido University) has conducted a search for catalyst candidates using machine learning. By publishing the knowledge and data obtained through this research and holding a competition, we hope to attract a variety of ideas and methodologies to this field. We hope that this will lead to completely new ways of thinking that differ from those chemists have used up until now.

A project like this could be considered a new form of science, a new form of citizen science that has been attracting attention in recent years. All participants in the competition will take on the challenge of discovering new catalysts using their own ideas and methodologies, and by observing the methods of other participants, they can learn about ideas and approaches outside of their own and their effects. We believe this will be a great opportunity to understand and utilize information technology from multiple perspectives.

Tasks for Participants

Submitting to this competition is simple: submit catalyst candidates that are likely to be excellent new catalysts, together with a paper describing how the candidates were found. A catalyst candidate is a combination of three elements and one support (written as M1-M2-M3/Support), and the available elements and supports are listed below. We welcome any method for deriving catalyst candidates based on ideas from information technology and informatics, such as machine learning. Any informatics method and any computer can be used, or you can calculate by hand. You can use the experimental data collected from past papers, and you may also use data other than the prepared data.

In this competition, the organizing committee will conduct evaluation experiments on the submitted catalyst candidates to measure their yield of compounds with two linked carbon atoms, such as ethane and ethylene (the C2 yield). Participants are then ranked in descending order of yield, and the winners are determined and awarded. The detailed evaluation experiment setup is presented below. In actual catalysts, the mixing ratio of the elements and the experimental environment can be adjusted, but for this competition, for reasons of experimental efficiency, the mixing ratio is fixed at 1:1:1 for all submissions. The experimental environment is also fixed and is described below.

The catalyst currently considered to have the best yield is a combination of the three elements Na, Mn, and W with the support SiO2, which achieved a yield of 21% in the experimental environment of the organizing committee (note, however, that the combination ratio was 2:2:1). The goal of this competition is to find a catalyst candidate with a yield exceeding 21% under the same reaction evaluation conditions, but even if no candidate reaches this goal, the top few catalyst candidates with the highest yields among those submitted will be awarded. However, combinations included in the prepared data or in published papers are invalid.

Differences from Popular Machine Learning Competitions

In general machine learning competitions, the task is to accurately predict a target value based on data. However, the data in this task contains uncertainty for the following reasons, so it is unclear whether there is really any meaning in correctly predicting the yield of this data.

  • Even with the same catalyst composition, the yield may change if the experimental environment is different (yields reported differ depending on the paper even with the same catalyst composition)
  • Even under the same experimental environment, the yield may vary from experiment to experiment (average values are not necessarily obtained in each experiment)
  • Data contains noise and errors due to human error (including extreme numbers such as 150% selectivity)

The purpose of this competition is not to predict the yield correctly, but only to evaluate whether a promising catalyst candidate can be found, regardless of the method used. This can be done by removing outliers from the data, ignoring certain values, or even not using the provided data at all. This is what it is like to try to find new chemical substances from data with a data science approach. Since the data acquisition conditions are not uniform and the level of detail reported varies from paper to paper, the data is not very reliable for making accurate predictions. What is more, this time we are not trying to predict the yield of a catalyst that has already been tested, but rather testing unknown catalysts that have never been tested before. It is more like a search than a highly accurate prediction. This is why we would like to call this kind of task a treasure hunt.

Submission Rules

To participate in this competition, use the form to submit (1) a method description, (2) a catalyst candidate list, and (3) a consent form. The method description has to be a PDF file that includes (1) the authors' names and affiliations, (2) the first to fifteenth choices of catalyst candidates, and (3) the methods, tools, ideas, etc. used to derive the catalyst candidates, in this order. Describe the methods so that other people can reproduce the same results by reading the method description. If you use data other than what we have prepared, please also explain how to refer to that data. The description should be in English in principle, but Japanese is also allowed. The catalyst candidate list has to be a CSV file in which each row is a catalyst candidate (a combination of three elements and one support), and the first row is the first choice. The consent form to be uploaded has to be a PDF file that is downloaded from here and signed by one of the authors. Details are provided below, but please agree to waive intellectual property rights and to allow the submitted method description and candidate catalysts to be published in the competition report paper written by the organizing committee. Submissions without a signed consent form will be rejected and treated as not submitted. If the submitted catalyst candidate list contains an invalid combination, the invalid combination will be excluded from the evaluation.
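As an illustration of the candidate list described above, the following Python sketch writes one catalyst candidate per row, first choice first. The column order (M1, M2, M3, Support), the absence of a header row, the cap of fifteen rows, and the file name `candidates.csv` are assumptions made for this example; the rules above only state that each row is a combination of three elements and one support. The candidates shown are arbitrary placeholders, not recommendations.

```python
import csv
import io

# Placeholder candidates in order of preference (first row = first choice).
# These combinations are arbitrary examples, not recommendations.
candidates = [
    ("Li", "Mg", "La", "SiO2"),
    ("Na", "Ba", "Ce", "MgO"),
    ("K", "Sr", "Y", "CaO"),
]


def write_candidates(rows, stream):
    """Write one candidate per row as M1,M2,M3,Support (assumed layout)."""
    # Assumption: the list mirrors the first to fifteenth choices.
    if len(rows) > 15:
        raise ValueError("more than fifteen candidates given")
    writer = csv.writer(stream, lineterminator="\n")
    for m1, m2, m3, support in rows:
        writer.writerow([m1, m2, m3, support])


# Write to an in-memory buffer so the result can be inspected.
buffer = io.StringIO()
write_candidates(candidates, buffer)
csv_text = buffer.getvalue()
print(csv_text)

# To create an actual file (the file name is hypothetical):
# with open("candidates.csv", "w", newline="") as f:
#     write_candidates(candidates, f)
```

Writing through a function that accepts any text stream keeps the row order explicit, which matters because the first row is treated as the first choice.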

Evaluation and Screening

Evaluation experiments for catalyst candidates will be conducted in the laboratory of organizing committee member Nishimura (Associate Professor, Japan Advanced Institute of Science and Technology). In principle, we will conduct a single evaluation experiment for every submitted first-choice catalyst candidate and measure its yield. However, if the number of submissions is greater than expected, the organizing committee will pre-screen the submissions, exclude those that do not meet the conditions, and then select the submissions for evaluation experiments by lottery. Conversely, if there are only a small number of submissions, we will also evaluate each submission's second and third choices to the extent that budget and time permit. For the catalyst candidates that rank high in this evaluation, additional experiments will be conducted to measure the yields more accurately, and the final ranking will be determined from these values. Depending on the combination of elements, a stable experiment may well be impossible. In that case, the experimenter will not be held responsible, and the catalyst will be judged an unsuitable candidate. For more detailed screening methods, please click here. The detailed evaluation experiment environment is as follows.

  • Catalysts are prepared by the co-impregnation method using an aqueous solution. This method makes it easy to load the metals onto the support: the support is dispersed in an aqueous solution containing up to three elements, and the water is then removed to obtain the metal-supported precursor.
  • The amount of metal supported is fixed so that the total amount of the three elements is 0.6 mmol per 1.0 g of support. The metal-supported precursor obtained by drying is calcined at 900℃ and used for experiments. An atmospheric fixed-bed flow reactor is used to evaluate the OCM reaction.
  • 50 mg of catalyst is loaded as a powder, as is, into a quartz tube with a 4 mm inner diameter, pretreated with oxygen at 500℃, and then heated from 500℃ in an OCM atmosphere (CH4/O2/N2 = 21/7/3 cc, CH4/O2 = 3.0). The C2 yield is evaluated in 50℃ steps up to 850℃. The standard catalyst Na-Mn-W/SiO2 shows a C2 yield of 21.7 ± 0.38% at 800℃.
  • Expressed in the data notation described later, the experimental environment has 8 settings: [Temperature, Pch4, Po2, Par] = [T, 0.677, 0.226, 0.097] (T = 500, 550, ..., 850).
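The numbers in the setup above can be checked with a little arithmetic. The Python sketch below (variable names are ours, chosen for illustration) derives the three gas fractions from the 21/7/3 feed, lists the eight temperature settings, and computes the per-element loading implied by the 1:1:1 ratio and the 0.6 mmol total.

```python
# Feed flows from the evaluation setup (CH4/O2/N2 = 21/7/3 cc).
# The inert gas N2 is mapped to the Par entry of the data notation.
flows = {"Pch4": 21.0, "Po2": 7.0, "Par": 3.0}
total = sum(flows.values())

# Fractions of the total flow, rounded to three decimals.
fractions = {name: round(flow / total, 3) for name, flow in flows.items()}
print(fractions)  # {'Pch4': 0.677, 'Po2': 0.226, 'Par': 0.097}

# CH4/O2 ratio quoted in the setup.
ch4_to_o2 = flows["Pch4"] / flows["Po2"]
print(ch4_to_o2)  # 3.0

# Eight temperature settings: 500 to 850 in steps of 50.
temperatures = list(range(500, 851, 50))
settings = [
    [t, fractions["Pch4"], fractions["Po2"], fractions["Par"]]
    for t in temperatures
]
print(len(settings))  # 8

# With a fixed 1:1:1 ratio, the 0.6 mmol total per 1.0 g of support
# splits evenly across the three elements (about 0.2 mmol each).
loading_per_element_mmol = 0.6 / 3
```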

Award

A presentation event will be held in March 2025 to award the top three submissions. The location and format of the event (for example, whether it will be held at an academic conference) have not yet been determined. Travel expenses will be provided to the winners, so please present the methods and ideas you used to derive the catalyst candidates. Winners will be contacted at least one month before the event, and the arrangements for travel expenses will begin at that point. There will be no prize money due to budget constraints.

Data Format

The page contains a summary of catalyst data reported in papers published around the world. The environment and parameters for each experiment using this data are explained below.
A small amount of the catalyst elements (M1, M2, M3) is dispersed over the support material.

  • M1, M2, M3: catalyst elements
  • Support: support material for the catalyst
  • Temperature: reaction temperature in degrees Celsius
  • Pch4: amount of methane gas
  • Po2: amount of oxygen gas
  • Par: amount of inert gas
  • C2y: yield of produced ethane and ethylene

Notes

  • Pch4, Po2, and Par sum to 1.
  • M1, M2, M3, and Support describe the catalyst.
  • Temperature, Pch4, Po2, and Par are the experimental conditions.
  • C2y is the objective variable.
  • Material information can be replaced by physical quantities using code such as xenonpy.
  • The literature data were preprocessed: data containing M4, anions, or promoters were removed; catalyst compositions without a support were removed; a component exceeding 50 mol% was treated as the support; and data with an unknown support were excluded.

References: HTP: ACS Catal. 2021, 11, 1797-1809; Literature: ChemCatChem 3.12 (2011): 1935-1947
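As a minimal sketch of how the columns above might be sanity-checked before use, the following Python example reads rows with the standard `csv` module and drops those that fail two basic tests. The sample rows are made up for illustration, the header layout is an assumption about how the data file is organized, and the check treats C2y as a percentage, which is also an assumption.

```python
import csv
import io

# Made-up rows for illustration only; real data come from the provided files.
SAMPLE = """M1,M2,M3,Support,Temperature,Pch4,Po2,Par,C2y
Li,Mg,La,MgO,750,0.600,0.200,0.200,12.0
Na,Ba,Ce,SiO2,800,0.700,0.200,0.100,150.0
K,Sr,Y,CaO,700,0.500,0.300,0.100,8.5
"""


def is_plausible(row, tol=0.01):
    """Return True if a row passes two basic sanity checks."""
    c2y = float(row["C2y"])
    pressures = sum(float(row[k]) for k in ("Pch4", "Po2", "Par"))
    yield_ok = 0.0 <= c2y <= 100.0             # assumes C2y is a percentage
    pressure_ok = abs(pressures - 1.0) <= tol  # the three amounts should total 1
    return yield_ok and pressure_ok


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
clean = [r for r in rows if is_plausible(r)]
print(len(rows), "rows read,", len(clean), "kept")
```

In this toy input the second row is dropped for an impossible yield and the third because its gas amounts total 0.9 rather than 1. Whether to drop, correct, or keep such rows is a modelling choice left to each participant.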
              

List of Elements and Supports Available

In principle, the following elements can be used.

Li,
Na, Mg, Al
K, Ca, Sc, V, Mn, Fe, Co, Ni, Cu, Zn, Ga
Rb, Sr, Y, Zr, Nb, Mo, Ru, Rh, Pd, Ag, In, Au, Pt
Cs, Ba, Hf, Ta, W, Re, Ir, Tl, Pb
La, Ce, Pr, Nd, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb, Lu

In principle, the following can be used as supports.

MgO, γ-Al2O3, α-Al2O3, SiO2, CaO, rutile-TiO2, MnO2, Fe2O3, CoO2, ZnO, SrO, Y2O3, ZrO2, Nb2O5, La2O3, CeO2, Nd2O3, BaCO3

Even if an element or support is included in the above lists, it may not be available to the organizing committee in some cases. In that case, catalyst candidates containing it will be invalidated. You may also submit catalyst candidates that contain elements or supports not included in the above lists. Such candidates will be valid if the organizing committee determines that the element or support can be used; otherwise they will be invalidated. In principle, we cannot use expensive materials such as rare metals, or materials that are difficult or dangerous to handle.
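To make the M1-M2-M3/Support notation and the lists above concrete, here is a small Python sketch that parses a candidate string and reports names missing from the lists. The function names are hypothetical, and treating a repeated element as a problem is our assumption. Since unlisted elements or supports may still be accepted by the committee, an empty result only means "all names are on the lists"; it does not check whether the combination already appears in the prepared data or in published papers.

```python
from math import comb

ELEMENTS = """Li Na Mg Al K Ca Sc V Mn Fe Co Ni Cu Zn Ga
Rb Sr Y Zr Nb Mo Ru Rh Pd Ag In Au Pt
Cs Ba Hf Ta W Re Ir Tl Pb
La Ce Pr Nd Eu Gd Tb Dy Ho Er Tm Yb Lu""".split()

SUPPORTS = [
    "MgO", "γ-Al2O3", "α-Al2O3", "SiO2", "CaO", "rutile-TiO2", "MnO2",
    "Fe2O3", "CoO2", "ZnO", "SrO", "Y2O3", "ZrO2", "Nb2O5", "La2O3",
    "CeO2", "Nd2O3", "BaCO3",
]


def parse_candidate(text):
    """Split 'M1-M2-M3/Support' into (list of elements, support)."""
    # Split on the first '/' only, so hyphens inside support names survive.
    metals, _, support = text.partition("/")
    return metals.split("-"), support


def check_candidate(text):
    """Return a list of problems; an empty list means every name is listed."""
    elements, support = parse_candidate(text)
    problems = []
    if len(elements) != 3:
        problems.append("expected exactly three elements")
    if len(set(elements)) != len(elements):
        problems.append("repeated element (assumed not allowed)")
    problems += [f"element not in list: {e}" for e in elements if e not in ELEMENTS]
    if support not in SUPPORTS:
        problems.append(f"support not in list: {support}")
    return problems


# Size of the listed search space, assuming three distinct elements
# whose order does not matter.
n_listed = comb(len(ELEMENTS), 3) * len(SUPPORTS)
print(len(ELEMENTS), len(SUPPORTS), n_listed)  # 50 18 352800

print(check_candidate("Li-Mg-La/γ-Al2O3"))  # []
print(check_candidate("Li-Mg-Xx/SiO2"))     # ['element not in list: Xx']
```

Under these assumptions the listed names alone give several hundred thousand combinations, which is one way to see why the task is framed as a search.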

Schedule

July 2024 CFP release, data release, competition site opens
October 1, 2024 Submissions site opens
November 20, 2024 23:59 (AoE) Submission deadline
Early February 2025 Participants with excellent results will be contacted
Mid-March 2025 Conference for competition results, award ceremony, and presentations by top performers

The above is the current plan. The exact schedule will be updated later.

Steering Committee and Sponsorship

This competition is planned and managed by an organizing committee made up of volunteers. In addition, it is jointly sponsored by JST CREST (Representative: Keisuke Takahashi) and Grants-in-Aid for Scientific Research A (Representative: Shin-ichi Minato). The members of the organizing committee are as follows.

  • Keisuke Takahashi: Professor, Hokkaido University
  • Shun Nishimura: Associate Professor, Japan Advanced Institute of Science and Technology
  • Ichigaku Takigawa: Program-Specific Professor, Kyoto University
  • Norihito Yasuda: Senior Research Scientist, NTT Communication Science Laboratories
  • Masakazu Ishihata: Senior Research Scientist, NTT Communication Science Laboratories
  • Takeaki Uno: Professor, National Institute of Informatics

If you have any questions regarding this competition, please contact the organizing committee. In addition, the questions we receive and their answers will be posted on the competition homepage as FAQs, so please use them as a reference.

Intellectual Property

Neither the organizing committee, nor any individual organizer, nor anyone else involved will acquire any intellectual property rights related to the submitted catalyst candidates. Participants also take part in the competition on the condition that they waive their intellectual property rights. This is to prevent disputes over intellectual property (to make the rights situation clear, you will be asked to sign a document stating that you agree to waive these rights). If you would like to acquire intellectual property, please do not participate in this competition; instead, arrange joint research and experiments on your own. If you wish to write a separate paper on the technology you have developed (especially as a machine learning method), you are allowed to do so. In that case, you can refer to the experimental results from the competition.

Competition Report Paper

We plan to report this competition in an academic paper submitted to a chemistry journal, which will include the purpose of the competition, details of the competition, the submitted methods, the candidate catalysts, and their performance. All submitted catalyst candidates and method descriptions will be included in the paper. We plan to include all submitters as authors of the paper, but if there are a large number of submissions, the organizing committee will invite the top prize winners and submitters who proposed other interesting methods to participate as co-authors. We will consult with you.

Peer Review Standards and COI

Steering committee members are prohibited from participating in the competition, but related parties (students in their laboratories, etc.) are allowed to participate. Since this is a small-scale project, please understand that this is a measure to avoid reducing the number of submissions. No one will be given preferential treatment just because they are related to the organizing committee. If there are too many submissions, each submission will be pre-screened, and in that case organizing committee members will be excluded from screening the submissions of parties related to them.

Accidents and Unexpected Situations

Evaluation experiments may become impossible for some reason, such as machine damage or malfunction. In such an unavoidable case, we will cancel the competition. We will restart the competition at a later date if we are able to establish an experimental environment again, or if we receive the cooperation of others who are able to conduct the experiments.

Experiments are inherently subject to error. Normally, experiments are conducted multiple times to obtain accurate values, but in this competition, due to limited experimental resources, we will conduct only one experiment for each catalyst candidate. As a result, some error is expected; please take this into account when participating.

If there are many submissions and we are unable to evaluate all submissions, we would like to hold a second competition at a later date. At that time, after consulting with participants, we would like to consider a system in which submissions that could not be fully tested can also be entered into the competition.

Other researchers may use the submitted catalyst candidates as ideas for their own research, regardless of whether the candidates have been tested or not. However, in that case they must cite the source; this is a rule of the competition.

How to Hunt Treasures

There are many ways to find treasure. If you want to use so-called machine learning techniques to make highly accurate predictions, you could, for example, use only the data of type HTP among the data provided above, which was produced by Professor Taniike of the Japan Advanced Institute of Science and Technology using the same equipment and environment. Restricting the data in this way may give higher accuracy. If you include the literature data (type is set to Literature), the amount of data and the scope of learning will expand, so even if each piece of data contains errors, the accuracy may be higher. The data has been preprocessed, but the accuracy may be improved further by improving the preprocessing.
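The choice between HTP-only data and the combined data can be set up as a simple split. The Python sketch below groups records by their type label so the two training sets can be compared; the records are made up, and the field name `type` is an assumption about how the HTP / Literature label is stored.

```python
from collections import defaultdict
from statistics import mean

# Made-up records for illustration; "type" mirrors the HTP / Literature label.
records = [
    {"type": "HTP", "M1": "Li", "M2": "Mg", "M3": "La", "Support": "MgO", "C2y": 10.0},
    {"type": "HTP", "M1": "Na", "M2": "Ba", "M3": "Ce", "Support": "SiO2", "C2y": 14.0},
    {"type": "Literature", "M1": "K", "M2": "Sr", "M3": "Y", "Support": "CaO", "C2y": 6.0},
]


def split_by_type(rows):
    """Group records by their type label."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["type"]].append(row)
    return dict(groups)


groups = split_by_type(records)
htp_only = groups.get("HTP", [])  # consistent equipment and environment
combined = records                # more rows, but mixed conditions

for name, subset in [("HTP only", htp_only), ("HTP + Literature", combined)]:
    avg = mean(r["C2y"] for r in subset)
    print(f"{name}: {len(subset)} rows, mean C2y = {avg:.1f}")
```

Training the same model on `htp_only` and on `combined` and comparing the results is one way to judge whether the extra literature rows help or hurt.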

It might also be possible to take a completely different approach and use a visualization such as a heat map, so that humans can search by eye. If you discretize the data and use pattern mining or clustering, you may not need to worry about some errors or abnormal values. Another option is to add information about the elements to be combined (melting point, boiling point, ionization tendency, number of electrons in the outermost shell, etc.). We think it is also possible to create new features by combining various features of the elements. This is the challenging part of this competition, as it involves various factors that are not present in ordinary machine learning tasks. There are several papers on discovering new catalysts from data, so it would be a good idea to refer to them for preprocessing methods and for how to select features.
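As one possible starting point for the heat map and discretization ideas above, the following Python sketch aggregates the mean C2 yield for every pair of elements that appear together in a catalyst, and maps yields to coarse bins. The records and the bin edges are made up for illustration; the resulting pair table is the kind of grid one could pass to any plotting tool.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean

# Made-up (elements, C2y) records for illustration only.
records = [
    (("Li", "Mg", "La"), 10.0),
    (("Li", "Mg", "Ce"), 14.0),
    (("Na", "Mg", "La"), 6.0),
]


def pair_means(rows):
    """Mean C2y for every unordered element pair that co-occurs in a catalyst."""
    buckets = defaultdict(list)
    for elements, c2y in rows:
        # Sort so that (Li, Mg) and (Mg, Li) land in the same bucket.
        for pair in combinations(sorted(elements), 2):
            buckets[pair].append(c2y)
    return {pair: mean(values) for pair, values in buckets.items()}


def discretize(c2y, edges=(5.0, 10.0, 15.0)):
    """Map a yield to a coarse bin index; the bin edges are arbitrary."""
    return sum(c2y >= edge for edge in edges)


table = pair_means(records)
for pair, value in sorted(table.items()):
    print(pair, value, "bin", discretize(value))
```

Working with bins rather than raw values means a single noisy measurement changes the picture less, at the cost of losing fine differences between candidates.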