Background and objective: Phytoestrogens (weak estrogens found in plants or derived from plant precursors by human metabolism) have been hypothesized to reduce the risk of a number of cancers. However, epidemiologic studies addressing this issue are hampered by the lack of a comprehensive phytoestrogen database for quantifying exposure. The purpose of this research was to develop such a database for use with food-frequency questionnaires in large epidemiologic studies.

Methods: The database is based on consumption patterns derived from semistructured interviews with 118 African-American, Latina, and white women residing in California's San Francisco Bay Area. HPLC-mass spectrometry was used to determine the content of seven specific phytoestrogenic compounds (the isoflavones genistein, daidzein, biochanin A, and formononetin; the coumestan coumestrol; and the plant lignans matairesinol and secoisolariciresinol) in each of 112 food items/groups.

Results: Traditional soy-based foods were found to contain high levels of genistein and daidzein, as expected, as well as substantial amounts of coumestrol. A wide variety of 'hidden' sources of soy (that is, soy protein isolate, soy concentrate, or soy flour added to foods) was observed. Several other foods (such as various types of sprouts and dried fruits, garbanzo beans, asparagus, garlic, and licorice) were also found to be substantial contributors of one or more of the phytoestrogens analyzed.

Conclusions: Databases such as the one described here are important in assessing the relationship between phytoestrogen exposure and cancer risk in epidemiologic studies. Agencies, such as the United States Department of Agriculture (USDA), that routinely provide data on food composition, on which epidemiologic investigations into dietary health effects are based, should consider instituting programs for the analysis of phytochemicals, including the phytoestrogens.