
AJRCMB Articles in Press. Published on July 25, 2003 as doi:10.1165/rcmb.2003-0103OC

Genome-Wide Search and Identification of a Novel Gel-Forming Mucin

MUC19/Muc19 in Glandular Tissues

Yin Chen1, Yu Hua Zhao1, Tejas Baba Kalaslavadi1, Edward Hamati1, Keith Nehrke2, Anh Dao Le3, David K. Ann4, and Reen Wu1*

1Center for Comparative Respiratory Biology and Medicine and Division of Pulmonary and Critical Care Medicine, University of California, Davis, CA 95616; 2Center for Oral Biology, University of Rochester, Rochester, NY 14610; 3Department of Oral & Maxillofacial Surgery, Drew University of Medicine and Science, Los Angeles, CA 90059; and 4Department of Molecular Pharmacology and Toxicology, University of Southern California, Los Angeles, CA 90033

* Corresponding author: Reen Wu, Ph.D.

Center for Comparative Respiratory Biology and Medicine

Surge 1 Annex, Room 1121

University of California at Davis

One Shields Ave

Davis, CA 95616

(530) 752-2648 (Off); (530) 752-8632 (Fax) e-mail:

rwu@https://www.sodocs.net/doc/158527921.html,

Short title: New gel forming mucin gene: MUC19/Muc19

Key words: mucin, MUC gene, genome-wide search, bioinformatics, in situ hybridization

Copyright (C) 2003 by the American Thoracic Society.

Abstract:

Gel-forming mucins are major contributors to the viscoelastic properties of mucus secretions. Currently, four gel-forming mucin genes have been identified: MUC2, MUC5AC, MUC5B, and MUC6. All of these genes have five major cysteine-rich domains (four von Willebrand factor (vWF) C or D domains and one cystine-knot (CT) domain) as their distinctive features, in contrast to the non-gel-forming mucins. The CT domain is believed to be involved in the initial mucin dimer formation and is closely related among different gel-forming mucins across species. Because of gene duplication and evolutionary modification, it is very likely that other gel-forming mucin genes exist. To search for new gel-forming mucin candidate genes, a “Hidden Markov Model” (HMM) was built from the common features of the CT domains of those gel-forming mucins. By using this model to screen all protein databases as well as the six-frame translated EST and translated human genomic databases, we identified a locus located in the pericentromeric region of human chromosome 12 and the corresponding homologous region of mouse chromosome 15. We cloned the 3’ end of this gene and its mouse homologue. We found one vWF C domain (VWC), one CT domain, and various mucin-like threonine/serine-rich repeats. Phylogenetic analysis indicated a close relationship between this gene and the porcine and bovine submaxillary mucins. A polydispersed signal was observed on the northern blot, which indicates a very large mRNA size. Further analysis of the upstream genomic sequences generated from the human and mouse genome projects revealed three additional vWF D domains (VWD) and many mucin-like threonine/serine-rich repeats. The expression of this gene is restricted to the mucous cells of various glandular tissues, including the sublingual gland, submandibular gland and submucosal gland of the trachea. Based on the chronological convention, we have given the name MUC19 to the human orthologue and Muc19 to the mouse.

Introduction:

Mucus is a viscoelastic gel-like substance that covers the mammalian epithelial surface of various tissues. The main functions of mucus include lubricating epithelia and protecting them from environmental insults. The viscous and elastic properties of the mucus gel are generally attributable to the physical properties and structural features of mucin glycoproteins, specifically the gel-forming mucins. MUC2, MUC5AC, MUC5B and MUC6 define this mucin subgroup, and they are believed to have evolved from one common ancestor with von Willebrand factor (vWF) (1). Bovine and porcine submaxillary mucins (BSM, PSM) also belong to this subgroup (1). All of these gel-forming mucins are very large (15kb-40kb cDNA); they also share a similar structure and substantial sequence homology in the conserved regions. The cDNA sequences of those mucins have multiple “cysteine-rich” vWF C (VWC) and vWF D (VWD) domains in the regions flanking the mucin-like threonine/serine-rich repeats, and cystine-knot (CT) domains in their C-terminal regions (1, 2). Both the number and the positions of the cysteines are extremely conserved in those domains, which play an essential role in forming disulfide-linked dimers (3-5) and multimers (1, 6, 7). No such domains are found in the non-gel-forming mucins. Their large size and their capability of forming multimers support the notion that these mucins play a pivotal role in forming the mucus gel. Indeed, those gel-forming mucins have been proven to be major components of the mucus secretion of various organs (8-11).

In addition to the gel-forming mucins, fifteen other human mucin genes have been cloned and named MUC1, 3-4 and 7-18 (NCBI LocusLink). Generally speaking, individual mucins are so named because they have so-called “threonine/serine-rich mucin repeats”, and they share no apparent sequence similarities as a group (12). Among those mucins, some are membrane-tethered (MUC1, 3, 4, 11, 12) and some are very small (MUC7, 9, 10) (12). The contribution of those mucins to the biophysical and biochemical properties of the mucus gel is not entirely clear.

For many years, the total number of mucin genes has remained a mystery. Currently, four human gel-forming mucin genes have been identified: MUC2, MUC5AC, MUC5B and MUC6. New gel-forming mucins may also exist due to gene duplication, chromosomal exchange or other genetic alterations. Current progress in DNA sequencing has led to the creation of many sequence databases that are useful resources for the discovery of new proteins. Now that the human genome project has been completed, potential gene candidates can be predicted from their genomic sequence. Another useful database is dbEST (the NCBI EST database). dbEST contains cDNA sequences cloned by reverse transcription of mRNA samples from various tissues, and has been widely used for the study of gene expression.

One general approach to discovering new members of a gene family is to search the nucleotide databases for sequences similar to this gene family with the BLAST program. However, many gene families, such as the gel-forming mucins, do not have overall sequence similarity; rather, they only share some conserved “motifs” such as the CT domain. This difficulty can be overcome by searching the database using sequence profiles rather than merely the sequence per se. There are many methods for constructing a sequence profile from a multiple sequence alignment; the resulting profile represents the mathematical summary of the specific features of these sequences extracted from the known members of a given gene family. Searching the database by using a sequence profile is like looking for the general “features” of those genes rather than just similar DNA sequences (13, 14). The “Hidden Markov Model” (HMM) is one of the most powerful tools in this regard (15, 16).
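The idea of a sequence profile can be illustrated with a minimal sketch. The code below is not an HMM and is not the method used in this study; it builds a much simpler position-specific log-odds profile from a gap-free toy alignment and scores a candidate peptide against it. The alignment, the uniform background frequency, the pseudocount and all function names are assumptions made purely for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(alignment, pseudocount=1.0):
    """Build a per-column log-odds profile from equal-length aligned peptides.

    Assumes a uniform background frequency (1/20), which is a simplification.
    """
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile

def score(profile, peptide):
    """Sum the log-odds of each residue at its position (unknown residues score 0)."""
    return sum(column.get(aa, 0.0) for column, aa in zip(profile, peptide))

# Hypothetical toy alignment with conserved cysteines at both ends.
toy_alignment = ["CAKC", "CSRC", "CTKC"]
toy_profile = build_profile(toy_alignment)
conserved_score = score(toy_profile, "CAKC")
unrelated_score = score(toy_profile, "WWWW")
```

A peptide that keeps the conserved cysteines scores higher than an unrelated one even when the other positions differ, which is the property a profile search relies on. A profile HMM additionally models insertions and deletions.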

Using this HMM-based searching method, Schultz et al (17) have discovered more than 1,000 new putative human small GTPase proteins. By combining it with an EST database search and a BLAST search on genomic sequence, Wittenberger et al (13) have uncovered new members of the G-protein coupled receptor superfamily. Therefore, this HMM-based search approach should be more robust and specific than the BLAST program. In this report, we have used this approach to identify MUC19/Muc19 as a novel glandular tissue-specific gel-forming mucin gene.

Materials and Methods:

Screening the novel gel-forming mucin genes

As shown in Fig.1, we collected all the known gel-forming mucin genes, including those of human and other animal species, from the NCBI database. We chose the 3’ end sequences because some genes, such as MUC6, only have 3’ end sequences available. Moreover, most of the EST sequences were generated from the 3’ end. All sequences were selected and processed with Blast2 (NCBI software program). Only the most representative sequences were preserved. These genes were then aligned by the ClustalW program (18). A gel-forming mucin gene-specific “Hidden Markov Model” (HMM) was built from the alignment data by using the HMMER2.2 software from Sean Eddy’s laboratory at Washington University in St. Louis, MO. The NCBI human and mouse EST databases were downloaded to an in-house Linux computer. All of those sequences were six-frame translated and then screened with the “gel-forming Hidden Markov Model” by HMMSEARCH in the HMMER2.2 software package. Initially, a default cutoff value (<1) was used. All hits were then used to search the NCBI nr database with an in-house search program to find out whether those ESTs corresponded to known genes. By visual inspection, we found that there was a large gap in the scores among all those hits. All the known non-mucin genes have a score much smaller than 0.01. Thus, a second cutoff value (<0.01) was used to filter the results. The same method was also used to search the human and mouse genomic databases from NCBI. The only difference in this search was that all the genomic sequences were first translated by the GENSCAN program (19) before the search.
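The six-frame translation and cutoff filtering steps described above can be sketched as follows. This is a minimal illustration, not the in-house program used in the study: it assumes the standard genetic code, treats the cutoff as an E-value-like number where smaller is better, and uses hypothetical function names and a hypothetical `(name, value)` hit format.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]

def translate(dna):
    """Translate complete codons; ambiguous codons become 'X', stops become '*'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

def six_frame_translate(dna):
    """Translate three forward frames (+1..+3) and three reverse frames (-1..-3)."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames

def filter_hits(hits, cutoff=0.01):
    """Keep (name, value) hits whose value is below the cutoff (assumed E-value-like)."""
    return [(name, value) for name, value in hits if value < cutoff]

frames = six_frame_translate("ATGTGCTAA")
kept = filter_hits([("est_a", 1e-5), ("est_b", 0.5)])
```

Each translated frame would then be written out as a protein sequence and searched with the profile; the two-stage cutoff (first <1, then <0.01) corresponds to calling `filter_hits` twice with different `cutoff` values.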

3’ and 5'-RACE

The RACE kit (Roche Diagnostics Corporation, Indianapolis, IN) was used to synthesize the first-strand cDNA from total RNA (3 μg) isolated from human and mouse salivary gland tissues. All the procedures followed the manufacturer’s instructions. Briefly, an oligo-dT anchor primer or an antisense gene-specific primer corresponding to different regions of the MUC19/Muc19 message was used to initiate first-strand cDNA synthesis. For 3’-RACE, PCR was carried out with 5’ gene-specific primers and the 3’ oligo d(T) anchor primer. For 5’-RACE, the first-strand cDNA was 3' tailed with oligo d(A) by terminal deoxynucleotidyl transferase, and PCR was then carried out using the nested gene-specific primer and the 5' oligo d(T) anchor primer. The PCR products were subcloned into the TA vector (Invitrogen, Carlsbad, CA) for cloning and DNA sequencing. All primer sequences used in this study are listed in Table.1.

RT-PCR amplification

cDNA was synthesized from total RNA (3 μg) by RT with oligo d(T) primer. The resulting single-strand cDNA was used as a template for PCR amplification by MUC19/Muc19 gene-specific primers (Table.1). PCR products were TA cloned and sequenced.

Phylogenetic Analysis

All non-repetitive 3’ end peptide sequences from gel-forming mucins of different species were aligned using the ClustalW program (18). The alignment was edited and the tree was built with the Jalview program (Michele Clamp).

Genomic Structure and Localization

The chromosomal location of MUC19/Muc19 was determined by a BLAST search of the NCBI genomic databases and a Blat search of the UCSC human genome draft (University of California, Santa Cruz, CA). The whole genomic structure was deduced from the comparison between the porcine submaxillary gland mucin (GenBank: AAC62527) and the human/mouse genomic sequences in the MUC19/Muc19 locus by a locally installed Genewise program (Ewan Birney, Wise2 package).

RNA Isolation and northern Blot Hybridization

RNA was isolated from human and mouse tissues by a single-step acid guanidinium thiocyanate phenol-chloroform extraction method (20). For northern blot hybridization, equal amounts of total RNA (20 μg/lane) were subjected to electrophoresis on a 1.2% agarose gel in the presence of 2.2 M formaldehyde and then transblotted onto Nytran membranes. The RNA was cross-linked to the membrane by a UV Stratalinker 2400 (Stratagene, La Jolla, CA). The clones corresponding to the 3’ end sequence of MUC19/Muc19 were labeled with 32P-dCTP by the Ready-To-Go kit (Amersham Biosciences Corporation, Piscataway, NJ). After hybridization, all the blots were exposed overnight to the phosphor screen and read by the STORM system (Molecular Dynamics, Sunnyvale, CA) (21). The integrity of the RNA sample was verified by visualization of the ribosomal 18S and 28S bands in the ethidium bromide stained gel.

Expression Analysis by quantitative RT-PCR

A SYBR® Green based quantitative RT-PCR approach was carried out to characterize the message expression of MUC19/Muc19 and the control gene, glyceraldehyde-phosphate-dehydrogenase (GAPDH), in various human and mouse tissues. Mouse cDNA samples were generated in house from various tissue RNAs by reverse transcriptase using the oligo d(T) anchor primer. Human cDNA samples (human multiple tissue cDNA panel I, cat # K1420-1 and human multiple tissue cDNA panel II, cat # K1421-1) from various human tissues were purchased from BD Biosciences Clontech (Palo Alto, CA). Gene-specific primers were designed according to the cDNA region. Each PCR reaction contained 10 μM primers in a total volume of 50 μl. The SYBR® Green PCR kit (Applied Biosystems, Foster City, CA) was used according to the manufacturer’s instructions. Real-time PCR data were obtained using the GeneAmp® 5700 Sequence Detection System (Applied Biosystems, Foster City, CA). The normalized expression values of MUC19/Muc19 were obtained by dividing the MUC19/Muc19 expression values by the corresponding expression values of GAPDH. To improve the readability of the data, all final expression values (listed in Table. 2) were further multiplied by a factor of 10,000. Tissues with a value less than 0.01 were considered to have undetectable MUC19/Muc19 expression by this PCR method.
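The normalization described above amounts to a simple ratio and scaling step. The sketch below restates it in code; the function names and the example numbers are hypothetical, and the input values are assumed to be already-quantified relative expression values rather than raw threshold cycles.

```python
SCALE_FACTOR = 10_000        # multiplier used in the text to improve readability
DETECTION_THRESHOLD = 0.01   # scaled values below this are treated as undetectable

def normalized_expression(target, reference, scale=SCALE_FACTOR):
    """Divide the target-gene value by the reference-gene (GAPDH) value, then scale."""
    if reference <= 0:
        raise ValueError("reference expression must be positive")
    return target / reference * scale

def is_detectable(value, threshold=DETECTION_THRESHOLD):
    """Return True when the scaled, normalized value reaches the threshold."""
    return value >= threshold

# Hypothetical example values (not taken from Table 2).
example = normalized_expression(target=2.0, reference=4000.0)
```

With these made-up inputs the scaled value is about 5, well above the 0.01 threshold, so the tissue would be scored as expressing the gene.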

In Situ Hybridization

Glass slide sections from various tissue blocks were hybridized in the hybridization solution using biotin-labeled antisense or sense probes synthesized by in vitro transcription of MUC19/Muc19 clones. In situ hybridization was carried out as per the manufacturer's protocol (Roche Diagnostics Corporation, Indianapolis, IN) and modified as described before (21). Briefly, slide sections were treated with 10 μg/ml Proteinase K in 50 mM Tris-Cl (pH 8.0) and 50 mM ethylenediaminetetraacetic acid for 15 min at 37°C, rinsed twice in 0.2× saline sodium citrate (SSC), and then postfixed in 4% paraformaldehyde/phosphate-buffered saline for 20 min. Slides were treated twice for 5 min each time with 0.1 M triethanolamine (pH 8.0) and blocked by 0.25% acetic anhydride in 0.1 M triethanolamine. The sections were then dehydrated through the ethanol series. For each section, 0.5 pmol biotin-labeled oligonucleotide probe in 50 μl of hybridization buffer was applied. The hybridization buffer contained 2× SSC, 1× Denhardt's solution, 10% dextran sulfate, 50 mM phosphate buffer (pH 7.0), 50 mM dithiothreitol, 250 μg/ml yeast transfer RNA, 100 μg/ml poly A, and 500 μg/ml salmon-sperm DNA. The section was hybridized at 45°C overnight in a humidified chamber. After hybridization, the section was washed twice for 15 min each time at 37°C with 2× SSC, twice for 15 min each time with 1× SSC, and twice for 15 min each time with 0.25× SSC. After the wash, the slide was reacted with anti-biotin primary antibody conjugated with alkaline phosphatase. After several washes, the reacted probes on the slide were color-developed with the Biotin Nucleic Acid Detection kit from Roche Diagnostics Corporation (Indianapolis, IN).

Results:

Developing “Hidden Markov Model” for the genome-wide search of new gel-forming mucin genes

In order to conduct a genome-wide search for new gel-forming mucin genes, a specific “Hidden Markov Model” was developed based on the sequence alignment of all known gel-forming mucins (Fig.1). To make the model more representative, sequences from species other than human and mouse were also included in the alignment. Using this finalized model, a comprehensive search of the human and mouse EST databases revealed many hits with high scores, especially from the mouse EST databases. Most of these hits were parts of the known gel-forming mucin genes (MUC2/Muc2, MUC5AC/Muc5AC, etc.). Because of the significantly high scores of these hits in the mouse EST database, we decided to focus on the mouse gene. After processing those results with an in-house program, we obtained 24 mouse ESTs that did not match any known mouse mucin gene. These ESTs were in fact generated from the same gene. The translated product of this new gene has a bona fide gel-forming mucin-like CT domain (Fig.2Ab).

Molecular cloning and sequence characterization of the 3’ end of novel gel-forming mucin gene, Muc19.

We then carried out 5'/3' RACE using primers deduced from the potential coding region of this new gene. Total mouse salivary gland RNA was used because all the ESTs from this new gene were obtained from the mouse salivary gland library SG2. For 5’ RACE, we used mmuc19_1740 as the gene-specific primer; for 3’ RACE, we used mmuc19_1392. Sequences of the primers are listed in Table.1. By these methods, we were able to obtain two cDNA clones (1.897 kb and 2.023 kb) that were generated by different polyadenylation sites (Fig.2Aa). The longer transcript has the same ORF as the shorter one, but has a longer 3' UTR. The sequence has been deposited into GenBank under the accession number AY193891. The deduced peptide sequence has a notably high threonine and serine content (35.9%) and several mucin-like threonine/serine repeats (Fig.2Ac). It also has the signature motifs of gel-forming mucins: VWC and CT domains (Fig.2Ab). Since mucins are named numerically in chronological order, we named this new mouse mucin gene Muc19. By comparing the Muc19 sequence with the UCSC and NCBI human genome sequence database, the cloned 3’ end sequence of Muc19 was found to reside on chromosome 15 (Fig.3A) and to consist of 9 exons (Fig.2Aa).
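The threonine/serine content quoted above is a plain residue fraction, and serine/threonine-rich stretches can be located with a sliding window. The sketch below shows one way to compute both; the function names, window size, threshold and the example peptide are hypothetical and are not taken from the deposited sequences.

```python
def ser_thr_content(peptide):
    """Return the percentage of residues that are serine (S) or threonine (T)."""
    peptide = peptide.upper()
    if not peptide:
        return 0.0
    return 100.0 * sum(peptide.count(aa) for aa in "ST") / len(peptide)

def rich_windows(peptide, window, min_percent):
    """Return start indices of windows whose Ser/Thr content reaches min_percent."""
    return [
        start
        for start in range(len(peptide) - window + 1)
        if ser_thr_content(peptide[start:start + window]) >= min_percent
    ]

# Hypothetical toy peptide with one Ser/Thr-rich stretch in the middle.
toy_peptide = "AAAASTSTAAAA"
overall = ser_thr_content(toy_peptide)
hits = rich_windows(toy_peptide, window=4, min_percent=100.0)
```

Applied to a deduced peptide sequence, `ser_thr_content` would give the kind of overall percentage reported in the text, while `rich_windows` points to candidate mucin-like repeat regions for closer inspection.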

Identification of the human MUC19 locus by searching the translated genomic database with “gel-forming mucin Hidden Markov Model”

In contrast to the mouse EST database search, human MUC19 was not found in the human EST library. After looking through the current human EST libraries, we realized that this problem might be due to the lack of a human salivary gland library in the human EST database. To overcome this obstacle, we carried out the screening using the translated human genomic databases deduced from the publicly available GenBank database. By using this approach, we were able to identify the putative human MUC19 locus on chromosome 12 (Fig.3B).

We also screened the translated mouse genomic databases and found the mouse Muc19 locus on chromosome 15 (Fig.3B), which further confirmed the sensitivity and accuracy of our screening method. Interestingly, this portion of mouse chromosome 15 seems to be the region homologous to human chromosome 12.

Notably, we were unable to identify any candidate other than MUC19/Muc19 by this search of both the human and mouse genomes.

Molecular cloning and sequencing of the 3’ end of human MUC19

In order to clone human MUC19, we designed various primers corresponding to the deduced cDNA region of the MUC19 locus. RT-PCR and 5’/3’ RACE were used to amplify MUC19 cDNA from human salivary gland RNA. For 5’ RACE, we used primer hmuc19_2021; for 3’ RACE, we used hmuc19_1878. We further used the hmuc19_1110/hmuc19_2021 primer pair to confirm the sequence by RT-PCR. All the primer sequences are listed in Table.1. Using these approaches, we cloned and sequenced a 2.23 kb MUC19 cDNA fragment (Fig.2B) (GenBank accession: AY236870). The deduced peptide sequence also has a very high threonine and serine content (31.4%) and many mucin-like threonine/serine repeats (Fig.2Bc). VWC and CT domains were also identified in the sequence (Fig.2Bb). Interestingly, we did not find an alternative polyadenylation site in the human sample (Fig.2Ba). The cloning of MUC19 from human salivary gland tissue demonstrates the similarity of expression between the human and mouse genes in terms of tissue specificity.

Phylogenetic and sequence analysis of gel-forming mucin genes

Phylogenetic analysis of various gel-forming mucin genes and vWF from different species indicates that MUC19 belongs to the PSM/BSM cluster (Fig.4), which is consistent with the detection of MUC19/Muc19 transcripts in the human and mouse salivary glands. The sequence alignment also demonstrates numerous similarities between MUC19/Muc19 and PSM (Fig.5). It also appears that MUC19 is much more similar to PSM than Muc19 is (Fig.5). Notably, the similarities among those three sequences are particularly high within the last 250 amino acids of the C-terminus, where the CT domain resides. CT domains have been shown to play a crucial role in dimer formation (4, 5). The mucin repeat regions appear to be very diverse even among the homologues in different species, which is also true for other gel-forming mucins. The only common feature of those mucin repeats is that they are all threonine/serine rich and contain potential sites for O-glycosylation.

The predicted gene structure upstream of the cloned 3’ end of MUC19/Muc19

The genomic sequences from both the human MUC19 and mouse Muc19 loci allow us to deduce the genomic structure and protein motifs. Most importantly, those sequences have been shown to be very similar to PSM. We then tried to predict the gene structure upstream of the cloned MUC19/Muc19 sequences by comparing their genomic sequences with the PSM peptide sequence using the Genewise program (Ewan Birney, Wise2 package). The benefit of this prediction program is that it utilizes sequence homology in addition to sequence statistics to facilitate the exon prediction. Thus it is more accurate than a conventional exon prediction method like GENSCAN, which is solely dependent on sequence statistics (19). As shown in Fig.6 (A-B), both peptide sequences deduced from the genomic sequences share similar structural domains with other gel-forming mucin genes: 5’-VWD-VWD-VWD-mucin repeats-VWC-CT-3’. Both genes seem to have a very large central region containing most of the serine/threonine-rich repeats, which is reminiscent of the large central exon of the MUC5B gene (22). Those structural features are very similar to PSM (Fig.6C). As we expected, the predicted peptide sequences from MUC19 and Muc19 were very similar to the PSM sequence (Fig.7). Highly homologous sequences were found at both the C terminus and the putative N terminus of the peptide sequences of MUC19/Muc19, while no significant homology was seen in the central repetitive regions (Fig.7). Both MUC19 and Muc19 are very large genes. Human MUC19 has more than 180 kb of genomic sequence with a deduced peptide sequence larger than 7000 amino acids, while mouse Muc19 has about 80 kb of genomic sequence with about 3000 amino acids. The smaller size of mouse Muc19 might result from more gaps and the much lower quality of the mouse genomic sequences available in the current database. We expect that the genomic size of mouse Muc19 will prove to be similar to that of human MUC19 when the mouse genome project is complete.

Characterization of the expression of MUC19/Muc19 in vitro and in vivo

In order to further examine the expression of MUC19/Muc19 in various tissues, both northern blot and RT-PCR approaches were used to screen the mouse and human multi-tissue panels. Like other gel-forming mucin gene messages, the MUC19/Muc19 message appeared as a polydispersed signal on the northern blot in salivary gland and tracheal tissues (Fig.8A). In mouse, Muc19 is mainly expressed in two major salivary glands, the sublingual and submandibular glands, and, to a much lesser extent, in trachea (Fig.8A). Muc19 is expressed at a higher level in the sublingual gland than in the submandibular gland, while it is undetectable in the parotid gland (Fig.8A). This result is consistent with the distribution of the mucous cell population in those glands. Of these three major salivary glands, the sublingual gland contains mostly the mucous cell type, the submandibular gland contains a mixture of mucous and serous cell types, and the parotid gland contains mostly the serous cell type. In human tissues, we also detected similar polydispersed signals from trachea and submandibular gland RNA samples (Fig.8B). In order to increase the sensitivity and the coverage of this tissue distribution study, we further used the quantitative RT-PCR method to screen additional human and mouse tissue samples. In the screening, primers hmuc19_1333/hmuc19_1426 were used for human, and primers mmuc19_1378/mmuc19_1443 were used for mouse. As summarized in Table 2, MUC19/Muc19 expression is very restricted and cannot be detected by RT-PCR in various non-glandular tissues.

We used in situ hybridization to further examine the specific cell types that express MUC19/Muc19 messages. MUC19 transcripts were detected in the mucous cells of the human submandibular gland and tracheal submucosal gland (Fig.9). A similar positive hybridization of the mouse Muc19 probe was seen in mouse tissue sections from the sublingual gland and tracheal submucosal gland (Fig.10). Notably, there is no hybridization signal in most serous cells of these glands. The strict cell type specificity of MUC19/Muc19 may explain why only low levels of these transcripts were found in the tracheal RNA sample, in which most of the RNA species are generated from the non-glandular portion.

Discussion:

The current explosion of sequence data from the genome and EST projects of different species makes it much easier to identify new gene family members. Compared with the simple sequence similarity search, pattern-based search methods have proven to be more robust (13, 17). In this study, we successfully utilized the “Hidden Markov Model” based approach to identify a novel gel-forming mucin gene, MUC19/Muc19, which is specifically expressed in various glandular tissues.

In contrast to conventional biological discovery, the bioinformatic discovery approach requires a precise mathematical definition of the specific features of the gene family of interest. Our initial attempt to define the “mucin-like threonine/serine rich repeats” for discovering new mucin genes was a complete failure. This was partly due to the heterogeneous nature of the mucin genes; some of them are named quite arbitrarily. MUC7, for example, was named as a mucin only because of the presence of four mucin-like serine/threonine rich repeats (23). As a matter of fact, many immunoglobulin genes have more mucin repeats than MUC7. It seems that the conventional mucin definition is too loose to distinguish the real mucins from other mucin-like genes. Thus, we tried to define the mucin genes based on additional features of their peptide sequences. We found that a mucin subgroup called “gel-forming mucin” (1, 12) was much easier to define. All of these gel-forming mucin genes share similar conserved motifs and structures. Most notably, they have been suggested to be the determining factor for the viscoelastic properties of mucus secretion and mucus gel formation in various organs. We therefore defined the “gel-forming mucin-specific Hidden Markov model” based on specific features at the 3’ ends of known gel-forming mucin genes in various species. After screening the EST databases, we found this “gel-forming mucin-specific Hidden Markov model” to be very specific and discriminating. The approach identified all previously known gel-forming mucin genes of various species without missing any. No other hits had a high enough score to be considered except MUC19/Muc19. That was also true when translated human and mouse genomic databases were included in the screening.

The newly identified MUC19/Muc19 gene has the gel-forming mucin features, with a structure significantly similar to the porcine and bovine submaxillary mucins. It has been suggested that all the known gel-forming mucin genes evolved from one common ancestor with vWF by gene duplication events (1). Structurally, MUC19/Muc19 is also very similar to vWF as well as to other gel-forming mucin genes. Interestingly, human MUC19 resides at chromosome 12q12, which is close to the location of vWF (12p13). In the phylogenetic tree, MUC19 is much closer to MUC2/MUC5AC/MUC5B than MUC6 is, although MUC6 is also located at 11p15 (24). We suspect that MUC19 shares a similar ancestor with the other gel-forming mucins and branched out evolutionarily later than MUC6.

The most striking feature of MUC19/Muc19 is its size. Of the known sequences, MUC5B is the largest human gel-forming mucin, consisting of ~5000 amino acids (21, 22, 25). However, the newly identified human MUC19 has more than 7000 amino acids based on the known sequence. Considering that there are gaps in the sequence and that its porcine counterpart has 13288 amino acids (26), MUC19 is likely the largest gel-forming mucin protein identified so far. Because of its huge size, MUC19/Muc19 may play a significant role in the regulation of the viscosity of mucus secretions. Such a role may be critical not only to the normal protective function of mucus, but also to its pathologic nature in diseases in which the mucus secretion is too thick and viscous to be cleared. Further studies of MUC19/Muc19 expression in various airway and glandular diseases could help to elucidate the role played by MUC19/Muc19 in mucus secretion.

Similar to its porcine/bovine counterparts, MUC19/Muc19 is expressed mainly in the major salivary glands, including both the sublingual and submandibular glands. This raises the question: what is the major mucin component in saliva? Previous studies have suggested that MUC5B protein is the major mucin component in the high molecular weight portion of salivary mucus, based on the comparison of the known mucin species in saliva as well as in RNA samples from salivary gland (27, 28). However, a recent paper indicates that concentrated solutions of salivary MUC5B protein alone cannot replicate the gel-forming properties of saliva (29), which suggests that additional mucin(s) take part in mediating mucus gel formation. In this study, we have demonstrated that MUC19/Muc19 transcripts are present in the major salivary glands at a high level. Because of its large size, this new mucin may be one of the major components contributing to the viscosity of salivary mucus.

We have also demonstrated the expression of MUC19/Muc19 in the mucous cells of airway submucosal glands. The submucosal gland is one of the major sources of airway mucus secretion. Until now, MUC5B protein has been the only gel-forming mucin identified in the mucous cells of human airway submucosal glands (21). Both MUC5B and MUC5AC mucin proteins have been identified in human airway secretions from normal subjects and from patients with various chronic diseases (8, 9, 30-32). It is very possible that MUC19/Muc19 protein also contributes to airway mucus secretion. Because of its huge size, MUC19/Muc19 mucin may be essential in determining the viscoelasticity of the airway mucus secretion. In chronic airway diseases such as asthma and COPD, the presence of an unusually high level of MUC19/Muc19 mucin could contribute to the morbidity and mortality of these diseases by increasing the tenacious nature of mucus plugs in the airways.

In summary, we have identified a novel gland-specific gel-forming mucin gene, MUC19/Muc19, using a “Hidden Markov Model”-based genome-wide search approach. Molecular cloning and sequence information suggest that this is probably the largest gel-forming mucin gene identified to date and that it has all the features of the known gel-forming mucins. Expression analyses, based on northern blot and in situ hybridization, demonstrate that MUC19/Muc19 is expressed mainly in the mucous cells of various glands, including the major salivary glands (sublingual and submandibular glands) and the submucosal glands of the large airways. Further studies of the expression and biochemical properties of this novel mucin in various mucus secretions will be essential to understanding its function and regulation in normal and disease states.
